Moshe Zadka: Analyzing the Stack Overflow Survey

The Stack Overflow Survey Results for 2019 are in! There is some official analysis that mentioned some things that mattered to me, and some that did not. I decided to dig into the data to see if I could find some things that would potentially interest my readership.

 import csv, collections, itertools 
 with open("survey_results_public.csv") as fpin:
     reader = csv.DictReader(fpin)
     responses = list(reader)
 len(responses) 
 88883 

Wow, almost 90K respondents! This is the sweet spot: enough to make meaningful generalizations, while still small enough to analyze with rudimentary tools rather than big-data-ware.

 pythonistas = [x for x in responses if 'Python' in x['LanguageWorkedWith']] 
 len(pythonistas)/len(responses) 
 0.41001091322300104 

About 40% of the respondents use Python in some capacity. That is pretty cool! This is one of the things where I wonder if there is bias in the source data. Are people who use Stack Overflow, or respond to surveys for SO, more likely to be the kind of person who uses Python? Or less?

In any case, I am excited! This means my favorite language, for all its issues, is doing well. This is also a good reminder that we need to think about the consequences of our decisions on a big swath of developers we will never ever meet.

 opensource = collections.Counter(x['OpenSourcer'] for x in pythonistas) 
 sorted(opensource.items(), key=lambda x:x[1], reverse=True) 
 [('Never', 11310),  ('Less than once per year', 10374),  ('Less than once a month but more than once per year', 9572),  ('Once a month or more often', 5187)] 
 opensource['Once a month or more often']/len(pythonistas) 
 0.1423318607139917 

Python is open source. Almost all important libraries (Django, Pandas, PyTorch, requests) are open source. Many important tools (Jupyter) are open source. Yet the share of Python users who contribute to open source with any kind of regular cadence is less than 15%.

 general_opensource = collections.Counter(x['OpenSourcer'] for x in responses)
 sorted(general_opensource.items(), key=lambda x:x[1], reverse=True)
 [('Never', 32295),  ('Less than once per year', 24972),  ('Less than once a month but more than once per year', 20561),  ('Once a month or more often', 11055)] 

The Python community does compare well to respondents as a whole, though: about 14% contribute once a month or more, against about 12% overall.
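To make that comparison easier to read, here is a minimal sketch that turns the two sets of counts printed above into shares. The `shares` helper is a hypothetical name introduced for illustration; the numbers are copied from the outputs above, not recomputed from the CSV.

```python
# Counts copied from the two outputs above.
python_counts = {
    'Never': 11310,
    'Less than once per year': 10374,
    'Less than once a month but more than once per year': 9572,
    'Once a month or more often': 5187,
}
general_counts = {
    'Never': 32295,
    'Less than once per year': 24972,
    'Less than once a month but more than once per year': 20561,
    'Once a month or more often': 11055,
}


def shares(counts):
    """Convert raw answer counts into fractions of the total (hypothetical helper)."""
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


python_shares = shares(python_counts)
general_shares = shares(general_counts)

# Print the two distributions side by side, one answer per line.
for answer in python_counts:
    print(f"{answer}: Python {python_shares[answer]:.1%}, "
          f"all respondents {general_shares[answer]:.1%}")
```

Keep in mind that "all respondents" includes the Python users, so the gap between Python users and everyone else is somewhat larger than these two columns suggest.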

 devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in pythonistas)) 
 devtype['DevOps specialist']/len(responses) 
 0.052282213696657406 

About 5% of total respondents are my peers: using Python for DevOps. That is pretty exciting! My interest in that is not merely theoretical: my upcoming book targets that crowd.
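The counting above relies on columns such as `DevType` holding several answers joined by semicolons. Here is a small sketch of that split-and-count step on a few made-up rows. The rows and the `count_values` helper are hypothetical, for illustration only. It also matches languages exactly after splitting, which is a slightly stricter check than the substring test used earlier.

```python
import collections
import itertools

# Hypothetical rows, shaped like the survey's semicolon-delimited columns.
sample = [
    {"LanguageWorkedWith": "Python;Go",
     "DevType": "DevOps specialist;Developer, back-end"},
    {"LanguageWorkedWith": "JavaScript",
     "DevType": "Developer, front-end"},
    {"LanguageWorkedWith": "Python",
     "DevType": "Data scientist or machine learning specialist"},
    {"LanguageWorkedWith": "NA", "DevType": "NA"},
]


def count_values(rows, column):
    """Count each individual answer in a semicolon-delimited column."""
    return collections.Counter(
        itertools.chain.from_iterable(row[column].split(";") for row in rows)
    )


# Exact match on the split values, rather than a substring test.
python_rows = [
    row for row in sample
    if "Python" in row["LanguageWorkedWith"].split(";")
]
devtype = count_values(python_rows, "DevType")

# Share of all sample rows that are Python users doing DevOps.
print(devtype["DevOps specialist"] / len(sample))
```

A respondent who lists several roles is counted once under each of them, so the counts can add up to more than the number of rows.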

 general_devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in responses))
 general_devtype['DevOps specialist']/len(responses), devtype['DevOps specialist']/len(pythonistas)
 (0.09970410539698255, 0.12751420025793705) 

In general, DevOps specialists are 10% of respondents.

 devtype['DevOps specialist']/general_devtype['DevOps specialist'] 
 0.524373730534868 

Over 50% of DevOps specialists use Python!

 def safe_int(x):
     try:
         return int(x)
     except ValueError:
         return -1

 intermediate = sum(1 for x in pythonistas if 1<=safe_int(x['YearsCode'])<=5)

My next hush-hush (for now!) project is going to target intermediate Python developers. I wish I could slice by "number of years writing in Python", but this is the best I could do. (I treat "NA" responses as "not intermediate". This is OK, since I prefer to underestimate rather than overestimate.)
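To see why this underestimates rather than overestimates, here is a short sketch of how `safe_int` treats non-numeric answers. The function is repeated from above so the snippet runs on its own. The sample strings are assumed examples of answers the `YearsCode` column can contain, not an exhaustive list.

```python
def safe_int(x):
    """Parse a whole number, mapping anything unparseable to -1."""
    try:
        return int(x)
    except ValueError:
        return -1


# Assumed examples of answers; only the first is a plain number.
answers = ["3", "NA", "Less than 1 year", "More than 50 years"]

for answer in answers:
    value = safe_int(answer)
    # Anything that maps to -1 falls outside the 1..5 "intermediate" range.
    print(repr(answer), value, 1 <= value <= 5)
```

Every answer that is not a plain integer becomes -1 and drops out of the 1 to 5 range, so it can only ever shrink the "intermediate" count.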

 intermediate/len(responses) 
 0.11346376697456206 

11%! Not bad.

 general_intermediate = sum(1 for x in responses if 1<=safe_int(x['YearsCode'])<=5)
 intermediate/len(pythonistas), general_intermediate/len(responses)
 (0.27673352907279863, 0.2671264471271222) 

It seems that using Python barely changes the chances of someone being intermediate: about 28% of Python users, against about 27% of all respondents.

Summary

  • 40% of respondents use Python. Python is kind of a big deal.
  • 5% of respondents use Python for DevOps. This is a lot! DevOps as a profession is less than 10 years old.
  • 11% of respondents are intermediate Python users. My previous book targets this crowd.

(Thanks to Robert Collins and Matthew Broberg for their comments on an earlier draft. Any remaining issues are purely my responsibility.)

Planet Python

Portals for Tableau 101: Analyzing Your Google Analytics or Matomo Traffic

Portals for Tableau analytics

If you have a portal for Tableau, then you’re certainly interested in analytics, but are you interested enough to get analytics on your analytics solution? I know I am! Fortunately, your portal makes it easy to add Google Analytics or Matomo and begin tracking your traffic. It’s worth noting that Google Analytics and Matomo will track both internal and external-facing sites, so whether you have a client-facing portal or one just for internal use, this functionality will be useful. This post will review how to set up each and begin viewing all that beautiful data:

Portals for Tableau analytics

Setting up Google Analytics

Go to https://analytics.google.com/ and hit Sign Up. Fortunately, Google Analytics accounts are free:

Google Analytics sign up

Fill out the fields with your applicable portal information, choose Google Analytics data-sharing settings and accept the terms of service:

setting up a new account in Google Analytics

Once you land on the account page for your portal, copy the Tracking ID:

Google Analytics tracking ID

Navigate to the backend of your portal, choose Settings from the top menu, then choose Portal Settings from the left menu:

analytics in Portal settings

Enter the tracking ID in the box under Google Analytics Tracking ID:

Google Analytics tracking ID

Begin tracking your portal traffic from your Google Analytics account!

Setting up Matomo (Formerly Piwik) – Cloud Service

Go to https://matomo.org/ and hit TRY IT FOR FREE:

signing up for Matomo analytics

Fill out the form with your portal URL and click the big green button:

setting up your account in Matomo

When you receive the confirmation email, log in with the credentials they give and note the URL. It’ll look something like https://yourPortal.matomo.cloud. This will be what you enter into Piwik/Matomo Server URL on the portal backend:

tracking code for Matomo

If this is your first time using Matomo, your site ID will automatically be 1. If you are already tracking several sites and the portal is an addition, go to your Matomo account site (usually https://yourPortal.matomo.cloud) and navigate to your settings (the gear icon at the top right of your screen). Select Manage under Websites in the left menu, and it’ll display all your sites and their site IDs. Find the ID for the portal and enter it into Piwik/Matomo Site ID:

tracking code for Matomo

The Matomo cloud service unfortunately isn’t free forever, but it does allow you to skip configuring and managing the on-premise service on your server. Find the pricing for Matomo’s cloud service here.

Matomo – On-Premise

If your business prefers to keep all analytics on a server on premises or self-managed in a cloud service like AWS, Matomo also offers its web analytics solution as a free download from here:

Matomo on-premise analytics

Here’s a thorough guide on how to install and configure Matomo on your server.

Once Matomo is installed, we follow the same process as the cloud service. Find your server URL (usually //matomo.site.com/) and specific site ID, and enter them into the backend of your portal under Settings > Portal Settings > Piwik/Matomo Server URL and Piwik/Matomo Site ID:

Matomo site ID for analytics
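For a rough idea of why the portal needs both values, here is a minimal Python sketch of how a server URL and a site ID combine into a request against Matomo's Tracking HTTP API (the `matomo.php` endpoint with the `idsite`, `rec` and `url` parameters). This is an illustration only, not how the portal is implemented: the `tracking_request_url` helper and the example.com addresses are hypothetical, and nothing is actually sent.

```python
from urllib.parse import urlencode, urljoin


def tracking_request_url(server_url, site_id, page_url):
    """Build (but do not send) a Matomo tracking request URL.

    Hypothetical helper: server_url and site_id correspond to the
    Piwik/Matomo Server URL and Site ID settings described above.
    """
    # Make sure the base ends with a slash so urljoin appends the endpoint.
    base = server_url if server_url.endswith("/") else server_url + "/"
    query = urlencode({
        "idsite": site_id,  # which tracked site the hit belongs to
        "rec": 1,           # required flag that enables recording
        "url": page_url,    # the page being viewed
    })
    return urljoin(base, "matomo.php") + "?" + query


# Placeholder addresses, not real servers.
print(tracking_request_url("https://matomo.example.com", 1,
                           "https://portal.example.com/"))
```

If either setting is wrong, hits go to the wrong server or are filed under the wrong site, which is why it is worth double-checking both against your Matomo settings page.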

The post Portals for Tableau 101: Analyzing Your Google Analytics or Matomo Traffic appeared first on InterWorks.

InterWorks