How To Install phpMyAdmin From Source on Debian 10


While many users need the functionality of a database management system like MariaDB, they may not feel comfortable interacting with the system solely from the MariaDB prompt.

phpMyAdmin was created so that users can interact with MariaDB through a web interface. In this guide, we’ll discuss how to install and secure phpMyAdmin so that you can safely use it to manage your databases on a Debian 10 system.


Before you get started with this guide, you’ll need the following:

Note: MariaDB is a community-developed fork of MySQL, and although the two programs are closely related, they are not completely interchangeable. While phpMyAdmin was designed specifically for managing MySQL databases and makes reference to MySQL in various dialogue boxes, rest assured that your installation of MariaDB will work correctly with phpMyAdmin.

Finally, there are important security considerations when using software like phpMyAdmin, since it:

  • Communicates directly with your MariaDB installation
  • Handles authentication using MariaDB credentials
  • Executes and returns results for arbitrary SQL queries

For these reasons, and because it is a widely deployed PHP application that is frequently targeted by attackers, you should never run phpMyAdmin on remote systems over a plain HTTP connection.

If you do not have an existing domain configured with an SSL/TLS certificate, you can follow this guide on securing Apache with Let’s Encrypt on Debian 10 to set one up. This will require you to register a domain name, create DNS records for your server, and set up an Apache Virtual Host.

Once you are finished with these steps, you’re ready to get started with this guide.

Before installing and configuring phpMyAdmin, the official documentation recommends that you install a few PHP extensions onto your server to enable certain functionality and improve performance.

If you followed the prerequisite LAMP stack tutorial, several of these modules will have been installed along with the php package. However, it’s recommended that you also install these packages:

  • php-mbstring: a PHP extension used to manage non-ASCII strings and convert strings to different encodings
  • php-zip: a PHP module that supports uploading .zip files to phpMyAdmin
  • php-gd: another PHP module, which enables support for the GD Graphics Library

First, update your server’s package index if you’ve not done so recently:

  • sudo apt update

Then use apt to pull down the files and install them on your system:

  • sudo apt install php-mbstring php-zip php-gd

Next, we can install phpMyAdmin. As of this writing, phpMyAdmin is not available from the default Debian repositories, so you will need to download the source code to your server from the phpMyAdmin site.

In order to do that, navigate to the phpMyAdmin Downloads page, scroll down to the table with download links for the latest stable release, and copy the download link ending in tar.gz. This link points to an archive file known as a tarball that, when extracted, will create a number of files on your system. At the time of this writing, the latest release is version

Note: On this Downloads page, you will notice that there are download links labeled all-languages and english. The all-languages links will download a version of phpMyAdmin that will allow you to select one of 72 available languages, while the english links will only allow you to use phpMyAdmin in English.

This guide will use the all-languages package to illustrate how to install phpMyAdmin, but if you plan to use phpMyAdmin in English, you can install the english package. Just be sure to replace the links and file names as necessary in the following commands.

Replace the link in the following wget command with the download link you just copied, then press ENTER. This will run the command and download the tarball to your server:

  • wget
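The phpMyAdmin Downloads page lists a SHA-256 checksum alongside each archive. Before extracting, you may want to confirm that your download matches it. The following Python sketch is one way to do that, assuming Python 3 is available on the server; `sha256_of_file` and `verify_sha256` are hypothetical helper names, not part of phpMyAdmin.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, published_hex: str) -> bool:
    """True if the file matches the published checksum (compared case-insensitively)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, `verify_sha256("path/to/downloaded.tar.gz", "<checksum from the Downloads page>")` should return True. If it returns False, download the archive again rather than extracting it.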

Then extract the tarball:

  • tar xvf phpMyAdmin-

This will create a number of new files and directories on your server under a parent directory named phpMyAdmin-

Then run the following command. This will move the phpMyAdmin- directory and all its subdirectories to the /usr/share/ directory, the location where phpMyAdmin expects to find its configuration files by default. It will also rename the directory in place to just phpmyadmin:

  • sudo mv phpMyAdmin- /usr/share/phpmyadmin

With that, you’ve installed phpMyAdmin, but there are a number of configuration changes you must make in order to be able to access phpMyAdmin through a web browser.

Step 2 — Configuring phpMyAdmin Manually

When installing phpMyAdmin with a package manager, as one might in an Ubuntu environment, phpMyAdmin defaults to a “Zero Configuration” mode which performs several actions automatically to set up the program. Because we installed it from source in this guide, we will need to perform those steps manually.

To begin, make a new directory where phpMyAdmin will store its temporary files:

  • sudo mkdir -p /var/lib/phpmyadmin/tmp

Set www-data — the Linux user profile that web servers like Apache use by default for normal operations in Ubuntu and Debian systems — as the owner of this directory:

  • sudo chown -R www-data:www-data /var/lib/phpmyadmin

The files you extracted previously include a sample configuration file that you can use as your base configuration file. Make a copy of this file, keeping it in the /usr/share/phpmyadmin directory, and rename it

  • sudo cp /usr/share/phpmyadmin/ /usr/share/phpmyadmin/

Open this file using your preferred text editor. Here, we’ll use nano:

  • sudo nano /usr/share/phpmyadmin/

phpMyAdmin uses the cookie authentication method by default, which allows you to log in to phpMyAdmin as any valid MariaDB user with the help of cookies. In this method, the MariaDB user password is stored and encrypted with the Advanced Encryption Standard (AES) algorithm in a temporary cookie.

Historically, phpMyAdmin instead used the Blowfish cipher for this purpose, and this is still reflected in its configuration file. Scroll down to the line that begins with $cfg['blowfish_secret']. It will look like this:

. . .
$cfg['blowfish_secret'] = ''; /* YOU MUST FILL IN THIS FOR COOKIE AUTH! */
. . .

In between the single quotes, enter a string of 32 random characters. This isn’t a passphrase you need to remember; it will only be used internally by the AES algorithm.


Note: If the passphrase you enter here is shorter than 32 characters in length, it will result in the encrypted cookies being less secure. Entering a string longer than 32 characters, though, won’t cause any harm.

To generate a truly random string of characters, you can install and use the pwgen program:

  • sudo apt install pwgen

By default, pwgen creates easily pronounceable, though less secure, passwords. However, by including the -s flag, as in the following command, you can create a completely random, difficult-to-memorize password. Note the final two arguments to this command: 32, which sets the length of the generated string, and 1, which tells pwgen how many strings to generate:

  • pwgen -s 32 1
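If you would rather not install an extra package, Python’s standard-library secrets module can produce a comparable string. This is a sketch that assumes Python 3.6 or later; it draws only from letters and digits, similar in spirit to pwgen -s, although pwgen’s exact character set may differ. `random_secret` is a hypothetical helper name.

```python
import secrets
import string

# Letters and digits only, so the result can be pasted between the single
# quotes of the blowfish_secret line without escaping anything.
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int = 32) -> str:
    """Return a cryptographically random alphanumeric string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_secret(32))
```

secrets.choice uses the operating system’s cryptographically secure random source, unlike the random module, which is not meant for secrets.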

Next, scroll down to the comment reading /* User used to manipulate with storage */. This section includes some directives that define a MariaDB database user named pma which performs certain administrative tasks within phpMyAdmin. According to the official documentation, this special user account isn’t necessary in cases where only one user will access phpMyAdmin, but it is recommended in multi-user scenarios.

Uncomment the controluser and controlpass directives by removing the preceding slashes. Then update the controlpass directive to point to a secure password of your choosing. If you don’t do this, the default password will remain in place and unknown users could easily gain access to your database through the phpMyAdmin interface.

After making these changes, this section of the file will look like this:

. . .
/* User used to manipulate with storage */
// $cfg['Servers'][$i]['controlhost'] = '';
// $cfg['Servers'][$i]['controlport'] = '';
$cfg['Servers'][$i]['controluser'] = 'pma';
$cfg['Servers'][$i]['controlpass'] = 'password';
. . .

Below this section, you’ll find another section preceded by a comment reading /* Storage database and tables */. This section includes a number of directives that define the phpMyAdmin configuration storage, a database and several tables used by the administrative pma database user. These tables enable a number of features in phpMyAdmin, including bookmarks, comments, PDF generation, and more.

Uncomment each line in this section by removing the slashes at the beginning of each line so it looks like this:

. . .
/* Storage database and tables */
$cfg['Servers'][$i]['pmadb'] = 'phpmyadmin';
$cfg['Servers'][$i]['bookmarktable'] = 'pma__bookmark';
$cfg['Servers'][$i]['relation'] = 'pma__relation';
$cfg['Servers'][$i]['table_info'] = 'pma__table_info';
$cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';
$cfg['Servers'][$i]['pdf_pages'] = 'pma__pdf_pages';
$cfg['Servers'][$i]['column_info'] = 'pma__column_info';
$cfg['Servers'][$i]['history'] = 'pma__history';
$cfg['Servers'][$i]['table_uiprefs'] = 'pma__table_uiprefs';
$cfg['Servers'][$i]['tracking'] = 'pma__tracking';
$cfg['Servers'][$i]['userconfig'] = 'pma__userconfig';
$cfg['Servers'][$i]['recent'] = 'pma__recent';
$cfg['Servers'][$i]['favorite'] = 'pma__favorite';
$cfg['Servers'][$i]['users'] = 'pma__users';
$cfg['Servers'][$i]['usergroups'] = 'pma__usergroups';
$cfg['Servers'][$i]['navigationhiding'] = 'pma__navigationhiding';
$cfg['Servers'][$i]['savedsearches'] = 'pma__savedsearches';
$cfg['Servers'][$i]['central_columns'] = 'pma__central_columns';
$cfg['Servers'][$i]['designer_settings'] = 'pma__designer_settings';
$cfg['Servers'][$i]['export_templates'] = 'pma__export_templates';
. . .

These tables don’t yet exist, but we will create them shortly.

Lastly, scroll down to the bottom of the file and add the following line. This will configure phpMyAdmin to use the /var/lib/phpmyadmin/tmp directory you created earlier as its temporary directory. phpMyAdmin will use this temporary directory as a templates cache which allows for faster page loading:

. . .
$cfg['TempDir'] = '/var/lib/phpmyadmin/tmp';

Save and close the file after adding this line. If you used nano, you can do so by pressing CTRL + X, Y, then ENTER.
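Optionally, you can sanity-check the edits before moving on. The following Python sketch is a hypothetical helper, not an official phpMyAdmin tool. You read the configuration file you just saved and pass its contents to `check_config`, which reports a blowfish_secret shorter than 32 characters, storage directives that are still commented out, and a missing TempDir line. The regular expressions assume the one-directive-per-line layout shown above, so treat the output as a hint. If the file contains other commented-out optional directives, they may be flagged too.

```python
import re
from typing import List

# Patterns assume the one-directive-per-line layout shown in this guide.
SECRET_RE = re.compile(r"^\s*\$cfg\['blowfish_secret'\]\s*=\s*'([^']*)'", re.M)
COMMENTED_SERVER_RE = re.compile(r"^\s*//\s*\$cfg\['Servers'\]\[\$i\]\['(\w+)'\]", re.M)
TEMPDIR_RE = re.compile(r"^\s*\$cfg\['TempDir'\]\s*=", re.M)

# These two are intentionally left commented out in this guide.
OPTIONAL = {"controlhost", "controlport"}


def check_config(text: str) -> List[str]:
    """Return human-readable problems found in the configuration text."""
    problems = []
    match = SECRET_RE.search(text)
    if match is None:
        problems.append("blowfish_secret line not found")
    elif len(match.group(1)) < 32:
        problems.append(f"blowfish_secret is only {len(match.group(1))} characters")
    for name in COMMENTED_SERVER_RE.findall(text):
        if name not in OPTIONAL:
            problems.append(f"directive still commented out: {name}")
    if TEMPDIR_RE.search(text) is None:
        problems.append("TempDir is not set")
    return problems
```

An empty list means none of these particular checks found a problem. It does not prove the configuration is correct.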

Next, you’ll need to create the phpMyAdmin storage database and tables. When you installed phpMyAdmin in the previous step, it came with a file named create_tables.sql. This SQL file contains all the commands needed to create the configuration storage database and tables phpMyAdmin needs to function correctly.

Run the following command to use the create_tables.sql file to create the configuration storage database and tables:

  • sudo mariadb < /usr/share/phpmyadmin/sql/create_tables.sql

Following that, you’ll need to create the administrative pma user. Open up the MariaDB prompt:

  • sudo mariadb

From the prompt, run the following command to create the pma user and grant it the appropriate permissions. Be sure to change password to match the password you defined in the configuration file:

  • GRANT SELECT, INSERT, UPDATE, DELETE ON phpmyadmin.* TO 'pma'@'localhost' IDENTIFIED BY 'password';

If you haven’t created one already, you should also create a regular MariaDB user for managing databases through phpMyAdmin, as it’s recommended that you log in with an account other than the pma user. The following command creates a user with privileges on all databases, as well as the power to add, change, and remove user privileges. Whatever privileges you assign to this user, be sure to give it a strong password as well:

  • GRANT ALL PRIVILEGES ON *.* TO 'sammy'@'localhost' IDENTIFIED BY 'password' WITH GRANT OPTION;

Following that, exit the MariaDB shell:

  • exit

phpMyAdmin is now fully installed and configured on your server. However, your Apache server does not yet know how to serve the application. To resolve this, we will create an Apache configuration file for it.

Step 3 — Configuring Apache to Serve phpMyAdmin

When phpMyAdmin is installed from a distribution’s default repositories, as on Ubuntu, the installation process creates an Apache configuration file automatically and places it in the /etc/apache2/conf-enabled/ directory. Because we installed phpMyAdmin from source, however, we will need to create and enable this file manually.

Create a file named phpmyadmin.conf in the /etc/apache2/conf-available/ directory:

  • sudo nano /etc/apache2/conf-available/phpmyadmin.conf

Then add the following content to the file:

# phpMyAdmin default Apache configuration

Alias /phpmyadmin /usr/share/phpmyadmin

<Directory /usr/share/phpmyadmin>
    Options SymLinksIfOwnerMatch
    DirectoryIndex index.php

    <IfModule mod_php5.c>
        <IfModule mod_mime.c>
            AddType application/x-httpd-php .php
        </IfModule>
        <FilesMatch ".+\.php$">
            SetHandler application/x-httpd-php
        </FilesMatch>

        php_value include_path .
        php_admin_value upload_tmp_dir /var/lib/phpmyadmin/tmp
        php_admin_value open_basedir /usr/share/phpmyadmin/:/etc/phpmyadmin/:/var/lib/phpmyadmin/:/usr/share/php/php-gettext/:/usr/share/php/php-php-gettext/:/usr/share/javascript/:/usr/share/php/tcpdf/:/usr/share/doc/phpmyadmin/:/usr/share/php/phpseclib/
        php_admin_value mbstring.func_overload 0
    </IfModule>
    <IfModule mod_php.c>
        <IfModule mod_mime.c>
            AddType application/x-httpd-php .php
        </IfModule>
        <FilesMatch ".+\.php$">
            SetHandler application/x-httpd-php
        </FilesMatch>

        php_value include_path .
        php_admin_value upload_tmp_dir /var/lib/phpmyadmin/tmp
        php_admin_value open_basedir /usr/share/phpmyadmin/:/etc/phpmyadmin/:/var/lib/phpmyadmin/:/usr/share/php/php-gettext/:/usr/share/php/php-php-gettext/:/usr/share/javascript/:/usr/share/php/tcpdf/:/usr/share/doc/phpmyadmin/:/usr/share/php/phpseclib/
        php_admin_value mbstring.func_overload 0
    </IfModule>

</Directory>

# Authorize for setup
<Directory /usr/share/phpmyadmin/setup>
    <IfModule mod_authz_core.c>
        <IfModule mod_authn_file.c>
            AuthType Basic
            AuthName "phpMyAdmin Setup"
            AuthUserFile /etc/phpmyadmin/htpasswd.setup
        </IfModule>
        Require valid-user
    </IfModule>
</Directory>

# Disallow web access to directories that don't need it
<Directory /usr/share/phpmyadmin/templates>
    Require all denied
</Directory>
<Directory /usr/share/phpmyadmin/libraries>
    Require all denied
</Directory>
<Directory /usr/share/phpmyadmin/setup/lib>
    Require all denied
</Directory>

This is the default phpMyAdmin Apache configuration file found on Ubuntu installations, though it will be adequate for a Debian setup as well.

Save and close the file, then enable it by typing:

  • sudo a2enconf phpmyadmin.conf

Then reload the apache2 service to put the configuration changes into effect:

  • sudo systemctl reload apache2

Following that, you’ll be able to access the phpMyAdmin login screen by navigating to the following URL in your web browser:


You’ll see the following login screen:

phpMyAdmin login screen

Log in to the interface with the MariaDB username and password you configured. After logging in, you’ll see the user interface, which will look something like this:

phpMyAdmin user interface

Now that you’re able to connect and interact with phpMyAdmin, all that’s left to do is harden your system’s security to protect it from attackers.

Step 4 — Securing Your phpMyAdmin Instance

Because of its ubiquity, phpMyAdmin is a popular target for attackers, and you should take extra care to prevent unauthorized access. One of the easiest ways of doing this is to place a gateway in front of the entire application by using Apache’s built-in .htaccess authentication and authorization features.

To do this, you must first enable the use of .htaccess file overrides by editing your Apache configuration file.

Reopen the phpmyadmin.conf file you created in your Apache configuration directory:

  • sudo nano /etc/apache2/conf-available/phpmyadmin.conf

Add an AllowOverride All directive within the <Directory /usr/share/phpmyadmin> section of the configuration file, like this:

<Directory /usr/share/phpmyadmin>
    Options SymLinksIfOwnerMatch
    DirectoryIndex index.php
    AllowOverride All

    <IfModule mod_php5.c>
    . . .

When you have added this line, save and close the file.

To implement the changes you made, restart Apache:

  • sudo systemctl restart apache2

Now that you have enabled .htaccess use for your application, you need to create one to actually implement some security.

In order for this to be successful, the file must be created within the application directory. You can create the necessary file and open it in your text editor with root privileges by typing:

  • sudo nano /usr/share/phpmyadmin/.htaccess

Within this file, enter the following content:

AuthType Basic
AuthName "Restricted Files"
AuthUserFile /usr/share/phpmyadmin/.htpasswd
Require valid-user

Here is what each of these lines means:

  • AuthType Basic: This line specifies the authentication type that you are implementing. This type will implement password authentication using a password file.
  • AuthName: This sets the message for the authentication dialog box. You should keep this generic so that unauthorized users won’t gain any information about what is being protected.
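  • AuthUserFile: This sets the location of the password file that will be used for authentication. Ideally, this file should live outside of the directories that are being served. The path used in this guide is inside the phpMyAdmin directory, but Debian’s default Apache configuration denies web access to any file whose name begins with .ht, so the file will not be served. We will create this file shortly.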
  • Require valid-user: This specifies that only authenticated users should be given access to this resource. This is what actually stops unauthorized users from entering.

When you are finished, save and close the file.
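With AuthType Basic, the browser sends the username and password with every request in an Authorization header. The credentials are Base64-encoded, which is an encoding, not encryption (RFC 7617). This is one reason the SSL/TLS certificate from the prerequisites matters. The following Python sketch shows how that header value is built and how trivially it can be reversed. It assumes UTF-8 credentials, and the example credentials are made up.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")  # assumes UTF-8 credentials
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover the credentials: anyone who can read the header can do this."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    # Split on the first colon only; the password itself may contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Made-up credentials, for illustration only.
header = basic_auth_header("sammy", "example-password")
print(header)
print(decode_basic_auth(header))
```

Over HTTPS, the header travels inside the encrypted connection. Over plain HTTP, anyone who can observe the traffic can read it.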

The location that you selected for your password file was /usr/share/phpmyadmin/.htpasswd. You can now create this file and pass it an initial user with the htpasswd utility:

  • sudo htpasswd -c /usr/share/phpmyadmin/.htpasswd username

You will be prompted to select and confirm a password for the user you are creating. Afterwards, the file is created with the hashed password that you entered.

If you want to add another user, run the command without the -c flag, which would otherwise overwrite the existing file:

  • sudo htpasswd /usr/share/phpmyadmin/.htpasswd additionaluser

Now, when you access your phpMyAdmin subdirectory, you will be prompted for the account name and password that you just configured:


phpMyAdmin apache password

After entering the Apache authentication, you’ll be taken to the regular phpMyAdmin authentication page to enter your MariaDB credentials. This setup adds an additional layer of security, which is desirable since phpMyAdmin has suffered from vulnerabilities in the past.


You should now have phpMyAdmin configured and ready to use on your Debian 10 server. Using this interface, you can easily create databases, users, tables, etc., and perform the usual operations like deleting and modifying structures and data.

DigitalOcean Community Tutorials

Podcast.__init__: Open Source Automated Machine Learning With MindsDB

Machine learning is growing in popularity and capability, but for a majority of people it is still a black box that we don’t fully understand. The team at MindsDB is working to change this state of affairs by creating an open source tool that is easy to use without a background in data science. By simplifying the training and use of neural networks, and making their logic explainable, they hope to bring AI capabilities to more people and organizations. In this interview George Hosu and Jorge Torres explain how MindsDB is built, how to use it for your own purposes, and how they view the current landscape of AI technologies. This is a great episode for anyone who is interested in experimenting with machine learning and artificial intelligence. Give it a listen and then try MindsDB for yourself.



  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Podcast.init listeners get 2 months free on any plan by going to today and signing up for a trial.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at
  • Your host as usual is Tobias Macey and today I’m interviewing George Hosu and Jorge Torres about MindsDB, a framework for streamlining the use of neural networks


  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what MindsDB is and the problem that it is trying to solve?
    • What was the motivation for creating the project?
  • Who is the target audience for MindsDB?
  • Before we go deep into MindsDB can you explain what a neural network is for anyone who isn’t familiar with the term?
  • For someone who is using MindsDB can you talk through their workflow?
    • What are the types of data that are supported for building predictions using MindsDB?
    • How much cleaning and preparation of the data is necessary before using it to generate a model?
    • What are the lower and upper bounds for volume and variety of data that can be used to build an effective model in MindsDB?
  • One of the interesting and useful features of MindsDB is the built-in support for explaining the decisions reached by a model. How do you approach that challenge and what are the most difficult aspects?
  • Once a model is generated, what is the output format and can it be used separately from MindsDB for embedding the prediction capabilities into other scripts or services?
  • How is MindsDB implemented and how has the design changed since you first began working on it?
    • What are some of the assumptions that you made going into this project which have had to be modified or updated as it gained users and features?
  • What are the limitations of MindsDB and what are the cases where it is necessary to pass a task on to a data scientist?
  • In your experience, what are the common barriers for individuals and organizations adopting machine learning as a tool for addressing their needs?
  • What have been the most challenging, complex, or unexpected aspects of designing and building MindsDB?
  • What do you have planned for the future of MindsDB?




The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Planet Python

PyBites: The First Step in Contributing to Open Source Projects

Have you ever wanted to contribute to open source but weren’t sure how to get started? Marc found himself in just that situation. Sometimes it all comes down to taking that first step.


I’ve been “programming” in Python for a while now, enough to be dangerous as they say. I’ve learned enough to be able to help others from time to time, but new enough that I still get distracted by the next shiny package I hear about on a podcast.

I recently decided that, to help progress me further and to give back a little, I would help contribute to a package. I’ve heard the call to arms a few times, “Update the documentation, fix the bugs, build new features!”. It’s the Emerald City of the open source world. Being able to give back to the community that gives me something I enjoy so much was something that couldn’t be ignored.

But where to start? PyPI has over 100,000 packages; how does one just randomly pick one? Do I pick a well-known project? Surely they can always use the help, but the large packages are so refined. With dozens of contributors already helping, how could I possibly add something of value? Is it better to go for a smaller package? Find one that still needs work?

Finding the Project

Then I heard about a website just for people like me who want to help but need help finding the right place.

So off I went to find someone I could offer assistance to. There were pages and pages of packages that needed help, but I couldn’t even understand what half of the issues were. Again came the thought… “Who am I to think I have anything to offer?”

So I backed off again.

Then on my favourite podcast, I heard about a newish package (newish when the episode was released three years prior, anyway). The package sounded awesome, and the creator was local to me. How awesome was this? He finished his interview with a call to action: “Come and help us, fix the documentation, fix the bugs, create new features”. Yeah, he sounded like everyone else. Then he said the piece that ignited my enthusiasm.

“We want anyone, even if they have no experience, we are happy to mentor them. Contact me directly even”.

I was hooked. I got to work, and I looked up the package. I found some PyCon talks and watched them. The same call to arms was repeated. PyCon 2018. PyCon 2019.

Finally, someone who wanted my help and was willing to help me help them.

Taking Action

So I jumped on my Gmail and emailed the creator directly. “I’m here, I don’t know much but I want to help”. Maybe I said a few more words than that, but that was the gist of it.

I got a reply back the same day (local, but a few hours time difference). “Thanks for contacting us, is there anything you specifically wanted to help with? Do you have any particular skill sets?”.

My heart sank.

I had failed at my first break. I realised straight away where I had gone wrong: I hadn’t provided any useful information.

So I got back on my email and tried to be more useful. “I’ve got some Python skills, plenty of Windows experience (based on 20 years of desktop experience) but I have a distinct lack of GitHub experience”.

That was better. In response, I received meaningful direction to a particular part of the package with its own issue log, and a suggestion that I definitely look into GitHub more.

Harsh as the dev’s email seemed on my first reading, it really wasn’t. He was busy and to the point, and he gave me solid direction. I respect him more now and really want to help with his product.

I started the next day. I’m now watching YouTube videos about contributing to open source, trying out the product to see how it works, and browsing their issue log to see where I can offer assistance.

Lesson Learned

The lesson I really want to pass on here (if anything) is that wanting to contribute is great, but take a look at your skill sets first and get them in order. The core devs behind packages run their own lives and work full-time jobs on top of the Free and Open Source Software (FOSS) work they do. If you reach out, be concise, clear, and direct.

Keep Calm and Code in Python!



Ned Batchelder: Corporations and open source: why and how

Here’s a really simplistic model: if you want someone to do something, you have to give them a compelling reason to do it, and you have to make it as easy as possible for them to do it. That is, you need to have good answers to Why? and How? (I don’t know much about marketing, but I think these are the value proposition and the call to action.)

Let’s look at the Why and How model as it applies to corporations funding open source. They don’t do it because the answers to Why and How are really bad right now.

Why should a corporation fund open source? As much as I wish it were different for all sorts of reasons, corporations act only in purely selfish ways. In order to spend money, they need to see some positive benefit to them that wouldn’t happen if they didn’t spend the money.

This frustrates me because a corporation is a collection of people, none of whom would act this way. I could say much more about this, but we aren’t going to be able to change corporations.

Companies only spend money if doing so will bring them a (perceived) benefit. Funding open source would make it stronger and better, but that is a very long-term effect, and not one that accrues directly to the funder. This is the famous Tragedy of the Commons. It’s a fair question for companies to ask: if they fund open source, what do they get for their money?

That’s the difficulty with Why, but let’s imagine for a moment that we could somehow convince someone to spend their company’s money funding open source: now what? How do they do it? A significant Python project could have a hundred library dependencies. How do they decide how to allocate the funding budget among them? Once that decision is made, how does the money get delivered? Very few open source projects are equipped to receive funds. If even 10% of the projects have a clear path for funding, now there are 10 checks to write, or 10 PayPal links to click through, or whatever. Some of that money will need to be sent internationally, and it has to be considered at tax time. Does it have to be done again next year, and the year after that? It’s a logistical nightmare!

So when we try to convince companies to fund open source, we don’t have good answers for either Why? or How? It’s no wonder it doesn’t happen.

This is one of the reasons I am optimistic about Tidelift: they have good answers for both of these questions. The Tidelift subscription gives companies information and services around their open source dependencies, which answers the why. And the payment to Tidelift solves the how: Tidelift looks at the list of dependencies, decides an allocation, and distributes the money to the maintainers.

Sure, there are still lots of questions to be answered: is the allocation algorithm right? Will enough companies subscribe to make Tidelift itself sustainable? And even larger questions, like: if an interesting amount of money does flow to open source maintainers, what will be the cultural change in open source?

I don’t know the answers to those questions, but Tidelift seems like the most promising answer to how to support open source. I’m an enthusiastic participant. You should be too.

Planet Python

Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder


Before you begin building a workflow using Tableau Prep, it’s helpful to know a little bit about the data source(s) you need to connect to. Consider the following:

  • What kind of source are you connecting to?
  • How large are the files? Record counts? Rows/columns?
  • Are you familiar with the data structures?
  • Do you know of any data quality issues?
  • How frequently will you be updating the data?

Understanding the Basics of Your Data Sources

Understanding the basic data structure and size, as well as the granularity of different sources, helps you plan the flow in your mind before you get into the detailed challenges that the transformation of the raw data sources poses.

In the example I’m drawing upon for this series, I’m using a version of Superstore data I created, along with public data from the Census Bureau. I’m going to create a workflow that will combine four tables containing annual sales data and a single dimension table that will be joined to provide regional manager names. These will be joined with a population dataset from the Census, enabling us to normalize sales by population for each state that had sales.

Connecting to the Sales Data

The data used in this example comes from two different spreadsheets: one that contains four (4) sales worksheets and one (1) Regional Manager worksheet, and another spreadsheet containing the census data.

Experienced Tableau Desktop users should be familiar with the Superstore dataset. In this spreadsheet, I’ve separated each year’s sales into its own worksheet. This data could just as easily have been in a text file or a database:

Superstore data for Tableau Prep

The Census data provides population estimates for each state for corresponding years:

Census data for Tableau Prep

Because the world isn’t perfect, these files have data quality issues and different levels of aggregation, and we will have to union files, join files, pivot the data and re-aggregate it. There are also inconsistencies within specific fields that will have to be cleaned. The datasets are small, but we will need every kind of transformation step that Tableau Prep provides to prepare the data for analysis in Tableau Desktop. We will also create a calculation in Prep to normalize sales by state and year for the population in each state.
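The normalization step amounts to totaling sales per state and year, then dividing by that state’s population for the same year. As a rough sketch in plain Python (not a description of how Tableau Prep works internally), assuming hypothetical field names `State`, `Year` and `Sales` and entirely made-up sample figures, the calculation could look like this:

```python
from collections import defaultdict

# Made-up sample rows; the field names are assumptions, not the real Superstore schema.
orders = [
    {"State": "State A", "Year": 2018, "Sales": 1200.0},
    {"State": "State A", "Year": 2018, "Sales": 300.0},
    {"State": "State B", "Year": 2018, "Sales": 450.0},
    {"State": "State C", "Year": 2018, "Sales": 10.0},  # no population row below
]

# Made-up population estimates keyed by (state, year).
population = {
    ("State A", 2018): 1000,
    ("State B", 2018): 900,
}


def sales_per_capita(orders, population):
    """Aggregate sales to (state, year), then divide by that state's population."""
    totals = defaultdict(float)
    for row in orders:
        totals[(row["State"], row["Year"])] += row["Sales"]
    # Behaves like an inner join: only keys present in both inputs are kept.
    return {
        key: total / population[key]
        for key, total in totals.items()
        if key in population
    }


result = sales_per_capita(orders, population)
print(result)
```

State C drops out of the result because it has no matching population row, which is the same effect an inner join between the sales and census data would have.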

The data is not perfectly clean, and some of the structures aren’t right. That’s the real world. We’ll use Tableau Prep to address all of the issues and create a clean dataset for other people to use.

Connecting to the Superstore Sales Data

In this first video, you’ll see how to make a data connection to an initial data source and then add other files to that data source. We’ll make the following connections:

  1. Connect to the four sales tables
  2. Demonstrate a wildcard union
  3. Demonstrate a manual union

Using the Wildcard Union in Tableau Prep

Wildcard unions offer an efficient way to bring together many different tables with similar naming conventions that also have consistent column and row structures. If you’re working with unfamiliar datasets that may have data inconsistencies, I believe creating a union manually gives you more direct control and may make it easier for you to deal with data quality issues that emerge as you add each table to the union.
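Conceptually, a wildcard union selects every table whose name matches a pattern and stacks their rows. The Python sketch below illustrates only that idea: the worksheet names are hypothetical, and the glob-style matching through `fnmatch` is an assumption for illustration, not a description of Tableau Prep’s own matching rules. The `Table Names` column loosely mimics the field Builder adds to unioned data.

```python
from fnmatch import fnmatch

# Hypothetical workbook: worksheet name -> list of row dicts.
workbook = {
    "Orders 2015": [{"Row ID": 1, "Sales": 100.0}],
    "Orders 2016": [{"Row ID": 2, "Sales": 200.0}],
    "Regional Managers": [{"Region": "West", "Manager": "Pat"}],
}


def wildcard_union(workbook, pattern):
    """Stack the rows of every worksheet whose name matches a glob-style pattern.

    Each output row records its source sheet in a "Table Names" column.
    """
    unioned = []
    for name in sorted(workbook):  # sorted for a predictable row order
        if fnmatch(name, pattern):
            for row in workbook[name]:
                unioned.append({**row, "Table Names": name})
    return unioned


rows = wildcard_union(workbook, "Orders*")
print(rows)
```

The Regional Managers sheet is skipped because its name does not match the pattern; that is exactly why consistent naming conventions matter for this approach.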

Using the Manual Union in Tableau Prep

I like using manual unions when I’m working with a new dataset because it’s easier to identify mismatched field-naming conventions. The inconsistent field names (Sales vs. Revenue) didn’t appear until I brought in the 2018 sales data. The visual cues that Tableau Prep provided, and the filtering in the data grid for mismatched fields, made it very easy to find and merge two different fields that were actually both holding sales data.
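Merging the Sales and Revenue fields boils down to taking, on each row, whichever of the two actually holds a value. Here is a minimal Python sketch of that idea, assuming (hypothetically) that the union has already left `None` in whichever field a given year’s worksheet lacked; the helper name `merge_fields` is made up for illustration:

```python
def merge_fields(rows, keep, other):
    """Fold values from the `other` field into the `keep` field and drop `other`.

    If a row already has a non-None value in `keep`, that value wins.
    """
    merged = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        value = row.pop(other, None)
        if row.get(keep) is None:
            row[keep] = value
        merged.append(row)
    return merged


# Hypothetical unioned rows: one year used "Sales", another used "Revenue".
unioned = [
    {"Row ID": 1, "Sales": 100.0, "Revenue": None},
    {"Row ID": 2, "Sales": None, "Revenue": 250.0},
]
merged = merge_fields(unioned, keep="Sales", other="Revenue")
print(merged)
```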

It was also easy to make minor changes using the Profile Cards for specific fields. I used them to remove the Table Names field, which Builder adds automatically when you union data; I don’t want that information in the output of this workflow. In addition, because Row ID is a number, Builder treated it as a number in the profile pane and generated a histogram in that field’s profile card. I wanted to validate that Row ID is a unique key for this dataset, so I changed the field to a string; the profile card then made it easy to see that each Row ID does, in fact, correspond to a single row of the data.
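The Row ID check described above is a uniqueness test: every key value should appear exactly once. A small Python sketch, assuming the rows are dictionaries with a hypothetical `Row ID` key, counts each value as a string (mirroring the switch to a string type) and reports any values that occur more than once:

```python
from collections import Counter


def duplicate_keys(rows, key="Row ID"):
    """Return the key values (as strings) that appear on more than one row."""
    counts = Counter(str(row[key]) for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [{"Row ID": 1}, {"Row ID": 2}, {"Row ID": 2}]
print(duplicate_keys(rows))
```

An empty list means the field is a unique key for the dataset.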

In the next post in this series, I’ll show you how to add a cleaning step to make additional modifications to fields.

The post Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder appeared first on InterWorks.