Podcast.__init__: Open Source Automated Machine Learning With MindsDB

Summary

Machine learning is growing in popularity and capability, but for a majority of people it is still a black box that we don’t fully understand. The team at MindsDB is working to change this state of affairs by creating an open source tool that is easy to use without a background in data science. By simplifying the training and use of neural networks, and making their logic explainable, they hope to bring AI capabilities to more people and organizations. In this interview George Hosu and Jorge Torres explain how MindsDB is built, how to use it for your own purposes, and how they view the current landscape of AI technologies. This is a great episode for anyone who is interested in experimenting with machine learning and artificial intelligence. Give it a listen and then try MindsDB for yourself.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced, and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or by email at hosts@podcastinit.com.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat.
  • Your host as usual is Tobias Macey and today I’m interviewing George Hosu and Jorge Torres about MindsDB, a framework for streamlining the use of neural networks.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what MindsDB is and the problem that it is trying to solve?
    • What was the motivation for creating the project?
  • Who is the target audience for MindsDB?
  • Before we go deep into MindsDB, can you explain what a neural network is for anyone who isn’t familiar with the term?
  • For someone who is using MindsDB, can you talk through their workflow?
    • What are the types of data that are supported for building predictions using MindsDB?
    • How much cleaning and preparation of the data is necessary before using it to generate a model?
    • What are the lower and upper bounds for volume and variety of data that can be used to build an effective model in MindsDB?
  • One of the interesting and useful features of MindsDB is the built-in support for explaining the decisions reached by a model. How do you approach that challenge and what are the most difficult aspects?
  • Once a model is generated, what is the output format and can it be used separately from MindsDB for embedding the prediction capabilities into other scripts or services?
  • How is MindsDB implemented and how has the design changed since you first began working on it?
    • What are some of the assumptions that you made going into this project which have had to be modified or updated as it gained users and features?
  • What are the limitations of MindsDB and what are the cases where it is necessary to pass a task on to a data scientist?
  • In your experience, what are the common barriers for individuals and organizations adopting machine learning as a tool for addressing their needs?
  • What have been the most challenging, complex, or unexpected aspects of designing and building MindsDB?
  • What do you have planned for the future of MindsDB?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA


Planet Python

PyBites: The First Step in Contributing to Open Source Projects

Have you ever wanted to contribute to open source but weren’t sure how to get started? Marc found himself in just that situation. Sometimes it all comes down to taking that first step.

Backstory

I’ve been “programming” in Python for a while now, enough to be dangerous as they say. I’ve learned enough to help others from time to time, but I’m still new enough that I get distracted by the next shiny package I hear about on a podcast.

I recently decided that, to progress further and to give back a little, I would contribute to a package. I’ve heard the call to arms a few times: “Update the documentation, fix the bugs, build new features!” It’s the Emerald City of the open source world. The chance to give back to the community that gives me something I enjoy so much was something I couldn’t ignore.

But where to start? PyPI has over 100,000 packages; how does one just randomly pick one? Do I pick a well-known product? Surely they can always use the help, but the large packages are so refined. With dozens of contributors already helping, how could I possibly add something of value? Is it better to go for a smaller package? Find one that still needs work?

Finding the Project

Then I heard about codetriage.com, a website just for people like me who want to help but need help finding the right place.

So off I went to find a project I could offer assistance to. There were pages and pages of packages that needed help, but I couldn’t even understand what half of the issues were. Again came the thought: “Who am I to think I have anything to offer?”

So I backed off again.

Then, on my favourite podcast, I heard about a newish package (newish when the episode was released three years earlier, anyway). The package sounded awesome, and the creator was local to me. How awesome was this? He finished his interview with a call to action: “Come and help us, fix the documentation, fix the bugs, create new features”. Yeah, he sounded like everyone else. Then he said the piece that ignited my enthusiasm.

“We want anyone, even if they have no experience, we are happy to mentor them. Contact me directly even”.

I was hooked. I got to work and looked up the package. I found some PyCon talks and watched them. The same call to arms was repeated: PyCon 2018, PyCon 2019.

Finally, someone who wanted my help and was willing to help me help them.

Taking Action

So I jumped on my Gmail and emailed the creator directly: “I’m here, I don’t know much but I want to help”. Maybe I said a few more words than that, but that was the gist of it.

I got a reply back the same day (local, but a few hours’ time difference): “Thanks for contacting us, is there anything you specifically wanted to help with? Do you have any particular skill sets?”

My heart sank.

I had fallen at the first hurdle. I realised straight away where I had gone wrong: I hadn’t provided any useful information.

So I got back on my email and tried to be more useful: “I’ve got some Python skills, plenty of Windows experience (based on 20 years of desktop experience), but I have a distinct lack of GitHub experience”.

That was better. In response, I received meaningful direction to a particular part of the package with its own issue log, and a suggestion that I definitely look into GitHub more.

As harsh as the dev’s email seemed on my first reading, it really wasn’t. He was busy, to the point, and gave me solid direction. I respect him more now and really want to help his product.

I started the next day. I’m now watching YouTube videos about contributing to open source, trying out the product to see how it works, and browsing its issue log to see where I can offer assistance.

Lesson Learned

The lesson I really want to pass on here (if any) is that wanting to contribute is great, but take a look at your skill sets first and get them in order. The core devs behind packages have their own lives and work full-time jobs on top of the Free and Open Source Software (FOSS) work they do. If you reach out, be concise, clear, and direct.


Keep Calm and Code in Python!

Marc

Ned Batchelder: Corporations and open source: why and how

Here’s a really simplistic model: if you want someone to do something, you have to give them a compelling reason to do it, and you have to make it as easy as possible for them to do it. That is, you need to have good answers to Why? and How? (I don’t know much about marketing, but I think these are the value proposition and the call to action.)

Let’s look at the Why and How model as it applies to corporations funding open source. They don’t do it because the answers to Why and How are really bad right now.

Why should a corporation fund open source? As much as I wish it were different for all sorts of reasons, corporations act only in purely selfish ways. In order to spend money, they need to see some positive benefit to them that wouldn’t happen if they didn’t spend the money.

This frustrates me because a corporation is a collection of people, none of whom would act this way. I could say much more about this, but we aren’t going to be able to change corporations.

Companies only spend money if doing so will bring them a (perceived) benefit. Funding open source would make it stronger and better, but that is a very long-term effect, and not one that accrues directly to the funder. This is the famous Tragedy of the Commons. It’s a fair question for companies to ask: if they fund open source, what do they get for their money?

That’s the difficulty with Why, but let’s imagine for a moment that we could somehow convince someone to spend their company’s money funding open source: now what? How do they do it? A significant Python project could have a hundred library dependencies. How do they decide how to allocate the funding budget among them? Once that decision is made, how does the money get delivered? Very few open source projects are equipped to receive funds. If even 10% of the projects have a clear path for funding, now there are 10 checks to write, or 10 PayPal links to click through, or whatever. Some of that money will need to be sent internationally, and it has to be considered at tax time. Does it have to be done again next year, and the year after that? It’s a logistical nightmare!
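The dependency arithmetic above can be made concrete against your own Python environment. This is a minimal sketch, assuming a project advertises a funding path by declaring a `Project-URL` labelled "Funding" in its package metadata; that is a common convention but far from universal, so it will undercount projects that take money some other way. `funding_links` and `split_budget` are hypothetical helper names, and the equal split is a placeholder policy, not Tidelift's allocation algorithm.

```python
from importlib.metadata import distributions


def funding_links():
    """Map installed distributions to the 'Funding' Project-URL they declare, if any."""
    links = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        # Project-URL values are "Label, URL" strings, e.g. "Funding, https://..."
        for entry in dist.metadata.get_all("Project-URL") or []:
            label, _, url = entry.partition(",")
            if name and label.strip().lower() == "funding":
                links[name] = url.strip()
    return links


def split_budget(budget, recipients):
    """Naive equal split: a hypothetical policy, not how Tidelift allocates."""
    if not recipients:
        return {}
    share = budget / len(recipients)
    return {name: share for name in recipients}


if __name__ == "__main__":
    total = sum(1 for _ in distributions())
    fundable = funding_links()
    print(f"{len(fundable)} of {total} installed distributions declare a Funding URL")
    # Each line printed here is one more "check to write"
    for name, amount in split_budget(1000, sorted(fundable)).items():
        print(f"{name}: {amount:.2f}")
```

Even this toy version shows the shape of the How problem: every fundable dependency becomes a separate payment, and everything without a declared funding path silently drops out of the allocation.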

So when we try to convince companies to fund open source, we don’t have good answers for either Why? or How? It’s no wonder it doesn’t happen.

This is one of the reasons I am optimistic about Tidelift: they have good answers for both of these questions. The Tidelift subscription gives companies information and services around their open source dependencies, which answers the why. And the payment to Tidelift solves the how: Tidelift looks at the list of dependencies, decides an allocation, and distributes the money to the maintainers.

Sure, there are still lots of questions to be answered: is the allocation algorithm right? Will enough companies subscribe to make Tidelift itself sustainable? And even larger questions, like: if an interesting amount of money does flow to open source maintainers, what will be the cultural change in open source?

I don’t know the answers to those questions, but Tidelift seems like the most promising answer to how to support open source. I’m an enthusiastic participant. You should be too.

Open Apps with custom Shortcuts in macOS

Someone on the MacAdmins Slack recently asked how you could assign a global keyboard shortcut to open Terminal on macOS.

Note: alternative terminal applications such as iTerm2 may have this built-in.

macOS has an option to assign custom global keystrokes to pretty much anything, but it is not obvious how to get there.

  • First, open the Automator application. In the chooser for a new Workflow, choose ‘Quick Action’ (on Mojave) or ‘Service’ on earlier versions of macOS.
The new Workflow chooser in Mojave
  • In the new workflow configure the input to be ‘no input’ and the application to be ‘any application.’
  • Then search for the ‘Launch Application’ action in the library pane on the left and add it to your workflow by double-clicking or dragging.
  • The popup menu in the action where you can select an application will only show applications from the /Applications folder. Choose ‘Other…’ and select Terminal in the ‘/Applications/Utilities’ folder.
Configure your workflow
  • Save the workflow. Give it a meaningful name such as ‘Open Terminal.’ Since you chose Quick Action or Service, this workflow will be saved in ~/Library/Services.
  • Open System Preferences > Keyboard. Click the ‘Shortcuts’ tab and select ‘Services’ from the list on the left side. (Even on Mojave, it is still called ‘Services’.)
  • Scroll all the way down the list of services; under the ‘General’ heading, you should find the service you just created. Select it and click ‘Add Shortcut’ to assign a global shortcut.
Keyboard Shortcut Preferences
  • You are done!
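To confirm that the Quick Action actually landed in ~/Library/Services, as the steps above say it should, you can list that folder. This is a small Python sketch, assuming each saved workflow appears there as a `.workflow` bundle named after the title you gave it; `saved_services` is a hypothetical helper name.

```python
from pathlib import Path


def saved_services(services_dir=None):
    """List the names of Quick Actions/Services saved as .workflow bundles."""
    if services_dir is None:
        # The folder where saved Quick Actions and Services end up
        services_dir = Path.home() / "Library" / "Services"
    folder = Path(services_dir)
    if not folder.is_dir():
        return []
    # "Open Terminal.workflow" -> "Open Terminal"
    return sorted(p.stem for p in folder.glob("*.workflow"))


if __name__ == "__main__":
    for name in saved_services():
        print(name)
```

If ‘Open Terminal’ is missing from the output, the workflow was probably saved as a regular workflow rather than a Quick Action or Service, and it will not show up in the Keyboard preferences either.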

When the active application uses the same keystroke, the application’s definition will take precedence over your global shortcut.

Of course, you don’t have to stop at launching applications. You can assign a global keyboard shortcut to any Automator workflow this way. Since Automator workflows can include AppleScript, Python or shell scripts, you can do pretty much anything this way!
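As an example of the scripting route, the ‘Launch Application’ step could be replaced by a ‘Run Shell Script’ action. In a plain shell action the whole script would be `open -a Terminal`, since the macOS `open -a` command launches an application by name. Below is a minimal Python equivalent, assuming a Python 3 interpreter is available to the action; `open_app_command` and `launch_app` are hypothetical helper names.

```python
import subprocess
import sys


def open_app_command(app_name):
    """Build the macOS `open -a` command that launches an application by name."""
    return ["open", "-a", app_name]


def launch_app(app_name):
    """Launch an application through `open -a`; macOS only."""
    if sys.platform != "darwin":
        raise RuntimeError("`open -a` is only available on macOS")
    # check=True raises CalledProcessError if `open` cannot find the application
    subprocess.run(open_app_command(app_name), check=True)


if __name__ == "__main__" and sys.platform == "darwin":
    launch_app("Terminal")
```

Swapping "Terminal" for another application name is all it takes to reuse the same workflow for a different app, and the script can grow to do more than launching.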

Then again, most Apple users don’t bother with shortcuts to launch apps. Just invoke Spotlight with command-space, start typing ‘term’, and hit return.

Scripting OS X