This week we welcome Valentin Haenel (@esc___) as our PyDev of the Week! Valentin is a core developer of Numba and several other packages that you can see either on his website or on Github. He has also given several talks at various conferences in Europe. Let’s spend some time getting to know Valentin better!
Can you tell us a little about yourself (hobbies, education, etc):
I went to the University of Edinburgh to get a bachelor's degree in computer science and to the Bernstein Center in Berlin to get a master's degree in computational neuroscience. I tend to favour more traditional computer science topics these days, such as compression algorithms and compilers. In my spare time, I spend time with my lovely wife Gloria, fly quad-line sports kites and ride longboards through Berlin. I’ve been doing Python and open-source on GitHub for about 10 years.
Why did you start using Python?
I first started using Python as part of my Master's program. Python was—and still is—quite popular in computational neuroscience, both for doing machine learning on sensor data such as EEG and fMRI and also for simulating neural models and networks of neurons. I had been using Java before, and the dynamic (duck) typing style took some getting used to. As part of the academic work I came into contact with the early scientific stack, which at the time consisted mostly of Numpy, Scipy, Matplotlib and the command-line IPython shell. Some of my earliest Python work from that time still survives, for example a project I did to simulate spiking neurons using a specific type of model:
https://github.com/esc/molif — this was my first github repo ever.
Also from that time is the first of my packages to make it into Debian, a Python interface to a specific type of hardware photometer. In fact, I just checked on this Ubuntu machine (Mar 2019), the package is still available:
$ apt search pyoptical
Sorting... Done
Full Text Search... Done
python-pyoptical/bionic,bionic 0.4-1.1 all
  python interface to the CRS 'OptiCAL' photometer
:)
What other programming languages do you know and which is your favorite?
I know a little C, shell, Go and Java, but Python is by far my favorite. A friend of mine is working on a secret programming language project called ‘@’, which aims to be… well… runtime only — very intriguing.
What projects are you working on now?
I am now working on Numba for Anaconda Inc. Besides that, I am also working on Blosc (http://blosc.org/), including python-blosc and Bloscpack. In addition, there are a few smaller, but somewhat popular, projects that I run by myself, namely wiki2beamer, git-big-picture, conda-zsh-completion and yadoma. Most recently, I have been getting more interested in time tracking and started using and contributing to Færeld.
Which Python libraries are your favorite (core or 3rd party)?
I have always had an interest in crafting command line interfaces. I have looked into many libraries for this task, such as getopt, optparse, argparse, bup/options.py, miniparser, opster, blargs, plac, begins and click (have I forgotten any?!). However, the one library that I keep coming back to and the one that I recommend over all others is docopt: http://docopt.org/ . There is something to be said for designing your command line interface as a program synopsis and then getting a fully fledged parser from just that. For me personally, this is the fastest, most natural, and most convenient way to construct a command line argument parser. If you are not aware of it yet, you should definitely go and check it out!
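As a rough illustration of the "synopsis first" idea, here is a minimal sketch. The `greet.py` program, its `hello` command, the `--shout` flag, and the helper names `greet` and `parse_args` are all made up for this example; only the `docopt(doc, argv=...)` call comes from the library itself, and the import is guarded so the snippet still loads when docopt is not installed.

```python
# Hypothetical command line interface, written as a plain usage synopsis.
USAGE = """Usage:
  greet.py hello <name> [--shout]
  greet.py (-h | --help)

Options:
  -h --help  Show this help.
  --shout    Print the greeting in upper case.
"""

try:
    from docopt import docopt  # third-party: pip install docopt
except ImportError:
    docopt = None  # keep the module importable without docopt


def greet(name, shout=False):
    """Build the greeting text (illustrative helper, not part of docopt)."""
    text = f"Hello, {name}!"
    return text.upper() if shout else text


def parse_args(argv):
    """Let docopt derive the parser from USAGE and parse the given argv list."""
    if docopt is None:
        raise RuntimeError("docopt is not installed")
    return docopt(USAGE, argv=argv)
```

With docopt installed, `parse_args(["hello", "Valentin", "--shout"])` should return a dictionary keyed by the names in the synopsis, such as `"<name>"` and `"--shout"`, which can then be passed on to `greet`.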
How did you get involved with Numba?
I saw an opening for a software engineering position at Anaconda Inc. doing mostly open source work on Numba. Working low-level and on a compiler is right up my alley and was something I had been wanting to do for a very long time. I applied, they made me an offer, and the rest is history.
Can you explain why you would use Numba versus PyPy or Cython?
Cython is a superset of Python: it adds syntax for static typing, and the annotated code is then compiled ("cythonized") to run at C speed. That code can no longer run as regular Python code. Numba is much less invasive than this but has a similar goal. It provides the `@jit` decorator, which allows Numba to perform Just In Time (JIT) type inference and compilation using the LLVM compiler infrastructure under the hood. Importantly, it does this on the Python bytecode, does not require any types to be annotated, and the code can continue to run as regular Python (once you comment out the `@jit` decorator). This has the advantage that you can ship portable numeric code as pure Python with only Numba as a dependency, which will significantly reduce your packaging and distribution overhead.
Both Cython and Numba have traditionally been used in the scientific space. This is because they interact well with the existing ecosystem and the native libraries (Cython can even interface with C++, which Numba cannot) and are designed to be strongly aware of Numpy. So those are the ones you would use when working in that space: for example, machine learning and, broadly speaking, any scientific algorithms and simulations.
PyPy, on the other hand, has traditionally not had good support for the whole scientific stack. It is a bit better nowadays (early 2019), as both Numpy and Pandas can be compiled and a lot of work has gone into making C extensions work in PyPy. Anyway, the primary goal of PyPy is to move beyond CPython (the C implementation of the Python interpreter) as a base for Python programs, and it is slowly but surely getting there.
So, in conclusion: PyPy is the future of the Python language in general but it is not quite ready for data-intensive applications. If you want to have as much computational efficiency as possible today, then both Numba and Cython are good choices. Numba is very easy to try out—just decorate your bottlenecks—and has been known to accelerate code by one or two orders of magnitude.
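To make the "just decorate your bottlenecks" point concrete, here is a small sketch. The function `harmonic_sum` is a made-up stand-in for a numeric hot loop, and the no-op fallback decorator is an assumption added for this example (it is not part of Numba); it only exists to show that the same code still runs as regular Python when Numba is absent. Any actual speedup depends on the workload and is not measured here.

```python
try:
    from numba import jit  # third-party: pip install numba
except ImportError:
    # Assumed fallback for illustration only: a do-nothing stand-in for
    # numba.jit, so the function below runs as ordinary Python.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare @jit
        return lambda func: func  # used as @jit(...)


@jit(nopython=True)
def harmonic_sum(n):
    """Sum 1/1 + 1/2 + ... + 1/n with an explicit loop.

    No type annotations are needed; with Numba installed, the types are
    inferred and the loop is compiled on the first call.
    """
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total
```

The first call with Numba installed includes compilation time, so any timing comparison against plain Python should be made on a second call.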
What advice do you have for new people who want to start helping an open source project?
Go find yourself an itch: find a project, in whatever your favorite language is, that you find personally useful and improve it. Then, contribute your changes back. Chances are, if it is useful for you, it will be useful for other people. Also, because it is useful to you personally, you are likely to continue contributing to it because you end up having a vested interest in it. So personal utilities are a great category to go looking in for such projects. Find something that is useful for you on a day-to-day basis and contribute to that. Also, don’t be afraid to put your code out there in the open and don’t let yourself be discouraged if your contributions are rejected. You are just at the beginning of your journey, so keep going. Good luck!
Is there anything else you’d like to say?
A big thank you goes out to all the open source/free software developers and contributors out there. I am very proud to be a part of this fantastic and inspirational community.
Thanks for doing the interview, Valentin!