Speed Matters: Python and Lua

Python is great, but pure Python code sometimes has one problem: it's slow.

Fortunately, there are several good solutions for improving performance, such as NumPy, Cython, Numba, and PyPy.

Each of these solutions has its drawbacks:

NumPy and Numba are large modules, and NumPy is not always fast enough. PyPy is not 100% compatible and is a heavyweight solution when used in addition to CPython. Cython is complex, and you need a C compiler. Recently I found another solution that is perhaps not so well known: Lua integration in Python programs.

Lua is another scripting language with dynamic data types.

So I asked myself:

Does it make sense to have another scripting language inside Python scripts?

Let's have a look at a simple example: the Mandelbrot set.

First of all, the pure Python example:

import time
from numpy import arange
from PIL import Image

def PyMandelbrotIter(c):
    z = 0
    for iters in range(200):
        if abs(z) >= 2:
            return iters
        z = z ** 2 + c
    return iters

def PyMandelbrot(size):
    image = Image.new('RGB', (size, size))
    pix = image.load()

    t1 = time.time()
    xPts = arange(-1.5, 0.5, 2.0 / size)
    yPts = arange(-1, 1, 2.0 / size)

    for xx, x in enumerate(xPts):
        for yy, y in enumerate(yPts):
            pix[xx, yy] = PyMandelbrotIter(complex(x, y))
    dt = time.time() - t1
    print(f"dt={dt:.2f}")
    image.show()

Runtimes of this example on a Core i7 laptop with Python 3.7 and Windows 10:

Size dt [s]
320 3.32
640 13.54
1280 55.59

Now the Lua example integrated in a Python script:

import time
import threading
from PIL import Image
from lupa import LuaRuntime

lua_code = '''\
function(N, i, total)
  local char, unpack = string.char, table.unpack
  local result = ""
  local M, ba, bb, buf = 2/N, 2^(N%8+1)-1, 2^(8-N%8), {}
  local start_line, end_line = N/total * (i-1), N/total * i - 1
  for y=start_line,end_line do
    local Ci, b, p = y*M-1, 1, 0
    for x=0,N-1 do
      local Cr = x*M-1.5
      local Zr, Zi, Zrq, Ziq = Cr, Ci, Cr*Cr, Ci*Ci
      b = b + b
      for i=1,49 do
        Zi = Zr*Zi*2 + Ci
        Zr = Zrq-Ziq + Cr
        Ziq = Zi*Zi
        Zrq = Zr*Zr
        if Zrq+Ziq > 4.0 then b = b + 1; break; end
      end
      if b >= 256 then p = p + 1; buf[p] = 511 - b; b = 1; end
    end
    if b ~= 1 then p = p + 1; buf[p] = (ba-b)*bb; end
    result = result .. char(unpack(buf, 1, p))
  end
  return result
end
'''

def LuaMandelbrot(thrCnt, size):

    def LuaMandelbrotFunc(i, lua_func):
        results[i] = lua_func(size, i + 1, thrCnt)

    t1 = time.time()
    lua_funcs = [LuaRuntime(encoding=None).eval(lua_code) for _ in range(thrCnt)]

    results = [None] * thrCnt

    threads = [threading.Thread(target=LuaMandelbrotFunc, args=(i, lua_func))
               for i, lua_func in enumerate(lua_funcs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    result_buffer = b''.join(results)
    dt = time.time() - t1
    print(f"dt={dt:.2f}")

    image = Image.frombytes('1', (size, size), result_buffer)
    image.show()

Runtimes of this example on a performance laptop with Python 3.7 and Windows 10:

Size   1 thread  2 threads  4 threads  8 threads  16 threads   (dt in seconds)
320    0.22      0.11       0.07       0.06       0.04
640    0.68      0.38       0.26       0.21       0.17
1280   2.71      1.50       1.05       0.81       0.66

The above results are very impressive: Lua is much faster, and the work can be parallelized with threads.

The module lupa, which comes with a Lua interpreter and a JIT compiler, is a very interesting alternative for speeding up long running tasks.

The Lua solution has the following advantages:

  • The lupa module is very small.
  • Lua is much faster than Python.
  • You can run Lua scripts in parallel with threads.
  • Lua is very easy to read and code.
  • You can easily integrate Lua scripts in Python code.
  • You can easily access Python objects within Lua and vice versa.
  • There are many extension modules available for Lua (~2600, see luarocks.org).

Give lupa a try. It’s easy to use and really great!

Planet Python

PSF GSoC students blogs: [Blog #4] Need For Speed

Hey! This is my fifth blog post for GSoC 2019, covering weeks 7 and 8.

Most of week 7 was spent making Protego compatible with Google's parser. I also worked on the documentation; since the Protego codebase is small, proper comments and a good README were sufficient. I uploaded Protego to PyPI: `pip install Protego` is all it takes to install it.

Week 8 was quite interesting. For Protego to become the default in Scrapy, it must not throw any kind of error while parsing `robots.txt` files. To verify this, I downloaded the `robots.txt` files of the top 10,000 websites and added tests to check whether Protego throws any exceptions while parsing them. I also benchmarked Protego, and the results were quite disappointing. You can see the result here.
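The idea of that test can be sketched with the standard library's urllib.robotparser as a stand-in for Protego (Protego's own API is not shown here):

```python
# Sketch of a "parses without exceptions" check, using the stdlib
# urllib.robotparser as a stand-in for Protego
from urllib.robotparser import RobotFileParser

def parses_cleanly(robots_txt):
    """Return True if the parser accepts the content without raising."""
    try:
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return True
    except Exception:
        return False
```

Running this over a large corpus of downloaded `robots.txt` files flags any input the parser cannot handle.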

We decided to spend the next week improving Protego's performance. I am going to try profiling and heuristics to see whether the performance can be improved.


How to Use a Remote Docker Server to Speed Up Your Workflow


Building CPU-intensive images and binaries is a very slow and time-consuming process that can turn your laptop into a space heater at times. Pushing Docker images on a slow connection takes a long time, too. Luckily, there’s an easy fix for these issues. Docker lets you offload all those tasks to a remote server so your local machine doesn’t have to do that hard work.

This feature was introduced in Docker 18.09. It brings support for connecting to a Docker host remotely via SSH. It requires very little configuration on the client, and only needs a regular Docker server without any special config running on a remote machine. Prior to Docker 18.09, you had to use Docker Machine to create a remote Docker server and then configure the local Docker environment to use it. This new method removes that additional complexity.

In this tutorial, you’ll create a Droplet to host the remote Docker server and configure the docker command on your local machine to use it.


To follow this tutorial, you’ll need:

  • A DigitalOcean account. You can create an account if you don’t have one already.
  • Docker installed on your local machine or development server. If you are working with Ubuntu 18.04, follow Steps 1 and 2 of How To Install and Use Docker on Ubuntu 18.04; otherwise, follow the official documentation for information about installing on other operating systems. Be sure to add your non-root user to the docker group, as described in Step 2 of the linked tutorial.

Step 1 – Creating the Docker Host

To get started, spin up a Droplet with a decent amount of processing power. The CPU Optimized plans are perfect for this purpose, but Standard ones work just as well. If you will be compiling resource-intensive programs, the CPU Optimized plans provide dedicated CPU cores which allow for faster builds. Otherwise, the Standard plans offer a more balanced CPU to RAM ratio.

The Docker One-click image takes care of all of the setup for us. Follow this link to create a 16GB/8vCPU CPU-Optimized Droplet with Docker from the control panel.

Alternatively, you can use doctl to create the Droplet from your local command line. To install it, follow the instructions in the doctl README file on GitHub.

The following command creates a new 16GB/8vCPU CPU-Optimized Droplet in the FRA1 region based on the Docker One-click image:

  • doctl compute droplet create docker-host \
  • --image docker-18-04 \
  • --region fra1 \
  • --size c-8 \
  • --wait \
  • --ssh-keys $(doctl compute ssh-key list --format ID --no-header | sed 's/$/,/' | tr -d '\n' | sed 's/,$//')

The doctl command uses the ssh-keys value to specify which SSH keys it should apply to your new Droplet. We use a subshell to call doctl compute ssh-key list to retrieve the SSH keys associated with your DigitalOcean account, and then parse the results with the sed and tr commands into the comma-separated format doctl expects. This command includes all of your account's SSH keys, but you can replace the highlighted subcommand with the fingerprint of any key you have in your account.

Once the Droplet is created you’ll see its IP address among other details:

ID         Name         Public IPv4     Private IPv4  Public IPv6  Memory  VCPUs  Disk  Region  Image                               Status  Tags  Features  Volumes
148681562  docker-host  your_server_ip                             16384   8      100   fra1    Ubuntu Docker 5:18.09.6~3 on 18.04  active

You can learn more about using the doctl command in the tutorial How To Use doctl, the Official DigitalOcean Command-Line Client.

When the Droplet is created, you'll have a ready-to-use Docker server. For security purposes, create a Linux user to use instead of root.

First, connect to the Droplet with SSH as the root user:

  • ssh root@your_server_ip

Once connected, add a new user. This command adds one named sammy:

  • adduser sammy

Then add the user to the docker group to give it permission to run commands on the Docker host.

  • sudo usermod -aG docker sammy

Finally, exit from the remote server by typing exit.

Now that the server is ready, let’s configure the local docker command to use it.

Step 2 – Configuring Docker to Use the Remote Host

To use the remote host as your Docker host instead of your local machine, set the DOCKER_HOST environment variable to point to the remote host. This variable will instruct the Docker CLI client to connect to the remote server.

  • export DOCKER_HOST=ssh://sammy@your_server_ip

Now any Docker command you run will be run on the Droplet. For example, if you start a web server container and expose a port, it will be run on the Droplet and will be accessible through the port you exposed on the Droplet’s IP address.

To verify that you’re accessing the Droplet as the Docker host, run docker info.

  • docker info

You will see your Droplet’s hostname listed in the Name field like so:

… Name: docker-host

One thing to keep in mind is that when you run a docker build command, the build context (all files and folders accessible from the Dockerfile) will be sent to the host before the build process runs. Depending on the size of the build context and the number of files, this may take longer than building the image on a local machine. One solution is to create a new directory dedicated to the Docker image and copy or link only the files that will be used in the image, so that no unneeded files are uploaded inadvertently.
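To get a feel for how much data a build would upload, you can sum the file sizes under the context directory. A rough sketch (it does not account for `.dockerignore` rules):

```python
# Rough estimate of the data `docker build` would send as build context:
# walk the context directory and sum the sizes of all regular files
import os

def context_size(path="."):
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

If the number is surprisingly large, a stray cache or data directory has probably crept into the context.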


You’ve created a remote Docker host and connected to it locally. The next time your laptop’s battery is running low or you need to build a heavy Docker image, use your shiny remote Docker server instead of your local machine.

You might also be interested in learning how to optimize Docker images for production, or how to optimize them specifically for Kubernetes.

DigitalOcean Community Tutorials

Continuum Analytics Blog: AnacondaCON 2019 Day 3 Recap: The Need for Speed, “Delightful UX” in Dev Tools, LOTR Jokes and More.

Everyone at Anaconda is still feeling the love from AnacondaCON 2019. Day 3 wrapped up last Friday with one more day of talks and sessions, highlighted by some powerhouse keynotes. Let’s get right to the good…

The post AnacondaCON 2019 Day 3 Recap: The Need for Speed, “Delightful UX” in Dev Tools, LOTR Jokes and More. appeared first on Anaconda.


pythonwise: Speed: Default value vs checking for None

Python's dict has a get method. It either returns the existing value for a given key or returns a default value if the key is not in the dict. It's very tempting to write code like val = d.get(key, Object()), but you need to think about the performance implications. Since function arguments are evaluated before the function is called, a new Object will be created regardless of whether the key is in the dict. Let's see how this affects performance.

get_default will create a new Point every time, while get_none will create one only if there is no such key. This works because `or` evaluates its operands lazily and stops at the first truthy one.
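The contents of default_vs_none.py are not shown in the post; a minimal sketch consistent with the surrounding text (class and function names assumed from the text) could be:

```python
# Sketch of default_vs_none.py as described in the text (names assumed)
class Location:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

def get_default(d, key):
    # Location() is constructed on every call, even when the key exists,
    # because arguments are evaluated before the function is called
    return d.get(key, Location())

def get_none(d, key):
    # Location() is constructed only when the lookup returns None,
    # because `or` evaluates its right operand lazily
    return d.get(key) or Location()
```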

First we’ll try with a missing key:

In [1]: %run default_vs_none.py
In [2]: locations = {}  # name -> Location
In [3]: %timeit get_default(locations, 'carmen')
384 ns ± 2.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit get_none(locations, 'carmen')
394 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not much of a difference. However, if the key exists:

In [5]: locations['carmen'] = Location(7, 3)
In [6]: %timeit get_default(locations, 'carmen')
377 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit get_none(locations, 'carmen')
135 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

get_none is much faster.
