Real Python: Functional Programming in Python

In this course, you’ll learn how to approach functional programming in Python. You’ll start with the absolute basics of Functional Programming (FP). After that, you’ll see hands-on examples of common FP patterns, like using immutable data structures and the filter(), map(), and reduce() functions. You’ll end the course with actionable tips for parallelizing your code to make it run faster.

You’ll cover:

  1. What functional programming is
  2. How you can use immutable data structures to represent your data
  3. How to use filter(), map(), and reduce()
  4. How to do parallel processing with multiprocessing and concurrent.futures



Real Python: How to Use Redis With Python

In this tutorial, you’ll learn how to use Python with Redis (pronounced RED-iss, or maybe REE-diss or Red-DEES, depending on who you ask), which is a lightning fast in-memory key-value store that can be used for anything from A to Z. Here’s what Seven Databases in Seven Weeks, a popular book on databases, has to say about Redis:

It’s not simply easy to use; it’s a joy. If an API is UX for programmers, then Redis should be in the Museum of Modern Art alongside the Mac Cube.

And when it comes to speed, Redis is hard to beat. Reads are fast, and writes are even faster, handling upwards of 100,000 SET operations per second by some benchmarks. (Source)

Intrigued? This tutorial is built for the Python programmer who may have little to no Redis experience. We’ll tackle two tools at once and introduce both Redis itself as well as one of its Python client libraries, redis-py.

redis-py (which you import as just redis) is one of many Python clients for Redis, but it has the distinction of being billed as “currently the way to go for Python” by the Redis developers themselves. It lets you call Redis commands from Python, and get back familiar Python objects in return.

In this tutorial, you’ll cover:

  • Installing Redis from source and understanding the purpose of the resulting binaries
  • Learning a bite-size slice of Redis itself, including its syntax, protocol, and design
  • Mastering redis-py while also seeing glimpses of how it implements Redis’ protocol
  • Setting up and communicating with an Amazon ElastiCache Redis server instance

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python’s best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

Installing Redis From Source

As my great-great-grandfather said, nothing builds grit like installing from source. This section will walk you through downloading, making, and installing Redis. I promise that this won’t hurt one bit!

Note: This section is oriented towards installation on Mac OS X or Linux. If you’re using Windows, there is a Microsoft fork of Redis that can be installed as a Windows Service. Suffice it to say that Redis as a program lives most comfortably on a Linux box and that setup and use on Windows may be finicky.

First, download the Redis source code as a tarball:

$ redisurl="http://download.redis.io/redis-stable.tar.gz"
$ curl -s -o redis-stable.tar.gz $redisurl

Next, switch over to root and extract the archive’s source code to /usr/local/lib/:

$ sudo su root
$ mkdir -p /usr/local/lib/
$ chmod a+w /usr/local/lib/
$ tar -C /usr/local/lib/ -xzf redis-stable.tar.gz

Optionally, you can now remove the archive itself:

$ rm redis-stable.tar.gz

This will leave you with a source code repository at /usr/local/lib/redis-stable/. Redis is written in C, so you’ll need to compile, link, and install with the make utility:

$ cd /usr/local/lib/redis-stable/
$ make && make install

Running make && make install does two things:

  1. The first make command compiles and links the source code.

  2. The make install part takes the binaries and copies them to /usr/local/bin/ so that you can run them from anywhere (assuming that /usr/local/bin/ is in PATH).

Here are all the steps so far:

$ redisurl="http://download.redis.io/redis-stable.tar.gz"
$ curl -s -o redis-stable.tar.gz $redisurl
$ sudo su root
$ mkdir -p /usr/local/lib/
$ chmod a+w /usr/local/lib/
$ tar -C /usr/local/lib/ -xzf redis-stable.tar.gz
$ rm redis-stable.tar.gz
$ cd /usr/local/lib/redis-stable/
$ make && make install

At this point, take a moment to confirm that Redis is in your PATH and check its version:

$ redis-cli --version
redis-cli 5.0.3

If your shell can’t find redis-cli, check to make sure that /usr/local/bin/ is on your PATH environment variable, and add it if not.

In addition to redis-cli, make install actually leads to a handful of different executable files (and one symlink) being placed at /usr/local/bin/:

$ # A snapshot of executables that come bundled with Redis
$ ls -hFG /usr/local/bin/redis-* | sort
/usr/local/bin/redis-benchmark*
/usr/local/bin/redis-check-aof*
/usr/local/bin/redis-check-rdb*
/usr/local/bin/redis-cli*
/usr/local/bin/redis-sentinel@
/usr/local/bin/redis-server*

While all of these have some intended use, the two you’ll probably care about most are redis-cli and redis-server, which we’ll outline shortly. But before we get to that, setting up some baseline configuration is in order.

Configuring Redis

Redis is highly configurable. While it runs fine out of the box, let’s take a minute to set some bare-bones configuration options that relate to database persistence and basic security:

$ sudo su root
$ mkdir -p /etc/redis/
$ touch /etc/redis/6379.conf

Now, write the following to /etc/redis/6379.conf. We’ll cover what most of these mean gradually throughout the tutorial:

# /etc/redis/6379.conf

port              6379
daemonize         yes
save              60 1
bind              127.0.0.1
tcp-keepalive     300
dbfilename        dump.rdb
dir               ./
rdbcompression    yes

Redis configuration is self-documenting, with the sample redis.conf file located in the Redis source for your reading pleasure. If you’re using Redis in a production system, it pays to block out all distractions and take the time to read this sample file in full to familiarize yourself with the ins and outs of Redis and fine-tune your setup.

Some tutorials, including parts of Redis’ documentation, may also suggest running the shell script located at redis/utils/install_server.sh. You’re by all means welcome to run this as a more comprehensive alternative to the above, but take note of a few finer points about install_server.sh:

  • It will not work on Mac OS X—only Debian and Ubuntu Linux.
  • It will inject a fuller set of configuration options into /etc/redis/6379.conf.
  • It will write a System V init script to /etc/init.d/redis_6379 that will let you do sudo service redis_6379 start.

The Redis quickstart guide also contains a section on a more proper Redis setup, but the configuration options above should be totally sufficient for this tutorial and getting started.

Security Note: A few years back, the author of Redis pointed out security vulnerabilities in earlier versions of Redis if no configuration was set. Redis 3.2 (the current version is 5.0.3 as of March 2019) took steps to prevent this intrusion, setting the protected-mode option to yes by default.

We explicitly set bind 127.0.0.1 to let Redis listen for connections only from the localhost interface, although you would need to expand this whitelist in a real production server. protected-mode serves as a safeguard that mimics this bind-to-localhost behavior if you don’t otherwise specify anything under the bind option.

With that squared away, we can now dig into using Redis itself.

Ten or So Minutes to Redis

This section will provide you with just enough knowledge of Redis to be dangerous, outlining its design and basic usage.

Getting Started

Redis has a client-server architecture and uses a request-response model. This means that you (the client) connect to a Redis server over a TCP connection, on port 6379 by default. You request some action (like some form of reading, writing, getting, setting, or updating), and the server serves you back a response.

There can be many clients talking to the same server, which is really what Redis or any client-server application is all about. Each client does a (typically blocking) read on a socket waiting for the server response.

The cli in redis-cli stands for command line interface, and the server in redis-server is for, well, running a server. In the same way that you would run python at the command line, you can run redis-cli to jump into an interactive REPL (Read Eval Print Loop) where you can run client commands directly from the shell.

First, however, you’ll need to launch redis-server so that you have a running Redis server to talk to. A common way to do this in development is to start a server at localhost (IPv4 address 127.0.0.1), which is the default unless you tell Redis otherwise. You can also pass redis-server the name of your configuration file, which is akin to specifying all of its key-value pairs as command-line arguments:

$ redis-server /etc/redis/6379.conf
31829:C 07 Mar 2019 08:45:04.030 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
31829:C 07 Mar 2019 08:45:04.030 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=31829, just started
31829:C 07 Mar 2019 08:45:04.030 # Configuration loaded

We set the daemonize configuration option to yes, so the server runs in the background. (Otherwise, use --daemonize yes as an option to redis-server.)

Now you’re ready to launch the Redis REPL. Enter redis-cli on your command line. You’ll see the server’s host:port pair followed by a > prompt:

127.0.0.1:6379> 

Here’s one of the simplest Redis commands, PING, which just tests connectivity to the server and returns "PONG" if things are okay:

127.0.0.1:6379> PING
PONG

Redis commands are case-insensitive, although their Python counterparts are most definitely not.

Note: As another sanity check, you can search for the process ID of the Redis server with pgrep:

$ pgrep redis-server
26983

To kill the server, use pkill redis-server from the command line. On Mac OS X, you can also use redis-cli shutdown.

Next, we’ll use some of the common Redis commands and compare them to what they would look like in pure Python.

Redis as a Python Dictionary

Redis stands for Remote Dictionary Service.

“You mean, like a Python dictionary?” you may ask.

Yes. Broadly speaking, there are many parallels you can draw between a Python dictionary (or generic hash table) and what Redis is and does:

  • A Redis database holds key:value pairs and supports commands such as GET, SET, and DEL, as well as several hundred additional commands.

  • Redis keys are always strings.

  • Redis values may be a number of different data types. We’ll cover some of the more essential value data types in this tutorial: strings, lists, hashes, and sets. Some advanced types include geospatial items and the new stream type.

  • Many Redis commands operate in constant O(1) time, just like retrieving a value from a Python dict or any hash table.

Redis creator Salvatore Sanfilippo would probably not love the comparison of a Redis database to a plain-vanilla Python dict. He calls the project a “data structure server” (rather than a key-value store, such as memcached) because, to its credit, Redis supports storing additional types of key:value data types besides string:string. But for our purposes here, it’s a useful comparison if you’re familiar with Python’s dictionary object.

Let’s jump in and learn by example. Our first toy database (with ID 0) will be a mapping of country:capital city, where we use SET to set key-value pairs:

127.0.0.1:6379> SET Bahamas Nassau
OK
127.0.0.1:6379> SET Croatia Zagreb
OK
127.0.0.1:6379> GET Croatia
"Zagreb"
127.0.0.1:6379> GET Japan
(nil)

The corresponding sequence of statements in pure Python would look like this:

>>> capitals = {}
>>> capitals["Bahamas"] = "Nassau"
>>> capitals["Croatia"] = "Zagreb"
>>> capitals.get("Croatia")
'Zagreb'
>>> capitals.get("Japan")  # None

We use capitals.get("Japan") rather than capitals["Japan"] because Redis will return nil rather than an error when a key is not found, which is analogous to Python’s None.

Redis also allows you to set and get multiple key-value pairs in one command, MSET and MGET, respectively:

127.0.0.1:6379> MSET Lebanon Beirut Norway Oslo France Paris
OK
127.0.0.1:6379> MGET Lebanon Norway Bahamas
1) "Beirut"
2) "Oslo"
3) "Nassau"

The closest thing in Python is with dict.update():

>>> capitals.update({
...     "Lebanon": "Beirut",
...     "Norway": "Oslo",
...     "France": "Paris",
... })
>>> [capitals[k] for k in ("Lebanon", "Norway", "Bahamas")]
['Beirut', 'Oslo', 'Nassau']

As a third example, the EXISTS command does what it sounds like, which is to check if a key exists:

127.0.0.1:6379> EXISTS Norway
(integer) 1
127.0.0.1:6379> EXISTS Sweden
(integer) 0

Python has the in keyword to test the same thing, which routes to dict.__contains__(key):

>>> "Norway" in capitals
True
>>> "Sweden" in capitals
False

These few examples are meant to show, using native Python, what’s happening at a high level with a few common Redis commands. There’s no client-server component here to the Python examples, and redis-py has not yet entered the picture. This is only meant to show Redis functionality by example.

Here’s a summary of the few Redis commands you’ve seen and their functional Python equivalents:

Redis: SET Bahamas Nassau
Python: capitals["Bahamas"] = "Nassau"

Redis: GET Croatia
Python: capitals.get("Croatia")

Redis: MSET Lebanon Beirut Norway Oslo France Paris
Python: capitals.update({"Lebanon": "Beirut", "Norway": "Oslo", "France": "Paris"})

Redis: MGET Lebanon Norway Bahamas
Python: [capitals[k] for k in ("Lebanon", "Norway", "Bahamas")]

Redis: EXISTS Norway
Python: "Norway" in capitals

The Python Redis client library, redis-py, that you’ll dive into shortly in this article, does things differently. It encapsulates an actual TCP connection to a Redis server and sends raw commands, as bytes serialized using the REdis Serialization Protocol (RESP), to the server. It then takes the raw reply and parses it back into a Python object such as bytes, int, or even datetime.datetime.
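To make that serialization concrete, here’s a simplified, hypothetical encoder (the function name is ours, not redis-py’s) that turns a command into a RESP array of bulk strings, roughly what a client writes to the socket:

```python
def encode_resp_command(*parts: str) -> bytes:
    """Serialize a command as a RESP array of bulk strings.

    A sketch of the client-to-server framing; the real logic lives
    inside redis-py's connection machinery.
    """
    encoded = [f"*{len(parts)}\r\n".encode()]  # array header: element count
    for part in parts:
        data = part.encode("utf-8")
        # Each bulk string: $<byte length>\r\n<payload>\r\n
        encoded.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(encoded)

wire = encode_resp_command("SET", "Bahamas", "Nassau")
# b'*3\r\n$3\r\nSET\r\n$7\r\nBahamas\r\n$6\r\nNassau\r\n'
```

So SET Bahamas Nassau goes over the wire as an array (*3) of three bulk strings, each prefixed by its byte length.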

Note: So far, you’ve been talking to the Redis server through the interactive redis-cli REPL. You can also issue commands directly, in the same way that you would pass the name of a script to the python executable, such as python myscript.py.

So far, you’ve seen one of Redis’ fundamental data types: a mapping of string:string. While this key-value pair is common in most key-value stores, Redis offers a number of other possible value types, which you’ll see next.

More Data Types in Python vs Redis

Before you fire up the redis-py Python client, it also helps to have a basic grasp on a few more Redis data types. To be clear, all Redis keys are strings. It’s the value that can take on data types (or structures) in addition to the string values used in the examples so far.

A hash is a mapping of string:string, called field-value pairs, that sits under one top-level key:

127.0.0.1:6379> HSET realpython url "https://realpython.com/"
(integer) 1
127.0.0.1:6379> HSET realpython github realpython
(integer) 1
127.0.0.1:6379> HSET realpython fullname "Real Python"
(integer) 1

This sets three field-value pairs for one key, "realpython". If you’re used to Python’s terminology and objects, this can be confusing. A Redis hash is roughly analogous to a Python dict that is nested one level deep:

data = {
    "realpython": {
        "url": "https://realpython.com/",
        "github": "realpython",
        "fullname": "Real Python",
    }
}

Redis’ fields are akin to the Python keys of each nested key-value pair in the inner dictionary above. Redis reserves the term key for the top-level database key that holds the hash structure itself.

Just like there’s MSET for basic string:string key-value pairs, there is also HMSET for hashes to set multiple pairs within the hash value object:

127.0.0.1:6379> HMSET pypa url "https://www.pypa.io/" github pypa fullname "Python Packaging Authority"
OK
127.0.0.1:6379> HGETALL pypa
1) "url"
2) "https://www.pypa.io/"
3) "github"
4) "pypa"
5) "fullname"
6) "Python Packaging Authority"
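Notice that HGETALL comes back as a flat alternating sequence of fields and values. A Python client has to pair those up before handing you a dict; a tiny hypothetical helper shows the idea:

```python
def pairs_to_dict(reply):
    """Group a flat field-value reply (as HGETALL returns it over the
    wire) into a dict, the way a Python client presents it.
    Illustrative helper, not part of redis-py."""
    it = iter(reply)
    # zip() against the same iterator consumes two items per pair
    return dict(zip(it, it))

pairs_to_dict(["url", "https://www.pypa.io/", "github", "pypa"])
# {'url': 'https://www.pypa.io/', 'github': 'pypa'}
```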

Using HMSET is probably a closer parallel for the way that we assigned data to a nested dictionary above, rather than setting each nested pair as is done with HSET.

Two additional value types are lists and sets, which can take the place of a hash or string as a Redis value. They are largely what they sound like, so I won’t take up your time with additional examples. Hashes, lists, and sets each have some commands that are particular to that given data type, which are in some cases denoted by their initial letter:

  • Hashes: Commands to operate on hashes begin with an H, such as HSET, HGET, or HMSET.

  • Sets: Commands to operate on sets begin with an S, such as SCARD, which gets the number of elements at the set value corresponding to a given key.

  • Lists: Commands to operate on lists begin with an L or R. Examples include LPOP and RPUSH. The L or R refers to which side of the list is operated on. A few list commands are also prefaced with a B, which means blocking. A blocking operation doesn’t let other operations interrupt it while it’s executing. For instance, BLPOP executes a blocking left-pop on a list structure.

Note: One noteworthy feature of Redis’ list type is that it is a linked list rather than an array. This means that appending is O(1) while indexing at an arbitrary index number is O(N).
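For a client-side feel for those costs, Python’s collections.deque has a similar performance profile at the two ends of the sequence. The mapping to Redis commands below is only an analogy:

```python
from collections import deque

# Rough analogy for Redis' list type: deque is also optimized for
# pushes and pops at either end, with O(n) access in the middle.
timeline = deque()
timeline.append("first")       # ~ RPUSH timeline first
timeline.append("second")      # ~ RPUSH timeline second
timeline.appendleft("urgent")  # ~ LPUSH timeline urgent
leftmost = timeline.popleft()  # ~ LPOP timeline -> 'urgent'
```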

Here is a quick listing of commands that are particular to the string, hash, list, and set data types in Redis:

Type Commands
Sets SADD, SCARD, SDIFF, SDIFFSTORE, SINTER, SINTERSTORE, SISMEMBER, SMEMBERS, SMOVE, SPOP, SRANDMEMBER, SREM, SSCAN, SUNION, SUNIONSTORE
Hashes HDEL, HEXISTS, HGET, HGETALL, HINCRBY, HINCRBYFLOAT, HKEYS, HLEN, HMGET, HMSET, HSCAN, HSET, HSETNX, HSTRLEN, HVALS
Lists BLPOP, BRPOP, BRPOPLPUSH, LINDEX, LINSERT, LLEN, LPOP, LPUSH, LPUSHX, LRANGE, LREM, LSET, LTRIM, RPOP, RPOPLPUSH, RPUSH, RPUSHX
Strings APPEND, BITCOUNT, BITFIELD, BITOP, BITPOS, DECR, DECRBY, GET, GETBIT, GETRANGE, GETSET, INCR, INCRBY, INCRBYFLOAT, MGET, MSET, MSETNX, PSETEX, SET, SETBIT, SETEX, SETNX, SETRANGE, STRLEN

This table isn’t a complete picture of Redis commands and types. There’s a smorgasbord of more advanced data types, such as geospatial items, sorted sets, and HyperLogLog. At the Redis commands page, you can filter by data-structure group. There is also the data types summary and introduction to Redis data types.

Since we’re going to be switching over to doing things in Python, you can now clear your toy database with FLUSHDB and quit the redis-cli REPL:

127.0.0.1:6379> FLUSHDB
OK
127.0.0.1:6379> QUIT

This will bring you back to your shell prompt. You can leave redis-server running in the background, since you’ll need it for the rest of the tutorial as well.

Using redis-py: Redis in Python

Now that you’ve mastered some basics of Redis, it’s time to jump into redis-py, the Python client that lets you talk to Redis from a user-friendly Python API.

First Steps

redis-py is a well-established Python client library that lets you talk to a Redis server directly through Python calls:

$ python -m pip install redis

Next, make sure that your Redis server is still up and running in the background. You can check with pgrep redis-server, and if you come up empty-handed, then restart a local server with redis-server /etc/redis/6379.conf.

Now, let’s get into the Python-centric part of things. Here’s the “hello world” of redis-py:

 1 >>> import redis
 2 >>> r = redis.Redis()
 3 >>> r.mset({"Croatia": "Zagreb", "Bahamas": "Nassau"})
 4 True
 5 >>> r.get("Bahamas")
 6 b'Nassau'

Redis, used in Line 2, is the central class of the package and the workhorse by which you execute (almost) any Redis command. The TCP socket connection and its reuse are handled for you behind the scenes, and you call Redis commands using methods on the class instance r.

Notice also that the type of the returned object, b'Nassau' in Line 6, is Python’s bytes type, not str. It is bytes rather than str that is the most common return type across redis-py, so you may need to call r.get("Bahamas").decode("utf-8") depending on what you want to actually do with the returned bytestring.
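If you find yourself decoding often, one option is a small recursive helper (ours, not redis-py’s) that walks common reply shapes:

```python
def decode_reply(obj, encoding="utf-8"):
    """Recursively decode the bytes that redis-py returns into str.
    A convenience sketch, not part of redis-py itself."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Replies like HGETALL come back as dicts of bytes:bytes
        return {decode_reply(k): decode_reply(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(decode_reply(i) for i in obj)
    return obj  # ints and None pass through unchanged

decode_reply({b"color": b"green", b"price": b"99.99"})
# {'color': 'green', 'price': '99.99'}
```

Alternatively, the Redis constructor accepts decode_responses=True, in which case redis-py decodes replies to str for you.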

Does the code above look familiar? The methods in almost all cases match the name of the Redis command that does the same thing. Here, you called r.mset() and r.get(), which correspond to MSET and GET in the native Redis API.

This also means that HGETALL becomes r.hgetall(), PING becomes r.ping(), and so on. There are a few exceptions, but the rule holds for the large majority of commands.

While the Redis command arguments usually translate into a similar-looking method signature, they take Python objects. For example, the call to r.mset() in the example above uses a Python dict as its first argument, rather than a sequence of bytestrings.

We built the Redis instance r with no arguments, but it comes bundled with a number of parameters if you need them:

# From redis/client.py
class Redis(object):
    def __init__(self, host='localhost', port=6379,
                 db=0, password=None, socket_timeout=None,
                 # ...

You can see that the default hostname:port pair is localhost:6379, which is exactly what we need in the case of our locally kept redis-server instance.

The db parameter is the database number. You can manage multiple databases in Redis at once, and each is identified by an integer. The max number of databases is 16 by default.

When you run just redis-cli from the command line, this starts you at database 0. Use the -n flag to select a different database, as in redis-cli -n 5.

Allowed Key Types

One thing that’s worth knowing is that redis-py requires that you pass it keys that are bytes, str, int, or float. (It will convert the last 3 of these types to bytes before sending them off to the server.)

Consider a case where you want to use calendar dates as keys:

>>> import datetime
>>> today = datetime.date.today()
>>> visitors = {"dan", "jon", "alex"}
>>> r.sadd(today, *visitors)
Traceback (most recent call last):
# ...
redis.exceptions.DataError: Invalid input of type: 'date'. Convert to a byte, string or number first.

You’ll need to explicitly convert the Python date object to str, which you can do with .isoformat():

>>> stoday = today.isoformat()  # Python 3.7+, or use str(today)
>>> stoday
'2019-03-10'
>>> r.sadd(stoday, *visitors)  # sadd: set-add
3
>>> r.smembers(stoday)
{b'dan', b'alex', b'jon'}
>>> r.scard(today.isoformat())
3

To recap, Redis itself only allows strings as keys. redis-py is a bit more liberal in what Python types it will accept, although it ultimately converts everything to bytes before sending them off to a Redis server.
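That conversion can be sketched as follows. This is a simplification: the real client raises redis.exceptions.DataError rather than TypeError, and the actual logic lives inside redis-py’s connection machinery:

```python
def encode_key(key):
    """Sketch of redis-py's key coercion: accept bytes, str, int, or
    float, and reject everything else. Hypothetical helper."""
    if isinstance(key, bytes):
        return key
    if isinstance(key, (str, int, float)):
        # str, int, and float are all stringified, then encoded
        return str(key).encode("utf-8")
    raise TypeError(f"Invalid input of type: {type(key).__name__!r}")
```

With this, encode_key("2019-03-10") and encode_key(42) succeed, while a datetime.date is rejected, mirroring the traceback above.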

Example: PyHats.com

It’s time to break out a fuller example. Let’s pretend we’ve decided to start a lucrative website, PyHats.com, that sells outrageously overpriced hats to anyone who will buy them, and hired you to build the site.

You’ll use Redis to handle some of the product catalog, inventorying, and bot traffic detection for PyHats.com.

It’s day one for the site, and we’re going to be selling three limited-edition hats. Each hat gets held in a Redis hash of field-value pairs, and the hash has a key that is a prefixed random integer, such as hat:56854717. Using the hat: prefix is Redis convention for creating a sort of namespace within a Redis database:

import random

random.seed(444)
hats = {f"hat:{random.getrandbits(32)}": i for i in (
    {
        "color": "black",
        "price": 49.99,
        "style": "fitted",
        "quantity": 1000,
        "npurchased": 0,
    },
    {
        "color": "maroon",
        "price": 59.99,
        "style": "hipster",
        "quantity": 500,
        "npurchased": 0,
    },
    {
        "color": "green",
        "price": 99.99,
        "style": "baseball",
        "quantity": 200,
        "npurchased": 0,
    })
}

Let’s start with database 1 since we used database 0 in a previous example:

>>> r = redis.Redis(db=1)

To do an initial write of this data into Redis, we can use .hmset() (hash multi-set), calling it for each dictionary. The “multi” is a reference to setting multiple field-value pairs, where “field” in this case corresponds to a key of any of the nested dictionaries in hats:

 1 >>> with r.pipeline() as pipe:
 2 ...    for h_id, hat in hats.items():
 3 ...        pipe.hmset(h_id, hat)
 4 ...    pipe.execute()
 5 Pipeline<ConnectionPool<Connection<host=localhost,port=6379,db=1>>>
 6 Pipeline<ConnectionPool<Connection<host=localhost,port=6379,db=1>>>
 7 Pipeline<ConnectionPool<Connection<host=localhost,port=6379,db=1>>>
 8 [True, True, True]
 9
10 >>> r.bgsave()
11 True

The code block above also introduces the concept of Redis pipelining, which is a way to cut down the number of round-trip transactions that you need to write or read data from your Redis server. If you had just called r.hmset() three times, then this would necessitate a back-and-forth round trip operation for each row written.

With a pipeline, all the commands are buffered on the client side as you call pipe.hmset() in Line 3, and then sent at once, in one fell swoop, when you call pipe.execute() in Line 4. This is why the three True responses all come back together. You’ll see a more advanced use case for a pipeline shortly.
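The buffering idea can be sketched in pure Python with a toy class. This is an illustration only, not redis-py’s implementation, and it executes against a plain dict instead of a server:

```python
class ToyPipeline:
    """Toy illustration of client-side buffering: commands queue up
    locally, and nothing 'reaches the server' until execute()."""

    def __init__(self, store):
        self.store = store  # stands in for the Redis server
        self._buffer = []

    def hmset(self, key, mapping):
        self._buffer.append((key, mapping))
        return self  # like redis-py, return the pipeline itself

    def execute(self):
        # One "round trip": flush every buffered command at once
        results = []
        for key, mapping in self._buffer:
            self.store.setdefault(key, {}).update(mapping)
            results.append(True)
        self._buffer.clear()
        return results
```

Calling .hmset() twice and then .execute() returns [True, True] in one go, mirroring the three True responses above.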

Note: The Redis docs provide an example of doing this same thing with the redis-cli, where you can pipe the contents of a local file to do mass insertion.

Let’s do a quick check that everything is there in our Redis database:

>>> from pprint import pprint
>>> pprint(r.hgetall("hat:56854717"))
{b'color': b'green',
 b'npurchased': b'0',
 b'price': b'99.99',
 b'quantity': b'200',
 b'style': b'baseball'}

>>> r.keys()  # Careful on a big DB. keys() is O(N)
[b'hat:56854717', b'hat:1236154736', b'hat:1326692461']

The first thing that we want to simulate is what happens when a user clicks Purchase. If the item is in stock, increase its npurchased by 1 and decrease its quantity (inventory) by 1. You can use .hincrby() to do this:

>>> r.hincrby("hat:56854717", "quantity", -1)
199
>>> r.hget("hat:56854717", "quantity")
b'199'
>>> r.hincrby("hat:56854717", "npurchased", 1)
1

Note: HINCRBY still operates on a hash value that is a string, but it tries to interpret the string as a base-10 64-bit signed integer to execute the operation.

This applies to other commands related to incrementing and decrementing for other data structures, namely INCR, INCRBY, INCRBYFLOAT, ZINCRBY, and HINCRBYFLOAT. You’ll get an error if the string at the value can’t be represented as an integer.
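To make the string-as-integer behavior concrete, here’s a hypothetical pure-Python mimic of what HINCRBY does with a stored value:

```python
def hincrby_like(stored: bytes, amount: int) -> bytes:
    """Mimic HINCRBY's treatment of a stored string value: parse it as
    a base-10 integer, add, and store the result back as a string.
    Illustrative helper, not Redis' actual C implementation."""
    try:
        current = int(stored)
    except ValueError:
        # Redis reports "hash value is not an integer" in this case
        raise ValueError("hash value is not an integer or out of range") from None
    return str(current + amount).encode()

hincrby_like(b"200", -1)  # b'199'
```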

It isn’t really that simple, though. Changing the quantity and npurchased in two lines of code hides the reality that a click, purchase, and payment entails more than this. We need to do a few more checks to make sure we don’t leave someone with a lighter wallet and no hat:

  • Step 1: Check if the item is in stock, or otherwise raise an exception on the backend.
  • Step 2: If it is in stock, then execute the transaction, decrease the quantity field, and increase the npurchased field.
  • Step 3: Be alert for any changes that alter the inventory in between the first two steps (a race condition).

Step 1 is relatively straightforward: it consists of an .hget() to check the available quantity.

Step 2 is a little bit more involved. The pair of increase and decrease operations need to be executed atomically: either both should be completed successfully, or neither should be (in the case that at least one fails).

With client-server frameworks, it’s always crucial to pay attention to atomicity and look out for what could go wrong in instances where multiple clients are trying to talk to the server at once. The answer to this in Redis is to use a transaction block, meaning that either both or neither of the commands get through.

In redis-py, Pipeline is a transactional pipeline class by default. This means that, even though the class is actually named for something else (pipelining), it can be used to create a transaction block also.

In Redis, a transaction starts with MULTI and ends with EXEC:

 1 127.0.0.1:6379> MULTI
 2 127.0.0.1:6379> HINCRBY hat:56854717 quantity -1
 3 127.0.0.1:6379> HINCRBY hat:56854717 npurchased 1
 4 127.0.0.1:6379> EXEC

MULTI (Line 1) marks the start of the transaction, and EXEC (Line 4) marks the end. Everything in between is executed as one all-or-nothing buffered sequence of commands. This means that it will be impossible to decrement quantity (Line 2) but then have the balancing npurchased increment operation fail (Line 3).

Let’s circle back to Step 3: we need to be aware of any changes that alter the inventory in between the first two steps.

Step 3 is the trickiest. Let’s say that there is one lone hat remaining in our inventory. In between the time that User A checks the quantity of hats remaining and actually processes their transaction, User B also checks the inventory and finds likewise that there is one hat listed in stock. Both users will be allowed to purchase the hat, but we have 1 hat to sell, not 2, so we’re on the hook and one user is out of their money. Not good.

Redis has a clever answer for the dilemma in Step 3: it’s called optimistic locking, and is different than how typical locking works in an RDBMS such as PostgreSQL. Optimistic locking, in a nutshell, means that the calling function (client) does not acquire a lock, but rather monitors for changes in the data it is writing to during the time it would have held a lock. If there’s a conflict during that time, the calling function simply tries the whole process again.

You can effect optimistic locking by using the WATCH command (.watch() in redis-py), which provides a check-and-set behavior.

Let’s introduce a big chunk of code and walk through it afterwards step by step. You can picture buyitem() as being called any time a user clicks on a Buy Now or Purchase button. Its purpose is to confirm the item is in stock and take an action based on that result, all in a safe manner that looks out for race conditions and retries if one is detected:

 1 import logging
 2 import redis
 3
 4 logging.basicConfig()
 5
 6 class OutOfStockError(Exception):
 7     """Raised when PyHats.com is all out of today's hottest hat"""
 8
 9 def buyitem(r: redis.Redis, itemid: int) -> None:
10     with r.pipeline() as pipe:
11         error_count = 0
12         while True:
13             try:
14                 # Get available inventory, watching for changes
15                 # related to this itemid before the transaction
16                 pipe.watch(itemid)
17                 nleft: bytes = r.hget(itemid, "quantity")
18                 if nleft > b"0":
19                     pipe.multi()
20                     pipe.hincrby(itemid, "quantity", -1)
21                     pipe.hincrby(itemid, "npurchased", 1)
22                     pipe.execute()
23                     break
24                 else:
25                     # Stop watching the itemid and raise to break out
26                     pipe.unwatch()
27                     raise OutOfStockError(
28                         f"Sorry, {itemid} is out of stock!"
29                     )
30             except redis.WatchError:
31                 # Log total num. of errors by this user to buy this item,
32                 # then try the same process again of WATCH/HGET/MULTI/EXEC
33                 error_count += 1
34                 logging.warning(
35                     "WatchError #%d: %s; retrying",
36                     error_count, itemid
37                 )
38     return None

The critical line occurs at Line 16 with pipe.watch(itemid), which tells Redis to monitor the given itemid for any changes to its value. The program checks the inventory through the call to r.hget(itemid, "quantity"), in Line 17:

16 pipe.watch(itemid)
17 nleft: bytes = r.hget(itemid, "quantity")
18 if nleft > b"0":
19     # Item in stock. Proceed with transaction.

If the inventory gets touched during this short window between when the user checks the item stock and tries to purchase it, then Redis will return an error, and redis-py will raise a WatchError (Line 30). That is, if any part of the hash pointed to by itemid changes after the .hget() call but before the subsequent .hincrby() calls in Lines 20 and 21, then we'll re-run the whole process in another iteration of the while True loop as a result.

This is the “optimistic” part of the locking: rather than letting the client have a time-consuming total lock on the database through the getting and setting operations, we leave it up to Redis to notify the client and user only in the case that calls for a retry of the inventory check.

One key here is in understanding the difference between client-side and server-side operations:

nleft = r.hget(itemid, "quantity") 

This Python assignment brings the result of r.hget() client-side. Conversely, methods that you call on pipe effectively buffer all of the commands into one, and then send them to the server in a single request:

16 pipe.multi()
17 pipe.hincrby(itemid, "quantity", -1)
18 pipe.hincrby(itemid, "npurchased", 1)
19 pipe.execute()

No data comes back to the client side in the middle of the transactional pipeline. You need to call .execute() (Line 19) to get the sequence of results back all at once.

Even though this block contains two commands, it consists of exactly one round-trip operation from client to server and back.

This means that the client can’t immediately use the result of pipe.hincrby(itemid, "quantity", -1), from Line 20, because methods on a Pipeline return just the pipe instance itself. We haven’t asked anything from the server at this point. While normally .hincrby() returns the resulting value, you can’t immediately reference it on the client side until the entire transaction is completed.
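This buffering behavior can be sketched in plain Python. The PipelineSketch class below is a hypothetical illustration of the idea, not redis-py's actual implementation:

```python
class PipelineSketch:
    """Hypothetical illustration of client-side command buffering.

    Calls are queued locally; nothing is "sent" until execute().
    """

    def __init__(self):
        self._commands = []

    def hincrby(self, key, field, amount):
        # Buffer the command; like redis-py, return the pipeline itself,
        # not the command's eventual result
        self._commands.append(("HINCRBY", key, field, amount))
        return self

    def execute(self):
        # A real pipeline sends all buffered commands in one request
        # and returns their results; here we just drain the local queue
        sent, self._commands = self._commands, []
        return sent

pipe = PipelineSketch()
result = pipe.hincrby("hat:56854717", "quantity", -1)
print(result is pipe)       # True: no value is available yet
print(len(pipe.execute()))  # 1
```

The key point the sketch makes is that each buffered call hands you back the pipeline object, so results only materialize when `.execute()` runs.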

There’s a catch-22: this is also why you can’t put the call to .hget() into the transaction block. If you did this, then you’d be unable to know if you want to increment the npurchased field yet, since you can’t get real-time results from commands that are inserted into a transactional pipeline.

Finally, if the inventory sits at zero, then we UNWATCH the item ID and raise an OutOfStockError (Line 27), ultimately displaying that coveted Sold Out page that will make our hat buyers desperately want to buy even more of our hats at ever more outlandish prices:

24 else:
25     # Stop watching the itemid and raise to break out
26     pipe.unwatch()
27     raise OutOfStockError(
28         f"Sorry, {itemid} is out of stock!"
29     )

Here’s an illustration. Keep in mind that our starting quantity is 199 for hat 56854717 since we called .hincrby() above. Let’s mimic 3 purchases, which should modify the quantity and npurchased fields:

>>> buyitem(r, "hat:56854717")
>>> buyitem(r, "hat:56854717")
>>> buyitem(r, "hat:56854717")
>>> r.hmget("hat:56854717", "quantity", "npurchased")  # Hash multi-get
[b'196', b'4']

Now, we can fast-forward through more purchases, mimicking a stream of purchases until the stock depletes to zero. Again, picture these coming from a whole bunch of different clients rather than just one Redis instance:

>>> # Buy remaining 196 hats for item 56854717 and deplete stock to 0
>>> for _ in range(196):
...     buyitem(r, "hat:56854717")
>>> r.hmget("hat:56854717", "quantity", "npurchased")
[b'0', b'200']

Now, when some poor user is late to the game, they should be met with an OutOfStockError that tells our application to render an error message page on the frontend:

>>> buyitem(r, "hat:56854717")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 20, in buyitem
__main__.OutOfStockError: Sorry, hat:56854717 is out of stock!

Looks like it’s time to restock.

Using Key Expiry

Let’s introduce key expiry, which is another distinguishing feature in Redis. When you expire a key, that key and its corresponding value will be automatically deleted from the database after a certain number of seconds or at a certain timestamp.

In redis-py, one way that you can accomplish this is through .setex(), which lets you set a basic string:string key-value pair with an expiration:

 1 >>> from datetime import timedelta
 2 
 3 >>> # setex: "SET" with expiration
 4 >>> r.setex(
 5 ...     "runner",
 6 ...     timedelta(minutes=1),
 7 ...     value="now you see me, now you don't"
 8 ... )
 9 True

You can specify the second argument as a number in seconds or a timedelta object, as in Line 6 above. I like the latter because it seems less ambiguous and more deliberate.
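As a quick sanity check (plain Python, no Redis involved), you can confirm that the two forms express the same expiry window, since redis-py reduces a timedelta argument to whole seconds:

```python
from datetime import timedelta

# An int of seconds and an equivalent timedelta express the same expiry
as_seconds = 60
as_delta = timedelta(minutes=1)

# The timedelta form reduces to the same number of seconds
print(int(as_delta.total_seconds()))                # 60
print(int(as_delta.total_seconds()) == as_seconds)  # True
```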

There are also methods (and corresponding Redis commands, of course) to get the remaining lifetime (time-to-live) of a key that you’ve set to expire:

>>> r.ttl("runner")  # "Time To Live", in seconds
58
>>> r.pttl("runner")  # Like ttl, but milliseconds
54368

Below, you can shorten the window until expiration, and then watch the key expire, after which r.get() will return None and .exists() will return 0:

>>> r.get("runner")  # Not expired yet
b"now you see me, now you don't"

>>> r.expire("runner", timedelta(seconds=3))  # Set new expire window
True
>>> # Pause for a few seconds
>>> r.get("runner")
>>> r.exists("runner")  # Key & value are both gone (expired)
0

The table below summarizes commands related to key-value expiration, including the ones covered above. The explanations are taken directly from redis-py method docstrings:

Signature Purpose
r.setex(name, time, value) Sets the value of key name to value that expires in time seconds, where time can be represented by an int or a Python timedelta object
r.psetex(name, time_ms, value) Sets the value of key name to value that expires in time_ms milliseconds, where time_ms can be represented by an int or a Python timedelta object
r.expire(name, time) Sets an expire flag on key name for time seconds, where time can be represented by an int or a Python timedelta object
r.expireat(name, when) Sets an expire flag on key name, where when can be represented as an int indicating Unix time or a Python datetime object
r.persist(name) Removes an expiration on name
r.pexpire(name, time) Sets an expire flag on key name for time milliseconds, and time can be represented by an int or a Python timedelta object
r.pexpireat(name, when) Sets an expire flag on key name, where when can be represented as an int representing Unix time in milliseconds (Unix time * 1000) or a Python datetime object
r.pttl(name) Returns the number of milliseconds until the key name will expire
r.ttl(name) Returns the number of seconds until the key name will expire

PyHats.com, Part 2

A few days after its debut, PyHats.com has attracted so much hype that some enterprising users are creating bots to buy hundreds of items within seconds, which you’ve decided isn’t good for the long-term health of your hat business.

Now that you’ve seen how to expire keys, let’s put it to use on the backend of PyHats.com.

We’re going to create a new Redis client that acts as a consumer (or watcher) and processes a stream of incoming IP addresses, which in turn may come from multiple HTTPS connections to the website’s server.

The watcher’s goal is to monitor a stream of IP addresses from multiple sources, keeping an eye out for a flood of requests from a single address within a suspiciously short amount of time.

Some middleware on the website server pushes all incoming IP addresses into a Redis list with .lpush(). Here’s a crude way of mimicking some incoming IPs, using a fresh Redis database:

>>> r = redis.Redis(db=5)
>>> r.lpush("ips", "51.218.112.236")
1
>>> r.lpush("ips", "90.213.45.98")
2
>>> r.lpush("ips", "115.215.230.176")
3
>>> r.lpush("ips", "51.218.112.236")
4

As you can see, .lpush() returns the length of the list after the push operation succeeds. Each call of .lpush() puts the IP at the beginning of the Redis list that is keyed by the string "ips".

In this simplified simulation, the requests are all technically from the same client, but you can think of them as potentially coming from many different clients and all being pushed to the same database on the same Redis server.
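If you want a mental model for these list operations without a running server, Python's deque behaves analogously. This is an analogy only, not the Redis API:

```python
from collections import deque

ips = deque()

# .appendleft() mirrors LPUSH: each new item goes to the head of the list
for ip in ["51.218.112.236", "90.213.45.98",
           "115.215.230.176", "51.218.112.236"]:
    ips.appendleft(ip)
    print(len(ips))  # LPUSH likewise returns the list's new length

# .popleft() mirrors the pop side of BLPOP (minus the blocking)
print(ips.popleft())  # '51.218.112.236', the most recently pushed item
```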

Now, open up a new shell tab or window and launch a new Python REPL. In this shell, you’ll create a new client that serves a very different purpose than the rest, which sits in an endless while True loop and does a blocking left-pop BLPOP call on the ips list, processing each address:

 1 # New shell window or tab
 2 
 3 import datetime
 4 import ipaddress
 5 
 6 import redis
 7 
 8 # Where we put all the bad egg IP addresses
 9 blacklist = set()
10 MAXVISITS = 15
11 
12 ipwatcher = redis.Redis(db=5)
13 
14 while True:
15     _, addr = ipwatcher.blpop("ips")
16     addr = ipaddress.ip_address(addr.decode("utf-8"))
17     now = datetime.datetime.utcnow()
18     addrts = f"{addr}:{now.minute}"
19     n = ipwatcher.incrby(addrts, 1)
20     if n >= MAXVISITS:
21         print(f"Hat bot detected!:  {addr}")
22         blacklist.add(addr)
23     else:
24         print(f"{now}:  saw {addr}")
25     _ = ipwatcher.expire(addrts, 60)

Let’s walk through a few important concepts.

The ipwatcher acts like a consumer, sitting around and waiting for new IPs to be pushed on the "ips" Redis list. It receives them as bytes, such as b"51.218.112.236", and makes them into a more proper address object with the ipaddress module:

15 _, addr = ipwatcher.blpop("ips")
16 addr = ipaddress.ip_address(addr.decode("utf-8"))

Then you form a Redis string key using the address and minute of the hour at which the ipwatcher saw the address, incrementing the corresponding count by 1 and getting the new count in the process:

17 now = datetime.datetime.utcnow()
18 addrts = f"{addr}:{now.minute}"
19 n = ipwatcher.incrby(addrts, 1)

If the address has been seen MAXVISITS or more times within the current minute, then it looks as if we have a PyHats.com web scraper on our hands trying to create the next tulip bubble. Alas, we have no choice but to give this user back something like a dreaded 403 status code.

We use ipwatcher.expire(addrts, 60) to expire the (address minute) combination 60 seconds from when it was last seen. This is to prevent our database from becoming clogged up with stale one-time page viewers.

If you execute this code block in a new shell, you should immediately see this output:

2019-03-11 15:10:41.489214:  saw 51.218.112.236
2019-03-11 15:10:41.490298:  saw 115.215.230.176
2019-03-11 15:10:41.490839:  saw 90.213.45.98
2019-03-11 15:10:41.491387:  saw 51.218.112.236

The output appears right away because those four IPs were sitting in the queue-like list keyed by "ips", waiting to be pulled out by our ipwatcher. Using .blpop() (or the BLPOP command) will block until an item is available in the list, then pops it off. It behaves like Python’s Queue.get(), which also blocks until an item is available.

Besides just spitting out IP addresses, our ipwatcher has a second job. For a given minute of an hour (minute 0 through minute 59), ipwatcher will classify an IP address as a hat-bot if it sends 15 or more GET requests in that minute.

Switch back to your first shell and mimic a page scraper that blasts the site with 20 requests in a few milliseconds:

for _ in range(20):
    r.lpush("ips", "104.174.118.18")

Finally, toggle back to the second shell holding ipwatcher, and you should see an output like this:

2019-03-11 15:15:43.041363:  saw 104.174.118.18
2019-03-11 15:15:43.042027:  saw 104.174.118.18
2019-03-11 15:15:43.042598:  saw 104.174.118.18
2019-03-11 15:15:43.043143:  saw 104.174.118.18
2019-03-11 15:15:43.043725:  saw 104.174.118.18
2019-03-11 15:15:43.044244:  saw 104.174.118.18
2019-03-11 15:15:43.044760:  saw 104.174.118.18
2019-03-11 15:15:43.045288:  saw 104.174.118.18
2019-03-11 15:15:43.045806:  saw 104.174.118.18
2019-03-11 15:15:43.046318:  saw 104.174.118.18
2019-03-11 15:15:43.046829:  saw 104.174.118.18
2019-03-11 15:15:43.047392:  saw 104.174.118.18
2019-03-11 15:15:43.047966:  saw 104.174.118.18
2019-03-11 15:15:43.048479:  saw 104.174.118.18
Hat bot detected!:  104.174.118.18
Hat bot detected!:  104.174.118.18
Hat bot detected!:  104.174.118.18
Hat bot detected!:  104.174.118.18
Hat bot detected!:  104.174.118.18
Hat bot detected!:  104.174.118.18

Now, Ctrl+C out of the while True loop and you’ll see that the offending IP has been added to your blacklist:

>>> blacklist
{IPv4Address('104.174.118.18')}

Can you find the defect in this detection system? The filter checks the minute as .minute rather than the last 60 seconds (a rolling minute). Implementing a rolling check to monitor how many times a user has been seen in the last 60 seconds would be trickier. There's a crafty solution using Redis' sorted sets at ClassDojo. Josiah Carlson's Redis in Action also presents a more elaborate and general-purpose example of this approach using an IP-to-location cache table.
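For flavor, here's a pure-Python sketch of what a rolling 60-second check would track. The names (hits, is_bot) are hypothetical, and the sorted-set solution mentioned above would keep these timestamps server-side in Redis instead:

```python
from collections import defaultdict, deque

WINDOW = 60      # rolling window, in seconds
MAXVISITS = 15   # same threshold as the ipwatcher

# Hypothetical client-side sketch; the Redis version would keep these
# timestamps in a per-IP sorted set and trim old ones by score
hits = defaultdict(deque)

def is_bot(addr, now):
    q = hits[addr]
    q.append(now)
    # Drop timestamps that have fallen out of the rolling window
    while q and q[0] <= now - WINDOW:
        q.popleft()
    return len(q) >= MAXVISITS

# 14 quick requests are fine; the 15th within the window trips the check
print(any(is_bot("104.174.118.18", now=t) for t in range(14)))  # False
print(is_bot("104.174.118.18", now=14))                         # True
```

Unlike the .minute version, this counts requests in the last 60 seconds regardless of where a minute boundary falls.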

Persistence and Snapshotting

One of the reasons that Redis is so fast in both read and write operations is that the database is held in memory (RAM) on the server. However, a Redis database can also be stored (persisted) to disk in a process called snapshotting. The point behind this is to keep a physical backup in binary format so that data can be reconstructed and put back into memory when needed, such as at server startup.

You already enabled snapshotting without knowing it when you set up basic configuration at the beginning of this tutorial with the save option:

# /etc/redis/6379.conf

port              6379
daemonize         yes
save              60 1
bind              127.0.0.1
tcp-keepalive     300
dbfilename        dump.rdb
dir               ./
rdbcompression    yes

The format is save <seconds> <changes>. This tells Redis to save the database to disk if at least the given number of seconds has elapsed and at least the given number of write operations has occurred. In this case, we're telling Redis to save the database to disk every 60 seconds if at least one modifying write operation occurred in that 60-second timespan. This is a fairly aggressive setting versus the sample Redis config file, which uses these three save directives:

# Default redis/redis.conf
save 900 1
save 300 10
save 60 10000
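The way multiple save directives combine can be sketched as: snapshot whenever any one directive's time and change thresholds are both met. The helper below (should_snapshot is a hypothetical name, not part of Redis or redis-py) illustrates that logic:

```python
# Default rules from the sample config: (seconds, changes) pairs
DEFAULT_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(elapsed_s, changes, rules=DEFAULT_RULES):
    """Sketch: a snapshot triggers when ANY rule's thresholds are both met."""
    return any(elapsed_s >= s and changes >= c for s, c in rules)

print(should_snapshot(905, 1))     # True: 900-second rule satisfied
print(should_snapshot(70, 50))     # False: too few changes for the 60s rule
print(should_snapshot(61, 10000))  # True: 60-second rule satisfied
```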

An RDB snapshot is a full (rather than incremental) point-in-time capture of the database. (RDB refers to a Redis Database File.) We also specified the directory and file name of the resulting data file that gets written:

# /etc/redis/6379.conf

port              6379
daemonize         yes
save              60 1
bind              127.0.0.1
tcp-keepalive     300
dbfilename        dump.rdb
dir               ./
rdbcompression    yes

This instructs Redis to save to a binary data file called dump.rdb in the current working directory of wherever redis-server was executed from:

$ file -b dump.rdb
data

You can also manually invoke a save with the Redis command BGSAVE:

127.0.0.1:6379> BGSAVE
Background saving started

The “BG” in BGSAVE indicates that the save occurs in the background. This option is available in a redis-py method as well:

>>> r.lastsave()  # Redis command: LASTSAVE
datetime.datetime(2019, 3, 10, 21, 56, 50)
>>> r.bgsave()
True
>>> r.lastsave()
datetime.datetime(2019, 3, 10, 22, 4, 2)

This example introduces another new command and method, .lastsave(). In Redis, it returns the Unix timestamp of the last DB save, which Python gives back to you as a datetime object. Above, you can see that the r.lastsave() result changes as a result of r.bgsave().

r.lastsave() will also change if you enable automatic snapshotting with the save configuration option.

To rephrase all of this, there are two ways to enable snapshotting:

  1. Explicitly, through the Redis command BGSAVE or redis-py method .bgsave()
  2. Implicitly, through the save configuration option (which you can also set with .config_set() in redis-py)

RDB snapshotting is fast because the parent process uses the fork() system call to pass off the time-intensive write to disk to a child process, so that the parent process can continue on its way. This is what the background in BGSAVE refers to.

There’s also SAVE (.save() in redis-py), but this does a synchronous (blocking) save rather than using fork(), so you shouldn’t use it without a specific reason.

Even though .bgsave() occurs in the background, it’s not without its costs. The time for fork() itself to occur can actually be substantial if the Redis database is large enough in the first place.

If this is a concern, or if you can't afford to lose even the tiny slice of data put at risk by the periodic nature of RDB snapshotting, then you should look into the append-only file (AOF) strategy as an alternative to snapshotting. AOF copies Redis commands to disk in real time, allowing you to do a literal command-based reconstruction by replaying these commands.

Serialization Workarounds

Let’s get back to talking about Redis data structures. With its hash data structure, Redis in effect supports nesting one level deep:

127.0.0.1:6379> hset mykey field1 value1 

The Python client equivalent would look like this:

r.hset("mykey", "field1", "value1") 

Here, you can think of "field1": "value1" as being the key-value pair of a Python dict, {"field1": "value1"}, while mykey is the top-level key:

Redis Command Pure-Python Equivalent
r.set("key", "value") r = {"key": "value"}
r.hset("key", "field", "value") r = {"key": {"field": "value"}}

But what if you want the value of this dictionary (the Redis hash) to contain something other than a string, such as a list or nested dictionary with strings as values?

Here’s an example using some JSON-like data to make the distinction clearer:

restaurant_484272 = {
    "name": "Ravagh",
    "type": "Persian",
    "address": {
        "street": {
            "line1": "11 E 30th St",
            "line2": "APT 1",
        },
        "city": "New York",
        "state": "NY",
        "zip": 10016,
    }
}

Say that we want to set a Redis hash with the key 484272 and field-value pairs corresponding to the key-value pairs from restaurant_484272. Redis does not support this directly, because restaurant_484272 is nested:

>>> r.hmset(484272, restaurant_484272)
Traceback (most recent call last):
# ...
redis.exceptions.DataError: Invalid input of type: 'dict'. Convert to a byte, string or number first.

You can in fact make this work with Redis. There are two different ways to mimic nested data in redis-py and Redis:

  1. Serialize the values into a string with something like json.dumps()
  2. Use a delimiter in the key strings to mimic nesting in the values

Let’s take a look at an example of each.

Option 1: Serialize the Values Into a String

You can use json.dumps() to serialize the dict into a JSON-formatted string:

>>> import json
>>> r.set(484272, json.dumps(restaurant_484272))
True

If you call .get(), the value you get back will be a bytes object, so don’t forget to deserialize it to get back the original object. json.dumps() and json.loads() are inverses of each other, for serializing and deserializing data, respectively:

>>> from pprint import pprint
>>> pprint(json.loads(r.get(484272)))
{'address': {'city': 'New York',
             'state': 'NY',
             'street': '11 E 30th St',
             'zip': 10016},
 'name': 'Ravagh',
 'type': 'Persian'}

This applies to any serialization protocol, with another common choice being yaml:

>>> import yaml  # python -m pip install PyYAML
>>> yaml.dump(restaurant_484272)
'address: {city: New York, state: NY, street: 11 E 30th St, zip: 10016}\nname: Ravagh\ntype: Persian\n'

No matter what serialization protocol you choose to go with, the concept is the same: you’re taking an object that is unique to Python and converting it to a bytestring that is recognized and exchangeable across multiple languages.

Option 2: Use a Delimiter in Key Strings

There's a second option that involves mimicking "nestedness" by concatenating multiple levels of keys in a Python dict. This consists of flattening the nested dictionary through recursion, so that each key is a concatenated string of keys, and the values are the deepest-nested values from the original dictionary. Consider our dictionary object restaurant_484272:

restaurant_484272 = {
    "name": "Ravagh",
    "type": "Persian",
    "address": {
        "street": {
            "line1": "11 E 30th St",
            "line2": "APT 1",
        },
        "city": "New York",
        "state": "NY",
        "zip": 10016,
    }
}

We want to get it into this form:

{     "484272:name":                     "Ravagh",     "484272:type":                     "Persian",     "484272:address:street:line1":     "11 E 30th St",     "484272:address:street:line2":     "APT 1",     "484272:address:city":             "New York",     "484272:address:state":            "NY",     "484272:address:zip":              "10016", } 

That’s what setflat_skeys() below does, with the added feature that it does inplace .set() operations on the Redis instance itself rather than returning a copy of the input dictionary:

 1 from collections.abc import MutableMapping
 2 
 3 def setflat_skeys(
 4     r: redis.Redis,
 5     obj: dict,
 6     prefix: str,
 7     delim: str = ":",
 8     *,
 9     _autopfix=""
10 ) -> None:
11     """Flatten `obj` and set resulting field-value pairs into `r`.
12 
13     Calls `.set()` to write to Redis instance inplace and returns None.
14 
15     `prefix` is an optional str that prefixes all keys.
16     `delim` is the delimiter that separates the joined, flattened keys.
17     `_autopfix` is used in recursive calls to create de-nested keys.
18 
19     The deepest-nested keys must be str, bytes, float, or int.
20     Otherwise a TypeError is raised.
21     """
22     allowed_vtypes = (str, bytes, float, int)
23     for key, value in obj.items():
24         key = _autopfix + key
25         if isinstance(value, allowed_vtypes):
26             r.set(f"{prefix}{delim}{key}", value)
27         elif isinstance(value, MutableMapping):
28             setflat_skeys(
29                 r, value, prefix, delim, _autopfix=f"{key}{delim}"
30             )
31         else:
32             raise TypeError(f"Unsupported value type: {type(value)}")

The function iterates over the key-value pairs of obj, first checking the type of the value (Line 25) to see if it looks like it should stop recursing further and set that key-value pair. Otherwise, if the value looks like a dict (Line 27), then it recurses into that mapping, adding the previously seen keys as a key prefix (Line 28).
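To inspect the flattening logic in isolation, without a Redis instance, here's a hypothetical pure-dict variant (flatten_skeys is my name for it, not a function from the article) that returns the flat mapping instead of writing it:

```python
from collections.abc import MutableMapping

def flatten_skeys(obj, prefix, delim=":", _autopfix=""):
    """Hypothetical variant of setflat_skeys() that returns the
    flattened dict rather than calling .set() on a Redis instance."""
    flat = {}
    for key, value in obj.items():
        key = _autopfix + key
        if isinstance(value, MutableMapping):
            # Recurse, carrying the seen keys along as a prefix
            flat.update(
                flatten_skeys(value, prefix, delim,
                              _autopfix=f"{key}{delim}")
            )
        else:
            flat[f"{prefix}{delim}{key}"] = value
    return flat

print(flatten_skeys({"name": "Ravagh",
                     "address": {"city": "New York"}}, "484272"))
# {'484272:name': 'Ravagh', '484272:address:city': 'New York'}
```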

Let’s see it at work:

>>> r.flushdb()  # Flush database: clear old entries
>>> setflat_skeys(r, restaurant_484272, 484272)

>>> for key in sorted(r.keys("484272*")):  # Filter to this pattern
...     print(f"{repr(key):35}{repr(r.get(key)):15}")
...
b'484272:address:city'             b'New York'
b'484272:address:state'            b'NY'
b'484272:address:street:line1'     b'11 E 30th St'
b'484272:address:street:line2'     b'APT 1'
b'484272:address:zip'              b'10016'
b'484272:name'                     b'Ravagh'
b'484272:type'                     b'Persian'

>>> r.get("484272:address:street:line1")
b'11 E 30th St'

The final loop above uses r.keys("484272*"), where "484272*" is interpreted as a pattern and matches all keys in the database that begin with "484272".
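Redis glob-style patterns are similar in spirit (though not identical in every edge case) to Python's fnmatch matching, which gives you a quick local way to reason about what a pattern like "484272*" will catch:

```python
import fnmatch

keys = ["484272:name", "484272:address:city", "msg:500"]

# fnmatch approximates Redis's glob matching for simple patterns
print(fnmatch.filter(keys, "484272*"))
# ['484272:name', '484272:address:city']
```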

Notice also how setflat_skeys() calls just .set() rather than .hset(), because we’re working with plain string:string field-value pairs, and the 484272 ID key is prepended to each field string.

Encryption

Another trick to help you sleep well at night is to add symmetric encryption before sending anything to a Redis server. Consider this as an add-on to the security that you should make sure is in place by setting proper values in your Redis configuration. The example below uses the cryptography package:

$ python -m pip install cryptography

To illustrate, pretend that you have some sensitive cardholder data (CD) that you never want to have sitting around in plaintext on any server, no matter what. Before caching it in Redis, you can serialize the data and then encrypt the serialized string using Fernet:

>>> import json
>>> from cryptography.fernet import Fernet

>>> cipher = Fernet(Fernet.generate_key())
>>> info = {
...     "cardnum": 2211849528391929,
...     "exp": [2020, 9],
...     "cv2": 842,
... }

>>> r.set(
...     "user:1000",
...     cipher.encrypt(json.dumps(info).encode("utf-8"))
... )

>>> r.get("user:1000")
b'gAAAAABcg8-LfQw9TeFZ1eXbi'  # ... [truncated]

>>> cipher.decrypt(r.get("user:1000"))
b'{"cardnum": 2211849528391929, "exp": [2020, 9], "cv2": 842}'

>>> json.loads(cipher.decrypt(r.get("user:1000")))
{'cardnum': 2211849528391929, 'exp': [2020, 9], 'cv2': 842}

Because info contains a value that is a list, you’ll need to serialize this into a string that’s acceptable by Redis. (You could use json, yaml, or any other serialization for this.) Next, you encrypt and decrypt that string using the cipher object. You need to deserialize the decrypted bytes using json.loads() so that you can get the result back into the type of your initial input, a dict.

Note: Fernet uses AES 128 encryption in CBC mode. See the cryptography docs for an example of using AES 256. Whatever you choose to do, use cryptography, not pycrypto (imported as Crypto), which is no longer actively maintained.

If security is paramount, encrypting strings before they make their way across a network connection is never a bad idea.

Compression

One last quick optimization is compression. If bandwidth is a concern or you’re cost-conscious, you can implement a lossless compression and decompression scheme when you send and receive data from Redis. Here’s an example using the bzip2 compression algorithm, which in this extreme case cuts down on the number of bytes sent across the connection by a factor of over 2,000:

>>> import bz2

>>> blob = "i have a lot to talk about" * 10000
>>> len(blob.encode("utf-8"))
260000

>>> # Set the compressed string as value
>>> r.set("msg:500", bz2.compress(blob.encode("utf-8")))
>>> r.get("msg:500")
b'BZh91AY&SY\xdaM\x1eu\x01\x11o\x91\x80@\x002l\x87\'  # ... [truncated]
>>> len(r.get("msg:500"))
122
>>> 260_000 / 122  # Magnitude of savings
2131.1475409836066

>>> # Get and decompress the value, then confirm it's equal to the original
>>> rblob = bz2.decompress(r.get("msg:500")).decode("utf-8")
>>> rblob == blob
True

The way that serialization, encryption, and compression are related here is that they all occur client-side. You do some operation on the original object on the client-side that ends up making more efficient use of Redis once you send the string over to the server. The inverse operation then happens again on the client side when you request whatever it was that you sent to the server in the first place.

Using Hiredis

It’s common for a client library such as redis-py to follow a protocol in how it is built. In this case, redis-py implements the REdis Serialization Protocol, or RESP.

Part of fulfilling this protocol consists of converting some Python object into a raw bytestring, sending it to the Redis server, and parsing the response back into an intelligible Python object.

For example, the string response “OK” would come back as "+OK\r\n", while the integer response 1000 would come back as ":1000\r\n". This can get more complex with other data types such as RESP arrays.
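A toy parser for just those two reply types shows the shape of the work. This is a minimal sketch for illustration, nothing like the real parsers, which must also handle errors, bulk strings, arrays, and streaming input:

```python
def parse_simple_resp(raw):
    """Minimal sketch: parse RESP simple-string and integer replies."""
    type_byte, body = raw[:1], raw[1:-2]  # split off type byte and trailing \r\n
    if type_byte == b"+":   # simple string, e.g. b"+OK\r\n"
        return body.decode("utf-8")
    if type_byte == b":":   # integer, e.g. b":1000\r\n"
        return int(body)
    raise ValueError(f"unsupported RESP type: {type_byte}")

print(parse_simple_resp(b"+OK\r\n"))    # 'OK'
print(parse_simple_resp(b":1000\r\n"))  # 1000
```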

A parser is a tool in the request-response cycle that interprets this raw response and crafts it into something recognizable to the client. redis-py ships with its own parser class, PythonParser, which does the parsing in pure Python. (See .read_response() if you’re curious.)

However, there’s also a C library, Hiredis, that contains a fast parser that can offer significant speedups for some Redis commands such as LRANGE. You can think of Hiredis as an optional accelerator that it doesn’t hurt to have around in niche cases.

All that you have to do to enable redis-py to use the Hiredis parser is to install its Python bindings in the same environment as redis-py:

$   python -m pip install hiredis 

What you’re actually installing here is hiredis-py, which is a Python wrapper for a portion of the hiredis C library.

The nice thing is that you don’t really need to call hiredis yourself. Just pip install it, and this will let redis-py see that it’s available and use its HiredisParser instead of PythonParser.

Internally, redis-py attempts to import hiredis and, if the import succeeds, uses the HiredisParser class; otherwise it falls back to the slower PythonParser:

# redis/utils.py
try:
    import hiredis
    HIREDIS_AVAILABLE = True
except ImportError:
    HIREDIS_AVAILABLE = False

# redis/connection.py
if HIREDIS_AVAILABLE:
    DefaultParser = HiredisParser
else:
    DefaultParser = PythonParser

Using Enterprise Redis Applications

While Redis itself is open-source and free, several managed services have sprung up that offer a data store with Redis as the core and some additional features built on top of the open-source Redis server, such as Amazon ElastiCache and Microsoft's Azure Cache for Redis.

The designs of the two have some commonalities. You typically specify a custom name for your cache, which is embedded as part of a DNS name, such as demo.abcdef.xz.0009.use1.cache.amazonaws.com (AWS) or demo.redis.cache.windows.net (Azure).

Once you’re set up, here are a few quick tips on how to connect.

From the command line, it's largely the same as in our earlier examples, but you'll need to specify a host with the -h flag rather than using the default localhost. For Amazon AWS, execute the following from your instance shell:

$   export REDIS_ENDPOINT="demo.abcdef.xz.0009.use1.cache.amazonaws.com" $   redis-cli -h $  REDIS_ENDPOINT 

For Microsoft Azure, you can use a similar call. Azure Cache for Redis uses SSL (port 6380) by default rather than port 6379, allowing for encrypted communication to and from Redis, which plain TCP can't offer. All that you'll need to supply in addition is a non-default port and access key:

$   export REDIS_ENDPOINT="demo.redis.cache.windows.net" $   redis-cli -h $  REDIS_ENDPOINT -p 6380 -a <primary-access-key> 

The -h flag specifies a host, which as you’ve seen is 127.0.0.1 (localhost) by default.

When you’re using redis-py in Python, it’s always a good idea to keep sensitive variables out of Python scripts themselves, and to be careful about what read and write permissions you afford those files. The Python version would look like this:

>>> import os
>>> import redis

>>> # Specify a DNS endpoint instead of the default localhost
>>> os.environ["REDIS_ENDPOINT"]
'demo.abcdef.xz.0009.use1.cache.amazonaws.com'
>>> r = redis.Redis(host=os.environ["REDIS_ENDPOINT"])

That’s all there is to it. Besides specifying a different host, you can now call command-related methods such as r.get() as normal.

Note: If you want to use solely the combination of redis-py and an AWS or Azure Redis instance, then you don't really need to install and run Redis itself locally on your machine, since you don't need either redis-cli or redis-server.

If you’re deploying a medium- to large-scale production application where Redis plays a key role, then going with AWS or Azure’s service solutions can be a scalable, cost-effective, and security-conscious way to operate.

Wrapping Up

That concludes our whirlwind tour of accessing Redis through Python, including installing and using the Redis REPL connected to a Redis server and using redis-py in real-life examples. Here’s some of what you learned:

  • redis-py lets you do (almost) everything that you can do with the Redis CLI through an intuitive Python API.
  • Mastering topics such as persistence, serialization, encryption, and compression lets you use Redis to its full potential.
  • Redis transactions and pipelines are essential parts of the library in more complex situations.
  • Enterprise-level Redis services can help you smoothly use Redis in production.

Redis has an extensive set of features, some of which we didn’t really get to cover here, including server-side Lua scripting, sharding, and master-slave replication. If you think that Redis is up your alley, then make sure to follow developments as it implements an updated protocol, RESP3.




Real Python: Python Community Interview With Katrina Durance

With PyCon US 2019 over, I decided to catch up with a PyCon first-timer, Katrina Durance. I was curious to see how she found the experience and what her highlights were. I also wanted to understand how attending a conference like PyCon influenced her programming chops.

Ricky: Let’s start with the same questions I ask all my guests. How’d you get into programming? When did you start using Python?

Katrina Durance

Katrina: Python was my first programming course in grad school back in 2013. I also learned R and SQL. The two jobs I’ve been in since I graduated have been completely SQL-focused, so Python and R fell by the wayside.

Since I work at an arts college in Chicago (Columbia College Chicago), I’m able to take courses for free. We have a gaming program, so all our programming courses are game related. Since I’m a gamer and a very visual learner, I decided to take a C# course where we worked on building small games. I liked it.

I really struggled a lot with Python when I first learned it, but working with a programming language in a visual context started to clarify a lot of concepts for me. I knew after that course that I wanted to build my skills to get into full-time programming. I eventually came up with a study plan and decided to return to my Python roots at the beginning of this year.

Ricky: This year you attended PyCon US for the first time. I’m curious why this year was your first. What changed or made you want to go this time around?

Katrina: What really clinched it was knowing that there was going to be a group from PythonistaCafe, and a big contingent from the Chicago Python User Group meetup there. So I knew I would see some friendly faces. I didn’t feel I was doing it all alone.

Ricky: Everyone’s PyCon experience is different. I’m wondering if you had a standout moment this year? Is there one thing that you’ll remember synonymously with your first PyCon experience?

Katrina: The mentored sprints were a big deal for me. I ended up working on an issue on a tool (Hypothesis) that I didn’t understand because I haven’t learned much about testing at all yet.

I was feeling frustrated and concerned that everything would go over my head. But our mentor was amazing and so encouraging and showed me a bunch of things I was learning and kept me going to the end. Now I have a closed issue on the software with my name on it, which is pretty cool.

Ricky: We, of course, met face to face several times over the weekend. But most notably at the PythonistaCafe open space. How was your open space experience? Did you learn anything new, or were there any actionable takeaways?

Katrina: I was excited about the PythonistaCafe open space because of the chance to meet some of the folks I’ve interacted with or just seen on the forum, and I wasn’t disappointed.

I hosted an open space for self-taught programmers like me. I was stunned when 30-ish people showed up. I did my best to manage it and got some positive feedback and helpful advice.

The PyCon Africa meetup was very enlightening because I learned that the reason we’re not seeing a boom in innovation from Africa yet is that the internet is prohibitively expensive in every country represented in the room. It didn’t matter if it was government regulated or privately owned. I would love to help figure out how to solve that problem.

PythonistaCafe Members Coming Together at a PyCon Open Space

Ricky: There is so much to do at PyCon that there’s just not enough time to do it all. So was there anything you wish you had done or a talk you’d missed that you wish you hadn’t? Anything you’d do differently next time?

Katrina: I felt PyCon was what I was hoping it would be for this first time around, to be honest. Next year I want to stay for the sprints, which people kept telling me were amazing.

Ricky: So for those reading this that have yet to go to their first PyCon, this might be the most important question… How did attending PyCon affect how you will write your Python code going forward?

Katrina: I had to talk about my code. When you’re self-taught, you don’t get a lot of opportunities to talk through your code. Through the conversations I had at PyCon, I was encouraged to be more deliberate about taking advantage of my Slack and local Python communities to practice those communication skills.

In other words, I need to notch up my courage and not worry if I’m not coding it or explaining it well yet. I just need to keep coding and explaining.

Ricky: Last but not least, what else do you get up to in your spare time? What other hobbies and interests do you have, aside from Python and coding? Anything you’d like to plug?

Katrina: I love all things Sci-Fi and weird: movies, TV series, books, etc. I’m a LEGO enthusiast and have a penchant for building lost temples out of my love for H. P. Lovecraft stories. I really like VR and enjoy playing games on my Oculus Go. I also make jewelry out of felt, found objects, or even LEGO pieces.

If you’d like to catch up with Katrina and say hi, drop her a message on Twitter.

If there’s someone from the Python community that you’d love me to interview, leave a comment below and let me know.



Real Python: How to Use Python lambda Functions

Python and other languages like Java, C#, and even C++ have had lambda functions added to their syntax, whereas languages like Lisp and Haskell, or ML-family languages like OCaml and F#, use lambdas as a core concept.

Python lambdas are little, anonymous functions, subject to a more restrictive but more concise syntax than regular Python functions.

By the end of this article, you’ll know:

  • How Python lambdas came to be
  • How lambdas compare with regular function objects
  • How to write lambda functions
  • Which functions in the Python standard library leverage lambdas
  • When to use or avoid Python lambda functions

Note: You’ll see some code examples using lambda that seem to blatantly ignore Python style best practices. This is only intended to illustrate lambda calculus concepts or to highlight the capabilities of Python lambda.

Those questionable examples will be contrasted with better approaches or alternatives as you progress through the article.

This tutorial is mainly for intermediate to experienced Python programmers, but it is accessible to any curious minds with interest in programming and lambda calculus.

All the examples included in this tutorial have been tested with Python 3.7.

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python’s best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

Lambda Calculus

Lambda expressions in Python and other programming languages have their roots in lambda calculus, a model of computation invented by Alonzo Church. You’ll uncover when lambda calculus was introduced and why it’s a fundamental concept that ended up in the Python ecosystem.

History

Alonzo Church formalized lambda calculus, a language based on pure abstraction, in the 1930s. Lambda functions are also referred to as lambda abstractions, a direct reference to the abstraction model of Alonzo Church’s original creation.

Lambda calculus can encode any computation. It is Turing complete, but contrary to the concept of a Turing machine, it is pure and does not keep any state.

Functional languages get their origin in mathematical logic and lambda calculus, while imperative programming languages embrace the state-based model of computation invented by Alan Turing. The two models of computation, lambda calculus and Turing machines, can be translated into each other. This equivalence is known as the Church-Turing thesis.

Functional languages directly inherit the lambda calculus philosophy, adopting a declarative approach of programming that emphasizes abstraction, data transformation, composition, and purity (no state and no side effects). Examples of functional languages include Haskell, Lisp, or Erlang.

By contrast, the Turing Machine led to imperative programming found in languages like Fortran, C, or Python.

The imperative style consists of programming with statements, driving the flow of the program step by step with detailed instructions. This approach promotes mutation and requires managing state.

The separation in both families presents some nuances, as some functional languages incorporate imperative features, like OCaml, while functional features have been permeating the imperative family of languages in particular with the introduction of lambda functions in Java, or Python.

Python is not inherently a functional language, but it adopted some functional concepts early on. In January 1994, map(), filter(), reduce(), and the lambda operator were added to the language.

First Example

Here are a few examples to give you an appetite for some Python code, functional style.

The identity function, a function that returns its argument, is expressed with a standard Python function definition using the keyword def as follows:

>>>

>>> def identity(x):
...     return x

identity() takes an argument x and returns it upon invocation.

In contrast, if you use a Python lambda construction, you get the following:

>>>

>>> lambda x: x 

In the example above, the expression is composed of:

  • The keyword: lambda
  • A bound variable: x
  • A body: x

Note: In the context of this article, a bound variable is an argument to a lambda function.

In contrast, a free variable is not bound and may be referenced in the body of the expression. A free variable can be a constant or a variable defined in the enclosing scope of the function.
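To make the distinction concrete, here’s a minimal sketch (the names n and scale are hypothetical):

```python
n = 3  # n is a free variable: it comes from the enclosing scope

# x is a bound variable (the lambda's argument); n is free
scale = lambda x: x * n

print(scale(5))  # 15
```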

You can write a slightly more elaborated example, a function that adds 1 to an argument, as follows:

>>>

>>> lambda x: x + 1 

You can apply the function above to an argument by surrounding the function and its argument with parentheses:

>>>

>>> (lambda x: x + 1)(2)
3

Reduction is a lambda calculus strategy to compute the value of the expression. It consists of substituting the argument 2 for x:

(lambda x: x + 1)(2) = lambda 2: 2 + 1
                     = 2 + 1
                     = 3

Because a lambda function is an expression, it can be named. Therefore you could write the previous code as follows:

>>>

>>> add_one = lambda x: x + 1
>>> add_one(2)
3

The above lambda function is equivalent to writing this:

def add_one(x):
    return x + 1

These functions all take a single argument. You may have noticed that, in the definition of the lambdas, the arguments don’t have parentheses around them. Multi-argument functions (functions that take more than one argument) are expressed in Python lambdas by listing arguments and separating them with a comma (,) but without surrounding them with parentheses:

>>>

>>> full_name = lambda first, last: f'Full name: {first.title()} {last.title()}'
>>> full_name('guido', 'van rossum')
'Full name: Guido Van Rossum'

The lambda function assigned to full_name takes two arguments and returns a string interpolating the two parameters first and last. As expected, the definition of the lambda lists the arguments with no parentheses, whereas calling the function is done exactly like a normal Python function, with parentheses surrounding the arguments.

Anonymous Functions

The following terms may be used interchangeably depending on the programming language type and culture:

  • Anonymous functions
  • Lambda functions
  • Lambda expressions
  • Lambda abstractions
  • Lambda form
  • Function literals

For the rest of this article after this section, you’ll mostly see the term lambda function.

Taken literally, an anonymous function is a function without a name. In Python, an anonymous function is created with the lambda keyword. More loosely, it may or may not be assigned a name. Consider a two-argument anonymous function defined with lambda but not bound to a variable. The lambda is not given a name:

>>>

>>> lambda x, y: x + y 

The function above defines a lambda expression that takes two arguments and returns their sum.

Other than providing you with the feedback that Python is perfectly fine with this form, it doesn’t lead to any practical use. You could invoke the function in the Python interpreter:

>>>

>>> _(1, 2)
3

The example above is taking advantage of the interactive interpreter-only feature provided via the underscore (_). See the note below for more details.

You could not write similar code in a Python module. Consider the _ in the interpreter as a side effect that you took advantage of. In a Python module, you would assign a name to the lambda, or you would pass the lambda to a function. You’ll use those two approaches later in this article.
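For instance, in a module you might use either approach (the names here are hypothetical):

```python
# Option 1: assign the lambda a name
add = lambda x, y: x + y
print(add(1, 2))  # 3

# Option 2: pass the lambda directly to another function
result = sorted(["banana", "fig", "apple"], key=lambda s: len(s))
print(result)  # ['fig', 'apple', 'banana']
```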

Note: In the interactive interpreter, the single underscore (_) is bound to the last expression evaluated.

In the example above, the _ points to the lambda function. For more details about the usage of this special character in Python, check out The Meaning of Underscores in Python.

Another pattern used in other languages like JavaScript is to immediately execute a Python lambda function. This is known as an Immediately Invoked Function Expression (IIFE, pronounced “iffy”). Here’s an example:

>>>

>>> (lambda x, y: x + y)(2, 3)
5

The lambda function above is defined and then immediately called with two arguments (2 and 3). It returns the value 5, which is the sum of the arguments.

Several examples in this tutorial use this format to highlight the anonymous aspect of a lambda function and avoid focusing on lambda in Python as a shorter way of defining a function.

Python does not encourage using immediately invoked lambda expressions. It simply results from a lambda expression being callable, unlike the body of a normal function.

Lambda functions are frequently used with higher-order functions, which take one or more functions as arguments or return one or more functions.

A lambda function can be a higher-order function by taking a function (normal or lambda) as an argument like in the following contrived example:

>>>

>>> high_ord_func = lambda x, func: x + func(x)
>>> high_ord_func(2, lambda x: x * x)
6
>>> high_ord_func(2, lambda x: x + 3)
7

Python exposes higher-order functions as built-in functions or in the standard library. Examples include map(), filter(), functools.reduce(), as well as key functions like sort(), sorted(), min(), and max(). You’ll use lambda functions together with Python higher-order functions in Appropriate Uses of Lambda Expressions.
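As a quick illustration of lambdas with key functions (the sample data is hypothetical):

```python
pairs = [("banana", 3), ("apple", 11), ("cherry", 7)]

# Sort by the numeric second element rather than the string
by_count = sorted(pairs, key=lambda pair: pair[1])
print(by_count)  # [('banana', 3), ('cherry', 7), ('apple', 11)]

# min() and max() accept the same kind of key function
print(max(pairs, key=lambda pair: pair[1]))  # ('apple', 11)
```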

Python Lambda and Regular Functions

This quote from the Python Design and History FAQ seems to set the tone about the overall expectation regarding the usage of lambda functions in Python:

Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you’re too lazy to define a function. (Source)

Nevertheless, don’t let this statement deter you from using Python’s lambda. At first glance, you may accept that a lambda function is a function with some syntactic sugar shortening the code to define or invoke a function. The following sections highlight the commonalities and subtle differences between normal Python functions and lambda functions.

Functions

At this point, you may wonder what fundamentally distinguishes a lambda function bound to a variable from a regular function with a single return line: under the surface, almost nothing. Let’s verify how Python sees a function built with a single return statement versus a function constructed as an expression (lambda).

The dis module exposes functions to analyze Python bytecode generated by the Python compiler:

>>>

>>> import dis
>>> add = lambda x, y: x + y
>>> type(add)
<class 'function'>
>>> dis.dis(add)
  1           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE
>>> add
<function <lambda> at 0x7f30c6ce9ea0>

You can see that dis() exposes a readable version of the Python bytecode, allowing the inspection of the low-level instructions that the Python interpreter will use while executing the program.

Now see it with a regular function object:

>>>

>>> import dis
>>> def add(x, y): return x + y
>>> type(add)
<class 'function'>
>>> dis.dis(add)
  1           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE
>>> add
<function add at 0x7f30c6ce9f28>

The bytecode interpreted by Python is the same for both functions. But you may notice that the naming is different: the function name is add for a function defined with def, whereas the Python lambda function is seen as lambda.

Traceback

You saw in the previous section that, in the context of the lambda function, Python did not provide the name of the function, but only <lambda>. This can be a limitation to consider when an exception occurs, and a traceback shows only <lambda>:

>>>

>>> div_zero = lambda x: x / 0
>>> div_zero(2)
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <lambda>
ZeroDivisionError: division by zero

The traceback of an exception raised while a lambda function is executed only identifies the function causing the exception as <lambda>.

Here’s the same exception raised by a normal function:

>>>

>>> def div_zero(x): return x / 0
>>> div_zero(2)
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in div_zero
ZeroDivisionError: division by zero

The normal function causes a similar error but results in a more precise traceback because it gives the function name, div_zero.

Syntax

As you saw in the previous sections, a lambda form presents syntactic distinctions from a normal function. In particular, a lambda function has the following characteristics:

  • It can only contain expressions and can’t include statements in its body.
  • It is written as a single line of execution.
  • It does not support type annotations.
  • It can be immediately invoked (IIFE).

No Statements

A lambda function can’t contain any statements. In a lambda function, statements like return, pass, assert, or raise will raise a SyntaxError exception. Here’s an example of adding assert to the body of a lambda:

>>>

>>> (lambda x: assert x == 2)(2)
  File "<input>", line 1
    (lambda x: assert x == 2)(2)
                    ^
SyntaxError: invalid syntax

This contrived example was intended to assert that parameter x had a value of 2, but the interpreter identifies a SyntaxError while parsing the code because it involves the statement assert in the body of the lambda.

Single Expression

In contrast to a normal function, a Python lambda function is a single expression. Although you can spread the expression over several lines in the body of a lambda using parentheses or a multiline string, it remains a single expression:

>>>

>>> (lambda x:
... (x % 2 and 'odd' or 'even'))(3)
'odd'

The example above returns the string 'odd' when the lambda argument is odd, and 'even' when the argument is even. It spreads across two lines because it is contained in a set of parentheses, but it remains a single expression.

Type Annotations

If you’ve started adopting type hinting, which is now available in Python, then you have another good reason to prefer normal functions over Python lambda functions. Check out Python Type Checking (Guide) to learn more about Python type hints and type checking. In a lambda function, there is no equivalent for the following:

def full_name(first: str, last: str) -> str:
    return f'{first.title()} {last.title()}'

Any type error with full_name() can be caught by tools like mypy or pyre, whereas the equivalent lambda function raises a SyntaxError as soon as the code is parsed:

>>>

>>> lambda first: str, last: str: first.title() + " " + last.title() -> str
  File "<stdin>", line 1
    lambda first: str, last: str: first.title() + " " + last.title() -> str
SyntaxError: invalid syntax

Like trying to include a statement in a lambda, adding type annotations immediately results in a SyntaxError.

IIFE

You’ve already seen several examples of immediately invoked function execution:

>>>

>>> (lambda x: x * x)(3)
9

Outside of the Python interpreter, this feature is probably not used in practice. It’s a direct consequence of a lambda function being callable as it is defined. For example, this allows you to pass the definition of a Python lambda expression to a higher-order function like map(), filter(), or functools.reduce(), or to a key function.
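For example, a lambda can be handed directly to each of those higher-order functions (the data here is hypothetical):

```python
from functools import reduce

numbers = [1, 2, 3, 4]

doubled = list(map(lambda x: x * 2, numbers))        # [2, 4, 6, 8]
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4]
total = reduce(lambda acc, x: acc + x, numbers)      # 10

print(doubled, evens, total)
```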

Arguments

Like a normal function object defined with def, Python lambda expressions support all the different ways of passing arguments. This includes:

  • Positional arguments
  • Named arguments (sometimes called keyword arguments)
  • Variable list of arguments (often referred to as varargs)
  • Variable list of keyword arguments
  • Keyword-only arguments

The following examples illustrate options open to you in order to pass arguments to lambda expressions:

>>>

>>> (lambda x, y, z: x + y + z)(1, 2, 3)
6
>>> (lambda x, y, z=3: x + y + z)(1, 2)
6
>>> (lambda x, y, z=3: x + y + z)(1, y=2)
6
>>> (lambda *args: sum(args))(1,2,3)
6
>>> (lambda **kwargs: sum(kwargs.values()))(one=1, two=2, three=3)
6
>>> (lambda x, *, y=0, z=0: x + y + z)(1, y=2, z=3)
6

Decorators

In Python, a decorator is the implementation of a pattern that allows adding a behavior to a function or a class. It is usually expressed with the @decorator syntax prefixing a function. Here’s a contrived example:

def some_decorator(f):
    def wraps(*args):
        print(f"Calling function '{f.__name__}'")
        return f(*args)
    return wraps

@some_decorator
def decorated_function(x):
    print(f"With argument '{x}'")

In the example above, some_decorator() is a function that adds a behavior to decorated_function(), so that invoking decorated_function("Python") results in the following output:

Calling function 'decorated_function'
With argument 'Python'

decorated_function() only prints With argument 'Python', but the decorator adds an extra behavior that also prints Calling function 'decorated_function'.

A decorator can be applied to a lambda. Although it’s not possible to decorate a lambda with the @decorator syntax, a decorator is just a function, so it can call the lambda function:

 1 # Defining a decorator
 2 def trace(f):
 3     def wrap(*args, **kwargs):
 4         print(f"[TRACE] func: {f.__name__}, args: {args}, kwargs: {kwargs}")
 5         return f(*args, **kwargs)
 6
 7     return wrap
 8
 9 # Applying decorator to a function
10 @trace
11 def add_two(x):
12     return x + 2
13
14 # Calling the decorated function
15 add_two(3)
16
17 # Applying decorator to a lambda
18 print((trace(lambda x: x ** 2))(3))

add_two(), decorated with @trace on line 11, is invoked with argument 3 on line 15. By contrast, on line 18, a lambda function is immediately invoked and embedded in a call to trace(), the decorator. When you execute the code above you obtain the following:

[TRACE] func: add_two, args: (3,), kwargs: {}
[TRACE] func: <lambda>, args: (3,), kwargs: {}
9

See how, as you’ve already seen, the name of the lambda function appears as <lambda>, whereas add_two is clearly identified for the normal function.

Decorating the lambda function this way could be useful for debugging purposes, possibly to debug the behavior of a lambda function used in the context of a higher-order function or a key function. Let’s see an example with map():

list(map(trace(lambda x: x*2), range(3))) 

The first argument of map() is a lambda that multiplies its argument by 2. This lambda is decorated with trace(). When executed, the example above outputs the following:

[TRACE] func: <lambda>, args: (0,), kwargs: {}
[TRACE] func: <lambda>, args: (1,), kwargs: {}
[TRACE] func: <lambda>, args: (2,), kwargs: {}
[0, 2, 4]

The result [0, 2, 4] is a list obtained from multiplying each element of range(3) by 2. For now, consider range(3) equivalent to the list [0, 1, 2].

You’ll explore map() in more detail in the Map section.

A lambda can also be a decorator, but it’s not recommended. If you find yourself needing to do this, consult PEP 8, Programming Recommendations.
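For completeness, here’s a sketch of what a lambda acting as a decorator could look like; the names are hypothetical, and a def-based decorator is preferable in real code:

```python
# Not recommended: a lambda as a decorator that doubles a function's result
doubled = lambda func: lambda *args: func(*args) * 2

@doubled
def add(x, y):
    return x + y

print(add(1, 2))  # 6
```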

For more on Python decorators, check out Primer on Python Decorators.

Closure

A closure is a function where every free variable, everything except parameters, used in that function is bound to a specific value defined in the enclosing scope of that function. In effect, closures define the environment in which they run, and so can be called from anywhere.

The concepts of lambdas and closures are not necessarily related, although lambda functions can be closures in the same way that normal functions can also be closures. Some languages have special constructs for closure or lambda (for example, Groovy with an anonymous block of code as Closure object), or a lambda expression (for example, Java Lambda expression with a limited option for closure).

Here’s a closure constructed with a normal Python function:

 1 def outer_func(x):
 2     y = 4
 3     def inner_func(z):
 4         print(f"x = {x}, y = {y}, z = {z}")
 5         return x + y + z
 6     return inner_func
 7
 8 for i in range(3):
 9     closure = outer_func(i)
10     print(f"closure({i+5}) = {closure(i+5)}")

outer_func() returns inner_func(), a nested function that computes the sum of three arguments:

  • x is passed as an argument to outer_func().
  • y is a variable local to outer_func().
  • z is an argument passed to inner_func().

To test the behavior of outer_func() and inner_func(), outer_func() is invoked three times in a for loop that prints the following:

x = 0, y = 4, z = 5
closure(5) = 9
x = 1, y = 4, z = 6
closure(6) = 11
x = 2, y = 4, z = 7
closure(7) = 13

On line 9 of the code, inner_func() returned by the invocation of outer_func() is bound to the name closure. On line 5, inner_func() captures x and y because it has access to its embedding environment, such that upon invocation of the closure, it is able to operate on the two free variables x and y.

Similarly, a lambda can also be a closure. Here’s the same example with a Python lambda function:

 1 def outer_func(x):
 2     y = 4
 3     return lambda z: x + y + z
 4
 5 for i in range(3):
 6     closure = outer_func(i)
 7     print(f"closure({i+5}) = {closure(i+5)}")

When you execute the code above, you obtain the following output:

closure(5) = 9 closure(6) = 11 closure(7) = 13 

On line 6, outer_func() returns a lambda and assigns it to the variable closure. On line 3, the body of the lambda function references x and y. The variable y is available at definition time, whereas x is defined at runtime when outer_func() is invoked.

In this situation, both the normal function and the lambda behave similarly. In the next section, you’ll see a situation where the behavior of a lambda can be deceptive due to its evaluation time (definition time vs runtime).

Evaluation Time

In some situations involving loops, the behavior of a Python lambda function as a closure may be counterintuitive. It requires understanding when free variables are bound in the context of a lambda. The following examples demonstrate the difference when using a regular function vs using a Python lambda.

Test the scenario first using a regular function:

>>>

 1 >>> def wrap(n):
 2 ...     def f():
 3 ...         print(n)
 4 ...     return f
 5 ...
 6 >>> numbers = 'one', 'two', 'three'
 7 >>> funcs = []
 8 >>> for n in numbers:
 9 ...     funcs.append(wrap(n))
10 ...
11 >>> for f in funcs:
12 ...     f()
13 ...
14 one
15 two
16 three

In a normal function, n is evaluated at definition time, on line 9, when the function is added to the list: funcs.append(wrap(n)).

Now, with the implementation of the same logic with a lambda function, observe the unexpected behavior:

>>>

 1 >>> numbers = 'one', 'two', 'three'
 2 >>> funcs = []
 3 >>> for n in numbers:
 4 ...     funcs.append(lambda: print(n))
 5 ...
 6 >>> for f in funcs:
 7 ...     f()
 8 ...
 9 three
10 three
11 three

The unexpected result occurs because the free variable n, as implemented, is bound at the execution time of the lambda expression. The Python lambda function on line 4 is a closure that captures n, a free variable bound at runtime. At runtime, while invoking the function f on line 7, the value of n is three.

To overcome this issue, you can assign the free variable at definition time as follows:

>>>

 1 >>> numbers = 'one', 'two', 'three'
 2 >>> funcs = []
 3 >>> for n in numbers:
 4 ...     funcs.append(lambda n=n: print(n))
 5 ...
 6 >>> for f in funcs:
 7 ...     f()
 8 ...
 9 one
10 two
11 three

A Python lambda function behaves like a normal function in regard to arguments. Therefore, a lambda parameter can be initialized with a default value: the parameter n takes the outer n as a default value. The Python lambda function could have been written as lambda x=n: print(x) and have the same result.

The Python lambda function is invoked without any argument on line 7, and it uses the default value n set at definition time.

Testing Lambdas

Python lambdas can be tested similarly to regular functions. It’s possible to use both unittest and doctest.

unittest

The unittest module handles Python lambda functions similarly to regular functions:

import unittest

addtwo = lambda x: x + 2

class LambdaTest(unittest.TestCase):
    def test_add_two(self):
        self.assertEqual(addtwo(2), 4)

    def test_add_two_point_two(self):
        self.assertEqual(addtwo(2.2), 4.2)

    def test_add_three(self):
        # Should fail
        self.assertEqual(addtwo(3), 6)

if __name__ == '__main__':
    unittest.main(verbosity=2)

LambdaTest defines a test case with three test methods, each of them exercising a test scenario for addtwo() implemented as a lambda function. The execution of the Python file lambda_unittest.py that contains LambdaTest produces the following:

$ python lambda_unittest.py
test_add_three (__main__.LambdaTest) ... FAIL
test_add_two (__main__.LambdaTest) ... ok
test_add_two_point_two (__main__.LambdaTest) ... ok

======================================================================
FAIL: test_add_three (__main__.LambdaTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "lambda_unittest.py", line 18, in test_add_three
    self.assertEqual(addtwo(3), 6)
AssertionError: 5 != 6

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)

As expected, we have two successful test cases and one failure for test_add_three: the result is 5, but the expected result was 6. This failure is due to an intentional mistake in the test case. Changing the expected result from 6 to 5 will satisfy all the tests for LambdaTest.

doctest

The doctest module extracts interactive Python code from docstrings to execute tests. Although the syntax of Python lambda functions does not support a typical docstring, it is possible to assign a string to the __doc__ attribute of a named lambda:

addtwo = lambda x: x + 2
addtwo.__doc__ = """Add 2 to a number.
    >>> addtwo(2)
    4
    >>> addtwo(2.2)
    4.2
    >>> addtwo(3) # Should fail
    6
    """

if __name__ == '__main__':
    import doctest
    doctest.testmod(verbose=True)

The doctest in the doc comment of lambda addtwo() describes the same test cases as in the previous section.

When you execute the tests via doctest.testmod(), you get the following:

$ python lambda_doctest.py
Trying:
    addtwo(2)
Expecting:
    4
ok
Trying:
    addtwo(2.2)
Expecting:
    4.2
ok
Trying:
    addtwo(3) # Should fail
Expecting:
    6
**********************************************************************
File "lambda_doctest.py", line 16, in __main__.addtwo
Failed example:
    addtwo(3) # Should fail
Expected:
    6
Got:
    5
1 items had no tests:
    __main__
**********************************************************************
1 items had failures:
   1 of   3 in __main__.addtwo
3 tests in 2 items.
2 passed and 1 failed.
***Test Failed*** 1 failures.

The failed test results from the same failure explained in the execution of the unit tests in the previous section.

You can add a docstring to a Python lambda via an assignment to __doc__ to document a lambda function. Although possible, the Python syntax better accommodates docstrings for normal functions than for lambda functions.

For a comprehensive overview of unit testing in Python, you may want to refer to Getting Started With Testing in Python.

Lambda Expression Abuses

Several examples in this article, if written in the context of professional Python code, would qualify as abuses.

If you find yourself trying to overcome something that a lambda expression does not support, this is probably a sign that a normal function would be better suited. The docstring for a lambda expression in the previous section is a good example. Attempting to overcome the fact that a Python lambda function does not support statements is another red flag.

The next sections illustrate a few examples of lambda usages that should be avoided. These are situations where, in the context of Python lambda, the code exhibits one or more of the following patterns:

  • It doesn’t follow the Python style guide (PEP 8).
  • It’s cumbersome and difficult to read.
  • It’s unnecessarily clever at the cost of readability.

Raising an Exception

Trying to raise an exception in a Python lambda should make you think twice. There are some clever ways to do so, but even something like the following is better to avoid:

>>> def throw(ex): raise ex
>>> (lambda: throw(Exception('Something bad happened')))()
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <lambda>
    File "<stdin>", line 1, in throw
Exception: Something bad happened

Because a statement is not syntactically correct in a Python lambda body, the workaround in the example above consists of abstracting the statement call with a dedicated function throw(). Using this type of workaround should be avoided. If you encounter this type of code, you should consider refactoring the code to use a regular function.

Cryptic Style

As in any programming language, you will find Python code that can be difficult to read because of the style used. Lambda functions, due to their conciseness, can be conducive to writing code that is difficult to read.

The following lambda example contains several bad style choices:

>>> (lambda _: list(map(lambda _: _ // 2, _)))([1,2,3,4,5,6,7,8,9,10])
[0, 1, 1, 2, 2, 3, 3, 4, 4, 5]

The underscore (_) refers to a variable that you don’t need to refer to explicitly. But in this example, three _ refer to different variables. An initial upgrade to this lambda code could be to name the variables:

>>> (lambda some_list: list(map(lambda n: n // 2,
                                some_list)))([1,2,3,4,5,6,7,8,9,10])
[0, 1, 1, 2, 2, 3, 3, 4, 4, 5]

Admittedly, it’s still difficult to read. While still taking advantage of a lambda, using a regular function would go a long way toward making this code more readable, spreading the logic over a few lines and function calls:

>>> def div_items(some_list):
      div_by_two = lambda n: n // 2
      return map(div_by_two, some_list)
>>> list(div_items([1,2,3,4,5,6,7,8,9,10]))
[0, 1, 1, 2, 2, 3, 3, 4, 4, 5]

This is still not optimal but shows you a possible path to make code, and Python lambda functions in particular, more readable. In Alternatives to Lambdas, you’ll learn to replace map() and lambda with list comprehensions or generator expressions. This will drastically improve the readability of the code.

Python Classes

You can, but should not, write class methods as Python lambda functions. The following example is perfectly legal Python code but exhibits unconventional reliance on lambda. For example, instead of implementing __str__ as a regular function, it uses a lambda. Similarly, brand and year are properties also implemented with lambda functions, instead of regular functions or decorators:

class Car:     """Car with methods as lambda functions."""     def __init__(self, brand, year):         self.brand = brand         self.year = year      brand = property(lambda self: getattr(self, '_brand'),                      lambda self, value: setattr(self, '_brand', value))      year = property(lambda self: getattr(self, '_year'),                     lambda self, value: setattr(self, '_year', value))      __str__ = lambda self: f'{self.brand} {self.year}'  # 1: error E731      honk = lambda self: print('Honk!')     # 2: error E731 

Running a tool like flake8, a style guide enforcement tool, will display the following errors for __str__ and honk:

E731 do not assign a lambda expression, use a def 

Although flake8 doesn’t point out an issue for the usage of the Python lambda functions in the properties, they are difficult to read and error-prone because of the use of multiple strings like '_brand' and '_year'.

Proper implementation of __str__ would be expected to be as follows:

def __str__(self):
    return f'{self.brand} {self.year}'

brand would be written as follows:

@property
def brand(self):
    return self._brand

@brand.setter
def brand(self, value):
    self._brand = value

As a general rule, in the context of code written in Python, prefer regular functions over lambda expressions. Nonetheless, there are cases that benefit from lambda syntax, as you will see in the next section.

Appropriate Uses of Lambda Expressions

Lambdas in Python tend to be the subject of controversy. Some of the arguments against lambdas in Python are:

  • Issues with readability
  • The imposition of a functional way of thinking
  • Heavy syntax with the lambda keyword

Despite the heated debates questioning the mere existence of this feature in Python, lambda functions have properties that sometimes provide value to the Python language and to developers.

The following examples illustrate scenarios where the use of lambda functions is not only suitable but encouraged in Python code.

Classic Functional Constructs

Lambda functions are regularly used with the built-in functions map() and filter(), as well as functools.reduce() from the functools module. The following three examples are respective illustrations of using those functions with lambda expressions as companions:

>>> list(map(lambda x: x.upper(), ['cat', 'dog', 'cow']))
['CAT', 'DOG', 'COW']
>>> list(filter(lambda x: 'o' in x, ['cat', 'dog', 'cow']))
['dog', 'cow']
>>> from functools import reduce
>>> reduce(lambda acc, x: f'{acc} | {x}', ['cat', 'dog', 'cow'])
'cat | dog | cow'

You may have to read code resembling the examples above, albeit with more relevant data. For that reason, it’s important to recognize those constructs. Nevertheless, those constructs have equivalent alternatives that are considered more Pythonic. In Alternatives to Lambdas, you’ll learn how to convert higher-order functions and their accompanying lambdas into other more idiomatic forms.

Key Functions

Key functions in Python are higher-order functions that take a parameter key as a named argument. key receives a function that can be a lambda. This function directly influences the algorithm driven by the key function itself. Here are some key functions:

  • sort(): list method
  • sorted(), min(), max(): built-in functions
  • nlargest() and nsmallest(): in the Heap queue algorithm module heapq

Imagine that you want to sort a list of IDs represented as strings. Each ID is the concatenation of the string id and a number. Sorting this list with the built-in function sorted(), by default, uses a lexicographic order as the elements in the list are strings.

To influence the sorting execution, you can assign a lambda to the named argument key, such that the sorting will use the number associated with the ID:

>>> ids = ['id1', 'id2', 'id30', 'id3', 'id22', 'id100']
>>> print(sorted(ids)) # Lexicographic sort
['id1', 'id100', 'id2', 'id22', 'id3', 'id30']
>>> sorted_ids = sorted(ids, key=lambda x: int(x[2:])) # Integer sort
>>> print(sorted_ids)
['id1', 'id2', 'id3', 'id22', 'id30', 'id100']
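The same key lambda works across the other key functions listed above. This sketch (not part of the article's original example) reuses the numeric-part extraction with min(), max(), and heapq.nlargest():

```python
import heapq

ids = ['id1', 'id2', 'id30', 'id3', 'id22', 'id100']
by_number = lambda x: int(x[2:])  # extract the numeric part of the ID

print(min(ids, key=by_number))                # id1
print(max(ids, key=by_number))                # id100
print(heapq.nlargest(2, ids, key=by_number))  # ['id100', 'id30']
```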

UI Frameworks

UI frameworks like Tkinter, wxPython, or .NET Windows Forms with IronPython take advantage of lambda functions for mapping actions in response to UI events.

The naive Tkinter program below demonstrates the usage of a lambda assigned to the command of the Reverse button:

import tkinter as tk
import sys

window = tk.Tk()
window.grid_columnconfigure(0, weight=1)
window.title("Lambda")
window.geometry("300x100")
label = tk.Label(window, text="Lambda Calculus")
label.grid(column=0, row=0)
button = tk.Button(
    window,
    text="Reverse",
    command=lambda: label.configure(text=label.cget("text")[::-1]),
)
button.grid(column=0, row=1)
window.mainloop()

Clicking the button Reverse fires an event that triggers the lambda function, changing the label from Lambda Calculus to suluclaC adbmaL:

[Image: Animated Tkinter window demonstrating the Reverse button reversing the label text]

Both wxPython and IronPython on the .NET platform share a similar approach for handling events. Note that lambda is one way to handle firing events, but a function may be used for the same purpose. It ends up being self-contained and less verbose to use a lambda when the amount of code needed is very short.

To explore wxPython, check out How to Build a Python GUI Application With wxPython.

Python Interpreter

When you’re playing with Python code in the interactive interpreter, Python lambda functions are often a blessing. It’s easy to craft a quick one-liner function to explore some snippets of code that will never see the light of day outside of the interpreter. The lambdas written in the interpreter, for the sake of speedy discovery, are like scrap paper that you can throw away after use.

timeit

In the same spirit as the experimentation in the Python interpreter, the module timeit provides functions to time small code fragments. timeit.timeit() in particular can be called directly, passing some Python code in a string. Here’s an example:

>>> from timeit import timeit
>>> timeit("factorial(999)", "from math import factorial", number=10)
0.0013087529951008037

When the statement is passed as a string, timeit() needs the full context. In the example above, this is provided by the second argument that sets up the environment needed by the main function to be timed. Not doing so would raise a NameError exception.

Another approach is to use a lambda:

>>> from math import factorial
>>> timeit(lambda: factorial(999), number=10)
0.0012704220062005334

This solution is cleaner, more readable, and quicker to type in the interpreter. Although the execution time was slightly less for the lambda version, executing the functions again may show a slight advantage for the string version. The execution time of the setup is excluded from the overall execution time and shouldn’t have any impact on the result.

Monkey Patching

For testing, it’s sometimes necessary to rely on repeatable results, even if, during the normal execution of a given piece of software, the corresponding results are expected to differ, or even be totally random.

Let’s say you want to test a function that, at runtime, handles random values. But, during the testing execution, you need to assert against predictable values in a repeatable manner. The following example shows how, with a lambda function, monkey patching can help you:

from contextlib import contextmanager
import secrets

def gen_token():
    """Generate a random token."""
    return f'TOKEN_{secrets.token_hex(8)}'

@contextmanager
def mock_token():
    """Context manager to monkey patch the secrets.token_hex
    function during testing.
    """
    default_token_hex = secrets.token_hex
    secrets.token_hex = lambda _: 'feedfacecafebeef'
    yield
    secrets.token_hex = default_token_hex

def test_gen_key():
    """Test the random token."""
    with mock_token():
        assert gen_token() == f"TOKEN_{'feedfacecafebeef'}"

test_gen_key()

A context manager helps with insulating the operation of monkey patching a function from the standard library (secrets, in this example). The lambda function assigned to secrets.token_hex() substitutes the default behavior by returning a static value.

This allows testing any function depending on token_hex() in a predictable fashion. Prior to exiting from the context manager, the default behavior of token_hex() is reestablished to eliminate any unexpected side effects that would affect other areas of the testing that may depend on the default behavior of token_hex().

Unit test frameworks like unittest and pytest take this concept to a higher level of sophistication.

With pytest, still using a lambda function, the same example becomes more elegant and concise:

import secrets

def gen_token():
    return f'TOKEN_{secrets.token_hex(8)}'

def test_gen_key(monkeypatch):
    monkeypatch.setattr('secrets.token_hex', lambda _: 'feedfacecafebeef')
    assert gen_token() == f"TOKEN_{'feedfacecafebeef'}"

With the pytest monkeypatch fixture, secrets.token_hex() is overwritten with a lambda that returns a deterministic value, feedfacecafebeef, which allows the test to validate the token. The pytest monkeypatch fixture lets you control the scope of the override. In the example above, invoking secrets.token_hex() in subsequent tests, without monkey patching, would execute the normal implementation of this function.

Executing the pytest test gives the following result:

$ pytest test_token.py -v
============================= test session starts ==============================
platform linux -- Python 3.7.2, pytest-4.3.0, py-1.8.0, pluggy-0.9.0
cachedir: .pytest_cache
rootdir: /home/andre/AB/tools/bpython, inifile:
collected 1 item

test_token.py::test_gen_key PASSED                                       [100%]

=========================== 1 passed in 0.01 seconds ===========================

The test passes, confirming that gen_token() was exercised and produced the expected result in the context of the test.

Alternatives to Lambdas

While there are great reasons to use lambda, there are instances where its use is frowned upon. So what are the alternatives?

Higher-order functions like map(), filter(), and functools.reduce() can be converted to more elegant forms with slight twists of creativity, in particular with list comprehensions or generator expressions.

Watch Using List Comprehensions Effectively to learn more about list comprehensions.

Map

The built-in function map() takes a function as a first argument and applies it to each of the elements of its second argument, an iterable. Examples of iterables are strings, lists, and tuples. For more information on iterables and iterators, check out Iterables and Iterators.

map() returns an iterator corresponding to the transformed collection. As an example, if you wanted to transform a list of strings to a new list with each string capitalized, you could use map(), as follows:

>>> list(map(lambda x: x.capitalize(), ['cat', 'dog', 'cow']))
['Cat', 'Dog', 'Cow']

You need to invoke list() to convert the iterator returned by map() into an expanded list that can be displayed in the Python shell interpreter.
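To see this laziness in action, you can pull items from the iterator one at a time. This small sketch is an illustration, not part of the original example:

```python
# map() returns a lazy iterator, so elements are transformed on demand.
capitalized = map(lambda x: x.capitalize(), ['cat', 'dog', 'cow'])

print(next(capitalized))  # Cat
print(next(capitalized))  # Dog
print(list(capitalized))  # ['Cow'] -- only the remaining item
```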

Using a list comprehension eliminates the need for defining and invoking the lambda function:

>>> [x.capitalize() for x in ['cat', 'dog', 'cow']]
['Cat', 'Dog', 'Cow']

Filter

The built-in function filter(), another classic functional construct, can be converted into a list comprehension. It takes a predicate as a first argument and an iterable as a second argument. It builds an iterator containing all the elements of the initial collection that satisfy the predicate function. Here’s an example that filters all the even numbers in a given list of integers:

>>> even = lambda x: x%2 == 0
>>> list(filter(even, range(11)))
[0, 2, 4, 6, 8, 10]

Note that filter() returns an iterator, hence the need to invoke the built-in type list that constructs a list given an iterator.

The implementation leveraging the list comprehension construct gives the following:

>>> [x for x in range(11) if x%2 == 0]
[0, 2, 4, 6, 8, 10]

Reduce

Since Python 3, reduce() has gone from a built-in function to a functools module function. Like map() and filter(), its first two arguments are respectively a function and an iterable. It may also take an initializer as a third argument that is used as the initial value of the resulting accumulator. For each element of the iterable, reduce() applies the function and accumulates the result that is returned when the iterable is exhausted.

To apply reduce() to a list of pairs and calculate the sum of the first item of each pair, you could write this:

>>> import functools
>>> pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> functools.reduce(lambda acc, pair: acc + pair[0], pairs, 0)
6

A more idiomatic approach using a generator expression, as an argument to sum() in the example, is the following:

>>> pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> sum(x[0] for x in pairs)
6

A slightly different and possibly cleaner solution removes the need to explicitly access the first element of the pair and instead uses unpacking:

>>> pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> sum(x for x, _ in pairs)
6

The use of underscore (_) is a Python convention indicating that you can ignore the second value of the pair.

Because the generator expression is the sole argument passed to sum(), it doesn’t need to be wrapped in its own parentheses.

Are Lambdas Pythonic or Not?

PEP 8, which is the style guide for Python code, reads:

Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier. (Source)

This strongly discourages binding a lambda expression to an identifier, a situation where a regular function definition would serve better. PEP 8 does not mention other usages of lambda. As you have seen in the previous sections, lambda functions may certainly have good uses, although they are limited.

A possible way to answer the question is that lambda functions are perfectly Pythonic if there is nothing more Pythonic available. I’m staying away from defining what “Pythonic” means, leaving you with the definition that best suits your mindset, as well as your personal or your team’s coding style.

Beyond the narrow scope of Python lambda, How to Write Beautiful Python Code With PEP 8 is a great resource that you may want to check out regarding code style in Python.

Conclusion

You now know how to use Python lambda functions and can:

  • Write Python lambdas and use anonymous functions
  • Choose wisely between lambdas or normal Python functions
  • Avoid excessive use of lambdas
  • Use lambdas with higher-order functions or Python key functions

If you have a penchant for mathematics, you may have some fun exploring the fascinating world of lambda calculus.

Python lambdas are like salt. A pinch in your spam, ham, and eggs will enhance the flavors, but too much will spoil the dish.

Note: The Python programming language, named after Monty Python, prefers to use spam, ham, and eggs as metasyntactic variables, instead of the traditional foo, bar, and baz.



Real Python: How to Build Command Line Interfaces in Python With argparse

One of the strengths of Python is that it comes with batteries included: it has a rich and versatile standard library that makes it one of the best programming languages for writing scripts for the command line. But, if you write scripts for the command line, then you also need to provide a good command line interface, which you can create with the Python argparse library.

In this article, you’ll learn:

  • What the Python argparse library is, and why it’s important to use it if you need to write command line scripts in Python
  • How to use the Python argparse library to quickly create a simple CLI in Python
  • What the advanced usage of the Python argparse library is

This article is written for early intermediate Pythonistas who probably write scripts in Python for their everyday work but have never implemented a command line interface for their scripts.

If that sounds like you, and you’re used to setting variable values at the beginning of your scripts or manually parsing the sys.argv system list instead of using a more robust CLI development tool, then this article is for you.

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python’s best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

What Is a Command Line Interface?

The command line interface (also known as CLI) is a means to interact with a command line script. Python comes with several different libraries that allow you to write a command line interface for your scripts, but the standard way for creating a CLI in Python is currently the Python argparse library.

The Python argparse library was released as part of the standard library with Python 3.2 on February 20, 2011. It was introduced with Python Enhancement Proposal 389 and is now the standard way to create a CLI in Python, in both the 2.7 and 3.2+ versions.

This new module was released as a replacement for the older getopt and optparse modules because they lacked some important features.

The Python argparse library:

  • Allows the use of positional arguments
  • Allows the customization of the prefix chars
  • Supports variable numbers of parameters for a single option
  • Supports subcommands (A main command line parser can use other command line parsers depending on some arguments.)
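For instance, subcommand support can be sketched with add_subparsers(). The repo tool and its clone and status subcommands below are hypothetical, used only to illustrate the mechanism:

```python
import argparse

parser = argparse.ArgumentParser(prog='repo')
subparsers = parser.add_subparsers(dest='command')

# Each subcommand gets its own parser with its own arguments
clone_parser = subparsers.add_parser('clone', help='clone a repository')
clone_parser.add_argument('url')

subparsers.add_parser('status', help='show the status')

# Parsing a simulated command line instead of sys.argv
args = parser.parse_args(['clone', 'https://example.com/repo.git'])
print(args.command, args.url)  # clone https://example.com/repo.git
```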

Before getting started, you need to know how a command line interface works, so open a terminal on your computer and execute the command ls to get the list of the files contained in the current directory like this:

$ ls
dcdb_20180201.sg4    mastro35.sg4        openings.sg4
dcdb_20180201.si4    mastro35.si4        openings.si4
dcdb_20180201.sn4    mastro35.sn4        openings.sn4

As you can see, there are a bunch of files in the current directory, but the command didn’t return a lot of information about these files.

The good news is that you don’t need to look around for another program to get a richer list of the files contained in the current directory. You also don’t need to modify the ls command yourself, because it provides a command line interface: a set of tokens (called arguments) that you can use to configure the behavior of the command.

Now try to execute the command ls again, but this time add the -l option to the command line as in the example below:

$ ls -l
total 641824
-rw-------  1 dave  staff  204558286  5 Mar  2018 dcdb_20180201.sg4
-rw-------  1 dave  staff  110588409  5 Mar  2018 dcdb_20180201.si4
-rw-------  1 dave  staff    2937516  5 Mar  2018 dcdb_20180201.sn4
-rw-------  1 dave  staff     550127 27 Mar  2018 mastro35.sg4
-rw-------  1 dave  staff      15974 11 Gen 17:01 mastro35.si4
-rw-------  1 dave  staff       3636 27 Mar  2018 mastro35.sn4
-rw-------  1 dave  staff      29128 17 Apr  2018 openings.sg4
-rw-------  1 dave  staff        276 17 Apr  2018 openings.si4
-rw-------  1 dave  staff         86 18 Apr  2018 openings.sn4

The output is very different now. The command has returned a lot of information about the permissions, owner, group, and size of each file, along with the total disk usage of the directory.

This is because you used the command line interface of the ls command and specified the -l option that enables the long format, a special format that returns a lot more information for every single file listed.

In order to familiarize yourself with this topic, you’re going to read a lot about arguments, options, and parameters, so let’s clarify the terminology right away:

  • An argument is a single part of a command line, delimited by blanks.
  • An option is a particular type of argument (or a part of an argument) that can modify the behavior of the command line.
  • A parameter is a particular type of argument that provides additional information to a single option or command.

Consider the following command:

$   ls -l -s -k /var/log 

In this example, you have five different arguments:

  1. ls: the name of the command you are executing
  2. -l: an option to enable the long list format
  3. -s: an option to print the allocated size of each file
  4. -k: an option to have the size in kilobytes
  5. /var/log: a parameter that provides additional information (the path to list) to the command

Note that, if you have multiple options in a single command line, then you can combine them into a single argument like this:

$   ls -lsk /var/log 

Here you have only three arguments:

  1. ls: the name of the command you are executing
  2. -lsk: the three different options you want to enable (a combination of -l, -s, and -k)
  3. /var/log: a parameter that provides additional information (the path to list) to the command

When to Use a Command Line Interface

Now that you know what a command line interface is, you may be wondering when it’s a good idea to implement one in your programs. The rule of thumb is that, if you want to provide a user-friendly approach to configuring your program, then you should consider a command line interface, and the standard way to do it is by using the Python argparse library.

Even if you’re creating a complex command line program that needs a configuration file to work, if you want to let your user specify which configuration file to use, it’s a good idea to accept this value by creating a command line interface with the Python argparse library.

How to Use the Python argparse Library to Create a Command Line Interface

Using the Python argparse library has four steps:

  1. Import the Python argparse library
  2. Create the parser
  3. Add optional and positional arguments to the parser
  4. Execute .parse_args()

After you execute .parse_args(), what you get is a Namespace object that contains a simple property for each input argument received from the command line.

In order to see these four steps in detail with an example, let’s pretend you’re creating a program named myls.py that lists the files contained in the current directory. Here’s a possible implementation of your command line interface without using the Python argparse library:

# myls.py
import os
import sys

if len(sys.argv) > 2:
    print('You have specified too many arguments')
    sys.exit()

if len(sys.argv) < 2:
    print('You need to specify the path to be listed')
    sys.exit()

input_path = sys.argv[1]

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit()

print('\n'.join(os.listdir(input_path)))

This is a possible implementation of the command line interface for your program that doesn’t use the Python argparse library, but if you try to execute it, then you’ll see that it works:

$ python myls.py
You need to specify the path to be listed

$ python myls.py /mnt /proc /dev
You have specified too many arguments

$ python myls.py /mnt
dir1
dir2

As you can see, the script does work, but the output is quite different from the output you’d expect from a standard built-in command.

Now, let’s see how the Python argparse library can improve this code:

# myls.py
# Import the argparse library
import argparse

import os
import sys

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder')

# Add the arguments
my_parser.add_argument('Path',
                       metavar='path',
                       type=str,
                       help='the path to list')

# Execute the parse_args() method
args = my_parser.parse_args()

input_path = args.Path

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit()

print('\n'.join(os.listdir(input_path)))

The code has changed a lot with the introduction of the Python argparse library.

The first big difference compared to the previous version is that the if statements to check the arguments provided by the user are gone because the library will check the presence of the arguments for us.

We’ve imported the Python argparse library, created a simple parser with a brief description of the program’s goal, and defined the positional argument we want to get from the user. Lastly, we have executed .parse_args() to parse the input arguments and get a Namespace object that contains the user input.

Now, if you run this code, you’ll see that, with just four lines of code, you get a very different output:

$ python myls.py
usage: myls.py [-h] path
myls.py: error: the following arguments are required: path

As you can see, the program has detected that you needed at least a positional argument (path), and so the execution of the program has been interrupted with a specific error message.

You may also have noticed that now your program accepts an optional -h flag, like in the example below:

$ python myls.py -h
usage: myls.py [-h] path

List the content of a folder

positional arguments:
  path        the path to list

optional arguments:
  -h, --help  show this help message and exit

Good, now the program responds to the -h flag, displaying a help message that tells the user how to use the program. Isn’t that neat, considering that you didn’t even need to ask for that feature?

Lastly, with just four lines of code, now the args variable is a Namespace object, which has a property for each argument that has been gathered from the command line. That’s super convenient.
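As a minimal sketch of how that Namespace behaves, parse_args() can be fed an explicit list instead of reading sys.argv (the ['/tmp'] command line here is just an illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('Path', metavar='path', type=str)

# Parsing a simulated command line instead of sys.argv
args = parser.parse_args(['/tmp'])
print(args.Path)   # /tmp
print(vars(args))  # {'Path': '/tmp'}
```

Note that vars() turns the Namespace into a plain dictionary, which is handy for inspecting everything the parser gathered.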

The Advanced Use of the Python argparse Library

In the previous section, you learned the basic usage of the Python argparse library, and now you can implement a simple command line interface for all your programs. However, there’s a lot more that you can achieve with this library. In this section, you’ll see almost everything it has to offer.

Setting the Name of the Program

By default, the library uses the value of the sys.argv[0] element to set the name of the program, which as you probably already know is the name of the Python script you have executed. However, you can specify the name of your program just by using the prog keyword:

# Create the parser
my_parser = argparse.ArgumentParser(prog='myls',
                                    description='List the content of a folder')

With the prog keyword, you specify the name of the program that will be used in the help text:

$ python myls.py
usage: myls [-h] path
myls.py: error: the following arguments are required: path

As you can see, now the program name is just myls instead of myls.py.

Displaying a Custom Program Usage Help

By default, the program usage help has a standard format defined by the Python argparse library. However, you can customize it with the usage keyword like this:

# Create the parser
my_parser = argparse.ArgumentParser(prog='myls',
                                    usage='%(prog)s [options] path',
                                    description='List the content of a folder')

Note that, at runtime, the %(prog)s token is automatically replaced with the name of your program:

$ python myls.py
usage: myls [options] path
myls: error: too few arguments

As you can see, the help of the program now shows a different usage string, where the [-h] option has been replaced by a generic [options] token.

Displaying Text Before and After the Arguments Help

To customize the text displayed before and after the arguments help text, you can use two different keywords:

  1. description: for the text that is shown before the help text
  2. epilog: for the text shown after the help text

You’ve already seen the description keyword in the previous chapter, so let’s see an example of how the epilog keyword works:

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder',
                                    epilog='Enjoy the program! :)')

The epilog keyword here has customized the text that will be shown after the standard help text:

$ python myls.py -h
usage: myls.py [-h] path

List the content of a folder

positional arguments:
  path        the path to list

optional arguments:
  -h, --help  show this help message and exit

Enjoy the program! :)

Now the output shows the extra text that has been customized by the epilog keyword.

Customizing the Allowed Prefix Chars

Another feature that the Python argparse library offers you is the ability to customize the prefix chars, which are the chars that you can use to pass optional arguments to the command line interface.

By default, the standard prefix char is the dash (-) character, but if you want to use a different character, then you can customize it by using the prefix_chars keyword while defining the parser like this:

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder',
                                    epilog='Enjoy the program! :)',
                                    prefix_chars='/')

After the redefinition, the program now supports a completely different prefix char, and the help text has changed accordingly:

$ python myls.py
usage: myls.py [/h] path
myls.py: error: too few arguments

As you can see, your program now supports the /h flag instead of the -h flag. That's especially useful when you're coding for Microsoft Windows, because Windows users are used to these prefix chars when working with the command line.

Setting Prefix Chars for Files That Contain Arguments to Be Included

When you are dealing with a very long or complicated command line, it can be a good idea to save the arguments to an external file and ask your program to load arguments from it. The Python argparse library can do this work for you out of the box.

To test this feature, create the following Python program:

# fromfile_example.py
import argparse

my_parser = argparse.ArgumentParser(fromfile_prefix_chars='@')

my_parser.add_argument('a',
                       help='a first argument')

my_parser.add_argument('b',
                       help='a second argument')

my_parser.add_argument('c',
                       help='a third argument')

my_parser.add_argument('d',
                       help='a fourth argument')

my_parser.add_argument('e',
                       help='a fifth argument')

my_parser.add_argument('-v',
                       '--verbose',
                       action='store_true',
                       help='an optional argument')

# Execute parse_args()
args = my_parser.parse_args()

print('If you read this line it means that you have provided '
      'all the parameters')

Note that we have used the fromfile_prefix_chars keyword while creating the parser.

Now, if you try to execute your program without passing any arguments, then you’ll get an error message:

$ python fromfile_example.py
usage: fromfile_example.py [-h] [-v] a b c d e
fromfile_example.py: error: the following arguments are required: a, b, c, d, e

Here you can see that the Python argparse library is complaining because you have not provided enough arguments.

So let’s create a file named args.txt that contains all the necessary parameters, with an argument on each line like this:

first
second
third
fourth
fifth

Now that you have specified a prefix char to get arguments from an external file, open a terminal and try to execute the previous program:

$ python fromfile_example.py @args.txt
If you read this line it means that you have provided all the parameters

In this example, you can see that argparse has read the arguments from the args.txt file.

Allowing or Disallowing Abbreviations

One of the features that the Python argparse library provides out of the box is the ability to handle abbreviations. Consider the following program, which prints out the value you specify on the command line interface for the --input argument:

# abbrev_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('--input', action='store', type=int, required=True)
my_parser.add_argument('--id', action='store', type=int)

args = my_parser.parse_args()

print(args.input)

This program prints out the value you specify for the --input argument. We haven't covered optional arguments in depth yet, but don't worry, we will in just a moment. For now, just treat this argument like the positional arguments you've already seen, except that its name starts with two dashes.

Now let’s see how the Python argparse library can handle abbreviations, by calling our program multiple times, specifying a different abbreviation of the input argument at each run:

$ python abbrev_example.py --input 42
42

$ python abbrev_example.py --inpu 42
42

$ python abbrev_example.py --inp 42
42

$ python abbrev_example.py --in 42
42

As you can see, optional parameters can always be shortened as long as the abbreviation is unambiguous. But what happens if you try to execute the program specifying just --i 42? In this case, argparse doesn't know whether you want to pass the value 42 to the --input argument or to the --id argument, so it exits the program with a specific error message:

$ python abbrev_example.py --i 42
usage: abbrev_example.py [-h] --input INPUT [--id ID]
abbrev_example.py: error: ambiguous option: --i could match --input, --id

However, if you don’t like this behavior, and you want to force your users to specify the full name of the options they use, then you can just disable this feature with the keyword allow_abbrev set to False during the creation of the parser:

# abbrev_example.py
import argparse

my_parser = argparse.ArgumentParser(allow_abbrev=False)
my_parser.add_argument('--input', action='store', type=int, required=True)

args = my_parser.parse_args()

print(args.input)

Now, if you try the code above, you’ll see that the abbreviations are no longer permitted:

$ python abbrev_example.py --inp 42
usage: abbrev_example.py [-h] --input INPUT
abbrev_example.py: error: the following arguments are required: --input

The error message tells the user that the --input parameter has not been specified because the --inp abbreviation has not been recognized.

Using Auto Help

In some of the previous examples, you used the -h flag to get a help text. This is a very convenient feature that the Python argparse library allows you to use without having to code anything. However, sometimes you may want to disable this feature. To do that, just use the add_help keyword when creating the parser:

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder',
                                    add_help=False)

The code in the example above specifies the add_help keyword set to False, so now if you run the code, you’ll see that the -h flag isn’t accepted anymore:

$ python myls.py
usage: myls.py path
myls.py: error: the following arguments are required: path

As you can see, the -h flag is no longer shown or accepted.

Setting the Name or Flags of the Arguments

There are basically two different types of arguments that you can add to your command line interface:

  1. Positional arguments
  2. Optional arguments

Positional arguments are the ones your command needs to operate.

In the previous example, the argument path was a positional argument, and our program couldn’t work without it. They are called positional because their position defines their function.

For example, consider the cp command on Linux (or the copy command in Windows). Here’s the standard usage:

$ cp [OPTION]... [-T] SOURCE DEST

The first positional argument after the cp command is the source of the file you’re going to copy. The second one is the destination where you want to copy it.

Optional arguments are not mandatory, and when they are used they can modify the behavior of the command at runtime. In the cp example, an optional argument is, for example, the -r flag, which makes the command copy directories recursively.

Syntactically, the difference between positional and optional arguments is that optional arguments start with - or --, while positional arguments don’t.

To add an optional argument, you just need to call .add_argument() again and name the new argument with a starting -.

For example, try to modify the myls.py like this:

# myls.py
# Import the argparse library
import argparse

import os
import sys

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder')

# Add the arguments
my_parser.add_argument('Path',
                       metavar='path',
                       type=str,
                       help='the path to list')
my_parser.add_argument('-l',
                       '--long',
                       action='store_true',
                       help='enable the long listing format')

# Execute parse_args()
args = my_parser.parse_args()

input_path = args.Path

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit()

print('\n'.join(os.listdir(input_path)))

Now, try to execute this program to see if the new -l option is accepted:

$ python myls.py -h
usage: myls.py [-h] [-l] path

List the content of a folder

positional arguments:
  path        the path to list

optional arguments:
  -h, --help  show this help message and exit
  -l, --long  enable the long listing format

As you can see, now the program also accepts (but doesn’t require) the -l option, which allows the user to get a long listing format for the directory content.

Setting the Action to Be Taken for an Argument

When you add an optional argument to your command line interface, you can also define what kind of action to take when the argument is specified. This means that you usually need to specify how the value will be stored in the Namespace object you get when .parse_args() is executed.

There are several actions that are already defined and ready to be used. Let’s analyze them in detail:

  • store stores the input value to the Namespace object. (This is the default action.)
  • store_const stores a constant value when the corresponding optional argument is specified.
  • store_true stores the Boolean value True when the corresponding optional argument is specified and False otherwise.
  • store_false stores the Boolean value False when the corresponding optional argument is specified and True otherwise.
  • append stores a list, appending a value to the list each time the option is provided.
  • append_const stores a list, appending a constant value to the list each time the option is provided.
  • count stores an int that is equal to the number of times the option has been provided.
  • help shows a help text and exits.
  • version shows the version of the program and exits.

Let’s create an example to test all the actions we have seen so far:

# actions_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.version = '1.0'
my_parser.add_argument('-a', action='store')
my_parser.add_argument('-b', action='store_const', const=42)
my_parser.add_argument('-c', action='store_true')
my_parser.add_argument('-d', action='store_false')
my_parser.add_argument('-e', action='append')
my_parser.add_argument('-f', action='append_const', const=42)
my_parser.add_argument('-g', action='count')
my_parser.add_argument('-i', action='help')
my_parser.add_argument('-j', action='version')

args = my_parser.parse_args()

print(vars(args))

This script accepts an optional argument for each type of action discussed and then prints the value of the arguments read from the command line. Test it by executing this example:

$ python actions_example.py
{'a': None, 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': None}

As you can see, if we do not specify any arguments, then the default values are generally None, at least for the actions that do not store a Boolean value.

The use of the store action, instead, stores the value we pass without any further consideration:

$ python actions_example.py -a 42
{'a': '42', 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': None}

$ python actions_example.py -a "test"
{'a': 'test', 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': None}

The store_const action stores the defined const value when the corresponding argument is provided. In our test, we provided just the -b option, and the value of args.b is now 42:

$ python actions_example.py -b
{'a': None, 'b': 42, 'c': False, 'd': True, 'e': None, 'f': None, 'g': None}

The store_true action stores True when the argument is passed and False otherwise. If you need the opposite behavior, just use the store_false action:

$ python actions_example.py
{'a': None, 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': None}

$ python actions_example.py -c
{'a': None, 'b': None, 'c': True, 'd': True, 'e': None, 'f': None, 'g': None}

$ python actions_example.py -d
{'a': None, 'b': None, 'c': False, 'd': False, 'e': None, 'f': None, 'g': None}

The append action lets you create a list of all the values passed to the CLI with the same argument:

$ python actions_example.py -e me -e you -e us
{'a': None, 'b': None, 'c': False, 'd': True, 'e': ['me', 'you', 'us'], 'f': None, 'g': None}

The append_const action is similar to the append one, but it always appends the same constant value:

$ python actions_example.py -f -f
{'a': None, 'b': None, 'c': False, 'd': True, 'e': None, 'f': [42, 42], 'g': None}

The count action counts how many times an argument is passed. It's quite useful when you want to implement a verbosity level for your program, since you can define a scale where -v is less verbose than -vvv:

$ python actions_example.py -ggg
{'a': None, 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': 3}

$ python actions_example.py -ggggg
{'a': None, 'b': None, 'c': False, 'd': True, 'e': None, 'f': None, 'g': 5}
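To sketch how count maps naturally onto verbosity levels (the flag name and thresholds here are illustrative choices, not part of the example above):

```python
import argparse

parser = argparse.ArgumentParser()
# default=0 so the attribute is 0 (not None) when the flag is absent
parser.add_argument('-v', '--verbose', action='count', default=0)

args = parser.parse_args(['-vvv'])
if args.verbose >= 2:
    print('very verbose')
elif args.verbose == 1:
    print('verbose')
else:
    print('quiet')
```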

The help action is the one you already saw at the beginning of the article. It’s enabled for the -h flag by default, but you can use it for another flag if you want:

$ python actions_example.py -i
usage: actions_example.py [-h] [-a A] [-b] [-c] [-d] [-e E] [-f] [-g] [-i] [-j]

optional arguments:
  -h, --help  show this help message and exit
  -a A
  -b
  -c
  -d
  -e E
  -f
  -g
  -i
  -j          show program's version number and exit

The version action is the last one you can use. It just shows the version of the program (defined by assigning a value to the .version property of the parser) and then ends the execution of the script:

$ python actions_example.py -j
1.0

Another possibility you have is to create a custom action. That’s done by subclassing the argparse.Action class and implementing a couple of methods.

Look at the following example, which is a custom store action that is just a little bit more verbose than the standard one:

# custom_action.py
import argparse

class VerboseStore(argparse.Action):
    def __init__(self, option_strings, dest, nargs=None, **kwargs):
        if nargs is not None:
            raise ValueError('nargs not allowed')
        super(VerboseStore, self).__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        print('Here I am, setting the '
              'values %r for the %r option...' % (values, option_string))
        setattr(namespace, self.dest, values)

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-i', '--input', action=VerboseStore, type=int)

args = my_parser.parse_args()

print(vars(args))

This example defines a custom action that is just like the store action but a little bit more verbose. Try to execute it to test how it works:

$ python custom_action.py -i 42
Here I am, setting the values 42 for the '-i' option...
{'input': 42}

As you can see, the program has printed out a line before setting the value 42 for the -i parameter.

Setting the Number of Values That Should Be Consumed by the Option

The parser, by default, assumes that you’ll consume a single parameter for each argument, but you can modify this default behavior by specifying a different number of values with the nargs keyword.

For example, if you want to create an optional argument that consumes exactly three values, then you can specify the number 3 as the value for the nargs keyword while adding the parameter to the parser:

# nargs_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('--input', action='store', type=int, nargs=3)

args = my_parser.parse_args()

print(args.input)

Now, the program accepts three values for the --input parameter:

$ python nargs_example.py --input 42
usage: nargs_example.py [-h] [--input INPUT INPUT INPUT]
nargs_example.py: error: argument --input: expected 3 arguments

$ python nargs_example.py --input 42 42 42
[42, 42, 42]

As you can see, the value of the args.input variable is now a list that contains three values.

However, the nargs keyword can also accept the following:

  • ?: a single value, which can be optional
  • *: a flexible number of values, which will be gathered into a list
  • +: like *, but requiring at least one value
  • argparse.REMAINDER: all the values that are remaining in the command line

So, for example, in the following program, the positional argument input takes a single value when provided, but if the value is not provided, then the one specified by the default keyword is used:

# nargs_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('input',
                       action='store',
                       nargs='?',
                       default='my default value')

args = my_parser.parse_args()

print(args.input)

Now you can either provide a value for the input argument or omit it. In the latter case, the default value will be used:

$ python nargs_example.py 'my custom value'
my custom value

$ python nargs_example.py
my default value

To take a flexible number of values and gather them all into a single list, you need to specify the * value for the nargs keyword like this:

# nargs_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('input',
                       action='store',
                       nargs='*',
                       default='my default value')

args = my_parser.parse_args()

print(args.input)

See how this code allows the user to set a flexible number of values for the expected argument:

$ python nargs_example.py me you us
['me', 'you', 'us']

$ python nargs_example.py
my default value

If you need to take a variable number of values, but you have to be sure that at least one value is specified, then you can use the + value for the nargs keyword like this:

# nargs_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('input', action='store', nargs='+')

args = my_parser.parse_args()

print(args.input)

In this case, if you execute the program with no positional arguments, then you will receive an explicit error message:

$ python nargs_example.py me you us
['me', 'you', 'us']

$ python nargs_example.py
usage: nargs_example.py [-h] input [input ...]
nargs_example.py: error: the following arguments are required: input

Lastly, if you need to grab all the remaining arguments that have been specified on the command line and put them all in a list, then the nargs keyword has to be set to argparse.REMAINDER like this:

# nargs_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('first', action='store')
my_parser.add_argument('others', action='store', nargs=argparse.REMAINDER)

args = my_parser.parse_args()

print('first = %r' % args.first)
print('others = %r' % args.others)

Now if you execute this program, you will see that the first value provided is associated with the first parameter, while all the remaining values are associated with the second one:

$ python nargs_example.py me you us
first = 'me'
others = ['you', 'us']

Note how all the remaining values are put in a single list.

Setting a Default Value Produced if the Argument Is Missing

You already know that the user can decide whether or not to specify optional arguments in the command line. When arguments are not specified, the corresponding value is generally set to None.

However, it is possible to define a default value for an argument when it’s not provided:

# default_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', default='42')

args = my_parser.parse_args()

print(vars(args))

If you execute this example without passing the -a option, then this is the output you get:

$ python default_example.py
{'a': '42'}

You can see that now the option -a is set to 42, even if you didn’t explicitly set the value on the command line.

Setting the Type of the Argument

By default, all the input argument values are treated as if they were strings. However, it’s possible to define the type for the corresponding property of the Namespace object you get after .parse_args() is invoked just by defining it with the type keyword like this:

# type_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', type=int)

args = my_parser.parse_args()

print(vars(args))

By specifying type=int for the argument, you're telling argparse that the .a property of your Namespace object has to be an int (instead of a string):

$ python type_example.py -a 42
{'a': 42}

Moreover, the value of the argument is now checked at runtime, and if there's a problem with the type of the value provided on the command line, then the execution is interrupted with a clear error message:

$ python type_example.py -a "that's a string"
usage: type_example.py [-h] [-a A]
type_example.py: error: argument -a: invalid int value: "that's a string"

In this case, the error message is very clear because it states that you were expected to pass an int instead of a string.
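Note that type isn't limited to int: argparse accepts any callable that takes the command line string and returns a value. As a minimal sketch (the -p flag is an illustrative choice), pathlib.Path works nicely for path arguments:

```python
import argparse
import pathlib

parser = argparse.ArgumentParser()
# Any callable that takes a single string works as a type converter
parser.add_argument('-p', '--path', action='store', type=pathlib.Path)

args = parser.parse_args(['-p', '/tmp/data'])
print(args.path.name)  # data
```

This way the conversion (and any conversion error reporting) happens inside the parser instead of in your own code.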

Setting a Domain of Allowed Values for a Specific Argument

Another interesting possibility with the Python argparse library is creating a domain of allowed values for specific arguments. You can do this by providing a list of accepted values while adding the new option:

# choices_ex.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', choices=['head', 'tail'])

args = my_parser.parse_args()

Please note that if you are accepting numeric values, then you can even use range() to specify a range of accepted values:

# choices_ex.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', type=int, choices=range(1, 5))

args = my_parser.parse_args()

print(vars(args))

In this case, the value provided on the command line will be automatically checked against the range defined:

$ python choices_ex.py -a 4
{'a': 4}

$ python choices_ex.py -a 40
usage: choices_ex.py [-h] [-a {1,2,3,4}]
choices_ex.py: error: argument -a: invalid choice: 40 (choose from 1, 2, 3, 4)

If the input number is outside the defined range, then you’ll get an error message.

Setting Whether the Argument Is Required

If you want to force your user to specify the value for an optional argument, then you can use the required keyword:

# required_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a',
                       action='store',
                       choices=['head', 'tail'],
                       required=True)

args = my_parser.parse_args()

print(vars(args))

If you use the required keyword set to True for an optional argument, then the user will be forced to set a value for that argument:

$ python required_example.py
usage: required_example.py [-h] -a {head,tail}
required_example.py: error: the following arguments are required: -a

$ python required_example.py -a head
{'a': 'head'}

That said, please bear in mind that requiring an optional argument is usually considered bad practice since the user wouldn’t expect to have to set a value for an argument that should be optional.

Showing a Brief Description of What an Argument Does

A great feature of the Python argparse library is that, by default, you have the ability to ask for help just by adding the -h flag to your command line.

To make it even better, you can add help text to your arguments, so as to give the users even more help when they execute your program with the -h flag:

# help_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a',
                       action='store',
                       choices=['head', 'tail'],
                       help='set the user choice to head or tail')

args = my_parser.parse_args()

print(vars(args))

This example shows you how to define a custom help text for the -a argument, making the help output clearer for the user:

$ python help_example.py -h
usage: help_example.py [-h] [-a {head,tail}]

optional arguments:
  -h, --help      show this help message and exit
  -a {head,tail}  set the user choice to head or tail

Defining a help text for all the arguments is a really good idea because it makes the usage of your program clearer to the user.

Defining Mutually Exclusive Groups

Another interesting option you have when working with the Python argparse library is the ability to create a mutually exclusive group for options that cannot coexist in the same command line:

# groups.py
import argparse

my_parser = argparse.ArgumentParser()
my_group = my_parser.add_mutually_exclusive_group(required=True)

my_group.add_argument('-v', '--verbose', action='store_true')
my_group.add_argument('-s', '--silent', action='store_true')

args = my_parser.parse_args()

print(vars(args))

You can specify either the -v or the -s flag, but not both on the same command line, and the help text that argparse provides reflects this constraint:

$ python groups.py -h
usage: groups.py [-h] (-v | -s)

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose
  -s, --silent

$ python groups.py -v -s
usage: groups.py [-h] (-v | -s)
groups.py: error: argument -s/--silent: not allowed with argument -v/--verbose

If you specify more than one option of a mutually exclusive group on the same command line, you will get an error.

Setting the Argument Name in Usage Messages

If an argument accepts an input value, it can be useful to give this value a name that the parser can use to generate the help message, and this can be done by using the metavar keyword. In the following example, you can see how you can use the metavar keyword to specify a name for the value of the -v flag:

# metavar_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-v',
                       '--verbosity',
                       action='store',
                       type=int,
                       metavar='LEVEL')

args = my_parser.parse_args()

print(vars(args))

Now, if you run your program with the -h flag, the help text assigns the name LEVEL to the value of the -v flag:

$ python metavar_example.py -h
usage: metavar_example.py [-h] [-v LEVEL]

optional arguments:
  -h, --help            show this help message and exit
  -v LEVEL, --verbosity LEVEL

Please note that, in the help message, the value accepted for the -v flag is now named LEVEL.

Setting the Name of the Attribute to Be Added to the Object Once Parsed

As you have already seen, when you add an argument to the parser, the value of this argument is stored in a property of the Namespace object. By default, this property is named after the first argument passed to .add_argument() for positional arguments, or after the long option string for optional arguments.

If an option uses dashes (as is fairly common), they will be converted to underscores in the property name:

import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-v',
                       '--verbosity-level',
                       action='store',
                       type=int)

args = my_parser.parse_args()
print(args.verbosity_level)

However, it’s possible to specify the name of this property just by using the keyword dest when you’re adding an argument to the parser:

# dest_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-v',
                       '--verbosity',
                       action='store',
                       type=int,
                       dest='my_verbosity_level')

args = my_parser.parse_args()

print(vars(args))

By running this program, you’ll see that now the args variable contains a .my_verbosity_level property, even if by default the name of the property should have been .verbosity:

$ python dest_example.py -v 42
{'my_verbosity_level': 42}

The default name of this property would have been .verbosity, but since a different name has been specified by the dest keyword, .my_verbosity_level has been used.

Conclusion

Now you know what a command line interface is and how you can create one in Python by using the Python argparse library.

In this article, you’ve learned:

  • What the Python argparse library is, and why it’s important to use it if you need to write command line scripts in Python
  • How to use the Python argparse library to quickly create a simple CLI in Python
  • What the advanced usage of the Python argparse library is

Writing a good command line interface is a good way to create self-explanatory programs and give users a means of interacting with your application.

If you still have questions, don’t hesitate to reach out in the comment section below and take a look at the official documentation and the Tshepang Lekhonkhobe tutorial that is part of the official Python 3 HOWTO documentation.
