Managed Databases Connection Pools and PostgreSQL Benchmarking Using pgbench

Introduction

DigitalOcean Managed Databases allows you to scale your PostgreSQL database using several methods. One such method is a built-in connection pooler that allows you to efficiently handle large numbers of client connections and reduce the CPU and memory footprint of these open connections. By using a connection pool and sharing a fixed set of recyclable connections, you can handle significantly more concurrent client connections, and squeeze extra performance out of your PostgreSQL database.

In this tutorial we’ll use pgbench, PostgreSQL’s built-in benchmarking tool, to run load tests on a DigitalOcean Managed PostgreSQL Database. We’ll dive into connection pools, describe how they work, and show how to create one using the Cloud Control Panel. Finally, using results from the pgbench tests, we’ll demonstrate how using a connection pool can be an inexpensive method of increasing database throughput.

Prerequisites

To complete this tutorial, you’ll need:

  • A DigitalOcean Managed PostgreSQL Database cluster. To learn how to provision and configure a DigitalOcean PostgreSQL cluster, consult the Managed Database product documentation.
  • A client machine with PostgreSQL installed. By default, your PostgreSQL installation will contain the pgbench benchmarking utility and the psql client, both of which we’ll use in this guide. Consult How To Install and Use PostgreSQL on Ubuntu 18.04 to learn how to install PostgreSQL. If you’re not running Ubuntu on your client machine, you can use the version finder to find the appropriate tutorial.

Once you have a DigitalOcean PostgreSQL cluster up and running and a client machine with pgbench installed, you’re ready to begin with this guide.

Step 1 — Creating and Initializing the benchmark Database

Before we create a connection pool for our database, we’ll first create the benchmark database on our PostgreSQL cluster and populate it with some dummy data on which pgbench will run its tests. The pgbench utility repeatedly runs a series of five SQL commands (consisting of SELECT, UPDATE, and INSERT queries) in a transaction, using multiple threads and clients, and calculates a useful performance metric called Transactions per Second (TPS). TPS is a measure of database throughput, counting the number of atomic transactions processed by the database in one second. To learn more about the specific commands executed by pgbench, consult What is the “Transaction” Actually Performed in pgbench? from the official pgbench documentation.
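For intuition, here is a rough Python sketch of that transaction, written with the psycopg2 library. This is an illustration of our own (pgbench itself is a C program), and the connection string is a placeholder to be replaced with your cluster’s details:

import random

import psycopg2

# Placeholder DSN -- substitute your cluster's connection details.
conn = psycopg2.connect(
    "postgresql://doadmin:your_password@your_cluster_endpoint:25060/benchmark?sslmode=require")


def tpc_b_like_transaction(scale=150):
    # pgbench picks a random account, teller, branch, and delta per transaction.
    aid = random.randint(1, 100000 * scale)
    tid = random.randint(1, 10 * scale)
    bid = random.randint(1, scale)
    delta = random.randint(-5000, 5000)
    with conn, conn.cursor() as cur:  # one atomic transaction
        cur.execute("UPDATE pgbench_accounts SET abalance = abalance + %s WHERE aid = %s", (delta, aid))
        cur.execute("SELECT abalance FROM pgbench_accounts WHERE aid = %s", (aid,))
        cur.fetchone()
        cur.execute("UPDATE pgbench_tellers SET tbalance = tbalance + %s WHERE tid = %s", (delta, tid))
        cur.execute("UPDATE pgbench_branches SET bbalance = bbalance + %s WHERE bid = %s", (delta, bid))
        cur.execute("INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) "
                    "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)", (tid, bid, aid, delta))

Running a function like this in a loop and counting completed calls per second would give you a crude, single-connection version of the TPS figure pgbench reports.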

Let’s begin by connecting to our PostgreSQL cluster and creating the benchmark database.

First, retrieve your cluster’s Connection Details by navigating to Databases and locating your PostgreSQL cluster. Click into your cluster. You should see a cluster overview page containing the following Connection Details box:

PostgreSQL Cluster Connection Details

From this, we can parse the following config variables:

  • Admin user: doadmin
  • Admin password: your_password
  • Cluster endpoint: dbaas-test-do-user-3587522-0.db.ondigitalocean.com
  • Connection port: 25060
  • Database to connect to: defaultdb
  • SSL Mode: require (use an SSL-encrypted connection for increased security)

Take note of these parameters, as you’ll need them when using both the psql client and pgbench tool.

Click on the dropdown above this box and select Connection String. We’ll copy this string and pass it in to psql to connect to this PostgreSQL node.

Connect to your cluster using psql and the connection string you just copied:

  • psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require

You should see the following PostgreSQL client prompt, indicating that you’ve connected to your PostgreSQL cluster successfully:

Output
psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

defaultdb=>

From here, create the benchmark database:

  • CREATE DATABASE benchmark;

You should see the following output:

Output
CREATE DATABASE

Now, disconnect from the cluster:

  • \q

Before we run the pgbench tests, we need to populate this benchmark database with some tables and dummy data required to run the tests.

To do this, we’ll run pgbench with the following flags:

  • -h: The PostgreSQL cluster endpoint
  • -p: The PostgreSQL cluster connection port
  • -U: The database username
  • -i: Indicates that we’d like to initialize the benchmark database with benchmarking tables and their dummy data.
  • -s : Set a scale factor of 150, which will multiply table sizes by 150. The default scale factor of 1 results in tables of the following sizes:

    table                   # of rows
    ---------------------------------
    pgbench_branches        1
    pgbench_tellers         10
    pgbench_accounts        100000
    pgbench_history         0

    Using a scale factor of 150, the pgbench_accounts table will contain 15,000,000 rows (a quick check of this multiplication appears just after this list).

    Note: To avoid excessive blocked transactions, be sure to set the scale factor to a value at least as large as the number of concurrent clients you intend to test with. In this tutorial we’ll test with 150 clients at most, so we set -s to 150 here. To learn more, consult these recommended practices from the official pgbench documentation.
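As a quick sanity check of those row counts, here is the scale multiplication as a throwaway Python snippet (ours, not part of pgbench or the tutorial’s workflow):

# Default per-table row counts at scale factor 1, from the table above.
default_rows = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100000,
    "pgbench_history": 0,
}

scale = 150
for table, rows in default_rows.items():
    print(table, rows * scale)  # pgbench_accounts -> 15000000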

Run the complete pgbench command:

  • pgbench -h your_cluster_endpoint -p 25060 -U doadmin -i -s 150 benchmark

After running this command, you will be prompted to enter the password for the database user you specified. Enter the password, and hit ENTER.

You should see the following output:

Output
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 15000000 tuples (0%) done (elapsed 0.19 s, remaining 27.93 s)
200000 of 15000000 tuples (1%) done (elapsed 0.85 s, remaining 62.62 s)
300000 of 15000000 tuples (2%) done (elapsed 1.21 s, remaining 59.23 s)
400000 of 15000000 tuples (2%) done (elapsed 1.63 s, remaining 59.44 s)
500000 of 15000000 tuples (3%) done (elapsed 2.05 s, remaining 59.51 s)
. . .
14700000 of 15000000 tuples (98%) done (elapsed 70.87 s, remaining 1.45 s)
14800000 of 15000000 tuples (98%) done (elapsed 71.39 s, remaining 0.96 s)
14900000 of 15000000 tuples (99%) done (elapsed 71.91 s, remaining 0.48 s)
15000000 of 15000000 tuples (100%) done (elapsed 72.42 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

At this point, we’ve created a benchmarking database, populated with the tables and data required to run the pgbench tests. We can now move on to running a baseline test which we’ll use to compare performance before and after connection pooling is enabled.

Step 2 — Running a Baseline pgbench Test

Before we run our first benchmark, it’s worth diving into what we’re trying to optimize with connection pools.

Typically when a client connects to a PostgreSQL database, the main PostgreSQL OS process forks itself into a child process corresponding to this new connection. When there are only a few connections, this rarely presents an issue. However, as clients and connections scale, the CPU and memory overhead of creating and maintaining these connections begins to add up, especially if the application in question does not efficiently use database connections. In addition, the max_connections PostgreSQL setting may limit the number of client connections allowed, resulting in additional connections being refused or dropped.

A connection pool keeps open a fixed number of database connections, the pool size, which it then uses to distribute and execute client requests. This means that you can accommodate far more simultaneous connections, efficiently deal with idle or stagnant clients, and queue up client requests during traffic spikes instead of rejecting them. By recycling connections, you can more efficiently use your machine’s resources in an environment with heavy connection volume, and squeeze extra performance out of your database.

A connection pool can be implemented either on the application side or as middleware between the database and your application. The Managed Databases connection pooler is built on top of pgBouncer, a lightweight, open-source middleware connection pooler for PostgreSQL. Its interface is available via the Cloud Control Panel UI.
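To make the pooling pattern concrete before we click through the UI, here is a minimal application-side sketch using psycopg2’s built-in pool. This is an illustration only (the Managed Databases pooler is pgBouncer middleware, not application code), and the DSN is a placeholder:

from psycopg2 import pool

# Keep between 1 and 20 open connections; maxconn plays the role of the pool size.
db_pool = pool.SimpleConnectionPool(
    1, 20,
    dsn="postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require")

conn = db_pool.getconn()   # borrow an already-open connection
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
finally:
    db_pool.putconn(conn)  # return it to the pool instead of closing it

db_pool.closeall()         # shut down the pool when finished

The key point is the getconn()/putconn() pair: connections are recycled rather than opened and torn down per request, which is exactly what the managed pooler does on your behalf, one layer below your application.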

Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool. You should see the following configuration window:

Connection Pools Config Window

Here, you can configure the following fields:

  • Pool Name: A unique name for your connection pool
  • Database: The database for which you’d like to pool connections
  • User: The PostgreSQL user the connection pool will authenticate as
  • Mode: One of Session, Transaction, or Statement. This option controls how long the pool assigns a backend connection to a client.
    • Session: The client holds on to the connection until it explicitly disconnects.
    • Transaction: The client obtains the connection until it completes a transaction, after which the connection is returned to the pool.
    • Statement: The pool aggressively recycles connections after each client statement. In statement mode, multi-statement transactions are not allowed. To learn more, consult the Connection Pools product documentation.
  • Pool Size: The number of connections the connection pool will keep open between itself and the database.

Before we create a connection pool, we’ll run a baseline test against which we can compare database performance with connection pooling enabled.

In this tutorial, we’ll use a 4 GB RAM, 2 vCPU, 80 GB Disk, primary node only Managed Database setup. You can scale the benchmark test parameters in this section according to your PostgreSQL cluster specs.

DigitalOcean Managed Database clusters have the PostgreSQL max_connections parameter preset to 25 connections per 1 GB RAM. A 4 GB RAM PostgreSQL node therefore has max_connections set to 100. In addition, for all clusters, 3 connections are reserved for maintenance. So for this 4 GB RAM PostgreSQL cluster, 97 connections are available for connection pooling.
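That arithmetic is simple enough to capture in a small helper (purely illustrative; this is not a DigitalOcean API):

def available_pool_connections(ram_gb, per_gb=25, reserved=3):
    """Connections available for pooling on a Managed Databases node."""
    return ram_gb * per_gb - reserved

print(available_pool_connections(4))  # 97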

With this in mind, let’s run our first baseline pgbench test.

Log in to your client machine. We’ll run pgbench, specifying the database endpoint, port and user as usual. In addition, we’ll provide the following flags:

  • -c: The number of concurrent clients or database sessions to simulate. We set this to 50 so as to simulate a number of concurrent connections smaller than the max_connections parameter for our PostgreSQL cluster.
  • -j: The number of worker threads pgbench will use to run the benchmark. If you’re using a multi-CPU machine, you can tune this upwards to distribute clients across threads. On a two-core machine, we set this to 2.
  • -P: Display progress and metrics every 60 seconds.
  • -T: Run the benchmark for 600 seconds (10 minutes). To produce consistent, reproducible results, it’s important that you run the benchmark for several minutes, or through one checkpoint cycle.

We’ll also specify that we’d like to run the benchmark against the benchmark database we created and populated earlier.

Run the following complete pgbench command:

  • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 50 -j 2 -P 60 -T 600 benchmark

Hit ENTER and then type in the password for the doadmin user to begin running the test. You should see output similar to the following (results will depend on the specs of your PostgreSQL cluster):

Output
starting vacuum...end.
progress: 60.0 s, 157.4 tps, lat 282.988 ms stddev 40.261
progress: 120.0 s, 176.2 tps, lat 283.726 ms stddev 38.722
progress: 180.0 s, 167.4 tps, lat 298.663 ms stddev 238.124
progress: 240.0 s, 178.9 tps, lat 279.564 ms stddev 43.619
progress: 300.0 s, 178.5 tps, lat 280.016 ms stddev 43.235
progress: 360.0 s, 178.8 tps, lat 279.737 ms stddev 43.307
progress: 420.0 s, 179.3 tps, lat 278.837 ms stddev 43.783
progress: 480.0 s, 178.5 tps, lat 280.203 ms stddev 43.921
progress: 540.0 s, 180.0 tps, lat 277.816 ms stddev 43.742
progress: 600.0 s, 178.5 tps, lat 280.044 ms stddev 43.705
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 105256
latency average = 282.039 ms
latency stddev = 84.244 ms
tps = 175.329321 (including connections establishing)
tps = 175.404174 (excluding connections establishing)

Here, we observed that over a 10 minute run with 50 concurrent sessions, we processed 105,256 transactions with a throughput of roughly 175 transactions per second.
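If you plan to repeat runs like this while experimenting, a small wrapper can save typing and extract the final TPS figure. The sketch below is our own convenience, not part of pgbench: it assumes pgbench is on your PATH, reads the password from the standard PGPASSWORD environment variable, and parses the output format shown above:

import os
import re
import subprocess


def run_pgbench(host, dbname, clients=50, threads=2,
                port=25060, user="doadmin", progress=60, duration=600):
    cmd = ["pgbench", "-h", host, "-p", str(port), "-U", user,
           "-c", str(clients), "-j", str(threads),
           "-P", str(progress), "-T", str(duration), dbname]
    env = dict(os.environ)
    env.setdefault("PGPASSWORD", "your_password")  # placeholder
    out = subprocess.run(cmd, env=env, capture_output=True,
                         text=True, check=True).stdout
    # Grab the figure that excludes connection-establishment overhead.
    match = re.search(r"tps = ([\d.]+) \(excluding connections establishing\)", out)
    return float(match.group(1)) if match else None


print(run_pgbench("your_db_endpoint", "benchmark"))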

Now, let’s run the same test, this time using 150 concurrent clients, a value that is higher than max_connections for this database, to synthetically simulate a mass influx of client connections:

  • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 150 -j 2 -P 60 -T 600 benchmark

You should see output similar to the following:

Output
starting vacuum...end.
connection to database "pgbench" failed:
FATAL: remaining connection slots are reserved for non-replication superuser connections
progress: 60.0 s, 182.6 tps, lat 280.069 ms stddev 42.009
progress: 120.0 s, 253.8 tps, lat 295.612 ms stddev 237.448
progress: 180.0 s, 271.3 tps, lat 276.411 ms stddev 40.643
progress: 240.0 s, 273.0 tps, lat 274.653 ms stddev 40.942
progress: 300.0 s, 272.8 tps, lat 274.977 ms stddev 41.660
progress: 360.0 s, 250.0 tps, lat 300.033 ms stddev 282.712
progress: 420.0 s, 272.1 tps, lat 275.614 ms stddev 42.901
progress: 480.0 s, 261.1 tps, lat 287.226 ms stddev 112.499
progress: 540.0 s, 272.5 tps, lat 275.309 ms stddev 41.740
progress: 600.0 s, 271.2 tps, lat 276.585 ms stddev 41.221
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 154892
latency average = 281.421 ms
latency stddev = 125.929 ms
tps = 257.994941 (including connections establishing)
tps = 258.049251 (excluding connections establishing)

Note the FATAL error, indicating that pgbench hit the connection limit set by max_connections (recall that 3 of the 100 slots are reserved for maintenance), resulting in refused connections. The test was still able to complete, with a TPS of roughly 257.

At this point we can investigate how a connection pool could potentially improve our database’s throughput.

Step 3 — Creating and Testing a Connection Pool

In this step we’ll create a connection pool and rerun the previous pgbench test to see if we can improve our database’s throughput.

In general, the max_connections setting and connection pool parameters are tuned in tandem to max out the database’s load. However, because max_connections is abstracted away from the user in DigitalOcean Managed Databases, our main levers here are the connection pool Mode and Size settings.

To begin, let’s create a connection pool in Transaction mode that keeps open all the available backend connections.

Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool.

In the configuration window that appears, fill in the following values:

Connection Pool Configuration Values

Here we name our connection pool test-pool, and use it with the benchmark database. Our database user is doadmin and we set the connection pool to Transaction mode. Recall from earlier that for a managed database cluster with 4GB of RAM, there are 97 available database connections. Accordingly, configure the pool to keep open 97 database connections.

When you’re done, hit Create Pool.

You should now see this pool in the Control Panel:

Connection Pool in Control Panel

Grab its URI by clicking Connection Details. It should look something like the following:

postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require 

You should notice a different port here, and potentially a different endpoint and database name, corresponding to the pool name test-pool.

Now that we’ve created the test-pool connection pool, we can rerun the pgbench test we ran above.

Rerun pgbench

From your client machine, run the following pgbench command (with 150 concurrent clients), making sure to substitute the highlighted values with those in your connection pool URI:

  • pgbench -h pool_endpoint -p pool_port -U doadmin -c 150 -j 2 -P 60 -T 600 test-pool

Here we once again use 150 concurrent clients, run the test across 2 threads, print progress every 60 seconds, and run the test for 600 seconds. We set the database name to test-pool, the name of the connection pool.

Once the test completes, you should see output similar to the following (note that these results will vary depending on the specs of your database node):

Output
starting vacuum...end.
progress: 60.0 s, 240.0 tps, lat 425.251 ms stddev 59.773
progress: 120.0 s, 350.0 tps, lat 428.647 ms stddev 57.084
progress: 180.0 s, 340.3 tps, lat 440.680 ms stddev 313.631
progress: 240.0 s, 364.9 tps, lat 411.083 ms stddev 61.106
progress: 300.0 s, 366.5 tps, lat 409.367 ms stddev 60.165
progress: 360.0 s, 362.5 tps, lat 413.750 ms stddev 59.005
progress: 420.0 s, 359.5 tps, lat 417.292 ms stddev 60.395
progress: 480.0 s, 363.8 tps, lat 412.130 ms stddev 60.361
progress: 540.0 s, 351.6 tps, lat 426.661 ms stddev 62.960
progress: 600.0 s, 344.5 tps, lat 435.516 ms stddev 65.182
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 206768
latency average = 421.719 ms
latency stddev = 114.676 ms
tps = 344.240797 (including connections establishing)
tps = 344.385646 (excluding connections establishing)

Notice here that we were able to increase our database’s throughput from 257 TPS to 344 TPS with 150 concurrent connections (an increase of 33%), and did not run up against the max_connections limit we previously hit without a connection pool. By placing a connection pool in front of the database, we can avoid dropped connections and significantly increase database throughput in an environment with a large number of simultaneous connections.

If you run this same test, but with a -c value of 50 (specifying a smaller number of clients), the gains from using a connection pool become much less evident:

Output
starting vacuum...end.
progress: 60.0 s, 154.0 tps, lat 290.592 ms stddev 35.530
progress: 120.0 s, 162.7 tps, lat 307.168 ms stddev 241.003
progress: 180.0 s, 172.0 tps, lat 290.678 ms stddev 36.225
progress: 240.0 s, 172.4 tps, lat 290.169 ms stddev 37.603
progress: 300.0 s, 177.8 tps, lat 281.214 ms stddev 35.365
progress: 360.0 s, 177.7 tps, lat 281.402 ms stddev 35.227
progress: 420.0 s, 174.5 tps, lat 286.404 ms stddev 34.797
progress: 480.0 s, 176.1 tps, lat 284.107 ms stddev 36.540
progress: 540.0 s, 173.1 tps, lat 288.771 ms stddev 38.059
progress: 600.0 s, 174.5 tps, lat 286.508 ms stddev 59.941
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 102938
latency average = 288.509 ms
latency stddev = 83.503 ms
tps = 171.482966 (including connections establishing)
tps = 171.553434 (excluding connections establishing)

Here we see that the connection pool did not increase throughput; in fact, throughput dipped slightly, from 175 TPS to 171 TPS.

Although in this guide we use pgbench with its built-in benchmark data set, the best test for determining whether or not to use a connection pool is a benchmark load that accurately represents production load on your database, against production data. Creating custom benchmarking scripts and data is beyond the scope of this guide, but to learn more, consult the official pgbench documentation.

Note: The pool size setting is highly workload-specific. In this guide, we configured the connection pool to use all the available backend database connections. This was because throughout our benchmark, the database rarely reached full utilization (you can monitor database load from the Metrics tab in the Cloud Control Panel). Depending on your database’s load, this may not be the optimal setting. If you notice that your database is constantly fully saturated, shrinking the connection pool may increase throughput and improve performance by queuing additional requests instead of trying to execute them all at the same time on an already loaded server.

Conclusion

DigitalOcean Managed Databases connection pooling is a powerful feature that can help you quickly squeeze extra performance out of your database. Along with other techniques like replication, caching, and sharding, connection pooling can help you scale your database layer to process an even greater volume of requests.

In this guide we focused on a simplistic and synthetic testing scenario using PostgreSQL’s built-in pgbench benchmarking tool and its default benchmark test. In any production scenario, you should run benchmarks against actual production data while simulating production load. This will allow you to tune your database for your particular usage pattern.

Along with pgbench, other tools exist to benchmark and load your database. One such tool developed by Percona is sysbench-tpcc. Another is Apache’s JMeter, which can load test databases as well as web applications.

To learn more about DigitalOcean Managed Databases, consult the Managed Databases product documentation. To learn more about sharding, another useful scaling technique, consult Understanding Database Sharding.


Davide Moro: High quality automated docker hub push using Github, TravisCI and pyup for Python tool distributions

Let’s say you want to distribute a Python tool with Docker, using known-good dependency versions, ready to be used by end users… In this article you will see how to continuously keep a Docker Hub container up to date with minimal maintenance effort (because I’m a lazy guy) using GitHub, Travis CI and pyup.

The goal was to reduce manual activity for updates as much as possible, check that everything works before pushing, minimize build times, and keep the Docker container secure and updated, with high confidence in the final quality.

As an example, let’s see what happens under the hood behind every pytest-play Docker Hub update on the official container https://cloud.docker.com/u/davidemoro/repository/docker/davidemoro/pytest-play. (By the way, if you are a pytest-play user: did you know that you can use Docker for running pytest-play, and that there is a container ready to be used on Docker Hub? See a complete and working example here: https://davidemoro.blogspot.com/2019/02/api-rest-testing-pytest-play-yaml-chuck-norris.html)

Repositories

The Docker build/publish machinery lives in a separate repository: https://github.com/davidemoro/pytest-play-docker implements the Docker releasing workflow for https://github.com/pytest-dev/pytest-play on Docker Hub (https://hub.docker.com/r/davidemoro/pytest-play).

Workflow

At the time of writing, this is the highly automated workflow for publishing pytest-play to Docker Hub:

All test executions run against the Docker build, so there is a guarantee that what is pushed to Docker Hub works: it doesn’t just check that the build was successful, it runs integration tests against the Docker build. That means no version incompatibilities, no integration issues between the integrated third-party pytest-play plugins, and no issues due to operating system integration (e.g., I recently experienced an issue on Alpine Linux where pip install psycopg2-binary apparently worked fine, but importing psycopg2 inside your code raised an unexpected import error, due to a recent issue reported here: https://github.com/psycopg/psycopg2/issues/684).

So now every time you run a command like the following one (see a complete and working example here https://davidemoro.blogspot.com/2019/02/api-rest-testing-pytest-play-yaml-chuck-norris.html):

docker run --rm -v $(pwd):/src davidemoro/pytest-play

you know the workflow that stands behind every automated Docker push for pytest-play.

Acknowledgements

Many thanks to Andrea Ratto for the ten-minute Travis build speedup thanks to the Docker cache: from ~11 minutes down to ~1 minute is a huge improvement indeed! It was made possible by the docker pull davidemoro/pytest-play command, building with the --cache-from davidemoro/pytest-play option, and moving the longest steps into a separate and cacheable step (e.g., the very long cassandra-driver compilation moved to requirements_cassandra.txt, which will be executed only if necessary).

The relevant technical details about pytest-play-docker follow (some minor optimizations that would trim the final image size are still possible).

pytest-play-docker/.travis.yml

sudo: required
services:
  - docker
  - ...

env:
  global:
    - IMAGE_NAME=davidemoro/pytest-play
    - secure: ...
before_script:
  - ...

script:
  - travis_wait docker pull python:3.7
  - travis_wait docker pull "$IMAGE_NAME:latest"
  - travis_wait 25 docker build --cache-from "$IMAGE_NAME:latest" -t "$IMAGE_NAME" .
  - docker run -i --rm -v $(pwd)/tests:/src --network host -v /var/run/mysqld/mysqld.sock:/var/run/mysqld/mysqld.sock $IMAGE_NAME --splinter-webdriver=remote --splinter-remote-url=$REMOTE_URL
deploy:
  provider: script
  script: bash docker_push
  on:
    branch: master

pytest-play-docker/docker_push

#!/bin/bash
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
docker tag "$IMAGE_NAME" "$IMAGE_NAME:$TRAVIS_COMMIT"
docker tag "$IMAGE_NAME" "$IMAGE_NAME:latest"
docker push "$IMAGE_NAME:$TRAVIS_COMMIT"
docker push "$IMAGE_NAME:latest"

Feedback

Any feedback will always be appreciated.

Do you like the Docker Hub push process for pytest-play? Let me know by becoming a pytest-play stargazer!

How To Implement Continuous Testing of Ansible Roles Using Molecule and Travis CI on Ubuntu 18.04

The author selected the Mozilla Foundation to receive a donation as part of the Write for DOnations program.

Introduction

Ansible is an agentless configuration management tool that uses YAML templates to define a list of tasks to be performed on hosts. In Ansible, roles are a collection of variables, tasks, files, templates and modules that are used together to perform a singular, complex function.

Molecule is a tool for performing automated testing of Ansible roles, specifically designed to support the development of consistently well-written and maintained roles. Molecule’s unit tests allow developers to test roles simultaneously against multiple environments and under different parameters. It’s important that developers continuously run tests against code that often changes; this workflow ensures that roles continue to work as you update code libraries. Running Molecule using a continuous integration tool, like Travis CI, allows for tests to run continuously, ensuring that contributions to your code do not introduce breaking changes.

In this tutorial, you will use a pre-made base role that installs and configures an Apache web server and a firewall on Ubuntu and CentOS servers. Then, you will initialize a Molecule scenario in that role to create tests and ensure that the role performs as intended in your target environments. After configuring Molecule, you will use Travis CI to continuously test your newly created role. Every time a change is made to your code, Travis CI will run molecule test to make sure that the role still performs correctly.

Prerequisites

Before you begin this tutorial, you will need:

Step 1 — Forking the Base Role Repository

You will be using a pre-made role called ansible-apache that installs Apache and configures a firewall on Debian- and Red Hat-based distributions. You will fork and use this role as a base and then build Molecule tests on top of it. Forking allows you to create a copy of a repository so you can make changes to it without tampering with the original project.

Start by creating a fork of the ansible-apache role. Go to the ansible-apache repository and click on the Fork button.

Once you have forked the repository, GitHub will lead you to your fork’s page. This will be a copy of the base repository, but on your own account.

Click on the green Clone or Download button and you’ll see a box with Clone with HTTPS.

Copy the URL shown for your repository. You’ll use this in the next step. The URL will be similar to this:

https://github.com/username/ansible-apache.git 

You will replace username with your GitHub username.

With your fork set up, you will clone it on your server and begin preparing your role in the next section.

Step 2 — Preparing Your Role

Having followed Step 1 of the prerequisite How To Test Ansible Roles with Molecule on Ubuntu 18.04, you will have Molecule and Ansible installed in a virtual environment. You will use this virtual environment for developing your new role.

First, activate the virtual environment you created while following the prerequisites by running:

  • source my_env/bin/activate

Run the following command to clone the repository using the URL you just copied in Step 1:

  • git clone https://github.com/username/ansible-apache.git

Your output will look similar to the following:

Output
Cloning into 'ansible-apache'...
remote: Enumerating objects: 16, done.
remote: Total 16 (delta 0), reused 0 (delta 0), pack-reused 16
Unpacking objects: 100% (16/16), done.

Move into the newly created directory:

  • cd ansible-apache

The base role you’ve downloaded performs the following tasks:

  • Includes variables: The role starts by including all the required variables according to the distribution of the host. Ansible uses variables to handle the disparities between different systems. Since you are using Ubuntu 18.04 and CentOS 7 as hosts, the role will recognize that the OS families are Debian and Red Hat respectively and include variables from vars/Debian.yml and vars/RedHat.yml.

  • Includes distribution-relevant tasks: These tasks include tasks/install-Debian.yml and tasks/install-RedHat.yml. Depending on the specified distribution, it installs the relevant packages. For Ubuntu, these packages are apache2 and ufw. For CentOS, these packages are httpd and firewalld.

  • Ensures latest index.html is present: This task copies over a template templates/index.html.j2 that Apache will use as the web server’s home page.

  • Starts relevant services and enables them on boot: Starts and enables the required services installed as part of the first task. For CentOS, these services are httpd and firewalld, and for Ubuntu, they are apache2 and ufw.

  • Configures firewall to allow traffic: This includes either tasks/configure-Debian-firewall.yml or tasks/configure-RedHat-firewall.yml. Ansible configures either Firewalld or UFW as the firewall and whitelists the http service.

Now that you have an understanding of how this role works, you will configure Molecule to test it. You will write test cases for these tasks that cover the changes they make.

Step 3 — Writing Your Tests

To check that your base role performs its tasks as intended, you will start a Molecule scenario, specify your target environments, and create three custom test files.

Begin by initializing a Molecule scenario for this role using the following command:

  • molecule init scenario -r ansible-apache

You will see the following output:

Output
--> Initializing new scenario default...
Initialized scenario in /home/sammy/ansible-apache/molecule/default successfully.

You will add CentOS and Ubuntu as your target environments by including them as platforms in your Molecule configuration file. To do this, edit the molecule.yml file using a text editor:

  • nano molecule/default/molecule.yml

Add the following highlighted content to the Molecule configuration:

~/ansible-apache/molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
lint:
  name: yamllint
platforms:
  - name: centos7
    image: milcom/centos7-systemd
    privileged: true
  - name: ubuntu18
    image: solita/ubuntu-systemd
    command: /sbin/init
    privileged: true
    volumes:
      - /lib/modules:/lib/modules:ro
provisioner:
  name: ansible
  lint:
    name: ansible-lint
scenario:
  name: default
verifier:
  name: testinfra
  lint:
    name: flake8

Here, you’re specifying two target platforms that are launched in privileged mode since you’re working with systemd services:

  • centos7 is the first platform and uses the milcom/centos7-systemd image.
  • ubuntu18 is the second platform and uses the solita/ubuntu-systemd image. In addition to using privileged mode and mounting the required kernel modules, you’re running /sbin/init on launch to make sure iptables is up and running.

Save and exit the file.

For more information on running privileged containers, visit the official Molecule documentation.

Instead of using the default Molecule test file, you will be creating three custom test files, one for each target platform, and one file for writing tests that are common between all platforms. Start by deleting the scenario’s default test file test_default.py with the following command:

  • rm molecule/default/tests/test_default.py

You can now move on to creating the three custom test files, test_common.py, test_Debian.py, and test_RedHat.py for each of your target platforms.

The first test file, test_common.py, will contain the common tests that each of the hosts will perform. Create and edit the common test file, test_common.py:

  • nano molecule/default/tests/test_common.py

Add the following code to the file:

~/ansible-apache/molecule/default/tests/test_common.py
import os
import pytest

import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize('file, content', [
    ("/var/www/html/index.html", "Managed by Ansible")
])
def test_files(host, file, content):
    file = host.file(file)

    assert file.exists
    assert file.contains(content)

In your test_common.py file, you have imported the required libraries. You have also written a test called test_files(), which covers the one task your role performs that is common to both distributions: copying over your template as the web server’s homepage.

The next test file, test_Debian.py, holds tests specific to Debian distributions. This test file will specifically target your Ubuntu platform.

Create and edit the Ubuntu test file by running the following command:

  • nano molecule/default/tests/test_Debian.py

You can now import the required libraries and define the ubuntu18 platform as the target host. Add the following code to the start of this file:

~/ansible-apache/molecule/default/tests/test_Debian.py
import os
import pytest

import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('ubuntu18')

Then, in the same file, you’ll add the test_pkg() test.

Add the following code to the file, which defines the test_pkg() test:

~/ansible-apache/molecule/default/tests/test_Debian.py
...
@pytest.mark.parametrize('pkg', [
    'apache2',
    'ufw'
])
def test_pkg(host, pkg):
    package = host.package(pkg)

    assert package.is_installed

This test will check if apache2 and ufw packages are installed on the host.

Note: When adding multiple tests to a Molecule test file, make sure there are two blank lines between each test or you’ll get a syntax error from Molecule.

To define the next test, test_svc(), add the following code under the test_pkg() test in your file:

~/ansible-apache/molecule/default/tests/test_Debian.py
...
@pytest.mark.parametrize('svc', [
    'apache2',
    'ufw'
])
def test_svc(host, svc):
    service = host.service(svc)

    assert service.is_running
    assert service.is_enabled

test_svc() will check if the apache2 and ufw services are running and enabled.

Finally, you will add your last test, test_ufw_rules(), to the test_Debian.py file.

Add this code under the test_svc() test in your file to define test_ufw_rules():

~/ansible-apache/molecule/default/tests/test_Debian.py
...
@pytest.mark.parametrize('rule', [
    '-A ufw-user-input -p tcp -m tcp --dport 80 -j ACCEPT'
])
def test_ufw_rules(host, rule):
    cmd = host.run('iptables -t filter -S')

    assert rule in cmd.stdout

test_ufw_rules() will check that your firewall configuration permits traffic on the port used by the Apache service.

With each of these tests added, your test_Debian.py file will look like this:

~/ansible-apache/molecule/default/tests/test_Debian.py
import os
import pytest

import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('ubuntu18')


@pytest.mark.parametrize('pkg', [
    'apache2',
    'ufw'
])
def test_pkg(host, pkg):
    package = host.package(pkg)

    assert package.is_installed


@pytest.mark.parametrize('svc', [
    'apache2',
    'ufw'
])
def test_svc(host, svc):
    service = host.service(svc)

    assert service.is_running
    assert service.is_enabled


@pytest.mark.parametrize('rule', [
    '-A ufw-user-input -p tcp -m tcp --dport 80 -j ACCEPT'
])
def test_ufw_rules(host, rule):
    cmd = host.run('iptables -t filter -S')

    assert rule in cmd.stdout

The test_Debian.py file now includes the three tests: test_pkg(), test_svc(), and test_ufw_rules().

Save and exit test_Debian.py.

Next you’ll create the test_RedHat.py test file, which will contain tests specific to Red Hat distributions to target your CentOS platform.

Create and edit the CentOS test file, test_RedHat.py, by running the following command:

  • nano molecule/default/tests/test_RedHat.py

Similarly to the Ubuntu test file, you will now write three tests to include in your test_RedHat.py file. Before adding the test code, you can import the required libraries and define the centos7 platform as the target host, by adding the following code to the beginning of your file:

~/ansible-apache/molecule/default/tests/test_RedHat.py
import os
import pytest

import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('centos7')

Then, add the test_pkg() test, which will check if the httpd and firewalld packages are installed on the host.

Following the code for your library imports, add the test_pkg() test to your file. (Again, remember to include two blank lines before each new test.)

~/ansible-apache/molecule/default/tests/test_RedHat.py
...
@pytest.mark.parametrize('pkg', [
    'httpd',
    'firewalld'
])
def test_pkg(host, pkg):
    package = host.package(pkg)

    assert package.is_installed

Now, you can add the test_svc() test to ensure that httpd and firewalld services are running and enabled.

Add the test_svc() code to your file following the test_pkg() test:

~/ansible-apache/molecule/default/tests/test_RedHat.py
...
@pytest.mark.parametrize('svc', [
    'httpd',
    'firewalld'
])
def test_svc(host, svc):
    service = host.service(svc)

    assert service.is_running
    assert service.is_enabled

The final test in the test_RedHat.py file will be test_firewalld(), which will check whether Firewalld has the http service whitelisted.

Add the test_firewalld() test to your file after the test_svc() code:

~/ansible-apache/molecule/default/tests/test_RedHat.py
...
@pytest.mark.parametrize('file, content', [
    ("/etc/firewalld/zones/public.xml", "<service name=\"http\"/>")
])
def test_firewalld(host, file, content):
    file = host.file(file)

    assert file.exists
    assert file.contains(content)

After importing the libraries and adding the three tests, your test_RedHat.py file will look like this:

~/ansible-apache/molecule/default/tests/test_RedHat.py
import os
import pytest

import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('centos7')


@pytest.mark.parametrize('pkg', [
    'httpd',
    'firewalld'
])
def test_pkg(host, pkg):
    package = host.package(pkg)

    assert package.is_installed


@pytest.mark.parametrize('svc', [
    'httpd',
    'firewalld'
])
def test_svc(host, svc):
    service = host.service(svc)

    assert service.is_running
    assert service.is_enabled


@pytest.mark.parametrize('file, content', [
    ("/etc/firewalld/zones/public.xml", "<service name=\"http\"/>")
])
def test_firewalld(host, file, content):
    file = host.file(file)

    assert file.exists
    assert file.contains(content)

Now that you’ve completed writing tests in all three files, test_common.py, test_Debian.py, and test_RedHat.py, your role is ready for testing. In the next step, you will use Molecule to run these tests against your newly configured role.

Step 4 — Testing Against Your Role

You will now execute your newly created tests against the base role ansible-apache using Molecule. To run your tests, use the following command:

  • molecule test

You’ll see the following output once Molecule has finished running all the tests:

Output
...
--> Scenario: 'default'
--> Action: 'verify'
--> Executing Testinfra tests found in /home/sammy/ansible-apache/molecule/default/tests/...
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-4.1.1, py-1.7.0, pluggy-0.8.1
rootdir: /home/sammy/ansible-apache/molecule/default, inifile:
plugins: testinfra-1.16.0
collected 12 items

tests/test_common.py ..                                                  [ 16%]
tests/test_RedHat.py .....                                               [ 58%]
tests/test_Debian.py .....                                               [100%]

========================== 12 passed in 80.70 seconds ==========================
Verifier completed successfully.

You’ll see Verifier completed successfully in your output; this means that the verifier executed all of your tests and returned them successfully.

Now that you’ve successfully completed the development of your role, you can commit your changes to Git and set up Travis CI for continuous testing.

Step 5 — Using Git to Share Your Updated Role

In this tutorial, so far, you have cloned a role called ansible-apache and added tests to it to make sure it works against Ubuntu and CentOS hosts. To share your updated role with the public, you must commit these changes and push them to your fork.

Run the following command to add the files and commit the changes you’ve made:

  • git add .

This command will add all the files that you have modified in the current directory to the staging area.

You also need to set your name and email address in the git config in order to commit successfully. You can do so using the following commands:

  • git config user.email "sammy@digitalocean.com"
  • git config user.name "John Doe"

Commit the changed files to your repository:

  • git commit -m "Configured Molecule"

You’ll see the following output:

Output
[master b2d5a5c] Configured Molecule
 8 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 molecule/default/Dockerfile.j2
 create mode 100644 molecule/default/INSTALL.rst
 create mode 100644 molecule/default/molecule.yml
 create mode 100644 molecule/default/playbook.yml
 create mode 100644 molecule/default/tests/test_Debian.py
 create mode 100644 molecule/default/tests/test_RedHat.py
 create mode 100644 molecule/default/tests/test_common.py

This signifies that you have committed your changes successfully. Now, push these changes to your fork with the following command:

  • git push -u origin master

You will see a prompt for your GitHub credentials. After entering these credentials, your code will be pushed to your repository and you’ll see this output:

Output
Counting objects: 13, done.
Compressing objects: 100% (12/12), done.
Writing objects: 100% (13/13), 2.32 KiB | 2.32 MiB/s, done.
Total 13 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 2 local objects.
To https://github.com/username/ansible-apache.git
   009d5d6..e4e6959  master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

If you go to your fork’s repository at github.com/username/ansible-apache, you’ll see a new commit called Configured Molecule reflecting the changes you made in the files.

Now, you can integrate Travis CI with your new repository so that any changes made to your role will automatically trigger Molecule tests. This will ensure that your role always works with Ubuntu and CentOS hosts.

Step 6 — Integrating Travis CI

In this step, you’re going to integrate Travis CI into your workflow. Once enabled, any changes you push to your fork will trigger a Travis CI build. The purpose of this is to ensure Travis CI always runs molecule test whenever contributors make changes. If any breaking changes are made, Travis will declare the build status as such.

Proceed to Travis CI to enable your repository. Navigate to your profile page where you can click the Activate button for GitHub.

You can find further guidance here on activating repositories in Travis CI.

For Travis CI to work, you must create a configuration file containing instructions for it. To create the Travis configuration file, return to your server and run the following command:

  • nano .travis.yml

To duplicate the environment you’ve created in this tutorial, you will specify parameters in the Travis configuration file. Add the following content to your file:

~/ansible-apache/.travis.yml
---
language: python
python:
  - "2.7"
  - "3.6"
services:
  - docker
install:
  - pip install molecule docker
script:
  - molecule --version
  - ansible --version
  - molecule test

The parameters you’ve specified in this file are:

  • language: When you specify Python as the language, the CI environment uses separate virtualenv instances for each Python version you specify under the python key.
  • python: Here, you’re specifying that Travis will use both Python 2.7 and Python 3.6 to run your tests.
  • services: You need Docker to run tests in Molecule. You’re specifying that Travis should ensure Docker is present in your CI environment.
  • install: Here, you’re specifying preliminary installation steps that Travis CI will carry out in your virtualenv.
    • pip install molecule docker installs Molecule and the Python library for the Docker remote API (Ansible is installed as a dependency of Molecule).
  • script: This is to specify the steps that Travis CI needs to carry out. In your file, you’re specifying three steps:
    • molecule --version prints the Molecule version if Molecule has been successfully installed.
    • ansible --version prints the Ansible version if Ansible has been successfully installed.
    • molecule test finally runs your Molecule tests.

The reason you specify molecule --version and ansible --version is to catch errors early if the build fails as a result of an Ansible or Molecule misconfiguration or version mismatch.

Once you’ve added the content to the Travis CI configuration file, save and exit .travis.yml.

Now, every time you push any changes to your repository, Travis CI will automatically run a build based on the above configuration file. If any of the commands in the script block fail, Travis CI will report the build status as such.

To make it easier to see the build status, you can add a badge indicating the build status to the README of your role. Open the README.md file using a text editor:

  • nano README.md

Add the following line to the README.md to display the build status:

~/ansible-apache/README.md
[![Build Status](https://travis-ci.org/username/ansible-apache.svg?branch=master)](https://travis-ci.org/username/ansible-apache) 

Replace username with your GitHub username. Commit and push the changes to your repository as you did earlier.

First, run the following command to add .travis.yml and README.md to the staging area:

  • git add .travis.yml README.md

Now commit the changes to your repository by executing:

  • git commit -m "Configured Travis"

Finally, push these changes to your fork with the following command:

  • git push -u origin master

If you navigate over to your GitHub repository, you will see that it initially reports build: unknown.

build-status-unknown

Within a few minutes, Travis will initiate a build that you can monitor at the Travis CI website. Once the build is a success, GitHub will report the status as such on your repository as well — using the badge you’ve placed in your README file:

build-status-passing

You can access the complete details of the builds by going to the Travis CI website:

travis-build-status

Now that you’ve successfully set up Travis CI for your new role, you can continuously test and integrate changes to your Ansible roles.

Conclusion

In this tutorial, you forked a role that installs and configures an Apache web server from GitHub and added integrations for Molecule by writing tests and configuring these tests to work on Docker containers running Ubuntu and CentOS. By pushing your newly created role to GitHub, you have allowed other users to access your role. When there are changes to your role by contributors, Travis CI will automatically run Molecule to test your role.

Once you’re comfortable with the creation of roles and testing them with Molecule, you can integrate this with Ansible Galaxy so that roles are automatically pushed once the build is successful.


Davide Moro: API/REST testing like Chuck Norris with pytest play using YAML

In this article we will see how to write HTTP API tests with pytest using YAML files, thanks to pytest-play >= 2.0.0 (pytest-play provides support for Selenium, MQTT, SQL and more; see the third-party pytest-play plugins).

The guest star is Chuck Norris, thanks to the public JSON endpoint available at https://api.chucknorris.io/, so you will be able to run this test on your own by following this example.

Obviously this is a joke because Chuck Norris cannot fail so tests are not needed.

Prerequisites and installation

Installation is not needed; the only prerequisite is Docker, thanks to https://hub.docker.com/r/davidemoro/pytest-play.

At the link above you’ll find the instructions for installing Docker on any platform.

If you want to run this example without Docker, install pytest-play with the external plugin play_requests, based on the fantastic requests library (play_requests is already included in the Docker container).

Project structure

You need:

  • a folder (e.g., chuck-norris-api-test)
  • one or more test_XXX.yml files containing your steps (test_ and .yml extension matter)

For example:

As you can see, each scenario will be repeated for every item you provide in the test_data structure.

The first example asserts that the categories list contains some values, testing against the endpoint https://api.chucknorris.io/jokes/categories; the second example shows how to search for a category (though probably Chuck Norris will find you, per the Chuck Norris fact “You don’t find Chuck Norris, Chuck Norris finds you!”).

Alternatively, you can check out this folder:

Usage

Visit the project folder and run the following command:

docker run --rm -v $(pwd):/src davidemoro/pytest-play

You can append extra standard pytest options like -x, --pdb and so on. See https://docs.pytest.org/en/latest/.

Homeworks

It’s time to show off with a GET roundhouse kick! Ping me on Twitter (@davidemoro) and share your pytest-play implementation against the random Chuck Norris fact generator by category!

GET https://api.chucknorris.io/jokes/random?category=dev

{
    "category": ["dev"],
    "icon_url": "https:\/\/assets.chucknorris.host\/img\/avatar\/chuck-norris.png",
    "id": "yrvjrpx3t4qxqmowpyvxbq",
    "url": "https:\/\/api.chucknorris.io\/jokes\/yrvjrpx3t4qxqmowpyvxbq",
    "value": "Chuck Norris protocol design method has no status, requests or responses, only commands."
}
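If you’d like to poke at this endpoint from plain Python before wiring it into pytest-play YAML, a few lines of requests are enough (a standalone sketch, independent of pytest-play):

import requests

resp = requests.get("https://api.chucknorris.io/jokes/random",
                    params={"category": "dev"})
resp.raise_for_status()
print(resp.json()["value"])  # a random dev-category Chuck Norris fact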

Do you like pytest-play?

Let’s get in touch with any suggestions or comments. Contributions will be very much appreciated too!