How To Monitor Your Managed PostgreSQL Database Using Nagios Core on Ubuntu 18.04

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Database monitoring is key to understanding how a database performs over time. It can help you uncover hidden usage problems and bottlenecks happening in your database. Implementing a database monitoring system quickly becomes a long-term advantage that will positively influence your infrastructure management process. You’ll be able to react swiftly to status changes of your database and will quickly be notified when monitored services return to normal functioning.

Nagios Core is a popular monitoring system that you can use to monitor your managed database. The benefits of using Nagios for this task are its versatility (it’s easy to configure and use), a large repository of available plugins, and, most importantly, integrated alerting.

In this tutorial, you will set up PostgreSQL database monitoring in Nagios Core using the check_postgres Nagios plugin and set up Slack-based alerting. In the end, you’ll have a monitoring system in place for your managed PostgreSQL database, and will be notified of status changes of various functionality immediately.

Prerequisites

  • An Ubuntu 18.04 server with root privileges, and a secondary, non-root account. You can set this up by following this initial server setup guide. For this tutorial the non-root user is sammy.

  • Nagios Core installed on your server. To achieve this, complete the first five steps of the How To Install Nagios 4 and Monitor Your Servers on Ubuntu 18.04 tutorial.

  • A DigitalOcean account and a PostgreSQL managed database provisioned from DigitalOcean with connection information available. Make sure that your server’s IP address is on the whitelist. To learn more about DigitalOcean Managed Databases, visit the product docs.

  • A Slack account with full access, added to a workspace where you’ll want to receive status updates.

Step 1 — Installing check_postgres

In this section, you’ll download the latest version of the check_postgres plugin from GitHub and make it available to Nagios Core. You’ll also install the PostgreSQL client (psql), so that check_postgres will be able to connect to your managed database.

Start off by installing the PostgreSQL client by running the following command:

  • sudo apt install postgresql-client

Next, you’ll download check_postgres to your home directory. First, navigate to it:

  • cd ~

Head over to the GitHub releases page and copy the link of the latest version of the plugin. At the time of writing, the latest version of check_postgres was 2.24.0; keep in mind that this will update, and where possible it’s best practice to use the latest version.

Now download it using curl:

  • curl -LO https://github.com/bucardo/check_postgres/releases/download/2.24.0/check_postgres-2.24.0.tar.gz

Extract it using the following command:

  • tar xvf check_postgres-*.tar.gz

This will create a directory with the same name as the file you have downloaded. That folder contains the check_postgres executable, which you’ll need to copy to the directory where Nagios stores its plugins (usually /usr/local/nagios/libexec/). Copy it by running the following command:

  • sudo cp check_postgres-*/check_postgres.pl /usr/local/nagios/libexec/

Next, you’ll need to give the nagios user ownership of it, so that it can be run from Nagios:

  • sudo chown nagios:nagios /usr/local/nagios/libexec/check_postgres.pl

check_postgres is now available to Nagios and can be used from it. However, it provides a lot of commands pertaining to different aspects of PostgreSQL, and for easier service maintenance it’s best to break them up so that they can be called separately. You’ll achieve this by creating a symlink to every check_postgres command in the plugin directory.

Navigate to the directory where Nagios stores plugins by running the following command:

  • cd /usr/local/nagios/libexec

Then, create the symlinks with:

  • sudo perl check_postgres.pl --symlinks

The output will look like this:

Output
Created "check_postgres_archive_ready" Created "check_postgres_autovac_freeze" Created "check_postgres_backends" Created "check_postgres_bloat" Created "check_postgres_checkpoint" Created "check_postgres_cluster_id" Created "check_postgres_commitratio" Created "check_postgres_connection" Created "check_postgres_custom_query" Created "check_postgres_database_size" Created "check_postgres_dbstats" Created "check_postgres_disabled_triggers" Created "check_postgres_disk_space" Created "check_postgres_fsm_pages" Created "check_postgres_fsm_relations" Created "check_postgres_hitratio" Created "check_postgres_hot_standby_delay" Created "check_postgres_index_size" Created "check_postgres_indexes_size" Created "check_postgres_last_analyze" Created "check_postgres_last_autoanalyze" Created "check_postgres_last_autovacuum" Created "check_postgres_last_vacuum" Created "check_postgres_listener" Created "check_postgres_locks" Created "check_postgres_logfile" Created "check_postgres_new_version_bc" Created "check_postgres_new_version_box" Created "check_postgres_new_version_cp" Created "check_postgres_new_version_pg" Created "check_postgres_new_version_tnm" Created "check_postgres_pgagent_jobs" Created "check_postgres_pgb_pool_cl_active" Created "check_postgres_pgb_pool_cl_waiting" Created "check_postgres_pgb_pool_maxwait" Created "check_postgres_pgb_pool_sv_active" Created "check_postgres_pgb_pool_sv_idle" Created "check_postgres_pgb_pool_sv_login" Created "check_postgres_pgb_pool_sv_tested" Created "check_postgres_pgb_pool_sv_used" Created "check_postgres_pgbouncer_backends" Created "check_postgres_pgbouncer_checksum" Created "check_postgres_prepared_txns" Created "check_postgres_query_runtime" Created "check_postgres_query_time" Created "check_postgres_relation_size" Created "check_postgres_replicate_row" Created "check_postgres_replication_slots" Created "check_postgres_same_schema" Created "check_postgres_sequence" Created "check_postgres_settings_checksum" Created "check_postgres_slony_status" Created "check_postgres_table_size" Created "check_postgres_timesync" Created "check_postgres_total_relation_size" Created "check_postgres_txn_idle" Created "check_postgres_txn_time" Created "check_postgres_txn_wraparound" Created "check_postgres_version" Created "check_postgres_wal_files"

The script lists every command it created a symlink for. These can now be executed from the command line like any other Nagios plugin.
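For example, to confirm that a symlinked command works, you can ask it for its help text while still inside the plugin directory (the exact output depends on your check_postgres version):

  • ./check_postgres_backends --help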

You’ve downloaded and installed the check_postgres plugin. You have also created symlinks to all the commands of the plugin, so that they can be used individually from Nagios. In the next step, you’ll create a connection service file, which check_postgres will use to connect to your managed database.

Step 2 — Configuring Your Database

In this section, you will create a PostgreSQL connection service file containing the connection information of your database. Then, you will test the connection data by invoking check_postgres on it.

The connection service file is by convention called pg_service.conf, and must be located under /etc/postgresql-common/. Create it for editing with your favorite editor (for example, nano):

  • sudo nano /etc/postgresql-common/pg_service.conf

Add the following lines, replacing the highlighted placeholders with the actual values shown in your Managed Database Control Panel under the section Connection Details:

/etc/postgresql-common/pg_service.conf
[managed-db]
host=host
port=port
user=username
password=password
dbname=defaultdb
sslmode=require

The connection service file can house multiple database connection info groups. The beginning of a group is signaled by putting its name in square brackets. After that come the connection parameters (host, port, user, password, and so on), one per line, each of which must be given a value.

Save and close the file when you are finished.

You’ll now test the validity of the configuration by connecting to the database via check_postgres by running the following command:

  • ./check_postgres.pl --dbservice=managed-db --action=connection

Here, you tell check_postgres which database connection info group to use with the parameter --dbservice, and also specify that it should only try to connect to it by specifying connection as the action.

Your output will look similar to this:

Output
POSTGRES_CONNECTION OK: service=managed-db version 11.4 | time=0.10s

This means that check_postgres succeeded in connecting to the database, according to the parameters from pg_service.conf. If you get an error, double check what you have just entered in that config file.
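Because pg_service.conf is a standard libpq connection service file, other PostgreSQL tools can reuse it as well. For example, the following should open an interactive psql session against the same database, assuming your psql build reads the system-wide service file at /etc/postgresql-common/pg_service.conf (the default on Ubuntu):

  • psql "service=managed-db"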

You’ve created and filled out a PostgreSQL connection service file, which works as a connection string. You have also tested the connection data by running check_postgres on it and observing the output. In the next step, you will configure Nagios to monitor various parts of your database.

Step 3 — Creating Monitoring Services in Nagios

Now you will configure Nagios to watch over various metrics of your database by defining a host and multiple services, which will call the check_postgres plugin and its symlinks.

Nagios stores your custom configuration files under /usr/local/nagios/etc/objects. New files you add there must be manually enabled in the central Nagios config file, located at /usr/local/nagios/etc/nagios.cfg. You’ll now define commands, a host, and multiple services, which you’ll use to monitor your managed database in Nagios.

First, create a folder under /usr/local/nagios/etc/objects to store your PostgreSQL related configuration by running the following command:

  • sudo mkdir /usr/local/nagios/etc/objects/postgresql

You’ll store the Nagios commands for check_postgres in a file named commands.cfg. Create it for editing:

  • sudo nano /usr/local/nagios/etc/objects/postgresql/commands.cfg

Add the following lines:

/usr/local/nagios/etc/objects/postgresql/commands.cfg
define command {
    command_name           check_postgres_connection
    command_line           /usr/local/nagios/libexec/check_postgres_connection --dbservice=$ARG1$
}

define command {
    command_name           check_postgres_database_size
    command_line           /usr/local/nagios/libexec/check_postgres_database_size --dbservice=$ARG1$ --critical='$ARG2$'
}

define command {
    command_name           check_postgres_locks
    command_line           /usr/local/nagios/libexec/check_postgres_locks --dbservice=$ARG1$
}

define command {
    command_name           check_postgres_backends
    command_line           /usr/local/nagios/libexec/check_postgres_backends --dbservice=$ARG1$
}

Save and close the file.

In this file, you define four Nagios commands that call different parts of the check_postgres plugin (checking connectivity, getting the number of locks and connections, and the size of the whole database). They all accept an argument that is passed to the --dbservice parameter, which specifies which of the databases defined in pg_service.conf to connect to.

The check_postgres_database_size command accepts a second argument that gets passed to the --critical parameter, which specifies the point at which the database storage is becoming full. Accepted values include 1 KB for a kilobyte, 1 MB for a megabyte, and so on, up to exabytes (EB). A number without a capacity unit is treated as being expressed in bytes.
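Before wiring this command into a Nagios service, you can sanity-check a threshold by calling the symlinked command directly from the plugin directory; the '10 GB' value here is only an example and should reflect your own plan's storage:

  • ./check_postgres_database_size --dbservice=managed-db --critical='10 GB'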

Now that the necessary commands are defined, you’ll define the host (essentially, the database) and its monitoring services in a file named services.cfg. Create it using your favorite editor:

  • sudo nano /usr/local/nagios/etc/objects/postgresql/services.cfg

Add the following lines, replacing db_max_storage_size with a value pertaining to the available storage of your database. It is recommended to set it to 90 percent of the storage size you have allocated to it:

/usr/local/nagios/etc/objects/postgresql/services.cfg
define host {
      use                    linux-server
      host_name              postgres
      check_command          check_postgres_connection!managed-db
}

define service {
      use                    generic-service
      host_name              postgres
      service_description    PostgreSQL Connection
      check_command          check_postgres_connection!managed-db
      notification_options   w,u,c,r,f,s
}

define service {
      use                    generic-service
      host_name              postgres
      service_description    PostgreSQL Database Size
      check_command          check_postgres_database_size!managed-db!db_max_storage_size
      notification_options   w,u,c,r,f,s
}

define service {
      use                    generic-service
      host_name              postgres
      service_description    PostgreSQL Locks
      check_command          check_postgres_locks!managed-db
      notification_options   w,u,c,r,f,s
}

define service {
      use                    generic-service
      host_name              postgres
      service_description    PostgreSQL Backends
      check_command          check_postgres_backends!managed-db
      notification_options   w,u,c,r,f,s
}

You first define a host, so that Nagios will know what entity the services relate to. Then, you create four services, which call the commands you just defined. Each one passes managed-db as the argument, detailing that the managed-db you defined in Step 2 should be monitored.

Regarding notification options, each service specifies that notifications should be sent out when the service state becomes WARNING, UNKNOWN, CRITICAL, OK (when it recovers from downtime), when the service starts flapping, or when scheduled downtime starts or ends. Without explicitly giving this option a value, no notifications would be sent out (to available contacts) at all, except if triggered manually.

Save and close the file.

Next, you’ll need to explicitly tell Nagios to read config files from this new directory, by editing the general Nagios config file. Open it for editing by running the following command:

  • sudo nano /usr/local/nagios/etc/nagios.cfg

Find this highlighted line in the file:

/usr/local/nagios/etc/nagios.cfg
...
# directive as shown below:

cfg_dir=/usr/local/nagios/etc/servers
#cfg_dir=/usr/local/nagios/etc/printers
...

Above it, add the following highlighted line:

/usr/local/nagios/etc/nagios.cfg
...
cfg_dir=/usr/local/nagios/etc/objects/postgresql
cfg_dir=/usr/local/nagios/etc/servers
...

Save and close the file. This line tells Nagios to load all config files from the /usr/local/nagios/etc/objects/postgresql directory, where your configuration files are located.

Before restarting Nagios, check the validity of the configuration by running the following command:

  • sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

The end of the output will look similar to this:

Output
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

This means that Nagios found no errors in the configuration. If it shows you an error, you’ll also see a hint as to what went wrong, so you’ll be able to fix the error more easily.

To make Nagios reload its configuration, restart its service by running the following command:

  • sudo systemctl restart nagios

You can now navigate to Nagios in your browser. Once it loads, press on the Services option from the left-hand menu. You’ll see the postgres host and a list of services, along with their current statuses:

PostgreSQL Monitoring Services - Pending

They will all soon turn to green and show an OK status. You’ll see the command output under the Status Information column. You can click on the service name and see detailed information about its status and availability.

You’ve added check_postgres commands, a host, and multiple services to your Nagios installation to monitor your database. You’ve also checked that the services are working properly by examining them via the Nagios web interface. In the next step, you will configure Slack-based alerting.

Step 4 — Configuring Slack Alerting

In this section, you will configure Nagios to alert you about events via Slack, by posting them into desired channels in your workspace.

Before you start, log in to your desired workspace on Slack and create two channels where you’ll want to receive status messages from Nagios: one for host, and the other one for service notifications. If you wish, you can create only one channel where you’ll receive both kinds of alerts.

Then, head over to the Nagios app in the Slack App Directory and press on Add Configuration. You’ll see a page for adding the Nagios Integration.

Slack - Add Nagios Integration

Press on Add Nagios Integration. When the page loads, scroll down and take note of the token, because you’ll need it further on.

Slack - Integration Token

You’ll now install and configure the Slack plugin (written in Perl) for Nagios on your server. First, install the required Perl prerequisites by running the following command:

  • sudo apt install libwww-perl libcrypt-ssleay-perl -y

Then, download the plugin to your Nagios plugin directory:

  • sudo curl https://raw.githubusercontent.com/tinyspeck/services-examples/master/nagios.pl -o slack.pl

Make it executable by running the following command:

  • sudo chmod +x slack.pl

Now, you’ll need to edit it to connect to your workspace using the token you got from Slack. Open it for editing:

  • sudo nano slack.pl

Find the following lines in the file:

/usr/local/nagios/libexec/slack.pl
...
my $opt_domain = "foo.slack.com"; # Your team's domain
my $opt_token = "your_token"; # The token from your Nagios services page
...

Replace foo.slack.com with your workspace domain and your_token with your Nagios app integration token, then save and close the file. The script will now be able to send proper requests to Slack, which you’ll now test by running the following command:

  • ./slack.pl -field slack_channel=#your_channel_name -field HOSTALIAS="Test Host" -field HOSTSTATE="UP" -field HOSTOUTPUT="Host is UP" -field NOTIFICATIONTYPE="RECOVERY"

Replace your_channel_name with the name of the channel where you’ll want to receive status alerts. The script will output information about the HTTP request it made to Slack, and if everything went through correctly, the last line of the output will be ok. If you get an error, double check if the Slack channel you specified exists in the workspace.

You can now head over to your Slack workspace and select the channel you specified. You’ll see a test message coming from Nagios.

Slack - Nagios Test Message

This confirms that you have properly configured the Slack script. You’ll now move on to configuring Nagios to alert you via Slack using this script.

You’ll need to create a contact for Slack and two commands that will send messages to it. You’ll store this config in a file named slack.cfg, in the same folder as the previous config files. Create it for editing by running the following command:

  • sudo nano /usr/local/nagios/etc/objects/postgresql/slack.cfg

Add the following lines:

/usr/local/nagios/etc/objects/postgresql/slack.cfg
define contact {
      contact_name                             slack
      alias                                    Slack
      service_notification_period              24x7
      host_notification_period                 24x7
      service_notification_options             w,u,c,f,s,r
      host_notification_options                d,u,r,f,s
      service_notification_commands            notify-service-by-slack
      host_notification_commands               notify-host-by-slack
}

define command {
      command_name     notify-service-by-slack
      command_line     /usr/local/nagios/libexec/slack.pl -field slack_channel=#service_alerts_channel
}

define command {
      command_name     notify-host-by-slack
      command_line     /usr/local/nagios/libexec/slack.pl -field slack_channel=#host_alerts_channel
}

Here you define a contact named slack, state that it can be contacted anytime and specify which commands to use for notifying service and host related events. Those two commands are defined after it and call the script you have just configured. You’ll need to replace service_alerts_channel and host_alerts_channel with the names of the channels where you want to receive service and host messages, respectively. If preferred, you can use the same channel names.

Similarly to the service creation in the last step, setting service and host notification options on the contact is crucial, because it governs what kind of alerts the contact will receive. Omitting those options would result in sending out notifications only when manually triggered from the web interface.

When you are done with editing, save and close the file.

To enable alerting via the slack contact you just defined, you’ll need to add it to the admin contact group, defined in the contacts.cfg config file, located under /usr/local/nagios/etc/objects/. Open it for editing by running the following command:

  • sudo nano /usr/local/nagios/etc/objects/contacts.cfg

Find the config block that looks like this:

/usr/local/nagios/etc/objects/contacts.cfg
define contactgroup {
    contactgroup_name       admins
    alias                   Nagios Administrators
    members                 nagiosadmin
}

Add slack to the list of members, like so:

/usr/local/nagios/etc/objects/contacts.cfg
define contactgroup {
    contactgroup_name       admins
    alias                   Nagios Administrators
    members                 nagiosadmin,slack
}

Save and close the file.

By default when running scripts, Nagios does not make host and service information available via environment variables, which is what the Slack script requires in order to send meaningful messages. To remedy this, you’ll need to set the enable_environment_macros setting in nagios.cfg to 1. Open it for editing by running the following command:

  • sudo nano /usr/local/nagios/etc/nagios.cfg

Find the line that looks like this:

/usr/local/nagios/etc/nagios.cfg
enable_environment_macros=0 

Change the value to 1, like so:

/usr/local/nagios/etc/nagios.cfg
enable_environment_macros=1 

Save and close the file.

Test the validity of the Nagios configuration by running the following command:

  • sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

The end of the output will look like:

Output
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Proceed to restart Nagios by running the following command:

  • sudo systemctl restart nagios

To test the Slack integration, you’ll send out a custom notification via the web interface. Reload the Nagios Services status page in your browser. Press on the PostgreSQL Backends service and press on Send custom service notification on the right when the page loads.

Nagios - Custom Service Notification

Type in a comment of your choice and press on Commit, and then press on Done. You’ll immediately receive a new message in Slack.

Slack - Status Alert From Nagios

You have now integrated Slack with Nagios, so you’ll receive messages about critical events and status changes immediately. You’ve also tested the integration by manually triggering an event from within Nagios.

Conclusion

You now have Nagios Core configured to watch over your managed PostgreSQL database and report any status changes and events to Slack, so you’ll always be in the loop of what is happening to your database. This will allow you to swiftly react in case of an emergency, because you’ll be getting the status feed in real time.

If you’d like to learn more about the features of check_postgres, check out its docs, where you’ll find a lot more commands that you can possibly use.

For more information about what you can do with your PostgreSQL Managed Database, visit the product docs.

DigitalOcean Community Tutorials

Basket Analysis Using Parameter Actions and LOD Calculations

The initial request that led me to create this dashboard came from a client working in the sales department of a manufacturing company. The challenge was unique: if a customer requested a product that was out of stock, they could quickly find alternative products with similar components or identify what parts needed to be added/removed from another product to fulfill the order. While that situation may have been fairly rare, I believe the underlying logic needed to do this can be used in a variety of applications.

Logic Use Case

Being able to select an object in Tableau and identify shared underlying components with another object in the data source.

For this example, I have used the same logic to be able to do a rough basket analysis on customers in the Superstore sample dataset built into Tableau. The dashboard is set up to start with a simple view showing sales dollars by customer and count of unique customers by product. Users are also able to drill down and find shared products between customers to potentially identify other products that customer might be interested in. This dataset is very limited in that there is very little overlap in the products customers buy.

When I select Customer A, I want to check the products that they purchased and flag all other products shared with other customers:

Dashboard Navigation

Optional: Selecting a product filters the chart on the left to show a ranked list of customer sales dollars for that product while still showing the total number of unique products purchased by that customer.

Selecting a customer triggers two charts to appear in place of the product list. A list of products purchased by the selected customer appears below, showing the number of customers who bought each product and the sales dollars for each product purchased by the customer.

A list of other customers that bought products also purchased by the customer in question appears ranked by sales dollars on the right. Hovering over a customer on the right side shows a list of the Top 20 products for that customer by sales dollars with shared products in red.

Selecting a customer on the right side triggers a list of products purchased by both of the customers. That list is broken out by those products that are purchased by only the first customer, only the second customer or both.

Parameters and Set Actions

This can be done with either a parameter action or a set action. Using a parameter action allows for the name of the customer selected to feed into the display. A set action would allow for a situation where a multi-select is needed.

Create parameters for the first and second customers selected with String as the Data type and all values allowed:

Create a calculation to flag all products purchased by the customer selected in the parameter. This example is using a customer name, which should be replaced by a customer ID field in a real environment:

Create an LOD calculation to flag all instances of a product purchased by the customer selected for all customers:
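For reference, one possible way to write these two calculations, using the parameter and field names mentioned in this post (the first calculation name is hypothetical, and your own field names may differ):

Customer 1 Purchase Flag:
IF [Customer Name] = [Customer 1 Name Parameter] THEN 1 ELSE 0 END

Products by Customer 1:
{ FIXED [Product Name] : MAX(IF [Customer Name] = [Customer 1 Name Parameter] THEN 1 ELSE 0 END) }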

To use these calculations, I need two sheets showing sales by customer to pass the customer selection between. The second sheet is filtered so that only sales of shared products are displayed. The Products by Customer 1 field is set as a filter to only keep records where it equals 1. I then have to decide if I want to see records from the customer selected in the second sheet. I decided to remove these by adding a filter using an ad hoc calculation: [Customer Name]<>[Customer 1 Name Parameter]. In a dashboard, I add both sheets and set up the parameter action filter to update the parameter when I select a customer:

Action Filter Using Dummy Field

The second sheet will now always display customers that purchased the same products as the last customer selected. Since I want this view to only be displayed on selection, I created a dummy field that could be passed between the sheets to activate the display. This could be any value; it just needs to be consistent. I place this field on Detail on the first sheet to make it available. Next, I set up an exclude action filter using the field so that the sheets stay hidden until a customer is selected:

I’ll repeat these calculations for the second parameter in order to label products as either purchased by both customers selected or just one:


InterWorks

How To Analyze Managed PostgreSQL Database Statistics Using the Elastic Stack on Ubuntu 18.04

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing the performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting, which notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade-off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

In this tutorial, you’ll import database metrics, generated by the PostgreSQL statistics collector, into Elasticsearch via Logstash. This entails configuring Logstash to pull data from the database using the PostgreSQL JDBC connector to send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. Then, if your database is brand new, you’ll use pgbench, a PostgreSQL benchmarking tool, to create more interesting visualizations. In the end, you’ll have an automated system pulling in PostgreSQL statistics for later analysis.

Prerequisites

Step 1 — Setting up Logstash and the PostgreSQL JDBC Driver

In this section, you will install Logstash and download the PostgreSQL JDBC driver so that Logstash will be able to connect to your managed database.

Start off by installing Logstash with the following command:

  • sudo apt install logstash -y

Once Logstash is installed, enable the service to automatically start on boot:

  • sudo systemctl enable logstash

Logstash is written in Java, so in order to connect to PostgreSQL it requires the PostgreSQL JDBC (Java Database Connectivity) library to be available on the system it is running on. Because of an internal limitation, Logstash will properly load the library only if it is found under the /usr/share/logstash/logstash-core/lib/jars directory, where it stores third-party libraries it uses.

Head over to the download page of the JDBC library and copy the link to latest version. Then, download it using curl by running the following command:

  • sudo curl https://jdbc.postgresql.org/download/postgresql-42.2.6.jar -o /usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar

At the time of writing, the latest version of the library was 42.2.6, with Java 8 as the supported runtime version. Ensure you download the latest version and pair it with a Java version that both the JDBC driver and Logstash support.
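If you want to confirm that the driver landed where Logstash expects it, you can list the jars directory and look for the file you just downloaded:

  • ls /usr/share/logstash/logstash-core/lib/jars/ | grep postgresql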

Logstash stores its configuration files under /etc/logstash/conf.d, while the Logstash binary itself lives under /usr/share/logstash/bin. Before you create a configuration that will pull statistics from your database, you’ll need to enable the JDBC plugin in Logstash by running the following command:

  • sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-jdbc

You’ve installed Logstash using apt and downloaded the PostgreSQL JDBC library so that Logstash can use it to connect to your managed database. In the next step, you will configure Logstash to pull statistical data from it.

Step 2 — Configuring Logstash To Pull Statistics

In this section, you will configure Logstash to pull metrics from your managed PostgreSQL database.

You’ll configure Logstash to watch over three PostgreSQL statistics views, namely the following (a quick way to preview their contents with psql is shown after the list):

  • pg_stat_database: provides statistics about each database, including its name, number of connections, transactions, rollbacks, rows returned by querying the database, deadlocks, and so on. It has a stats_reset field, which specifies when the statistics were last reset.
  • pg_stat_user_tables: provides statistics about each table created by the user, such as the number of inserted, deleted, and updated rows.
  • pg_stat_user_indexes: collects data about all indexes in user-created tables, such as the number of times a particular index has been scanned.
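If you have the PostgreSQL client (psql) installed on this machine, you can preview the kind of data these views expose before wiring up Logstash. The column list below is only a sample, and host, port, and username are the same placeholders used elsewhere in this tutorial:

  • psql -h host -p port -U username defaultdb -c "SELECT datname, numbackends, xact_commit, xact_rollback FROM pg_stat_database;"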

You’ll store the configuration for indexing PostgreSQL statistics in Elasticsearch in a file named postgresql.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.

Create postgresql.conf using your favorite editor (for example, nano):

  • sudo nano /etc/logstash/conf.d/postgresql.conf

Add the following lines:

/etc/logstash/conf.d/postgresql.conf
input {
        # pg_stat_database
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_database"
                schedule => "* * * * *"
                type => "pg_stat_database"
        }

        # pg_stat_user_tables
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_tables"
                schedule => "* * * * *"
                type => "pg_stat_user_tables"
        }

        # pg_stat_user_indexes
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_indexes"
                schedule => "* * * * *"
                type => "pg_stat_user_indexes"
        }
}

output {
        elasticsearch {
                hosts => "http://localhost:9200"
                index => "%{type}"
        }
}

Remember to replace host with your host address, port with the port to which you can connect to your database, username with the database user username, and password with its password. All these values can be found in the Control Panel of your managed database.

In this configuration, you define three JDBC inputs and one Elasticsearch output. The three inputs pull data from the pg_stat_database, pg_stat_user_tables, and pg_stat_user_indexes views, respectively. They all set the jdbc_driver_library parameter to an empty string, because the PostgreSQL JDBC library is in a folder that Logstash automatically loads.

Then, they set the jdbc_driver_class, whose value is specific to the JDBC library, and provide a jdbc_connection_string, which details how to connect to the database. The jdbc: part signifies that it is a JDBC connection, while postgresql:// indicates that the target database is PostgreSQL. Next come the host and port of the database, and after the forward slash you also specify a database to connect to; this is because PostgreSQL requires you to be connected to a database to be able to issue any queries. Here, it is set to the default database that always exists and cannot be deleted, aptly named defaultdb.

Next, they set a username and password of the user through which the database will be accessed. The statement parameter contains a SQL query that should return the data you wish to process—in this configuration, it selects all rows from the appropriate database.

The schedule parameter accepts a string in cron syntax that defines when Logstash should run this input; omitting it completely will make Logstash run it only once. Specifying * * * * *, as you have done so here, will tell Logstash to run it every minute. You can specify your own cron string if you want to collect data at different intervals.
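For instance, if polling every minute is more frequent than you need, you could change the schedule in each input using standard cron step syntax; a hypothetical every-five-minutes setting would look like this:

schedule => "*/5 * * * *"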

There is only one output, which accepts data from three inputs. They all send data to Elasticsearch, which is running locally and is reachable at http://localhost:9200. The index parameter defines to which Elasticsearch index it will send the data, and its value is passed in from the type field of the input.

When you are done with editing, save and close the file.

You’ve configured Logstash to gather data from various PostgreSQL statistical tables and send them to Elasticsearch for storage and indexing. Next, you’ll run Logstash to test the configuration.

Step 3 — Testing the Logstash Configuration

In this section, you will test the configuration by running Logstash to verify it will properly pull the data. Then, you will make this configuration run in the background by configuring it as a Logstash pipeline.

Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:

  • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/postgresql.conf

It may take some time before it shows any output, which will look similar to this:

Output
Thread.exclusive is deprecated, use Thread::Mutex WARNING: Could not find logstash.yml which is typically located in $ LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console [WARN ] 2019-08-02 18:29:15.123 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified [INFO ] 2019-08-02 18:29:15.154 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.3.0"} [INFO ] 2019-08-02 18:29:18.209 [Converge PipelineAction::Create<main>] Reflections - Reflections took 77 ms to scan 1 urls, producing 19 keys and 39 values [INFO ] 2019-08-02 18:29:20.195 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:20.667 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.221 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.230 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.274 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.337 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.369 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.409 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.430 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.444 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.465 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.466 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.468 [Ruby-0-Thread-7: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.538 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.545 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.589 [Ruby-0-Thread-9: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.696 [Ruby-0-Thread-5: :1] 
elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.769 [Ruby-0-Thread-7: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.771 [Ruby-0-Thread-9: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [WARN ] 2019-08-02 18:29:21.871 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization. It is recommended to log an issue to the responsible developer/development team. 
[INFO ] 2019-08-02 18:29:21.878 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, :thread=>"#<Thread:0x470bf1ca run>"} [INFO ] 2019-08-02 18:29:22.351 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"} [INFO ] 2019-08-02 18:29:22.721 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]} [INFO ] 2019-08-02 18:29:23.798 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600} /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated [INFO ] 2019-08-02 18:30:02.333 [Ruby-0-Thread-22: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:284] jdbc - (0.042932s) SELECT * FROM pg_stat_user_indexes [INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-23: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:331] jdbc - (0.043178s) SELECT * FROM pg_stat_user_tables [INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-24: :1] jdbc - (0.036469s) SELECT * FROM pg_stat_database ...

If Logstash does not show any errors and logs that it has successfully SELECTed rows from the three databases, your database metrics will be shipped to Elasticsearch. If you get an error, double check all the values in the configuration file to ensure that the machine you’re running Logstash on can connect to the managed database.

Logstash will continue importing the data at specified times. You can safely stop it by pressing CTRL+C.
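If you’d like to confirm that the indexes were created, you can query Elasticsearch directly (it listens on localhost:9200 in this setup); the pg_stat_database, pg_stat_user_tables, and pg_stat_user_indexes indexes should appear in the listing:

  • curl -X GET 'http://localhost:9200/_cat/indices?v'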

As previously mentioned, when started as a service, Logstash automatically runs all configuration files it finds under /etc/logstash/conf.d in the background. Run the following command to start it as a service:

  • sudo systemctl start logstash

In this step, you ran Logstash to check if it can connect to your database and gather data. Next, you’ll visualize and explore some of the statistical data in Kibana.

Step 4 — Exploring Imported Data in Kibana

In this section, you’ll see how you can explore the statistical data describing your database’s performance in Kibana.

In your browser, navigate to the Kibana installation you set up as a prerequisite. You’ll see the default welcome page.

Kibana - Default Welcome Page

To interact with Elasticsearch indexes in Kibana, you’ll need to create an index pattern. Index patterns specify on which indexes Kibana should operate. To create one, press on the last icon (wrench) from the left-hand vertical sidebar to open the Management page. Then, from the left menu, press on Index Patterns under Kibana. You’ll see a dialog box for creating an index pattern.

Kibana - Add Index Pattern

Listed are the three indexes where Logstash has been sending statistics. Type in pg_stat_database in the Index Pattern input box and then press Next step. You’ll be asked to select a field that stores time, so you’ll be able to later narrow your data by a time range. From the dropdown, select @timestamp.

Kibana - Index Pattern Timestamp Field

Press on Create index pattern to finish creating the index pattern. You’ll now be able to explore it using Kibana. To create a visualization, press on the second icon in the sidebar, and then on Create new visualization. Select the Line visualization when the form pops up, and then choose the index pattern you have just created (pg_stat_database). You’ll see an empty visualization.

Kibana - Empty Visualisation

The central part of the screen shows the resulting plot, while the left-side panel governs its generation; from it you can set the data for the X and Y axes. In the upper right-hand side of the screen is the date range picker. Unless you specifically choose another range when configuring the data, that range will be shown on the plot.

You’ll now visualize the average number of data tuples INSERTed per minute over the given interval. Press on Y-Axis under Metrics in the panel on the left to unfold it. Select Average as the Aggregation and select tup_inserted as the Field. This will populate the Y axis of the plot with the average values.

Next, press on X-Axis under Buckets. For the Aggregation, choose Date Histogram. @timestamp should be automatically selected as the Field. Then, press on the blue play button on the top of the panel to generate your graph. If your database is brand new and unused, you won’t see much yet; in any case, the graph will be an accurate portrayal of actual database usage.

Kibana supports many other visualization forms—you can explore other forms in the Kibana documentation. You can also add the two remaining indexes, mentioned in Step 2, into Kibana to be able to visualize them as well.

In this step, you have learned how to visualize some of the PostgreSQL statistical data, using Kibana.

Step 5 — (Optional) Benchmarking Using pgbench

If you haven’t yet worked in your database outside of this tutorial, you can complete this step to create more interesting visualizations by using pgbench to benchmark your database. pgbench will run the same SQL commands over and over, simulating real-world database use by an actual client.

You’ll first need to install pgbench by running the following command:

  • sudo apt install postgresql-contrib -y

Because pgbench will insert and update test data, you’ll need to create a separate database for it. To do so, head over to the Users & Databases tab in the Control Panel of your managed database, and scroll down to the Databases section. Type in pgbench as the name of the new database, and then press on Save. You’ll pass this name, as well as the host, port, and username information to pgbench.

Accessing Databases section in DO control panel

Before actually running pgbench, you’ll need to run it with the -i flag to initialize its database:

  • pgbench -h host -p port -U username -i pgbench

You’ll need to replace host with your host address, port with the port to which you can connect to your database, and username with the database user username. You can find all these values in the Control Panel of your managed database.

Notice that pgbench does not have a password argument; instead, you’ll be asked for it every time you run it.

The output will look like the following:

Output
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.16 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

pgbench created four tables, which it will use for benchmarking, and populated them with some example rows. You’ll now be able to run benchmarks.

The two most important arguments that control how long the benchmark will run are -t, which specifies the number of transactions to complete, and -T, which defines for how many seconds the benchmark should run. These two options are mutually exclusive. At the end of each benchmark, you’ll receive statistics, such as the number of transactions per second (tps).
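For example, if you’d rather run a fixed amount of work than a fixed duration, you could use -t instead; the value of 1000 transactions here is arbitrary, and host, port, and username are your connection placeholders as before:

  • pgbench -h host -p port -U username pgbench -t 1000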

Now, start a benchmark that will last for 30 seconds by running the following command:

  • pgbench -h host -p port -U username pgbench -T 30

The output will look like:

Output
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 7602
latency average = 3.947 ms
tps = 253.382298 (including connections establishing)
tps = 253.535257 (excluding connections establishing)

In this output, you see the general info about the benchmark, such as the total number of transactions executed. The effect of these benchmarks is that the statistics Logstash ships to Elasticsearch will reflect that number, which will in turn make visualizations in Kibana more interesting and closer to real-world graphs. You can run the preceding command a few more times, and possibly alter the duration.
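If you want more pronounced spikes in the graphs, you could also raise the number of concurrent client sessions with pgbench’s -c flag; the duration and client count below are just an illustration:

  • pgbench -h host -p port -U username pgbench -T 60 -c 10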

When you are done, head over to Kibana and press on Refresh in the upper right corner. You’ll now see a different line than before, which shows the number of INSERTs. Feel free to change the time range of the data shown by changing the values in the picker positioned above the refresh button. Here is how the graph may look after multiple benchmarks of varying duration:

Kibana - Visualization After Benchmarks

You’ve used pgbench to benchmark your database, and evaluated the resulting graphs in Kibana.

Conclusion

You now have the Elastic stack installed on your server and configured to pull statistics data from your managed PostgreSQL database on a regular basis. You can analyze and visualize the data using Kibana, or some other suitable software, which will help you gather valuable insights and real-world correlations into how your database is performing.

For more information about what you can do with your PostgreSQL Managed Database, visit the product docs.

DigitalOcean Community Tutorials

Malthe Borch: Using built-in transparent compression on MacOS

Ever since DriveSpace on MS-DOS (or really, Stacker), we’ve had transparent file compression, with varying degrees of automation; in fact, while the DriveSpace-compression on MS-DOS was a fully automated affair, the built-in transparent compression in newer filesystems such as ZFS, Btrfs, APFS (and even HFS+), is engaged manually on a per-file or folder basis.

But no one’s using it!

On my system, compressing /Applications saved 18GB (38.7%).

MacOS doesn’t actually come with a utility to do this even though the core functionality is included, so you’ll need to install an open source tool in order to use it.

$ brew install afsctool

To compress a file or folder, use the -c flag like so:

$ afsctool -c /Applications

(You might need to use root for some application and/or system files).

Planet Python

Stack Abuse: Using Django Signals to Simplify and Decouple Code

Introduction

Systems are getting more complex as time goes by, which warrants decoupling them further. A decoupled system is easier to build, extend, and maintain in the long run, since not only does decoupling reduce the complexity of the system, but each part of the system can also be managed individually. Fault tolerance is also enhanced since, in a decoupled system, a failing component does not drag down the entire system with it.

Django is a powerful open-source web framework that can be used to build large and complex systems, as well as small ones. It follows the model-template-view architectural pattern and it is true to its goal of helping developers achieve the delivery of complex data-driven web-based applications.

Django enables us to decouple system functionality by building separate apps within a project. For instance, we can have a shopping system and have separate apps that handle accounts, emailing of receipts, and notifications, among other things.

In such a system, several apps may need to perform an action when certain events take place. One event can occur when a customer places an order. For example, we will need to notify the user via email and also send the order to the supplier or vendor, while at the same time receiving and processing the payment. All of these events happen at the same time, and since our application is decoupled, we need to keep every component in sync, but how do we achieve this?

Django Signals come in handy in such a situation: all that needs to happen is that a signal is sent when a user places an order, and every related or affected component listens for it and performs its operations. Let us explore more about signals in this post.

Signals at a Glance

Django Signals are an implementation of the Observer Pattern. In such a design pattern, a subscription mechanism is implemented where multiple objects are subscribed to, or "observing", a particular object and any events that may happen to it. A good analogy is how all subscribers to a YouTube channel get a notification when a content creator uploads new content.

Through a "signal dispatcher", Django is able to distribute signals in a decoupled setup to registered "receivers" in the various system components. Signals are registered and triggered whenever certain events occur, and any listeners for that event are notified that it has occurred, receiving along with the notification some contextual data in the payload that may be relevant to the receiver’s functionality. A receiver can be any Python function or method. More on this later on.

Aside from the signal dispatcher, Django also ships with some useful signals that we can listen for (a minimal receiver sketch follows this list). They include:

  • post_save, which is sent out whenever a Django model instance is saved, on both creation and update; a created flag tells receivers which of the two happened. For instance, it fires when a user signs up or uploads a new post,
  • pre_delete, which is sent out just before a Django model instance is deleted. A good scenario would be when a user is deleting a message or their account,
  • request_finished, which is fired whenever Django finishes serving an HTTP request, whether that is opening the website or accessing a particular resource.
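As a quick illustration, hooking into one of these built-in signals only takes a plain function and the @receiver decorator. The following is a minimal sketch, assuming a hypothetical Order model from the shopping example above; it is not part of the demo project that follows:

# A minimal sketch: reacting to the built-in post_save signal.
# "Order" is a hypothetical model from the shopping example above.
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Order


@receiver(post_save, sender=Order)
def notify_on_new_order(sender, instance, created, **kwargs):
    # "created" is True only when the instance was inserted, not updated
    if created:
        print("New order placed: {}".format(instance.pk))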

Another advantage of Django is that it is a highly customizable framework. In our case, we can create our custom signals and use the built-in system to dispatch and receive them in our decoupled system. In the demo section, we will subscribe to some of Django’s built-in signals, and also create some custom ones of our own.

When to Use Signals

We have already identified what Django Signals are and how they work, but as with any other framework feature, they are not meant to be used at every turn. There are particular scenarios where using Django signals is highly recommended, and they include:

  • When we have many separate pieces of code interested in the same events, a signal would help distribute the event notification as opposed to us invoking all of the different pieces of code at the same point, which can get untidy and introduce bugs
  • We can also use Django signals to handle interactions between components in a decoupled system as an alternative to interaction through RESTful communication mechanisms
  • Signals are also useful when extending third-party libraries where we want to avoid modifying them, but need to add extra functionality

Advantages of Signals

Django Signals simplify the implementation of our decoupled systems in various ways. They help us implement reusable applications: instead of reimplementing functionality in several places, or modifying other parts of the system, we can simply respond to signals without affecting other code. This way, components of a system can be modified, added, or removed without touching the existing codebase.

Signals also provide a simplified mechanism for keeping different components of a decoupled system in sync and up to date with each other.

Demo Project

In our demo project, we will build a simple jobs board where users will access the site, view available jobs, and pick a job posting to subscribe to. The users will subscribe just by submitting their email address and will be notified of any changes to the job, for instance when the requirements change, the job opening is closed, or the posting is taken down. All of these changes will be performed by an admin who will have a dashboard to create, update, and even take down job postings.

In the spirit of decoupling our application, we will build the main Jobs Board application and a separate Notifications application that will be tasked with notifying users whenever required. We will then use signals to invoke functionality in the Notifications app from the main Jobs Board app.

Another testament to Django’s expansive feature set is the built-in administration dashboard that our administrators will use to manage jobs. Our work on that front is greatly reduced and we can prototype our application faster.

Project Setup

It is good practice to build Python projects in a virtual environment so that we work in an isolated setup that does not affect the system’s Python installation; for this we’ll use Pipenv.

Let us first set up our environment:

# Set up the environment
$   pipenv install --three

# Activate the virtual environment
$   pipenv shell

# Install Django
$   pipenv install django

Django comes with some commands that help us perform various tasks such as creating a project, creating apps, migrating data, and testing code, among others. To create our project:

# Create the project
$   django-admin startproject jobs_board && cd jobs_board

# Create the decoupled applications
$   django-admin startapp jobs_board_main
$   django-admin startapp jobs_board_notifications

The commands above will create a Django project with two applications within it, which are decoupled from each other but can still work together. To confirm that our setup was successful, let us apply the default migrations that come with Django to set up the database tables, and then start the development server:

$   python manage.py migrate
$   python manage.py runserver

When we access the local running instance of our Django project, we should see the following:

django set up

This means we have set up our Django project successfully and can now start implementing our logic.

Implementation

Django is based on the model-template-view architectural pattern, and this pattern will also guide our implementation. We will create models to define our data, then implement views to handle data access and manipulation, and finally templates to render our data to the end user in the browser.

To have our applications integrated into the main Django application, we have to add them to the jobs_board/settings.py under INSTALLED_APPS, as follows:

INSTALLED_APPS = [
    # Existing apps remain...

    # jobs_board apps
    'jobs_board_main',
    'jobs_board_notifications',
]

Part 1: The Main Jobs Board App

This is where the bulk of our system’s functionality will reside, and it will be the interaction point with our users. It will contain our models, views, and templates, as well as some bespoke signals that we will use to interact with the Notifications app.

Let us start by creating our models in jobs_board_main/models.py:

# jobs_board_main/models.py
from django.db import models


class Job(models.Model):
    company = models.CharField(max_length=255, blank=False)
    company_email = models.CharField(max_length=255, blank=False)
    title = models.CharField(max_length=255, blank=False)
    details = models.CharField(max_length=255, blank=True)
    status = models.BooleanField(default=True)
    date_created = models.DateTimeField(auto_now_add=True)
    date_modified = models.DateTimeField(auto_now=True)


class Subscriber(models.Model):
    email = models.CharField(max_length=255, blank=False, unique=True)
    date_created = models.DateTimeField(auto_now_add=True)
    date_modified = models.DateTimeField(auto_now=True)


class Subscription(models.Model):
    email = models.CharField(max_length=255, blank=False, unique=True)
    user = models.ForeignKey(Subscriber, related_name="subscriptions", on_delete=models.CASCADE)
    job = models.ForeignKey(Job, related_name="jobs", on_delete=models.CASCADE)
    date_created = models.DateTimeField(auto_now_add=True)
    date_modified = models.DateTimeField(auto_now=True)

We create a Job model to define our job posting, which holds the company name and contact email, the job title and details, and the status of the opening. We also have a model to store our subscribers, taking only their email addresses. Subscribers and Jobs come together through the Subscription model, where we store the details of subscriptions to job postings.

With our models in place, we need to make migrations and migrate them to have the tables created in the database:

$   python manage.py makemigrations
$   python manage.py migrate

Next we move on to the view section of our application. Let’s create a view to display all job postings, and another to display individual job postings where users can subscribe to them by submitting their emails.

We will start by creating the view that will handle the displaying of all our jobs:

# jobs_board_main/views.py
from django.shortcuts import render

from .models import Job


def get_jobs(request):
    # get all jobs from the DB
    jobs = Job.objects.all()
    return render(request, 'jobs.html', {'jobs': jobs})

For this project we will use function-based views; the alternative is class-based views, which are not part of this discussion (a brief sketch of the equivalent class-based view follows below). We query the database for all the jobs and respond to the request by specifying the template that will render the jobs, passing the jobs along in the context.
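For comparison only, and not used in the rest of this article, the same listing could be written as a class-based view with Django’s generic ListView; a minimal sketch:

# jobs_board_main/views.py (alternative sketch, not used below)
from django.views.generic import ListView

from .models import Job


class JobListView(ListView):
    model = Job
    template_name = 'jobs.html'
    context_object_name = 'jobs'  # lets the template keep iterating over "jobs"

In urls.py this would be wired up with JobListView.as_view() instead of the bare function.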

Django ships with its own template engine (the Django template language, whose syntax is similar to Jinja), which we will use to create the HTML files that will be rendered to the end user. In our jobs_board_main application, we will create a templates folder that will host all the HTML files that we render to end users.

The template to render all the jobs will display all jobs with links to individual job postings, as follows:

<!-- jobs_board_main/templates/jobs.html -->
<!DOCTYPE html>
<html>
  <head>
    <title>Jobs Board Homepage</title>
  </head>
  <body>
    <h2> Welcome to the Jobs board </h2>

    {% for job in jobs %}
      <div>
        <a href="/jobs/{{ job.id }}">{{ job.title }} at {{ job.company }}</a>
        <p>
          {{ job.details }}
        </p>
      </div>
    {% endfor %}

  </body>
</html>

We have created the Job model, the get_jobs view to fetch and display all the jobs, and the template to render the jobs listing. To bring all this work together, we have to create an endpoint from which the jobs will be accessible, and we do so by creating a urls.py file in our jobs_board_main application:

# jobs_board_main/urls.py

from django.urls import path
from .views import get_jobs

urlpatterns = [
    # All jobs
    path('jobs/', get_jobs, name="jobs_view"),
]

In this file, we import our view, create a path, and attach our view to it. We now register our application’s URLs in the main urls.py file in the jobs_board project folder:

# jobs_board/urls.py
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('jobs_board_main.urls')), # <--- Add this line
]

Our project is ready to be tested now. This is what we get when we run the application and navigate to localhost:8000/jobs:

jobs board landing 1

We currently have no jobs in place. Django ships with an administration application that we can use to perform our data entry. First, we start by creating a superuser:

create superuser
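The screenshot above corresponds to running Django’s createsuperuser management command, which prompts for a username, email address, and password:

$   python manage.py createsuperuser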

With the superuser created, we need to register our models in the admin.py file in our jobs_board_main application:

# jobs_board_main/admin.py
from django.contrib import admin
from .models import Job

# Register your models here.
admin.site.register(Job)

We restart our application and navigate to localhost:8000/admin and log in with the credentials we just set up. This is the result:

jobs board admin 1

When we click on the plus sign on the "Jobs" row, we get a form where we fill in details about our job posting:

jobs data entry

When we save the job and navigate back to the jobs endpoint, we are greeted by the job posting that we have just created:

jobs board landing 2

We will now create the views, templates, and URLs to display a single job and also allow users to subscribe by submitting their email.

Our jobs_board_main/views.py will be extended as follows:

# jobs_board_main/views.py
# previous code remains
from .models import Job, Subscriber, Subscription


def get_job(request, id):
    job = Job.objects.get(pk=id)
    return render(request, 'job.html', {'job': job})


def subscribe(request, id):
    job = Job.objects.get(pk=id)
    sub = Subscriber(email=request.POST['email'])
    sub.save()

    subscription = Subscription(user=sub, job=job)
    subscription.save()

    payload = {
      'job': job,
      'email': request.POST['email']
    }
    return render(request, 'subscribed.html', {'payload': payload})

We will also need to create the template for a single view of a job posting in templates/job.html, which includes the form that will take in a user’s email and subscribe them to the job posting:

<!-- jobs_board_main/templates/job.html -->
<html>
  <head>
    <title>Jobs Board - {{ job.title }}</title>
  </head>
  <body>
      <div>
        <h3>{{ job.title }} at {{ job.company }}</h3>
        <p>
          {{ job.details }}
        </p>
        <br>
        <p>Subscribe to this job posting by submitting your email</p>
        <form action="/jobs/{{ job.id }}/subscribe" method="POST">
          {% csrf_token %}
          <input type="email" name="email" id="email" placeholder="Enter your email"/>
          <input type="submit" value="Subscribe">
        </form>
        <hr>
      </div>
  </body>
</html>

Once a user subscribes to a job, we will show them a confirmation page, whose subscribed.html template will be as follows:

<!-- jobs_board_main/templates/subscribed.html -->
<!DOCTYPE html>
<html>
  <head>
    <title>Jobs Board - Subscribed</title>
  </head>
  <body>
      <div>
        <h3>Subscription confirmed!</h3>
        <p>
          Dear {{ payload.email }}, thank you for subscribing to {{ payload.job.title }}
        </p>
      </div>
  </body>
</html>

Finally, our new functionality needs to be exposed via endpoints, which we append to our existing jobs_board_main/urls.py as follows:

# jobs_board_main/urls.py
from .views import get_jobs, get_job, subscribe

urlpatterns = [
    # All jobs
    path('jobs/', get_jobs, name="jobs_view"),
    path('jobs/<int:id>', get_job, name="job_view"),
    path('jobs/<int:id>/subscribe', subscribe, name="subscribe_view"),
]

We can now test our main Jobs Board application by viewing the job posts listings, clicking on one, and submitting an email address that will receive updates.

Now that we have a working application, it is time to bring in Django Signals and notify users and subscribers when certain events take place. Job postings are tied to a certain company whose email we record; we want to notify the company when a new user subscribes to its job posting. We also want to notify subscribed users when a job posting is taken down.

To notify users when a job posting is taken down or deleted, we will utilize Django’s built-in pre_delete signal. We will also create our own signal, called new_subscriber, that we will use to notify companies when users subscribe to their job postings.

We create our custom signals by creating a signals.py file in our jobs_board_main application:

# jobs_board_main/signals.py
from django.dispatch import Signal

new_subscriber = Signal(providing_args=["job", "subscriber"])

That’s it! Our custom signal is ready to be sent after a user has successfully subscribed to a job posting, as follows in our jobs_board_main/views.py file:

# jobs_board_main/views.py

# Existing imports and code are maintained and truncated for brevity
from .signals import new_subscriber


def subscribe(request, id):
    job = Job.objects.get(pk=id)
    subscriber = Subscriber(email=request.POST['email'])
    subscriber.save()

    subscription = Subscription(user=subscriber, job=job, email=subscriber.email)
    subscription.save()

    # Send our custom signal. The sender must be the Subscription class itself,
    # since that is the sender the receiver is registered against in the
    # notifications app below.
    new_subscriber.send(sender=Subscription, job=job, subscriber=subscriber)

    payload = {
      'job': job,
      'email': request.POST['email']
    }
    return render(request, 'subscribed.html', {'payload': payload})

We don’t have to worry about the pre_delete signal, as Django will send it for us automatically just before a job posting is deleted. The reason we are using pre_delete and not post_delete is that, when a Job is deleted, all linked subscriptions are deleted along with it (through the CASCADE on the foreign key), and we need that data before it is gone.

Let us now consume the signals that we have just sent in a separate jobs_board_notifications app.

Part 2: The Jobs Board Notifications App

We already created the jobs_board_notifications application and connected it to our Django project. In this section, we will consume the signals sent from our main application and send out the notifications. Django has built-in functionality to send out emails, but for development purposes, we will print the messages to the console instead.
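If you would rather exercise Django’s actual email machinery during development instead of plain print() calls, one option (an alternative to what this article does) is the console email backend, which writes outgoing emails to the console; it is enabled with a single standard setting:

# jobs_board/settings.py (optional alternative to printing messages ourselves)
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'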

Our jobs_board_notifications application does not need user interaction, therefore we do not need to create any views or templates for it. Its only goal is to receive signals and send out notifications. We will implement this functionality in its models.py, since that module gets imported early when the application starts.

Let us receive our signals in our jobs_board_notifications/models.py:

# jobs_board_notifications/models.py.
from django.db.models.signals import pre_delete
from django.dispatch import receiver

from jobs_board_main.signals import new_subscriber
from jobs_board_main.models import Job, Subscriber, Subscription


@receiver(new_subscriber, sender=Subscription)
def handle_new_subscription(sender, **kwargs):
    subscriber = kwargs['subscriber']
    job = kwargs['job']

    message = """User {} has just subscribed to the Job {}.
    """.format(subscriber.email, job.title)

    print(message)


@receiver(pre_delete, sender=Job)
def handle_deleted_job_posting(**kwargs):
    job = kwargs['instance']

    # Find the subscribers list
    subscribers = Subscription.objects.filter(job=job)

    for subscriber in subscribers:
        message = """Dear {}, the job posting {} by {} has been taken down.
        """.format(subscriber.email, job.title, job.company)

        print(message)

In our jobs_board_notifications app, we import our custom signal, the pre_delete signal, and our models. Using the @receiver decorator, we capture the signals and the contextual data passed with them as keyword arguments.

Upon receiving the contextual data, we use it to dispatch the "emails" (recall that we’re just printing to the console for the sake of simplicity): one to the company when a user subscribes, and one to each subscriber when a job posting is deleted, in response to the signals we sent out.
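If you later want real emails instead of console output, the print() call in handle_new_subscription could be swapped for Django’s built-in send_mail helper. A minimal sketch, assuming a hypothetical from address (the @receiver registration stays the same as above):

# Sketch only: emailing the company instead of printing to the console
from django.core.mail import send_mail


def handle_new_subscription(sender, job=None, subscriber=None, **kwargs):
    send_mail(
        subject='New subscriber for {}'.format(job.title),
        message='User {} has just subscribed to the Job {}.'.format(subscriber.email, job.title),
        from_email='noreply@jobsboard.example',  # hypothetical sender address
        recipient_list=[job.company_email],
        fail_silently=False,
    )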

Testing

Once we have created a job in our admin dashboard, it is available for users to subscribe to. When a user subscribes, the following email is sent out from the jobs_board_notifications application to the company that owns the posting:

job subscribe message

This is proof that our new_subscriber signal was sent out from the jobs_board_main application and received by the jobs_board_notifications application.

When a job posting is deleted, all users who subscribed to the job posting get notified via email, as follows:

job deleted email

Django’s pre_delete signal came in handy and our handler sent out notifications to the subscribed users that the particular job posting has been taken down.

Summary

In this article we have built a Django project with two applications which communicate through Django Signals in response to certain events. Our two applications are decoupled and the complexity in communication between our applications has been greatly reduced. When a user subscribes to a job posting, we notify the company. In turn, when a job posting has been deleted, we notify all subscribed customers that the job posting has been taken down.

However, there are some things we should have in mind when utilizing Django Signals. When signals are not well documented, new maintainers may have a hard time identifying the root cause of certain issues or unexpected behavior. Therefore, when signals are used in an application, it is a good idea to document the signals used, where they are received, and the reason behind them. This will help anyone maintaining the code to understand application behavior and solve issues faster and better. Also, it is helpful to note that signals are sent out synchronously. They are not executed in the background or by any asynchronous jobs.
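One small habit that helps with the documentation concern above (an addition beyond the article’s code) is to give each connection an explicit dispatch_uid. It names the hookup and also guards against the same receiver being connected twice if the module happens to be imported more than once:

# Sketch: an explicitly named receiver connection
from django.db.models.signals import pre_delete
from django.dispatch import receiver

from jobs_board_main.models import Job


@receiver(pre_delete, sender=Job, dispatch_uid="notify_subscribers_on_job_delete")
def handle_deleted_job_posting(sender, **kwargs):
    ...  # same notification logic as before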

With all this information about Django’s Signals and the demo project, we should be able to harness the power of Signals in our Django web projects.

The source code for this project is available here on Github.


Planet Python