Erik Marsja: How to Read and Write JSON Files using Python and Pandas

In this post, we will learn how to read and write JSON files using Python. In the first part, we are going to use the Python package json to create a JSON file and write it to disk. In the next part, we are going to use the Pandas read_json method to load JSON files into a Pandas dataframe. Here, we will learn how to read a JSON file locally and from a URL, as well as how to read a nested JSON file using Pandas.

Finally, as a bonus, we will also learn how to manipulate data in Pandas dataframes, rename columns, and plot the data using Seaborn.

What is a JSON File?

JSON, short for JavaScript Object Notation, is a compact, text-based format used to exchange data. The format is commonly used for downloading and storing information from web servers via so-called Web APIs. Because JSON is text-based, we will recognize the structure when opening a JSON file: it is not so different from the structure of a Python dictionary.
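To make the comparison with dictionaries concrete, here is a minimal sketch using the standard json module. The sample record is made up for illustration; it only shows how JSON objects, arrays, and null map to Python dicts, lists, and None.

```python
import json

# A small JSON document as text (hypothetical sample data).
text = '{"Name": "Erik", "Salary": 723.3, "Skills": ["Python", "R"], "Manager": null}'

# json.loads parses JSON text into Python objects.
record = json.loads(text)

print(type(record).__name__)  # a JSON object becomes a dict
print(record["Skills"])       # a JSON array becomes a list
print(record["Manager"])      # JSON null becomes None

# json.dumps goes the other way: Python objects to JSON text.
round_trip = json.dumps(record)
print(round_trip)
```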

Example JSON file

In the first example, we are going to use the Python module json to create a JSON file. After we’ve done that, we are going to load the JSON file. In this Python JSON tutorial, we start by creating a dictionary for our data:

import json

data = {"Sub_ID": ["1", "2", "3", "4", "5", "6", "7", "8"],
        "Name": ["Erik", "Daniel", "Michael", "Sven",
                 "Gary", "Carol", "Lisa", "Elisabeth"],
        "Salary": ["723.3", "515.2", "621", "731",
                   "844.15", "558", "642.8", "732.5"],
        "StartDate": ["1/1/2011", "7/23/2013", "12/15/2011",
                      "6/11/2013", "3/27/2011", "5/21/2012",
                      "7/30/2013", "6/17/2014"],
        "Department": ["IT", "Management", "IT", "HR",
                       "Finance", "IT", "Management", "IT"],
        "Sex": ["M", "M", "M",
                "M", "M", "F", "F", "F"]}

print(data)

Saving to a JSON file

In Python, the json module enables us to read and write content to and from a JSON file. This module converts the JSON format to Python’s built-in data structures, so we can work with JSON structures just as we usually do with Python’s own data structures.

Python JSON Example:

In the example code below, we start by importing the json module. After we’ve done that, we open up a new file and use the dump method to write a JSON file using Python.

import json

with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
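Reading the file back with the json module is the mirror image of writing it. The sketch below writes a shortened version of the dictionary and loads it again with json.load. The temporary directory and the shortened data are assumptions made for this example, so that nothing in the working directory is overwritten.

```python
import json
import os
import tempfile

# Shortened, hypothetical version of the dictionary used in this post.
data = {"Sub_ID": ["1", "2"], "Name": ["Erik", "Daniel"]}

# Write into a temporary directory instead of the current one.
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w") as outfile:
    json.dump(data, outfile)

# json.load reads JSON from an open file and returns Python objects.
with open(path) as infile:
    loaded = json.load(infile)

print(loaded == data)
```

Since the dictionary contains only strings and lists, the loaded object compares equal to the original.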

How to Use Pandas to Load a JSON File

Now, if we are going to work with the data, we might want to use Pandas to load the JSON file into a Pandas dataframe. This will enable us to manipulate data, compute summary statistics, and visualize data using Pandas built-in methods. Note that we will cover this briefly later in this post.

Pandas Read Json Example:

In the next example, we are going to use the Pandas read_json method to read the JSON file we wrote earlier (i.e., data.json). It’s fairly simple; we start by importing pandas as pd:

import pandas as pd

df = pd.read_json('data.json')

df

The output, when working with Jupyter Notebooks, will look like this:

Data Manipulation using Pandas

Now that we have loaded the JSON file into a Pandas dataframe, we are going to use the inplace parameter to modify our dataframe directly. We start by setting the Sub_ID column as index.

df.set_index('Sub_ID', inplace=True)
df

Pandas JSON to CSV Example

Now that we have loaded a JSON file into a dataframe, we may want to save it in another format. For instance, we may want to save it as a CSV file, and we can do that using the Pandas to_csv method. Storing the data as CSV may be useful if we prefer to browse through it in a text editor or Excel.

In the Pandas JSON to CSV example below, we save the dataframe, with Sub_ID already set as index, to a CSV file.

df.to_csv("data.csv")
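For comparison, the same kind of JSON to CSV conversion can be sketched without Pandas, using only the standard csv module. This is not what to_csv does internally; it is a small illustration on a shortened, hypothetical version of the data, written to an in-memory buffer instead of a file.

```python
import csv
import io

# Column-oriented data, shaped like the dictionary used in this post (shortened).
data = {"Sub_ID": ["1", "2", "3"],
        "Name": ["Erik", "Daniel", "Michael"],
        "Department": ["IT", "Management", "IT"]}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# The dictionary keys become the header row.
writer.writerow(data.keys())

# zip(*columns) turns the columns into rows.
writer.writerows(zip(*data.values()))

csv_text = buffer.getvalue()
print(csv_text)
```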

Learn more about working with CSV files using Pandas in the  Pandas Read CSV Tutorial

How to Load JSON from a URL

We have now seen how easy it is to create a JSON file, write it to our hard drive using Python, and, finally, read it using Pandas. However, as previously mentioned, data stored in the JSON format is often found on the web.

Thus, in this section of the Python JSON guide, we are going to learn how to use the Pandas read_json method to read a JSON file from a URL. Most often, it’s fairly simple; we just create a string variable pointing to the URL:

url = "https://api.exchangerate-api.com/v4/latest/USD"
df = pd.read_json(url)
df.head()

Load JSON from a URL Second Example

When loading some data, Pandas read_json creates a dataframe with dictionaries within each cell. One way to deal with these dictionaries, nested within dictionaries, is to work with the Python module requests. This module also has a method for parsing JSON. After we have parsed the JSON, we will use the function json_normalize to convert the result to a dataframe.

import pandas as pd
import requests

url = "https://think.cs.vt.edu/corgis/json/airlines/airlines.json"
resp = requests.get(url=url)

# json_normalize is available at the top level of pandas since version 1.0
df = pd.json_normalize(resp.json())
df.head()
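To see what json_normalize does without downloading anything, the following sketch flattens two hand-written records. The records are hypothetical and only mimic a nested shape; the real airlines file may use different keys and contains many more fields.

```python
import pandas as pd

# Hypothetical nested records (not taken from the real airlines file).
records = [
    {"carrier": {"code": "AA"}, "time": {"year": 2003},
     "statistics": {"flights": {"cancelled": 5}}},
    {"carrier": {"code": "DL"}, "time": {"year": 2004},
     "statistics": {"flights": {"cancelled": 3}}},
]

# Each nested key path becomes one column, with the keys joined by dots.
flat = pd.json_normalize(records)

print(list(flat.columns))
print(flat)
```

The dotted column names produced here are the reason the nested data ends up with long column names after normalization.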

As the output shows, the column names are quite long. This is impractical when we create a time series plot later using Seaborn, so we are now going to rename the columns to make them easier to use.

In the code example below, we use Pandas rename method together with the Python module re. That is, we are using a regular expression to remove “statistics.# of” and “statistics.” from the column names. Finally, we are also replacing dots (“.”) with underscores (“_”) using the str.replace method:

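import re

# The dots are escaped so they match a literal "." only
df.rename(columns=lambda x: re.sub(r"statistics\.# of", "", x),
          inplace=True)
df.rename(columns=lambda x: re.sub(r"statistics\.", "", x),
          inplace=True)

# regex=False treats "." as a literal dot
df.columns = df.columns.str.replace(".", "_", regex=False)
df.head()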
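The renaming steps can be tried on plain strings before applying them to a dataframe. The column names below are assumptions chosen to resemble dotted names from a normalized nested JSON file; the strip call is an addition for this sketch, removing the space that is left over once the “statistics.# of” prefix is gone.

```python
import re

# Hypothetical column names, as they might look after json_normalize.
columns = ["statistics.# of delays.carrier",
           "statistics.flights.cancelled",
           "time.year",
           "carrier.code"]


def clean(name):
    """Apply the renaming steps from this section to one column name."""
    name = re.sub(r"statistics\.# of", "", name)  # drop the longer prefix first
    name = re.sub(r"statistics\.", "", name)      # then the shorter one
    name = name.strip()                           # remove the leftover space
    return name.replace(".", "_")                 # dots become underscores


cleaned = [clean(name) for name in columns]
print(cleaned)
```

Removing the longer prefix first matters: if “statistics.” were removed first, the pattern “statistics.# of” would no longer match anything.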

 

Time Series Plot from JSON Data using Seaborn

In the last example in this post, we are going to use Seaborn to create a time series plot. The data we loaded from JSON into a dataframe contains data about delayed and canceled flights. We are going to use Seaborn’s lineplot method to create a time series plot of the number of canceled flights from 2003 to 2016, grouped by carrier code.

%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(figsize=(10, 7))

# Column names follow the renaming step (dots replaced with underscores).
# ci=None turns off the confidence band; newer Seaborn versions prefer errorbar=None.
g = sns.lineplot(x="time_year", y="flights_cancelled", ci=None,
                 hue="carrier_code", data=df)

g.set_ylabel("Flights Cancelled", fontsize=20)
g.set_xlabel("Year", fontsize=20)

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Note, we changed the font size as well as the x- and y-axis’ labels using the methods set_ylabel and set_xlabel. Furthermore, we also moved the legend using the legend method from matplotlib.

Conclusion

In this post, we have learned how to write a JSON file from a Python dictionary and how to load that JSON file using Python and Pandas. Furthermore, we have learned how to use Pandas to load a JSON file from a URL into a dataframe and how to read a nested JSON file into a dataframe.

Here’s a link to a Jupyter Notebook containing all code examples in this post.

The post How to Read and Write JSON Files using Python and Pandas appeared first on Erik Marsja.

Planet Python

How To Configure a Galera Cluster with MariaDB on Debian 9 Servers

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Clustering adds high availability to your database by distributing changes to different servers. In the event that one of the instances fails, others are quickly available to continue serving.

Clusters come in two general configurations, active-passive and active-active. In active-passive clusters, all writes are done on a single active server and then copied to one or more passive servers that are poised to take over only in the event of an active server failure. Some active-passive clusters also allow SELECT operations on passive nodes. In an active-active cluster, every node is read-write and a change made to one is replicated to all.

MariaDB is an open source relational database system that is fully compatible with the popular MySQL RDBMS. You can read the official documentation for MariaDB at this page. Galera is a database clustering solution that enables you to set up multi-master clusters using synchronous replication. Galera automatically keeps the data on different nodes in sync while allowing you to send read and write queries to any of the nodes in the cluster. You can learn more about Galera at the official documentation page.

In this guide, you will configure an active-active MariaDB Galera cluster. For demonstration purposes, you will configure and test three Debian 9 Droplets that will act as nodes in the cluster. This is the smallest configurable cluster.

Prerequisites

To follow along, you will need a DigitalOcean account, in addition to the following:

  • Three Debian 9 Droplets with private networking enabled, each with a non-root user with sudo privileges.

While the steps in this tutorial have been written for and tested against DigitalOcean Droplets, many of them should also be applicable to non-DigitalOcean servers with private networking enabled.

Step 1 — Adding the MariaDB Repositories to All Servers

In this step, you will add the relevant MariaDB package repositories to each of your three servers so that you will be able to install the right version of MariaDB used in this tutorial. Once the repositories are updated on all three servers, you will be ready to install MariaDB.

One thing to note about MariaDB is that it originated as a drop-in replacement for MySQL, so in many configuration files and startup scripts, you’ll see mysql rather than mariadb. For consistency’s sake, we will use mysql in this guide where either could work.

In this tutorial, you will use MariaDB version 10.4. Since this version isn’t included in the default Debian repositories, you’ll start by adding the external Debian repository maintained by the MariaDB project to all three of your servers.

To add the repository, you will first need to install the dirmngr and software-properties-common packages. dirmngr is a server for managing repository certificates and keys. software-properties-common is a package that allows easy addition and updates of source repository locations. Install the two packages by running:

  • sudo apt install dirmngr software-properties-common

Note: MariaDB is a well-respected provider, but not all external repositories are reliable. Be sure to install only from trusted sources.

You’ll add the MariaDB repository key with the apt-key command, which the APT package manager will use to verify that the package is authentic:

  • sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8

Once you have the trusted key in the database, you can add the repository with the following command:

  • sudo add-apt-repository 'deb [arch=amd64] http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.4/debian stretch main'

After adding the repository, run apt update in order to include package manifests from the new repository:

  • sudo apt update

Once you have completed this step on your first server, repeat for your second and third servers.

Now that you have successfully added the package repository on all three of your servers, you’re ready to install MariaDB in the next section.

Step 2 — Installing MariaDB on All Servers

In this step, you will install the actual MariaDB packages on your three servers.

Beginning with version 10.1, the MariaDB Server and MariaDB Galera Server packages are combined, so installing mariadb-server will automatically install Galera and several dependencies:

  • sudo apt install mariadb-server

You will be asked to confirm whether you would like to proceed with the installation. Enter yes to continue with the installation.

From MariaDB version 10.4 onwards, the root MariaDB user does not have a password by default. To set a password for the root user, start by logging into MariaDB:

  • sudo mysql -uroot

Once you’re inside the MariaDB shell, change the password by executing the following statement:

  • set password = password("your_password");

You will see the following output indicating that the password was set correctly:

Output
Query OK, 0 rows affected (0.001 sec)

Exit the MariaDB shell by running the following command:

  • quit;

If you would like to learn more about SQL or need a quick refresher, check out our MySQL tutorial.

You now have all of the pieces necessary to begin configuring the cluster, but since you’ll be relying on rsync in later steps, make sure it’s installed:

  • sudo apt install rsync

This will confirm that the newest version of rsync is already available or prompt you to upgrade or install it.

Once you have installed MariaDB and set the root password on your first server, repeat these steps for your other two servers.

Now that you have installed MariaDB successfully on each of the three servers, you can proceed to the configuration step in the next section.

Step 3 — Configuring the First Node

In this step you will configure your first node. Each node in the cluster needs to have a nearly identical configuration. Because of this, you will do all of the configuration on your first machine, and then copy it to the other nodes.

By default, MariaDB is configured to check the /etc/mysql/conf.d directory to get additional configuration settings from files ending in .cnf. Create a file in this directory with all of your cluster-specific directives:

  • sudo nano /etc/mysql/conf.d/galera.cnf

Add the following configuration into the file. The configuration specifies different cluster options, details about the current server and the other servers in the cluster, and replication-related settings. Note that the IP addresses in the configuration are the private addresses of your respective servers; replace the highlighted lines with the appropriate IP addresses.

/etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="test_cluster"
wsrep_cluster_address="gcomm://First_Node_IP,Second_Node_IP,Third_Node_IP"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="This_Node_IP"
wsrep_node_name="This_Node_Name"

  • The first section modifies or re-asserts MariaDB/MySQL settings that will allow the cluster to function correctly. For example, Galera won’t work with MyISAM or similar non-transactional storage engines, and mysqld must not be bound to the IP address for localhost. You can learn about the settings in more detail on the Galera Cluster system configuration page.
  • The “Galera Provider Configuration” section configures the MariaDB components that provide a WriteSet replication API. This means Galera in your case, since Galera is a wsrep (WriteSet Replication) provider. You specify the general parameters to configure the initial replication environment. This doesn’t require any customization, but you can learn more about Galera configuration options.
  • The “Galera Cluster Configuration” section defines the cluster, identifying the cluster members by IP address or resolvable domain name and creating a name for the cluster to ensure that members join the correct group. You can change the wsrep_cluster_name to something more meaningful than test_cluster or leave it as-is, but you must update wsrep_cluster_address with the private IP addresses of your three servers.
  • The “Galera Synchronization Configuration” section defines how the cluster will communicate and synchronize data between members. This is used only for the state transfer that happens when a node comes online. For your initial setup, you are using rsync, because it’s commonly available and does what you’ll need for now.
  • The “Galera Node Configuration” section clarifies the IP address and the name of the current server. This is helpful when trying to diagnose problems in logs and for referencing each server in multiple ways. The wsrep_node_address must match the address of the machine you’re on, but you can choose any name you want in order to help you identify the node in log files.

When you are satisfied with your cluster configuration file, copy the contents to your clipboard, then save and close the file. With the nano text editor, you can do this by pressing CTRL+X, typing y, and pressing ENTER.

Now that you have configured your first node successfully, you can move on to configuring the remaining nodes in the next section.

Step 4 — Configuring the Remaining Nodes

In this step, you will configure the remaining two nodes. On your second node, open the configuration file:

  • sudo nano /etc/mysql/conf.d/galera.cnf

Paste in the configuration you copied from the first node, then update the Galera Node Configuration to use the IP address or resolvable domain name for the specific node you’re setting up. Finally, update its name, which you can set to whatever helps you identify the node in your log files:

/etc/mysql/conf.d/galera.cnf
. . .
# Galera Node Configuration
wsrep_node_address="This_Node_IP"
wsrep_node_name="This_Node_Name"
. . .

Save and exit the file.

Once you have completed these steps, repeat them on the third node.
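Because the three files differ only in the node address and node name, they can also be generated from one template. The following Python sketch shows that idea under some assumptions: the IP addresses and node names are placeholders, and the function name render_galera_cnf is made up for this example. It only produces the text; copying it to /etc/mysql/conf.d/galera.cnf on each server is still up to you.

```python
def render_galera_cnf(cluster_ips, node_ip, node_name, cluster_name="test_cluster"):
    """Return the text of galera.cnf for one node of the cluster."""
    if node_ip not in cluster_ips:
        raise ValueError(f"{node_ip} is not listed in cluster_ips")
    address = ",".join(cluster_ips)
    lines = [
        "[mysqld]",
        "binlog_format=ROW",
        "default-storage-engine=innodb",
        "innodb_autoinc_lock_mode=2",
        "bind-address=0.0.0.0",
        "",
        "# Galera Provider Configuration",
        "wsrep_on=ON",
        "wsrep_provider=/usr/lib/galera/libgalera_smm.so",
        "",
        "# Galera Cluster Configuration",
        f'wsrep_cluster_name="{cluster_name}"',
        f'wsrep_cluster_address="gcomm://{address}"',
        "",
        "# Galera Synchronization Configuration",
        "wsrep_sst_method=rsync",
        "",
        "# Galera Node Configuration",
        f'wsrep_node_address="{node_ip}"',
        f'wsrep_node_name="{node_name}"',
    ]
    return "\n".join(lines) + "\n"


# Placeholder private IPs and names; replace them with your own.
nodes = {
    "galera-node-01": "10.0.0.1",
    "galera-node-02": "10.0.0.2",
    "galera-node-03": "10.0.0.3",
}
ips = list(nodes.values())

# One configuration text per node; only the last two lines differ.
configs = {name: render_galera_cnf(ips, ip, name) for name, ip in nodes.items()}
print(configs["galera-node-02"])
```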

You’re almost ready to bring up the cluster, but before you do, make sure that the appropriate ports are open in your firewall.

Step 5 — Opening the Firewall on Every Server

In this step, you will configure your firewall so that the ports required for inter-node communication are open. On every server, check the status of the firewall by running:

  • sudo ufw status

In this case, only SSH is allowed through:

Output
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
OpenSSH (v6)               ALLOW       Anywhere (v6)

Since only SSH traffic is permitted in this case, you’ll need to add rules for MySQL and Galera traffic. If you tried to start the cluster, it would fail because of firewall rules.

Galera can make use of four ports:

  • 3306 For MySQL client connections and State Snapshot Transfer that use the mysqldump method.
  • 4567 For Galera Cluster replication traffic. Multicast replication uses both UDP transport and TCP on this port.
  • 4568 For Incremental State Transfer.
  • 4444 For all other State Snapshot Transfer.

In this example, you’ll open all four ports while you do your setup. Once you’ve confirmed that replication is working, you’d want to close any ports you’re not actually using and restrict traffic to just servers in the cluster.

Open the ports with the following commands:

  • sudo ufw allow 3306,4567,4568,4444/tcp
  • sudo ufw allow 4567/udp

Note: Depending on what else is running on your servers you might want to restrict access right away. The UFW Essentials: Common Firewall Rules and Commands guide can help with this.

After you have configured your firewall on the first node, create the same firewall settings on the second and third node.
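Once the cluster is running, you may want a quick way to confirm from one node that another node accepts TCP connections on these ports. The Python sketch below is one hedged way to do that with the standard socket module; the function names are made up for this example. Note its limits: it reports a port as closed when nothing is listening on it, even if the firewall allows the traffic, and it does not test UDP on port 4567.

```python
import socket

# TCP ports listed in this step.
GALERA_TCP_PORTS = (3306, 4567, 4568, 4444)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host, ports=GALERA_TCP_PORTS):
    """Map each port to True or False for one host."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, calling check_node with the private IP address of your second node from the first node returns a dictionary such as {3306: True, ...} for the ports that accepted a connection.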

Now that you have configured the firewalls successfully, you’re ready to start the cluster in the next step.

Step 6 — Starting the Cluster

In this step, you will start your MariaDB cluster. To begin, you need to stop the running MariaDB service so that you can bring your cluster online.

Stop MariaDB on All Three Servers

Use the following command on all three servers to stop MariaDB so that you can bring them back up in a cluster:

  • sudo systemctl stop mysql

systemctl doesn’t display the outcome of all service management commands, so to be sure you succeeded, use the following command:

  • sudo systemctl status mysql

If the last line looks something like the following, the command was successful:

Output
. . .
Apr 26 03:34:23 galera-node-01 systemd[1]: Stopped MariaDB 10.4.4 database server.

Once you’ve shut down mysql on all of the servers, you’re ready to proceed.

Bring Up the First Node

To bring up the first node, you’ll need to use a special startup script. The way you’ve configured your cluster, each node that comes online tries to connect to at least one other node specified in its galera.cnf file to get its initial state. Without using the galera_new_cluster script that allows systemd to pass the --wsrep-new-cluster parameter, a normal systemctl start mysql would fail because there are no nodes running for the first node to connect with.

  • sudo galera_new_cluster

This command will not display any output on successful execution. When this script succeeds, the node is registered as part of the cluster, and you can see it with the following command:

  • mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"

You will see the following output indicating that there is one node in the cluster:

Output
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+

On the remaining nodes, you can start mysql normally. They will search for any member of the cluster list that is online, so when they find one, they will join the cluster.

Bring Up the Second Node

Now you can bring up the second node. Start mysql:

  • sudo systemctl start mysql

No output will be displayed on successful execution. You will see your cluster size increase as each node comes online:

  • mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"

You will see the following output indicating that the second node has joined the cluster and that there are two nodes in total.

Output
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+

Bring Up the Third Node

It’s now time to bring up the third node. Start mysql:

  • sudo systemctl start mysql

Run the following command to find the cluster size:

  • mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"

You will see the following output, which indicates that the third node has joined the cluster and that the total number of nodes in the cluster is three.

Output
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
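If you want to check the cluster size from a script instead of reading it by eye, the status output can be parsed. The following Python sketch is an illustration, not part of the MariaDB tooling; the function name is made up. It accepts the table layout shown above and also tab-separated rows, since the mysql client usually switches to that plainer format when its output is captured by another program instead of shown in a terminal.

```python
def parse_status_output(output):
    """Turn the output of a SHOW STATUS query into a dict of name -> value."""
    values = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("+"):
            continue  # skip blank lines and table borders
        if line.startswith("|"):
            cells = [cell.strip() for cell in line.strip("|").split("|")]
        else:
            cells = line.split("\t")  # tab-separated rows
        if len(cells) == 2 and cells[0] != "Variable_name":
            values[cells[0]] = cells[1]
    return values


# The table from this step, used as sample input.
sample = """\
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
"""

status = parse_status_output(sample)
cluster_size = int(status["wsrep_cluster_size"])
print(cluster_size)
```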

At this point, the entire cluster is online and communicating successfully. Next, you can verify the setup by testing replication.

Step 7 — Testing Replication

You’ve gone through the steps up to this point so that your cluster can perform replication from any node to any other node, known as active-active replication. Follow the steps below to test and see if the replication is working as expected.

Write to the First Node

You’ll start by making database changes on your first node. The following commands will create a database called playground and a table inside of this database called equipment.

  • mysql -u root -p -e 'CREATE DATABASE playground;
    CREATE TABLE playground.equipment ( id INT NOT NULL AUTO_INCREMENT, type VARCHAR(50), quant INT, color VARCHAR(25), PRIMARY KEY(id));
    INSERT INTO playground.equipment (type, quant, color) VALUES ("slide", 2, "blue");'

In the previous command, the CREATE DATABASE statement creates a database named playground. The CREATE TABLE statement creates a table named equipment inside the playground database, with an auto-incrementing identifier column called id and three other columns. The type, quant, and color columns store the type, quantity, and color of the equipment, respectively. The INSERT statement inserts an entry of type slide, quantity 2, and color blue.

You now have one row in your table.

Read and Write on the Second Node

Next, look at the second node to verify that replication is working:

  • mysql -u root -p -e 'SELECT * FROM playground.equipment;'

If replication is working, the data you entered on the first node will be visible here on the second:

Output
+----+-------+-------+-------+
| id | type  | quant | color |
+----+-------+-------+-------+
|  1 | slide |     2 | blue  |
+----+-------+-------+-------+

From this same node, you can write data to the cluster:

  • mysql -u root -p -e 'INSERT INTO playground.equipment (type, quant, color) VALUES ("swing", 10, "yellow");'

Read and Write on the Third Node

From the third node, you can read all of this data by querying the table again:

  • mysql -u root -p -e 'SELECT * FROM playground.equipment;'

You will see the following output showing the two rows:

Output
+----+-------+-------+--------+
| id | type  | quant | color  |
+----+-------+-------+--------+
|  1 | slide |     2 | blue   |
|  2 | swing |    10 | yellow |
+----+-------+-------+--------+

Again, you can add another value from this node:

  • mysql -u root -p -e 'INSERT INTO playground.equipment (type, quant, color) VALUES ("seesaw", 3, "green");'

Read on the First Node

Back on the first node, you can verify that your data is available everywhere:

  • mysql -u root -p -e 'SELECT * FROM playground.equipment;'

You will see the following output that indicates the rows are available on the first node.

Output
+----+--------+-------+--------+
| id | type   | quant | color  |
+----+--------+-------+--------+
|  1 | slide  |     2 | blue   |
|  2 | swing  |    10 | yellow |
|  3 | seesaw |     3 | green  |
+----+--------+-------+--------+

You’ve successfully verified that you can write to all of the nodes and that replication is being performed properly.

Conclusion

At this point, you have a working three-node Galera test cluster configured. If you plan on using a Galera cluster in a production situation, it’s recommended that you begin with no fewer than five nodes.

Before production use, you may want to take a look at some of the other state snapshot transfer (sst) agents like xtrabackup, which allows you to set up new nodes very quickly and without large interruptions to your active nodes. This does not affect the actual replication, but is a concern when nodes are being initialized.

DigitalOcean Community Tutorials

PYD88 – Snowflake Summit, San Francisco

This week’s episode of Podcast Your Data throws it back to the inaugural Snowflake Summit that occurred last month. Snowflake pulled out all the stops when hosting its first-ever conference and created the most unique and unforgettable experience for attendees. Listen in as Karl Young, Bill Barnes and InterWorks Data Practice Lead Brian Bickell discuss Summit product announcements, their favorite sessions and more:

 

interworks.com/blog/interworks/2…-cloud-analytics/

Subscribe to Podcast Your Data through iTunes, Stitcher, Pocket Casts or your favorite podcasting app.

The post PYD88 – Snowflake Summit, San Francisco appeared first on InterWorks.

InterWorks

Test and Code: 82: pytest – favorite features since 3.0 – Anthony Sottile

Anthony Sottile is a pytest core contributor, as well as a maintainer and contributor to many other projects. In this episode, Anthony shares some of the super cool features of pytest that have been added since he started using it.

We also discuss Anthony’s move from user to contributor, and how others can help with the pytest project.

Special Guest: Anthony Sottile.

Sponsored By:

  • Azure Pipelines: Many organizations and open source projects are using Azure Pipelines already. Get started for free at azure.com/pipelines (https://azure.com/pipelines)

Support Test & Code – Python Testing & Development (https://www.patreon.com/testpodcast)

Links:

  • pytest documentation: https://pytest.org/en/latest/
  • pytest Changelog: http://doc.pytest.org/en/latest/changelog.html
  • pytest API Reference: http://doc.pytest.org/en/latest/reference.html#
  • sponsor pytest: https://docs.pytest.org/en/latest/sponsor.html
  • getting started contributing to pytest: http://doc.pytest.org/en/latest/contributing.html
  • the book: Python Testing with pytest (the fastest way to learn pytest): https://amzn.to/2QnzvUv

Planet Python

How To Use PostgreSQL with Your Ruby on Rails Application on macOS

Introduction

When using the Ruby on Rails web framework, your application is set up by default to use SQLite as a database. SQLite is a lightweight, portable, and user-friendly relational database that performs especially well in low-memory environments, and will work well in many cases. However, for highly complex applications that need more reliable data integrity and programmatic extensibility, a PostgreSQL database will be a more robust and flexible choice. In order to configure your Ruby on Rails setup to use PostgreSQL, you will need to perform a few additional steps to get it up and running.

In this tutorial, you will set up a Ruby on Rails development environment connected to a PostgreSQL database on a local macOS machine. You will install and configure PostgreSQL, and then test your setup by creating a Rails application that uses PostgreSQL as its database server.

Prerequisites

This tutorial requires the following:

  • One computer or virtual machine with macOS installed, with administrative access to that machine and an internet connection. This tutorial has been tested on macOS 10.14 Mojave.

  • A Ruby on Rails development environment installed on your macOS machine. To set this up, follow our guide on How To Install Ruby on Rails with rbenv on macOS. This tutorial will use version 2.6.3 of Ruby and 5.2.3 of Rails; for information on the latest versions, check out the official sites for Ruby and Rails.

Step 1 — Installing PostgreSQL

In order to configure Ruby on Rails to create your web application with PostgreSQL as a database, you will first install the database onto your machine. Although there are many ways to install PostgreSQL on macOS, this tutorial will use the package manager Homebrew.

There are multiple Homebrew packages to install different versions of PostgreSQL. To install the latest version, run the following command:

  • brew install postgresql

If you would like to download a specific version of PostgreSQL, replace postgresql in the previous command with your desired package. You can find the available packages at the Homebrew website.

Next, include the PostgreSQL binary in your PATH variable in order to access the PostgreSQL command line tools, making sure to replace the 10 with the version number you are using:

  • echo 'export PATH="/usr/local/opt/postgresql@10/bin:$PATH"' >> ~/.bash_profile

Then, apply the changes you made to your ~/.bash_profile file to your current shell session:

  • source ~/.bash_profile

To start the service and enable it to start at login, run the following:

  • brew services start postgresql@10

Check to make sure the installation was successful:

  • postgres -V

You will get the following output:

Output
postgres (PostgreSQL) 10.9
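
If the version check fails with a "command not found" error, the PATH change has probably not taken effect in your current shell. As a minimal sketch, the following Ruby snippet searches each PATH entry for an executable, the same way the shell would. The helper name find_on_path is hypothetical and used only for illustration.

```ruby
# Hypothetical helper: return the full path of the first executable named
# `command` found in the PATH-style string `path`, or nil if none is found.
def find_on_path(command, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

location = find_on_path("postgres")
puts location ? "postgres found at #{location}" : "postgres is not on your PATH"
```

If the script reports that postgres is not on your PATH, check that the directory you added to ~/.bash_profile matches the PostgreSQL version you installed.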

Once PostgreSQL is installed, the next step is to create a role that your Rails application will use later to create your database.

Step 2 — Creating a Database Role for Your Application

In PostgreSQL, roles can be used to organize permissions and authorization. When starting PostgreSQL with Homebrew, you will automatically have a superuser role created with your macOS username. In order to keep these superuser privileges separate from the database instance you use for your Rails application, in this step you will create a new role with less access.

To create a new role, run the following command, replacing appname with whatever name you’d like to give the role:

  • createuser -P -d appname

In this command, you used createuser to create a role named appname. The -d flag gave the role the permission to create new databases.

You also specified the -P flag, which means you will be prompted to enter a password for your new role. Enter your desired password, making sure to record it so that you can use it in a configuration file in a future step.

If you did not use the -P flag and want to set a password for the role after you create it, enter the PostgreSQL console with the following command:

  • psql postgres

You will receive the following output, along with the prompt for the PostgreSQL console:

Output
psql (10.9)
Type "help" for help.

postgres=#

The PostgreSQL console is indicated by the postgres=# prompt. At the PostgreSQL prompt, enter this command to set the password for the new database role, replacing appname with the name of the role you created:

  • \password appname

PostgreSQL will prompt you for a password. Enter your desired password at the prompt, then confirm it.

Now, exit the PostgreSQL console by entering this command:

  • \q

Your usual prompt will now reappear.

In this step, you created a new PostgreSQL role without superuser privileges for your application. Now you are ready to create a new Rails app that uses this role to create a database.

Step 3 — Creating a New Rails Application

With your role configured for PostgreSQL, you can now create a new Rails application that is set up to use PostgreSQL as a database.

First, navigate to your home directory:

  • cd ~

Create a new Rails application in this directory, replacing appname with whatever you would like to call your app:

  • rails new appname -d=postgresql

The -d=postgresql option sets PostgreSQL as the database.

Once you’ve run this command, a new folder named appname will appear in your home directory, containing all the elements of a basic Rails application.

Next, move into the application’s directory:

  • cd appname

Now that you have created a new Rails application and have moved into the root directory for your project, you can configure and create your PostgreSQL database from within your Rails app.

Step 4 — Configuring and Creating Your Database

When creating the development and test databases for your application, Rails will use the PostgreSQL role that you created in Step 2. To make sure that Rails creates these databases, you will alter the database configuration file of your project. You will then create your databases.

One of the configuration changes to make in your Rails application is to add the password for the PostgreSQL role you created in Step 2. To keep sensitive information like passwords safe, it is a good idea to store this in an environment variable rather than to write it directly in your configuration file.

To store your password in an environment variable at login, run the following command, replacing APPNAME with the name of your app and PostgreSQL_Role_Password with the password you created in Step 2:

  • echo 'export APPNAME_DATABASE_PASSWORD="PostgreSQL_Role_Password"' >> ~/.bash_profile

This command writes the export command to your ~/.bash_profile file so that the environment variable will be set at login.

To export the variable for your current session, use the source command:

  • source ~/.bash_profile
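
Before editing the configuration file, it can help to confirm that Ruby actually sees the variable. The sketch below is illustrative only: database_password is a hypothetical helper, not part of Rails, and it assumes the variable is named APPNAME_DATABASE_PASSWORD as in the command above. It uses fetch so that a missing variable raises an error instead of silently returning nil.

```ruby
# Hypothetical helper, not part of Rails: look up the password and fail
# loudly if the variable is missing from the environment.
def database_password(env = ENV, key = "APPNAME_DATABASE_PASSWORD")
  env.fetch(key) do
    raise KeyError, "#{key} is not set; run `source ~/.bash_profile` or open a new terminal"
  end
end

begin
  database_password
  puts "APPNAME_DATABASE_PASSWORD is visible to Ruby"
rescue KeyError => e
  puts e.message
end
```

The script deliberately never prints the password itself, only whether it is present.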

Now that you have stored your password in your environment, it’s time to alter the configuration file.

Open your application’s database configuration file in your preferred text editor. This tutorial will use nano:

  • nano config/database.yml

Under the default section, find the line that says pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %> and add the username and password lines below it, filling in your role name and the environment variable you created. It should look something like this:

config/database.yml
...
default: &default
  adapter: postgresql
  encoding: unicode
  # For details on connection pooling, see Rails configuration guide
  # http://guides.rubyonrails.org/configuring.html#database-pooling
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  username: appname
  password: <%= ENV['APPNAME_DATABASE_PASSWORD'] %>

development:
  <<: *default
  database: appname_development
...

This will make the Rails application connect to the database with the correct role and password. Save and exit by pressing CTRL+X, Y, then ENTER.
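
The configuration file is a YAML file with embedded ERB tags, so the tags are evaluated before the YAML is parsed. The following standalone sketch approximates that two-step process using only Ruby's standard library. It is not how Rails itself is implemented: load_database_config is a hypothetical helper, and the password value set at the top is a stand-in for the variable exported in ~/.bash_profile.

```ruby
require "erb"
require "yaml"

# Stand-in for the variable exported in ~/.bash_profile (illustrative only).
ENV["APPNAME_DATABASE_PASSWORD"] ||= "example-password"

# A trimmed copy of the settings shown in config/database.yml above.
DATABASE_YML = <<~'YAML'
  default: &default
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    username: appname
    password: <%= ENV['APPNAME_DATABASE_PASSWORD'] %>

  development:
    <<: *default
    database: appname_development
YAML

# Hypothetical helper: evaluate the ERB tags, then parse the result as YAML.
# `aliases: true` is required so that `<<: *default` can be resolved.
def load_database_config(template)
  rendered = ERB.new(template).result
  YAML.safe_load(rendered, aliases: true)
end

development = load_database_config(DATABASE_YML).fetch("development")
puts "adapter:  #{development['adapter']}"
puts "username: #{development['username']}"
puts "database: #{development['database']}"
puts "password: #{development['password'].nil? ? 'missing' : 'set'}"
```

The development section inherits adapter, username, and password from default through the YAML merge key, which is why only the database name needs to be listed there.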

For more information on configuring databases in Rails, see the Rails documentation.

Now that you have made changes to config/database.yml, create your application’s databases by using the rails command:

  • rails db:create

Once Rails creates the database, you will receive the following output:

Output
Created database 'appname_development'
Created database 'appname_test'

As the output suggests, this command created a development and test database in your PostgreSQL server.

You now have a PostgreSQL database connected to your Rails app. To ensure that your application is working, the next step is to test your configuration.

Step 5 — Testing Your Configuration

To test that your application is able to use the PostgreSQL database, try to run your web application so that it will show up in a browser.

First, you’ll use the built-in web server for Rails, Puma, to serve your application. This web server comes with Rails automatically and requires no additional setup. To serve your application, run the following command:

  • rails server --binding=127.0.0.1

--binding binds your application to a specified IP. In the development environment, Rails binds to localhost by default, while in other environments it binds to 0.0.0.0, which means that Rails will listen on all interfaces. Passing 127.0.0.1 explicitly ensures that the server only accepts connections from your own machine. By default, the application listens on port 3000.
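
To illustrate the difference between the two addresses, the sketch below opens a plain TCP socket with Ruby's standard socket library and reports the address it is bound to. It is independent of Rails and Puma, and bound_address is a hypothetical helper name.

```ruby
require "socket"

# Hypothetical helper: bind a throwaway TCP server to `host` and return the
# address it is listening on. Port 0 asks the operating system for any free port.
def bound_address(host)
  server = TCPServer.new(host, 0)
  server.local_address.ip_address
ensure
  server&.close
end

# Loopback only: reachable from this machine alone.
puts bound_address("127.0.0.1")

# All IPv4 interfaces: reachable from other hosts on the network as well.
puts bound_address("0.0.0.0")
```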

Once your Rails app is running, your command prompt will disappear, replaced by this output:

Output
=> Booting Puma
=> Rails 5.2.3 application starting in development
=> Run `rails server -h` for more startup options
Puma starting in single mode...
* Version 3.12.1 (ruby 2.6.3-p62), codename: Llamas in Pajamas
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://127.0.0.1:3000
Use Ctrl-C to stop

To test if your application is running, open up a new terminal window on your machine and use the curl command to send a request to 127.0.0.1:3000:

  • curl http://127.0.0.1:3000

You will receive a lot of output in HTML, ending in something like:

Output
...
      <strong>Rails version:</strong> 5.2.3<br />
      <strong>Ruby version:</strong> 2.6.3 (x86_64-darwin18)
    </p>
  </section>
</div>
</body>
</html>
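
The same check can be scripted. As a rough sketch using Ruby's standard net/http library, the snippet below sends a GET request and reports whether the server answered with a success status. The helper name app_responding? and the two-second timeouts are assumptions chosen for illustration.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: return true if a GET request to `url` receives a
# 2xx response, and false if the server is unreachable or returns an error.
def app_responding?(url)
  uri = URI(url)
  path = uri.path.empty? ? "/" : uri.path
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get(path)
  end
  response.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Timeout::Error, EOFError
  # Connection refused, unknown host, timeout, or a dropped connection.
  false
end

if app_responding?("http://127.0.0.1:3000")
  puts "Rails is responding on port 3000"
else
  puts "No successful response from 127.0.0.1:3000; is the server running?"
end
```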

You can also access your Rails application in a local web browser by visiting:

http://127.0.0.1:3000 

At this URL, you will find a Ruby on Rails welcome page:

[Image: Ruby on Rails Welcome Page]

This means that your application is properly configured and connected to the PostgreSQL database.

Conclusion

In this tutorial, you created a Ruby on Rails web application that was configured to use PostgreSQL as a database on a local macOS machine. If you would like to learn more about the Ruby programming language, check out our How To Code in Ruby series.

For more information on choosing a database for your application, check out our tutorial on the differences between and use cases of SQLite, PostgreSQL, and MySQL. If you want to read more about how to use databases, see our An Introduction to Queries in PostgreSQL article, or explore DigitalOcean’s Managed Databases product.

DigitalOcean Community Tutorials