Stack Abuse: Doubly Linked List with Python Examples

This is the third article in the series of articles on implementing linked list with Python. In Part 1 and Part 2 of the series we studied single linked list in detail. In this article, we will start our discussion about doubly linked list, which is actually an extension of single linked list.

In single linked list each node of the list has two components, the actual value of the node and the reference to the next node in the linked list. In the doubly linked list, each node has three components: the value of the node, the reference to the previous node, and the reference to the next node. For the start node of the doubly linked list, the reference to the previous node is null. Similarly, for the last node in the doubly linked list, the reference to next node is null.

Pros and Cons of a Doubly Linked List

Following are some of the pros and cons of a doubly linked list:

Pros

  • Unlike a single linked list, the doubly linked list can be traversed and searched in both directions. The reference to the next node helps in traversing the node in the forward direction while the references to the previous nodes allow traversal in the backward direction.
  • Basic operations such as insertion and deletion are easier to implement in the doubly linked lists since, unlike single linked lists, we do not need to traverse to the predecessor node and store its reference. Rather, in a doubly linked list the reference of the predecessor node can be retrieved from the node that we want to delete.

Cons

  • One of the major drawbacks of the doubly linked list is that you need more memory space to store one extra reference for each node.
  • A few additional steps are required to be performed in order to perform insertion and deletion operations.

Implementing the Doubly Linked List with Python

In this section, we will see how we can create a very simple doubly linked list in Python. If you have read Part 1 and Part 2 of this series of articles, the code should be pretty straight-forward.

As always, let’s first create a class for the single node in the list. Add the following code to your file:

class Node:       def __init__(self, data):         self.item = data         self.nref = None         self.pref = None 

You can see in the above code, we create a Node class with three member variables: item, nref, and pref. The item variable will store the actual data for the node. The nref stores the reference to the next node, while pref stores the reference to the previous node in the doubly linked list.

Next, we need to create the DoublyLinkedList class, which contains different doubly linked list related functions. Add the following code:

class DoublyLinkedList:       def __init__(self):         self.start_node = None 

Throughout this article we will keep adding functions to this class.

Inserting Items in Doubly Linked List

In this section, we will see the different ways of inserting items in a doubly linked list.

Inserting Items in Empty List

The easiest way to insert an item in a doubly linked list is to insert an item in the empty list. The following script inserts an element at the start of the doubly linked list:

 def insert_in_emptylist(self, data):         if self.start_node is None:             new_node = Node(data)             self.start_node = new_node         else:             print("list is not empty") 

In the script above, we define a method insert_in_emptylist(). The method first checks whether the self.start_node variable is None or not. If the variable is None, it means that the list is actually empty. Next, a new node is created and its value is initialized by the value passed as a parameter to the data parameter of the insert_in_emptylist() function. Finally, the value of self.start_node variable is set to the new node. In case if the list is not empty, a message is simply displayed to the user that the list is not empty.

Add the insert_in_emptylist() method to the DoublyLinkedList class that you created earlier.

Inserting Items at the Start

To insert an item at the beginning of the doubly linked list, we have to first check whether the list is empty or not. If the list is empty, we can simply use the logic defined in the insert_in_emptylist() to insert the element since in an empty list, the first element is always at the start.

Else, if the list is not empty, we need to perform three operations:

  1. For the new node, the reference to the next node will be set to self.start_node.
  2. For the self.start_node the reference to the previous node will be set to the newly inserted node.
  3. Finally, the self.start_node will become the newly inserted node.

The following script inserts an item at the start of the doubly linked list:

    def insert_at_start(self, data):         if self.start_node is None:             new_node = Node(data)             self.start_node = new_node             print("node inserted")             return         new_node = Node(data)         new_node.nref = self.start_node         self.start_node.pref = new_node         self.start_node = new_node 

Add the insert_at_start() method to the DoublyLinkedList class that you created earlier.

Inserting Items at the End

Inserting an element at the end of the doubly linked list is somewhat similar to inserting an element at the start. At first, we need to check if the list is empty. If the list is empty then we can simply use the insert_in_emptylist() method to insert the element. If the list already contains some element, we traverse through the list until the reference to the next node becomes None. When the next node reference becomes None it means that the current node is the last node.

The previous reference for the new node is set to the last node, and the next reference for the last node is set to the newly inserted node. The script for inserting an item at the last node is as follows:

    def insert_at_end(self, data):         if self.start_node is None:             new_node = Node(data)             self.start_node = new_node             return         n = self.start_node         while n.nref is not None:             n = n.nref         new_node = Node(data)         n.nref = new_node         new_node.pref = n 

Add the insert_at_end() method to the DoublyLinkedList class that you created earlier.

Inserting Item after another Item

To insert an item after another item, we first check whether or not the list is empty. If the list is actually empty, we simply display the message that the “list is empty”.

Otherwise we iterate through all the nodes in the doubly linked list. In case if the node after which we want to insert the new node is not found, we display the message to the user that the item is not found. Else if the node is found, it is selected and we perform four operations:

  1. Set the previous reference of the newly inserted node to the selected node.
  2. Set the next reference of the newly inserted node to the next reference of the selected.
  3. If the selected node is not the last node, set the previous reference of the next node after the selected node to the newly added node.
  4. Finally, set the next reference of the selected node to the newly inserted node.

The script for inserting item after another item is as follows:

    def insert_after_item(self, x, data):         if self.start_node is None:             print("List is empty")             return         else:             n = self.start_node             while n is not None:                 if n.item == x:                     break                 n = n.nref             if n is None:                 print("item not in the list")             else:                 new_node = Node(data)                 new_node.pref = n                 new_node.nref = n.nref                 if n.nref is not None:                     n.nref.prev = new_node                 n.nref = new_node 

Add the insert_after_item() method to the DoublyLinkedList class that you created earlier.

Inserting Item before another Item

To insert an item before another item, we first check whether or not the list is empty. If the list is actually empty, we simply display the message that the “list is empty”.

Otherwise we iterate through all the nodes in the doubly linked list. In case if the node before which we want to insert the new node is not found, we display the message to the user that the item is not found. Else if the node is found, it is selected and we perform four operations:

  1. Set the next reference of the newly inserted node to the selected node.
  2. Set the previous reference of the newly inserted node to the previous reference of the selected.
  3. Set the next reference of the node previous to the selected node, to the newly added node.
  4. Finally, set the previous reference of the selected node to the newly inserted node.

The script for adding item before another item in a doubly linked list is as follows:

    def insert_before_item(self, x, data):         if self.start_node is None:             print("List is empty")             return         else:             n = self.start_node             while n is not None:                 if n.item == x:                     break                 n = n.nref             if n is None:                 print("item not in the list")             else:                 new_node = Node(data)                 new_node.nref = n                 new_node.pref = n.pref                 if n.pref is not None:                     n.pref.nref = new_node                 n.pref = new_node 

Add the insert_before_item() method to the DoublyLinkedList class that you created earlier.

Traversing a Doubly Linked List

Traversing a doubly linked list is very similar to traversing a single linked list. The script is as follows:

    def traverse_list(self):         if self.start_node is None:             print("List has no element")             return         else:             n = self.start_node             while n is not None:                 print(n.item , " ")                 n = n.nref 

Add the traverse_list() method to the DoublyLinkedList class that you created earlier.

Deleting Elements from Doubly Linked List

Like insertion, there can be multiple ways to delete elements from a doubly linked list. In this section, we will review some of them.

Deleting Elements from the Start

The easiest way to delete an element from a doubly linked list is from the start. To do so, all you have to do is set the value of the start node to the next node and then set the previous reference of the start node to None. However before we do that we need to perform two checks. First, we need to see if the list is empty. And then we have to see if the list contains only one element or not. If the list contains only one element then we can simply set the start node to None. The following script can be used to delete elements from the start of the doubly linked list.

   def delete_at_start(self):         if self.start_node is None:             print("The list has no element to delete")             return          if self.start_node.nref is None:             self.start_node = None             return         self.start_node = self.start_node.nref         self.start_prev = None; 

Add the delete_at_start() method to the DoublyLinkedList class that you created earlier.

Deleting Elements from the End

To delete the element from the end, we again check if the list is empty or if the list contains a single element. If the list contains a single element, all we have to do is to set the start node to None. If the list has more than one element, we iterate through the list until the last node is reached. Once we reach the last node, we set the next reference of the node previous to the last node, to None which actually removes the last node. The following script can be used to delete the element from the end.

    def delete_at_end(self):         if self.start_node is None:             print("The list has no element to delete")             return          if self.start_node.nref is None:             self.start_node = None             return         n = self.start_node         while n.nref is not None:             n = n.nref         n.pref.nref = None 

Add the delete_at_end() method to the DoublyLinkedList class that you created earlier.

Deleting Elements by Value

Deleting an element by value is the trickiest of all the deletion functions in doubly linked lists since several cases have to be handled in order to remove an element by value. Let’s first see how the function looks like and then we will see the explanation of the individual piece of code.

    def delete_element_by_value(self, x):         if self.start_node is None:             print("The list has no element to delete")             return          if self.start_node.nref is None:             if self.start_node.item == x:                 self.start_node = None             else:                 print("Item not found")             return           if self.start_node.item == x:             self.start_node = self.start_node.nref             self.start_node.pref = None             return          n = self.start_node         while n.nref is not None:             if n.item == x:                 break;             n = n.nref         if n.nref is not None:             n.pref.nref = n.nref             n.nref.pref = n.pref         else:             if n.item == x:                 n.pref.nref = None             else:                 print("Element not found") 

In the above script we create delete_element_by_value() function that takes the node value as parameter and deletes that node. At the beginining of the function we check if the list is empty or not. If the list is empty we simply display the user that the list is empty.

This logic is implemented in the following piece of code:

        if self.start_node is None:             print("The list has no element to delete")             return  

Next, we check if the list has a single element and that element is actually the element we want to delete. If the only element is the one that we want to delete, we simply set the self.start_node to None which means that the list will now have no item. If there is only one item and that is not the item that we want to delete, we will simply display the message that item to be deleted is not found.

The following piece of code implements this logic:

        if self.start_node.nref is None:             if self.start_node.item == x:                 self.start_node = None             else:                 print("Item not found")             return  

Next, we handle the case where the list has more than one items but the item to be deleted is the first item. In that case we simply execute the logic that we wrote for the method delete_at_start(). The following piece of code deletes an element from the start in case of multiple items:

        if self.start_node.item == x:             self.start_node = self.start_node.nref             self.start_node.pref = None             return 

Finally, if the list contains multiple items and the item to be deleted is not the first item, we traverse all the elements in the list except the last one and see if any of the nodes has the value that matches the value be deleted. If the node is found, we perform the following two operations:

  1. Set the value of the next reference of the previous node to the next reference of the node to be deleted.
  2. Set the previous value of the next node to the previous reference of the node to be deleted.

Finally, if the node to be deleted is the last node, the next reference of the node previous to the last node is set to None. The following script implements this logic:

        n = self.start_node         while n.nref is not None:             if n.item == x:                 break;             n = n.nref         if n.nref is not None:             n.pref.nref = n.nref             n.nref.pref = n.pref         else:             if n.item == x:                 n.pref.nref = None             else:                 print("Element not found") 

Add the delete_element_by_value() method to the DoublyLinkedList class that you created earlier.

Reversing a Doubly Linked List

To reverse a doubly linked list, you basically have to perform the following operations:

  1. The next reference of the start node should be set none because the first node will become the last node in the reversed list.
  2. The previous reference of the last node should be set to None since the last node will become the previous node.
  3. The next references of the nodes (except the first and last node) in the original list should be swapped with the previous references.

The script for reversing a doubly linked list is as follows:

    def reverse_linked_list(self):         if self.start_node is None:             print("The list has no element to delete")             return          p = self.start_node         q = p.nref         p.nref = None         p.pref = q         while q is not None:             q.pref = q.nref             q.nref = p             p = q             q = q.pref         self.start_node = p 

Add the reverse_linked_list() method to the DoublyLinkedList class that you created earlier.

Testing Doubly Linked List Functions

In this section, we will test the doubly linked functions that we created in the previous sections.

Let’s first create the object of the DoublyLinkedList class. Execute the following script:

new_linked_list = DoublyLinkedList()   
Testing Insertion Functions

Let’s test the insertion functions first. We’ll first add elements in the empty list. Execute the following script:

new_linked_list.insert_in_emptylist(50)   

Now if you traverse the list, you should see 50 as the only element in the list as shown below:

new_linked_list.traverse_list()   

Output:

50   

Now let’s add a few elements at the start. Execute the following script:

new_linked_list.insert_at_start(10)   new_linked_list.insert_at_start(5)   new_linked_list.insert_at_start(18)   

Now if you traverse the list, you should see the following elements in the list:

18   5   10   50   

To add the elements at the end, execute the following script:

new_linked_list.insert_at_end(29)   new_linked_list.insert_at_end(39)   new_linked_list.insert_at_end(49)   

Now if you traverse the doubly linked list, you should see the following elements:

18   5   10   50   29   39   49   

Let’s insert an element after 50.

new_linked_list.insert_after_item(50, 65)   

Now the list should look like this:

18   5   10   50   65   29   39   49   

Finally, let’s add an element before item 29.

new_linked_list.insert_before_item(29, 100)   

The list at this point of time, should contain the following elements:

18   5   10   50   65   100   29   39   49   
Testing Deletion Functions

Let’s now test the deletion functions on the items that we inserted in the last sections. Let’s first delete an element from the start.

new_linked_list.delete_at_start()   

Item 18 will be removed and the list will now look like this:

5   10   50   65   100   29   39   49   

Similarly, the following script deletes the element from the end of the doubly linked list:

new_linked_list.delete_at_end()   

Traversing the list now will return the following items:

5   10   50   65   100   29   39   

Finally, you can also delete the elements by value using the delete_element_by_value() function as shown below:

new_linked_list.delete_element_by_value(65)   

If you traverse the list now, you will see that item 65 will be deleted from the list.

Testing Reverse Function

Finally, let’s reverse the list using the reverse_linked_list() function. Execute the following script:

new_linked_list.reverse_linked_list()   

Now if you traverse the list, you will see the reversed linked list:

39   29   100   50   10   5   

Conclusion

The doubly linked list is extremely useful specifically when you have to perform lots of inserts and delete operations. The links to the previous and next nodes make it very easy to insert and delete new elements without keeping track of the previous and next nodes.

In this article, we saw how doubly linked list can be implemented with Python. We also saw different ways to perform insert and delete operations on doubly linked list. Finally we studied how to reverse a doubly linked list.

Planet Python

Test and Code: 67: Teaching Python in Middle School

In today’s episode we talk with Kelly Paredes & Sean Tibor.
They teach Python in a middle school in Florida, and talk about this experience on the podcast Teaching Python.

I love that they include physical computing right from the start, and everything else they are doing.

It’s a fun interview.

Special Guests: Kelly Paredes and Sean Tibor.

Sponsored By:

Support Test & Code – Software Testing, Development, Python

Links:

<p>In today&#39;s episode we talk with Kelly Paredes &amp; Sean Tibor. <br> They teach Python in a middle school in Florida, and talk about this experience on the podcast <q>Teaching Python</q>.</p> <p>I love that they include physical computing right from the start, and everything else they are doing.</p> <p>It&#39;s a fun interview.</p><p>Special Guests: Kelly Paredes and Sean Tibor.</p><p>Sponsored By:</p><ul><li><a rel=”nofollow” href=”http://testandcode.com/pycharm”>PyCharm Professional</a>: <a rel=”nofollow” href=”http://testandcode.com/pycharm”>Try PyCharm Pro for an extended 4 month trial before deciding which version you need. If you value your time, you owe it to yourself to try PyCharm.</a> Promo Code: TESTNCODE2019</li></ul><p><a rel=”payment” href=”https://www.patreon.com/testpodcast”>Support Test &amp; Code – Software Testing, Development, Python</a></p><p>Links:</p><ul><li><a title=”Teaching Python” rel=”nofollow” href=”https://www.teachingpython.fm/”>Teaching Python</a></li></ul>
Planet Python

Build your own caching mechanisms with volatile collections

The literal definition of the word volatile is “readily vaporizable,” with a secondary connotation of “changeable.” This month’s tool matches this description nicely, as its implementation will cause the objects contained in a hashtable to vaporize, and the rate of that vaporization is indeed changeable.

But more on that in a moment. First, I’ll address reader feedback from last month’s column.

Reflections

Half a dozen readers wrote in to inform me that the following block of code (which relates to the GlobalValues tool we discussed two columns back) is a performance bottleneck for multithreaded access.

To read this article in full, please click here

JavaWorld Cool Tools

Preppin’ Data Project: Week 1

Preppin Data project in Tableau

Note: A big thank you to Carl Allchin and Jonathan Allenby for initiating the Preppin’ Data project for our community.

Hunker down, folks. It’s time to prep some data. Welcome to my submission for the first week of the Preppin’ Data project. For beginners, the idea of preparing data can be intimidating. My hope is to provide insight into my process. There are certainly many ways to achieve the same data goals, but here’s where we started:

Preppin Data project in Tableau

Above: Instructions from Week 1 of the Preppin’ Data Project

Preppin Data project in Tableau

The Steps in My Data Prep Process

  • Input Excel: The data lives on a Google Drive with the ability to download as .xlsx. I initially thought it’d be cool to connect directly to a Google Sheet of the info, but, alas, we don’t have that as a connector yet. If you’d like the ability to connect to Google Sheets using Tableau Prep, upvote this idea. In this scenario, I downloaded the data from Google Drive and connected via Excel in Prep.
  • Explore Data: It’s a good practice to explore before making changes to your data to ensure you’re not duplicating efforts.
  • Clean Date: I used Tableau’s MAKEDATE function to combine the When Sold Year and When Sold Month. After validating that the new date was reporting correctly, I removed the original two date columns:

Preppin Data project in Tableau

  • Pivot Colors: A clear beacon for pivoting occurs when you notice multiple columns representing very similar values (red cars, silver cars, etc.). I chose to pivot the data to have one column representing the number of vehicles sold, while another column described the vehicle color.

Note: This is the part where I did not read the Preppin’ Data instructions (oops). My third-grade teacher would be disappointed in me. In hindsight, since the instructions asked to retain the car sales per color columns, pivoting overcomplicated the result:

Preppin Data project in Tableau

Tableau Prep Tip: Want to see what your current data structure looks like in Tableau Desktop? Right-click any step in the flow to preview in Tableau Desktop:

Preppin Data project in Tableau

  • Clean Color: Since I now had one column representing all car color types, I felt the word color included in each value was a bit repetitive. Using Prep’s Split functionality, I separated Blue from Blue Cars and then removed the original field.
  • Aggregate Data: To achieve Total Car Sales per Month/Per Car Dealership, I knew the data must be aggregated. At the time, one row of information in the source described a dealership’s sales of a certain car color in a certain month (i.e. Dealership A sold 377 black cars in January 2018). I took a bit of liberty and assumed that per month/per car dealership meant the month/year (i.e. Jan 2018) instead of the summation of all years in a particular month (i.e. all Januaries).
  • Preview After Aggregation: I like to perform a quick sanity check after any big data manipulation, and I’m glad to have noticed that Sale Date came through (to my surprise) as a datetime rather than date. This is significant because date was used later as a part of my join clause, so it’s important that the fields maintain the correct data type. Additionally, I recognized that the field created in the aggregate step was more aptly named Total Cars Sold than Cars Sold.
  • Maintain Original: I made a decision at this point that, while I wanted to know the total cars sold per dealership per month/year, I also wanted to maintain each row individually so I could utilize the Color dimension. The challenge is that if a field is not used in the Aggregate step (like Color), then Tableau drops it from the flow moving forward. A workaround to this scenario is to branch off a copy of your original data and then join it back up once you’ve created the aggregate. In a way, you’re seeing both the Per Month/Per Dealership info AND the Per Month/Dealership/Car Color in the same data result.
  • Join Flows: The common fields within the two flows were Dealership and Sale Date, so I knew those were going to be included in my join clause. An inner, left or right join would have given me the same exact outcome in this situation because the dealerships/sale dates came from the same original source.
  • Preview Results: Since a join always brings ALL columns from BOTH tables, this caused two redundant fields: Sale Date and Dealership. After a quick clean-up, I’d arrived at my final result.
  • Output: Oddly enough, I commonly forget to add an Output step. I guess I can relate it to climbing to the top of a difficult rock wall but forgetting to ring the bell at the end. I was curious to see how each dealership performed over time, as well as the breakdown in car colors over the same period. I added the Output step (CSV in my case), ran the flow to process the entire amount of data and then connected in Tableau Desktop to build this quick visualization:

Lessons Learned

  • Tableau Prep changed the data type of my date field from date to datetime. I’m still not 100% positive why it happened, but my assumption is that it’s due to what my selection is in the Group by level date hierarchy.
  • Read ALL of the instructions before tackling the problem. While I’m not disappointed with the opportunity to walk us through my approach, I recognize that my result is different than the solution. Additionally, the format of my result needs careful consideration when used in Tableau Desktop. Using the SUM aggregation when displaying the Total Cars Sold field could result in an inaccurate visualization.

The post Preppin’ Data Project: Week 1 appeared first on InterWorks.

InterWorks

Install Bash 5 on macOS

The default bash on macOS is still bash v3:

$   bash --version GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18) Copyright (C) 2007 Free Software Foundation, Inc. 

Just recently, bash v5 was released. The discrepancy comes from the fact that bash has been licensed as GPL v3 since version 4. Apple does not include GPL v3 licensed tools with macOS.

However, nothing is keeping you from downloading and installing the latest bash version.

New features include, among many other things, associated arrays (i.e. dictionaries) and better auto-completion setup.

While you would think this is a common desire, most pages I have found will simply point to Homebrew to download and install a newer bash version.

The main challenge with using brew is that it does not work on the scale that MacAdmins require. brew is designed for single user installation, where the user has administrator privileges. brew’s workflows do not scale to large deployments controlled with a management system.

Ideally, there would be package installer for the latest bash version. Unfortunately, the bash project does not provide one.

In this post, I will show how you can install the latest bash version without brew and how to build an installer package for deployment.

Manual Installation

This requires Xcode or the Developer Command Line Tools to be installed.

First, download the source for the latest bash version from this page. As of this writing the latest version is bash-5.0 and the file you want is bash-5.0.tar.gz. Once downloaded, you can expand the archive in Finder by double-clicking.

Open a Terminal window and change directory to the newly expanded bash-5.0 directory. Then run the configure script there.

$   cd ~/Downloads/bash-5.0 $   ./configure 

The configure process will take a while, there will be plenty of messages showing progress.

Once the configure process is complete. You can build bash with the make command.

$   make

This will build the bash binary and the supporting files in the current directory. That’s not where we want it in the end, but it is probably a good idea see if the build process works. This will (again) take a while. There will be some odd looking warnings, but you can ignore those.

When make succeeds, you can actually install bash v5 with

$   sudo make install 

This will build and install the bash binary and supporting files in /usr/local/bin and /usr/local. sudo is required to modify /usr/local.

If you were just looking for a way to install bash v5 without brew, you are done!

There is more useful information in the rest of the post, though, so keep reading!

How the new and the old bash interact

By default, the bash v5 binary is called bash and will be installed in /usr/local/bin. The macOS default PATH lists /usr/local/bin before /bin where the default bash v3 binary, also called bash, is located.

This means, that when a user types bash in to a shell, the version in /usr/local/bin will be preferred over the pre-installed bash v3.

You can test this behavior in Terminal. Since the default shell has not yet been changed from /bin/bash the Terminal still opens to bash v3. You can test this by showing the BASH_VERSION environment variable:

$   echo $  BASH_VERSION 3.2.57(1)-release 

But when you then run bash it will invoke /usr/local/bin/bash, so it will run the new bash v5. It will show this in the prompt, but you can also verify the BASH_VERSION.

$   bash bash-5.0$   echo $  BASH_VERSION 5.0.0(2)-release 

This might be the setup you want, when you want to use bash v5 always. It might lead to some unexpected behavior for some users, though.

One option to avoid this ambiguity is to rename the binary in /usr/local/bin to bash5. But then other tools such as env (mentioned below) will not find the binary any more.

Note: the PATH in other contexts will likely not contain /usr/local/bin and further confuse matters.

bash v5 and Scripting

Scripts using bash, should have the full path to the binary in the shebang. This way, the script author can control whether a script is executed by the default bash v3 (/bin/bash) or the newer bash v5 (/usr/local/bin/bash or /usr/local/bin/bash5).

It is often recommended to use the env command in the shebang:

#!/usr/bin/env bash 

The env command will determine the path to the bash binary in the current environment. (i.e. using the current PATH) This is useful when the script has to run in various environments where the location of the bash binary is unknown, in other words across multiple Unix and Unix-like platforms. However, this renders the actual version of bash that will interpret the script unpredictable.

For example, assume you have bash v5 installed in the default configuration (as /usr/local/bin/bash. A script with the shebang #!/usr/bin/env bash launched in the user environment (i.e. from Terminal) will use the newer bash, as /usr/local/bin comes before /bin in the search order.

When you launch the same script in a different context, e.g. as an installation script, an AppleScript, or a management system, /usr/local/bin will likely not be part of the PATH in that environment. Then the env shebang will choose /bin/bash (v3). The script will be interpreted and might behave differently.

Administrators prefer certainty in their managed environments. Administrators should know the location and versions of the binaries on their systems. For management scripts, you should avoid env and use the proper full path to the desired interpreter binary.

The solutions to resolve the ambiguity are

  • use the full path to the binary in the shebang
  • manage and update the additional custom version of bash with a management system
  • (optional) rename the newer bash binary to bash5 or bash4 (this also allows you to have bash v4 and bash v5 available on the same system)
  • Scripting OS X: On the Shebang
  • Scripting OS X: Setting the PATH in Scripts

Changing a user’s default Shell to bash v5

Even though we have installed bash v5, the default shell of a new Terminal window will still use the built-in bash v3.

The path to the default shell is stored in the user record. You can directly change the UserShell attribute with dscl, in the ‘Advanced Options’ of the ‘Users & Groups’ preference pane, or in Directory Utility.

There is also a command to set the default shell:

$   chsh -s /usr/local/bin/bash Changing shell for armin. Password for armin:  chsh: /usr/local/bin/bash: non-standard shell 

The chsh (change shell) command will check for allowed shells in the /etc/shells file. You can easily append a line with /usr/local/bin/bash to this file, and then chsh will work fine.

$   chsh -s /usr/local/bin/bash Changing shell for armin. Password for armin:  

Note: if you choose to rename the bash binary, you have to use the changed name in /etc/shells and with chsh.

Remember that just running chsh will not change the shell in the current Terminal window. It is best to close the old Terminal window and open a new one to get the new shell.

Packaging bash v5 for mass deployment

While these steps to install and configure bash v5 on a single Mac are simple enough, they would not work well with a management system for hundreds or thousands of Macs. We want to wrap all the files that make install creates into a package installer payload.

The --help option of the configure script yields this useful information:

By default, make install' will install all the files in/usr/local/bin,/usr/local/libetc. You can specify an installation prefix other than/usr/localusing–prefix, for instance–prefix=$ HOME`.

When we run the configure script with the --prefix option it creates a folder suitable as a payload for a package installer. We can then use pkgbuild to build to create an installer pkg:

$   cd ~/Downloads/bash-5.0 $   mkdir payload $   ./configure --prefix=/Users/armin/Downloads/bash-5.0/payload $   make install $   pkgbuild --root payload --install-location /usr/local --identifier org.gnu.bash --version 5.0 bash-5.0.pkg pkgbuild: Inferring bundle components from contents of payload pkgbuild: Wrote package to bash-5.0.pkg 

(Note: the --prefix argument requires an absolute path.)

Automate the package creation

So, we have our workflow for building an installer package to distribute and configure bash v5:

  • download the archive
  • extract the archive
  • run configure with the --prefix argument
  • run make install to create the files in a payload folder
  • optional: rename the resulting bash binary to bash5 to avoid conflicts
  • add a postinstall script that adds /usr/local/bin/bash[5] to /etc/shells if not yet present
  • build the installer with pkgbuild

This sounds like a workflow ripe for automation. You can get the script from this repository.

You can pass a different (valid) bash version number as an argument to the script, e.g. 4.4.18. (I did not test anything significantly older.) The script does not autodetect the latest version and defaults to version 5.0 when no argument is given. When an update to bash v5 is published, you will have to modify the version line or run the script with an argument.

I have not (yet) figured out how to detect the latest version from the download web page. An autopkg recipe will have to wait for that. (If someone else wants to tackle that, please do!)

Scripting OS X