Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

In the previous three posts in this series, we used Tableau Prep Builder to create a workflow to join six tables from two data sources. Our flow contains 11 steps. The challenge presented with the data we have from the Census comes from the different levels of detail (LOD) our sales data has—specific transactions every day for the last four years—and the Census data, which includes population data aggregated at the State-level for the same four years. The Census data is much less detailed.

This is a common issue with financial data, for example, because budget data tends to be less detailed than the transaction-level details you can obtain from business systems. So, we need to use an aggregate step to reduce the LOD of the Superstore sales data so that it’s the same as the Census data.

Using the Aggregate Step in Tableau Prep

The Census data provides population data by State and Year. We’ll use an aggregate step to reduce the detail in the Superstore data to match the Census data.

After performing the aggregation of the Superstore data, we can now get a resulting set that combines sales and population data for every state. The Census data actually includes information for Alaska and Hawaii, for which we don’t have any sales. That’s fine. Delaware is also mismatched in 2018 because there were no sales in the Superstore data that year. By using the aggregate step, we’ve now brought both data sources together and achieved a consistent LOD.

Creating a Per-Capital Sales Calculation

The primary reason I wanted to bring in the Census data was to normalize the sales by state for population. Superstore is anonymized retail sales information. While it’s useful to know that California has the most sales in this data, how does Vermont compare to California when you take into account the population differences between states?

Saving Your Tableau Prep Workflow

At the end of the last video, I showed you how to add an output step, save a workflow and then run the workflow to generate a Hyper output file that we’ll use to build a visualization of the data. Saving the workflow enables you to reuse it later. Running the workflow generates the Hyper data.

Cleaning up the Workflow Visualization

We now have a workflow that consists of 18 steps. Let’s step back for a moment and enhance the information in our workflow so that other people can more easily follow its logic:

The video shows how you can change step colors and add comments, making each step easier for others to follow and understand. I didn’t rename any of them. You may also experiment with the ways you can use color, descriptions and step renaming to make your flows easier for other people to track with.

In the next post in this series, I’m going to build a visualization using the dataset we created in the workflow.

The post Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation appeared first on InterWorks.


Understanding Tableau Prep and Conductor: Clean and Join Steps

Understanding Tableau Prep and Conductor: Clean and Join Steps

The last post in this series showed you how to use the Wildcard union and the Manual union to join four years of sales data that was stored in four different worksheets. In this post, you’ll see how the cleaning step can be used to do the following:

  • Change field-data roles
  • Group and replace members of a specific field set
  • Change the data types of a field and rename fields

All this can be done by using the Profile Cards within the Profile Pane in Tableau Prep Builder.

Using the Clean Step in Tableau Prep

The cleaning step behaves a lot like the data interpreter in Tableau Desktop, but it adds additional visualizations in the Profile Pane and the field Profile Cards that help you understand the shape and contents of each field. In addition, clicking on elements within the Profile Cards highlights related details in the other field cards. I find myself using the cleaning step frequently just to examine the results of prior steps.

Adding a Join Step

The Superstore dataset also includes another worksheet containing the names of Regional Managers. Regional Managers are assigned groups of States. We’ll use a join to add that information to the flow:

Now that the sales data from the Superstore tables have been cleaned up, we will bring some public data from the Census Bureau so that we can enhance our sales data by normalizing sales for the population by state.

Adding the Census Data into the Flow

I’ve been using Census data for many years. It’s useful when you want to account for the population density in analyses. In the example we’re building, this data will ultimately be used to express the sales by state in a way that accounts for the population density of each geography.

Using the Pivot Step and Tableau Prep Builder

The Census data isn’t perfectly formatted. We’ll use Prep Builder to fix the structural problems in that dataset:

In the video, I chose to do most of the field clean-up in the pivot step. I could have performed the same cleaning operations in the cleaning step that I added after the pivot. If the work you’re doing is going to be utilized only by use, fewer steps may save you time. If you work with a team of people who are new to Prep Builder, adding more steps to segregate individual cleaning operations may make your flow easier for others to understand. There aren’t hard and fast rules.

This workflow now includes two different data sources and six tables. You’ve seen two different ways to create a union; you’ve seen a join step and a pivot step; and you’ve learned about different ways you can use the cleaning step to improve the formatting and consistency of the data in the workflow. My colleague Katie wrote a blog post that takes a closer look at splitting and pivoting your data, so read it if you need more in-depth insights into those steps. For further information on cleansing your data, look at my colleague Spencer‘s blog on the topic.

In the next post in this series, we’re going to join the Superstore data to the Census data. Because these two data sources are not aggregated in the same way, we’ll be presented with a challenge that we’ll address with an aggregate step.

The post Understanding Tableau Prep and Conductor: Clean and Join Steps appeared first on InterWorks.


Tableau Class Notes: A Use Case for the Self Join

It’s no secret that Tableau Prep is a game changer for those looking to clean, explore and connect their data sources together. There are myriad join options, from your traditional left, right and inner joins to more advanced options such as Left Unmatched Only, which will include only rows that exist on the left data source. But one join type eluded me—the self join. So when my student, Greg, asked me for a good example of a self join, I knew why I would do it but not how.

Enter Harry Potter.

Here’s what I am trying to accomplish: I have a list of all the Harry Potter films, the year of their release and what their sequel is:

self joins in Tableau Prep

Using the Self Join in Tableau Prep

I want a new data source that will show me a few new columns per movie: what is the sequel’s release date, and what is the sequel’s sequel? For example, for Harry Potter and the Sorcerer’s Stone, I should still see the release date of 2001, the sequel should still be Harry Potter and the Chamber of Secrets, but I also want a column for the release date of 2002 for Chamber of Secrets, and for Prisoner of Azkaban as the sequel to Chamber of Secrets.

In order to get that data source, I can do a self join in Tableau Prep (note that this could also be achieved in Tableau Desktop). While this isn’t a pre-defined join option, I can join my Sheet 1 to my Sheet 1. What’s interesting is my join clause, and this took a bit of experimentation.

Finding the Right Join Clause

The challenge I encountered was, how do I join my two instances of the same sheet together in order to get the information I wanted? I needed to make sure that I was pulling in each sequel’s release date as well as its own sequel, information contained in the three columns of my original data source:

self joins in Tableau Prep

If I join my two worksheets with Sequel=Movie, I can then obtain information for that sequel’s release date, as well as the sequel’s sequel:

self joins in Tableau Prep

Of course, I will have one result excluded. As the first film in the series, Harry Potter and the Sorcerer’s Stone is not a sequel. I can discard this, add a clean step and voila! I have all the data I need:

self joins in Tableau Prep

Self joins are a great alternative to what might have been a very tedious copy-paste situation. Thank you, Greg, for the challenge!

The post Tableau Class Notes: A Use Case for the Self Join appeared first on InterWorks.


Reuven Lerner: Want to improve your Python skills? Join the upcoming cohort of Weekly Python Exercise!

For more than 20 years, I’ve been teaching Python courses to companies around the world. This means that just about every day, I’m on the front lines of Python learning.  I see, first-hand, what companies want people to learn and also what people are struggling to understand.

Weekly Python Exercise logo

The result is Weekly Python Exercise, my course that’s designed to make you more fluent in Python by giving you (surprise, surprise) a weekly Python exercise.

I’m starting a new advanced (B) level cohort on Tuesday, March 12th.   Over 15 weeks, participants in the cohort will improve their understanding of Python data structures, objects, threads, functions, iterators, and more.  Plus, every exercise now comes with automated tests written with “pytest” — so you can not only check if your code fits the specs I’ve provided, but also learn more about how to use pytest!

If you’ve always wanted to improve your Python, then there’s no better way to do it than WPE.  Want to learn more?  Just go to WeeklyPythonExercise.com. From that page, you can learn about WPE, sign up for a free sample version (with two exercises), and even register for the course.

If you’ve felt stuck with Python and have always wanted to push your Python skills ahead, then I encourage you to learn more about the B1 (advanced level, part 1) cohort that will start in March.

Learn more about Weekly Python Exercise

The post Want to improve your Python skills? Join the upcoming cohort of Weekly Python Exercise! appeared first on Lerner Consulting Blog.

Planet Python