Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

In the previous three posts in this series, we used Tableau Prep Builder to create a workflow to join six tables from two data sources. Our flow contains 11 steps. The challenge presented with the data we have from the Census comes from the different levels of detail (LOD) our sales data has—specific transactions every day for the last four years—and the Census data, which includes population data aggregated at the State-level for the same four years. The Census data is much less detailed.

This is a common issue with financial data, for example, because budget data tends to be less detailed than the transaction-level details you can obtain from business systems. So, we need to use an aggregate step to reduce the LOD of the Superstore sales data so that it’s the same as the Census data.

Using the Aggregate Step in Tableau Prep

The Census data provides population data by State and Year. We’ll use an aggregate step to reduce the detail in the Superstore data to match the Census data.

After performing the aggregation of the Superstore data, we can now get a resulting set that combines sales and population data for every state. The Census data actually includes information for Alaska and Hawaii, for which we don’t have any sales. That’s fine. Delaware is also mismatched in 2018 because there were no sales in the Superstore data that year. By using the aggregate step, we’ve now brought both data sources together and achieved a consistent LOD.

Creating a Per-Capital Sales Calculation

The primary reason I wanted to bring in the Census data was to normalize the sales by state for population. Superstore is anonymized retail sales information. While it’s useful to know that California has the most sales in this data, how does Vermont compare to California when you take into account the population differences between states?

Saving Your Tableau Prep Workflow

At the end of the last video, I showed you how to add an output step, save a workflow and then run the workflow to generate a Hyper output file that we’ll use to build a visualization of the data. Saving the workflow enables you to reuse it later. Running the workflow generates the Hyper data.

Cleaning up the Workflow Visualization

We now have a workflow that consists of 18 steps. Let’s step back for a moment and enhance the information in our workflow so that other people can more easily follow its logic:

The video shows how you can change step colors and add comments, making each step easier for others to follow and understand. I didn’t rename any of them. You may also experiment with the ways you can use color, descriptions and step renaming to make your flows easier for other people to track with.

In the next post in this series, I’m going to build a visualization using the dataset we created in the workflow.

The post Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation appeared first on InterWorks.

InterWorks

Understanding Tableau Prep and Conductor: Clean and Join Steps

Understanding Tableau Prep and Conductor: Clean and Join Steps

The last post in this series showed you how to use the Wildcard union and the Manual union to join four years of sales data that was stored in four different worksheets. In this post, you’ll see how the cleaning step can be used to do the following:

  • Change field-data roles
  • Group and replace members of a specific field set
  • Change the data types of a field and rename fields

All this can be done by using the Profile Cards within the Profile Pane in Tableau Prep Builder.

Using the Clean Step in Tableau Prep

The cleaning step behaves a lot like the data interpreter in Tableau Desktop, but it adds additional visualizations in the Profile Pane and the field Profile Cards that help you understand the shape and contents of each field. In addition, clicking on elements within the Profile Cards highlights related details in the other field cards. I find myself using the cleaning step frequently just to examine the results of prior steps.

Adding a Join Step

The Superstore dataset also includes another worksheet containing the names of Regional Managers. Regional Managers are assigned groups of States. We’ll use a join to add that information to the flow:

Now that the sales data from the Superstore tables have been cleaned up, we will bring some public data from the Census Bureau so that we can enhance our sales data by normalizing sales for the population by state.

Adding the Census Data into the Flow

I’ve been using Census data for many years. It’s useful when you want to account for the population density in analyses. In the example we’re building, this data will ultimately be used to express the sales by state in a way that accounts for the population density of each geography.

Using the Pivot Step and Tableau Prep Builder

The Census data isn’t perfectly formatted. We’ll use Prep Builder to fix the structural problems in that dataset:

In the video, I chose to do most of the field clean-up in the pivot step. I could have performed the same cleaning operations in the cleaning step that I added after the pivot. If the work you’re doing is going to be utilized only by use, fewer steps may save you time. If you work with a team of people who are new to Prep Builder, adding more steps to segregate individual cleaning operations may make your flow easier for others to understand. There aren’t hard and fast rules.

This workflow now includes two different data sources and six tables. You’ve seen two different ways to create a union; you’ve seen a join step and a pivot step; and you’ve learned about different ways you can use the cleaning step to improve the formatting and consistency of the data in the workflow. My colleague Katie wrote a blog post that takes a closer look at splitting and pivoting your data, so read it if you need more in-depth insights into those steps. For further information on cleansing your data, look at my colleague Spencer‘s blog on the topic.

In the next post in this series, we’re going to join the Superstore data to the Census data. Because these two data sources are not aggregated in the same way, we’ll be presented with a challenge that we’ll address with an aggregate step.

The post Understanding Tableau Prep and Conductor: Clean and Join Steps appeared first on InterWorks.

InterWorks

Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder

Understanding Tableau Prep and Conductor: Connecting to a Data Source

Before you begin building a workflow using Tableau Prep, it’s helpful to know a little bit about the data source(s) you need to connect to. Consider the following:

  • What kind of source are you connecting to?
  • How large are the files? Record counts? Rows/columns?
  • Are you familiar with the data structures?
  • Do you know of any data quality issues?
  • How frequently will you be updating the data?

Understanding the Basics of Your Data Sources

Understanding the basic data structure and size, as well as the granularity of different sources, helps you plan the flow in your mind before you get into the detailed challenges that the transformation of the raw data sources poses.

In the example I’m drawing upon for this series, I’m using a version of Superstore data I created, along with public data from the Census Bureau. I’m going to create a workflow that will combine four tables containing annual sales data and a single dimension table that will be joined to provide regional manager names. These will be joined with a population dataset from the Census, enabling us to normalize sales for population for each state that had sales.

Connecting to the Sales Data

The data used in this example comes from two different spreadsheets: one that contains four (4) sales worksheets and one (1) Regional Manager worksheet, and another spreadsheet containing the census data.

Experienced Tableau Desktop users should be familiar with the Superstore dataset. In this spreadsheet, I’ve separated each year’s sale into its own worksheet. This data could have been in a text file or a database:

Superstore data for Tableau Prep

The Census data provides population estimates for each state for corresponding years:

Census data for Tableau Prep

Because the world isn’t perfect, we will have to deal with data quality issues in these files, different aggregations of the data, union different files, join files, pivot the data and re-aggregate the data. There are also inconsistencies within specific fields that will have to be cleaned. The datasets are small, but every kind of data transformation step that Tableau Prep provides will have to be utilized to prepare the data for analysis in Tableau Desktop. We will also create a calculation in Prep to normalize sales by state and year for the population in each state.

That data is not perfectly clean, and some of the structures aren’t right. That’s the real world. We’ll use Tableau Prep to address all of the issues and create a clean dataset for other people to use.

Connecting to the Superstore Sales Data

In this first video, you’ll see how to make a data connection to an initial data source and then add other files to that data source. We’ll make the following connections:

  1. Connect to the four sales tables
  2. Demonstrate a wildcard union
  3. Demonstrate a manual union

Using the Wildcard Union in Tableau Prep

Wildcard unions offer an efficient way to bring together many different tables with similar naming conventions that also have consistent column and row structures. If you’re working with unfamiliar datasets that may have data inconsistencies, I believe creating a union manually gives you more direct control and may make it easier for you to deal with data quality issues that emerge as you join each table.

Using the Manual Union in Tableau Prep

I like using manual unions when I’m working with a new dataset because it’s easier to identify mismatched field-naming conventions. The inconsistent field names (Sales vs. Revenue) didn’t appear until I brought in the 2018 sales data. The visual cues that Tableau Prep provided, and the filtering in the data grid for mismatched fields, made it very easy to find and merge two different fields that were actually both holding sales data.

It was also easy to make minor changes using the Profile Cards for specific fields. I used that to remove the Table Names field, which Builder adds automatically when you union data. I don’t want to see that information in my output from this workflow, so I removed it. In addition, because Row ID is a number, Builder treated it as a number in the profile pane and generated a histogram in that field’s profile card. I wanted to validate that the Row ID field is a unique key for this dataset, so I changed the field to a string, and the profile card was changed in a way that made it easy to see every Row ID is, in fact, a single row of the data.

In the next post in this series, I’ll show you how to add a cleaning step to make additional modifications to fields.

The post Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder appeared first on InterWorks.

InterWorks

Understanding Tableau Prep and Conductor: An Overview

Understanding Tableau Prep and Conductor: An Overview

Tableau Prep Builder and Conductor are Tableau’s tools for enabling repeatable, documented and scheduled workflows to transform raw data into user-friendly datasets. Tableau Prep Builder’s first live release came in April 2018 in Desktop 2018.1, and since then, we’ve had four major releases and eight minor updates. Over the past year, Tableau has expanded the Builder feature set substantially.

Tableau Conductor is the server complement to Builder that provides the framework and tools for publishing Builder workflows and datasets to Tableau Server. Tableau Server’s Data Management Add-on released in V2019.1 in February 2019 and enables you to automatically refresh published workflows.

What This Series Will Cover

In this blog series, I’ll be covering the Desktop environment of Prep Builder and all of the related data-transformation tools that are currently provided. I’ll also discuss the basics of installing and managing workflows using the data management add-on toolset for Tableau Server and will complete this series by providing a comparison of Tableau Prep and Conductor to other ETL tools available in the marketplace.

If you are already familiar with the Data Interpreter in Tableau Desktop, your learning curve for Tableau Prep will be short. I’ve kept an eye on Tableau Prep ever since its release, but I haven’t bothered to do a deep dive with the tool until this year (though some of consultants have recognized that Tableau Prep is truly a game-changer). Over the past month, I’ve spent a lot of time playing with Prep and publishing workflows to Tableau Server to see exactly what Tableau is offering now.

Per usual with Tableau, when a new set of features is good enough, they release something new. Over time, Tableau adds more features. Generally, the features added come from a combination of their internal product roadmap and external feedback from the user community. Tableau Prep has now matured to a level where the feature set is equivalent to the Data Interpreter in Tableau Desktop, but it adds the automation and repeatability of a real extract, transform and load tool.

Tableau Prep Work View and Start Page

I’m excited about the possibilities of Tableau Prep and Conductor. While I wouldn’t call this an enterprise-class ETL toolset, Tableau brilliance at user interface is evident. I love the way that Tableau has implemented the transformation features. It’s very “Tableau-like.” Tableau continues to live their mission of making it fast and easy for you to see and understand your data. Here’s a Tableau Prep workflow that I created earlier today:

Tableau Prep Builder start page

There are four areas in every workflow:

  1. The Connection Pane
  2. The Flow Pane
  3. The Profile Pane
  4. The Data Grid

The arrow pointing from the Union 1 step to the Settings area displays a flexible screen space that will change depending on what specific step you click on in the workflow.

In typical Tableau fashion, every icon has a specific meaning. Tableau automatically assigns different colors to each step. I’ve altered some of the colors to my taste and to better reflect common sources (see how the Orders 2015Orders 2018 are the same color). Notice that I’ve added customized descriptions next to each step in the workflow. Tableau Prep also allows you to change the system-assigned names for each step within the flow, which are displayed below each step heading.

In addition, I pasted some summary information from the Start page in Builder:

Tableau Prep workflow

The Builder Start page is very similar to the Start Page in Tableau Desktop. Connections are accessed using the plus sign (+) on the left, available workflows are displayed in the center, and the Discover pane on the right supplies the same type of training videos and resources that you’ve become familiar with using Tableau Desktop. Hovering your mouse over existing flows causes summary information about the flow to display.

I appreciate Tableau’s design consistency between Desktop and Prep Builder. The interface behaves in a way that makes learning Prep Builder easy for any experienced Tableau Desktop user.

In the next post, we’ll demonstrate how you connect to one or more data sources, and we’ll review the different ways you can join and union those data sources.

The post Understanding Tableau Prep and Conductor: An Overview appeared first on InterWorks.

InterWorks

Split and Pivot Your Data with Tableau Prep

running the workflow in Tableau Prep

With the release of Tableau Prep in April 2018, analysts have been given an incredibly valuable tool for their analytical kit. Considering both performance and ease of use, prepping your data before it enters Tableau Desktop has many benefits. I’ve created the following walkthrough to demonstrate just one of many data prep scenarios our clients have faced.

Tableau Prep in Practice

Business Goal: How many customers have selected a specific service type?

Data Goal: Rather than concatenating a customer’s selections in a single value (one row per customer), display all service type selections in a single column with one row per customer/service type combination (multiple rows per customer).

Why: This makes it easier to perform analytics on the Service Type field, such as answering questions like, “How many people selected Financial Literacy?”

Original

Tableau Prep fields concatenated

Above: Service Types are concatenated in a single value per customer

Goal 

Tableau Prep fields not concatenated

Above: Service Types are listed in one column with rows for each Service Type per customer

How: 

Step 1: Connect to data with an Input step.

Input step in Tableau Prep

Step 2: Split all values of the concatenated column into separate fields using a Clean step.

Clean step in Tableau Prep

Step 3: Remove the unnecessary, original concatenated column.

remove unnecessary fields in Tableau Prep

Step 4: Pivot the Split columns using a Pivot step.

removing the Split columns in Tableau Prep

Step 5: Remove the unnecessary Pivot Names field.

removing Pivot Names field in Tableau Prep

Step 6: Exclude blank values from your new pivoted column.

excluding blank values from pivoted column in Tableau Prep

Step 7: Preview results in a Clean step.

preview results in a Clean step in Tableau Prep

Step 8: Create an Output step.

creating an Output step in Tableau Prep

Step 9: Run the flow.

running the workflow in Tableau Prep

The post Split and Pivot Your Data with Tableau Prep appeared first on InterWorks.

InterWorks