Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation

In the previous three posts in this series, we used Tableau Prep Builder to create a workflow to join six tables from two data sources. Our flow contains 11 steps. The challenge presented with the data we have from the Census comes from the different levels of detail (LOD) our sales data has—specific transactions every day for the last four years—and the Census data, which includes population data aggregated at the State-level for the same four years. The Census data is much less detailed.

This is a common issue with financial data, for example, because budget data tends to be less detailed than the transaction-level details you can obtain from business systems. So, we need to use an aggregate step to reduce the LOD of the Superstore sales data so that it’s the same as the Census data.

Using the Aggregate Step in Tableau Prep

The Census data provides population data by State and Year. We’ll use an aggregate step to reduce the detail in the Superstore data to match the Census data.

After performing the aggregation of the Superstore data, we can now get a resulting set that combines sales and population data for every state. The Census data actually includes information for Alaska and Hawaii, for which we don’t have any sales. That’s fine. Delaware is also mismatched in 2018 because there were no sales in the Superstore data that year. By using the aggregate step, we’ve now brought both data sources together and achieved a consistent LOD.

Creating a Per-Capital Sales Calculation

The primary reason I wanted to bring in the Census data was to normalize the sales by state for population. Superstore is anonymized retail sales information. While it’s useful to know that California has the most sales in this data, how does Vermont compare to California when you take into account the population differences between states?

Saving Your Tableau Prep Workflow

At the end of the last video, I showed you how to add an output step, save a workflow and then run the workflow to generate a Hyper output file that we’ll use to build a visualization of the data. Saving the workflow enables you to reuse it later. Running the workflow generates the Hyper data.

Cleaning up the Workflow Visualization

We now have a workflow that consists of 18 steps. Let’s step back for a moment and enhance the information in our workflow so that other people can more easily follow its logic:

The video shows how you can change step colors and add comments, making each step easier for others to follow and understand. I didn’t rename any of them. You may also experiment with the ways you can use color, descriptions and step renaming to make your flows easier for other people to track with.

In the next post in this series, I’m going to build a visualization using the dataset we created in the workflow.

The post Understanding Tableau Prep and Conductor: Facilitating a Join with Aggregation appeared first on InterWorks.

InterWorks

Understanding Tableau Prep and Conductor: Clean and Join Steps

Understanding Tableau Prep and Conductor: Clean and Join Steps

The last post in this series showed you how to use the Wildcard union and the Manual union to join four years of sales data that was stored in four different worksheets. In this post, you’ll see how the cleaning step can be used to do the following:

  • Change field-data roles
  • Group and replace members of a specific field set
  • Change the data types of a field and rename fields

All this can be done by using the Profile Cards within the Profile Pane in Tableau Prep Builder.

Using the Clean Step in Tableau Prep

The cleaning step behaves a lot like the data interpreter in Tableau Desktop, but it adds additional visualizations in the Profile Pane and the field Profile Cards that help you understand the shape and contents of each field. In addition, clicking on elements within the Profile Cards highlights related details in the other field cards. I find myself using the cleaning step frequently just to examine the results of prior steps.

Adding a Join Step

The Superstore dataset also includes another worksheet containing the names of Regional Managers. Regional Managers are assigned groups of States. We’ll use a join to add that information to the flow:

Now that the sales data from the Superstore tables have been cleaned up, we will bring some public data from the Census Bureau so that we can enhance our sales data by normalizing sales for the population by state.

Adding the Census Data into the Flow

I’ve been using Census data for many years. It’s useful when you want to account for the population density in analyses. In the example we’re building, this data will ultimately be used to express the sales by state in a way that accounts for the population density of each geography.

Using the Pivot Step and Tableau Prep Builder

The Census data isn’t perfectly formatted. We’ll use Prep Builder to fix the structural problems in that dataset:

In the video, I chose to do most of the field clean-up in the pivot step. I could have performed the same cleaning operations in the cleaning step that I added after the pivot. If the work you’re doing is going to be utilized only by use, fewer steps may save you time. If you work with a team of people who are new to Prep Builder, adding more steps to segregate individual cleaning operations may make your flow easier for others to understand. There aren’t hard and fast rules.

This workflow now includes two different data sources and six tables. You’ve seen two different ways to create a union; you’ve seen a join step and a pivot step; and you’ve learned about different ways you can use the cleaning step to improve the formatting and consistency of the data in the workflow. My colleague Katie wrote a blog post that takes a closer look at splitting and pivoting your data, so read it if you need more in-depth insights into those steps. For further information on cleansing your data, look at my colleague Spencer‘s blog on the topic.

In the next post in this series, we’re going to join the Superstore data to the Census data. Because these two data sources are not aggregated in the same way, we’ll be presented with a challenge that we’ll address with an aggregate step.

The post Understanding Tableau Prep and Conductor: Clean and Join Steps appeared first on InterWorks.

InterWorks

Understanding Boolean Logic in Go

The Boolean data type (bool) can be one of two values, either true or false. Booleans are used in programming to make comparisons and to control the flow of the program.

Booleans represent the truth values that are associated with the logic branch of mathematics, which informs algorithms in computer science. Named for the mathematician George Boole, the word Boolean always begins with a capitalized B.

The data type in Go for Boolean is bool, all lowercase. The values true and false will always be with a lowercase t and f respectively, as they are special values in Go.

This tutorial will cover the basics you’ll need to understand how the bool data type works, including Boolean comparison, logical operators, and truth tables.

Comparison Operators

In programming, comparison operators are used to compare values and evaluate down to a single Boolean value of either true or false.

The table below shows Boolean comparison operators.

Operator What it means
== Equal to
!= Not equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to

To understand how these operators work, let’s assign two integers to two variables in a Go program:

x := 5 y := 8 

In this example, since x has the value of 5, it is less than y which has the value of 8.

Using those two variables and their associated values, let’s go through the operators from the preceding table. In this program, you’ll ask Go to print out whether each comparison operator evaluates to either true or false. To help better understand this output, you’ll have Go also print a string to show you what it’s evaluating:

package main  import "fmt"  func main() {     x := 5     y := 8      fmt.Println("x == y:", x == y)     fmt.Println("x != y:", x != y)     fmt.Println("x < y:", x < y)     fmt.Println("x > y:", x > y)     fmt.Println("x <= y:", x <= y)     fmt.Println("x >= y:", x >= y) } 
Output
x == y: false x != y: true x < y: true x > y: false x <= y: true x >= y: false

Following mathematical logic, Go has evaluated the following from the expressions:

  • Is 5 (x) equal to 8 (y)? false
  • Is 5 not equal to 8? true
  • Is 5 less than 8? true
  • Is 5 greater than 8? false
  • Is 5 less than or equal to 8? true
  • Is 5 not less than or equal to 8? false

Although integers were used here, you could substitute them with float values.

Strings can also be used with Boolean operators. They are case-sensitive unless you use an additional string method.

You can look at how strings are compared in practice:

Sammy := "Sammy" sammy := "sammy"  fmt.Println("Sammy == sammy: ", Sammy == sammy) 
Output
Sammy == sammy: false

The string Sammy is not equal to the string sammy, because they are not exactly the same; one starts with an uppercase S and the other with a lowercase s. But, if you add another variable that is assigned the value of Sammy, then they will evaluate to equal:

Sammy := "Sammy" sammy := "sammy" alsoSammy := "Sammy"  fmt.Println("Sammy == sammy: ", Sammy == sammy) fmt.Println("Sammy == alsoSammy", Sammy == alsoSammy) 
Output
Sammy == sammy: false Sammy == alsoSammy true

You can also use the other comparison operators including > and < to compare two strings. Go will compare these strings lexicographically using the ASCII values of the characters.

You can also evaluate Boolean values with comparison operators:

t := true f := false  fmt.Println("t != f: ", t != f) 
Output
t != f: true

The preceding code block evaluated that true is not equal to false.

Note the difference between the two operators = and ==.

x = y   // Sets x equal to y x == y  // Evaluates whether x is equal to y 

The first = is the assignment operator, which will set one value equal to another. The second, ==, is a comparison operator and will evaluate whether two values are equal.

Logical Operators

There are two logical operators that are used to compare values. They evaluate expressions down to Boolean values, returning either true or false. These operators are &&, ||, and !, and are defined in the list below:

  • && (x && y) is the and operator. It is true if both statements are true.
  • || (x || y) is the or operator. It is true if at least one statement is true.
  • ! (!x) is the not operator. It is true only if the statement is false.

Logical operators are typically used to evaluate whether two or more expressions are true or not true. For example, they can be used to determine if the grade is passing and that the student is registered in the course, and if both cases are true, then the student will be assigned a grade in the system. Another example would be to determine whether a user is a valid active customer of an online shop based on whether they have store credit or have made a purchase in the past 6 months.

To understand how logical operators work, let’s evaluate three expressions:

fmt.Println((9 > 7) && (2 < 4))   // Both original expressions are true fmt.Println((8 == 8) || (6 != 6)) // One original expression is true fmt.Println(!(3 <= 1))            // The original expression is false 
Output
true true true

In the first case, fmt.Println((9 > 7) && (2 < 4)), both 9 > 7 and 2 < 4 needed to evaluate to true since the and operator was used.

In the second case, fmt.Println((8 == 8) || (6 != 6)), since 8 == 8 evaluated to true, it did not make a difference that 6 != 6 evaluates to false because the or operator was used. If you had used the and operator, this would evaluate to false.

In the third case, fmt.Println(!(3 <= 1)), the not operator negates the false value that 3 <=1 returns.

Let’s substitute floats for integers and aim for false evaluations:

fmt.Println((-0.2 > 1.4) && (0.8 < 3.1))  // One original expression is false fmt.Println((7.5 == 8.9) || (9.2 != 9.2)) // Both original expressions are false fmt.Println(!(-5.7 <= 0.3))               // The original expression is true 

In this example:

  • and must have at least one false expression evaluate to false.
  • or must have both expressions evaluate to false.
  • ! must have its inner expression be true for the new expression to evaluate to false.

If these results seem unclear to you, go through some truth tables for further clarification.

You can also write compound statements using &&, ||, and !:

!((-0.2 > 1.4) && ((0.8 < 3.1) || (0.1 == 0.1))) 

Take a look at the inner-most expression first: (0.8 < 3.1) || (0.1 == 0.1). This expression evaluates to true because both mathematical statements are true.

Next, Go takes the returned value true and combines it with the next inner expression: (-0.2 > 1.4) && (true). This example returns false because the mathematical statement -0.2 > 1.4 is false, and (false) and (true) returns false.

Finally, we have the outer expression: !(false), which evaluates to true, so the final returned value if we print this statement out is:

Output
true

The logical operators &&, ||, and ! evaluate expressions and return Boolean values.

Truth Tables

There is a lot to learn about the logic branch of mathematics, but you can selectively learn some of it to improve your algorithmic thinking when programming.

The following are truth tables for the comparison operator ==, and each of the logic operators &&, || and !. While you may be able to reason them out, it can also be helpful to memorize them as that can make your programming decision-making process quicker.

== (equal) Truth Table

x == y Returns
true == true true
true == false false
false == true false
false == false true

&& (and) Truth Table

x and y Returns
true and true true
true and false false
false and true false
false and false false

|| (or) Truth Table

x or y Returns
true or true true
true or false true
false or true true
false or false false

! (not) Truth Table

not x Returns
not true false
not false true

Truth tables are common mathematical tables used in logic, and are useful to keep in mind when constructing algorithms (instructions) in computer programming.

Using Boolean Operators for Flow Control

To control the stream and outcomes of a program in the form of flow control statements, you can use a condition followed by a clause.

A condition evaluates down to a Boolean value of true or false, presenting a point where a decision is made in the program. That is, a condition would tell you if something evaluates to true or false.

The clause is the block of code that follows the condition and dictates the outcome of the program. That is, it is the “do this” part of the construction “If x is true, then do this.”

The code block below shows an example of comparison operators working in tandem with conditional statements to control the flow of a Go program:

if grade >= 65 {                 // Condition     fmt.Println("Passing grade") // Clause } else {     fmt.Println("Failing grade") } 

This program will evaluate whether each student’s grade is passing or failing. In the case of a student with a grade of 83, the first statement will evaluate to true, and the print statement of Passing grade will be triggered. In the case of a student with a grade of 59, the first statement will evaluate to false, so the program will move on to execute the print statement tied to the else expression: Failing grade.

Boolean operators present conditions that can be used to decide the eventual outcome of a program through flow control statements.

Conclusion

This tutorial went through comparison and logical operators belonging to the Boolean type, as well as truth tables and using Booleans for program flow control.

DigitalOcean Community Tutorials

Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder

Understanding Tableau Prep and Conductor: Connecting to a Data Source

Before you begin building a workflow using Tableau Prep, it’s helpful to know a little bit about the data source(s) you need to connect to. Consider the following:

  • What kind of source are you connecting to?
  • How large are the files? Record counts? Rows/columns?
  • Are you familiar with the data structures?
  • Do you know of any data quality issues?
  • How frequently will you be updating the data?

Understanding the Basics of Your Data Sources

Understanding the basic data structure and size, as well as the granularity of different sources, helps you plan the flow in your mind before you get into the detailed challenges that the transformation of the raw data sources poses.

In the example I’m drawing upon for this series, I’m using a version of Superstore data I created, along with public data from the Census Bureau. I’m going to create a workflow that will combine four tables containing annual sales data and a single dimension table that will be joined to provide regional manager names. These will be joined with a population dataset from the Census, enabling us to normalize sales for population for each state that had sales.

Connecting to the Sales Data

The data used in this example comes from two different spreadsheets: one that contains four (4) sales worksheets and one (1) Regional Manager worksheet, and another spreadsheet containing the census data.

Experienced Tableau Desktop users should be familiar with the Superstore dataset. In this spreadsheet, I’ve separated each year’s sale into its own worksheet. This data could have been in a text file or a database:

Superstore data for Tableau Prep

The Census data provides population estimates for each state for corresponding years:

Census data for Tableau Prep

Because the world isn’t perfect, we will have to deal with data quality issues in these files, different aggregations of the data, union different files, join files, pivot the data and re-aggregate the data. There are also inconsistencies within specific fields that will have to be cleaned. The datasets are small, but every kind of data transformation step that Tableau Prep provides will have to be utilized to prepare the data for analysis in Tableau Desktop. We will also create a calculation in Prep to normalize sales by state and year for the population in each state.

That data is not perfectly clean, and some of the structures aren’t right. That’s the real world. We’ll use Tableau Prep to address all of the issues and create a clean dataset for other people to use.

Connecting to the Superstore Sales Data

In this first video, you’ll see how to make a data connection to an initial data source and then add other files to that data source. We’ll make the following connections:

  1. Connect to the four sales tables
  2. Demonstrate a wildcard union
  3. Demonstrate a manual union

Using the Wildcard Union in Tableau Prep

Wildcard unions offer an efficient way to bring together many different tables with similar naming conventions that also have consistent column and row structures. If you’re working with unfamiliar datasets that may have data inconsistencies, I believe creating a union manually gives you more direct control and may make it easier for you to deal with data quality issues that emerge as you join each table.

Using the Manual Union in Tableau Prep

I like using manual unions when I’m working with a new dataset because it’s easier to identify mismatched field-naming conventions. The inconsistent field names (Sales vs. Revenue) didn’t appear until I brought in the 2018 sales data. The visual cues that Tableau Prep provided, and the filtering in the data grid for mismatched fields, made it very easy to find and merge two different fields that were actually both holding sales data.

It was also easy to make minor changes using the Profile Cards for specific fields. I used that to remove the Table Names field, which Builder adds automatically when you union data. I don’t want to see that information in my output from this workflow, so I removed it. In addition, because Row ID is a number, Builder treated it as a number in the profile pane and generated a histogram in that field’s profile card. I wanted to validate that the Row ID field is a unique key for this dataset, so I changed the field to a string, and the profile card was changed in a way that made it easy to see every Row ID is, in fact, a single row of the data.

In the next post in this series, I’ll show you how to add a cleaning step to make additional modifications to fields.

The post Understanding Tableau Prep and Conductor: Connecting to a Data Source with Builder appeared first on InterWorks.

InterWorks

Understanding Tableau Prep and Conductor: An Overview

Understanding Tableau Prep and Conductor: An Overview

Tableau Prep Builder and Conductor are Tableau’s tools for enabling repeatable, documented and scheduled workflows to transform raw data into user-friendly datasets. Tableau Prep Builder’s first live release came in April 2018 in Desktop 2018.1, and since then, we’ve had four major releases and eight minor updates. Over the past year, Tableau has expanded the Builder feature set substantially.

Tableau Conductor is the server complement to Builder that provides the framework and tools for publishing Builder workflows and datasets to Tableau Server. Tableau Server’s Data Management Add-on released in V2019.1 in February 2019 and enables you to automatically refresh published workflows.

What This Series Will Cover

In this blog series, I’ll be covering the Desktop environment of Prep Builder and all of the related data-transformation tools that are currently provided. I’ll also discuss the basics of installing and managing workflows using the data management add-on toolset for Tableau Server and will complete this series by providing a comparison of Tableau Prep and Conductor to other ETL tools available in the marketplace.

If you are already familiar with the Data Interpreter in Tableau Desktop, your learning curve for Tableau Prep will be short. I’ve kept an eye on Tableau Prep ever since its release, but I haven’t bothered to do a deep dive with the tool until this year (though some of consultants have recognized that Tableau Prep is truly a game-changer). Over the past month, I’ve spent a lot of time playing with Prep and publishing workflows to Tableau Server to see exactly what Tableau is offering now.

Per usual with Tableau, when a new set of features is good enough, they release something new. Over time, Tableau adds more features. Generally, the features added come from a combination of their internal product roadmap and external feedback from the user community. Tableau Prep has now matured to a level where the feature set is equivalent to the Data Interpreter in Tableau Desktop, but it adds the automation and repeatability of a real extract, transform and load tool.

Tableau Prep Work View and Start Page

I’m excited about the possibilities of Tableau Prep and Conductor. While I wouldn’t call this an enterprise-class ETL toolset, Tableau brilliance at user interface is evident. I love the way that Tableau has implemented the transformation features. It’s very “Tableau-like.” Tableau continues to live their mission of making it fast and easy for you to see and understand your data. Here’s a Tableau Prep workflow that I created earlier today:

Tableau Prep Builder start page

There are four areas in every workflow:

  1. The Connection Pane
  2. The Flow Pane
  3. The Profile Pane
  4. The Data Grid

The arrow pointing from the Union 1 step to the Settings area displays a flexible screen space that will change depending on what specific step you click on in the workflow.

In typical Tableau fashion, every icon has a specific meaning. Tableau automatically assigns different colors to each step. I’ve altered some of the colors to my taste and to better reflect common sources (see how the Orders 2015Orders 2018 are the same color). Notice that I’ve added customized descriptions next to each step in the workflow. Tableau Prep also allows you to change the system-assigned names for each step within the flow, which are displayed below each step heading.

In addition, I pasted some summary information from the Start page in Builder:

Tableau Prep workflow

The Builder Start page is very similar to the Start Page in Tableau Desktop. Connections are accessed using the plus sign (+) on the left, available workflows are displayed in the center, and the Discover pane on the right supplies the same type of training videos and resources that you’ve become familiar with using Tableau Desktop. Hovering your mouse over existing flows causes summary information about the flow to display.

I appreciate Tableau’s design consistency between Desktop and Prep Builder. The interface behaves in a way that makes learning Prep Builder easy for any experienced Tableau Desktop user.

In the next post, we’ll demonstrate how you connect to one or more data sources, and we’ll review the different ways you can join and union those data sources.

The post Understanding Tableau Prep and Conductor: An Overview appeared first on InterWorks.

InterWorks