Understanding Tableau Prep and Conductor: Clean and Join Steps

Understanding Tableau Prep and Conductor: Clean and Join Steps

The last post in this series showed you how to use the Wildcard union and the Manual union to join four years of sales data that was stored in four different worksheets. In this post, you’ll see how the cleaning step can be used to do the following:

  • Change field-data roles
  • Group and replace members of a specific field set
  • Change the data types of a field and rename fields

All this can be done by using the Profile Cards within the Profile Pane in Tableau Prep Builder.

Using the Clean Step in Tableau Prep

The cleaning step behaves a lot like the data interpreter in Tableau Desktop, but it adds additional visualizations in the Profile Pane and the field Profile Cards that help you understand the shape and contents of each field. In addition, clicking on elements within the Profile Cards highlights related details in the other field cards. I find myself using the cleaning step frequently just to examine the results of prior steps.

Adding a Join Step

The Superstore dataset also includes another worksheet containing the names of Regional Managers. Regional Managers are assigned groups of States. We’ll use a join to add that information to the flow:

Now that the sales data from the Superstore tables have been cleaned up, we will bring some public data from the Census Bureau so that we can enhance our sales data by normalizing sales for the population by state.

Adding the Census Data into the Flow

I’ve been using Census data for many years. It’s useful when you want to account for the population density in analyses. In the example we’re building, this data will ultimately be used to express the sales by state in a way that accounts for the population density of each geography.

Using the Pivot Step and Tableau Prep Builder

The Census data isn’t perfectly formatted. We’ll use Prep Builder to fix the structural problems in that dataset:

In the video, I chose to do most of the field clean-up in the pivot step. I could have performed the same cleaning operations in the cleaning step that I added after the pivot. If the work you’re doing is going to be utilized only by use, fewer steps may save you time. If you work with a team of people who are new to Prep Builder, adding more steps to segregate individual cleaning operations may make your flow easier for others to understand. There aren’t hard and fast rules.

This workflow now includes two different data sources and six tables. You’ve seen two different ways to create a union; you’ve seen a join step and a pivot step; and you’ve learned about different ways you can use the cleaning step to improve the formatting and consistency of the data in the workflow. My colleague Katie wrote a blog post that takes a closer look at splitting and pivoting your data, so read it if you need more in-depth insights into those steps. For further information on cleansing your data, look at my colleague Spencer‘s blog on the topic.

In the next post in this series, we’re going to join the Superstore data to the Census data. Because these two data sources are not aggregated in the same way, we’ll be presented with a challenge that we’ll address with an aggregate step.

The post Understanding Tableau Prep and Conductor: Clean and Join Steps appeared first on InterWorks.

InterWorks

Dataquest: How to Learn Python for Data Science In 5 Steps

Why Learn Python For Data Science?

How to Learn Python for Data Science In 5 Steps

Before we explore how to learn Python for data science, we should briefly answer why you should learn Python in the first place.

In short, understanding Python is one of the valuable skills needed for a data science career.

Though it hasn’t always been, Python is the programming language of choice for data science. Here’s a brief history:

  • In 2016, it overtook R on Kaggle, the premier platform for data science competitions.
  • In 2017, it overtook R on KDNuggets’s annual poll of data scientists’ most used tools.
  • In 2018, 66% of data scientists reported using Python daily, making it the number one tool for analytics professionals.

Data science experts expect this trend to continue with increasing development in the Python ecosystem. And while your journey to learn Python programming may be just beginning, it’s nice to know that employment opportunities are abundant (and growing) as well.

According to Indeed, the average salary for a Data Scientist is $ 127,918.

The good news? That number is only expected to increase. The experts at IBM predicted a 28% increase in demand for data scientists by the year 2020.

So, the future is bright for data science, and Python is just one piece of the proverbial pie. Fortunately, learning Python and other programming fundamentals is as attainable as ever. We’ll show you how in five simple steps.

But remember – just because the steps are simple doesn’t mean you won’t have to put in the work. If you apply yourself and dedicate meaningful time to learning Python, you have the potential to not only pick up a new skill, but potentially bring your career to a new level.

How to Learn Python for Data Science

How to Learn Python for Data Science In 5 Steps

First, you’ll want to find the right course to help you learn Python programming. Dataquest’s courses are specifically designed for you to learn Python for data science at your own pace.

In addition to learning Python in a course setting, your journey to becoming a data scientist should also include soft skills. Plus, there are some complimentary technical skills we recommend you learn along the way.

Step 1: Learn Python Fundamentals

Everyone starts somewhere. This first step is where you’ll learn Python programming basics. You’ll also want an introduction to data science.

One of the important tools you should start using early in your journey is Jupyter Notebook, which comes prepackaged with Python libraries to help you learn these two things.

Kickstart your learning by: Joining a community

By joining a community, you’ll put yourself around like-minded people and increase your opportunities for employment. According to the Society for Human Resource Management, employee referrals account for 30% of all hires.

Create a Kaggle account, join a local Meetup group, and participate in Dataquest’s members-only Slack discussions with current students and alums.

Related skills: Try the Command Line Interface

The Command Line Interface (CLI) lets you run scripts more quickly, allowing you to test programs faster and work with more data.

Step 2: Practice Mini Python Projects

We truly believe in hands-on learning. You may be surprised by how soon you’ll be ready to build small Python projects.

Try programming things like calculators for an online game, or a program that fetches the weather from Google in your city. Building mini projects like these will help you learn Python. programming projects like these are standard for all languages, and a great way to solidify your understanding of the basics.

You should start to build your experience with APIs and begin web scraping. Beyond helping you learn Python programming, web scraping will be useful for you in gathering data later.

Kickstart your learning by: Reading

Enhance your coursework and find answers to the Python programming challenges you encounter. Read guidebooks, blog posts, and even other people’s open source code to learn Python and data science best practices – and get new ideas.

Automate The Boring Stuff With Python by Al Sweigart is an excellent and entertaining resource.

Related skills: Work with databases using SQL

SQL is used to talk to databases to alter, edit, and reorganize information. SQL is a staple in the data science community, as 40% of data scientists report consistently using it.*

Step 3: Learn Python Data Science Libraries

Unlike some other programming languages, in Python, there is generally a best way of doing something. The three best and most important Python libraries for data science are NumPy, Pandas, and Matplotlib.

NumPy and Pandas are great for exploring and playing with data. Matplotlib is a data visualization library that makes graphs like you’d find in Excel or Google Sheets.

Kickstart your learning by: Asking questions

You don’t know what you don’t know!

Python has a rich community of experts who are eager to help you learn Python. Resources like Quora, Stack Overflow, and Dataquest’s Slack are full of people excited to share their knowledge and help you learn Python programming. We also have an FAQ for each mission to help with questions you encounter throughout your programming courses with Dataquest.

Related skills: Use Git for version control

Git is a popular tool that helps you keep track of changes made to your code, which makes it much easier to correct mistakes, experiment, and collaborate with others.

Step 4: Build a Data Science Portfolio as you Learn Python

For aspiring data scientists, a portfolio is a must.

These projects should include several different datasets and should leave readers with interesting insights that you’ve gleaned. Your portfolio doesn’t need a particular theme; find datasets that interest you, then come up with a way to put them together.

Displaying projects like these gives fellow data scientists something to collaborate on and shows future employers that you’ve truly taken the time to learn Python and other important programming skills.

One of the nice things about data science is that your portfolio doubles as a resume while highlighting the skills you’ve learned, like Python programming.

Kickstart your learning by: Communicating, collaborating, and focusing on technical competence

During this time, you’ll want to make sure you’re cultivating those soft skills required to work with others, making sure you really understand the inner workings of the tools you’re using.

Related skills: Learn beginner and intermediate statistics

While learning Python for data science, you’ll also want to get a solid background in statistics. Understanding statistics will give you the mindset you need to focus on the right things, so you’ll find valuable insights (and real solutions) rather than just executing code.

Step 5: Apply Advanced Data Science Techniques

Finally, aim to sharpen your skills. Your data science journey will be full of constant learning, but there are advanced courses you can complete to ensure you’ve covered all the bases.

You’ll want to be comfortable with regression, classification, and k-means clustering models. You can also step into machine learning – bootstrapping models and creating neural networks using scikit-learn.

At this point, programming projects can include creating models using live data feeds. Machine learning models of this kind adjust their predictions over time.

Remember to: Keep learning!

Data science is an ever-growing field that spans numerous industries.

At the rate that demand is increasing, there are exponential opportunities to learn. Continue reading, collaborating, and conversing with others, and you’re sure to maintain interest and a competitive edge over time.

How Long Will It Take To Learn Python?

After reading these steps, the most common question we have people ask us is: “How long does all this take?”

There are a lot of estimates for the time it takes to learn Python. For data science specifically, estimates a range from 3 months to a year of consistent practice.

We’ve watched people move through our courses at lightning speed and others who have taken it much slower.

Really, it all depends on your desired timeline, free time that you can dedicate to learn Python programming and the pace at which you learn.

Dataquest’s courses are created for you to go at your own speed. Each path is full of missions, hands-on learning and opportunities to ask questions so that you get can an in-depth mastery of data science fundamentals.

Get started for free. Learn Python with our Data Scientist path and start mastering a new skill today.

Resources and studies cited:

Planet Python

Synchronize Axes Across Multiple Sheets in Five Simple Steps

reference line calculated field in Tableau

I often find myself building dashboards with a parameter to compare baseline data to different scenarios. Because these scenarios are built out in different worksheets, I am faced with the challenge of correctly displaying changes in results compared to each other. In the mocked-up example below, you can see how the axes change dynamically within each worksheet, yet the bars stay the same height, making it difficult to quickly see the difference.

Note: This example uses Global Superstore training data.

Incorrect GIF

If you hunt through the Tableau forums, you’ll find that the ability to synchronize axes across worksheets is not built into the software. The official suggestion from Tableau is to manually fix the axis height to the same value across all worksheets.

This works, except it is not very future-proof if the data ever changes or updates … and when do we ever work with purely static data? Thanks to my colleague Carl Slifer, we have a better solution—and you will love the simplicity of it.

Step-by-Step Guide to Synchronizing Axes in Tableau

Step 1: Create the first worksheet with your baseline data (I simply use total sales):

baseline scenario in Tableau

Step 2: Create a worksheet to represent your scenario. Here, I use a simple parameter to change sales +/- a factor of 10%:

edit parameter in Tableau

calculated field in Tableau

Hint: Once the parameter and calculated field are created, show the parameter and drag the calculated field to the Rows shelf to create a mirror image of the sales.

Step 3: Now, here comes the cool part. We need to add a reference line to each worksheet. We also need the reference line to be calculated by whichever value is larger: sales or the scenario-adjusted sales. We do this by creating a calculated field with a simple formula using MAX. The use of MAX is important because we want to stretch the axis of other worksheets relative to the value of the largest value being compared:

MAX calculated field in Tableau

This will re-position the reference line based on which value is largest between the two worksheets.

Step 4: Place the Reference Line calculated field on the Details tile of the Marks card for each worksheet. Next, format the reference line so it does not show a value or line:

reference line calculated field in Tableau

Step 5: Lastly, build out your dashboard, adjust the parameter control, and be dazzled as your bar graphs re-size relative to each other!

The last piece to keep in mind is to ensure you align your graphs appropriately on the dashboard. Consider putting both graphs in a container and checking that axis font sizes, titles, etc. are the same for each worksheet as these formatting pieces could also skew how the graphs visually align:

correct GIF

Synchronize Axes with a MIN Function

BONUS! Have negative values in your data? You can set up your axis to display negative values as well. Starting at Step 3, use the MIN function instead, and add a second reference line to your graphs based on the minimum value. This will give your graphs two reference lines that will respond dynamically to your data.

The post Synchronize Axes Across Multiple Sheets in Five Simple Steps appeared first on InterWorks.

InterWorks