IslandT: Create integer list from a number with Python

Given a number, return a list of integers using a Python function.

In this chapter, we are given a number and need to return a list of integers based on that number; for example, the number 3 will return the list [1, 2, 3].

We will first create an empty list, then loop n times and append each count plus one (i + 1) to that list.

def monkey_count(n):
    li = []
    for i in range(n):
        li.append(i + 1)
    return li

There is an easier solution than the one above; let me know what it is in the comment box below this post.
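For readers who want to compare, one likely candidate for that easier solution (my guess, not necessarily the one the author has in mind) is to build the list directly from range:

def monkey_count(n):
    # range(1, n + 1) already yields 1 through n, so no explicit loop is needed
    return list(range(1, n + 1))

print(monkey_count(3))  # [1, 2, 3]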


How I Went from Beer Sales Developer to Analytics Consultant

When I was younger, I used to always imagine what my job would be when I grew up. A sales professional like my mom? A financial professional like my dad? Maybe start my own business one day? Maybe I could take our neighborhood lemonade stand national!

Combining a Love for Communication and Technology

When I went to college, I knew that I wanted to attend Business School. It seemed like the most logical choice in order to eventually get that corporate job that would provide the financial security I wanted and the chance to “make a name for myself” somewhere. But when it came to choosing a major, it was more a process of elimination than picking something I wanted to do. I knew I didn’t want to do accounting or finance; math was never my strongest suit. Operations and supply chain management just sounded boring. International business didn’t really interest me. So marketing it was!

My favorite classes in school were business communications, as well as business technology. But at the time, I just saw those two things as pieces of other jobs—not something that could be my job. Majoring to become a business analyst or analytics consultant wasn’t really a thing. And to be honest, I didn’t really think I was smart or technical enough to turn my love of Excel and speaking to people into its own career path.

Above: Enjoying my first job with Anheuser-Busch

Fast forward a few years: I graduated with a dual major in marketing and entrepreneurship/corporate innovation. I joined Anheuser-Busch’s Sales Development Program and was immediately launched into beer sales 101. While I enjoyed it (I mean what college grad wouldn’t want to have endless supplies of free beer for her friends right out of college!?), it just didn’t fulfill me the way I had hoped.

The Transition to Analytics

Next thing I knew, I moved into an analytics role for the company in Dallas, TX. I took the position because I was told it would make me a stronger candidate to move into a larger sales market back in the field. I didn’t expect to really enjoy the analytics or thrive in a role that was behind a computer all day. I’d always been told I was a people person and should be out in front of customers. But what I found was that my people-person skills really allowed me to excel (pun intended) in the analytics space because I could communicate with the team about data in a way others could not.

Above: “Excelling” in my pursuits of analytics

When it came time to leave Anheuser-Busch, I was eager to try analytics outside of the Consumer Packaged Goods space, but it was harder to switch industries than I anticipated. Luckily, I found two wonderful managers at Quadratic Insights who were willing to take a chance on me. Despite never having used Tableau before, they took me at my word that I was a fast learner and hired me as a Tableau Developer for their growing Data Visualization and Business Intelligence team. How grateful I am they took that chance!

At Quadratic Insights, I was able to expand my knowledge of Tableau and delve deeper into using visual analytics as a process for uncovering insights in data. I really felt like I had tapped into a new passion. Tableau transformed my analytics process for the better, and now, I can’t imagine digging into a large dataset without it! Plus, it put me on a career path that led to a job where I truly feel like I’m doing what I have always been meant to do.

Above: Back home in California at Big Sur

While I really enjoyed my job at Quadratic Insights, I decided last June that I wanted to move to California to be closer to my family. Since Quadratic Insights is a local Dallas-based company, I started seeking opportunities elsewhere. I immediately thought of InterWorks, as I had the fortune of working with them as a client during my time in Dallas. InterWorks had helped us grow and scale our Tableau practice at the company, and I thoroughly enjoyed all my interactions with their team in that process. As I went to apply, I hoped that I would make the cut and be a HELL YES! Fortunately, everything fell into place.

Where People and Problem Solving Meet

Now, my new career as an analytics consultant at InterWorks lets me combine my three favorite things: data, people and problem solving! It has granted me the ability to live where I want so I can be close to my family, while still working with incredible clients all over the country. I have encountered data problems I didn’t know existed and conquered challenges that seemed so daunting at first glance. I am surrounded by wildly smart individuals across the IT and Business Intelligence space who make me smarter each day and provide mentorship, making me a better consultant and person.

As I look around at my peers and even those further along in their careers than me, I feel extremely fortunate to have found myself in a position where I truly feel like I am where I’m supposed to be, doing what I am meant to do. I am grateful for each experience that has guided me here. It’s incredible where life can take you when you’re not expecting it.

Above: Meeting up with some of the InterWorks team in Portland, Oregon


Stack Abuse: Python for NLP: Creating Bag of Words Model from Scratch

This is the 13th article in my series of articles on Python for NLP. In the previous article, we saw how to create a simple rule-based chatbot that uses cosine similarity between the TF-IDF vectors of the words in the corpus and the user input to generate a response. The TF-IDF model was basically used to convert words to numbers.

In this article, we will study another very useful model that converts text to numbers, i.e. the Bag of Words (BOW) model.

Since most statistical algorithms, e.g. machine learning and deep learning techniques, work with numeric data, we have to convert text into numbers. Several approaches exist in this regard. However, the most famous ones are Bag of Words, TF-IDF, and word2vec. Though several libraries exist, such as Scikit-Learn and NLTK, which can implement these techniques in one line of code, it is important to understand the working principle behind these word embedding techniques. The best way to do so is to implement these techniques from scratch in Python, and this is what we are going to do today.
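For reference, this is roughly what that one-line library route looks like with scikit-learn's CountVectorizer (a sketch assuming scikit-learn is installed; the rest of this article builds the model without it):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["I like to play football",
        "Did you go outside to play tennis"]
vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(docs)  # sparse document-term matrix
print(bow_matrix.toarray())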

In this article, we will see how to implement the Bag of Words approach from scratch in Python. In the next article, we will see how to implement the TF-IDF approach from scratch in Python.

Before coding, let’s first see the theory behind the bag of words approach.

Theory Behind Bag of Words Approach

To understand the bag of words approach, let’s start with an example.

Suppose we have a corpus with three sentences:

  • “I like to play football”
  • “Did you go outside to play tennis”
  • “John and I play tennis”

Now if we have to perform text classification, or any other task, on the above data using statistical techniques, we cannot do so, since statistical techniques work only with numbers. Therefore, we need to convert these sentences into numbers.

Step 1: Tokenize the Sentences

The first step in this regard is to convert the sentences in our corpus into tokens or individual words. Look at the table below:

Sentence 1    Sentence 2    Sentence 3
I             Did           John
like          you           and
to            go            I
play          outside       play
football      to            tennis
              play
              tennis
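As a minimal sketch of this step in Python (splitting on whitespace here; the article's own code uses nltk.word_tokenize later):

corpus = ["I like to play football",
          "Did you go outside to play tennis",
          "John and I play tennis"]

# Tokenize each sentence into a list of words
tokenized = [sentence.split() for sentence in corpus]
print(tokenized[0])  # ['I', 'like', 'to', 'play', 'football']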

Step 2: Create a Dictionary of Word Frequency

The next step is to create a dictionary that contains all the words in our corpus as keys and their frequencies of occurrence as values. In other words, we need to create a histogram of the words in our corpus. Look at the following table:

Word       Frequency
I          2
like       1
to         2
play       3
football   1
Did        1
you        1
go         1
outside    1
tennis     2
John       1
and        1

In the table above, you can see each word in our corpus along with its frequency of occurrence. For instance, you can see that since the word play occurs three times in the corpus (once in each sentence) its frequency is 3.
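Continuing the sketch above, the frequency dictionary can be built in a few lines:

wordfreq = {}
for tokens in tokenized:
    for token in tokens:
        # dict.get returns 0 for unseen words, so this counts occurrences
        wordfreq[token] = wordfreq.get(token, 0) + 1

print(wordfreq["play"])  # 3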

In our corpus, we only had three sentences, therefore it is easy for us to create a dictionary that contains all the words. In real-world scenarios, there will be millions of words in the dictionary. Some of the words will have a very small frequency. The words with very small frequency are not very useful, hence such words are removed. One way to remove the low-frequency words is to sort the word frequency dictionary in decreasing order of frequency and then keep only the words whose frequency exceeds a certain threshold.

Let’s sort our word frequency dictionary:

Word       Frequency
play       3
tennis     2
to         2
I          2
football   1
Did        1
you        1
go         1
outside    1
like       1
John       1
and        1
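In code, the same sort looks like this (note that words tied on frequency may come out in a different order than shown in the table):

# Sort (word, frequency) pairs by frequency, largest first
sorted_freq = sorted(wordfreq.items(), key=lambda kv: kv[1], reverse=True)
print(sorted_freq[0])  # ('play', 3)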

Step 3: Creating the Bag of Words Model

To create the bag of words model, we need to create a matrix where the columns correspond to the most frequent words in our dictionary and the rows correspond to the documents or sentences.

Suppose we keep the eight most frequently occurring words from our dictionary. Then the document frequency matrix will look like this:

             Play   Tennis   To   I   Football   Did   You   go
Sentence 1    1      0       1    1      1        0     0     0
Sentence 2    1      1       1    0      0        1     1     1
Sentence 3    1      1       0    1      0        0     0     0

It is important to understand how the above matrix is created. The first row corresponds to the first sentence. In the first sentence, the word “play” occurs once, so we put a 1 in the first column. The word in the second column is “Tennis”; it doesn’t occur in the first sentence, so we put a 0 in the second column for sentence 1. Similarly, in the second sentence, both “Play” and “Tennis” occur once, so we put a 1 in the first two columns. However, we put a 0 in the fifth column, since the word “Football” doesn’t occur in the second sentence. In this way, every cell in the matrix is filled with either 0 or 1, depending on whether the word occurs in the sentence. The final matrix is the bag of words model.

In each row, you can see the numeric representation of the corresponding sentence. For instance, the first row shows the numeric representation of Sentence 1. This numeric representation can now be used as input to the statistical models.
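Continuing the toy sketch, the matrix above can be produced in a couple of lines (again, tie order among equally frequent words may differ from the table):

# Keep the eight most frequent words, then mark presence/absence per sentence
most_freq_toy = [word for word, freq in sorted_freq[:8]]
matrix = [[1 if word in tokens else 0 for word in most_freq_toy]
          for tokens in tokenized]
print(matrix[0])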

Enough of the theory, let’s implement our very own bag of words model from scratch.

Bag of Words Model in Python

The first thing we need to create our Bag of Words model is a dataset. In the previous section, we manually created a bag of words model with three sentences. However, real-world datasets are huge, with millions of words. A convenient place to find a sample corpus is Wikipedia.

In the first step, we will scrape the Wikipedia article on Natural Language Processing. But first, let’s import the required libraries:

import nltk
import numpy as np
import random
import string
import bs4 as bs
import urllib.request
import re

As we did in the previous article, we will be using the Beautifulsoup4 library to parse the data from Wikipedia. Furthermore, Python’s regex library, re, will be used for some preprocessing tasks on the text.

Next, we need to scrape the Wikipedia article on natural language processing.

raw_html = urllib.request.urlopen('https://en.wikipedia.org/wiki/Natural_language_processing')
raw_html = raw_html.read()

article_html = bs.BeautifulSoup(raw_html, 'lxml')
article_paragraphs = article_html.find_all('p')

article_text = ''
for para in article_paragraphs:
    article_text += para.text

In the script above, we download the raw HTML of the Wikipedia article. From the raw HTML, we extract the text within the paragraph tags. Finally, we create a complete corpus by concatenating all the paragraphs.

The next step is to split the corpus into individual sentences. To do so, we will use the sent_tokenize function from the NLTK library.

corpus = nltk.sent_tokenize(article_text)   
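One caveat: sent_tokenize (and word_tokenize, used later) relies on NLTK's punkt tokenizer models, so a fresh NLTK installation may first need a one-time download:

nltk.download('punkt')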

Our text contains punctuation, and we don’t want punctuation to be part of our word frequency dictionary. In the following script, we first convert our text to lower case and then remove the punctuation. Removing punctuation can leave multiple consecutive spaces, which we then remove using a regex.

Look at the following script:

for i in range(len(corpus)):
    corpus[i] = corpus[i].lower()
    corpus[i] = re.sub(r'\W', ' ', corpus[i])
    corpus[i] = re.sub(r'\s+', ' ', corpus[i])

In the script above, we iterate through each sentence in the corpus, convert the sentence to lower case, and then remove the punctuation and empty spaces from the text.

Let’s find out the number of sentences in our corpus.

print(len(corpus))   

The output shows 49.

Let’s print one sentence from our corpus:

print(corpus[30])   

Output:

in the 2010s representation learning and deep neural network style machine learning methods became widespread in natural language processing due in part to a flurry of results showing that such techniques 4 5 can achieve state of the art results in many natural language tasks for example in language modeling 6 parsing 7 8 and many others   

You can see that the text doesn’t contain any special characters or multiple consecutive spaces.

Now we have our own corpus. The next step is to tokenize the sentences in the corpus and create a dictionary that contains words and their corresponding frequencies in the corpus. Look at the following script:

wordfreq = {}
for sentence in corpus:
    tokens = nltk.word_tokenize(sentence)
    for token in tokens:
        if token not in wordfreq:
            wordfreq[token] = 1
        else:
            wordfreq[token] += 1

In the script above, we created a dictionary called wordfreq. Next, we iterate through each sentence in the corpus and tokenize it into words. We then iterate through each word in the sentence: if the word doesn’t exist in the wordfreq dictionary, we add it as a key with a value of 1; otherwise, if the word already exists, we increment its value by 1.
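As an aside, the standard library's collections.Counter builds the same dictionary more compactly (an alternative, not the article's code):

from collections import Counter

# Count every token across every sentence in one pass
wordfreq = Counter(token
                   for sentence in corpus
                   for token in nltk.word_tokenize(sentence))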

If you are executing the above in the Spyder editor like me, you can go to the variable explorer on the right and click the wordfreq variable. You should see a dictionary like this:

You can see words in the “Key” column and their frequency of occurrences in the “Value” column.

As I said in the theory section, depending upon the task at hand, not all of the words are useful. In huge corpora, you can have millions of words, so we filter down to the most frequently occurring ones. Our corpus has 535 words in total; let us filter down to the 200 most frequently occurring words. To do so, we can make use of Python’s heapq module.

Look at the following script:

import heapq
most_freq = heapq.nlargest(200, wordfreq, key=wordfreq.get)

Now our most_freq list contains the 200 most frequently occurring words (note that heapq.nlargest returns only the words themselves, not their frequencies).

The final step is to convert the sentences in our corpus into their corresponding vector representations. The idea is straightforward: for each word in the most_freq list, if the word exists in the sentence, a 1 will be added to the vector, else a 0 will be added.

sentence_vectors = []
for sentence in corpus:
    sentence_tokens = nltk.word_tokenize(sentence)
    sent_vec = []
    for token in most_freq:
        if token in sentence_tokens:
            sent_vec.append(1)
        else:
            sent_vec.append(0)
    sentence_vectors.append(sent_vec)

In the script above, we create an empty list sentence_vectors which will store the vectors for all the sentences in the corpus. Next, we iterate through each sentence in the corpus, create an empty list sent_vec for the individual sentence, and tokenize the sentence. We then iterate through each word in the most_freq list and check whether it exists in the sentence’s tokens. If the word is part of the sentence, a 1 is appended to the individual sentence vector sent_vec; otherwise, a 0 is appended. Finally, the sentence vector is added to sentence_vectors, which holds the vectors for all the sentences. This sentence_vectors list is essentially our bag of words model.
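Note that this produces binary presence/absence vectors, matching the theory section. If you instead wanted each cell to hold the word's count within the sentence (a variation, not what the article does), the loop becomes:

count_vectors = []
for sentence in corpus:
    sentence_tokens = nltk.word_tokenize(sentence)
    # list.count gives the number of occurrences of the word in the sentence
    count_vectors.append([sentence_tokens.count(token)
                          for token in most_freq])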

However, the bag of words model that we saw in the theory section was in the form of a matrix. Our model is in the form of a list of lists. We can convert our model into matrix form using this script:

sentence_vectors = np.asarray(sentence_vectors)   

Basically, in the script above, we converted our list into a two-dimensional NumPy array using the asarray function. Now if you open the sentence_vectors variable in the variable explorer of the Spyder editor, you should see the following matrix:

You can see the Bag of Words model containing 0s and 1s.
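As a quick sanity check, given the 49 sentences counted earlier and the 200-word vocabulary, the array's shape should be:

print(sentence_vectors.shape)  # (49, 200)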

Conclusion

The Bag of Words model is one of the three most commonly used word embedding approaches, with TF-IDF and Word2Vec being the other two.

In this article, we saw how to implement the Bag of Words approach from scratch in Python. The theory of the approach has been explained along with the hands-on code to implement the approach. In the next article, we will see how to implement the TF-IDF approach from scratch in Python.


Removing Borders from Tableau Worksheets

Recently, I needed to add a text box with an embedded URL to a client’s dashboard that read “Click to See Report.” The client, who still uses Tableau Desktop 10.5, wanted users to access an external report related to the dashboard content. However, there was a thin black border wrapped around the text when I would hover and click on it. It made the user interaction feel messy. Luckily, after some troubleshooting, I figured out how to “remove” the border that appears around the text field when hovering over said field.

Making the Border Invisible

To start, I added my Text field to the Rows shelf and then dragged a copy of the same field to the Text icon on the Marks card. Next, I removed some of the default formatting around the text by doing the following:

  1. I right-clicked on the text field that I added to the Rows shelf and unchecked Show Header.
  2. At the top of the workbook, I navigated to Format > Borders, and in the Format pane on the left side of the workbook, I selected Row Divider > Pane > None.

If I hover back over the text, I see the dreaded border:

I simply want users to hover on the text and click through to an external link without seeing this. I’d also want this to disappear if a user were simply hovering over text with Tooltip. So, I return to my formatting menu:

  1. At the top of my workbook, I navigate to Format > Font.
  2. In the Format pane on the left side of the workbook, I make sure I am on the tab labeled Default, under Worksheet, and change the font color to white. This is the key step that makes the border “disappear.” Note that if your background is not white, you’ll want to make sure the color you choose here matches your worksheet’s background color:

Making the Text Visible

Despite the magic of step 2 above, you’ll need to make one more adjustment so that your text is visible again. There are two ways to do this:

Option 1: In the Format pane, go to the Fields drop-down in the upper-right corner, and select your text field. In the Format pane, select the Pane tab and under Default > Font, choose your required font size, style and color. Now, no borders are visible when you hover over the text:

Option 2: Navigate to the Marks card, and click on the Text icon. Click on the gray ellipsis icon, and in the Edit Label box that pops up, highlight your field name and choose your font size, style and color. Click OK and return to your sheet. Again, no borders are visible when you hover over the text:

Note that while there does appear to be a faint white border around the Click to See Report text, after you go through the process of placing the sheet on a dashboard, adding a URL action, setting the action to run on Select and publishing to Tableau Server, the white border will not be visible, as the action will occur very quickly.
