## Codementor: Collections In Python | Introduction To Python Collections

This blog will cover the built-in collection data types in Python, along with the collections module and all of its specialised collection data structures.
Planet Python

## Stack Abuse: Brief Introduction to OpenGL in Python with PyOpenGL

### Introduction

In this tutorial, we’re going to learn how to use the PyOpenGL library in Python. OpenGL is a graphics library supported on multiple platforms, including Windows, Linux, and macOS, and it is available from many other languages as well; however, this post is limited to its usage in the Python programming language.

OpenGL, as compared to other similar graphics libraries, is fairly simple. We’ll start with setting it up on our system, followed by writing a simple example demonstrating the usage of the library.

### Installation

The easiest way to install OpenGL using Python is through the pip package manager. If you have pip installed in your system, run the following command to download and install OpenGL:

    $ pip install PyOpenGL PyOpenGL_accelerate

I’d recommend copying the above command to help avoid typos.

Once this command finishes execution, if the installation is successful, you should get the following output at the end:

    Successfully installed PyOpenGL-3.1.0 PyOpenGL-accelerate-3.1.0

If this doesn’t work, you can also download it manually. For that, go to the PyOpenGL project site, scroll down to the ‘downloading and installation’ heading, and download all the files over there. After that, navigate to the folder where you downloaded those files, and run the following command in the terminal or command prompt:

    $ python setup.py install

Note that on Windows you need the Visual C++ 14.0 build tools installed on your system in order to build the OpenGL libraries for Python.

Now that we have successfully installed OpenGL on our system, let’s get our hands dirty with it.

### Coding Exercise

The first thing we need to do to use OpenGL in our code is to import it. To do that, run the following command:

    import OpenGL

Before we proceed, there are a few other libraries that you need to import whenever you intend to use this library in your program. Below is the code for those imports:

    from OpenGL.GL import *
    from OpenGL.GLUT import *
    from OpenGL.GLU import *

    print("Imports successful!")  # If you see this printed to the console then installation was successful

Now that we are done with the necessary imports, let’s first create a window in which our graphics will be shown. The code for that is given below, along with its explanation in the comments:

    def showScreen():
        # Clear the color and depth buffers, i.e. remove everything from the screen
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)

    glutInit()  # Initialize a glut instance which will allow us to customize our window
    glutInitDisplayMode(GLUT_RGBA)  # Set the display mode to be colored
    glutInitWindowSize(500, 500)  # Set the width and height of your window
    glutInitWindowPosition(0, 0)  # Set the position at which this window should appear
    wind = glutCreateWindow("OpenGL Coding Practice")  # Give your window a title
    glutDisplayFunc(showScreen)  # Tell OpenGL to call showScreen whenever the window needs redrawing
    glutIdleFunc(showScreen)  # Also call showScreen whenever the program is idle, so it redraws continuously
    glutMainLoop()  # Keeps the window created above displaying/running in a loop

Copy the imports above, as well as this code, into a single Python (.py) file, and execute it. You should see an empty 500 x 500 window pop up. Now, if we wish to draw any shapes or make any other kind of graphics, we need to do that in our “showScreen” function.

Let’s now try to make a square using OpenGL, but before we do we need to understand the coordinate system that OpenGL follows.

The (0,0) point is the bottom left of your window. If you go up from there, you’re moving along the y-axis, and if you go right from there, you’re moving along the x-axis. So, the top left point of your window would be (0, 500), top right would be (500, 500), and bottom right would be (500, 0).

Note: We’re talking about the window we created above, which had a dimension of 500 x 500 in our example, and not your computer’s full screen.

Now that we’ve got that out of the way, let’s code a square. The explanation of the code can be found in the comments.

    from OpenGL.GL import *
    from OpenGL.GLUT import *
    from OpenGL.GLU import *

    w, h = 500, 500

    # ---Section 1---
    def square():
        # We have to declare the points in this sequence: bottom left, bottom right, top right, top left
        glBegin(GL_QUADS)  # Begin the sketch
        glVertex2f(100, 100)  # Coordinates for the bottom left point
        glVertex2f(200, 100)  # Coordinates for the bottom right point
        glVertex2f(200, 200)  # Coordinates for the top right point
        glVertex2f(100, 200)  # Coordinates for the top left point
        glEnd()  # Mark the end of drawing

    # This alone isn't enough to draw our square

    # ---Section 2---
    def showScreen():
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)  # Remove everything from the screen
        glLoadIdentity()  # Reset all graphic/shape's position
        square()  # Draw a square using our function
        glutSwapBuffers()

    # ---Section 3---
    glutInit()
    glutInitDisplayMode(GLUT_RGBA)  # Set the display mode to be colored
    glutInitWindowSize(500, 500)  # Set the w and h of your window
    glutInitWindowPosition(0, 0)  # Set the position at which this window should appear
    wind = glutCreateWindow("OpenGL Coding Practice")  # Set a window title
    glutDisplayFunc(showScreen)
    glutIdleFunc(showScreen)  # Redraw whenever the program is idle
    glutMainLoop()  # Keeps the above created window displaying/running in a loop

Running the code above will not show a square yet. One thing we still need is to give the square a color of our choosing, so that it stands out from the window background. For that, we will make a change in “Section 2” of the code above, i.e. the showScreen function. Add the following line below the glLoadIdentity statement and above the square() statement:

    glColor3f(1.0, 0.0, 3.0)  # Set the color to pink (components above 1.0 are clamped to 1.0)

However, our code is still not complete. So far we have not told OpenGL how our window coordinates map onto the screen: by default, only coordinates between -1 and 1 on each axis are visible, so a square with corners at (100, 100) and (200, 200) lies entirely outside the visible area. Let’s write another function to set up a projection that matches our 500 x 500 window.

    # Add this function before Section 2 of the code above i.e. the showScreen function
    def iterate():
        glViewport(0, 0, 500, 500)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        glOrtho(0.0, 500, 0.0, 500, 0.0, 1.0)
        glMatrixMode(GL_MODELVIEW)
        glLoadIdentity()

Call this iterate function in “Section 2” of the code above. Add it below the glLoadIdentity statement and above the glColor3f statement in the showScreen function.
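To get a feel for what the `glOrtho(0.0, 500, 0.0, 500, 0.0, 1.0)` call in `iterate` does, the following pure-Python sketch applies the same linear mapping by hand, turning window coordinates into OpenGL's default range of -1 to 1. It does not call OpenGL at all, and the helper name `ortho_to_ndc` is made up for this illustration.

```python
def ortho_to_ndc(x, y, left=0.0, right=500.0, bottom=0.0, top=500.0):
    """Map a 2D point from the orthographic box to the [-1, 1] range.

    Hypothetical helper: mirrors the x/y part of the matrix that
    glOrtho(left, right, bottom, top, ...) builds.
    """
    ndc_x = 2.0 * (x - left) / (right - left) - 1.0
    ndc_y = 2.0 * (y - bottom) / (top - bottom) - 1.0
    return ndc_x, ndc_y


# Corners of the square from the tutorial
square = [(100, 100), (200, 100), (200, 200), (100, 200)]
ndc_square = [ortho_to_ndc(x, y) for x, y in square]
print(ndc_square)
```

With this mapping the corners of the square land at roughly -0.6 and -0.2 on both axes, which is inside the -1 to 1 range that OpenGL displays.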

Let’s now compile all this into a single code file so that there are no ambiguities:

    from OpenGL.GL import *
    from OpenGL.GLUT import *
    from OpenGL.GLU import *

    w, h = 500, 500

    def square():
        glBegin(GL_QUADS)
        glVertex2f(100, 100)
        glVertex2f(200, 100)
        glVertex2f(200, 200)
        glVertex2f(100, 200)
        glEnd()

    def iterate():
        glViewport(0, 0, 500, 500)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        glOrtho(0.0, 500, 0.0, 500, 0.0, 1.0)
        glMatrixMode(GL_MODELVIEW)
        glLoadIdentity()

    def showScreen():
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()
        iterate()
        glColor3f(1.0, 0.0, 3.0)
        square()
        glutSwapBuffers()

    glutInit()
    glutInitDisplayMode(GLUT_RGBA)
    glutInitWindowSize(500, 500)
    glutInitWindowPosition(0, 0)
    wind = glutCreateWindow("OpenGL Coding Practice")
    glutDisplayFunc(showScreen)
    glutIdleFunc(showScreen)
    glutMainLoop()

When you run this, a window should appear with a pink colored square box in it.


### Conclusion

In this tutorial, we learned about OpenGL, how to download and install it, followed by using it in a short example program. In this example we also practiced making a basic shape using OpenGL, which gave us an insight into some complex function calls that need to be made whenever we need to draw something using this library. To conclude, OpenGL is very resourceful and gets more and more complex as we dive deeper into it.


## Stack Abuse: Introduction to Reinforcement Learning with Python

### Introduction

Reinforcement Learning is definitely one of the most active and stimulating areas of research in AI.

The interest in this field grew exponentially over the last couple of years, following great (and greatly publicized) advances, such as DeepMind’s AlphaGo beating the world champion of Go, and OpenAI AI models beating professional DOTA players.

Thanks to all of these advances, Reinforcement Learning is now being applied in a variety of different fields, from healthcare to finance, from chemistry to resource management.

In this article, we will introduce the fundamental concepts and terminology of Reinforcement Learning, and we will apply them in a practical example.

### What is Reinforcement Learning?

Reinforcement Learning (RL) is a branch of machine learning concerned with actors, or agents, taking actions in some kind of environment in order to maximize some type of reward that they collect along the way.

This is deliberately a very loose definition, which is why reinforcement learning techniques can be applied to a very wide range of real-world problems.

Imagine someone playing a video game. The player is the agent, and the game is the environment. The rewards the player gets (e.g. beating an enemy, completing a level), or doesn’t get (e.g. stepping into a trap, losing a fight), will teach them how to be a better player.

As you’ve probably noticed, reinforcement learning doesn’t really fit into the categories of supervised/unsupervised/semi-supervised learning.

In supervised learning, for example, each decision taken by the model is independent, and doesn’t affect what we see in the future.

In reinforcement learning, instead, we are interested in a long term strategy for our agent, which might include sub-optimal decisions at intermediate steps, and a trade-off between exploration (of unknown paths), and exploitation of what we already know about the environment.

### Brief History of Reinforcement Learning

For several decades (since the 1950s!), reinforcement learning followed two separate threads of research, one focusing on trial and error approaches, and one based on optimal control.

Optimal control methods are aimed at designing a controller to minimize a measure of a dynamical system’s behaviour over time. To achieve this, they mainly used dynamic programming algorithms, which we will see are the foundations of modern reinforcement learning techniques.

Trial-and-error approaches, instead, have deep roots in the psychology of animal learning and neuroscience, and this is where the term reinforcement comes from: actions followed (reinforced) by good or bad outcomes have the tendency to be reselected accordingly.

Arising from the interdisciplinary study of these two fields came a field called Temporal Difference (TD) Learning.

The modern machine learning approaches to RL are mainly based on TD-Learning, which deals with reward signals and a value function (we’ll see in more detail what these are in the following paragraphs).

### Terminology

We will now take a look at the main concepts and terminology of Reinforcement Learning.

#### Agent

A system that is embedded in an environment, and takes actions to change the state of the environment. Examples include mobile robots, software agents, or industrial controllers.

#### Environment

The external system that the agent can “perceive” and act on.

Environments in RL are defined as Markov Decision Processes (MDPs). An MDP is a tuple:

$$(S, A, P, R, \gamma)$$

where:

• S is a finite set of states
• A is a finite set of actions
• P is a state transition probability matrix:
 $$P_{ss'}^{a} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$$
• R is a reward function:
 $$R_s^a = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$$
• γ is a discount factor, γ ∈ [0,1]

A lot of real-world scenarios can be represented as Markov Decision Processes, from a simple chess board to a much more complex video game.

In a chess environment, the states are all the possible configurations of the board (there are a lot). The actions refer to moving the pieces, surrendering, etc.

The rewards are based on whether we win or lose the game, so that winning actions have higher return than losing ones.

State transition probabilities enforce the game rules. For example, an illegal action (move a rook diagonally) will have zero probability.
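As a rough illustration of the tuple above, here is a minimal sketch of a tiny MDP written as plain Python dictionaries. The two states, two actions, probabilities, and rewards are all invented for this example, and the `step` helper is a hypothetical name rather than part of any library.

```python
import random

# Hypothetical two-state MDP, for illustration only
states = ["cool", "hot"]
actions = ["slow", "fast"]

# P[(s, a)] maps each possible next state s' to its probability
P = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"): {"hot": 1.0},
}

# R[(s, a)] is the expected immediate reward for taking a in s
R = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("hot", "slow"): 1.0,
    ("hot", "fast"): -10.0,
}

gamma = 0.9  # discount factor


def step(state, action, rng=random):
    """Sample a next state from P and return it with the expected reward."""
    next_states = list(P[(state, action)])
    probs = [P[(state, action)][s] for s in next_states]
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, R[(state, action)]


print(step("cool", "fast"))
```

Each row of `P` sums to 1, and a transition that the rules forbid would simply be absent from the row, which is the same as giving it zero probability.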

#### Reward Function

The reward function maps states to their rewards. This is the information that the agents use to learn how to navigate the environment.

A lot of research goes into designing a good reward function and overcoming the problem of sparse rewards, where rewards occur so rarely in the environment that the agent cannot learn properly from them.

The return G_t is defined as the discounted sum of rewards from timestep t.

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

γ is called the discount factor, and it works by reducing the amount of the rewards as we move into the future.

Discounting rewards allows us to represent uncertainty about the future, but it also helps us model human behavior better, since it has been shown that humans/animals have a preference for immediate rewards.
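To make the effect of the discount factor concrete, here is a small sketch that computes the return for a finite list of rewards. Truncating the infinite sum to a finite episode is an assumption made for the example, and `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite list of rewards."""
    g = 0.0
    # Work backwards, using G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return(rewards, 1.0))  # no discounting: 3.0
```

With γ = 0.5 the third reward contributes only a quarter of its face value, while γ = 1 treats all rewards equally.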

#### Value Function

The value function is probably the most important piece of information we can hold about an RL problem.

Formally, the value function is the expected return starting from state s. In practice, the value function tells us how good it is for the agent to be in a certain state. The higher the value of a state, the higher the amount of reward we can expect:

$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$$

The actual name for this function is state-value function, to distinguish it from another important element in RL: the action-value function.

The action-value function gives us the value, i.e. the expected return, for using action a in a certain state s:

$$q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$$

#### Policy

The policy defines the behaviour of our agent in the MDP.

Formally, policies are distributions over actions given states. A policy maps states to the probability of taking each action from that state:

$$\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]$$

The ultimate goal of RL is to find an optimal (or a good enough) policy for our agent. In the video game example, you can think of the policy as the strategy that the player follows, i.e., the actions the player takes when presented with certain scenarios.
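The policy, the state-value function, and the action-value function are tied together by a standard identity: the value of a state is the policy-weighted average of its action values, $v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)$. The sketch below checks this with made-up numbers; the state name, the probabilities, and the `state_value` helper are all assumptions for illustration.

```python
# Hypothetical policy and action values for a single state "s0"
pi = {"s0": {"left": 0.25, "right": 0.75}}
q = {("s0", "left"): 2.0, ("s0", "right"): 4.0}


def state_value(state, pi, q):
    """v_pi(s) = sum over a of pi(a|s) * q_pi(s, a)."""
    return sum(prob * q[(state, action)] for action, prob in pi[state].items())


print(state_value("s0", pi, q))  # 0.25 * 2.0 + 0.75 * 4.0 = 3.5
```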

### Main approaches

A lot of different models and algorithms are being applied to RL problems.

Really, a lot.

However, all of them more or less fall into the same two categories: policy-based, and value-based.

#### Policy-Based Approach

In policy-based approaches to RL, our goal is to learn the best possible policy. Policy models will directly output the best possible move from the current state, or a distribution over the possible actions.

#### Value-Based Approach

In value-based approaches, we want to find the optimal value function, which is the maximum value function over all policies.

We can then choose which actions to take (i.e. which policy to use) based on the values we get from the model.

### Exploration vs Exploitation

The trade-off between exploration and exploitation has been widely studied in the RL literature.

Exploration refers to the act of visiting and collecting information about states in the environment that we have not yet visited, or about which we still don’t have much information. The idea is that exploring our MDP might lead us to better decisions in the future.

On the other side, exploitation consists of making the best decision given current knowledge, comfortable in the bubble of the already known.

We will see in the following example how these concepts apply to a real problem.

### A Multi-Armed Bandit

We will now look at a practical example of a Reinforcement Learning problem – the multi-armed bandit problem.

The multi-armed bandit is one of the most popular problems in RL:

You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action selections, or time steps.

You can think of it in analogy to a slot machine (a one-armed bandit). Each action selection is like a play of one of the slot machine’s levers, and the rewards are the payoffs for hitting the jackpot.

Solving this problem means that we can come up with an optimal policy: a strategy that allows us to select the best possible action (the one with the highest expected return) at each time step.

#### Action-Value Methods

A very simple solution is based on the action value function. Remember that an action value is the mean reward when that action is selected:

$$q(a) = \mathbb{E}[R_t \mid A_t = a]$$

We can easily estimate q using the sample average:

$$Q_t(a) = \frac{\text{sum of rewards when } a \text{ taken prior to } t}{\text{number of times } a \text{ taken prior to } t}$$

If we collect enough observations, our estimate gets close enough to the real function. We can then act greedily at each timestep, i.e. select the action with the highest value, to collect the highest possible rewards.

#### Don’t be too Greedy

Remember when we talked about the trade-off between exploration and exploitation? This is one example of why we should care about it.

As a matter of fact, if we always act greedily as proposed in the previous paragraph, we never try out actions that currently look sub-optimal but might actually lead to better results.

To introduce some degree of exploration in our solution, we can use an ε-greedy strategy: we select actions greedily most of the time, but every once in a while, with probability ε, we select a random action, regardless of the action values.

It turns out that this simple exploration method works very well, and it can significantly increase the rewards we get.

One final caveat – to avoid making our solution too computationally expensive, we compute the average incrementally according to this formula:

$$Q_{n+1} = Q_n + \frac{1}{n}[R_n - Q_n]$$
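As a quick sanity check of the incremental formula, the sketch below applies it to a short, made-up reward sequence and compares the result with the ordinary sample average. The reward values are arbitrary and only serve as an illustration.

```python
rewards = [1, 0, 0, 1, 1, 1, 0, 1]

Q = 0.0
for n, r in enumerate(rewards, start=1):
    # Q_{n+1} = Q_n + (1/n) * (R_n - Q_n)
    Q += (r - Q) / n

sample_average = sum(rewards) / len(rewards)
print(Q, sample_average)
```

Both numbers agree up to floating-point rounding, but the incremental version only needs to remember the current estimate and the count, not the full reward history.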

#### Python Solution Walkthrough

    import numpy as np

    # Number of bandits
    k = 3

    # Our action values
    Q = [0 for _ in range(k)]

    # This is to keep track of the number of times we take each action
    N = [0 for _ in range(k)]

    # Epsilon value for exploration
    eps = 0.1

    # True probability of winning for each bandit
    p_bandits = [0.45, 0.40, 0.80]

    def pull(a):
        """Pull arm of bandit with index a and return 1 if win,
        else return 0."""
        if np.random.rand() < p_bandits[a]:
            return 1
        else:
            return 0

    # Runs until interrupted (e.g. with Ctrl+C)
    while True:
        if np.random.rand() > eps:
            # Take greedy action most of the time
            a = np.argmax(Q)
        else:
            # Take random action with probability eps
            a = np.random.randint(0, k)

        # Collect reward
        reward = pull(a)

        # Incremental average
        N[a] += 1
        Q[a] += 1/N[a] * (reward - Q[a])

        # Print the current estimates every 100,000 steps
        if sum(N) % 100000 == 0:
            print(Q)

Et voilà! If we run this script for a couple of seconds, we already see that our action values are close to the probabilities of hitting the jackpot for our bandits:

0.4406301434281669,   0.39131455399060977,   0.8008844354479673   

This means that our greedy policy will correctly favour actions from which we can expect higher rewards.
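For experimenting, it can be handy to have a variant that stops on its own and gives repeatable results. The following sketch uses the same ε-greedy rule and incremental average, but relies only on the standard library, runs for a fixed number of steps, and takes a seed. The function name `run_bandit` and its default parameters are assumptions made for this example.

```python
import random


def run_bandit(p_bandits, eps=0.1, steps=20000, seed=0):
    """Epsilon-greedy bandit with incremental averages, for a fixed number of steps."""
    rng = random.Random(seed)
    k = len(p_bandits)
    Q = [0.0] * k  # action-value estimates
    N = [0] * k    # number of times each action was taken

    for _ in range(steps):
        if rng.random() > eps:
            # Greedy action most of the time
            a = max(range(k), key=lambda i: Q[i])
        else:
            # Random action with probability eps
            a = rng.randrange(k)
        reward = 1 if rng.random() < p_bandits[a] else 0
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]
    return Q, N


Q, N = run_bandit([0.45, 0.40, 0.80])
best = max(range(len(Q)), key=lambda i: Q[i])
print(Q, N, best)
```

After enough steps the estimate for the third bandit should sit near 0.80 and that bandit should have been pulled far more often than the other two.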

### Conclusion

Reinforcement Learning is a growing field, and there is a lot more to cover. In fact, we still haven’t looked at general-purpose algorithms and models (e.g. dynamic programming, Monte Carlo, Temporal Difference).

The most important thing right now is to get familiar with concepts such as value functions, policies, and MDPs. In the Resources section of this article, you’ll find some awesome resources to gain a deeper understanding of this kind of material.


## An Introduction to the Strings Package in Go

### Introduction

Go’s strings package has several functions available to work with the string data type. These functions let us easily modify and manipulate strings. We can think of functions as being actions that we perform on elements of our code. Built-in functions are those that are defined in the Go programming language and are readily available for us to use.

In this tutorial, we’ll review several different functions that we can use to work with strings in Go.

### Making Strings Uppercase and Lowercase

The functions strings.ToUpper and strings.ToLower will return a string with all the letters of an original string converted to uppercase or lowercase letters. Because strings are immutable data types, the returned string will be a new string. Any characters in the string that are not letters will not be changed.

Let’s convert the string "Sammy Shark" to be all uppercase:

    ss := "Sammy Shark"
    fmt.Println(strings.ToUpper(ss))

Output:

    SAMMY SHARK

Now, let’s convert the string to be all lowercase:

    fmt.Println(strings.ToLower(ss))

Output:

    sammy shark

Since you are using the strings package, you first need to import it into your program. To convert the string to uppercase and lowercase the entire program would be as follows:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        ss := "Sammy Shark"
        fmt.Println(strings.ToUpper(ss))
        fmt.Println(strings.ToLower(ss))
    }

The strings.ToUpper and strings.ToLower functions make it easier to evaluate and compare strings by making case consistent throughout. For example, if a user writes their name all lowercase, we can still determine whether their name is in our database by checking it against an all uppercase version.

### String Search Functions

The strings package has a number of functions that help determine if a string contains a specific sequence of characters.

| Function | Use |
| --- | --- |
| strings.HasPrefix | Searches the string from the beginning |
| strings.HasSuffix | Searches the string from the end |
| strings.Contains | Searches anywhere in the string |
| strings.Count | Counts how many times the string appears |

The strings.HasPrefix and strings.HasSuffix functions allow you to check to see if a string starts or ends with a specific set of characters.

Let’s check to see if the string Sammy Shark starts with Sammy and ends with Shark.

    ss := "Sammy Shark"
    fmt.Println(strings.HasPrefix(ss, "Sammy"))
    fmt.Println(strings.HasSuffix(ss, "Shark"))

Output:

    true
    true

Let’s check to see if the string Sammy Shark contains the sequence Sh:

    fmt.Println(strings.Contains(ss, "Sh"))

Output:

    true

Finally, let’s see how many times the letter S appears in the phrase Sammy Shark:

    fmt.Println(strings.Count(ss, "S"))

Output:

    2

Note: All strings in Go are case sensitive. This means that Sammy is not the same as sammy.

Using a lowercase s to get a count from Sammy Shark is not the same as using uppercase S:

    fmt.Println(strings.Count(ss, "s"))

Output:

    0

Because S is different than s, the count returned will be 0.

String functions are useful when you want to compare or search strings in your program.

### Determining String Length

The built-in function len() returns the number of bytes in a string, which for text made up only of ASCII characters is the same as the number of characters. This function is useful for when you need to enforce minimum or maximum password lengths, or to truncate larger strings to be within certain limits for use as abbreviations.

To demonstrate this function, we’ll find the length of a sentence-long string:

    openSource := "Sammy contributes to open source."
    fmt.Println(len(openSource))

Output:

    33

We set the variable openSource equal to the string "Sammy contributes to open source." and then passed that variable to the len() function with len(openSource). Finally, we passed the result to the fmt.Println() function so that we could see the program’s output on the screen.

Keep in mind that the len() function will count any character bound by double quotation marks—including letters, numbers, whitespace characters, and symbols.

### Functions for String Manipulation

The strings.Join, strings.Split, and strings.ReplaceAll functions are a few additional ways to manipulate strings in Go.

The strings.Join function is useful for combining a slice of strings into a new single string.

Let’s create a comma-separated string from a slice of strings:

    fmt.Println(strings.Join([]string{"sharks", "crustaceans", "plankton"}, ","))

Output:

    sharks,crustaceans,plankton

If we want to add a comma and a space between string values in our new string, we can simply rewrite our expression with a whitespace after the comma: strings.Join([]string{"sharks", "crustaceans", "plankton"}, ", ").

Just as we can join strings together, we can also split strings up. To do this, we use the strings.Split function and split on the spaces:

    balloon := "Sammy has a balloon."
    s := strings.Split(balloon, " ")
    fmt.Println(s)

Output:

    [Sammy has a balloon.]

The output is a slice of strings. Since fmt.Println was used, it is hard to tell what the output is by looking at it. To see that it is indeed a slice of strings, use the fmt.Printf function with the %q verb to quote the strings:

    fmt.Printf("%q", s)

Output:

    ["Sammy" "has" "a" "balloon."]

Another useful function in addition to strings.Split is strings.Fields. The difference is that strings.Fields will ignore all whitespace, and will only split out the actual fields in a string:

    data := "  username password     email  date"
    fields := strings.Fields(data)
    fmt.Printf("%q", fields)

Output:

    ["username" "password" "email" "date"]

The strings.ReplaceAll function can take an original string and return an updated string with some replacement.

Let’s say that the balloon that Sammy had is lost. Since Sammy no longer has this balloon, we will change the substring "has" from the original string balloon to "had" in a new string:

    fmt.Println(strings.ReplaceAll(balloon, "has", "had"))

Within the parentheses, the first argument is balloon, the variable that stores the original string; the second argument, "has", is the substring we want to replace; and the third argument, "had", is what we are replacing it with. Our output will look like this:

    Sammy had a balloon.

Using the string functions strings.Join, strings.Split, and strings.ReplaceAll will provide you with greater control to manipulate strings in Go.

### Conclusion

This tutorial went through some of the common strings package functions for the string data type that you can use to work with and manipulate strings in your Go programs.

DigitalOcean Community Tutorials

## Stack Abuse: Python for NLP: Introduction to the Pattern Library

This is the eighth article in my series of articles on Python for NLP. In my previous article, I explained how Python’s TextBlob library can be used to perform a variety of NLP tasks ranging from tokenization to POS tagging, and text classification to sentiment analysis. In this article, we will explore Python’s Pattern library, which is another extremely useful Natural Language Processing library.

The Pattern library is a multipurpose library capable of handling the following tasks:

• Natural Language Processing: Performing tasks such as tokenization, stemming, POS tagging, sentiment analysis, etc.
• Data Mining: It contains APIs to mine data from sites like Twitter, Facebook, Wikipedia, etc.
• Machine Learning: Contains machine learning models such as SVM, KNN, and perceptron, which can be used for classification, regression, and clustering tasks.

In this article, we will see the first two applications of the Pattern library from the above list. We will explore the use of the Pattern Library for NLP by performing tasks such as tokenization, stemming and sentiment analysis. We will also see how the Pattern library can be used for web mining.

### Installing the Library

To install the library, you can use the following pip command:

    $ pip install pattern

Otherwise, if you are using the Anaconda distribution of Python, you can use the following Anaconda command to download the library:

    $ conda install -c asmeurer pattern

### Pattern Library Functions for NLP

In this section, we will see some of the NLP applications of the Pattern Library.

#### Tokenizing, POS Tagging, and Chunking

In the NLTK and spaCy libraries, we have a separate function for tokenizing, POS tagging, and finding noun phrases in text documents. On the other hand, in the Pattern library there is the all-in-one parse method that takes a text string as an input parameter and returns corresponding tokens in the string, along with the POS tag.

The parse method also tells us if a token is a noun phrase or verb phrase, or subject or object. You can also retrieve lemmatized tokens by setting lemmata parameter to True. The syntax of the parse method along with the default values for different parameters is as follows:

    parse(string,
        tokenize=True,      # Split punctuation marks from words?
        tags=True,          # Parse part-of-speech tags? (NN, JJ, ...)
        chunks=True,        # Parse chunks? (NP, VP, PNP, ...)
        relations=False,    # Parse chunk relations? (-SBJ, -OBJ, ...)
        lemmata=False,      # Parse lemmata? (ate => eat)
        encoding='utf-8',   # Input string encoding.
        tagset=None         # Penn Treebank II (default) or UNIVERSAL.
    )

Let’s see the parse method in action:

    from pattern.en import parse
    from pattern.en import pprint

    pprint(parse('I drove my car to the hospital yesterday', relations=True, lemmata=True))

To use the parse method, you have to import the en module from the pattern library. The en module contains English language NLP functions. If you use the pprint method to print the output of the parse method on the console, you should see the following output:

             WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA

                I   PRP    NP      SBJ    1      -      i
            drove   VBD    VP      -      1      -      drive
               my   PRP$   NP      OBJ    1      -      my
              car   NN     NP ^    OBJ    1      -      car
               to   TO     -       -      -      -      to
              the   DT     NP      -      -      -      the
         hospital   NN     NP ^    -      -      -      hospital
        yesterday   NN     NP ^    -      -      -      yesterday

In the output, you can see the tokenized words along with their POS tag, the chunk that the tokens belong to, and the role. You can also see the lemmatized form of the tokens.

If you call the split method on the object returned by the parse method, the output will be a list of sentences, where each sentence is a list of tokens and each token is a list of words, along with the tags associated with the words.

For instance look at the following script:

    from pattern.en import parse
    from pattern.en import pprint

    print(parse('I drove my car to the hospital yesterday', relations=True, lemmata=True).split())

The output of the script above looks like this:

    [[['I', 'PRP', 'B-NP', 'O', 'NP-SBJ-1', 'i'], ['drove', 'VBD', 'B-VP', 'O', 'VP-1', 'drive'], ['my', 'PRP$', 'B-NP', 'O', 'NP-OBJ-1', 'my'], ['car', 'NN', 'I-NP', 'O', 'NP-OBJ-1', 'car'], ['to', 'TO', 'O', 'O', 'O', 'to'], ['the', 'DT', 'B-NP', 'O', 'O', 'the'], ['hospital', 'NN', 'I-NP', 'O', 'O', 'hospital'], ['yesterday', 'NN', 'I-NP', 'O', 'O', 'yesterday']]]

#### Pluralizing and Singularizing the Tokens

The pluralize and singularize methods are used to convert singular words to plurals and plural words to singulars, respectively.

    from pattern.en import pluralize, singularize

    print(pluralize('leaf'))
    print(singularize('theives'))

The output looks like this:

    leaves
    theif

#### Converting Adjective to Comparative and Superlative Degrees

You can retrieve comparative and superlative degrees of an adjective using comparative and superlative functions. For instance, the comparative degree of good is better and the superlative degree of good is best. Let’s see this in action:

    from pattern.en import comparative, superlative

    print(comparative('good'))
    print(superlative('good'))

Output:

    better
    best

#### Finding N-Grams

An n-gram is a contiguous sequence of “n” words in a sentence. For instance, for the sentence “He goes to hospital”, the 2-grams would be (He goes), (goes to) and (to hospital). N-grams can play a crucial role in text classification and language modeling.

In the Pattern library, the ngrams method is used to find all the n-grams in a text string. The first parameter to the ngrams method is the text string. The size of each n-gram is passed to the n parameter of the method. Look at the following example:

    from pattern.en import ngrams

    print(ngrams("He goes to hospital", n=2))

Output:

[('He', 'goes'), ('goes', 'to'), ('to', 'hospital')] 
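To make the idea concrete, here is a minimal pure-Python sketch of how 2-grams can be built with a sliding window. The helper name `word_ngrams` is hypothetical and is not part of Pattern; it splits on whitespace only, so it will not match Pattern's handling of punctuation.

```python
def word_ngrams(text, n=2):
    """Return all contiguous n-word tuples from a whitespace-split string."""
    words = text.split()
    # Slide a window of size n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(word_ngrams("He goes to hospital", n=2))
# [('He', 'goes'), ('goes', 'to'), ('to', 'hospital')]
```

If the sentence has fewer than n words, the sketch simply returns an empty list.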

#### Finding Sentiments

Sentiment refers to an opinion or feeling towards a certain thing. The Pattern library offers functionality to find sentiment from a text string.

In Pattern, the sentiment function is used to find the polarity (positivity or negativity) of a text along with its subjectivity.

Depending upon the most commonly occurring positive (good, best, excellent, etc.) and negative (bad, awful, pathetic, etc.) adjectives, a sentiment score between 1 and -1 is assigned to the text. This sentiment score is also called the polarity.

In addition to the sentiment score, subjectivity is also returned. The subjectivity value can be between 0 and 1. Subjectivity quantifies how much of the text is personal opinion as opposed to factual information. A higher subjectivity means that the text contains more personal opinion than factual information.

    from pattern.en import sentiment

    print(sentiment("This is an excellent movie to watch. I really love it"))

When you run the above script, you should see the following output:

(0.75, 0.8) 

The sentence “This is an excellent movie to watch. I really love it” has a sentiment of 0.75, which shows that it is highly positive. Similarly, the subjectivity of 0.8 refers to the fact that the sentence is a personal opinion of the user.
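The general idea of scoring text from a list of rated adjectives can be sketched in a few lines. This is only a toy illustration and not Pattern's actual algorithm: the lexicon, its scores and the `toy_sentiment` helper are all invented for this example.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# The scores are invented for illustration; they are not Pattern's values.
TOY_LEXICON = {
    "excellent": (1.0, 1.0),
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.7),
    "awful": (-1.0, 1.0),
}

def toy_sentiment(text):
    """Average the scores of known words; return (0.0, 0.0) if none are found."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return (0.0, 0.0)
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

print(toy_sentiment("This is an excellent movie"))  # (1.0, 1.0)
print(toy_sentiment("A good start but an awful ending"))
```

In the second call, the positive and negative adjectives pull the average polarity towards zero, which is the intuition behind a mixed review receiving a weak score.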

#### Checking if a Statement is a Fact

The modality function from the Pattern library can be used to find the degree of certainty in a text string. The modality function returns a value between -1 and 1. For facts, the modality function returns a value greater than 0.5.

Here is an example of it in action:

    from pattern.en import parse, Sentence
    from pattern.en import modality

    text = "Paris is the capital of France"
    sent = parse(text, lemmata=True)
    sent = Sentence(sent)

    print(modality(sent))

Output:

    1.0

In the script above we first import the parse method along with the Sentence class. On the second line, we import the modality function. The parse method takes text as input and returns a tokenized form of the text, which is then passed to the Sentence class constructor. The modality method takes the Sentence class object and returns the modality of the sentence.

Since the text string “Paris is the capital of France” is a fact, in the output, you will see a value of 1.

Similarly, for a sentence which is not certain, the value returned by the modality method is lower, i.e. closer to 0.0. Look at the following script:

    from pattern.en import parse, Sentence, modality

    text = "I think we can complete this task"
    sent = parse(text, lemmata=True)
    sent = Sentence(sent)

    print(modality(sent))

Output:

    0.25

Since the string in the above example is not very certain, the modality of the above string will be 0.25.

#### Spelling Corrections

The suggest method can be used to check whether a word is spelled correctly. If the word is spelled correctly, the suggest method returns the word itself with a probability of 1.0. Otherwise, it returns the possible corrections for the word along with their probability of correctness.

Look at the following example:

    from pattern.en import suggest

    print(suggest("Whitle"))

In the script above we have a word Whitle which is incorrectly spelled. In the output, you will see possible suggestions for this word.

[('While', 0.6459209419680404), ('White', 0.2968881412952061), ('Title', 0.03280067283431455), ('Whistle', 0.023549201009251473), ('Chile', 0.0008410428931875525)] 

According to the suggest method, there is a 0.64 probability that the word is “While”, similarly there is a probability of 0.29 that the word is “White”, and so on.

Now let’s spell a word correctly:

    from pattern.en import suggest

    print(suggest("Fracture"))

Output:

[('Fracture', 1.0)] 

From the output, you can see that there is a 100% chance that the word is spelled correctly.
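Since the suggestions come back as a list of (word, probability) pairs, picking a correction is ordinary list handling. The sketch below works on the values copied from the “Whitle” output shown earlier; the `best_suggestion` helper and its `min_probability` threshold are hypothetical and not part of Pattern.

```python
# Suggestions copied from the output shown earlier for "Whitle".
suggestions = [
    ("While", 0.6459209419680404),
    ("White", 0.2968881412952061),
    ("Title", 0.03280067283431455),
    ("Whistle", 0.023549201009251473),
    ("Chile", 0.0008410428931875525),
]

def best_suggestion(candidates, min_probability=0.5):
    """Return the most probable word if it clears the threshold, else None."""
    word, probability = max(candidates, key=lambda pair: pair[1])
    return word if probability >= min_probability else None

print(best_suggestion(suggestions))                       # While
print(best_suggestion(suggestions, min_probability=0.9))  # None
```

A threshold like this is one way to avoid auto-correcting when no candidate clearly dominates.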

#### Working with Numbers

The Pattern library contains functions that can be used to convert numbers in the form of text strings into their numeric counterparts and vice versa. To convert from text to numeric representation the number function is used. Similarly to convert back from numbers to their corresponding text representation the numerals function is used. Look at the following script:

    from pattern.en import number, numerals

    print(number("one hundred and twenty two"))
    print(numerals(256.390, round=2))

Output:

    122
    two hundred and fifty-six point thirty-nine

In the output, you will see 122, which is the numeric representation of the text “one hundred and twenty two”. Similarly, you should see “two hundred and fifty-six point thirty-nine”, which is the text representation of the number 256.390.

Remember that the round parameter of the numerals function takes an integer specifying the number of decimal places to which the number should be rounded.

The quantify function is used to get a word count estimation of the items in a list, which provides a phrase for referring to the group. If a list has 3-8 similar items, the quantify function will quantify it to “several”. Two items are quantified to a “pair”.

    from pattern.en import quantify

    print(quantify(['apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'mango', 'mango']))

In the list, we have three apples, three bananas, and two mangoes. The output of the quantify function for this list looks like this:

several bananas, several apples and a pair of mangoes   

Similarly, the following example demonstrates the other word count estimations.

    from pattern.en import quantify

    print(quantify({'strawberry': 200, 'peach': 15}))
    print(quantify('orange', amount=1200))

Output:

    hundreds of strawberries and a number of peaches
    thousands of oranges
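To see the counts that sit behind these phrases, the same fruit list can be tallied with the standard library. The `rough_quantity` helper below is hypothetical and covers only the two cases visible in the first example (“a pair of” for two items, “several” for 3-8); it does not reproduce Pattern's full set of ranges or its pluralization.

```python
from collections import Counter

fruits = ['apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'mango', 'mango']
counts = Counter(fruits)

def rough_quantity(count):
    """Map a count to a vague phrase, for the two ranges covered here."""
    if count == 2:
        return "a pair of"
    if 3 <= count <= 8:
        return "several"
    return str(count)  # fallback: other ranges are not covered in this sketch

for item, count in counts.items():
    print(rough_quantity(count), item)
```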

### Pattern Library Functions for Data Mining

In the previous section, we saw some of the most commonly used functions of the Pattern library for NLP. In this section, we will see how the Pattern library can be used to perform a variety of data mining tasks.

The web module of the Pattern library is used for web mining tasks.

#### Accessing Web Pages

The URL object is used to retrieve contents from the webpages. It has several methods that can be used to open a webpage, download the contents from a webpage and read a webpage.

You can directly use the download method to download the HTML contents of any webpage. The following script downloads the HTML source code for the Wikipedia article on artificial intelligence.

    from pattern.web import download

    page_html = download('https://en.wikipedia.org/wiki/Artificial_intelligence', unicode=True)

You can also download files from webpages, for example images, using the URL class:

    from pattern.web import URL, extension

    page_url = URL('https://upload.wikimedia.org/wikipedia/commons/f/f1/RougeOr_football.jpg')
    file = open('football' + extension(page_url.page), 'wb')
    file.write(page_url.download())
    file.close()

In the script above, we first create a URL object for the image. Next, we call the extension function on the page attribute of the URL, which returns the file extension. The file extension is appended to the end of the string “football” to build the file name. The open function opens this file for writing in binary mode, and finally, the download() method downloads the image, which is written to the default execution path.

#### Finding URLs within Text

You can use the find_urls method to extract URLs from text strings. Here is an example:

    from pattern.web import find_urls

    print(find_urls('To search anything, go to www.google.com', unique=True))

In the output, you will see the URL for the Google website as shown below:

['www.google.com'] 
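For a rough idea of what URL extraction involves, here is a simplified regular-expression sketch using only the standard library. The `simple_find_urls` helper and its pattern are assumptions made for illustration; Pattern's own matching rules are more thorough and may return different results on edge cases.

```python
import re

# Simplified pattern: anything starting with http://, https:// or www.
# and running up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+")

def simple_find_urls(text, unique=True):
    """Return URL-like substrings, with trailing punctuation trimmed."""
    urls = [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]
    if unique:
        urls = list(dict.fromkeys(urls))  # drop duplicates, keep order
    return urls

print(simple_find_urls('To search anything, go to www.google.com'))
# ['www.google.com']
```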

#### Making Asynchronous Requests for Webpages

Webpages can be very large, and it can take quite a bit of time to download the complete contents of a webpage, which can block a user from performing any other task in the application until the download is finished. However, the web module of the Pattern library contains a function, asynchronous, which downloads the contents of a webpage in parallel. The asynchronous method runs in the background so that the user can interact with the application while the webpage is being downloaded.

Let’s take a very simple example of the asynchronous method:

    from pattern.web import asynchronous, time, Google, find_urls

    asyn_req = asynchronous(Google().search, 'artificial intelligence', timeout=4)
    while not asyn_req.done:
        time.sleep(0.1)
        print('searching...')

    print(asyn_req.value)

    print(find_urls(asyn_req.value, unique=True))

In the above script, we retrieve the first page of Google search results for the query “artificial intelligence”. While the results are being fetched in the background, the while loop keeps running and prints a status message. Once the request is done, the results are printed using the value attribute of the object returned by the asynchronous function. Finally, we extract the URLs from the search results and print them on the screen.
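The same poll-while-waiting pattern can be tried without any network access using the standard library. In this sketch, `slow_fetch` is a hypothetical stand-in for a slow request, and `concurrent.futures` plays the role of Pattern's asynchronous function; it is an analogy, not how Pattern is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_fetch(query):
    """Stand-in for a slow network call (hypothetical; no request is made)."""
    time.sleep(0.3)
    return "results for " + query

with ThreadPoolExecutor(max_workers=1) as executor:
    # Start the work in a background thread and get a handle to it.
    future = executor.submit(slow_fetch, "artificial intelligence")
    # The main thread stays free to do other work while it waits.
    while not future.done():
        time.sleep(0.1)
        print("searching...")
    print(future.result())
```

Here `future.done()` and `future.result()` correspond roughly to the done and value attributes used in the Pattern script.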

#### Getting Search Engine Results with APIs

The Pattern library contains a SearchEngine class, which is the base class for the classes that connect to the APIs of different search engines and websites such as Google, Bing, Facebook, Wikipedia, Twitter, etc. The SearchEngine constructor accepts three parameters:

• license: The developer license key for the corresponding search engine or website
• throttle: Corresponds to the time difference between successive requests to the server
• language: Specifies the language for the results

The search method of the SearchEngine class is used to send a search query to the search engine. The search method can take the following parameters:

• query: The search string
• type: The type of data you want to search, it can take three values: SEARCH, NEWS and IMAGE.
• start: The page from which you want to start the search
• count: The number of results per page.

The search engine classes that inherit the SearchEngine class along with its search method are: Google, Bing, Twitter, Facebook, Wikipedia, and Flickr.

The search method returns a result object for each item found. The result object can then be used to retrieve information about the search result. The attributes of the result object are url, title, text, language, author, and date.

Now let’s see a very simple example of how we can search something on Google via pattern library. Remember, to make this example work, you will have to use your developer license key for the Google API.

    from pattern.web import Google

    google = Google(license=None)
    for search_result in google.search('artificial intelligence'):
        print(search_result.url)
        print(search_result.text)

In the script above, we create an object of Google class. In the constructor of Google, pass your own license key to the license parameter. Next, we pass the string artificial intelligence to the search method. By default, the first 10 results from the first page will be returned which are then iterated, and the url and text of each result is displayed on the screen.

The process is similar for the Bing search engine; you only have to replace the Google class with the Bing class in the script above.

Let’s now search Twitter for the three latest tweets that contain the text “artificial intelligence”. Execute the following script:

    from pattern.web import Twitter

    twitter = Twitter()
    index = None
    for j in range(3):
        for tweet in twitter.search('artificial intelligence', start=index, count=3):
            print(tweet.text)
            index = tweet.id

In the script above, we first import the Twitter class from the pattern.web module. Next, we iterate over the tweets returned by the search method and display the text of each tweet on the console. You do not need any license key to run the above script.

#### Converting HTML Data to Plain Text

The download method of the URL class returns data in the form of HTML. However, if you want to do a semantic analysis of the text, for instance sentiment classification, you need cleaned data without HTML tags. You can clean the data with the plaintext method. The method takes the HTML content returned by the download method as a parameter and returns the cleaned text.

Look at the following script:

    from pattern.web import URL, plaintext

    html_content = URL('https://stackabuse.com/python-for-nlp-introduction-to-the-textblob-library/').download()
    cleaned_page = plaintext(html_content.decode('utf-8'))
    print(cleaned_page)

In the output, you should see the cleaned text from the webpage.

It is important to remember that if you are using Python 3, you will need to call decode('utf-8') method to convert the data from byte to string format.
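To illustrate what stripping HTML down to text involves, here is a small sketch built on the standard library's `html.parser`. The `TextExtractor` class and `strip_tags` helper are hypothetical names for this example; Pattern's plaintext method does considerably more, so treat this as an approximation of the idea rather than an equivalent.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

print(strip_tags("<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"))
# Title Some bold text.
```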

#### Parsing PDF Documents

The Pattern library contains a PDF object that can be used to parse a PDF document. PDF (Portable Document Format) is a cross-platform file format which contains images, text, and fonts in a stand-alone document.

Let’s see how a PDF document can be parsed with the PDF object:

    from pattern.web import URL, PDF

    pdf_doc = URL('http://demo.clab.cs.cmu.edu/NLP/syllabus_f18.pdf').download()
    print(PDF(pdf_doc.decode('utf-8')))

In the script, we download a document using the download method. Next, the downloaded PDF document is passed to the PDF class, and the parsed result is printed on the console.

#### Clearing the Cache

The results returned by methods such as SearchEngine.search() and URL.download() are, by default, stored in the local cache. To clear the cache after downloading an HTML document, we can use the clear method of the cache object, as shown below:

    from pattern.web import cache

    cache.clear()

### Conclusion

The Pattern library is one of the most useful natural language processing libraries in Python. Although it is not as well-known as spaCy or NLTK, it contains functionalities such as finding superlatives and comparatives, and fact and opinion detection, which distinguish it from the other NLP libraries.

In this article, we studied the application of the Pattern library for natural language processing, and data mining and web scraping. We saw how to perform basic NLP tasks such as tokenization, lemmatization and sentiment analysis with the Pattern library. Finally, we also saw how to use Pattern for making search engine queries, mining online tweets and cleaning HTML documents.
