Stack Abuse: Solving Sequence Problems with LSTM in Python’s Keras Library

In this article, you will learn how to perform time series forecasting to solve sequence problems.

Time series forecasting refers to the type of problems where we have to predict an outcome based on time dependent inputs. A typical example of time series data is stock market data where stock prices change with time. Similarly, the hourly temperature of a particular place also changes and can also be considered as time series data. Time series data is basically a sequence of data, hence time series problems are often referred to as sequence problems.

Recurrent Neural Networks (RNN) have been proven to efficiently solve sequence problems. Particularly, Long Short Term Memory Network (LSTM), which is a variation of RNN, is currently being used in a variety of domains to solve sequence problems.

Types of Sequence Problems

Sequence problems can be broadly categorized into the following categories:

  1. One-to-One: There is one input and one output. A typical example of a one-to-one sequence problem is the case where you have an image and you want to predict a single label for it.
  2. Many-to-One: In many-to-one sequence problems, we have a sequence of data as input and we have to predict a single output. Text classification is a prime example, where we have an input sequence of words and we want to predict a single output tag.
  3. One-to-Many: In one-to-many sequence problems, we have a single input and a sequence of outputs. A typical example is an image and its corresponding description.
  4. Many-to-Many: Many-to-many sequence problems involve a sequence input and a sequence output. For instance, stock prices of 7 days as input and stock prices of the next 7 days as output. Chatbots are also an example of many-to-many sequence problems, where a text sequence is the input and another text sequence is the output.
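To make these categories concrete, here is a small sketch (the sample counts are hypothetical, chosen only for illustration) of how the input and output arrays for each category might be shaped, following the (samples, time-steps, features) convention that Keras expects:

```python
import numpy as np

# Keras expects inputs shaped as (samples, time-steps, features).
# The numbers of samples below are made up purely to illustrate shapes.

one_to_one_X = np.zeros((20, 1, 1))    # 1 time-step in, ...
one_to_one_Y = np.zeros((20, 1))       # ... 1 value out

many_to_one_X = np.zeros((15, 3, 1))   # 3 time-steps in, ...
many_to_one_Y = np.zeros((15, 1))      # ... 1 value out

one_to_many_X = np.zeros((15, 1, 1))   # 1 time-step in, ...
one_to_many_Y = np.zeros((15, 3))      # ... a sequence of 3 out

many_to_many_X = np.zeros((10, 7, 1))  # 7 time-steps in (e.g. 7 days of prices), ...
many_to_many_Y = np.zeros((10, 7))     # ... 7 values out (the next 7 days)

print(many_to_one_X.shape)  # (15, 3, 1)
```

We will build real datasets with exactly these kinds of shapes in the sections below.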

This article is part 1 of the series. In this article, we will see how LSTM and its different variants can be used to solve one-to-one and many-to-one sequence problems. In the next part, we will see how to solve one-to-many and many-to-many sequence problems. We will be working with Python’s Keras library.

After reading this article, you will be able to solve problems like stock price prediction, weather prediction, etc., based on historical data. Since text is also a sequence of words, the knowledge gained in this article can also be used to solve natural language processing tasks such as text classification, language generation, etc.

One-to-One Sequence Problems

As I said earlier, in one-to-one sequence problems, there is a single input and a single output. In this section we will see two types of sequence problems. First we will see how to solve one-to-one sequence problems with a single feature and then we will see how to solve one-to-one sequence problems with multiple features.

One-to-One Sequence Problems with a Single Feature

In this section, we will see how to solve one-to-one sequence problems where each time-step has a single feature.

Let’s first import the required libraries that we are going to use in this article:

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
from keras.layers import Bidirectional

import pandas as pd
import numpy as np
import re

import matplotlib.pyplot as plt
Creating the Dataset

In this next step, we will prepare the dataset that we are going to use for this section.

X = list()
Y = list()
X = [x+1 for x in range(20)]
Y = [y * 15 for y in X]

print(X)
print(Y)

In the script above, we create 20 inputs and 20 outputs. Each input consists of one time-step, which in turn contains a single feature. Each output value is 15 times the corresponding input value. If you run the above script, you should see the input and output values as shown below:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300]

The input to an LSTM layer should be 3-dimensional, i.e. (samples, time-steps, features). Samples is the number of samples in the input data; we have 20 samples. Time-steps is the number of time-steps per sample; we have 1 time-step. Finally, features is the number of features per time-step; we have one feature per time-step.

We can reshape our data via the following command:

X = array(X).reshape(20, 1, 1) 
Solution via Simple LSTM

Now we can create our simple LSTM model with one LSTM layer.

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(1, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

In the script above, we create an LSTM model with one LSTM layer of 50 neurons and the relu activation function. You can see that the input shape is (1,1), since our data has one time-step with one feature. Executing the above script prints the following summary:

Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 50)                10400
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 51
=================================================================
Total params: 10,451
Trainable params: 10,451
Non-trainable params: 0

Let’s now train our model:

model.fit(X, Y, epochs=2000, validation_split=0.2, batch_size=5) 

We train our model for 2000 epochs with a batch size of 5, but you can experiment with other values. Once the model is trained, we can make predictions on a new instance.

Let’s say we want to predict the output for an input of 30. The actual output should be 30 x 15 = 450. Let’s see what value we get. First, we need to convert our test data to the 3D shape expected by the LSTM. The following script predicts the output for the number 30:

test_input = array([30])
test_input = test_input.reshape((1, 1, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got an output value of 437.86 which is slightly less than 450.

Note: The outputs that you obtain by running these scripts will differ from mine, because LSTM neural networks initialize their weights with random values. Overall, though, the results should not differ much.

Solution via Stacked LSTM

Let’s now create a stacked LSTM and see if we can get better results. The dataset will remain the same, the model will be changed. Look at the following script:

model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, 1)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

In the above model, we have two LSTM layers. Notice that the first LSTM layer has the parameter return_sequences set to True. When return_sequences is set to True, the hidden state output for every time-step is passed as input to the next LSTM layer. The summary of the above model is as follows:
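A quick way to see what return_sequences changes is to inspect the resulting output shapes directly. The snippet below is a minimal sketch (it assumes the same Keras imports used earlier in this article): with return_sequences=True the layer returns one 50-dimensional hidden state per time-step, otherwise only the final hidden state.

```python
from keras.layers import Input, LSTM

# (time-steps, features); the batch dimension is implicit
inp = Input(shape=(1, 1))

with_sequences = LSTM(50, return_sequences=True)(inp)
final_state_only = LSTM(50)(inp)

# One 50-dimensional state per time-step vs. only the last state:
print(with_sequences.shape)    # (None, 1, 50)
print(final_state_only.shape)  # (None, 50)
```

This is why a stacked LSTM needs return_sequences=True on every layer except the last: each inner LSTM layer expects a 3D sequence as its input.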

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_33 (LSTM)               (None, 1, 50)             10400
_________________________________________________________________
lstm_34 (LSTM)               (None, 50)                20200
_________________________________________________________________
dense_24 (Dense)             (None, 1)                 51
=================================================================
Total params: 30,651
Trainable params: 30,651
Non-trainable params: 0
_________________________________________________________________

Next, we need to train our model as shown in the following script:

history = model.fit(X, Y, epochs=2000, validation_split=0.2, verbose=1, batch_size=5) 

Once the model is trained, we will again make predictions on the test data point i.e. 30.

test_input = array([30])
test_input = test_input.reshape((1, 1, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got an output of 459.85, which is better than the 437.86 that we achieved via a single LSTM layer.

One-to-One Sequence Problems with Multiple Features

In the last section, each input sample had one time-step, where each time-step had one feature. In this section we will see how to solve one-to-one sequence problems where input time-steps have multiple features.

Creating the Dataset

Let’s first create our dataset. Look at the following script:

nums = 25

X1 = list()
X2 = list()
X = list()
Y = list()

X1 = [(x+1)*2 for x in range(25)]
X2 = [(x+1)*3 for x in range(25)]
Y = [x1*x2 for x1,x2 in zip(X1,X2)]

print(X1)
print(X2)
print(Y)

In the script above, we create three lists: X1, X2, and Y. Each list has 25 elements, which means that the total sample size is 25. Finally, Y contains the output. The X1, X2, and Y lists are printed below:

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75]
[6, 24, 54, 96, 150, 216, 294, 384, 486, 600, 726, 864, 1014, 1176, 1350, 1536, 1734, 1944, 2166, 2400, 2646, 2904, 3174, 3456, 3750]

Each element in the output list is the product of the corresponding elements in the X1 and X2 lists. For instance, the second element in the output list is 24, which is the product of the second element in list X1, i.e. 4, and the second element in list X2, i.e. 6.

The input will consist of the combination of X1 and X2 lists, where each list will be represented as a column. The following script creates the final input:

X = np.column_stack((X1, X2))
print(X)

Here is the output:

[[ 2  3]
 [ 4  6]
 [ 6  9]
 [ 8 12]
 [10 15]
 [12 18]
 [14 21]
 [16 24]
 [18 27]
 [20 30]
 [22 33]
 [24 36]
 [26 39]
 [28 42]
 [30 45]
 [32 48]
 [34 51]
 [36 54]
 [38 57]
 [40 60]
 [42 63]
 [44 66]
 [46 69]
 [48 72]
 [50 75]]

Here the X variable contains our final feature set. You can see it contains two columns, i.e. two features per input. As we discussed earlier, we need to convert the input into a 3-dimensional shape. Our input has 25 samples, where each sample consists of 1 time-step and each time-step consists of 2 features. The following script reshapes the input:

X = array(X).reshape(25, 1, 2) 
Solution via Simple LSTM

We are now ready to train our LSTM models. Let’s first develop a single LSTM layer model as we did in the previous section:

model = Sequential()
model.add(LSTM(80, activation='relu', input_shape=(1, 2)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

Here our LSTM layer contains 80 neurons. We have two dense layers, where the first contains 10 neurons and the second, which also acts as the output layer, contains 1 neuron. The summary of the model is as follows:

Layer (type)                 Output Shape              Param #
=================================================================
lstm_38 (LSTM)               (None, 80)                26560
_________________________________________________________________
dense_29 (Dense)             (None, 10)                810
_________________________________________________________________
dense_30 (Dense)             (None, 1)                 11
=================================================================
Total params: 27,381
Trainable params: 27,381
Non-trainable params: 0
_________________________________________________________________
None

The following script trains the model:

model.fit(X, Y, epochs=2000, validation_split=0.2, batch_size=5) 

Let’s test our trained model on a new data point. Our data point will have two features, i.e. (55,80); the actual output should be 55 x 80 = 4400. Let’s see what our algorithm predicts. Execute the following script:

test_input = array([55,80])
test_input = test_input.reshape((1, 1, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got 3263.44 in the output, which is far from the actual output.

Solution via Stacked LSTM

Let’s now create a more complex LSTM with multiple LSTM and dense layers and see if we can improve our answer:

model = Sequential()
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(1, 2)))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
print(model.summary())

The model summary is as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_53 (LSTM)               (None, 1, 200)            162400
_________________________________________________________________
lstm_54 (LSTM)               (None, 1, 100)            120400
_________________________________________________________________
lstm_55 (LSTM)               (None, 1, 50)             30200
_________________________________________________________________
lstm_56 (LSTM)               (None, 25)                7600
_________________________________________________________________
dense_43 (Dense)             (None, 20)                520
_________________________________________________________________
dense_44 (Dense)             (None, 10)                210
_________________________________________________________________
dense_45 (Dense)             (None, 1)                 11
=================================================================
Total params: 321,341
Trainable params: 321,341
Non-trainable params: 0

The next step is to train our model and test it on the test data point i.e. (55,80).

To improve the accuracy, we will reduce the batch size, and since our model is more complex now we can also reduce the number of epochs. The following script trains the LSTM model and makes a prediction on the test data point.

history = model.fit(X, Y, epochs=1000, validation_split=0.1, verbose=1, batch_size=3)

test_output = model.predict(test_input, verbose=0)
print(test_output)

In the output, I got a value of 3705.33, which is still less than 4400, but much better than the previously obtained value of 3263.44 using a single LSTM layer. You can play with different combinations of LSTM layers, dense layers, batch size and number of epochs to see if you get better results.

Many-to-One Sequence Problems

In the previous sections we saw how to solve one-to-one sequence problems with LSTM. In a one-to-one sequence problem, each sample consists of a single time-step of one or multiple features. Data with a single time-step cannot really be considered sequence data; densely connected neural networks have been shown to perform better with single time-step data.
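As a quick point of comparison, the single-feature problem from the earlier section can be handled by a plain densely connected network without any 3D reshaping. The sketch below is illustrative (the layer sizes and epoch count are my own choices, not from a benchmark); it trains on the same "output = 15 x input" data in plain 2D form:

```python
from numpy import array
from keras.models import Sequential
from keras.layers import Dense

# 2D input is enough for a dense network: (samples, features), no time axis.
X = array([x + 1 for x in range(20)]).reshape(20, 1)
Y = array([y * 15 for y in range(1, 21)])

model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(1,)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=500, verbose=0)

# Predict the output for 30; the true answer is 450.
print(model.predict(array([[30]]), verbose=0))
```

The point is only that a dense network has no trouble with single-time-step data; the LSTM machinery pays off once each sample contains a genuine sequence of time-steps, as in the problems below.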

Real sequence data consists of multiple time-steps, such as stock market prices of past 7 days, a sentence containing multiple words, and so on.

In this section, we will see how to solve many-to-one sequence problems. In many-to-one sequence problems, each input sample has more than one time-step, however the output consists of a single element. Each time-step in the input can have one or more features. We will start with many-to-one sequence problems having one feature, and then we will see how to solve many-to-one problems where input time-steps have multiple features.

Many-to-One Sequence Problems with a Single Feature

Let’s first create the dataset. Our dataset will consist of 15 samples. Each sample will have 3 time-steps where each time-step will consist of a single feature i.e. a number. The output for each sample will be the sum of the numbers in each of the three time-steps. For instance, if our sample contains a sequence 4,5,6 the output will be 4 + 5 + 6 = 10.

Creating the Dataset

Let’s first create a list of integers from 1 to 45. Since we want 15 samples of 3 time-steps each, we will reshape the list containing the first 45 integers.

X = np.array([x+1 for x in range(45)])
print(X)

In the output, you should see the first 45 integers:

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45]

We can reshape it into (samples, time-steps, features) as follows:

X = X.reshape(15,3,1)
print(X)

The above script converts the list X into 3-dimensional shape with 15 samples, 3 time-steps, and 1 feature. The script above also prints the reshaped data.

[[[ 1]
  [ 2]
  [ 3]]

 [[ 4]
  [ 5]
  [ 6]]

 [[ 7]
  [ 8]
  [ 9]]

 [[10]
  [11]
  [12]]

 [[13]
  [14]
  [15]]

 [[16]
  [17]
  [18]]

 [[19]
  [20]
  [21]]

 [[22]
  [23]
  [24]]

 [[25]
  [26]
  [27]]

 [[28]
  [29]
  [30]]

 [[31]
  [32]
  [33]]

 [[34]
  [35]
  [36]]

 [[37]
  [38]
  [39]]

 [[40]
  [41]
  [42]]

 [[43]
  [44]
  [45]]]

We have converted our input data into the right format, let’s now create our output vector. As I said earlier, each element in the output will be equal to the sum of the values in the time-steps in the corresponding input sample. The following script creates the output vector:

Y = list()
for x in X:
    Y.append(x.sum())

Y = np.array(Y)
print(Y)

The output array Y looks like this:

[  6  15  24  33  42  51  60  69  78  87  96 105 114 123 132] 
Solution via Simple LSTM

Let’s now create our model with one LSTM layer.

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

The following script trains our model:

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1) 

Once the model is trained, we can use it to make predictions on the test data points. Let’s predict the output for the number sequence 50,51,52. The actual output should be 50 + 51 + 52 = 153. The following script converts our test points into a 3-dimensional shape and then predicts the output:

test_input = array([50,51,52])
test_input = test_input.reshape((1, 3, 1))
test_output = model.predict(test_input, verbose=0)
print(test_output)

I got 145.96 in the output, which is around 7 points less than the actual output value of 153.

Solution via Stacked LSTM

Let’s now create a complex LSTM model with multiple layers and see if we can get better results. Execute the following script to create and train a complex model with multiple LSTM and dense layers:

model = Sequential()
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(3, 1)))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

Let’s now test our model on the test sequence i.e. 50,51,52:

test_output = model.predict(test_input, verbose=0) print(test_output) 

The answer I got here is 155.37, which is better than the 145.96 result that we got earlier. In this case, we have a difference of only 2 points from 153, which is the actual answer.

Solution via Bidirectional LSTM

Bidirectional LSTM is a type of LSTM which learns from the input sequence from both forward and backward directions. The final sequence interpretation is the concatenation of both forward and backward learning passes. Let’s see if we can get better results with bidirectional LSTMs.

The following script creates a bidirectional LSTM model with one bidirectional layer and one dense layer which acts as the output of the model.

from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

The following script trains the model and makes predictions on the test sequence which is 50, 51, and 52.

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)
test_output = model.predict(test_input, verbose=0)
print(test_output)

The result I got is 152.26 which is just a fraction short of the actual result. Therefore, we can conclude that for our dataset, bidirectional LSTM with single layer outperforms both the single layer and stacked unidirectional LSTMs.

Many-to-One Sequence Problems with Multiple Features

In a many-to-one sequence problem, each input time-step consists of multiple features. The output can be a single value or multiple values, one per feature in the input time-step. We will cover both cases in this section.

Creating the Dataset

Our dataset will contain 15 samples. Each sample will consist of 3 time-steps, and each time-step will have two features.

Let’s create two lists. One will contain the multiples of 3 from 3 to 135, i.e. 45 elements in total. The second list will contain the multiples of 5 from 5 to 225, again 45 elements in total. The following script creates these two lists:

X1 = np.array([x+3 for x in range(0, 135, 3)])
print(X1)

X2 = np.array([x+5 for x in range(0, 225, 5)])
print(X2)

You can see the contents of the list in the following output:

[  3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48  51  54
  57  60  63  66  69  72  75  78  81  84  87  90  93  96  99 102 105 108
 111 114 117 120 123 126 129 132 135]
[  5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
  95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180
 185 190 195 200 205 210 215 220 225]

Each of the above lists represents one feature per time-step. The aggregated dataset can be created by joining the two lists as shown below:

X = np.column_stack((X1, X2))
print(X)

The output shows the aggregated dataset:

[[  3   5]
 [  6  10]
 [  9  15]
 [ 12  20]
 [ 15  25]
 [ 18  30]
 [ 21  35]
 [ 24  40]
 [ 27  45]
 [ 30  50]
 [ 33  55]
 [ 36  60]
 [ 39  65]
 [ 42  70]
 [ 45  75]
 [ 48  80]
 [ 51  85]
 [ 54  90]
 [ 57  95]
 [ 60 100]
 [ 63 105]
 [ 66 110]
 [ 69 115]
 [ 72 120]
 [ 75 125]
 [ 78 130]
 [ 81 135]
 [ 84 140]
 [ 87 145]
 [ 90 150]
 [ 93 155]
 [ 96 160]
 [ 99 165]
 [102 170]
 [105 175]
 [108 180]
 [111 185]
 [114 190]
 [117 195]
 [120 200]
 [123 205]
 [126 210]
 [129 215]
 [132 220]
 [135 225]]

We need to reshape our data into three dimensions so that it can be used by LSTM. We have 45 rows in total and two columns in our dataset. We will reshape our dataset into 15 samples, 3 time-steps, and two features.

X = array(X).reshape(15, 3, 2)
print(X)

You can see the 15 samples in the following output:

[[[  3   5]
  [  6  10]
  [  9  15]]

 [[ 12  20]
  [ 15  25]
  [ 18  30]]

 [[ 21  35]
  [ 24  40]
  [ 27  45]]

 [[ 30  50]
  [ 33  55]
  [ 36  60]]

 [[ 39  65]
  [ 42  70]
  [ 45  75]]

 [[ 48  80]
  [ 51  85]
  [ 54  90]]

 [[ 57  95]
  [ 60 100]
  [ 63 105]]

 [[ 66 110]
  [ 69 115]
  [ 72 120]]

 [[ 75 125]
  [ 78 130]
  [ 81 135]]

 [[ 84 140]
  [ 87 145]
  [ 90 150]]

 [[ 93 155]
  [ 96 160]
  [ 99 165]]

 [[102 170]
  [105 175]
  [108 180]]

 [[111 185]
  [114 190]
  [117 195]]

 [[120 200]
  [123 205]
  [126 210]]

 [[129 215]
  [132 220]
  [135 225]]]

The output will also have 15 values corresponding to the 15 input samples. Each value in the output will be the sum of the two feature values in the third time-step of each input sample. For instance, the third time-step of the first sample has features 9 and 15, hence the output will be 24. Similarly, the two feature values in the third time-step of the second sample are 18 and 30; the corresponding output will be 48, and so on.

The following script creates and displays the output vector:

Y = list()
for x in X:
    Y.append(x[2][0] + x[2][1])

Y = np.array(Y)
print(Y)

[ 24  48  72  96 120 144 168 192 216 240 264 288 312 336 360]

Let’s now solve this many-to-one sequence problem via simple, stacked, and bidirectional LSTMs.

Solution via Simple LSTM
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 2)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

The model is trained. We will create a test data point and then use our model to make a prediction on it.

test_input = array([[8, 51],
                    [11,56],
                    [14,61]])

test_input = test_input.reshape((1, 3, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The sum of two features of the third time-step of the input is 14 + 61 = 75. Our model with one LSTM layer predicted 73.41, which is pretty close.

Solution via Stacked LSTM

The following script trains a stacked LSTM and makes predictions on test point:

model = Sequential()
model.add(LSTM(200, activation='relu', return_sequences=True, input_shape=(3, 2)))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output I received is 71.56, which is worse than the simple LSTM. It seems that our stacked LSTM is overfitting.

Solution via Bidirectional LSTM

Here is the training script for simple bidirectional LSTM along with code that is used to make predictions on the test data point:

from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(3, 2)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)
test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is 76.82 which is pretty close to 75. Again, bidirectional LSTM seems to be outperforming the rest of the algorithms.

Till now we have predicted single values based on multiple feature values from different time-steps. There is another case of many-to-one sequences where you want to predict one value for each feature in the time-step. For instance, the dataset we used in this section has three time-steps, and each time-step has two features. We may want to predict an individual value for each feature series. The following example makes this clear. Suppose we have the following input:

[[[  3   5]
  [  6  10]
  [  9  15]]

In the output, we want one time-step with two features as shown below:

[12, 20] 

You can see that the first value in the output is a continuation of the first series and the second value is a continuation of the second series. We can solve such problems by simply changing the number of neurons in the output dense layer to the number of feature values that we want in the output. However, first we need to update our output vector Y. The input vector will remain the same:

Y = list()
for x in X:
    new_item = list()
    new_item.append(x[2][0]+3)
    new_item.append(x[2][1]+5)
    Y.append(new_item)

Y = np.array(Y)
print(Y)

The above script creates the updated output vector and prints it to the console; the output looks like this:

[[ 12  20]
 [ 21  35]
 [ 30  50]
 [ 39  65]
 [ 48  80]
 [ 57  95]
 [ 66 110]
 [ 75 125]
 [ 84 140]
 [ 93 155]
 [102 170]
 [111 185]
 [120 200]
 [129 215]
 [138 230]]

Let’s now train our simple, stacked and bidirectional LSTM networks on our dataset. The following script trains a simple LSTM:

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)

The next step is to test our model on the test data point. The following script creates a test data point:

test_input = array([[20,34],
                    [23,39],
                    [26,44]])

test_input = test_input.reshape((1, 3, 2))
test_output = model.predict(test_input, verbose=0)
print(test_output)

The actual output is [29, 45]. Our model predicts [29.089157, 48.469097], which is pretty close.

Let’s now train a stacked LSTM and predict the output for the test data point:

model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(3, 2)))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(LSTM(25, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=500, validation_split=0.2, verbose=1)

test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is [29.170143, 48.688267], which is again very close to actual output.

Finally, we can train our bidirectional LSTM and make prediction on the test point:

from keras.layers import Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(3, 2)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1)
test_output = model.predict(test_input, verbose=0)
print(test_output)

The output is [29.2071, 48.737988].

You can see once again that bidirectional LSTM makes the most accurate prediction.

Conclusion

Simple neural networks are not suitable for solving sequence problems since in sequence problems, in addition to current input, we need to keep track of the previous inputs as well. Neural Networks with some sort of memory are more suited to solving sequence problems. LSTM is one such network.

In this article, we saw how different variants of the LSTM algorithm can be used to solve one-to-one and many-to-one sequence problems. This is the first part of the series. In the second part, we will see how to solve one-to-many and many-to-many sequence problems. We will also study the encoder-decoder mechanism that is most commonly used to create chatbots. Till then, happy coding 🙂



How To Construct For Loops in Go

Introduction

In computer programming, a loop is a code structure that repeatedly executes a piece of code, often until some condition is met. Using loops allows you to automate and repeat similar tasks multiple times. Imagine if you had a list of files that you needed to process, or if you wanted to count the number of lines in an article. You would use a loop in your code to solve these types of problems.

In Go, a for loop implements the repeated execution of code based on a loop counter or loop variable. Unlike other programming languages that have multiple looping constructs such as while, do, etc., Go only has the for loop. This serves to make your code clearer and more readable, since you do not have to worry about multiple strategies to achieve the same looping construct. The enhanced readability and decreased cognitive load during development will also make your code less prone to error than in other languages.

In this tutorial, you will learn how Go’s for loop works, including the three major variations of its use. We’ll start by showing how to create different types of for loops, followed by how to loop through sequential data types in Go. We’ll end by explaining how to use nested loops.

Declaring ForClause and Condition Loops

In order to account for a variety of use cases, there are three distinct ways to create for loops in Go, each with its own capabilities. These are to create a for loop with a Condition, a ForClause, or a RangeClause. In this section, we will explain how to declare and use the ForClause and Condition variants.

Let’s look at how we can use a for loop with the ForClause first.

A ForClause loop is defined as having an initial statement, followed by a condition, and then a post statement. These are arranged in the following syntax:

for [ Initial Statement ] ; [ Condition ] ; [ Post Statement ] {
    [Action]
}

To explain what the preceding components do, let’s look at a for loop that increments through a specified range of values using the ForClause syntax:

for i := 0; i < 5; i++ {
    fmt.Println(i)
}

Let’s break this loop down and identify each part.

The first part of the loop is i := 0. This is the initial statement:

for i := 0; i < 5; i++ {
    fmt.Println(i)
}

It states that we are declaring a variable called i, and setting the initial value to 0.

Next is the condition:

for i := 0; i < 5; i++ {
    fmt.Println(i)
}

In this condition, we stated that while i is less than the value of 5, the loop should continue looping.

Finally, we have the post statement:

for i := 0; i < 5; i++ {
    fmt.Println(i)
}

In the post statement, we increment the loop variable i up by one each time an iteration occurs using the i++ increment operator.

When we run this program, the output looks like this:

Output
0
1
2
3
4

The loop ran 5 times. Initially, it set i to 0, and then checked whether i was less than 5. Since the value of i was less than 5, the loop body executed and fmt.Println(i) printed the value. After each iteration, the post statement i++ ran, incrementing the value of i by 1.

Note: Keep in mind that in programming we tend to begin at index 0, so although 5 numbers are printed out, they range from 0 to 4.

We aren’t limited to starting at 0 or ending at a specified value. We can assign any value to our initial statement, and stop at any value in our condition. This allows us to create any desired range to loop through:

for i := 20; i < 25; i++ {
    fmt.Println(i)
}

Here, the iteration goes from 20 (inclusive) to 25 (exclusive), so the output looks like this:

Output
20
21
22
23
24

We can also use our post statement to increment at different values. This is similar to step in other languages:

First, let’s use a post statement with a positive value:

for i := 0; i < 15; i += 3 {
    fmt.Println(i)
}

In this case, the for loop is set up to count from 0 up to (but not including) 15 at an increment of 3, so only every third number is printed, like so:

Output
0
3
6
9
12

We can also use a negative value for our post statement argument to iterate backwards, but we’ll have to adjust our initial statement and condition arguments accordingly:

for i := 100; i > 0; i -= 10 {
    fmt.Println(i)
}

Here, we set i to an initial value of 100, use the condition i > 0 to keep looping until we reach 0, and have the post statement decrement the value by 10 with the -= operator. The loop begins at 100 and decreases by 10 with each iteration until the condition fails, so the last value printed is 10. We can see this occur in the output:

Output
100
90
80
70
60
50
40
30
20
10

You can also exclude the initial statement and the post statement from the for syntax, and only use the condition. This is what is known as a Condition loop:

i := 0
for i < 5 {
    fmt.Println(i)
    i++
}

This time, we declared the variable i separately from the for loop in the preceding line of code. The loop only has a condition clause that checks to see if i is less than 5. As long as the condition evaluates to true, the loop will continue to iterate.

Sometimes you may not know the number of iterations you will need to complete a certain task. In that case, you can omit all statements, and use the break keyword to exit execution:

for {
    if someCondition {
        break
    }
    // do action here
}

An example of this may be if we are reading from an indeterminately sized structure like a buffer and we don’t know when we will be done reading:

buffer.go
package main

import (
    "bytes"
    "fmt"
    "io"
)

func main() {
    buf := bytes.NewBufferString("one\ntwo\nthree\nfour\n")

    for {
        line, err := buf.ReadString('\n')
        if err != nil {
            if err == io.EOF {
                fmt.Print(line)
                break
            }
            fmt.Println(err)
            break
        }
        fmt.Print(line)
    }
}

In the preceding code, buf := bytes.NewBufferString("one\ntwo\nthree\nfour\n") declares a buffer with some data. Because we don’t know when the buffer will finish reading, we create a for loop with no clause. Inside the for loop, we use line, err := buf.ReadString('\n') to read a line from the buffer and check to see if there was an error reading from the buffer. If there was, we address the error, and use the break keyword to exit the for loop. With these break points, you do not need to include a condition to stop the loop.

In this section, we learned how to declare a ForClause loop and use it to iterate through a known range of values. We also learned how to use a Condition loop to iterate until a specific condition was met. Next, we’ll learn how the RangeClause is used for iterating through sequential data types.

Looping Through Sequential Data Types with RangeClause

It is common in Go to use for loops to iterate over the elements of sequential or collection data types like slices, arrays, and strings. To make it easier to do so, we can use a for loop with RangeClause syntax. While you can loop through sequential data types using the ForClause syntax, the RangeClause is cleaner and easier to read.

Before we look at using the RangeClause, let’s look at how we can iterate through a slice by using the ForClause syntax:

main.go
package main

import "fmt"

func main() {
    sharks := []string{"hammerhead", "great white", "dogfish", "frilled", "bullhead", "requiem"}

    for i := 0; i < len(sharks); i++ {
        fmt.Println(sharks[i])
    }
}

Running this will give the following output, printing out each element of the slice:

Output
hammerhead
great white
dogfish
frilled
bullhead
requiem

Now, let’s use the RangeClause to perform the same set of actions:

main.go
package main

import "fmt"

func main() {
    sharks := []string{"hammerhead", "great white", "dogfish", "frilled", "bullhead", "requiem"}

    for i, shark := range sharks {
        fmt.Println(i, shark)
    }
}

In this case, we are printing out each item in the list. Though we used the variables i and shark, we could have given them any other valid variable names and we would get the same output:

Output
0 hammerhead
1 great white
2 dogfish
3 frilled
4 bullhead
5 requiem

When using range on a slice, it will always return two values. The first value will be the index that the current iteration of the loop is in, and the second is the value at that index. In this case, for the first iteration, the index was 0, and the value was hammerhead.

Sometimes, we only want the value inside the slice elements, not the index. If we change the preceding code to only print out the value, however, we will receive a compile-time error:

main.go
package main

import "fmt"

func main() {
    sharks := []string{"hammerhead", "great white", "dogfish", "frilled", "bullhead", "requiem"}

    for i, shark := range sharks {
        fmt.Println(shark)
    }
}
Output
src/range-error.go:8:6: i declared and not used

Because i is declared in the for loop, but never used, the compiler will respond with the error of i declared and not used. This is the same error that you will receive in Go any time you declare a variable and don’t use it.

Because of this, Go has the blank identifier, which is an underscore (_). In a for loop, you can use the blank identifier to ignore any value returned from the range keyword. In this case, we want to ignore the index, which is the first value returned.

main.go
package main

import "fmt"

func main() {
    sharks := []string{"hammerhead", "great white", "dogfish", "frilled", "bullhead", "requiem"}

    for _, shark := range sharks {
        fmt.Println(shark)
    }
}
Output
hammerhead
great white
dogfish
frilled
bullhead
requiem

This output shows that the for loop iterated through the slice of strings, and printed each item from the slice without the index.

You can also use range to add items to a list:

main.go
package main

import "fmt"

func main() {
    sharks := []string{"hammerhead", "great white", "dogfish", "frilled", "bullhead", "requiem"}

    for range sharks {
        sharks = append(sharks, "shark")
    }

    fmt.Printf("%q\n", sharks)
}
Output
["hammerhead" "great white" "dogfish" "frilled" "bullhead" "requiem" "shark" "shark" "shark" "shark" "shark" "shark"]

Here, we have appended a placeholder string of "shark" once for each item in the original sharks slice.

Notice that we didn’t have to use the blank identifier _ to ignore any of the return values from the range operator. Go allows us to leave out the entire declaration portion of the range statement if we don’t need to use either of the return values.

We can also use the range operator to fill in values of a slice:

main.go
package main

import "fmt"

func main() {
    integers := make([]int, 10)
    fmt.Println(integers)

    for i := range integers {
        integers[i] = i
    }

    fmt.Println(integers)
}

In this example, the slice integers is initialized with ten zero values, but the for loop sets all the values in the slice like so:

Output
[0 0 0 0 0 0 0 0 0 0]
[0 1 2 3 4 5 6 7 8 9]

The first time we print the value of the slice integers, we see all zeros. Then we iterate through each index and set the value to the current index. When we print the value of integers a second time, we see that the elements now hold the values 0 through 9.

We can also use the range operator to iterate through each character in a string:

main.go
package main

import "fmt"

func main() {
    sammy := "Sammy"

    for _, letter := range sammy {
        fmt.Printf("%c\n", letter)
    }
}
Output
S
a
m
m
y

When iterating through a map, range will return both the key and the value:

main.go
package main

import "fmt"

func main() {
    sammyShark := map[string]string{"name": "Sammy", "animal": "shark", "color": "blue", "location": "ocean"}

    for key, value := range sammyShark {
        fmt.Println(key + ": " + value)
    }
}
Output
color: blue
location: ocean
name: Sammy
animal: shark

Note: It is important to note that the order in which a map’s entries are returned is random. Each time you run this program you may get a different result.

Now that we have learned how to iterate over sequential data with range for loops, let’s look at how to use loops inside of loops.

Nested For Loops

Loops can be nested in Go, as they can with other programming languages. Nesting is when we have one construct inside of another. In this case, a nested loop is a loop that occurs within another loop. These can be useful when you would like to have a looped action performed on every element of a data set.

Nested loops are structurally similar to nested if statements. They are constructed like so:

for {
    [Action]
    for {
        [Action]
    }
}

The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion. Then the program returns back to the top of the outer loop, completing the second iteration and again triggering the nested loop. Again, the nested loop runs to completion, and the program returns back to the top of the outer loop until the sequence is complete or a break or other statement disrupts the process.

Let’s implement a nested for loop so we can take a closer look. In this example, the outer loop will iterate through a slice of integers called numList, and the inner loop will iterate through a slice of strings called alphaList.

main.go
package main

import "fmt"

func main() {
    numList := []int{1, 2, 3}
    alphaList := []string{"a", "b", "c"}

    for _, i := range numList {
        fmt.Println(i)
        for _, letter := range alphaList {
            fmt.Println(letter)
        }
    }
}

When we run this program, we’ll receive the following output:

Output
1
a
b
c
2
a
b
c
3
a
b
c

The output illustrates that the program completes the first iteration of the outer loop by printing 1, which then triggers completion of the inner loop, printing a, b, c consecutively. Once the inner loop has completed, the program returns to the top of the outer loop, prints 2, then again prints the inner loop in its entirety (a, b, c), etc.

Nested for loops can be useful for iterating through items within slices composed of slices. In a slice composed of slices, if we use just one for loop, the program will output each internal list as an item:

main.go
package main

import "fmt"

func main() {
    ints := [][]int{
        []int{0, 1, 2},
        []int{-1, -2, -3},
        []int{9, 8, 7},
    }

    for _, i := range ints {
        fmt.Println(i)
    }
}
Output
[0 1 2]
[-1 -2 -3]
[9 8 7]

In order to access each individual item of the internal slices, we’ll implement a nested for loop:

main.go
package main

import "fmt"

func main() {
    ints := [][]int{
        []int{0, 1, 2},
        []int{-1, -2, -3},
        []int{9, 8, 7},
    }

    for _, i := range ints {
        for _, j := range i {
            fmt.Println(j)
        }
    }
}
Output
0
1
2
-1
-2
-3
9
8
7

When we use a nested for loop here, we are able to iterate over the individual items contained in the slices.

Conclusion

In this tutorial we learned how to declare and use for loops to solve repetitive tasks in Go. We also learned the three different variations of a for loop and when to use them. To learn more about for loops and how to control their flow, read Using Break and Continue Statements When Working with Loops in Go.

DigitalOcean Community Tutorials

Dashboard Confessions: Being a Data Nonconformist

I have no formal background in data or statistics. The skills and experience I’ve gained have been a direct result of working in analytics, figuring it out as I’ve gone along and working hard to solve problems about which I’m passionate. Big data, machine learning, complex algorithms—these are all now common associations with the world of analytics. While I have colleagues who specialize in these technical areas of our work, my journey with data has been unconventional. I often describe my own work as “finding the story” within a dataset.

The Art of Data Science

Data and narrative may seem like an odd pairing. Typically, we think of data as a hard science, primarily a left-brained activity. We associate stories with the arts and creative right-brained activities, and the idea that these can coexist can be difficult to grasp. Although some areas of data analytics are certainly more science than art, we live in a world where data is so common that we’ve had to democratize the analytics process.

Gone are the days when a small group of coders are the only ones trusted to develop insights. In many companies today, everyone is expected to analyze data, requiring even more overlap and nontraditional approaches to data. Tools like Tableau and Alteryx have proven invaluable for turning ordinary people into analysts, but jumping into analytics without the typical background can still feel like a bit of a leap.

In my last job, I worked as an analyst for KIPP DC, a national network for public charter schools. I helped district leaders, principals and teachers understand and interpret data about our schools so that we could make them stronger. The idea of writing a formula in Excel made most of these people cringe. But I quickly discovered that the key to success in my role was not being able to write complex calculations or model polynomial regressions. Rather, it was something far simpler: understanding what insights people needed and telling a story with data that they could understand. In some regards, this required me to take a step back from the typical left-brained approach to analysis and assume a more organic and creative posture—it wasn’t just “What does the data say?” but it became “What is the data prompting us to do?”.


Thinking About Data in Terms of Design

In this same vein of blurring the lines between science and art, left brain and right brain, I’ve recently been learning about design thinking. In the simplest of terms, it’s just understanding what users need and how to meet those needs in an intuitive and enjoyable way. Although I didn’t realize it at the time, in retrospect, I’ve realized that my early approach to data analytics was through the lens of design thinking. I was constantly asking myself, “What are people going to do with this data?” and I tried to anticipate their next question and what other data they might need. I put myself in the shoes of the user and developed all my dashboards with them in mind. Many times, this took me outside the stereotypical box of an analyst and forced me to consider the data differently.

A couple months into my job at KIPP, I overheard a specific goal that district leaders had for our schools. Although we already had a dashboard with data related to this goal, I found myself running into dead ends and unanswered questions when I used it. So I added several new ways of breaking down the data. With each new graph I built, I asked myself, “What can someone do with this data?” and if that answer didn’t lead to practical action, I knew my work wasn’t done and found a way to make it better. This stands in direct contrast to how some people perceive data analysis: inputting data, yielding results and communicating them as is. However, I’ve learned that data analysis isn’t like this—especially at InterWorks. It’s much more about trying different things over and over, thinking about data in ways you never have before in order to glean meaningful insights and deliver better reports than ever before.

Above: Me with some KIPP students at the Washington Monument

Crafting a Data Narrative

When I presented the updated dashboard to school leaders, they were excited about the new possibilities it unlocked. My graphs led to insights that started conversations that produced better performing schools. But none of this was the result of something I had learned in a statistics class. Instead, it was simply the result of a mindset that anyone can develop: understanding your audience and designing solutions with their specific needs in mind.

Every organization has unique questions it needs its data to answer, and more often than not, getting to those answers isn’t a straight and narrow path. It’s one that necessitates adaptability, living in the grey rather than sticking to black or white, and it invites creativity. While my path into analytics wasn’t what the textbook may have prescribed, I’m thankful for the outsider’s perspective it lent me. It makes balancing art and science easier, it allows me the freedom to bend and not break, and it more closely resembles what big data really looks like today: a blend of beautiful design and actionable insights—a unifying of both sides of the brain.

The post Dashboard Confessions: Being a Data Nonconformist appeared first on InterWorks.

InterWorks

Adventures in Dashboarding: A Summer of Real-World Tableau Training

Over the course of the summer, the InterWorks team of business intelligence interns was provided with extensive training and internal work that helped them gain extremely valuable skills. These individuals—Fisher Ankey, JP Urrutia and Andrew Langford—teamed up on this blog post to chronicle their work over the summer.

Working with the Stillwater Chamber of Commerce

This summer, we also had the opportunity to conduct client work and provide solutions to an organization that was looking for a better understanding of their operation. That client was the Stillwater Chamber of Commerce.

In our initial meeting, we learned that the Chamber was looking to improve reporting across their organization in areas such as finance, community development and social media. Tableau was the obvious choice as the tool that could provide them the reporting power they needed, so we divided the work between the three of us. This process included collecting data from different sources, cleaning that data and positioning it in a structure that was usable for reporting. With Tableau, we created multiple dashboards, each one tailored to specific needs of the Chamber’s departments.

Finance

The finance portion of this project included connecting to a couple of different data sources, such as QuickBooks Online and Chamber Master. The QuickBooks Online data was used to view the Chamber’s income and expenses over time, while the Chamber Master data was used to report on memberships and retention within their organization.

QuickBooks Online

Using QuickBooks Online as a data source proved to be advantageous because Tableau has an embedded web data connector that allows users to sign in with their QuickBooks Online credentials and import the data into the workbook. This data was used to produce dashboards containing income and expense reports, as well as profit and loss, for the Chamber.

Each dashboard contains features to perform a deeper dive into the data through action filters and buttons. Each of these views gives the Chamber a quick, easy-to-access look at their finances at a YTD or month-by-month view. Lastly, because of the data connection to QuickBooks, the dashboards will update dynamically and eliminate the need for manual updates in the future.

Chamber Master

The Chamber Master site contained data regarding membership and retention. It is a quick overview of how the Chamber is tracking member count. Getting this data was not as easy as connecting to the data source in Tableau. However, Chamber Master allows a user to download report spreadsheets in a customizable format for the desired view.  After exporting the data, the tables were combined using a union.

Just like the QuickBooks online dashboard, each of these dashboards contains filters and actions that allow the user to get a more in-depth understanding of the data. These reports were saved in Chamber Master and will allow the customer to export and reproduce future data with the same structure.

Community Dashboards

Along with financial information, our team also created a set of community dashboards to provide the Chamber a view of the Stillwater community at different levels. These levels include demographics, education, housing, workforce, safety and recreation. The data for these various community levels was collected from a lineup of private and public sector sources, such as the U.S. Census Bureau, U.S. Bureau of Labor Statistics, workreadycommunities.org and more:

A task that proved just as important as gathering and cleaning the data was documenting how to access and update it. Each year, the various sources listed update their information to report upon the year prior. That means once a year, the Chamber of Commerce will have to manually update the underlying data that drives these dashboards. Just like the financial dashboards, the Chamber was provided documentation with step-by-step instructions on how to perform these updates.

Social Media

Lastly, the Stillwater Chamber of Commerce also wanted to understand the behavior and performance of each of the social media platforms the organization was using. So, we provided solutions regarding platforms such as Google Analytics, Facebook, Instagram and Twitter.

Google Analytics

The Google Analytics dashboards provide information and insights regarding the Stillwater Chamber website and Grow Stillwater Google Analytics data. The dashboards contain data of the trailing 18 months to the current date and provide metrics at the website, page, device and location levels, including pageviews, sessions, bounce rate, time on page and more.

The same templates are built for both dashboards. Both connect to the Google Analytics web data connector built into Tableau, providing a cheap and simple way to extract data into the workbook. The data connector updates when the user updates the date parameter in the workbook’s data source tab. Each dashboard contains cross-dashboard filtering that allows the user to specify web pages and see KPIs for a complete drilldown analysis:


Facebook

The Facebook dashboard provides information and insights regarding the Stillwater Chamber Facebook data. The dashboard contains data of the trailing 18 months to the current date and provides metrics at post, page and video levels, including page likes, post impressions, post engagement rates and more. These metrics show monthly performance for metrics such as comments, engagement rate and unique video views. Page likes and total impressions provide total numbers, but all of these are accompanied by monthly percent changes for the most recent month:

The data was exported from the Facebook insights page. To add more data, the client would go to the Facebook profile > Insights tab > export data for each level at a specified date range. The downloaded file can be added by a union through Tableau or copy/paste in a Microsoft Excel sheet. 

Twitter

The Twitter dashboard provides information and insight regarding the Stillwater Chamber Twitter page data. The dashboard contains data of the trailing 5 months to the current date and provides metrics at page and post levels including tweets, retweets, post engagement rate, and post URL clicks. These metrics show monthly performance for metrics such as tweets, retweets, engagement rate and URL clicks. These are accompanied by monthly percent changes for the most recent month, as well as weekly trend visuals:

The data was exported via a Twitter data request. To add more data, the client would request the data through the Twitter Analytics page, select their date range and export the data. The downloaded file can be added through a union in Tableau or copy/pasted into a Microsoft Excel sheet.

Instagram

The Instagram dashboard provides information and insights regarding the Stillwater Chamber Instagram page data. The dashboard contains all the Instagram page data up to the current date and provides metrics post by post, including total and monthly performance for things such as likes and comments. These are accompanied by monthly percent changes for the most recent month, as well as weekly trend visuals:

The data is connected through a web data connector that connects the client’s Instagram account to Tableau. This connector provides all the data connected to the Instagram account and updates dynamically. 

Moving Forward

The best way for the Chamber to move forward could be to implement the Matillion ETL tool for Snowflake. This would help provide a more efficient way of collecting the Chamber’s data by connecting to each social media platform’s application programming interface (API) in the Matillion environment.

Through this ETL pipeline, a workflow in Matillion would connect to the right API endpoints and store the data in Snowflake, providing the client with cloud storage space to keep the data and access it whenever they need. The goal would be for the ETL pipeline to record the data on a schedule that could update as frequently as the client needed. Moreover, the tables that are created and loaded into Snowflake could easily connect to a Tableau workbook and update the data source automatically via the Matillion components. It is a costly solution, but one that could apply to other data sources the Chamber uses, such as QuickBooks and Salesforce.

Looking Back on a Summer of Work

The current solutions are dashboards that provide the Chamber a deep dive into the different areas of their organization and give them a granular look at multiple platforms. These Tableau dashboards also give the client an idea of organizational resource allocation, community standards and needs, and a look at what content they are using to impact their community. The current dashboards are easy to use, insightful and provide a good start for the Stillwater Chamber of Commerce; however, with the addition of ETL tools, this solution could really be improved and made easier to use for the client.

Spending time to learn and work with these tools and implement them into a real-world project like the Stillwater Chamber of Commerce proved to be an incredible work experience. Our team dedicated a great amount of time and effort to this project, and we were happy to deliver our best solutions to such a fantastic client.

The post Adventures in Dashboarding: A Summer of Real-World Tableau Training appeared first on InterWorks.

InterWorks