Stack Abuse: Python for NLP: Creating Bag of Words Model from Scratch

This is the 13th article in my series of articles on Python for NLP. In the previous article, we saw how to create a simple rule-based chatbot that generates a response using the cosine similarity between the TF-IDF vectors of the words in the corpus and the user input. The TF-IDF model was used to convert words to numbers.

In this article, we will study another very useful model that converts text to numbers i.e. the Bag of Words (BOW).

Most statistical algorithms, e.g. machine learning and deep learning techniques, work with numeric data, so we have to convert text into numbers. Several approaches exist; the best known are Bag of Words, TF-IDF, and word2vec. Although libraries such as Scikit-Learn and NLTK can implement these techniques in one line of code, it is important to understand the working principle behind them. The best way to do so is to implement these techniques from scratch in Python, and that is what we are going to do today.

In this article, we will see how to implement the Bag of Words approach from scratch in Python. In the next article, we will see how to implement the TF-IDF approach from scratch in Python.

Before coding, let’s first see the theory behind the bag of words approach.

Theory Behind Bag of Words Approach

To understand the bag of words approach, let’s first start with the help of an example.

Suppose we have a corpus with three sentences:

  • “I like to play football”
  • “Did you go outside to play tennis”
  • “John and I play tennis”

Now if we have to perform text classification, or any other task, on the above data using statistical techniques, we cannot do so directly, since statistical techniques work only with numbers. Therefore we need to convert these sentences into numbers.

Step 1: Tokenize the Sentences

The first step in this regard is to convert the sentences in our corpus into tokens or individual words. Look at the table below:

Sentence 1    Sentence 2    Sentence 3
I             Did           John
like          you           and
to            go            I
play          outside       play
football      to            tennis
              play
              tennis
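As a minimal sketch of this step, the three example sentences can be tokenized with plain whitespace splitting. This assumes the sentences contain no punctuation, which holds for the toy corpus; the variable names are only illustrative.

```python
# Toy corpus from the example above.
sentences = [
    "I like to play football",
    "Did you go outside to play tennis",
    "John and I play tennis",
]

# Whitespace splitting is enough here because the sentences contain no punctuation.
tokenized = [sentence.split() for sentence in sentences]

for number, tokens in enumerate(tokenized, start=1):
    print(f"Sentence {number}: {tokens}")
```

Each inner list corresponds to one column of the table above.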

Step 2: Create a Dictionary of Word Frequency

The next step is to create a dictionary that contains all the words in our corpus as keys and the frequency of the occurrence of the words as values. In other words, we need to create a histogram of the words in our corpus. Look at the following table:

Word Frequency
I 2
like 1
to 2
play 3
football 1
Did 1
you 1
go 1
outside 1
tennis 2
John 1
and 1

In the table above, you can see each word in our corpus along with its frequency of occurrence. For instance, you can see that since the word play occurs three times in the corpus (once in each sentence) its frequency is 3.

Our corpus has only three sentences, so it is easy to create a dictionary that contains all the words. In real-world scenarios, there can be millions of words in the dictionary, and some of them will have a very small frequency. Words with a very small frequency are usually not very useful, hence such words are removed. One way to remove them is to sort the word frequency dictionary in decreasing order of frequency and then keep only the words whose frequency is higher than a certain threshold.

Let’s sort our word frequency dictionary:

Word Frequency
play 3
tennis 2
to 2
I 2
football 1
Did 1
you 1
go 1
outside 1
like 1
John 1
and 1
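The histogram and the sorting step can be sketched with `collections.Counter` from the standard library. This is only one possible way to do it, not the implementation used later in the article, and words with equal counts may come out in a different order than in the table above.

```python
from collections import Counter

sentences = [
    "I like to play football",
    "Did you go outside to play tennis",
    "John and I play tennis",
]

# Count every token across all sentences.
wordfreq = Counter(word for sentence in sentences for word in sentence.split())

# most_common() returns (word, count) pairs in decreasing order of count.
for word, count in wordfreq.most_common():
    print(word, count)

# Keep only words that occur more than once (a hypothetical threshold of 1).
frequent_words = [word for word, count in wordfreq.items() if count > 1]
print(frequent_words)
```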

Step 3: Creating the Bag of Words Model

To create the bag of words model, we need to create a matrix where the columns correspond to the most frequent words in our dictionary and the rows correspond to the documents or sentences.

Suppose we filter the 8 most occurring words from our dictionary. Then the document frequency matrix will look like this:

            Play  Tennis  To  I  Football  Did  You  Go
Sentence 1  1     0       1   1  1         0    0    0
Sentence 2  1     1       1   0  0         1    1    1
Sentence 3  1     1       0   1  0         0    0    0

It is important to understand how the above matrix is created. The first row corresponds to the first sentence. In the first sentence, the word “play” occurs once, therefore we added 1 in the first column. The word in the second column is “Tennis”; it doesn’t occur in the first sentence, therefore we added a 0 in the second column for sentence 1. Similarly, in the second sentence, both the words “Play” and “Tennis” occur once, therefore we added 1 in the first two columns. However, in the fifth column, we added a 0, since the word “Football” doesn’t occur in the second sentence. In this way, all the cells in the above matrix are filled with either 0 or 1, depending upon the occurrence of the word. The final matrix corresponds to the bag of words model.

In each row, you can see the numeric representation of the corresponding sentence. For instance, the first row shows the numeric representation of Sentence 1. This numeric representation can now be used as input to the statistical models.
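The matrix above can be reproduced with a few lines of Python. This is a sketch under the assumption that the vocabulary is the list of 8 column words from the table, in the same order; the names `vocabulary` and `matrix` are illustrative.

```python
sentences = [
    "I like to play football",
    "Did you go outside to play tennis",
    "John and I play tennis",
]

# Column order copied from the matrix in the text (the 8 most frequent words).
vocabulary = ["play", "tennis", "to", "I", "football", "Did", "you", "go"]

matrix = []
for sentence in sentences:
    tokens = set(sentence.split())
    # 1 if the vocabulary word occurs in the sentence, else 0.
    matrix.append([1 if word in tokens else 0 for word in vocabulary])

for row in matrix:
    print(row)
```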

Enough of the theory, let’s implement our very own bag of words model from scratch.

Bag of Words Model in Python

The first thing we need to create our Bag of Words model is a dataset. In the previous section, we manually created a bag of words model with three sentences. However, real-world datasets are huge, with millions of words. A convenient source for a sample corpus is Wikipedia.

In the first step, we will scrape the Wikipedia article on Natural Language Processing. But first, let’s import the required libraries:

import nltk
import numpy as np
import random
import string

import bs4 as bs
import urllib.request
import re

As we did in the previous article, we will be using the Beautifulsoup4 library to parse the data from Wikipedia. Furthermore, Python’s regex library, re, will be used for some preprocessing tasks on the text.

Next, we need to scrape the Wikipedia article on natural language processing.

raw_html = urllib.request.urlopen('https://en.wikipedia.org/wiki/Natural_language_processing')
raw_html = raw_html.read()

article_html = bs.BeautifulSoup(raw_html, 'lxml')

article_paragraphs = article_html.find_all('p')

article_text = ''

for para in article_paragraphs:
    article_text += para.text

In the script above, we import the raw HTML for the Wikipedia article. From the raw HTML, we extract the text within the paragraph tags. Finally, we create a complete corpus by concatenating all the paragraphs.

The next step is to split the corpus into individual sentences. To do so, we will use the sent_tokenize function from the NLTK library.

corpus = nltk.sent_tokenize(article_text)   

Our text contains punctuation. We don’t want punctuation marks to be part of our word frequency dictionary. In the following script, we first convert our text into lower case and then remove the punctuation from our text. Removing punctuation can result in multiple consecutive spaces. We will collapse these into a single space using regex.

Look at the following script:

for i in range(len(corpus)):
    corpus[i] = corpus[i].lower()
    corpus[i] = re.sub(r'\W', ' ', corpus[i])
    corpus[i] = re.sub(r'\s+', ' ', corpus[i])

In the script above, we iterate through each sentence in the corpus, convert the sentence to lower case, and then remove the punctuation and empty spaces from the text.
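To see what the two substitutions do, here is a small sketch on a made-up sample string (the sentence is an assumption for illustration, not taken from the Wikipedia article).

```python
import re

sample = "Hello, World!  NLP is fun (really)."

cleaned = sample.lower()
# Replace every non-word character (anything other than letters, digits, underscore) with a space.
cleaned = re.sub(r'\W', ' ', cleaned)
# Collapse runs of whitespace into a single space.
cleaned = re.sub(r'\s+', ' ', cleaned)

print(repr(cleaned))          # note the trailing space left behind by the final period
print(repr(cleaned.strip()))  # strip() removes it if that matters for your use case
```

The trailing space is harmless for the tokenization that follows, since the tokenizer ignores surrounding whitespace.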

Let’s find out the number of sentences in our corpus.

print(len(corpus))   

The output shows 49.

Let’s print one sentence from our corpus:

print(corpus[30])   

Output:

in the 2010s representation learning and deep neural network style machine learning methods became widespread in natural language processing due in part to a flurry of results showing that such techniques 4 5 can achieve state of the art results in many natural language tasks for example in language modeling 6 parsing 7 8 and many others   

You can see that the text doesn’t contain any special character or multiple empty spaces.

Now we have our own corpus. The next step is to tokenize the sentences in the corpus and create a dictionary that contains words and their corresponding frequencies in the corpus. Look at the following script:

wordfreq = {}
for sentence in corpus:
    tokens = nltk.word_tokenize(sentence)
    for token in tokens:
        if token not in wordfreq:
            wordfreq[token] = 1
        else:
            wordfreq[token] += 1

In the script above we created a dictionary called wordfreq. Next, we iterate through each sentence in the corpus. The sentence is tokenized into words. Next, we iterate through each word in the sentence. If the word doesn’t exist in the wordfreq dictionary, we add the word as a key and set its value to 1. Otherwise, if the word already exists in the dictionary, we simply increment its count by 1.

If you are executing the above in the Spyder editor like me, you can go to the variable explorer on the right and click the wordfreq variable. You should see a dictionary like this:

You can see words in the “Key” column and their frequency of occurrences in the “Value” column.

As I said in the theory section, depending upon the task at hand, not all of the words are useful. In huge corpora, you can have millions of words. We can keep only the most frequently occurring words. Our corpus has 535 unique words in total. Let us filter down to the 200 most frequently occurring words. To do so, we can make use of Python’s heapq module.

Look at the following script:

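import heapq
most_freq = heapq.nlargest(200, wordfreq, key=wordfreq.get)

Now our most_freq list contains the 200 most frequently occurring words, ordered from most to least frequent. Note that the list holds only the words themselves; their frequencies can still be looked up in wordfreq.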
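As a small check of what `heapq.nlargest` returns when it is given a dictionary, consider the following sketch. The tiny `wordfreq` dictionary here is made up for illustration.

```python
import heapq

# A made-up frequency dictionary with distinct counts.
wordfreq = {"football": 1, "play": 3, "tennis": 2}

# Iterating over a dict yields its keys, so nlargest ranks the keys by wordfreq.get(key).
top_two = heapq.nlargest(2, wordfreq, key=wordfreq.get)
print(top_two)

# If the counts are needed as well, pair each word with its frequency.
top_two_with_counts = [(word, wordfreq[word]) for word in top_two]
print(top_two_with_counts)
```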

The final step is to convert the sentences in our corpus into their corresponding vector representation. The idea is straightforward: for each word in the most_freq list, if the word exists in the sentence, a 1 is added for the word, otherwise a 0 is added.

sentence_vectors = []
for sentence in corpus:
    sentence_tokens = nltk.word_tokenize(sentence)
    sent_vec = []
    for token in most_freq:
        if token in sentence_tokens:
            sent_vec.append(1)
        else:
            sent_vec.append(0)
    sentence_vectors.append(sent_vec)

In the script above we create an empty list sentence_vectors which will store vectors for all the sentences in the corpus. Next, we iterate through each sentence in the corpus and create an empty list sent_vec for the individual sentences. Similarly, we also tokenize the sentence. Next, we iterate through each word in the most_freq list and check if the word exists in the tokens for the sentence. If the word is a part of the sentence, 1 is appended to the individual sentence vector sent_vec, else 0 is appended. Finally, the sentence vector is added to the list sentence_vectors which contains vectors for all the sentences. Basically, this sentence_vectors is our bag of words model.

However, the bag of words model that we saw in the theory section was in the form of a matrix. Our model is in the form of a list of lists. We can convert our model into matrix form using this script:

sentence_vectors = np.asarray(sentence_vectors)   

In the script above, we converted our list of lists into a two-dimensional numpy array using the asarray function. Now if you open the sentence_vectors variable in the variable explorer of the Spyder editor, you should see the following matrix:

You can see the Bag of Words model containing 0 and 1.

Conclusion

The Bag of Words model is one of the three most commonly used word embedding approaches, with TF-IDF and Word2Vec being the other two.

In this article, we saw how to implement the Bag of Words approach from scratch in Python. The theory of the approach has been explained along with the hands-on code to implement the approach. In the next article, we will see how to implement the TF-IDF approach from scratch in Python.

Planet Python

Mike Driscoll: Book Contest: Creating GUI Applications with wxPython

Last month, I released a new book entitled Creating GUI Applications with wxPython. In celebration of a successful launch, I have decided to do a little contest.

Cover art for Creating GUI Applications with wxPython

Rules

  • Tweet about the contest and include my handle: @driscollis
  • Send me a direct message on Twitter or via my contact form with a link to your Tweet
  • If you don’t have Twitter, feel free to message me through the website and I’ll enter you anyway

The contest will run starting now until Friday, June 21st @ 11:59 p.m. CST.

Runners up will receive a free copy of the eBook. The grand prize will be a signed paperback copy + the eBook version!

The post Book Contest: Creating GUI Applications with wxPython appeared first on The Mouse Vs. The Python.


Stack Abuse: Creating and Importing Modules in Python

Introduction

In Python, a module is a self-contained file with Python statements and definitions. For example, file.py can be considered a module named file. This differs from a package in that a package is a collection of modules in directories that give structure and hierarchy to the modules.

Modules help us break down large programs into small files that are more manageable. With modules, code reusability becomes a reality. Suppose we have a function that is frequently used in different programs. We can define this function in a module then import it into the various programs without having to copy its code each time.

In this article, we will see how to create Python modules and how to use them in Python code.

Writing Modules

A module is simply a Python file with the .py extension. The name of the file becomes the module name. Inside the file, we can have definitions and implementations of classes, variables, or functions. These can then be used in other Python programs.

Let us begin by creating a function that simply prints “Hello World”. To do this, create a new Python file and save it as hello.py. Add the following code to the file:

def my_function():
    print("Hello World")

If you run the above code, it will print nothing. This is because we have not told the program to do anything. It is true that we have created a function named my_function() within the code, but we have not called or invoked the function. When invoked, this function should print the text “Hello World”.

Now, move to the same directory where you have saved the above file and create a new file named main.py. Add the following code to the file:

import hello

hello.my_function()

Output

Hello World   

The function was invoked successfully. We began by importing the module. The name of the file was hello.py, hence the name of the imported module is hello.

Also, note the syntax that we have used to invoke the function. This is called the “dot notation”, which allows us to call the function by first specifying the module name, and then the name of the function.

However, that is just one way of importing the module and invoking the function. We could have done it as follows:

from hello import my_function

my_function()

Output

Hello World   

In the above example, the first line commands the Python interpreter to import a function named my_function from a module named hello. In such a case, you don’t have to use the dot notation to access the function, you can just call it directly.

However, in the case where our hello module has multiple functions, the statement from hello import my_function will not import all of hello’s functions into our program, only my_function. If you attempt to access any other function, an error will be generated. You have to import the whole module or import each individual function in order to use them.

We can define a variable within a module, which can then be used by other modules. To demonstrate this, open the file hello.py and add the following code to it:

def my_function():
    print("Hello World")

# The variable that'll be used in other modules
name = "Nicholas"

Now, open the main.py file and modify it as follows:

import hello

hello.my_function()

print(hello.name)

Output

Hello World
Nicholas

We have successfully accessed both the function and the variable defined in the module, since we imported the whole module instead of just the my_function() function.

We stated earlier that we can define a class within a module. Let’s see how to do this in the next example. Open the hello.py file and modify it as follows:

def my_function():
    print("Hello World")

# Defining our variable
name = "Nicholas"

# Defining a class
class Student:
    def __init__(self, name, course):
        self.course = course
        self.name = name

    def get_student_details(self):
        print("Your name is " + self.name + ".")
        print("You are studying " + self.course)

Here we have defined a class named Student. Each instance of this class has two attributes, name and course. The method get_student_details() has also been defined within the class; it prints the student details to the console.

Now, open the file main.py and modify it as follows:

import hello

hello.my_function()

print(hello.name)

nicholas = hello.Student("Nicholas", "Computer Science")
nicholas.get_student_details()

Output

Hello World
Nicholas
Your name is Nicholas.
You are studying Computer Science

In the script above, we again used the dot notation to create an object of the Student class from the hello module. We then called the get_student_details() method to print the student details.

Although modules mostly consist of function and class definitions, it is possible for them to run their own code as well when imported. To demonstrate this, let us modify the hello.py file so that it contains the definition of the function my_function() along with a call to the function:

def my_function():
    print("Hello World")

my_function()

Now, open the file main.py and delete all the lines except the following:

import hello   

Output

Hello World   

The above output shows that we defined and called the function within the module. When the module is imported, its top-level code runs, so the function is called immediately without the importing program having to invoke it. This behavior isn’t always desired, but it’s helpful for certain use-cases, like pre-loading data from cache when the module is imported.
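When running code on import is not desired, a common Python idiom is to guard the call with a check on `__name__`, so that it only runs when the file is executed directly. The sketch below writes a throwaway module into a temporary directory so that it is self-contained; the module name `hello_guarded` and the use of a temporary directory are assumptions for illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module whose call is guarded by the __name__ check.
module_source = '''
def my_function():
    print("Hello World")

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    my_function()
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "hello_guarded.py").write_text(module_source)
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()
    try:
        # Importing prints nothing, because __name__ is "hello_guarded" here.
        hello_guarded = importlib.import_module("hello_guarded")
    finally:
        sys.path.remove(tmp_dir)

# The function is still available and can be called explicitly.
hello_guarded.my_function()
```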

Importing all Module Objects

To import all objects (functions, variables, classes, etc.) from a module, we can use the import * statement. For example, to import all objects contained in the hello module, we can use the following statement:

from hello import *   

After adding the above statement to a program, we will be able to use any function, variable, or class contained in the hello module without having to prefix it with hello.

Accessing a Module from Another Path

In Python, modules are often used in more than one project. Hence, it makes little sense to keep your modules in the directory of one of the projects, since other projects wouldn’t be able to use them as easily.

You have a couple of options whenever you need to access a module that is not stored in the same directory as your program. Let us discuss these in the next few sections:

Appending Paths

To import a module from another path, you first need to import the sys module as well as any other Python modules that you would like to use in your program.

The sys module is provided by the Python Standard Library and it provides functions and parameters that are system-specific. The path.append() function from the sys module can be used to add the path of the module to the current project.

To demonstrate this, cut the hello.py file from the directory where you have the file main.py. Paste it in another directory. In my case, I have pasted it in the directory “F:\Python”.

Now, open the file main.py, import the sys module and specify the path in which the Python interpreter will look for files. This is demonstrated below:

import sys
sys.path.append('F:/Python/')

import hello

Output

Hello World   

In the above script, the line sys.path.append('F:/Python/') tells the Python interpreter to include this path in the list of paths that will be searched while importing the modules.
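The same mechanism can be tried without touching a real drive. The following sketch creates a module in a temporary directory, appends that directory to `sys.path`, and imports the module; the module name `hello_elsewhere` and the temporary directory stand in for hello.py and “F:\Python” and are assumptions for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as other_dir:
    # A stand-in for a module stored outside the program's own directory.
    Path(other_dir, "hello_elsewhere.py").write_text('name = "Nicholas"\n')

    # Before the path is appended, Python cannot locate the module.
    found_before = importlib.util.find_spec("hello_elsewhere") is not None

    sys.path.append(other_dir)
    importlib.invalidate_caches()
    try:
        hello_elsewhere = importlib.import_module("hello_elsewhere")
    finally:
        # Clean up so the temporary path does not linger in sys.path.
        sys.path.remove(other_dir)

print(found_before)
print(hello_elsewhere.name)
```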

Adding a Module to Python Path

The above method requires every program to import the sys module and append the path itself. If a program does not do this, importing the module will generate an error. To make the module available to the entire system, you can add it to a path where Python normally checks for modules and packages. This way, you will not have to import the sys module and specify the path to the module as we have done in the previous section.

Before doing anything else, you should first identify the path that Python searches for modules and packages. Just open the command line of your operating system and run the python command. This will take you to the Python terminal.

Import the sys module as follows:

import sys   

You can then run the following command to print out the path:

print(sys.path)   

The output will contain at least one system path. If you do it from a programming environment, you will get several paths. In my case, I got the following:

$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.path)
['', '/Library/Python/2.7/site-packages/six-1.10.0-py2.7.egg', '/Library/Python/2.7/site-packages/cffi-1.2.1-py2.7-macosx-10.9-intel.egg', '/Library/Python/2.7/site-packages/pycparser-2.14-py2.7.egg', '/Library/Python/2.7/site-packages/virtualenv-13.1.2-py2.7.egg', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload', '/Library/Python/2.7/site-packages']
>>>

Your goal should be to find the one in the environment that you are currently using. You should look for something like the following:

/Library/Python/2.7/site-packages 

Move your hello.py file to that directory. After that, you will be able to import the hello module from any directory in the usual way, as shown below:

import hello   

Output

Hello World   

Conclusion

This marks the end of this article. A module is simply a Python file with a set of variables and function definitions. A module facilitates code reusability since you can define a function in a module and invoke it from different programs instead of having to define the function in every program. Although a module is mostly used for function and class definitions, it may also export variables and class instances.


Evennia: Creating Evscaperoom, part 1

Over the last month (April-May 2019) I have taken part in the Mud Coder’s Guild Game Jam “Enter the (Multi-User) Dungeon”. This year the theme for the jam was One Room.

The result was Evscaperoom, a text-based multi-player “escape-room” written in Python using the Evennia MU* creation system. You can play it from that link in your browser or MU*-client of choice. If you are so inclined, you can also vote for it here in the jam (don’t forget to check out the other entries while you’re at it).

This little series of (likely two) dev-blog entries will try to recount the planning and technical aspects of the Evscaperoom. This is also for myself – I’d better write stuff down now while it’s still fresh in my mind!

Update: The next part of this blog is here.

Inception 

When I first heard about the upcoming game-jam’s theme of One Room, an ‘escape room’ was the first thing that came to mind, not least because I had just recently solved my own first real-world escape-room as a gift on my birthday.

If you are not familiar with escape-rooms, the premise is simple – you are locked into a room and have to figure out a way to get out of it by solving practical puzzles and finding hidden clues in the room. 

While you could create such a thing in your own bedroom (and there are also some one-use board game variants), most escape-rooms are managed by companies selling this as an experience for small groups. You usually have one hour to escape and if you get stuck you can press a button (or similar) to get a hint.

I thought of making a computer escape-room. Not only can you do things in the computer that you cannot do in the real world, restricting the game to a single room limits the scope so that it’s conceivable to actually finish the damned thing in a month.

A concern I had was that everyone else in the jam surely must have gone for the same obvious idea. In the end that was not an issue at all though.

Basic premises
 
I was pretty confident that I would technically be able to create the game in time (not only are Python and Evennia perfect for this kind of fast experimentation and prototyping, I know the engine very well). But that’s not enough; I had to first decide on how the thing should actually play. Here are the questions I had to consider:

Room State 

 An escape room can be seen as going through multiple states as puzzles are solved. For example, you may open a cabinet and that may open up new puzzles to solve. This is fine in a single-player game, but how to handle it in a multi-player environment?

My first thought was that each object may have multiple states and that players could co-exist in the same room, seeing different states at the same time. I really started planning for this. It would certainly be possible to implement.

But in the end I considered how a real-world escape-room works – people in the same room solve it together. For there to be any meaning with multi-player, they must share the room state.

So what I went with was a solution where players can create their own room or join an existing one. Each such room is generated on the fly (and filled with objects etc) and will change as players solve it. Once complete and/or everyone leaves, the room is deleted along with all objects in it. Clean and tidy.

So how to describe these states? I pictured that these would be described as normal Python modules with a start function and an end function that initialized each state and cleaned it up when a new state was started. In the beginning I pictured these states as being pretty small (like one state to change one thing in the room). In the end though, the entire Evscaperoom fits in 12 state modules. I’ll describe them in more detail in the second part of this post.
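To make the idea of states with a start and an end step more concrete, here is a minimal sketch in plain Python. It is not the Evscaperoom or Evennia code: the classes `State` and `Room`, their methods, and the example objects are all hypothetical, and classes are used here as a stand-in for the state modules mentioned above.

```python
class State:
    """One stage of the room (hypothetical stand-in for a state module)."""

    def __init__(self, name, adds):
        self.name = name
        self.adds = set(adds)

    def start(self, room):
        # Initialize the state: put its objects into the shared room.
        room.objects |= self.adds

    def end(self, room):
        # Clean up before the next state starts.
        room.objects -= self.adds


class Room:
    """A room shared by all players; it is always in exactly one state."""

    def __init__(self, states):
        self.states = list(states)
        self.index = 0
        self.objects = set()
        self.current.start(self)

    @property
    def current(self):
        return self.states[self.index]

    def next_state(self):
        """End the current state and start the next; return False if none remain."""
        self.current.end(self)
        if self.index + 1 >= len(self.states):
            return False
        self.index += 1
        self.current.start(self)
        return True


room = Room([
    State("locked_cabinet", {"cabinet"}),
    State("open_cabinet", {"cabinet", "key"}),
])
print(room.current.name, sorted(room.objects))
room.next_state()
print(room.current.name, sorted(room.objects))
```

Because the state lives on the room rather than on any player, everyone in the room sees the same objects at the same time.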

Accessibility and “pixel-hunting” in text

When I first started writing descriptions I didn’t always note which objects were interactive. It’s a very simple and tempting puzzle to add – mention an object as part of a larger description and let the player figure out that it’s something they can interact with. This practice is sort-of equivalent to pixel-hunting in graphical games – sweeping with the mouse across the screen until you find that little spot on the screen that you can do something with.

Problem is, pixel-hunting’s not really fun. You easily get stuck and when you eventually find out what was blocking you, you don’t really feel clever but only frustrated. So I decided that I should clearly mark every object that people could interact with and focus puzzles on better things.

In fact, in the end I made it an option:

Option menu ('quit' to return)

 1: ( ) No item markings (hard mode)
 2: ( ) Items marked as item (with color)
 3: (*) Items are marked as [item] (screenreader friendly)
 4: ( ) Screenreader mode

As part of this I had to remind myself never to rely on color alone when marking important information: Visually impaired people with screen readers will simply miss that. Not to mention that some just disable colors in their clients.

So while I personally think option 2 above is the most visually pleasing, Evscaperoom defaults to the third option. It should start everyone off on equal footing. Evennia has a screen-reader mode out of the box, but I moved it into the menu here for easy access.

Inventory and collaboration

In a puzzle-game, you often find objects and combine them with other things. Again, this is simple to do in a single-player game: Players just pick things up and use them later.

But in a multi-player game this offers a huge risk: players that pick up something important and then log off. The remaining players in that room would then be stuck in an unsolvable room – and it would be very hard for them to know this.

In principle you could try to ‘clean’ player inventories when they leave, but not only does it add complexity, there is another issue with players picking things up: It means that the person first to find/pick up the item is the only one that can use it and look at it. Others won’t have access until the first player gives it up. Trusting that to anonymous players online is not a good idea.

So in the end I arrived at the following conclusions:

  • As soon as an item/resource is discovered, everyone in the room must be able to access it immediately.
  • There can be no inventory. Nothing can ever be picked up and tied to a specific player.
  • As soon as a discovery is made, this must be echoed to the entire room (it must not be up to the finder to announce what they found to everyone else).  
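The three conclusions above can be sketched as a small data structure in plain Python. This is not the actual game code: the class `SharedRoom` and its methods are hypothetical names chosen for illustration.

```python
class SharedRoom:
    """Discoveries belong to the room, never to an individual player."""

    def __init__(self):
        self.players = {}        # player name -> list of messages received
        self.discovered = set()  # items everyone in the room can access

    def join(self, name):
        self.players[name] = []

    def leave(self, name):
        # Leaving takes nothing away, since there is no inventory.
        self.players.pop(name, None)

    def discover(self, finder, item):
        if item in self.discovered:
            return
        self.discovered.add(item)
        # Echo the discovery to the entire room, including the finder.
        for inbox in self.players.values():
            inbox.append(f"{finder} found the {item}!")

    def can_use(self, name, item):
        return name in self.players and item in self.discovered


room = SharedRoom()
room.join("Anna")
room.join("Ben")
room.discover("Anna", "key")
room.leave("Anna")

# Ben can still use the key even though its finder has logged off.
print(room.can_use("Ben", "key"))
print(room.players["Ben"])
```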

As a side-effect of this I also set a limit to the kind of puzzles I would allow:

  • No puzzles must require more than one player to solve. While one could indeed create some very cool puzzles where people collaborate, it’s simply not feasible to do so with random strangers on the internet. At any moment the other guy may log off and leave you stuck. And that’s if you even find someone logged in at the same time in the first place! The room should always be possible to solve solo, from beginning to end.

Focusing on objects

So without inventory system, how do you interact with objects? A trademark of any puzzle is using one object with another and also to explore things closer to find clues. I turned to graphical adventure games for inspiration:

Hovering with mouse over lens object offers action
Secret of Monkey Island ©1990 LucasArts. Image from old-games.com

A common way to operate on an object in traditional adventure games is to hover the mouse over it and then select the action you want to apply to it. In later (3D) games you might even zoom in on the object and rotate it around with your mouse to see if there are some clues to be had.

While Evennia and modern UI clients may allow you to use the mouse to select objects, I wanted this to work the traditional MUD-way, by inserting commands. So I decided that you as a player would be in one of two states:

  • The ‘normal’ state: When you use look you see the room description.
  • The ‘focused’ state: You focus on a specific object with the examine <target> command (aliases are ex or just e). Now object-specific actions become available to you. Use examine again to “un-focus”. 
A small stone fireplace sits in the middle of the wall opposite the [door]. On the chimney hangs a small oil [painting] of a man. Hanging over the hearth is a black [cauldron]. The piles of [ashes] below are cold.  (It looks like fireplace may be suitable to [climb].)


In the example above, the fireplace points out other objects you could also focus on, whereas the last parenthesis includes one or more “actions” that you can perform on the fireplace only when you have it focused. 
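The two player states can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Evennia command implementation: the class `Player`, its methods, and the sample descriptions are hypothetical.

```python
class Player:
    """Tracks whether the player is in the 'normal' or the 'focused' state."""

    def __init__(self, room_desc, objects):
        self.room_desc = room_desc
        self.objects = objects  # name -> {"desc": str, "actions": set of str}
        self.focus = None       # None means the 'normal' state

    def look(self):
        if self.focus is None:
            return self.room_desc
        return self.objects[self.focus]["desc"]

    def examine(self, target):
        if target not in self.objects:
            return f"There is no '{target}' here."
        if target == self.focus:
            # Examining the focused object again un-focuses.
            self.focus = None
        else:
            self.focus = target
        return self.look()

    def do(self, action):
        # Object-specific actions are only available while focused.
        if self.focus is None:
            return "You need to examine something first."
        if action not in self.objects[self.focus]["actions"]:
            return f"You cannot {action} the {self.focus}."
        return f"You {action} the {self.focus}."


objects = {"fireplace": {"desc": "A small stone fireplace.", "actions": {"climb"}}}
player = Player("You are in a small cabin.", objects)

print(player.do("climb"))           # not focused yet
print(player.examine("fireplace"))  # focus on the fireplace
print(player.do("climb"))           # action is now available
print(player.examine("fireplace"))  # un-focus, back to the room
```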

This ends up pretty different from most traditional MUD-style inputs. When I first released this to the public, I found people logged off after their first examine. It turned out that they couldn’t figure out how to leave the focus mode. So they just assumed the thing was buggy and quit instead. Of course it’s mentioned if you care to type help, but this is clearly one step too many for such an important UI concept.

So I ended up adding the header above that always reminds you. And since then I’ve not seen any confusion over how the focus mode works.

To make it easy to focus on things, I also decided that each room would only ever have one object with a given name. So there is, for example, only one object in the game named “key” that you can focus on.
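As a rough illustration of the two-state idea (this is not the actual Evscaperoom code), the focus handling could be sketched in plain Python like this. The `Room` and `Player` classes, the reminder header wording and the choice that a bare `examine` leaves the focused state are all assumptions made for the example.

```python
class Room:
    """A room whose focusable objects are keyed by a unique name."""

    def __init__(self, description):
        self.description = description
        self._objects = {}  # object name -> object description

    def add_object(self, name, description):
        # Enforce the "only one object per name" rule described above.
        if name in self._objects:
            raise ValueError(f"an object named {name!r} already exists")
        self._objects[name] = description

    def get_object(self, name):
        # Returns None when no object with that name exists.
        return self._objects.get(name)


class Player:
    """Tracks whether the player is in the 'normal' or 'focused' state."""

    def __init__(self, room):
        self.room = room
        self.focus = None  # None means the 'normal' state

    def examine(self, target=None):
        if target is None:
            # Assumption: a bare 'examine' un-focuses.
            if self.focus is None:
                return "Examine what?"
            self.focus = None
            return self.look()
        if self.room.get_object(target) is None:
            return f"There is no '{target}' here."
        self.focus = target
        return self.look()

    def look(self):
        if self.focus is None:
            return self.room.description
        # Hypothetical reminder header; the game's real wording may differ.
        header = f"[focused on {self.focus}; use 'examine' to stop focusing]"
        return header + "\n" + self.room.get_object(self.focus)
```

Because names are unique per room, `examine key` never needs a disambiguation step, and the header returned by `look()` is the kind of constant reminder that stops players from getting stuck in focus mode.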

Communication

I wanted players to co-exist in the same room so that they could collaborate on solving it. This meant communication must be possible. I pictured people would want to point things out and talk to each other.

In my first round of revisions I had a truckload of individual emotes; you could

      point at target

for example. In the end I just limited it to

     say/shout/whisper <message>

and 

     emote <whatever>

And seeing what people actually use, this is more than enough (say alone is probably 99% of what people need, really). I had a notion that the shout/whisper could be used in a puzzle later but in the end I decided that communication commands should be strictly between players and not have anything to do with the puzzles.
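To show how little machinery this reduced command set needs, here is a minimal sketch in plain Python. It is not how Evennia commands are written, and the function name and the exact output formatting are assumptions for illustration only.

```python
# Maps each speech command to the verb shown to other players.
COMM_VERBS = {"say": "says", "shout": "shouts", "whisper": "whispers"}


def format_communication(speaker, line):
    """Turn a raw input line into the text other players would see.

    Returns None when the line is not one of the four communication
    commands or when no message was given.
    """
    verb, _, rest = line.strip().partition(" ")
    verb = verb.lower()
    rest = rest.strip()
    if not rest:
        return None
    if verb in COMM_VERBS:
        return f'{speaker} {COMM_VERBS[verb]}, "{rest}"'
    if verb == "emote":
        # A free-form emote replaces the whole pile of individual emotes.
        return f"{speaker} {rest}"
    return None
```

With this, the old `point at target` command becomes simply `emote points at the key`, and nothing in the function touches puzzle state, which keeps communication strictly between players.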

I removed all other interaction: There is no fighting and, without an inventory or a requirement to collaborate on puzzles, there is no need for interactions other than communication.

In the first version you didn’t even see what the others did, but eventually I made it so that you at least see what other players are focusing on at the moment (and of course if some major thing was solved/found).

In the end I don’t even list characters as objects in the room (you have to use the who command to see who’s in there with you).

Listing of commands available in the Evscaperoom (output of the help-command in game)
The main help command output.

Story

It’s very common for this type of game to have a dangerous or scary theme: things like “get out before the bomb explodes”, “save the space ship before the engines overheat”, “flee the axe murderer before he comes back” and so on. I’m no stranger to dark themes, but for this I wanted something friendlier and brighter, maybe with some dark undercurrents here and there.

My Jester character is someone I’ve not only depicted in art, but she’s also an old RP character and literary protagonist of mine. Who else would find it funny to lock someone into a room only to provide crazy puzzles and hints for them to get out again? So my flimsy ‘premise’ was this: 

The village Jester wants to win the pie eating contest. You are one of her most dangerous opponents. She tricked you to her cabin and now you are locked in! If you don’t get out in time, she’ll get to eat all those pies on her own and surely win!


That’s it – this became the premise from which the entire game flowed. I quickly decided that it would be a very “small-scale” story: no life-or-death situation, no saving of the world. The drama takes place in a small village with an “adversary” that doesn’t really want to hurt you, but only to eat more pies than you.

From this, the way to offer hints came naturally – just eat a slice of “hintberry pie” the Jester made (she even encourages you to eat it). It gives you a hint but is also very filling. So if you eat too much, how will you beat her in the contest later, even if you do get out?

To further the rustic and friendly tone I made sure the story took place on a warm summer day. Many descriptions mention sunshine, chirping birds and the smell of pie. I aimed at letting the text bring out the quirky and slightly comedic tone of the puzzles the Jester left behind. The player also sometimes gets teased by the game when doing things that do not make sense.

I won’t go into the story further here – it’s best if you experience it yourself. Let’s just say that the village has some old secrets. And the Jester has her own ways of doing things and of telling a story. The game has multiple endings and so far people have drawn very different conclusions in the end.

Scoring

Most often in escape rooms, the final score is determined by the time taken and the number of hints used. I do keep the latter – for every pie you eat, you get a penalty on your final score.

As for time – this background story would fit very well with a time limit (get out in X time, after which the pie-eating contest will start!). But from experience with other online text-based games I decided against this. Not only should a player be able to take a break, they may also want to wait for a friend to leave and come back etc. 

But more importantly, I want players to explore and read all my carefully crafted descriptions! So I’d much rather they take their time, and I want to reward them for being thorough.

So in the end I give specific scores for actions throughout the game instead. Most points are for doing things that drive the story forward, such as using something or solving a puzzle. But a significant portion of the score comes from turning every stone and trying everything out. The nice side-effect of this is that even if you know exactly how to solve everything and rush through the game you will still not end up with a perfect score. 

The final score, adjusted by hints, is then used to determine if you make it in time to the contest and how you fare. This means that if you explore carefully you have a “buffer” of points, so eating a few pies may still land you a good result in the end.
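The scoring idea can be sketched as follows. This is only an illustration under stated assumptions: the `ScoreKeeper` class, the point values, the penalty per pie and the rule that each action only scores the first time are all hypothetical, not taken from the game.

```python
class ScoreKeeper:
    """Collects points for actions and subtracts a penalty per hint pie."""

    def __init__(self, hint_penalty=5):
        self.hint_penalty = hint_penalty  # hypothetical penalty per pie
        self._awarded = {}  # action name -> points already given
        self.pies_eaten = 0

    def award(self, action, points):
        """Give points for an action, but only the first time it happens."""
        if action in self._awarded:
            return 0
        self._awarded[action] = points
        return points

    def eat_pie(self):
        """Record one hint taken."""
        self.pies_eaten += 1

    def final_score(self):
        """Sum of all awarded points, adjusted downward by hints used."""
        earned = sum(self._awarded.values())
        return earned - self.hint_penalty * self.pies_eaten


keeper = ScoreKeeper(hint_penalty=5)
keeper.award("solve_puzzle", 10)    # drives the story forward
keeper.award("examine_painting", 2)  # small reward for exploring
keeper.eat_pie()                    # one hint costs points
print(keeper.final_score())
```

Keeping exploration rewards separate from puzzle rewards is what creates the “buffer”: a thorough player collects enough small awards to absorb the penalty of a pie or two.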
 

First sketch

I really entered the game ‘building’ aspect with no real notion of how the Jester’s cabin should look nor which puzzles should be in it. I tried to write things down beforehand but it didn’t really work for me. 

So in the end I decided “let’s just put a lot of interesting stuff in the room and then I’ll figure out how they interact with each other”. I’m sure this is different from game-maker to game-maker. But for me, this process worked perfectly. 

Scribbles on my notebook, sketching up the room's main items
My first, very rough, sketch of the Jester’s cabin


The above, first sketch ended up being what I used, although many of the objects mentioned never ended up in the final game and some things switched places. I did some other sketches too, but they’d be spoilers so I won’t show them here …


The actual game logic

The Evscaperoom principles outlined above deviate quite a bit from the traditional MU* style of game play. 

While Evennia provides everything for database management, in-game objects, commands, networking and other resources, the specifics of your game are something you need to make yourself – and you have the full power of Python to do it!

So for the first three days of the jam I used Evennia to build the custom game logic needed to provide the evscaperoom style of game play. I also made the tools I needed to quickly create the game content (which then took me the rest of the jam to make). 

In part 2 of this blog post I will cover the technical details of the Evscaperoom I built. I’ll also go through some issues I ran into and conclusions I drew. I’ll link to that from here when it’s available!

Continue to part 2.
Planet Python