Webinar Replay – Zero to Cloud Analytics

Zero to Cloud Analytics 
Join Snowflake, InterWorks and Tableau for a webinar led by InterWorks’ Principal and Data Practice Lead, Brian Bickell.

Webinar Takeaways

  • How modern cloud analytics technologies are changing the data landscape
  • How to easily load, integrate and query structured and semi-structured data
  • What it’s like to run large queries in a fraction of the time
  • Exploring modern cloud data warehouse pattern

Download Now

  • I understand that InterWorks will use the data provided for the purpose of communication and the administration of my request. InterWorks will never disclose or sell any personal data except where required to do so by law. Finally, I understand that future communications related topics and events may be sent from InterWorks, but I can opt-out at any time.
  • This field is for validation purposes and should be left unchanged.

The post Webinar Replay – Zero to Cloud Analytics appeared first on InterWorks.

InterWorks

How To Set Up the code-server Cloud IDE Platform on DigitalOcean Kubernetes

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

With developer tools moving to the cloud, creation and adoption of cloud IDE (Integrated Development Environment) platforms is growing. Cloud IDEs allow for real-time collaboration between developer teams to work in a unified development environment that minimizes incompatibilities and enhances productivity. Accessible through web browsers, cloud IDEs are available from every type of modern device. Another advantage of a cloud IDE is the possibility to leverage the power of a cluster, which can greatly exceed the processing power of a single development computer.

code-server is Microsoft Visual Studio Code running on a remote server and accessible directly from your browser. Visual Studio Code is a modern code editor with integrated Git support, a code debugger, smart autocompletion, and customizable and extensible features. This means that you can use various devices, running different operating systems, and always have a consistent development environment on hand.

In this tutorial, you will set up the code-server cloud IDE platform on your DigitalOcean Kubernetes cluster and expose it at your domain, secured with Let’s Encrypt certificates. In the end, you’ll have Microsoft Visual Studio Code running on your Kubernetes cluster, available via HTTPS and protected by a password.

Prerequisites

  • A DigitalOcean Kubernetes cluster with your connection configured as the kubectl default. Instructions on how to configure kubectl are shown under the Connect to your Cluster step when you create your cluster. To create a Kubernetes cluster on DigitalOcean, see Kubernetes Quickstart.

  • The Helm package manager installed on your local machine, and Tiller installed on your cluster. To do this, complete Steps 1 and 2 of the How To Install Software on Kubernetes Clusters with the Helm Package Manager tutorial.

  • The Nginx Ingress Controller and Cert-Manager installed on your cluster using Helm in order to expose code-server using Ingress Resources. To do this, follow How to Set Up an Nginx Ingress on DigitalOcean Kubernetes Using Helm.

  • A fully registered domain name to host code-server, pointed at the Load Balancer used by the Nginx Ingress. This tutorial will use code-server.your_domain throughout. You can purchase a domain name on Namecheap, get one for free on Freenom, or use the domain registrar of your choice. This domain name must differ from the one used in the How To Set Up an Nginx Ingress on DigitalOcean Kubernetes prerequisite tutorial.

Step 1 — Installing And Exposing code-server

In this section, you’ll install code-server to your DigitalOcean Kubernetes cluster and expose it at your domain, using the Nginx Ingress controller. You will also set up a password for admittance.

You’ll store the deployment configuration on your local machine, in a file named code-server.yaml. Create it using the following command:

  • nano code-server.yaml

Add the following lines to the file:

code-server.yaml
apiVersion: v1 kind: Namespace metadata:   name: code-server --- apiVersion: extensions/v1beta1 kind: Ingress metadata:   name: code-server   namespace: code-server   annotations:     kubernetes.io/ingress.class: nginx spec:   rules:   - host: code-server.your_domain     http:       paths:       - backend:           serviceName: code-server           servicePort: 80 --- apiVersion: v1 kind: Service metadata:  name: code-server  namespace: code-server spec:  ports:  - port: 80    targetPort: 8443  selector:    app: code-server --- apiVersion: extensions/v1beta1 kind: Deployment metadata:   labels:     app: code-server   name: code-server   namespace: code-server spec:   selector:     matchLabels:       app: code-server   replicas: 1   template:     metadata:       labels:         app: code-server     spec:       containers:       - image: codercom/code-server         imagePullPolicy: Always         name: code-server         args: ["--allow-http"]         ports:         - containerPort: 8443         env:         - name: PASSWORD           value: "your_password" 

This configuration defines a Namespace, a Deployment, a Service, and an Ingress. The Namespace is called code-server and separates the code-server installation from the rest of your cluster. The Deployment consists of one replica of the codercom/code-server Docker image, and an environment variable named PASSWORD that specifies the password for access.

The code-server Service internally exposes the pod (created as a part of the Deployment) at port 80. The Ingress defined in the file specifies that the Ingress Controller is nginx, and that the code-server.your_domain domain will be served from the Service.

Remember to replace your_password with your desired password, and code-server.your_domain with your desired domain, pointed to the Load Balancer of the Nginx Ingress Controller.

Then, create the configuration in Kubernetes by running the following command:

  • kubectl create -f code-server.yaml

You’ll see the following output:

Output
namespace/code-server created ingress.extensions/code-server created service/code-server created deployment.extensions/code-server created

You can watch the code-server pod become available by running:

  • kubectl get pods -w -n code-server

The output will look like:

Output
NAME READY STATUS RESTARTS AGE code-server-f85d9bfc9-j7hq6 0/1 ContainerCreating 0 1m

As soon as the status becomes Running, code-server has finished installing to your cluster.

Navigate to your domain in your browser. You’ll see the login prompt for code-server.

code-server login prompt

Enter the password you set in code-server.yaml and press Enter IDE. You’ll enter code-server and immediately see its editor GUI.

code-server GUI

You’ve installed code-server to your Kubernetes cluster and made it available at your domain. You have also verified that it requires you to log in with a password. Now, you’ll move on to secure it with free Let’s Encrypt certificates using Cert-Manager.

Step 2 — Securing the code-server Deployment

In this section, you will secure your code-server installation by applying Let’s Encrypt certificates to your Ingress, which Cert-Manager will automatically create. After completing this step, your code-server installation will be accessible via HTTPS.

Open code-server.yaml for editing:

  • nano code-server.yaml

Add the highlighted lines to your file, making sure to replace the example domain with your own:

code-server.yaml
apiVersion: v1 kind: Namespace metadata:   name: code-server --- apiVersion: extensions/v1beta1 kind: Ingress metadata:   name: code-server   namespace: code-server   annotations:     kubernetes.io/ingress.class: nginx     certmanager.k8s.io/cluster-issuer: letsencrypt-prod spec:   tls:   - hosts:     - code-server.your_domain     secretName: codeserver-prod   rules:   - host: code-server.your_domain     http:       paths:       - backend:           serviceName: code-server           servicePort: 80 ... 

First, you specify that the cluster-issuer that this Ingress will use to provision certificates will be letsencrypt-prod, created as a part of the prerequisites. Then, you specify the domains that will be secured under the tls section, as well as your name for the Secret holding them.

Apply the changes to your Kubernetes cluster by running the following command:

  • kubectl apply -f code-server.yaml

You’ll need to wait a few minutes for Let’s Encrypt to provision your certificate. In the meantime, you can track its progress by looking at the output of the following command:

  • kubectl describe certificate codeserver-prod -n code-server

When it finishes, the end of the output will look similar to this:

Output
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Generated 2m49s cert-manager Generated new private key Normal GenerateSelfSigned 2m49s cert-manager Generated temporary self signed certificate Normal OrderCreated 2m49s cert-manager Created Order resource "codeserver-prod-4279678953" Normal OrderComplete 2m14s cert-manager Order "codeserver-prod-4279678953" completed successfully Normal CertIssued 2m14s cert-manager Certificate issued successfully

You can now refresh your domain in your browser. You’ll see the padlock to the left of the address bar in your browser signifying that the connection is secure.

In this step, you have configured the Ingress to secure your code-server deployment. Now, you can review the code-server user interface.

Step 3 — Exploring the code-server Interface

In this section, you’ll explore some of the features of the code-server interface. Since code-server is Visual Studio Code running in the cloud, it has the same interface as the standalone desktop edition.

On the left-hand side of the IDE, there is a vertical row of six buttons opening the most commonly used features in a side panel known as the Activity Bar.

code-server GUI - Sidepanel

This bar is customizable so you can move these views to a different order or remove them from the bar. By default, the first view opens the Explorer panel that provides tree-like navigation of the project’s structure. You can manage your folders and files here—creating, deleting, moving, and renaming them as necessary. The next view provides access to a search and replace functionality.

Following this, in the default order, is your view of the source control systems, like Git. Visual Studio code also supports other source control providers and you can find further instructions for source control workflows with the editor in this documentation.

Git dropdown menu with version control actions

The debugger option on the Activity Bar provides all the common actions for debugging in the panel. Visual Studio Code comes with built-in support for the Node.js runtime debugger and any language that transpiles to Javascript. For other languages you can install extensions for the required debugger. You can save debugging configurations in the launch.json file.

Debugger View with launch.json open

The final view in the Activity Bar provides a menu to access available extensions on the Marketplace.

code-server GUI - Tabs

The central part of the GUI is your editor, which you can separate by tabs for your code editing. You can change your editing view to a grid system or to side-by-side files.

Editor Grid View

After creating a new file through the File menu, an empty file will open in a new tab, and once saved, the file’s name will be viewable in the Explorer side panel. Creating folders can be done by right clicking on the Explorer sidebar and pressing on New Folder. You can expand a folder by clicking on its name as well as dragging and dropping files and folders to upper parts of the hierarchy to move them to a new location.

code-server GUI - New Folder

You can gain access to a terminal by pressing CTRL+SHIFT+\, or by pressing on Terminal in the upper menu, and selecting New Terminal. The terminal will open in a lower panel and its working directory will be set to the project’s workspace, which contains the files and folders shown in the Explorer side panel.

You’ve explored a high-level overview of the code-server interface and reviewed some of the most commonly used features.

Conclusion

You now have code-server, a versatile cloud IDE, installed on your DigitalOcean Kubernetes cluster. You can work on your source code and documents with it individually or collaborate with your team. Running a cloud IDE on your cluster provides more power for testing, downloading, and more thorough or rigorous computing. For further information see the Visual Studio Code documentation on additional features and detailed instructions on other components of code-server.

DigitalOcean Community Tutorials

InterWorks Receives a Cloud Partner of the Year Award from Ingram Micro for 2018

Ingram Micro Partner of the Year award

Recently, Ingram Micro held its annual Cloud Summit, a multi-day conference that brings together not only some of the brightest technology resellers and partners but also some of the biggest, like Microsoft and Amazon. As part of the Summit, Ingram hands out awards to some of their most outstanding partners to honor achievements from the previous year. We are proud to announce that InterWorks was recognized as a US Cloud Partner of the Year Award for Depth.

Cloud Partner of the Year for Depth

This award was based on annual percentage growth for infrastructure-as-a-services (IaaS) sold, and we increased our output by over 400% last year. What’s especially humbling about this award is that it’s just a byproduct of the way we like to do business: collaborating with the best partners to empower our clients to do their best work.

“It just so happened that IaaS was the right solution for many of our clients. We believe in providing the right solutions, and our clients believe in us. It all goes back to strong relationships.” – Aaron Richardson, Director of IT Practice Delivery

What Is Infrastructure-as-a-Service (IaaS)

IaaS is IT infrastructure that’s offered through the cloud. This is valuable to clients because it affords them the ability to manage their infrastructure remotely, which simplifies management, improves reliability and increases the agility and flexibility of performance. Ingram Micro has been our go-to distributor for these kinds of services for a long time, and we are grateful for the relationship we’ve fostered over the years.

Looking Ahead with Ingram Micro

As an American distributor of information technology products, Ingram Micro works to provide comprehensive solutions that help businesses fully realize the promise and potential of tech. With a global reach and wide scope of services, Ingram Micro delivers solid tech and supply-chain services to customers worldwide. We’ve looked to them for many things over the years—primarily Microsoft Office, Microsoft Azure and Amazon Web Services—but after learning about a number of their additional offerings, we’re expectant to see how these could lead to more exciting and productive options for everyone involved going forward.

If you are interested learning more about IaaS and what we can do with it, we encourage you to reach out to us. Whether it’s Rackspace, AWS or Azure, we have a number of options to give clients the right solution for their needs and things that work well with their existing infrastructure.

The post InterWorks Receives a Cloud Partner of the Year Award from Ingram Micro for 2018 appeared first on InterWorks.

InterWorks

Data School: Six easy ways to run your Jupyter Notebook in the cloud

Six easy ways to run your Jupyter Notebook in the cloud

There are many ways to share a static Jupyter notebook with others, such as posting it on GitHub or sharing an nbviewer link. However, the recipient can only interact with the notebook file if they already have the Jupyter Notebook environment installed.

But what if you want to share a fully interactive Jupyter notebook that doesn’t require any installation? Or, you want to create your own Jupyter notebooks without installing anything on your local machine?

In this post, I’m going to review six services you can use to easily run your Jupyter notebook in the cloud. All of them have the following characteristics:

  • They don’t require you to install anything on your local machine.
  • They are completely free (or they have a free plan).
  • They give you access to the Jupyter Notebook environment (or a Jupyter-like environment).
  • They allow you to import and export notebooks using the standard .ipynb file format.
  • They support the Python language (and most support other languages as well).

Since all of these are cloud-based services, none of them will work for you if you are restricted to working with your data on-premise.

Table of Contents

Note: If you just want a quick summary, check out the comparison table.


Criteria for comparison

Here are the criteria on which I compared each of the six services:

Supported languages: Does this service support any programming languages other than Python?

Ability to install packages: Does this service allow you to install additional packages (or a particular version of a package), beyond the ones that are already installed?

Interface similarity: If the service provides a "Jupyter-like" interface (rather than the native Jupyter interface), how similar is its interface to Jupyter? (This makes it easier for existing Jupyter users to transition to this service.)

Keyboard shortcuts: Does this service use the same keyboard shortcuts as the Jupyter Notebook?

Missing features: Is there anything that the Jupyter Notebook can do that this service does not support?

Added features: Is there anything this service can do that the Jupyter Notebook does not support?

Ease of working with datasets: How easy does this service make it to work with your own datasets?

Internet access: Does this service give you Internet access from within the Notebook, so that you can read data from URLs when necessary?

Ability to work privately: Does this service allow you to keep your work private?

Ability to share publicly: Does this service provide a way for you to share your work publicly?

Ability to collaborate: Does this service allow you to invite someone to collaborate on a notebook, and can the collaboration occur in real-time?

Performance of the free plan: What computational resources (RAM and CPU) does this service provide? Does it give you access to a GPU (which is useful for deep learning)? How much disk space is included? How long can a session run?

Ability to upgrade for better performance: Can you pay for this service in order to access more computational resources?

Documentation and technical support: Is the service well-documented? Can you get in touch with someone if you run into a problem?


1. Binder

Six easy ways to run your Jupyter Notebook in the cloud

Binder is a service provided by the Binder Project, which is a member of the Project Jupyter open source ecosystem. It allows you to input the URL of any public Git repository, and it will open that repository within the native Jupyter Notebook interface. You can run any notebooks in the repository, though any changes you make will not be saved back to the repository. You don’t have to create an account with Binder and you don’t need to be the owner of the repository, though the repository must include a configuration file that specifies its package requirements.

Supported languages: Python (2 and 3), R, Julia, and any other languages supported by Jupyter.

Ability to install packages: You can specify your exact package requirements using a configuration file (such as environment.yml or requirements.txt).

Interface similarity: Binder uses the native Jupyter Notebook interface.

Keyboard shortcuts: Binder uses all of the same keyboard shortcuts as Jupyter.

Missing features: None.

Added features: None.

Ease of working with datasets: If your dataset is in the same Git repository, then it will automatically be available within Binder. If your dataset is not in that repository but is available at any public URL, then you can add a special file to the repository telling Binder to download your dataset. However, Binder does not support accessing private datasets.

Internet access: Yes.

Ability to work privately: No, since it only works with public Git repositories.

Ability to share publicly: Yes. You can share a URL that goes directly to your Binder, or someone can run your notebooks using the Binder website (as long as they know the URL of your Git repository).

Ability to collaborate: No. If you want to work with someone on the same notebook and your repository is hosted on GitHub, then you can instead use the normal pull request workflow.

Performance of the free plan: You will have access to up to 2 GB of RAM. There is no specific limit to the amount of disk space, though they ask you not to include "very large files" (more than a few hundred megabytes). Binder can be slow to launch, especially when it’s run on a newly updated repository. Sessions will shut down after 20 minutes of inactivity, though they can run for 12 hours or longer. Binder has other usage guidelines, including a limit of 100 simultaneous users for any given repository.

Ability to upgrade for better performance: No. However, you do have the option of setting up your own BinderHub deployment, which can provide the same functionality as Binder while allowing you to customize the environment (such as increasing the computational resources or allowing private files).

Documentation and technical support: Binder has extensive documentation. Community support is available via Gitter chat and a Discourse forum, and product issues are tracked on GitHub.

Conclusion: If your notebooks are already stored in a public GitHub repository, Binder is the easiest way to enable others to interact with them. Users don’t have to create an account, and they’ll feel right at home if they already know how to use the Jupyter Notebook. However, you’ll want to keep the performance limitations and user limits in mind!


2. Kaggle Kernels

Six easy ways to run your Jupyter Notebook in the cloud

Kaggle is best known as a platform for data science competitions. However, they also provide a free service called Kernels that can be used independently of their competitions. After creating a Kaggle account (or logging in with Google or Facebook), you can create a Kernel that uses either a notebook or scripting interface, though I’m focusing on the notebook interface below.

Supported languages: Python (3 only) and R.

Ability to install packages: Hundreds of packages come pre-installed, and you can install additional packages using pip or by specifying the GitHub repository of a package. However, any additional packages you install will need to be reinstalled at the start of every session. Alternatively, you can ask Kaggle to include additional packages in their default installation.

Interface similarity: Visually, the Kernels interface looks quite different from the Jupyter interface. There’s no menu bar or toolbar at the top of the screen, there’s a collapsible sidebar on the right for adjusting settings, and there’s a console docked below the notebook. However, working in the Kernels notebook actually feels very similar to working in the Jupyter Notebook, especially if you’re comfortable with Jupyter’s keyboard shortcuts. Also, note that a redesigned interface (shown in the screenshot above) will soon be released, which is more similar to the Jupyter interface and includes a simple menu bar.

Keyboard shortcuts: Kernels uses all of the same keyboard shortcuts as Jupyter.

Missing features:

  • Because Kernels doesn’t (yet) include a menu bar or a toolbar, many actions can only be done using keyboard shortcuts or the command palette.
  • You can’t download your notebook into other useful formats such as a Python script, HTML webpage, or Markdown file.

Added features:

  • Kernels includes a lightweight version control system. Every time you want to save your work, there’s a "commit" button which runs the entire notebook from top to bottom and adds a new version to the history. (You can keep working while this process takes place, which is essential for long-running notebooks.) Although you can’t name the versions, you can display the "diff" between any two versions.
  • Kernels allows you to selectively hide the input and/or output of any code cell, which makes it easy to customize the presentation of your notebook.

Ease of working with datasets: You can upload a dataset to Kaggle from your local computer, a URL, or a GitHub repository, and it will be hosted for free by another Kaggle service called Datasets. You can make the dataset private or public. Any dataset you upload, as well as any public dataset uploaded by a Kaggle user, can be accessed by any of your Kernels. The maximum size of each dataset is 20 GB, and a single Kernel can access multiple datasets.

Internet access: Yes.

Ability to work privately: Yes.

Ability to share publicly: Yes. If you choose to make your Kernel public, anyone can access it without creating a Kaggle account, and anyone with a Kaggle account can comment on your Kernel or copy it to their own account. Additionally, Kaggle also provides you with a public profile page, which displays all of your public Kernels and datasets.

Ability to collaborate: Yes. You can keep your Kernel private but invite specific Kaggle users to view or edit it. There’s no real-time collaboration: It’s more like working on separate copies of the Kernel, except that all commits are added to the same version history.

Performance of the free plan: You can access either a 4-core CPU with 17 GB of RAM, or a 2-core CPU with 14 GB of RAM plus a GPU. You will have 5 GB of "saved" disk space and 17 GB of "temporary" disk space, though any disk space used by your dataset does not count towards these figures. Sessions will shut down after 60 minutes of inactivity, though they can run for up to 9 hours.

Ability to upgrade for better performance: No.

Documentation and technical support: Kernels has adequate documentation. Support is available via a contact form and a forum.

Conclusion: As long as you’re comfortable with a slightly cluttered interface (which has already been improved in the redesign), you’ll have access to a high-performance environment in which it’s easy to work with your datasets and share your work publicly (or keep it private). The included version control and collaboration features are also nice additions, though neither are fully-featured.


3. Google Colaboratory (Colab)

Six easy ways to run your Jupyter Notebook in the cloud

Google Colaboratory, usually referred to as "Google Colab," is available to anyone with a Google account. As long as you are signed into Google, you can quickly get started by creating an empty notebook, uploading an existing notebook, or importing a notebook from any public GitHub repository. Your Colab notebooks are automatically saved in a special folder in your Google Drive, and you can even create new notebooks directly from Drive.

Supported languages: Python (2 and 3) and Swift (which was added in January 2019). Kernels can also be installed for other languages, though the installation process varies by language and is not well-documented.

Ability to install packages: Hundreds of packages come pre-installed, and you can install additional packages using pip. However, any additional packages you install will need to be reinstalled at the start of every session.

Interface similarity: Visually, the Colab interface looks quite similar to the Jupyter interface. However, working in Colab actually feels very dissimilar to working in the Jupyter Notebook:

  • Most of the menu items are different.
  • Colab has changed some of the standard terminology ("runtime" instead of "kernel", "text cell" instead of "markdown cell", etc.)
  • Colab has invented new concepts that you have to understand, such as "playground mode."
  • Command mode and Edit mode in Colab work differently than they do in Jupyter.

Keyboard shortcuts: In Colab, most of the single letter keyboard shortcuts used by Jupyter (such as "a" to "insert cell above") have been changed to a multi-step process ("Ctrl+m" followed by "a"), though Colab does allow you to customize the shortcuts.

Missing features:

  • Because the Colab menu bar is missing some items and the toolbar is kept very simple, some actions can only be done using keyboard shortcuts.
  • You can’t download your notebook into other useful formats such as an HTML webpage or Markdown file (though you can download it as a Python script).

Added features:

  • Colab includes a lightweight version control system. It frequently saves the current state of your notebook, and you can browse through the revision history. However, you can’t display the "diff" between versions, which means that you would have to do any comparisons manually.
  • Colab allows you to add form fields to your notebook, which enables you to parameterize your code in an interactive way. However, these fields only work within Colab.
  • When you create a section heading in your notebook, Colab makes every section collapsible and automatically creates a "table of contents" in the sidebar, which makes large notebooks easier to navigate.

Ease of working with datasets: You can upload a dataset to use within a Colab notebook, but it will automatically be deleted once you end your session. Alternatively, you can allow Colab to read files from your Google Drive, though it’s more complicated than it should be. Colab also includes connectors to other Google services, such as Google Sheets and Google Cloud Storage.

Internet access: Yes.

Ability to work privately: Yes.

Ability to share publicly: Yes. If you choose to make your notebook public and you share the link, anyone can access it without creating a Google account, and anyone with a Google account can copy it to their own account. Additionally, you can authorize Colab to save a copy of your notebook to GitHub or Gist and then share it from there.

Ability to collaborate: Yes. You can keep your notebook private but invite specific people to view or edit it (using Google’s familiar sharing interface). You and your collaborator(s) can edit the notebook at the same time and see each other’s changes, as well as add comments for each other (similar to Google Docs), though there’s a 30-second lag between when you make changes and when collaborators will see them. Also, you are not actually sharing your environment with your collaborators (meaning there is no syncing of what code has been run), which significantly limits the usefulness of near real-time collaboration.

Performance of the free plan: Colab does give you access to a GPU or a TPU. Otherwise, Google does not provide any specifications for their environments. If you connect Colab to Google Drive, that will give you up to 15 GB of disk space for storing your datasets. Sessions will shut down after 60 minutes of inactivity, though they can run for up to 12 hours.

Ability to upgrade for better performance: No. However, you do have the option of connecting to a local runtime, which allows you to execute code on your local hardware and access your local file system.

Documentation and technical support: Colab has minimal documentation, which is contained within an FAQ page and a variety of sample notebooks. Support is available via GitHub issues, and community support is available via Stack Overflow.

Conclusion: The greatest strength of Colab is that it’s easy to get started, since most people already have a Google account, and it’s easy to share notebooks, since the sharing functionality works the same as Google Docs. However, the cumbersome keyboard shortcuts and the difficulty of working with datasets are significant drawbacks. The ability to collaborate on the same notebook is useful, but less useful than it could be since you’re not sharing an environment.


4. Microsoft Azure Notebooks

Six easy ways to run your Jupyter Notebook in the cloud

To get started with Azure Notebooks, you first sign in with a Microsoft or Outlook account (or create one). The next step is to create a "project", which is structured identically to a GitHub repository: it can contain one or more notebooks, Markdown files, datasets, and any other file you want to create or upload, and all of these can be organized into folders. Also like GitHub, you can initialize a project with a README file, which will automatically be displayed on the project page. If your work is already stored on GitHub, you can import the entire repository directly into a project.

Supported languages: Python (2 and 3), R, and F#.

Ability to install packages: Hundreds of packages come pre-installed, you can install additional packages using pip or conda, and you can specify your exact package requirements using a configuration file (such as environment.yml or requirements.txt).

Interface similarity: Azure uses the native Jupyter Notebook interface.

Keyboard shortcuts: Azure uses all of the same keyboard shortcuts as Jupyter.

Missing features: None.

Added features:

  • The RISE extension comes pre-installed, which allows you to instantly present your notebook as a live reveal.js-based slideshow.
  • The jupyter_contrib_nbextensions package comes pre-installed, which gives you easy access to a collection of 50+ Jupyter Notebook extensions for enhancing the notebook interface.

Ease of working with datasets: You can upload a dataset to your project from your local computer or a URL, and it can be accessed by any notebook within your project. Azure also includes connectors to other Azure services, such as Azure Storage and various Azure databases.

Internet access: Yes.

Ability to work privately: Yes.

Ability to share publicly: Yes. If you choose to make your project public, anyone can access it without creating a Microsoft account, and anyone with a Microsoft account can copy it to their own account. Additionally, Azure also provides you with a public profile page (very similar to a GitHub profile), which displays all of your public projects.

Ability to collaborate: No, though this is a planned feature.

Performance of the free plan: You will have access to 4 GB of RAM and 1 GB of disk space (per project). Sessions will shut down after 60 minutes of inactivity, though they can run for 8 hours or longer.

Ability to upgrade for better performance: Yes. You can pay for an Azure subscription, though the setup process is non-trivial and the pricing is complicated.

Documentation and technical support: Azure has extensive documentation. Support is available via GitHub issues.

Conclusion: The greatest strength of Azure Notebooks is its ease of use: the project structure (borrowed from GitHub) makes it simple to work with multiple notebooks and datasets, and the use of the native Jupyter interface means that existing Jupyter users will have an easy transition. However, the RAM and disk space are not particularly generous, and the lack of collaboration is a big gap in the functionality.


5. CoCalc

Six easy ways to run your Jupyter Notebook in the cloud

CoCalc, short for "collaborative calculation", is an online workspace for computation in Python, R, Julia, and many other languages. It allows you to create and edit Jupyter Notebooks, Sage worksheets, and LaTeX documents. After creating a CoCalc account, the first step is to create a "project", which can contain one or more notebooks, Markdown files, datasets, and any other file you want to create or upload, and all of these can be organized into folders. The project interface is a bit overwhelming at first, but it looks much more familiar once you create or open a notebook.

Supported languages: Python (2 and 3), R, Julia, and many other languages.

Ability to install packages: Hundreds of packages come pre-installed. You can install additional packages using pip, but this is not available when using a free plan. Alternatively, you can ask CoCalc to include additional packages in their default installation.

Interface similarity: Although CoCalc does not use the native Jupyter Notebook interface (they rewrote it using React.js), the interface is very similar to Jupyter, with only a few minor modifications. You can actually switch to using the native Jupyter Notebook from within CoCalc, though it’s not recommended since you would lose access to the most valuable CoCalc features ("time travel" and real-time collaboration, which are discussed below).

Keyboard shortcuts: CoCalc uses almost all of the same keyboard shortcuts as Jupyter.

Missing features: CoCalc does not currently support interactive widgets.

Added features:

  • CoCalc includes a powerful version control feature called time travel, which records all of your changes to the notebook in fine detail, and allows you to browse those changes using an intuitive slider control.
  • CoCalc saves a backup of all of your project files every few minutes, which means you can recover older versions of your files if needed.
  • CoCalc includes additional features for instructors, such as the ability to distribute and grade assignments, and the ability to watch students while they work and chat with them about the assignment.

Ease of working with datasets: You can upload a dataset to your project from your local computer, and it can be accessed by any notebook within your project.

Internet access: No, this is not available when using a free plan.

Ability to work privately: Yes.

Ability to share publicly: Yes. If you choose to make your notebook public and you share the link, anyone can access it without creating a CoCalc account, and anyone with a CoCalc account can copy it to their own account.

Ability to collaborate: Yes. You can keep your notebook private but invite specific people to edit it. You and your collaborator(s) can edit the notebook at the same time and see each other’s changes (and cursors) in real-time, as well as chat (using text or video) in a window next to the notebook. The status and the results of all computations are also synchronized, which means that everyone involved will experience the notebook in the same way.

Performance of the free plan: You will have access to a 1-core shared CPU with 1 GB of shared RAM, and 3 GB of disk space (per project). Sessions will shut down after 30 minutes of inactivity, though they can run for up to 24 hours.

Ability to upgrade for better performance: Yes. You can pay for a CoCalc subscription, which starts at $ 14/month. Alternatively, you can install the CoCalc Docker image on your own computer, which allows you to run a private multi-user CoCalc server for free.

Documentation and technical support: CoCalc has extensive documentation. Support is available via email and a contact form, and product issues are tracked on GitHub.

Conclusion: The most compelling reasons to use CoCalc are the real-time collaboration and the "time travel" version control features, as well as the course management features (if you’re an instructor). Although the interface is a bit cluttered, existing Jupyter users would have a relatively easy time transitioning to CoCalc. However, the free plan does have some important limitations (inability to install additional packages or access the Internet), and the performance of the free plan is modest.


6. Datalore

Six easy ways to run your Jupyter Notebook in the cloud

Datalore was created by JetBrains, the same company who makes PyCharm (a popular Python IDE). Getting started is as easy as creating an account, or logging in with a Google or JetBrains account. You can either create a new Datalore "workbook" or upload an existing Jupyter Notebook. Datalore workbooks are stored in a proprietary format, though it does support importing and exporting the standard .ipynb file format.

Supported languages: Python 3 only.

Ability to install packages: Hundreds of packages come pre-installed, and you can install additional packages using pip or conda, or by specifying the GitHub repository of a package.

Interface similarity: When you open Datalore, the interface does resemble a Jupyter Notebook in the sense that there are code and Markdown cells as well as output below those cells. However, there are some important differences between the Datalore and Jupyter interfaces:

  • Cells (which Datalore calls "blocks") are not numbered, because the ordering of cells is enforced. In other words, all of your code must be written in the order in which you ultimately want it to run.
  • The notebook (which Datalore calls a "workbook") can have multiple worksheets, similar to Google Sheets, which is a convenient way to break long workbooks into logical sections. If you create multiple worksheets in a workbook, all of the worksheets share the same environment. Because cell order is important in Datalore, the cells in the second worksheet are treated as coming after the cells in the first worksheet, the third worksheet comes after the second worksheet, and so on.
  • There are many other interface differences, which are explained in the "added features" section.

Keyboard shortcuts: Keyboard shortcuts are available for most actions in Datalore, but the shortcuts are wildly different from those used by Jupyter.

Missing features:

  • Datalore does not use the IPython kernel, and thus IPython magic functions and shell commands are not available. (However, optional access to the IPython kernel is a planned feature.)
  • Because the Datalore menu bar is kept very simple and there’s no toolbar, many actions can only be done using keyboard shortcuts.
  • You can’t download your workbook into other useful formats such as a Python script, HTML webpage, or Markdown file.
  • Datalore does not support all of the commonly supported Markdown features in its Markdown cells. (However, improved Markdown support is a planned feature.)
  • Datalore does not support interactive widgets.
  • Datalore does not include multicursor support.

Added features:

  • Cells are automatically run as you write them, which Datalore calls "live computation". This actually makes it easier to debug code as you write it, since you can see the results of your code immediately. (Live computation can be disabled, in which case you can manually trigger cells to run.)
  • Because cells will always run in the order in which they are arranged, Datalore can track cell dependencies. This means that when a given cell is edited, Datalore will determine which cells below it are potentially affected and will immediately re-run those cells (assuming live computation is enabled). If the edit causes an error in a dependent cell, those errors will immediately be flagged.
  • Datalore allows you to display cell inputs and outputs sequentially (like in Jupyter) or in "split view", in which case the inputs and outputs are in two separate panes. When using sequential view, Datalore also makes it easy to hide all inputs or hide all outputs.
  • Datalore includes more "intelligence" than Jupyter in its code completion.
  • As you write code, Datalore provides context-aware suggestions (called "intentions") for which actions you might want to take. For example, after typing the name of a DataFrame, the intentions might include "drop string columns", "histogram", and "train test split". When you click an intention, Datalore actually generates the code for you, which can be a useful way to learn the code behind certain tasks.
  • Datalore includes a well-designed version control system. It frequently saves the current state of your workbook, and you can quickly browse the diffs between the current version and any past versions. You can also choose to add a message when saving the workbook, and then filter the list of versions to only include those versions with a message.
  • Datalore gives you access to a plotting library called datalore.plot, which is very similar to R’s ggplot2, though you can only use it inside of Datalore.

Ease of working with datasets: You can upload a dataset to your workbook from your local computer or a URL, but it can only be accessed by that particular workbook. This would be a significant annoyance if you work with the same dataset(s) across many workbooks. (However, sharing datasets between workbooks is a planned feature.)

Internet access: Yes.

Ability to work privately: Yes.

Ability to share publicly: No.

Ability to collaborate: Yes. You can keep your workbook private but invite specific people to view or edit it. You and your collaborator(s) can edit the notebook at the same time and see each other’s changes (and cursors) in real-time. The status and the results of all computations are also synchronized, which means that everyone involved will experience the notebook in the same way.

Performance of the free plan: You will have access to a 2-core CPU with 4 GB of RAM, and 10 GB of disk space. Sessions will shut down after 60 minutes of inactivity, though there is no specific limit on the length of individual sessions. You can use the service for up to 120 hours per month.

Ability to upgrade for better performance: No, though there will soon be a paid plan which offers more disk space and a more powerful CPU (or GPU).

Documentation and technical support: Datalore has minimal documentation, which is contained within sample workbooks. Support is available via a Discourse forum.

Conclusion: Rather than being an adaptation of the Jupyter Notebook, Datalore is more like a reinvention of the Notebook. It includes an innovative feature set, including live computation, dependency tracking, real-time collaboration, and built-in version control. However, existing Jupyter users may have a challenging time transitioning to Datalore, especially since cell ordering is enforced and all of the keyboard shortcuts are quite different. As well, Datalore currently includes some notable limitations, namely that workbooks can’t be shared publicly and uploaded datasets can’t be shared between workbooks.


How to choose the right service for you

Out of the six options presented, there’s not one clear "winner". Instead, the right choice for you will depend on your priorities. Below are my suggestions for what you should choose, based on your particular needs. (Note: You can also view this as a comparison table.)

You use a language other than Python: Binder and CoCalc support tons of languages. Azure supports Python, R and F#, Kernels supports Python and R, Colab supports Python and Swift, and Datalore only supports Python.

You need to use Python 2: Binder, Colab, Azure, and CoCalc all support Python 2 and 3, whereas Kernels and Datalore only support Python 3.

You work with non-standard packages: Binder and Azure allow you to specify your exact package requirements using a configuration file. CoCalc and Datalore allow you to install additional packages, which will persist across sessions, though this is not available with CoCalc’s free plan. Kernels and Colab also allow you to install additional packages, though they do not persist across sessions. Kernels and CoCalc accept user requests for which packages should be included in their default installation.

You love the existing Jupyter Notebook interface: Binder and Azure use the native Jupyter Notebook interface, and CoCalc uses a nearly identical interface. Kernels is visually different from Jupyter but works like it, whereas Colab is visually similar to Jupyter but does not work like it. Datalore is the furthest from the existing Jupyter Notebook.

You are a heavy user of keyboard shortcuts: Binder, Kernels, and Azure use the same keyboard shortcuts as Jupyter, and CoCalc uses almost all of the same shortcuts. Datalore uses completely different keyboard shortcuts, and Colab uses cumbersome multi-step keyboard shortcuts (though they can be customized).

You prefer a point-and-click interface: Binder, Azure, and CoCalc allow you to perform all actions by pointing and clicking, whereas Kernels, Colab, and Datalore require you to use keyboard shortcuts for certain actions.

You want an integrated version control system: CoCalc and Datalore provide the best interfaces for version control. Kaggle’s version control system is more limited, and Colab’s system is even more limited. Binder and Azure do not provide a version control system.

You work with a lot of datasets: Kernels works seamlessly with Kaggle Datasets, a full-featured (and free) service for hosting datasets of up to 20 GB each. CoCalc offers 3 GB of disk space per project, and any dataset you upload can be accessed by any notebook in your project. Azure has similar functionality, except it offers 1 GB of disk space per project. Datalore offers 10 GB of total disk space, though every dataset you upload has to be linked to a particular workbook. Colab will discard any datasets you upload when your session ends, unless you link Colab to your Google Drive. Binder is best for small datasets that are either stored in your Git repository or located at a public URL.

Your project is already hosted on GitHub: Binder can run your notebooks directly from GitHub, Azure will allow you to import an entire GitHub repository, and Colab can import a single notebook from GitHub. Kernels, CoCalc, and Datalore don’t provide any similar functionality.

You need to keep your work private: All of the options except for Binder support working in private.

You need to keep your data on-premise: None of these cloud-based services allow you to keep your data on-premise. However, you can set up Binder or CoCalc on your own server, since BinderHub and the CoCalc Docker image are both open source, which would allow you to keep your data on-premise.

You want to share your work publicly: Binder creates the least friction possible when sharing, since people can view and run your notebook without creating an account. Kernels, Colab, Azure, and CoCalc allow you to share a URL for read-only access, while requiring users to create an account if they want to run your notebook. Kernels and Azure make sharing even easier by providing you with a public profile page. Datalore does not allow for public sharing.

You need to collaborate with others: CoCalc and Datalore support real-time collaboration. Colab supports collaborating on the same document, though it’s not quite real-time and you’re not sharing the same environment. Kernels supports a form of collaboration in which you’re sharing a version history. Binder and Azure don’t include any collaboration functionality, though with Binder it could easily occur through the normal GitHub pull request workflow.

You want a high performance environment: Kernels provides the most powerful environment (4-core CPU and 17 GB RAM), followed by Datalore (2-core CPU and 4 GB RAM), Azure (4 GB RAM), Binder (up to 2 GB RAM), and CoCalc (1-core CPU and 1 GB RAM). Colab does not provide specifications for its environment.

You need access to a GPU: Kernels and Colab both provide free access to a GPU. GPU access is available to paying customers of Azure and (soon) Datalore. GPU access is not available through Binder or CoCalc.

You prefer to use a non-commercial tool: Binder is the only option that is managed by a non-commercial entity.


Similar services which were not reviewed

The following services are similar to the six options above, but were not included in my comparison:

  • I didn’t include any service that only provides access to JupyterLab, such as Notebooks AI, Kyso, and CyVerse. (Note that Binder, Azure, and CoCalc all allow you to use JupyterLab instead of Jupyter Notebook if you prefer.)
  • I didn’t include IBM Watson Studio Cloud because the process of getting started is cumbersome, the interface is overly complicated, the free plan has lots of limitations, and there were lots of error messages during my testing.
  • I didn’t include Gryd because the free plan requires an academic email address, and I didn’t include Code Ocean because the free plan is severely limited without an academic email address.
  • I didn’t include ZEPL because it doesn’t allow you to export notebooks using the standard .ipynb format.
  • I didn’t include any paid services, such as Saturn Cloud, Crestle.ai, Paperspace, and Salamander.

My fact-checking process

This article is the result of 50+ hours of research, testing, and writing. In addition, I shared drafts of this article with the relevant teams from Binder, Kaggle, Google, Microsoft, CoCalc, and Datalore in March 2019. I received detailed feedback from all six companies/organizations (thank you!), which I incorporated into the article before publishing.

That being said, these services are constantly changing, and it’s likely that some of this information will become outdated in the future. If you believe that something in this article is no longer correct, please leave a comment below, and I’d be happy to consider updating the article.

P.S. Want to master the Jupyter Notebook? Subscribe to the Data School newsletter to find out when my new course launches!


Planet Python