Introduction to the Databricks Environment

The Databricks Data Science Ecosystem

Databricks offers a Community Edition of its data science ecosystem for running experiments and notebooks. You can sign up for a free account and use it to run the notebooks for this course.

The Welcome Page

After you log in, the Welcome page shown below is the first screen you see.

Welcome page

Use the tab on the left to import notebooks into the workspace, including the notebooks for the course.

Import notebooks

Upload a notebook or provide a path to a notebook. Once it has been successfully uploaded, you should see a page such as the one shown below.

Imported notebooks

The Cluster Page

To run the notebook, launch a cluster by selecting its configuration, including the runtime image it should use.

Cluster launch page

Once the cluster has launched, you should see a green circle indicating that it is running.

You can click on the Clusters tab on the left and check the status of the available clusters.

Any libraries that were installed when the cluster was created are listed here.

Install a Python library by providing the library name and a version. It is a good idea to specify the version explicitly so that the notebooks behave the same way each time the cluster is recreated.
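As a rough illustration of version pinning, the sketch below builds a pinned requirement string and checks whether an installed package matches the pin. The helper names `format_requirement` and `is_pinned_version_installed`, and the example package and version, are hypothetical and chosen only for illustration; the check relies on the standard-library `importlib.metadata` module and could be run in a notebook cell after the library has been installed on the cluster.

```python
from importlib import metadata


def format_requirement(name: str, version: str) -> str:
    """Build a pinned requirement string such as 'pandas==1.5.3'."""
    return f"{name}=={version}"


def is_pinned_version_installed(name: str, version: str) -> bool:
    """Return True only if `name` is installed at exactly `version`."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False
    return installed == version


# Hypothetical example values; substitute the library the course requires.
requirement = format_requirement("pandas", "1.5.3")
print(requirement)
print(is_pinned_version_installed("pandas", "1.5.3"))
```

The second call prints True only when that exact version is present, which makes it a quick sanity check that the cluster matches the version you asked for.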

The Notebook Interface

Attach a running cluster to the notebook.

The File dropdown menu

The Edit dropdown menu

Execute the code cell.
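A first cell to execute might look like the following minimal sketch, which reports the Python version of the attached cluster. The function name `describe_environment` is hypothetical. The last lines assume that Databricks notebooks predefine a Spark session under the name `spark`; the lookup is defensive so that the cell also runs outside Databricks.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect basic facts about the Python interpreter running this cell."""
    return {
        "python_version": platform.python_version(),
        "major": sys.version_info.major,
    }


info = describe_environment()
print(f"Running Python {info['python_version']}")

# Assumption: on Databricks a SparkSession is normally available as `spark`.
# Outside Databricks the name is absent, so look it up without raising.
spark_session = globals().get("spark")
print("Spark session available:", spark_session is not None)
```

If the cell runs and prints the version, the notebook is attached to a working cluster.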

Git Integration

Integrate with a Git repository.

Put the notebook under version control and save it to the remote Git repository.
