Introduction to the Databricks Environment
The Databricks Data Science Ecosystem
Databricks offers a Community Edition of its Data Science ecosystem for running experiments and notebooks. You can sign up for a free account and use it to run the notebooks for the course.
The Welcome page
After you log in, the Welcome page is the first screen you see.
You can import a notebook into the workspace using the tab on the left. Use it to import the notebooks for the course.
Upload a notebook or provide a path to a notebook. Once it has been successfully uploaded, you should see a page such as the one shown below.
The Cluster Page
To run the notebook, launch a cluster and select the configuration for its image.
Once the cluster has launched, you should see a green circle indicating that it is running.
You can click on the Clusters tab on the left and check the status of the available clusters.
Any libraries that were installed when the cluster was created show up here.
Install a Python library by providing the library name and a version. It is a good idea to specify the version explicitly, so that the notebooks run against the same library version each time the cluster is recreated.
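The advice to pin versions can be sketched in a notebook cell. The snippet below is a minimal illustration, not part of the course material: the helper names `pinned_requirement` and `installed_matches` are hypothetical, and `numpy` with version `1.24.4` is only a placeholder for whichever library and version you actually install. It builds the `name==version` string that Python package installers accept and checks whether the version installed on the cluster matches the pin.

```python
from importlib import metadata


def pinned_requirement(name: str, version: str) -> str:
    """Build an exact-version requirement string such as 'numpy==1.24.4'."""
    return f"{name}=={version}"


def installed_matches(name: str, version: str) -> bool:
    """Return True only if the package is installed at exactly this version."""
    try:
        return metadata.version(name) == version
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False


# Placeholder library and version, chosen only for illustration.
spec = pinned_requirement("numpy", "1.24.4")
print("Requirement to install:", spec)
print("Already installed at that version:", installed_matches("numpy", "1.24.4"))
```

Running a check like this after the cluster starts is one way to confirm that the library you pinned is the one the notebook actually sees.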
The Notebook Interface
Attach a running cluster to the notebook
The File dropdown menu
The Edit dropdown menu
Execute the code cell.
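As a first cell to execute once a cluster is attached, something like the following can confirm that code really runs on the cluster. This is a sketch using only the Python standard library; the function name `environment_summary` is hypothetical and the values printed depend on the image you selected for the cluster.

```python
import platform
import sys


def environment_summary() -> dict:
    """Collect basic facts about the Python environment running this cell."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
    }


# Print one line per fact so the cell output is easy to scan.
summary = environment_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the cell prints a Python version and platform string, the notebook is attached and the cluster is executing code.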
Git Integration
Integrate with a Git repository.
Version control the notebook and save it to the remote Git repository.