# An introduction to Jupyter - and why I don't like it

2020-11-09

Jupyter Notebooks are hugely popular. In this post I'll give an introduction to what they are, explain why I would actually not recommend using them, and describe what I do instead.

I wrote this post for my Python Course.

# What is Jupyter?

Currently there are two important parts to Jupyter:

1. Jupyter Notebooks, a type of "document" that contains executable code together with the outputs of that code, such as images, tables, or text. Notebooks can also contain styled text written in Markdown, which supports mathematical equations.
2. Jupyter Lab, which is a browser-based IDE.
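
To make the "document" idea a bit more concrete: on disk, a notebook is a JSON file with a list of cells, and code cells carry their outputs along with them. The following is a minimal, hand-written sketch of that layout using only the standard library. It is simplified and not validated against the official schema, and the cell contents are made up for illustration.

```python
import json

# A hand-written, simplified notebook in the version 4 layout.
# Real notebooks carry more metadata; this only shows the basic shape.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some *styled* text and $E = mc^2$."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            # The result of running the cell is stored next to the code.
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
}

# This string is roughly what would end up in an .ipynb file.
serialized = json.dumps(notebook, indent=1)

# Reading it back: the code and its stored output travel together.
loaded = json.loads(serialized)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
stored_result = loaded["cells"][1]["outputs"][0]["data"]["text/plain"]

print(cell_types)
print(stored_result)
```

The point to take away is that code, text, and results all live in the same file, which is exactly what makes notebooks convenient and, as discussed below, also what causes some of the trouble.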

You can think of a Jupyter notebook as something like the IPython console. But instead of typing your code into a terminal, you type it into a code cell in your browser. If you execute it, the results are printed below it. You can then open a new cell to execute further code and still, at any point in time, go back to the cells above and rerun or edit that code easily. If your code creates a plot, that plot will be part of the document too. Here is a screenshot of what this looks like:

Notebooks have many more features, like interactive sliders. If you use Jupyter Lab to edit them, it also supports other file types, and may look like this:

They are, all in all, pretty damn powerful and extremely popular among data scientists and students. If you do want to use them, I'd recommend following a tutorial: the first Google hit for "jupyter tutorial".

# Why I don't like them

I don't want to hate on Jupyter. It's an impressive piece of software that is excellent at what it does. BUT I just don't think it's a good idea to do what it does.

Jupyter's main selling point is that it combines code, its results, and text. And there is no point in doing that, except for teaching. Actually, for teaching, Jupyter is awesome. However, the most common use case is data analysis. I've used Jupyter Notebooks extensively for nearly two years for data analysis and have frequently encountered the following problems:

## Crowded Notebooks

In my notebooks, 70% to 90% of the visible area was usually covered with code rather than with results. To get to a result, you'd have to scroll a lot. Furthermore, because of the easy access to code that was typed in earlier, I'd regularly go up, make some changes, rerun a cell, and then continue at the bottom. This would lead to notebooks that could not be executed from top to bottom, because some cells relied on something that had since been changed or removed in a cell above them. Also, I would copy a lot of code between notebooks, which is a terrible thing to do.
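
The "can't run from top to bottom" problem is easy to reproduce outside of Jupyter. Here is a small sketch that imitates it, under the assumption that a plain dictionary plus `exec` is a fair stand-in for a notebook kernel that keeps its variables between cell runs. The cell contents and the names `data`, `scale`, and `result` are made up for illustration.

```python
# One namespace that survives across cell executions,
# standing in for the notebook kernel (an assumption of this sketch).
kernel = {}

cell_0_original = "data = [1, 2, 3]\nscale = 10"
cell_0_edited = "data = [1, 2, 3]"            # `scale` was edited away later
cell_1 = "result = [x * scale for x in data]"  # still relies on `scale`

exec(cell_0_original, kernel)  # run cell 0
exec(cell_1, kernel)           # run cell 1
exec(cell_0_edited, kernel)    # go back up, edit cell 0, rerun it
exec(cell_1, kernel)           # still works: `scale` lingers in the kernel
live_result = kernel["result"]

# Now run the notebook as it is currently saved, top to bottom,
# in a fresh namespace (like restarting the kernel).
fresh_kernel = {}
fresh_error = None
try:
    exec(cell_0_edited, fresh_kernel)
    exec(cell_1, fresh_kernel)
except NameError as error:
    fresh_error = error

print(live_result)
print(type(fresh_error).__name__, fresh_error)
```

In the live session everything looks fine, because the old `scale` is still around. Only the fresh run reveals that the saved notebook no longer works on its own.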

## Copied Notebooks

Sometimes I did an analysis in Jupyter, and at some point I would want to run that same analysis with a different dataset. What would I do? Obviously I'd copy that notebook, exchange the path at the top, and run it again. (At this point I'd probably realize that it can't run from top to bottom without crashing, so I'd have to spend quite some time fixing the notebook.) This might happen again. And again. AND THEN you find a bug in that notebook, and have to fix it in ALL of those pesky copies, <sarcasm>oh the pleasure … </sarcasm>

## Git incompatibility

People tend to store notebooks in git repositories, or at least I used to. Sometimes git would see a notebook as changed when I had just opened it and scrolled down without actually making changes. Notebooks also contain images, and these will blow up your repository hugely. Besides, it's very bad practice to store data in git. Git is for source code.
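
To get a feeling for how much an embedded image adds, here is a rough sketch that builds a simplified notebook as a Python dictionary and compares its serialized size with and without the stored output. Several things are assumptions for illustration: the 200 kB of random bytes stand in for a real PNG, `plot_something()` is a hypothetical call that is never executed, and the notebook layout is simplified.

```python
import base64
import json
import os

# Stand-in for a rendered plot: 200 kB of random bytes, not a real PNG.
fake_png = os.urandom(200_000)

code_cell = {
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {},
    "source": ["plot_something()"],  # hypothetical call, never executed here
    "outputs": [
        {
            "output_type": "display_data",
            "metadata": {},
            # Images are stored as base64 text inside the JSON document.
            "data": {"image/png": base64.b64encode(fake_png).decode("ascii")},
        }
    ],
}
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [code_cell],
}

size_with_output = len(json.dumps(notebook))

# The same notebook with its outputs cleared.
stripped = json.loads(json.dumps(notebook))  # cheap deep copy
for cell in stripped["cells"]:
    if cell["cell_type"] == "code":
        cell["outputs"] = []
        cell["execution_count"] = None

size_without_output = len(json.dumps(stripped))

print(size_with_output, size_without_output)
```

The base64 encoding makes the image about a third larger than the raw bytes, and every rerun that produces a slightly different plot gives git a new large blob to store.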

## You have to edit stuff in the browser

Jupyter Lab runs in the browser, which means you have to edit your code in the browser. I don't like that. I have put a lot of effort into my .vimrc, and I want to use it. I absolutely hate editing code in notebooks. Yeah, sure, there are plugins that give you some Vim functionality, but that's just additional effort for a worse solution (compared to editing code with Vim).