Design
Two primary use cases for repo2docker
drive most design decisions:
- Automated image building used by projects like BinderHub
- Manual image building and running the image from the command line client, jupyter-repo2docker, by users interactively on their workstations
We share our guiding design principles here. This is not an exhaustive list :)
Deterministic output
The core of repo2docker
can be considered a
deterministic algorithm.
When given an input directory which has a particular repository checked out, it
deterministically produces a Dockerfile based on the contents of the directory.
So if we run repo2docker
on the same directory multiple times, we get the
exact same Dockerfile output.
This provides a few advantages:
- Reuse of cached build artifacts based on a repository's identity increases efficiency and reliability. For example, if we have already run repo2docker on a git repository at a particular commit hash, we can simply reuse the old output, since we know it is going to be the same. This provides massive performance and architectural advantages when building additional tools (like BinderHub) on top of repo2docker.
- We produce Dockerfiles that have as much in common as possible across multiple repositories, enabling better use of the Docker build cache. This also provides massive performance advantages.
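As a rough illustration of why determinism makes this kind of caching safe, here is a minimal Python sketch. It is not repo2docker's implementation: render_dockerfile, build_for_commit, the base image, and the mapping from file names to instructions are all hypothetical choices made for this example.

```python
def render_dockerfile(files):
    """Hypothetical stand-in: map repository contents to Dockerfile text.

    `files` maps file names to their contents. The result depends only on
    that input, never on time, randomness, or iteration order.
    """
    lines = ["FROM ubuntu:22.04"]  # assumed base image, for illustration only
    # Sort names so that dict insertion order cannot change the output.
    for name in sorted(files):
        if name == "requirements.txt":
            lines.append("COPY requirements.txt /tmp/requirements.txt")
            lines.append("RUN pip install -r /tmp/requirements.txt")
        elif name == "install.R":
            lines.append("COPY install.R /tmp/install.R")
            lines.append("RUN Rscript /tmp/install.R")
    return "\n".join(lines) + "\n"


# Cache keyed by commit hash: same commit means same input, hence same output.
_cache = {}


def build_for_commit(commit_hash, files):
    """Render once per commit; later calls reuse the stored result."""
    if commit_hash not in _cache:
        _cache[commit_hash] = render_dockerfile(files)
    return _cache[commit_hash]


repo = {"requirements.txt": "numpy\n", "install.R": "install.packages('ggplot2')\n"}
same_repo_reordered = dict(reversed(list(repo.items())))
assert render_dockerfile(repo) == render_dockerfile(same_repo_reordered)
```

The cache lookup is only correct because the rendering step is a pure function of the checked-out directory. If the output could vary between runs, a commit hash would no longer be a trustworthy cache key.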
Unix principles "do one thing well"
repo2docker
should do one thing, and do it well. This one thing is:
Given a repository, deterministically build a Docker image from it.
There's also some convenience code (to run the built image) for users, but that's separated out cleanly. This allows easy use by other projects (like BinderHub).
There is additional (and very useful) design advice on this in The Art of Unix Programming, which is a highly recommended quick read.
Composability
Although other projects, like
s2i, exist to convert source to
Docker images, repo2docker
provides the additional functionality to support
composable environments. We want to easily have an image with
Python3+Julia+R-3.2 environments, rather than just one single language
environment. While generally one language environment per container works well,
in many scientific and data science computing environments you need multiple
languages working together to get anything done. So all buildpacks are
composable, and need to be able to work well with other languages.
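One way to picture composable buildpacks is the Python sketch below. It is an illustration under stated assumptions, not repo2docker's actual buildpack API: the BuildPack class, the detection files, the base image, and the install commands are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildPack:
    """Hypothetical buildpack: a language, a trigger file, and build steps."""
    name: str
    detect_file: str
    steps: tuple


PYTHON = BuildPack("python", "requirements.txt",
                   ("RUN pip install -r requirements.txt",))
JULIA = BuildPack("julia", "Project.toml",
                  ("RUN julia --project=. -e 'using Pkg; Pkg.instantiate()'",))
R = BuildPack("r", "install.R", ("RUN Rscript install.R",))

# A fixed order keeps the composed output deterministic.
ALL_PACKS = (PYTHON, JULIA, R)


def compose(repo_files):
    """Combine every buildpack whose trigger file is present into one Dockerfile."""
    lines = ["FROM ubuntu:22.04", "COPY . /src", "WORKDIR /src"]
    for pack in ALL_PACKS:
        if pack.detect_file in repo_files:
            lines.append(f"# {pack.name}")
            lines.extend(pack.steps)
    return "\n".join(lines) + "\n"


print(compose({"requirements.txt", "install.R"}))
```

The point of the sketch is that no buildpack assumes it is the only one: each contributes its own steps, and the result is a single image containing every detected language environment.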
Pareto principle (The 80-20 Rule)
Roughly speaking, we want to support 80% of use cases, and provide an escape hatch (raw Dockerfiles) for the other 20%. We explicitly want to provide support only for the most common use cases - covering every possible use case never ends well.
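The escape hatch can be pictured as a simple dispatch step. The Python sketch below assumes that the presence of a file named Dockerfile in the repository is what selects the raw path; the function name and return values are hypothetical and exist only for this illustration.

```python
def choose_build_strategy(repo_files):
    """Hypothetical dispatcher: a raw Dockerfile, if present, wins outright."""
    if "Dockerfile" in repo_files:
        # The 20% case: the user takes full control of the image.
        return "dockerfile"
    # The 80% case: supported configuration files drive the build.
    return "buildpacks"


print(choose_build_strategy({"Dockerfile", "requirements.txt"}))
print(choose_build_strategy({"requirements.txt"}))
```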
An easy process for getting support for more languages here is to demonstrate
their value with Dockerfiles that other people can use, and then show that this
pattern is popular enough to be included inside repo2docker. Remember that 'yes'
is forever (very hard to remove features!), but 'no' is only temporary!