# Architecture of repo2docker

This is a living document describing the architecture of repo2docker
from various perspectives.

## Buildpack

The **buildpack** concept comes from [Heroku](https://devcenter.heroku.com/articles/buildpacks)
and Ruby on Rails' [Convention over Configuration](http://rubyonrails.org/doctrine/#convention-over-configuration)
doctrine.

Instead of writing a complete specification of exactly how they want
their environment to be, users can focus only on how their environment differs from a conventional
environment. This means that instead of deciding 'should I get Python from Apt or pyenv or somewhere else?', the user
can just specify 'I want python-3.6'. Usually, specifying a **runtime** and a list of **libraries**
with explicit **versions** is all that is needed.

In repo2docker, a Buildpack does the following things:

1. **Detect** if it can handle a given repository
2. **Build** a base language environment in the docker image
3. **Copy** the contents of the repository into the docker image
4. **Assemble** a specific environment in the docker image based on repository contents
5. **Push** the built docker image to a specific docker registry (optional)
6. **Run** the built docker image as a docker container (optional)

### Detect

When given a repository, repo2docker first has to determine which buildpack to use.
It takes the following steps to determine this:

1. Look at the ordered list of `BuildPack` objects listed in the `Repo2Docker.buildpacks`
   traitlet. This is populated with a default set of buildpacks in most-specific-to-least-specific
   order. Other applications using repo2docker can add to or change this list using traditional
   [traitlet](http://traitlets.readthedocs.io/en/stable/) configuration mechanisms.
2. Call the `detect` method of each `BuildPack` object. This method assumes that the repository
   is present in the current working directory, and should return `True` if the buildpack
   can handle the repository. The first buildpack that returns `True` is the one used.
   For example, a `BuildPack` that uses `conda` to install
   libraries can check for the presence of an `environment.yml` file and say 'yes, I can handle this
   repository' by returning `True`. Usually buildpacks look for the presence of specific files
   (`requirements.txt`, `environment.yml`, `install.R`, `manifest.xml`, etc.) to determine whether they can handle a
   repository. Buildpacks may also look into specific files to determine specifics of the
   required environment, such as the Stencila integration, which extracts the required language-specific
   execution contexts from an XML file (see the base `BuildPack`). More than one buildpack may use such
   information, as properties can be inherited (e.g. the R buildpack uses the list of required Stencila
   contexts to see if R must be installed).
3. If no `BuildPack` returns `True`, then repo2docker will use the default `BuildPack` (defined in
   the `Repo2Docker.default_buildpack` traitlet).

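The detection steps above can be sketched in a few lines of Python. This is a minimal illustration, not repo2docker's actual code: the classes `CondaLikeBuildPack`, `PipLikeBuildPack` and `DefaultBuildPack` and the helpers `pick_buildpack` and `pick_for_files` are hypothetical names, and only the `detect` method name and the ordered-list-plus-default idea come from the description above.

```python
import os
import tempfile


class BuildPack:
    """Hypothetical minimal base class (not repo2docker's real BuildPack)."""

    def detect(self):
        # Assumes the repository is the current working directory.
        return False


class CondaLikeBuildPack(BuildPack):
    def detect(self):
        return os.path.exists("environment.yml")


class PipLikeBuildPack(BuildPack):
    def detect(self):
        return os.path.exists("requirements.txt")


class DefaultBuildPack(BuildPack):
    def detect(self):
        return True


def pick_buildpack(buildpacks, default):
    """Return the first buildpack whose detect() is True, else the default."""
    for bp in buildpacks:
        if bp.detect():
            return bp
    return default


def pick_for_files(filenames):
    """Create a throwaway repository holding `filenames` and pick a buildpack."""
    # Ordered most-specific first, mirroring the ordering described above.
    buildpacks = [CondaLikeBuildPack(), PipLikeBuildPack()]
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as repo:
        os.chdir(repo)
        try:
            for name in filenames:
                open(name, "w").close()
            chosen = pick_buildpack(buildpacks, DefaultBuildPack())
            return type(chosen).__name__
        finally:
            # Leave the directory before it is cleaned up.
            os.chdir(old_cwd)
```

Because the list is ordered, a repository containing both `environment.yml` and `requirements.txt` is claimed by the earlier (more specific) entry, and a repository with neither file falls through to the default.
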
## Build base environment

Once a buildpack is chosen, it builds a **base environment** that is mostly the same for the various
repositories built with the same buildpack.

For example, in `CondaBuildPack`, the base environment consists of installing [miniconda](https://conda.io/miniconda.html)
and basic notebook packages (from `repo2docker/buildpacks/conda/environment.yml`). This is going
to be the same for most repositories built with `CondaBuildPack`, so we want to use
[docker layer caching](https://thenewstack.io/understanding-the-docker-cache-for-faster-builds/) as
much as possible for performance reasons. The next time a repository is built with `CondaBuildPack`,
we can skip straight to the **copy** step (since the base environment docker image *layers* have
already been built and cached).

The `get_build_scripts` and `get_build_script_files` methods are primarily used for this.
`get_build_scripts` can return arbitrary bash script lines that can be run as different users,
and `get_build_script_files` is used to copy specific scripts (such as a conda installer) into
the image to be run as part of `get_build_scripts`. Code in either has the following constraints:

1. You can *not* use the contents of the repository in them, since this happens before the repository
   is copied into the image. For example, `pip install -r requirements.txt` will not work,
   since there's no `requirements.txt` inside the image at this point. This is an explicit
   design decision, to enable better layer caching.
2. You *may*, however, read the contents of the repository and modify the scripts emitted based
   on that! For example, in `CondaBuildPack`, if Python 2 is specified in `environment.yml`,
   a different kind of environment is set up. The reading of the `environment.yml` is performed
   in the BuildPack itself, and not in the scripts returned by `get_build_scripts`. This is fine.
   BuildPack authors should still try to minimize the variants created in this fashion, to
   optimize the build cache.

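The two constraints above can be illustrated with a small sketch. It is not the real `CondaBuildPack`: the class name, the file paths, the user names, the `(user, script)` tuple shape and the naive `python=2` check are all assumptions made for illustration. The point is only that the repository is read in Python while planning the build, and that the emitted scripts never refer to files inside the repository.

```python
class CondaLikeBuildPack:
    """Hypothetical buildpack; only the two method names mirror the text above."""

    def __init__(self, environment_yml=""):
        # Repository contents are read here, in Python, while planning the
        # build. They are never read by the emitted scripts themselves.
        self.environment_yml = environment_yml

    @property
    def python_major(self):
        # Deliberately naive check, for illustration only.
        return 2 if "python=2" in self.environment_yml else 3

    def get_build_script_files(self):
        # Maps a file shipped with the buildpack to a path inside the image
        # (both paths are made up for this sketch).
        return {
            "conda/install-miniconda.bash": "/tmp/install-miniconda.bash",
            f"conda/environment.py{self.python_major}.yml": "/tmp/environment.yml",
        }

    def get_build_scripts(self):
        # (user, script) pairs. The scripts only touch files that were copied
        # in by get_build_script_files, never files from the repository.
        return [
            ("root", "bash /tmp/install-miniconda.bash"),
            (
                "notebook-user",
                f"conda env update -n py{self.python_major} -f /tmp/environment.yml",
            ),
        ]
```

Two repositories that differ only in their libraries produce identical build scripts here, so the corresponding image layers can be reused. Only the Python 2 case produces a second variant, which is why keeping the number of variants small matters for the build cache.
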
## Copy repository contents

The contents of the repository are copied unconditionally into the Docker image, and made
available for all further commands. This is common to most `BuildPack`s, and the code is in
the `build` method of the `BuildPack` base class.

## Assemble repository environment

The **assemble** stage builds the specific environment that is requested by the repository.
This usually means installing required libraries specified in a format native to the language
(`requirements.txt`, `environment.yml`, `REQUIRE`, `install.R`, etc.).

Most of this work is done in the `get_assemble_scripts` method. It can return arbitrary bash script
lines that can be run as different users, and has access to the repository contents (unlike
`get_build_scripts`). The docker image layers produced by this usually cannot be cached,
so fewer restrictions apply to this than to `get_build_scripts`.

At the end of the assemble step, the docker image is ready to be used in various ways!

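The ordering of the build, copy and assemble stages is what makes layer caching work, and it can be sketched as a function that renders Dockerfile text. This is an illustration under stated assumptions, not repo2docker's actual template: `render_dockerfile`, its parameters, the `(user, script)` tuple shape and the `/home/user/repo` path are hypothetical.

```python
def render_dockerfile(base_image, build_scripts, assemble_scripts,
                      repo_dir="/home/user/repo"):
    """Render Dockerfile text with the stages in cache-friendly order."""
    lines = [f"FROM {base_image}"]

    # 1. Build: independent of repository contents, so these layers can be
    #    reused across repositories that share a buildpack.
    for user, script in build_scripts:
        lines += [f"USER {user}", f"RUN {script}"]

    # 2. Copy: the layers from here on depend on the repository contents.
    lines += [f"COPY . {repo_dir}", f"WORKDIR {repo_dir}"]

    # 3. Assemble: may freely use files from the repository.
    for user, script in assemble_scripts:
        lines += [f"USER {user}", f"RUN {script}"]

    return "\n".join(lines)


dockerfile = render_dockerfile(
    "ubuntu:22.04",
    build_scripts=[("root", "bash /tmp/install-base.bash")],
    assemble_scripts=[("user", "pip install -r requirements.txt")],
)
print(dockerfile)
```

Everything above the `COPY` line is identical for repositories with the same base environment, so Docker can serve those layers from its cache. Everything below it changes whenever the repository does.
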
## Push

Optionally, repo2docker can **push** a built image to a [docker registry](https://docs.docker.com/registry/).
This is done as a convenience only (since you can do the same with a `docker push` after using repo2docker
only to build), and is implemented in the `Repo2Docker.push` method. It is only activated when using the
`--push` command-line flag.

## Run

Optionally, repo2docker can **run** the built image and let the user access the Jupyter Notebook
running inside. This is also done as a convenience only (since you can do the same with `docker run`
after using repo2docker only to build), and is implemented in `Repo2Docker.run`. It is activated by default
unless the `--no-run` command-line flag is passed.
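
How the two optional steps map onto command-line flags can be sketched with a small helper that assembles an argument list. The helper `repo2docker_args` and the example repository URL are hypothetical, and the sketch assumes the command is installed as `repo2docker`. Only the `--push` and `--no-run` flags come from the text above.

```python
def repo2docker_args(repo, push=False, run=True):
    """Build an argv list reflecting the defaults described above."""
    args = ["repo2docker"]
    if push:
        # Pushing is opt-in.
        args.append("--push")
    if not run:
        # Running is on by default, so it has to be switched off explicitly.
        args.append("--no-run")
    args.append(repo)
    return args


# Placeholder repository URL, for illustration only.
build_only = repo2docker_args("https://github.com/example/repo", run=False)
build_and_push = repo2docker_args(
    "https://github.com/example/repo", push=True, run=False
)
```

Such a list could be passed to `subprocess.run` to invoke the build without running the resulting container.
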