Merge pull request #105 from yuvipanda/design

Attempt to add some info about design principles
2017-10-24 11:36:40 -07:00 · commit c718a90ee1
--- a/docs/source/design.md
+++ b/docs/source/design.md
@@ -0,0 +1,71 @@
+# Design
+
+Two primary use cases for `repo2docker` drive most design decisions:
+
+1. Automated image building used by projects like
+   [BinderHub](http://github.com/jupyterhub/binderhub)
+2. Manual image building and running the image from the command line client,
+   `jupyter-repo2docker`, by users interactively on their workstations
+
+We share our guiding design principles here. This is not an exhaustive
+list :)
+
+## Deterministic output
+
+The core of `repo2docker` can be considered a
+[deterministic algorithm](https://en.wikipedia.org/wiki/Deterministic_algorithm).
+When given an input directory which has a particular repository checked out, it
+deterministically produces a Dockerfile based on the contents of the directory.
+If we run `repo2docker` on the same directory multiple times, we get exactly
+the same Dockerfile each time.
+
+This provides a few advantages:
+
+1. Reuse of cached built artifacts based on a repository's identity increases
+   efficiency and reliability. For example, if we have already run `repo2docker`
+   on a git repository at a particular commit hash, we can simply reuse the
+   old output, since we know it is going to be the same. This provides
+   significant performance and architectural advantages when building additional
+   tools (like BinderHub) on top of `repo2docker`.
+2. We produce Dockerfiles that have as much in common as possible across
+   multiple repositories, enabling better use of the Docker build cache. This
+   also speeds up builds, since shared layers do not need to be rebuilt.
+
+## Unix principle: "do one thing well"
+
+`repo2docker` should do one thing, and do it well. This one thing is:
+
+> Given a repository, deterministically build a docker image from
+> it.
+
+There's also some convenience code (to run the built image) for users, but
+that's separated out cleanly. This allows easy use by other projects (like
+BinderHub).
+
+The [Art of Unix Programming](http://www.faqs.org/docs/artu/ch01s06.html)
+offers additional (and very useful) design advice on this, and is a highly
+recommended quick read.
+
+## Composability
+
+Although other projects, like
+[s2i](https://github.com/openshift/source-to-image), exist to convert source to
+Docker images, `repo2docker` additionally supports *composable* environments.
+We want to be able to easily build an image with Python3+Julia+R-3.2
+environments, rather than just a single language environment. While one
+language environment per container generally works well, many scientific and
+data science computing environments need multiple languages working together
+to get anything done. So all buildpacks are composable, and need to be able to
+work well with other languages.
+
+## [Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle) (The 80-20 Rule)
+
+Roughly speaking, we want to support 80% of use cases, and provide an escape
+hatch (raw Dockerfiles) for the other 20%. We explicitly want to provide support
+only for the most common use cases; trying to cover every possible use case
+never ends well.
+
+An easy process for getting support for more languages here is to demonstrate
+their value with Dockerfiles that other people can use, and then show that this
+pattern is popular enough to be included inside `repo2docker`. Remember that 'yes'
+is forever (very hard to remove features!), but 'no' is only temporary!
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -10,4 +10,5 @@ Site Contents
 .. toctree::
   :maxdepth: 2

+   design
   samples