Merge pull request #457 from nuest/stencila

Updates for Stencila
pull/467/head
Tim Head 2018-11-09 13:56:23 +01:00 committed by GitHub
commit 58cbe1797d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
21 changed files with 599 additions and 30 deletions

View File

@ -16,8 +16,8 @@ script:
# cd into tests so CWD being repo2docker does not hide
# possible issues with MANIFEST.in
- pushd tests;
if [ ${REPO_TYPE} == "r" ]; then
travis_wait pytest --cov repo2docker -v ${REPO_TYPE} || exit 1;
if [ ${REPO_TYPE} == "r" ] || [ ${REPO_TYPE} == "stencila-r" ] || [ ${REPO_TYPE} == "stencila-py" ]; then
travis_wait 30 pytest --cov repo2docker -v ${REPO_TYPE} || exit 1;
else
travis_retry pytest --cov repo2docker -v ${REPO_TYPE} || exit 1;
fi;
@ -46,7 +46,8 @@ env:
- REPO_TYPE=base
- REPO_TYPE=conda
- REPO_TYPE=venv
- REPO_TYPE=stencila
- REPO_TYPE=stencila-r
- REPO_TYPE=stencila-py
- REPO_TYPE=julia
- REPO_TYPE=r
- REPO_TYPE=nix

View File

@ -38,8 +38,12 @@ It takes the following steps to determine this:
something that it should be used for. For example, a `BuildPack` that uses `conda` to install
libraries can check for presence of an `environment.yml` file and say 'yes, I can handle this
repository' by returning `True`. Usually buildpacks look for presence of specific files
(`requirements.txt`, `environment.yml`, `install.R`, etc) to determine if they can handle a
repository or not.
(`requirements.txt`, `environment.yml`, `install.R`, `manifest.xml`, etc.) to determine if they can handle a
repository or not. Buildpacks may also read specific files to determine details of the
required environment. For example, the Stencila integration extracts the required language-specific
execution contexts from an XML file (see base `BuildPack`). More than one buildpack may use such
information, as properties can be inherited (e.g. the R buildpack uses the list of required Stencila
contexts to see if R must be installed).
3. If no `BuildPack` returns true, then repo2docker will use the default `BuildPack` (defined in
`Repo2Docker.default_buildpack` traitlet).
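
The detection steps above can be sketched roughly as follows. This is a minimal illustration, not repo2docker's actual implementation: the classes `SketchBuildPack`, `CondaSketch`, `PythonSketch` and `DefaultSketch`, the `marker_files` attribute and the function `pick_buildpack` are all hypothetical names, and the sketch assumes that candidates are tried in order and that the first one whose `detect()` returns true is used.

```python
import os
import tempfile


class SketchBuildPack:
    """Hypothetical stand-in for a buildpack; not repo2docker's real class."""

    # Files whose presence signals that this buildpack can handle the repo.
    marker_files = ()

    def detect(self, repo_dir):
        return any(
            os.path.exists(os.path.join(repo_dir, name))
            for name in self.marker_files
        )


class CondaSketch(SketchBuildPack):
    marker_files = ("environment.yml",)


class PythonSketch(SketchBuildPack):
    marker_files = ("requirements.txt",)


class DefaultSketch(SketchBuildPack):
    def detect(self, repo_dir):
        # The fallback accepts every repository.
        return True


def pick_buildpack(repo_dir, candidates, default):
    """Return the first candidate whose detect() is true, else the default."""
    for buildpack in candidates:
        if buildpack.detect(repo_dir):
            return buildpack
    return default


with tempfile.TemporaryDirectory() as repo:
    # Simulate a repository that only contains an environment.yml.
    open(os.path.join(repo, "environment.yml"), "w").close()
    chosen = pick_buildpack(repo, [PythonSketch(), CondaSketch()], DefaultSketch())
    print(type(chosen).__name__)
```

With only an `environment.yml` present, the conda-style sketch is selected; in an empty directory the default would be returned instead.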

View File

@ -107,6 +107,22 @@ You also need to have a ``runtime.txt`` file that is formatted as
``r-<YYYY>-<MM>-<DD>``, where YYYY-MM-DD is a snapshot of MRAN that will be
used for your R installation.
``manifest.xml`` - Install Stencila
===================================
`Stencila <https://stenci.la/>`_ is an open source office suite for reproducible research.
It is powered by the open file format `Dar <https://github.com/substance/dar>`_.
If your repository contains a Stencila document, repo2docker detects it based on the file ``manifest.xml``.
The required `execution contexts <https://stenci.la/learn/intro.html>`_ are extracted from a Dar article (i.e.
files named ``*.jats.xml``).
You may also have a ``runtime.txt`` and/or an ``install.R`` to manually configure your R installation.
To see example repositories, visit our
`Stencila with R <https://github.com/binder-examples/stencila-r/>`_ and
`Stencila with Python <https://github.com/binder-examples/stencila-py>`_ examples.
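
As a rough illustration of how execution contexts could be read from a Dar archive, the sketch below parses a ``manifest.xml`` and the article it lists using the same ElementTree queries that appear in this commit's ``stencila_contexts`` property. The function name ``required_contexts`` and the two inline sample documents are assumptions made for this example, and unlike the property in this commit the sketch reads every listed document.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Minimal sample archive, written for this example only.
MANIFEST = """<dar>
  <documents>
    <document id="doc" name="doc" type="article" path="doc.jats.xml" />
  </documents>
  <assets/>
</dar>
"""

ARTICLE = """<article>
  <body>
    <code specific-use="cell"><named-content><alternatives>
      <code specific-use="source" language="r" executable="yes">1 + 1</code>
      <code specific-use="output" language="json">{}</code>
    </alternatives></named-content></code>
  </body>
</article>
"""


def required_contexts(archive_dir):
    """Collect the languages of all source code chunks listed in manifest.xml."""
    manifest = ET.parse(os.path.join(archive_dir, "manifest.xml"))
    contexts = set()
    for document in manifest.findall("./documents/document"):
        article = ET.parse(os.path.join(archive_dir, document.get("path")))
        # Only "source" chunks carry an execution language; "output" is skipped.
        for chunk in article.findall('.//code[@specific-use="source"]'):
            contexts.add(chunk.get("language"))
    return contexts


with tempfile.TemporaryDirectory() as archive:
    with open(os.path.join(archive, "manifest.xml"), "w") as f:
        f.write(MANIFEST)
    with open(os.path.join(archive, "doc.jats.xml"), "w") as f:
        f.write(ARTICLE)
    print(required_contexts(archive))
```

For the sample archive the result contains only ``r``, because the ``json`` output chunk does not match the ``specific-use="source"`` filter.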
.. _postBuild:
``postBuild`` - Run code after installing the environment

View File

@ -31,10 +31,29 @@ To learn more about URLs in JupyterLab and Jupyter Notebook, visit
RStudio
-------
The RStudio user interface is automatically enabled a configuration file for
R is detected (an R version specified in ``runtime.txt``). If this is detected,
The RStudio user interface is automatically enabled if a configuration file for
R is detected (i.e. an R version specified in ``runtime.txt``). If this is detected,
RStudio will be accessible by appending ``/rstudio`` to the URL, like so:
.. code-block:: none
http(s)://<server:port>/rstudio
Stencila
--------
The Stencila user interface is automatically enabled if a Stencila document (i.e.
a file ``manifest.xml``) is detected. Stencila will be accessible by appending
``/stencila`` to the URL, like so:
.. code-block:: none
http(s)://<server:port>/stencila
The editor will open the Stencila document corresponding to the last ``manifest.xml``
found in the file tree. If you want to open a different document, you can configure
the path in the URL parameter ``archive``:
.. code-block:: none
http(s)://<server:port>/stencila/?archive=other-dir
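
The URL patterns above can also be assembled programmatically. The following is a small sketch only: the helper name ``stencila_url`` and the server address ``http://localhost:8888`` are assumptions for illustration, not part of repo2docker.

```python
from urllib.parse import urlencode


def stencila_url(base_url, archive=None):
    """Build the Stencila URL for a server, optionally selecting an archive."""
    url = base_url.rstrip("/") + "/stencila"
    if archive is not None:
        # The archive directory is passed as the "archive" query parameter.
        url += "/?" + urlencode({"archive": archive})
    return url


print(stencila_url("http://localhost:8888"))
print(stencila_url("http://localhost:8888", archive="other-dir"))
```

Using ``urlencode`` keeps directory names with spaces or other special characters valid in the query string.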

View File

@ -7,6 +7,7 @@ import re
import logging
import docker
import sys
import xml.etree.ElementTree as ET
TEMPLATE = r"""
FROM buildpack-deps:bionic
@ -257,7 +258,6 @@ class BuildPack:
"""
return {}
@property
def stencila_manifest_dir(self):
"""Find the stencila manifest dir if it exists"""
@ -271,16 +271,57 @@ class BuildPack:
# ${STENCILA_ARCHIVE_DIR}/${STENCILA_ARCHIVE}/manifest.xml
self._stencila_manifest_dir = None
for root, dirs, files in os.walk("."):
if "manifest.xml" in files:
self.log.debug("Found a manifest.xml at %s", root)
self._stencila_manifest_dir = root.split(os.path.sep, 1)[1]
self.log.info(
"Found stencila manifest.xml in %s",
"Using stencila manifest.xml in %s",
self._stencila_manifest_dir,
)
break
return self._stencila_manifest_dir
@property
def stencila_contexts(self):
"""Find the stencila manifest contexts from file path in manifest"""
if hasattr(self, '_stencila_contexts'):
return self._stencila_contexts
# look at the content of the documents in the manifest
# to extract the required execution contexts
self._stencila_contexts = set()
# get paths to the article files from manifest
files = []
if self.stencila_manifest_dir:
manifest = ET.parse(os.path.join(self.stencila_manifest_dir,
'manifest.xml'))
documents = manifest.findall('./documents/document')
files = [os.path.join(self.stencila_manifest_dir, x.get('path'))
for x in documents]
else:
return self._stencila_contexts
for filename in files:
self.log.debug("Extracting contexts from %s", filename)
# extract code languages from file
document = ET.parse(filename)
code_chunks = document.findall('.//code[@specific-use="source"]')
languages = [x.get('language') for x in code_chunks]
self._stencila_contexts.update(languages)
self.log.info(
"Added execution contexts, now have %s",
self._stencila_contexts,
)
break
return self._stencila_contexts
def get_build_scripts(self):
"""
Ordered list of shell script snippets to build the base image.
@ -345,9 +386,9 @@ class BuildPack:
This script is added as the `ENTRYPOINT` to the container.
It is run as a non-root user, and must be executable. Used for performing
run time steps that can not be performed with standard tools. For example
setting environment variables for your repository.
It is run as a non-root user, and must be executable. Used for
performing run time steps that can not be performed with standard
tools. For example setting environment variables for your repository.
The script should be as deterministic as possible - running it twice
should not produce different results.
@ -472,9 +513,9 @@ class BaseImage(BuildPack):
env = []
if self.stencila_manifest_dir:
# manifest_dir is the path containing the manifest.xml
# archive_dir is the directory containing archive directories (one level up)
# default archive is the name of the directory in the archive_dir
# such that
# archive_dir is the directory containing archive directories
# (one level up) default archive is the name of the directory
# in the archive_dir such that
# ${STENCILA_ARCHIVE_DIR}/${STENCILA_ARCHIVE}/manifest.xml
# exists.
@ -518,14 +559,24 @@ class BaseImage(BuildPack):
))
except FileNotFoundError:
pass
if 'py' in self.stencila_contexts:
assemble_scripts.extend(
[
(
"${NB_USER}",
r"""
${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache https://github.com/stencila/py/archive/f1260796.tar.gz && \
${KERNEL_PYTHON_PREFIX}/bin/python -m stencila register
""",
)
]
)
if self.stencila_manifest_dir:
assemble_scripts.extend(
[
(
"${NB_USER}",
r"""
${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache https://github.com/stencila/py/archive/f6a245fd.tar.gz && \
${KERNEL_PYTHON_PREFIX}/bin/python -m stencila register && \
${NB_PYTHON_PREFIX}/bin/pip install --no-cache nbstencilaproxy==0.1.1 && \
jupyter serverextension enable --sys-prefix --py nbstencilaproxy && \
jupyter nbextension install --sys-prefix --py nbstencilaproxy && \

View File

@ -19,8 +19,22 @@ class RBuildPack(PythonBuildPack):
date snapshot of https://mran.microsoft.com/timemachine
from which libraries are to be installed.
2. An optional `install.R` file that will be executed at build time,
and can be used for installing packages from both MRAN and GitHub.
2. A `DESCRIPTION` file signaling an R package
3. A Stencila document (*.jats.xml) with R code chunks (i.e. language="r")
If there is no `runtime.txt`, then the MRAN snapshot is set to latest
date that is guaranteed to exist across timezones.
Additional R packages are installed if specified either
- in a file `install.R`, that will be executed at build time,
and can be used for installing packages from both MRAN and GitHub
- as dependencies in a `DESCRIPTION` file
- as requirements of a specific tool, for example the package `stencila` is
installed and configured if a Stencila document is given.
The `r-base` package from Ubuntu apt repositories is used to install
R itself, rather than any of the methods from https://cran.r-project.org/.
@ -60,22 +74,22 @@ class RBuildPack(PythonBuildPack):
"""
Check if current repo should be built with the R Build pack
super().detect() is not called in this function - it would return false
unless a `requirements.txt` is present and we do not want to require the
presence of a `requirements.txt` to use R.
Instead we just check if runtime.txt contains a string of the form
`r-<YYYY>-<MM>-<DD>`
super().detect() is not called in this function - it would return
false unless a `requirements.txt` is present and we do not want
to require the presence of a `requirements.txt` to use R.
"""
# If no date is found, then self.checkpoint_date will be False
# Otherwise, it'll be a date object, which will evaluate to True
if self.checkpoint_date:
return True
description_R = 'DESCRIPTION'
if not os.path.exists('binder') and os.path.exists(description_R):
if ((not os.path.exists('binder') and os.path.exists(description_R))
or 'r' in self.stencila_contexts):
if not self.checkpoint_date:
# no R snapshot date set through runtime.txt
# set the R runtime to the latest date that is guaranteed to be on MRAN across timezones
# set the R runtime to the latest date that is guaranteed to
# be on MRAN across timezones
self._checkpoint_date = datetime.date.today() - datetime.timedelta(days=2)
self._runtime = "r-{}".format(str(self._checkpoint_date))
return True
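
The fallback snapshot date used in this hunk can be illustrated in isolation. This is a sketch under assumptions: the function name `default_r_runtime` and its `today` parameter are hypothetical and exist only to make the example reproducible; the two-day offset and the `r-<YYYY>-<MM>-<DD>` format are taken from the code above.

```python
import datetime


def default_r_runtime(today=None):
    """Return (checkpoint_date, runtime string) for a repo without runtime.txt."""
    today = today or datetime.date.today()
    # Two days back, so the snapshot date has already passed in every timezone.
    checkpoint = today - datetime.timedelta(days=2)
    return checkpoint, "r-{}".format(checkpoint)


checkpoint, runtime = default_r_runtime(datetime.date(2018, 11, 9))
print(runtime)  # r-2018-11-07
```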
@ -128,11 +142,14 @@ class RBuildPack(PythonBuildPack):
This sets up:
- A directory owned by non-root in ${R_LIBS_USER} for installing R packages into
- A directory owned by non-root in ${R_LIBS_USER}
for installing R packages into
- RStudio
- R's devtools package, at a particular frozen version (determined by MRAN)
- R's devtools package, at a particular frozen version
(determined by MRAN)
- IRKernel
- nbrsessionproxy (to access RStudio via Jupyter Notebook)
- stencila R package (if Stencila document with R code chunks detected)
"""
rstudio_url = 'https://download2.rstudio.org/rstudio-server-1.1.419-amd64.deb'
# This is MD5, because that is what RStudio download page provides!
@ -148,7 +165,7 @@ class RBuildPack(PythonBuildPack):
# IRKernel version - specified as a tag in the IRKernel repository
irkernel_version = '0.8.11'
return super().get_build_scripts() + [
scripts = [
(
"root",
r"""
@ -226,6 +243,21 @@ class RBuildPack(PythonBuildPack):
),
]
if "r" in self.stencila_contexts:
scripts += [
(
"${NB_USER}",
# Install and register stencila library
r"""
R --quiet -e "source('https://bioconductor.org/biocLite.R'); biocLite('graph')" && \
R --quiet -e "devtools::install_github('stencila/r', ref = '361bbf560f3f0561a8612349bca66cd8978f4f24')" && \
R --quiet -e "stencila::register()"
"""
),
]
return super().get_build_scripts() + scripts
def get_assemble_scripts(self):
"""
Return series of build-steps specific to this repository

View File

@ -0,0 +1,6 @@
<dar>
<documents>
<document id="py.ipynb" name="py.ipynb" type="article" path="py.ipynb.jats.xml" src="py.ipynb" />
</documents>
<assets/>
</dar>

View File

@ -0,0 +1,212 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1d3 20150301//EN" "JATS-archivearticle1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<article-meta>
<title-group>
<article-title>Jupyter and Stencila</article-title>
</title-group>
<contrib-group content-type="author">
<contrib contrib-type="person">
<name>
<surname>Pawlik</surname>
<given-names>Aleksandra</given-names>
</name>
</contrib>
</contrib-group>
<abstract>
<p>An example of a Jupyter notebook converted into a JATS document for editing in Stencila.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="introduction-1">
<title>Introduction</title>
<p>Jupyter notebooks (<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>) are one of the most popular platforms for doing reproducible research. Stencila supports importing of Jupyter Notebook <monospace>.ipynb</monospace> files. This allows you to work with colleagues to refine a document for final publication while still retaining the code cells, and thus the reproducibility of your work. In the future we also plan to support exporting to <monospace>.ipynb</monospace> files.</p>
</sec>
<sec id="markdown-cells-1">
<title>Markdown cells</title>
<p>Most standard Markdown should be supported by the importer including inline <monospace>code</monospace>, headings etc (although the Stencila user interface does not currently support rendering of some elements e.g.&#xA0;math and lists).</p>
</sec>
<sec id="code-cells-1">
<title>Code cells</title>
<p>Code cells in notebooks are imported without loss. Stencila&#x2019;s user interface currently differs from Jupyter in that code cells are executed on update while you are typing. This produces a very reactive user experience but is inappropriate for more compute intensive, longer running code cells. We are currently working on improving this by allowing users to decide to execute cells explicitly (e.g.&#xA0;using <monospace>Ctrl+Enter</monospace>).</p>
<code specific-use="cell"><named-content><alternatives>
<code specific-use="source" language="py" executable="yes">import sys
import time
&apos;Hello this is Python %s.%s and it is %s&apos; % (sys.version_info[0], sys.version_info[1], time.strftime(&apos;%c&apos;))</code>
<code specific-use="output" language="json">{}</code>
</alternatives>
</named-content>
</code>
<p>Stencila also supports Jupyter code cells that produce plots. The cell below produces a simple plot based on the example from <ext-link ext-link-type="uri" xlink:href="https://matplotlib.org/examples/shapes_and_collections/scatter_demo.html">the Matplotlib website</ext-link>. Try changing the code below (for example, the variable <monospace>N</monospace>).</p>
<code specific-use="cell"><named-content><alternatives>
<code specific-use="source" language="py" executable="yes">import numpy as np
import matplotlib.pyplot as plt
N = 50
N = min(N, 1000) # Prevent generation of too many numbers :)
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()</code>
<code specific-use="output" language="json">{}</code>
</alternatives>
</named-content>
</code>
<p>We are currently working on supporting <ext-link ext-link-type="uri" xlink:href="http://ipython.readthedocs.io/en/stable/interactive/magics.html">Jupyter&#x2019;s magic commands</ext-link> in Stencila via a bridge to Jupyter kernels.</p>
</sec>
<sec id="metadata-1">
<title>Metadata</title>
<p>To add some metadata about the document (such as authors, title, abstract and so on) in Jupyter, select <monospace>Edit -&gt; Edit Notebook metadata</monospace> from the top menu. Add the title and abstract as JSON strings and authors and organisations metadata as <ext-link ext-link-type="uri" xlink:href="https://www.w3schools.com/js/js_json_arrays.asp">JSON arrays</ext-link>. Author <monospace>affiliation</monospace> identifiers (like <monospace>university-of-earth</monospace> below) must be unique and preferably use only lowercase characters and no spaces.</p>
<p>For example,</p>
<preformat> &quot;authors&quot;: [
{
&quot;given-names&quot;: &quot;Your first name goes here&quot;,
&quot;surname&quot;: &quot;Your last name goes here&quot;,
&quot;email&quot;: &quot;your.email@your-organisation&quot;,
&quot;corresponding&quot;: &quot;yes / no&quot;,
&quot;affiliation&quot;: &quot;university-of-earth&quot;
}
],
&quot;organisations&quot;: [
{
&quot;university-of-earth&quot;: {
&quot;institution&quot;: &quot;Your organisation name&quot;,
&quot;city&quot;: &quot;Your city&quot;,
&quot;country&quot;: &quot;Your country&quot;
}
}
],
&quot;title&quot;: &quot;Your title goes here&quot;,
&quot;abstract&quot;: &quot;This is a paper about lots of different interesting things&quot;,
</preformat>
</sec>
<sec id="citations-and-references-1">
<title>Citations and references</title>
<p>Stencila supports Pandoc style citations and reference lists within Jupyter notebook Markdown cells. Add a <monospace>bibliography</monospace> entry to the notebook&#x2019;s metadata which points to a file containing your list of references e.g.</p>
<code language="json">&quot;bibliography&quot;: &quot;my-bibliography.bibtex&quot;</code>
<p>Then, within Markdown cells, you can insert citations inside square brackets and separated by semicolons. Each citation is represented using the <monospace>@</monospace> symbol followed by the citation identifier from the bibliography database e.g.</p>
<code language="json">[@perez2015project; @kluyver2016jupyter]</code>
<p>The <ext-link ext-link-type="uri" xlink:href="https://github.com/takluyver/cite2c">cite2c</ext-link> Jupyter extension allows for easier, &#x201C;cite-while-you-write&#x201D; insertion of citations from a Zotero library. We&#x2019;re hoping to support conversion of cite2c-style citations/references in the <ext-link ext-link-type="uri" xlink:href="https://github.com/stencila/convert/issues/14">future</ext-link>.</p>
</sec>
</body>
<back>
<ref-list>
<ref id="ref-1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perez</surname>
<given-names>Fernando</given-names>
</name>
<name>
<surname>Granger</surname>
<given-names>Brian E</given-names>
</name>
</person-group>
<article-title>Project jupyter: Computational narratives as the engine of collaborative data science</article-title>
<source>Retrieved September</source>
<year>2015</year>
<volume>11</volume>
<fpage>207</fpage>
</element-citation>
</ref>
<ref id="ref-2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kluyver</surname>
<given-names>Thomas</given-names>
</name>
<name>
<surname>Ragan-Kelley</surname>
<given-names>Benjamin</given-names>
</name>
<name>
<surname>P&#xE9;rez</surname>
<given-names>Fernando</given-names>
</name>
<name>
<surname>Granger</surname>
<given-names>Brian E</given-names>
</name>
<name>
<surname>Bussonnier</surname>
<given-names>Matthias</given-names>
</name>
<name>
<surname>Frederic</surname>
<given-names>Jonathan</given-names>
</name>
<name>
<surname>Kelley</surname>
<given-names>Kyle</given-names>
</name>
<name>
<surname>Hamrick</surname>
<given-names>Jessica B</given-names>
</name>
<name>
<surname>Grout</surname>
<given-names>Jason</given-names>
</name>
<name>
<surname>Corlay</surname>
<given-names>Sylvain</given-names>
</name>
<name>
<surname>Others</surname>
</name>
</person-group>
<article-title>Jupyter notebooks-a publishing format for reproducible computational workflows.</article-title>
<source>ELPUB</source>
<year>2016</year>
<fpage>87</fpage>
</element-citation>
</ref>
<ref id="ref-3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ragan-Kelley</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Perez</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Granger</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kluyver</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frederic</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bussonnier</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>The jupyter/ipython architecture: A unified view of computational research, from interactive exploration to communication and publication.</article-title>
<source>AGU Fall Meeting Abstracts</source>
<year>2014</year>
</element-citation>
</ref>
</ref-list>
</back>
</article>

View File

@ -2,3 +2,4 @@
jupyter serverextension list 2>&1 | grep nbstencilaproxy
jupyter nbextension list 2>&1 | grep nbstencilaproxy
python3 -c "import stencila"

View File

@ -0,0 +1,21 @@
@article{kluyver2016jupyter,
title={Jupyter Notebooks-a publishing format for reproducible computational workflows.},
author={Kluyver, Thomas and Ragan-Kelley, Benjamin and P{\'e}rez, Fernando and Granger, Brian E and Bussonnier, Matthias and Frederic, Jonathan and Kelley, Kyle and Hamrick, Jessica B and Grout, Jason and Corlay, Sylvain and others},
journal={ELPUB},
pages={87--90},
year={2016}
}
@article{ragan2014jupyter,
title={The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication.},
author={Ragan-Kelley, M and Perez, F and Granger, B and Kluyver, T and Ivanov, P and Frederic, J and Bussonnier, M},
journal={AGU Fall Meeting Abstracts},
year={2014}
}
@article{perez2015project,
title={Project Jupyter: Computational narratives as the engine of collaborative data science},
author={Perez, Fernando and Granger, Brian E},
journal={Retrieved September},
volume={11},
pages={207},
year={2015}
}

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,5 @@
#!/bin/sh
jupyter serverextension list 2>&1 | grep nbstencilaproxy
jupyter nbextension list 2>&1 | grep nbstencilaproxy
python3 -c "import stencila" 2>&1 | grep ModuleNotFoundError

View File

@ -0,0 +1,6 @@
#!/bin/sh
jupyter serverextension list 2>&1 | grep nbstencilaproxy
jupyter nbextension list 2>&1 | grep nbstencilaproxy
python3 -c "import stencila" 2>&1 | grep ModuleNotFoundError
R -e "library('stencila');"