Thanks @russss!
* Add register_output_renderer hook
This changeset refactors out the JSON renderer and then adds a hook and
dispatcher system to allow custom output renderers to be registered.
The CSV output renderer is untouched because supporting streaming
renderers through this system would be significantly more complex, and
probably not worthwhile.
We can't simply allow hooks to be called at request time because we need
a list of supported file extensions when the request is being routed in
order to resolve ambiguous database/table names. So, renderers need to
be registered at startup.
I've tried to make this API independent of Sanic's request/response
objects so that this can remain stable during the switch to ASGI. I'm
using dictionaries to keep it simple and to make adding additional
options in the future easy.
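A minimal sketch of a plugin using the new hook, assuming the dictionary-based
registration and callback contract described above (the .txt renderer here is
purely illustrative):

    from datasette import hookimpl


    def render_txt(args, data, view_name):
        # Render the query results as tab-separated plain text
        body = "\n".join(
            "\t".join(str(value) for value in row) for row in data["rows"]
        )
        return {
            "body": body,
            "content_type": "text/plain; charset=utf-8",
            "status_code": 200,
        }


    @hookimpl
    def register_output_renderer(datasette):
        # Called once at startup, so the extension can participate in routing
        return {
            "extension": "txt",
            "callback": render_txt,
        }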
Fixes #440
Prior to this commit Datasette would calculate the content hash of every
database and redirect to a URL containing that hash, like so:
https://v0-27.datasette.io/fixtures => https://v0-27.datasette.io/fixtures-dd88475
This assumed that all databases were opened in immutable mode and were not
expected to change.
This will be changing as a result of #419 - so this commit takes the first step
in implementing that change by changing this default behaviour. Datasette will
now only redirect hash-free URLs under two circumstances:
* The new `hash_urls` config option is set to true (it defaults to false).
* The user passes `?_hash=1` in the URL
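For example, to opt back in to hashed URLs for every database:

    datasette fixtures.db --config hash_urls:1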
If you start Datasette with no files, it will connect to :memory: instead.
When starting it with files you can add --memory to also get a :memory: database.
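For example:

    datasette
    datasette fixtures.db --memory

The first command serves just the :memory: database; the second serves
fixtures.db plus :memory:.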
The extra_css_urls and extra_js_urls hooks now take additional optional
parameters.
Also refactored them out of the Datasette class and into RenderMixin.
Plus improved plugin documentation to explicitly list parameters.
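A sketch of a plugin taking advantage of this - the template / database /
table / datasette parameter names are my assumption about the new optional
parameters:

    from datasette import hookimpl


    @hookimpl
    def extra_css_urls(template, database, table, datasette):
        # Only load the extra stylesheet on table pages
        # (example.com is a placeholder URL)
        if template == "table.html":
            return ["https://example.com/extra-table-styles.css"]
        return []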
This change introduces a new plugin hook, publish_subcommand, which can be
used to implement new subcommands for the "datasette publish" command family.
I've used this new hook to refactor out the "publish now" and "publish heroku"
implementations into separate modules. I've also added unit tests for these
two publishers, mocking the subprocess.call and subprocess.check_output
functions.
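A minimal sketch of a plugin implementing the hook (the "mycloud" publisher
is hypothetical):

    from datasette import hookimpl
    import click


    @hookimpl
    def publish_subcommand(publish):
        # "publish" is the click group behind "datasette publish" -
        # attaching a command to it adds a new subcommand, invoked as:
        #   datasette publish mycloud mydb.db
        @publish.command()
        @click.argument("files", type=click.Path(exists=True), nargs=-1)
        def mycloud(files):
            # A real publisher would build and deploy a container here
            click.echo("Would publish: {}".format(", ".join(files)))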
As part of this, I introduced a mechanism for loading default plugins. These
are defined in the new "default_plugins" list inside datasette/app.py
Closes #217 (Plugin support for datasette publish)
Closes #348 (Unit tests for "datasette publish")
Refs #14, #59, #102, #103, #146, #236, #347
I saw this error:
sanic.exceptions.RequestTimeout: Request Timeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/sanic/handlers.py", line 82, in response
    response = handler(request=request, exception=exception)
  File "/Users/simonw/Dropbox/Development/datasette/datasette/app.py", line 512, in on_exception
    if request.path.split("?")[0].endswith(".json"):
AttributeError: 'NoneType' object has no attribute 'path'
Strangely "if request and request.path..." did not work here, because the
Sanic Request class extends builtins.dict and hence evaluates to False if it
has no headers.
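The working check compares against None explicitly instead of relying on
truthiness - a sketch:

    # An empty Sanic Request evaluates to False even when it is not None,
    # so test identity against None instead
    if request is not None and request.path.split("?")[0].endswith(".json"):
        ...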
* table.csv?_stream=1 to download all rows - refs #266
This option causes Datasette to serve ALL rows in the table, by internally
following the _next= pagination links and serving everything out as a stream.
Also added new config option, allow_csv_stream, which can be used to disable
this feature.
* New config option max_csv_mb limiting size of CSV export
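Both features are controlled through the config mechanism:

    datasette mydb.db --config allow_csv_stream:0
    datasette mydb.db --config max_csv_mb:10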
This is a relatively obscure new command-line argument that helps solve the
problem of showing accurate version information in deployed instances of
Datasette even if they were deployed directly from source code.
You can pass --version-note to datasette publish and package and it will then
in turn be passed to datasette when it starts:
datasette --version-note=hello fixtures.db
Now if you visit /-/versions.json you will see this:
{
    "datasette": {
        "note": "hello",
        "version": "0+unknown"
    },
    "python": {
        "full": "3.6.5 (default, Jun 6 2018, 19:19:24) \n[GCC 6.3.0 20170516]",
        "version": "3.6.5"
    },
    ...
}
I plan to use this in some Travis CI configuration, refs #313
Tables and custom SQL query results can now be exported as CSV.
The easiest way to do this is to use the .csv extension, e.g.
/test_tables/facet_cities.csv
By default this is served as Content-Type: text/plain so you can see it in
your browser. If you want to download the file (using text/csv and with an
appropriate Content-Disposition: attachment header) you can do so like this:
/test_tables/facet_cities.csv?_dl=1
We link to the CSV and downloadable CSV URLs from the table and query pages.
The links use ?_size=max and so by default will return 1,000 rows.
Also fixes #303 - table names ending in .json or .csv are now detected and
URLs are generated that look like this instead:
/test_tables/table%2Fwith%2Fslashes.csv?_format=csv
The ?_format= option is available for everything else too, but we link to the
.csv / .json versions in most cases because they are aesthetically pleasing.
Removed the --page_size= argument to datasette serve in favour of:
datasette serve --config default_page_size:50 mydb.db
Added new help section:
$ datasette --help-config
Config options:
  default_page_size            Default page size for the table view
                               (default=100)
  max_returned_rows            Maximum rows that can be returned from a table
                               or custom query (default=1000)
  sql_time_limit_ms            Time limit for a SQL query in milliseconds
                               (default=1000)
  default_facet_size           Number of values to return for requested facets
                               (default=30)
  facet_time_limit_ms          Time limit for calculating a requested facet
                               (default=200)
  facet_suggest_time_limit_ms  Time limit for calculating a suggested facet
                               (default=50)
Replaced the --max_returned_rows and --sql_time_limit_ms options to
"datasette serve" with a new --limit option, which supports a larger
list of limits.
Example usage:
datasette serve --limit max_returned_rows:1000 \
    --limit sql_time_limit_ms:2500 \
    --limit default_facet_size:50 \
    --limit facet_time_limit_ms:1000 \
    --limit facet_suggest_time_limit_ms:500
New docs: https://datasette.readthedocs.io/en/latest/limits.html
Closes #270, closes #264
* Default is now ?_shape=arrays (renamed from lists)
* New ?_shape=array returns an array of objects as the root object
* Changed ?_shape=object to return the object as the root
* Updated docs
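For example, ?_shape=array now returns the rows as the root JSON value
(illustrative data):

    /database/table.json?_shape=array

    [
        {
            "id": 1,
            "name": "First row"
        },
        {
            "id": 2,
            "name": "Second row"
        }
    ]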
When there's a simple (single-column) primary key, it looks weird to
duplicate it in the link column.
This change removes the second PK column and treats the link column as
if it were the PK column from a header/sorting perspective.
* New --plugins-dir=plugins/ option
New option causing Datasette to load and evaluate all of the Python files in
the specified directory and register any plugins that are defined in those
files.
This new option is available for the following commands:
datasette serve mydb.db --plugins-dir=plugins/
datasette publish now/heroku mydb.db --plugins-dir=plugins/
datasette package mydb.db --plugins-dir=plugins/
* Unit tests for --plugins-dir=plugins/
Closes #211
Uses https://pluggy.readthedocs.io/, originally created for the py.test project
We're starting with two plugin hooks:
prepare_connection(conn)
This is called when a new SQLite connection is created. It can be used to register custom SQL functions.
prepare_jinja2_environment(env)
This is called with the Jinja2 environment. It can be used to register custom template tags and filters.
An example plugin which uses these two hooks can be found at https://github.com/simonw/datasette-plugin-demos or installed using `pip install datasette-plugin-demos`
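A minimal sketch of a plugin implementing both hooks:

    from datasette import hookimpl


    @hookimpl
    def prepare_connection(conn):
        # Adds a custom SQL function:
        #   select reverse_string('hello')
        conn.create_function("reverse_string", 1, lambda s: s[::-1])


    @hookimpl
    def prepare_jinja2_environment(env):
        # Adds a custom template filter:
        #   {{ "hello"|uppercase }}
        env.filters["uppercase"] = lambda s: s.upper()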
Refs #14
This also stops it filling up the logs. This happens for HEAD requests
at the moment - which perhaps should be handled better, but that's a
different issue.
This renders unlabeled FKs as simple links. I can't see why this would
cause any major problems.
Also includes bonus fixes for two minor issues:
* In foreign key link hrefs the primary key was escaped using HTML
escaping rather than URL escaping. This broke some non-integer PKs.
* Print tracebacks to console when handling 500 errors.
This fixes `ERROR: conn=<sqlite3.Connection object at 0x10bbb9f10>, sql
= 'select ', params = {'id': '1'}` caused by an invalid query when
loading incoming FKs.
The error was ignored due to async but it still got printed to the
console.
We now display sort options as a select box plus a descending checkbox, which
means you can apply sort orders even in portrait mode on a mobile phone where
the column headers are hidden.
Closes #199
You can now explicitly set which columns in a table can be used for sorting
via the _sort and _sort_desc arguments, using metadata.json:
{
    "databases": {
        "database1": {
            "tables": {
                "example_table": {
                    "sortable_columns": [
                        "height",
                        "weight"
                    ]
                }
            }
        }
    }
}
Refs #189
Verifies that they match an existing column, and that only one or the other
option is provided - refs #189
Uses a new DatasetteError exception that closes #193
New _shape= parameter replacing old .jsono extension
Now instead of this:
/database/table.jsono
We use the _shape parameter like this:
/database/table.json?_shape=objects
Also introduced a new _shape called 'object' which looks like this:
/database/table.json?_shape=object
Returning an object for the rows key:
...
"rows": {
"pk1": {
...
},
"pk2": {
...
}
}
Refs #122
If you set the source_url/license_url/source/license fields in your root
metadata those values will now be inherited all the way down to the database
and table templates.
The title/description are NOT inherited.
Also added unit tests for the HTML generated by the metadata.
Refs #185
Also added support for descriptions and HTML descriptions.
Here's an example metadata.json file illustrating custom per-database and per-
table metadata:
{
    "title": "Overall datasette title",
    "description_html": "This is a <em>description with HTML</em>.",
    "databases": {
        "db1": {
            "title": "First database",
            "description": "This is a string description & has no HTML",
            "license_url": "http://example.com/",
            "license": "The example license",
            "queries": {
                "canned_query": "select * from table1 limit 3;"
            },
            "tables": {
                "table1": {
                    "title": "Custom title for table1",
                    "description": "Tables can have descriptions too",
                    "source": "This has a custom source",
                    "source_url": "http://example.com/"
                }
            }
        }
    }
}
Closes #165, refs #164
Named canned queries can now be defined in metadata.json like this:
{
    "databases": {
        "timezones": {
            "queries": {
                "timezone_for_point": "select tzid from timezones ..."
            }
        }
    }
}
These will be shown in a new "Queries" section beneath "Views" on the database page.
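Each named query gets its own URL - with the metadata above the query would be
executed at:

    /timezones/timezone_for_point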
As part of this, I refactored the logic for the database index page. It used
to combine the functionality for listing available tables and the
functionality for executing custom SQL queries in a single template and view.
I have split that template out into database.html and query.html and reworked
the view to more clearly separate the custom SQL executing code.
Refs #20
You can now tell Datasette to serve static files from a specific location at a
specific mountpoint.
For example:
datasette serve mydb.db --static extra-css:/tmp/static/css
Now if you visit this URL:
http://localhost:8001/extra-css/blah.css
The following file will be served:
/tmp/static/css/blah.css
Refs #160
It is now possible to over-ride templates on a per-database, per-table or
per-row basis.
When you access e.g. /mydatabase/mytable Datasette will look for the following:
- table-mydatabase-mytable.html
- table.html
If you provided a --template-dir argument to datasette serve it will look in
that directory first.
The lookup rules are as follows:
Index page (/):
index.html
Database page (/mydatabase):
database-mydatabase.html
database.html
Table page (/mydatabase/mytable):
table-mydatabase-mytable.html
table.html
Row page (/mydatabase/mytable/id):
row-mydatabase-mytable.html
row.html
If a table name has spaces or other unexpected characters in it, the template
filename will follow the same rules as our custom <body> CSS classes
introduced in 8ab3a169d4 - for example, a table called "Food Trucks"
will attempt to load the following templates:
table-mydatabase-Food-Trucks-399138.html
table.html
It is possible to extend the default templates using Jinja template
inheritance. If you want to customize EVERY row template with some additional
content you can do so by creating a row.html template like this:
{% extends "default:row.html" %}
{% block content %}
<h1>EXTRA HTML AT THE TOP OF THE CONTENT BLOCK</h1>
<p>This line renders the original block:</p>
{{ super() }}
{% endblock %}
Closes #12, refs #153
You can now pass an additional argument specifying a directory to look for
custom templates in.
Datasette will fall back on the default templates if a template is not
found in that directory.
Refs #12, #153
Refs #153
Every template now gets CSS classes in the body designed to support custom
styling.
The index template (the top level page at /) gets this:
<body class="index">
The database template (/dbname/) gets this:
<body class="db db-dbname">
The table template (/dbname/tablename) gets:
<body class="table db-dbname table-tablename">
The row template (/dbname/tablename/rowid) gets:
<body class="row db-dbname table-tablename">
The db-x and table-x classes use the database or table names themselves IF
they are valid CSS identifiers. If they aren't, we strip any invalid
characters out and append a 6 character md5 digest of the original name, in
order to ensure that multiple tables which resolve to the same stripped
character version still have different CSS classes.
Some examples (extracted from the unit tests):
"simple" => "simple"
"MixedCase" => "MixedCase"
"-no-leading-hyphens" => "no-leading-hyphens-65bea6"
"_no-leading-underscores" => "no-leading-underscores-b921bc"
"no spaces" => "no-spaces-7088d7"
"-" => "336d5e"
"no $ characters" => "no--characters-59e024"
A mechanism in the metadata.json format for adding custom CSS and JS urls.
Create a metadata.json file that looks like this:
{
    "extra_css_urls": [
        "https://simonwillison.net/static/css/all.bf8cd891642c.css"
    ],
    "extra_js_urls": [
        "https://code.jquery.com/jquery-3.2.1.slim.min.js"
    ]
}
Then start datasette like this:
datasette mydb.db --metadata=metadata.json
The CSS and JavaScript files will be linked in the <head> of every page.
You can also specify a SRI (subresource integrity hash) for these assets:
{
    "extra_css_urls": [
        {
            "url": "https://simonwillison.net/static/css/all.bf8cd891642c.css",
            "sri": "sha384-9qIZekWUyjCyDIf2YK1FRoKiPJq4PHt6tp/ulnuuyRBvazd0hG7pWbE99zvwSznI"
        }
    ],
    "extra_js_urls": [
        {
            "url": "https://code.jquery.com/jquery-3.2.1.slim.min.js",
            "sri": "sha256-k2WSCIexGzOj3Euiig+TlR8gA0EmPjuc79OEeY5L45g="
        }
    ]
}
Modern browsers will only execute the stylesheet or JavaScript if the SRI hash
matches the content served. You can generate hashes using www.srihash.org
The row page for rows that linked to the same table in more than one column
was displayed incorrectly. Fixed that and added a test.
Also introduced /db/table/row-pk.json?_extras=foreign_key_tables
This is used by the new unit test, but is the first example of a new
?_extras=comma-separated-list pattern I am introducing.
This:
?_filter_column_1=name&_filter_op_1=contains&_filter_value_1=hello
&_filter_column_2=age&_filter_op_2=gte&_filter_value_2=12
Now redirects to this:
?name__contains=hello&age__gte=12
This is needed for the filter editing interface, refs #86
URL shortcut for counting rows grouped by one or more columns.
?_group_count=column1&_group_count=column2 works as well.
SQL generated looks like this:
select "qSpecies", count(*) as "count"
from Street_Tree_List
group by "qSpecies"
order by "count" desc limit 100
Or for two columns like this:
select "qSpecies", "qSiteInfo", count(*) as "count"
from Street_Tree_List
group by "qSpecies", "qSiteInfo"
order by "count" desc limit 100
Refs #44
Still todo: clean up code a bunch (it currently fakes being a 'view'), get
foreign key expansion working.
if filter_op contains a __ the value is set to the right hand side.
e.g.
?_filter_column=col&_filter_op=isnull__1&_filter_value=x
Redirects to:
?col__isnull=1
Refs #86
Part of implementing the filters UI (refs #86) - the following:
/trees/Trees?_filter_column=SiteOrder&_filter_op=gt&_filter_value=2
Now redirects to this:
/trees/Trees?SiteOrder__gt=2
If a table has foreign key columns, and those foreign key tables have
label_columns, the TableView will now query those other tables for the
corresponding values and display those values as links in the corresponding
table cells.
label_columns are currently detected by the inspect() function, which looks
for any table that has just two columns - an ID column and one other - and
sets the label_column to be that second non-ID column.
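That detection rule is simple enough to sketch (detect_label_column is a
hypothetical helper name, and this assumes the ID column is literally named
"id"):

    def detect_label_column(column_names):
        # A two-column table with an "id" column gets the other column
        # as its label_column; everything else gets no label_column
        if len(column_names) == 2 and "id" in column_names:
            return [name for name in column_names if name != "id"][0]
        return None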
Added a unit test for the sql_time_limit_ms option.
To test this, I needed to add a custom SQLite sleep() function. I've added a
simple mechanism to the Datasette class for registering custom functions.
I also had to modify the sqlite_timelimit() function. It makes use of a magic
value, N, which is the number of SQLite virtual machine instructions that
should execute in between calls to my termination decision function.
The value of N was not finely grained enough for my test to work - so I've
added logic that says that if the time limit is less than 50ms, N is set to 1.
This got the tests working.
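A sketch of the resulting mechanism, built on SQLite's progress handler (the
default N of 1000 here is illustrative):

    import time
    from contextlib import contextmanager


    @contextmanager
    def sqlite_timelimit(conn, ms):
        deadline = time.time() + (ms / 1000)
        # N = number of SQLite virtual machine instructions between calls
        # to the handler; short time limits need a much finer N
        n = 1000 if ms >= 50 else 1
        # Returning a truthy value from the handler aborts the query
        conn.set_progress_handler(lambda: time.time() >= deadline, n)
        try:
            yield
        finally:
            conn.set_progress_handler(None, n)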
Refs #95
The serve command now accepts --sql_time_limit_ms for customizing the SQL time
limit.
The publish and package commands now accept --extra-options which can be used
to specify additional options to be passed to the datasette serve command when
it executes inside the resulting Docker containers.
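For example:

    datasette publish now mydb.db --extra-options="--sql_time_limit_ms 4000"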
If someone executes 'select * from table' against a table with a million rows
in it, we could run into problems: just serializing that much data as JSON is
likely to lock up the server.
Solution: we now have a hard limit on the maximum number of rows that can be
returned by a query. If that limit is exceeded, the server will return a
`"truncated": true` field in the JSON.
This limit can be optionally controlled by the new `--max_returned_rows`
option. Setting that option to 0 disables the limit entirely.
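For example:

    datasette serve mydb.db --max_returned_rows 2000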
Closes #69
If provided, the --metadata option is the path to a JSON file containing
metadata that should be displayed alongside the dataset.
datasette /tmp/fivethirtyeight.db --metadata /tmp/metadata.json
Currently that metadata format looks like this:
{
    "title": "Five Thirty Eight",
    "license": "CC Attribution 4.0 License",
    "license_url": "http://creativecommons.org/licenses/by/4.0/",
    "source": "fivethirtyeight/data on GitHub",
    "source_url": "https://github.com/fivethirtyeight/data"
}
If provided, this will be used by the index template and to populate the
common footer.
The publish command also accepts this argument, and will package any provided
metadata up and include it with the resulting Docker container.
datasette publish --metadata /tmp/metadata.json /tmp/fivethirtyeight.db
Closes #68
Building metadata is now optional. If you want to do it, do this:
datasette build *.db --metadata=metadata.json
Then when you run the server you can tell it to read from metadata:
datasette serve *.db --metadata=metadata.json
The Dockerfile generated by datasette publish now uses this mechanism.
Closes #60