For large tables, counting the number of rows in the table can take a
significant amount of time. Instead, where an inspect-file is provided
for an immutable database, look up the row-count for a plain count(*).
Thanks, @kevindkeogh
Queries with reserved words or characters according to the SQLite
FTS5 query language could cause errors.
Queries are now escaped like so:
dog cat => "dog" "cat"
The index page now only shows row counts for immutable databases OR for
databases with less than 30 tables provided it could get a count for
each of those tables in less than 10ms.
Closes#467, Refs #460
If we have less than 5 tables we now also show one or more views in the
summary on the homepage.
Also corrected the logic for the row counts - we now count hidden and
visible tables separately.
Closes#373, Refs #460
I've run the black code formatting tool against everything:
black tests datasette setup.py
I also added a new unit test, in tests/test_black.py, which will fail if the code does not
conform to black's exacting standards.
This unit test only runs on Python 3.6 or higher, because black itself doesn't run on 3.5.
Datasette previously only supported one type of faceting: exact column value counting.
With this change, faceting logic is extracted out into one or more separate classes which can implement other patterns of faceting - this is discussed in #427, but potential upcoming facet types include facet-by-date, facet-by-JSON-array, facet-by-many-2-many and more.
A new plugin hook, register_facet_classes, can be used by plugins to add in additional facet classes.
Each class must implement two methods: suggest(), which scans columns in the table to decide if they might be worth suggesting for faceting, and facet_results(), which executes the facet operation and returns results ready to be displayed in the UI.
Also introduced a mechanism whereby table counts are calculated against a time limit
but immutable databases have their table counts calculated on server startup.
Prior to this commit Datasette would calculate the content hash of every
database and redirect to a URL containing that hash, like so:
https://v0-27.datasette.io/fixtures => https://v0-27.datasette.io/fixtures-dd88475
This assumed that all databases were opened in immutable mode and were not
expected to change.
This will be changing as a result of #419 - so this commit takes the first step
in implementing that change by changing this default behaviour. Datasette will
now only redirect hash-free URLs under two circumstances:
* The new `hash_urls` config option is set to true (it defaults to false).
* The user passes `?_hash=1` in the URL
If you start Datasette with no files, it will connect to :memory: instead.
When starting it with files you can add --memory to also get a :memory: database.
* table.csv?_stream=1 to download all rows - refs #266
This option causes Datasette to serve ALL rows in the table, by internally
following the _next= pagination links and serving everything out as a stream.
Also added new config option, allow_csv_stream, which can be used to disable
this feature.
* New config option max_csv_mb limiting size of CSV export
These new querystring arguments can be used to request expanded foreign keys
in both JSON and CSV formats.
?_labels=on turns on expansions for ALL foreign key columns
?_label=COLUMN1&_label=COLUMN2 can be used to pick specific columns to expand
e.g. `Street_Tree_List.json?_label=qSpecies&_label=qLegalStatus`
{
"rowid": 233,
"TreeID": 121240,
"qLegalStatus": {
"value" 2,
"label": "Private"
}
"qSpecies": {
"value": 16,
"label": "Sycamore"
}
"qAddress": "91 Commonwealth Ave",
...
}
The labels option also works for the HTML and CSV views.
HTML defaults to `?_labels=on`, so if you pass `?_labels=off` you can disable
foreign key expansion entirely - or you can use `?_label=COLUMN` to request
just specific columns.
If you expand labels on CSV you get additional columns in the output:
`/Street_Tree_List.csv?_label=qLegalStatus`
rowid,TreeID,qLegalStatus,qLegalStatus_label...
1,141565,1,Permitted Site...
2,232565,2,Undocumented...
I also refactored the existing foreign key expansion code.
Closes#233. Refs #266.
Tables and custom SQL query results can now be exported as CSV.
The easiest way to do this is to use the .csv extension, e.g.
/test_tables/facet_cities.csv
By default this is served as Content-Type: text/plain so you can see it in
your browser. If you want to download the file (using text/csv and with an
appropriate Content-Disposition: attachment header) you can do so like this:
/test_tables/facet_cities.csv?_dl=1
We link to the CSV and downloadable CSV URLs from the table and query pages.
The links use ?_size=max and so by default will return 1,000 rows.
Also fixes#303 - table names ending in .json or .csv are now detected and
URLs are generated that look like this instead:
/test_tables/table%2Fwith%2Fslashes.csv?_format=csv
The ?_format= option is available for everything else too, but we link to the
.csv / .json versions in most cases because they are aesthetically pleasing.
https://github.com/pytest-dev/pytest/issues/1875 made it impossible to declare
a function as a fixture multiple times, which we were doing across different
modules. The fix was to move our @pytest.fixture calls into decorators in the
tests/fixtures.py module.
Removed the --page_size= argument to datasette serve in favour of:
datasette serve --config default_page_size:50 mydb.db
Added new help section:
$ datasette --help-config
Config options:
default_page_size Default page size for the table view
(default=100)
max_returned_rows Maximum rows that can be returned from a table
or custom query (default=1000)
sql_time_limit_ms Time limit for a SQL query in milliseconds
(default=1000)
default_facet_size Number of values to return for requested facets
(default=30)
facet_time_limit_ms Time limit for calculating a requested facet
(default=200)
facet_suggest_time_limit_ms Time limit for calculating a suggested facet
(default=50)
Every now and then a test will fail in Travis CI on Python 3.5 because it hit
the default 20ms SQL time limit.
Test fixtures now default to a 200ms time limit, and we only use the 20ms time
limit for the specific test that tests query interruption. This should make
our tests on Python 3.5 in Travis much more stable.
* Default is now ?_shape=arrays (renamed from lists)
* New ?_shape=array returns an array of objects as the root object
* Changed ?_shape=object to return the object as the root
* Updated docs
* New --plugins-dir=plugins/ option
New option causing Datasette to load and evaluate all of the Python files in
the specified directory and register any plugins that are defined in those
files.
This new option is available for the following commands:
datasette serve mydb.db --plugins-dir=plugins/
datasette publish now/heroku mydb.db --plugins-dir=plugins/
datasette package mydb.db --plugins-dir=plugins/
* Unit tests for --plugins-dir=plugins/
Closes#211
You can now explicitly set which columns in a table can be used for sorting
using the _sort and _sort_desc arguments using metadata.json:
{
"databases": {
"database1": {
"tables": {
"example_table": {
"sortable_columns": [
"height",
"weight"
]
}
}
}
}
}
Refs #189
Verifies that they match an existing column, and only one or the other option
is provided - refs #189
Eses a new DatasetteError exception that closes#193
New _shape= parameter replacing old .jsono extension
Now instead of this:
/database/table.jsono
We use the _shape parameter like this:
/database/table.json?_shape=objects
Also introduced a new _shape called 'object' which looks like this:
/database/table.json?_shape=object
Returning an object for the rows key:
...
"rows": {
"pk1": {
...
},
"pk2": {
...
}
}
Refs #122