New performance documentation, closes #421

pull/479/head
Simon Willison 2019-05-18 22:31:03 -07:00
parent db9dbfb816
commit 81ba98a509
3 changed files with 82 additions and 2 deletions

View file

@ -29,7 +29,7 @@ If this time limit is too short for you, you can customize it using the ``sql_ti
datasette mydatabase.db --config sql_time_limit_ms:3500
You can optionally set a lower time limit for an individual query using the ``?_timelimit=100`` query string argument::
You can optionally set a lower time limit for an individual query using the ``?_timelimit=100`` querystring argument::
/my-database/my-table?qSpecies=44&_timelimit=100
@ -112,6 +112,8 @@ Enable/disable the ability for users to run custom SQL directly against a databa
datasette mydatabase.db --config allow_sql:off
.. _config_default_cache_ttl:
default_cache_ttl
-----------------
@ -121,6 +123,8 @@ Default HTTP caching max-age header in seconds, used for ``Cache-Control: max-ag
datasette mydatabase.db --config default_cache_ttl:60
.. _config_default_cache_ttl_hashed:
default_cache_ttl_hashed
------------------------
@ -198,7 +202,7 @@ hash_urls
When enabled, this setting causes Datasette to append a content hash of the
database file to the URL path for every table and query within that database.
When combined with far-future expire headers this ensures that queries can be
When combined with far-future expire headers this ensures that queries can be
cached forever, safe in the knowledge that any modifications to the database
itself will result in new, uncached URL paths.

View file

@ -24,6 +24,7 @@ Contents
publish
json_api
sql_queries
performance
csv_export
facets
full_text_search

View file

@ -0,0 +1,75 @@
.. _performance:
Performance and caching
=======================
Datasette runs on top of SQLite, and SQLite has excellent performance. For small databases almost any query should return in just a few milliseconds, and larger databases (100s of MBs or even GBs of data) should perform extremely well provided your queries make sensible use of database indexes.
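For example, if a column is frequently used for filtering, adding a SQLite index on it before you publish the file can make the difference between a full table scan and an instant lookup. A quick illustration (the ``trees.db`` file, table and column names here are purely hypothetical)::
sqlite3 trees.db "CREATE INDEX idx_trees_species ON trees(species);"
sqlite3 trees.db "EXPLAIN QUERY PLAN SELECT * FROM trees WHERE species = 'Oak';"
The second command should report a ``SEARCH ... USING INDEX`` plan rather than a full ``SCAN`` of the table.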
That said, there are a number of tricks you can use to improve Datasette's performance.
Immutable mode
--------------
If you can be certain that a SQLite database file will not be changed by another process you can tell Datasette to open that file in *immutable mode*.
Doing so will disable all locking and change detection, which can result in improved query performance.
This also enables further optimizations relating to HTTP caching, described below.
To open a file in immutable mode pass it to the datasette command using the ``-i`` option::
datasette -i data.db
When you open a file in immutable mode like this, Datasette will also calculate and cache the row counts for each table in that database when it first starts up, further improving performance.
Using "datasette inspect"
-------------------------
Counting the rows in a table can be a very expensive operation on larger databases. In immutable mode Datasette performs this count only once and caches the results, but this can still cause server startup time to increase by several seconds or more.
If you know that a database is never going to change, you can precalculate the table row counts once and store them in a JSON file, then use that file when you later start the server.
To create a JSON file containing the calculated row counts for a database, use the following::
datasette inspect data.db --inspect-file=counts.json
Then later you can start Datasette against the ``counts.json`` file and use it to skip the row counting step and speed up server startup::
datasette -i data.db --inspect-file=counts.json
You need to use the ``-i`` immutable mode against the database file here or the counts from the JSON file will be ignored.
You will rarely need to use this optimization in everyday use, but several of the ``datasette publish`` commands described in :ref:`publishing` rely on it for better performance when deploying a database file to a hosting provider.
HTTP caching
------------
If your database is immutable and guaranteed not to change, you can gain major performance improvements from Datasette by enabling HTTP caching.
This can work at two different levels. First, it can tell browsers to cache the results of queries and serve future requests from the browser cache.
More significantly, it allows you to run Datasette behind a caching proxy such as `Varnish <https://varnish-cache.org/>`__ or use a cache provided by a hosted service such as `Fastly <https://www.fastly.com/>`__ or `Cloudflare <https://www.cloudflare.com/>`__. This can provide incredible speed-ups since a query only needs to be executed by Datasette the first time it is accessed - all subsequent hits can then be served by the cache.
Using a caching proxy in this way could enable a Datasette-backed visualization to serve thousands of hits a second while running Datasette itself on extremely inexpensive hosting.
Datasette's integration with HTTP caches can be enabled using a combination of configuration options and querystring arguments.
The :ref:`config_default_cache_ttl` setting sets the default HTTP cache TTL for all Datasette pages. This is 5 seconds unless you change it - you can set it to 0 if you wish to disable HTTP caching entirely.
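As a sketch (``data.db``, the table name and the one hour TTL are arbitrary choices; ``localhost:8001`` is Datasette's default address), you can raise the default and then confirm the resulting ``Cache-Control: max-age`` header with curl from a second terminal while the server is running::
datasette -i data.db --config default_cache_ttl:3600
curl -s -i 'http://localhost:8001/data/mytable.json' | grep -i cache-control
The response should include a ``Cache-Control: max-age=3600`` header.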
You can also change the cache timeout on a per-request basis using the ``?_ttl=10`` querystring parameter. This can be useful when you are working with the Datasette JSON API - you may decide that a specific query can be cached for a longer time, or you may need to set ``?_ttl=0`` for certain requests, for example if you are running a SQL ``order by random()`` query.
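For example (the database, table and SQL here are placeholders), a random-sample query that should never be served from cache can opt out like this::
/mydatabase.json?sql=select+*+from+mytable+order+by+random()+limit+10&_ttl=0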
Hashed URL mode
---------------
When you open a database file in immutable mode using the ``-i`` option, Datasette calculates a SHA-256 hash of the contents of that file on startup. This content hash can then optionally be used to create URLs that are guaranteed to change if the contents of the file change in the future. This results in URLs that can then be cached indefinitely by both browsers and caching proxies - an enormous potential performance optimization.
You can enable these hashed URLs in two ways: using the :ref:`config_hash_urls` configuration setting (which affects all requests to Datasette) or via the ``?_hash=1`` querystring parameter (which only applies to the current request).
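As a sketch (``data.db`` is a placeholder, and the on/off value follows the convention used by the other boolean settings in this documentation), enabling it for every request looks something like::
datasette -i data.db --config hash_urls:on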
With hashed URLs enabled, any request to e.g. ``/mydatabase/mytable`` will 302 redirect to ``/mydatabase-455fe3a/mytable``. The URL containing the hash will be served with a very long cache expire header - configured using :ref:`config_default_cache_ttl_hashed` which defaults to 365 days.
Since these responses are cached for a long time, you may wish to build API clients against the non-hashed version of these URLs. These 302 redirects are served extremely quickly, so this should still be a performant way to work against the Datasette API.
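To see the redirect for yourself you can inspect the response headers - here ``localhost:8001`` is Datasette's default address and ``455fe3a`` re-uses the illustrative hash from above::
curl -s -i 'http://localhost:8001/mydatabase/mytable' | head -n 5
The response should be a ``302`` whose ``Location`` header points at ``/mydatabase-455fe3a/mytable``; an API client can keep requesting the stable, non-hashed URL and simply follow the redirect (for example with ``curl -L``).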
If you run Datasette behind an `HTTP/2 server push <https://en.wikipedia.org/wiki/HTTP/2_Server_Push>`__ aware proxy such as Cloudflare, Datasette will serve the 302 redirects in such a way that the redirected page will be efficiently "pushed" to the browser as part of the response, without the browser needing to make a second HTTP request to fetch the redirected resource.
.. note::
Prior to Datasette 0.28, hashed URL mode was the default behaviour, since all database files were assumed to be immutable and unchanging. From 0.28 onwards the default has been to treat database files as mutable unless explicitly configured otherwise.