You can now call arbitrary SQL like this:
/flights?sql=select%20*%20from%20airports%20where%20country%20like%20:c&c=iceland
Unescaped, those querystring params look like this:
sql = select * from airports where country like :c
c = iceland
So SQL can be constructed with named parameters embedded in it, which will
then be read from the querystring and correctly escaped.
This means we can aggressively filter the SQL parameter for potentially
dangerous syntax. For the moment we enforce that it starts with a SELECT
statement and we ban the sequence "pragma" from it entirely.
If you need to use pragma in a query, you can use the new named parameter
mechanism.
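
A minimal sketch of the idea, using the standard library sqlite3 module and an
in-memory table. The helper names validate_sql_select and run_query are
hypothetical and only illustrate the check plus named-parameter binding; they
are not necessarily how the view code is structured.

```python
import sqlite3


def validate_sql_select(sql):
    """Hypothetical helper: allow only statements that begin with SELECT."""
    cleaned = sql.strip().lower()
    if not cleaned.startswith("select"):
        raise ValueError("Statement must begin with SELECT")
    if "pragma" in cleaned:
        raise ValueError("Statement may not contain PRAGMA")


def run_query(conn, sql, params):
    validate_sql_select(sql)
    # sqlite3 binds :name placeholders from a dict, so parameter values are
    # never interpolated into the SQL text itself.
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table airports (name text, country text)")
conn.execute("insert into airports values ('Keflavik', 'Iceland')")
conn.execute("insert into airports values ('Heathrow', 'United Kingdom')")

# Equivalent to ?sql=select * from airports where country like :c&c=iceland
rows = run_query(
    conn,
    "select * from airports where country like :c",
    {"c": "iceland"},
)
```

Because the word "pragma" is only banned from the SQL text, it can still be
passed as a parameter value, e.g. run_query(conn, "select :p", {"p": "pragma"}).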
Fixes #39
SQLite operations are blocking, but we're running everything in Sanic, an
asyncio web framework, so blocking operations are bad - a long-running DB
operation could hold up the entire server.
Instead, I've moved all SQLite operations into threads. These are managed by a
concurrent.futures ThreadPoolExecutor. This means I can run up to X queries in
parallel, and I can continue to queue up additional incoming HTTP traffic
while the threadpool is busy.
Each thread is responsible for managing its own SQLite connections - one per
database. These are cached in a threadlocal.
Since we are working with immutable, read-only SQLite databases it should be
safe to share SQLite objects across threads. On this assumption I'm using the
check_same_thread=False option. Opening a database connection looks like this:
    conn = sqlite3.connect(
        'file:filename.db?immutable=1',
        uri=True,
        check_same_thread=False,
    )
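
Roughly, the pattern looks like the sketch below. It uses plain asyncio rather
than Sanic so that it runs on its own; the pool size of 3 and the names
get_connection, run_sql, execute and make_demo_db are assumptions made for
illustration.

```python
import asyncio
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed pool size, chosen only for this example.
executor = ThreadPoolExecutor(max_workers=3)
connections = threading.local()


def get_connection(path):
    # Each worker thread lazily opens and caches one connection per database.
    cache = getattr(connections, "cache", None)
    if cache is None:
        cache = connections.cache = {}
    if path not in cache:
        cache[path] = sqlite3.connect(
            "file:{}?immutable=1".format(path),
            uri=True,
            check_same_thread=False,
        )
    return cache[path]


def run_sql(path, sql, params=None):
    # Blocking call; only ever invoked inside a pool thread.
    return get_connection(path).execute(sql, params or {}).fetchall()


async def execute(path, sql, params=None):
    # Hand the blocking work to the pool so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, run_sql, path, sql, params)


def make_demo_db():
    # Build a small throwaway database file to query.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("create table t (n integer)")
    conn.executemany("insert into t values (?)", [(1,), (2,), (3,)])
    conn.commit()
    conn.close()
    return path


demo_path = make_demo_db()
demo_rows = asyncio.run(execute(demo_path, "select sum(n) from t"))
```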
The following articles were helpful in figuring this out:
* https://pymotw.com/3/asyncio/executors.html
* https://marlinux.wordpress.com/2017/05/19/python-3-6-asyncio-sqlalchemy/
Closes #45. Refs #38.
I now call a factory function to construct the Sanic app:
    app = app_factory(files)
This allows me to pass additional arguments to it, e.g. the files to serve.
Also refactored my class-based views to accept jinja as an argument, e.g.:
    app.add_route(
        TableView.as_view(jinja),
        '/<db_name:[^/]+>/<table:[^/]+?><as_json:(.jsono?)?$>'
    )
I'm using click, and click recommends using a setup.py - so I've added one of
those. I also refactored code into a new datasite package. It's not quite
deploying to now properly at the moment though - I seem to have messed up the
path handling a bit.
Also snuck in a new template for the "Row" view.
Refs #40
Using the (undocumented in the Python docs) fact that if you return 1 from a
set_progress_handler callback, SQLite will cancel the current query.
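
A sketch of how a time limit can be built on this. The helper name
execute_with_time_limit and the 1000-instruction interval are assumptions for
the example, not necessarily the values used in the code. When the handler
returns a non-zero value, sqlite3 raises OperationalError.

```python
import sqlite3
import time


def execute_with_time_limit(conn, sql, limit_seconds, every_n_instructions=1000):
    deadline = time.monotonic() + limit_seconds

    def handler():
        # A non-zero return value asks SQLite to abort the running query.
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(handler, every_n_instructions)
    try:
        return conn.execute(sql).fetchall()
    finally:
        # Passing None removes the handler again.
        conn.set_progress_handler(None, every_n_instructions)


conn = sqlite3.connect(":memory:")

# Deliberately slow query: counts the rows of a long recursive sequence.
slow_sql = """
with recursive counter(n) as (
    select 1
    union all
    select n + 1 from counter where n < 50000000
)
select count(*) from counter
"""

try:
    execute_with_time_limit(conn, slow_sql, 0.05)
    interrupted = False
except sqlite3.OperationalError:
    interrupted = True
```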
Closes #35
Expecting SQLite columns to all be valid UTF-8 doesn't work, because we are
dealing with all kinds of databases. Instead, we now decode using the
'replace' error-handling mode, which swaps any bytes that are not valid UTF-8
for the U+FFFD replacement character.
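
For illustration, one way to get this behaviour from the sqlite3 module is a
custom text_factory. This is a sketch under that assumption; the table and the
sample bytes are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Decode TEXT values as UTF-8, substituting U+FFFD for undecodable bytes
# instead of raising an error.
conn.text_factory = lambda raw: str(raw, "utf-8", "replace")

conn.execute("create table t (value text)")
# x'636166E9' is "caf" followed by a Latin-1 e-acute byte, which is not
# valid UTF-8. The cast stores those bytes as TEXT without validating them.
conn.execute("insert into t values (cast(x'636166E9' as text))")

value = conn.execute("select value from t").fetchone()[0]
```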
Since the URL now includes a hash of the database, we can return a
Cache-Control: max-age=31536000 header for every response.
The exception is our 302 redirects. These we now serve with a Link: header
that tells any HTTP/2 server-push aware fronting proxies (such as Cloudfront)
to push the target of the redirect.
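
As a rough sketch of the header logic: the function name response_headers is
hypothetical, and the rel=preload form of the Link header is an assumption
based on the usual convention for hinting HTTP/2 push.

```python
def response_headers(status, redirect_target=None):
    """Hypothetical helper returning the extra headers for a response."""
    if status == 302 and redirect_target is not None:
        # Redirects are not cached for a year; instead they hint to the
        # fronting proxy that it may push the redirect target.
        return {
            "Location": redirect_target,
            "Link": "<{}>; rel=preload".format(redirect_target),
        }
    # Everything else lives at a URL containing the database hash, so it can
    # be cached for a year (31536000 seconds).
    return {"Cache-Control": "max-age=31536000"}


ok_headers = response_headers(200)
# Hypothetical hashed URL, used only as sample input.
redirect_headers = response_headers(302, "/flights-07d1283/airports")
```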
Closes #4
This will be run at compile time - the goal is to generate a
build-metadata.json file with a bunch of useful facts about the databases
that could be expensive to generate at run-time.
Example metadata:
    {
        "flights": {
            "file": "flights.db",
            "tables": {
                "airlines": 6048,
                "airports": 8107,
                "routes": 67663
            },
            "hash": "07d1283e07786b1235bb7041ea445ae103d1571565580a29eab0203c555725fd"
        }
    }
So far we have a sha256 hash of the database file itself, plus a row count for
each table.
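
A sketch of how metadata in this shape could be produced with the standard
library. The function names inspect_database and build_metadata are
hypothetical, and the demo database is a throwaway file created just for the
example.

```python
import hashlib
import json
import os
import sqlite3
import tempfile


def inspect_database(path):
    # sha256 of the raw database file.
    with open(path, "rb") as fp:
        digest = hashlib.sha256(fp.read()).hexdigest()
    conn = sqlite3.connect(path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "select name from sqlite_master where type = 'table'"
            )
        ]
        # Row count per table; table names are quoted as identifiers.
        tables = {
            name: conn.execute(
                'select count(*) from "{}"'.format(name)
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
    return {"file": os.path.basename(path), "tables": tables, "hash": digest}


def build_metadata(paths):
    # Key each entry by the filename without its extension.
    return {
        os.path.splitext(os.path.basename(p))[0]: inspect_database(p)
        for p in paths
    }


# Demo: a tiny database with one table and two rows.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "flights.db")
demo_conn = sqlite3.connect(demo_path)
demo_conn.execute("create table airlines (name text)")
demo_conn.executemany("insert into airlines values (?)", [("A",), ("B",)])
demo_conn.commit()
demo_conn.close()

metadata = build_metadata([demo_path])
metadata_json = json.dumps(metadata, indent=4)
```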
Fixes #11