Moonstream architecture
Moonstream consists of:
- A slow (colder) data store in which we store large amounts of transactional data and metadata directly from various blockchains.
- A fast (warmer) data store in which we store data that streams in very quickly, for example from the Ethereum transaction pool. The data in the fast data store is not stored permanently. All data here is removed after a data-specific time-to-live (TTL).
- Crawlers which collect data from different blockchain-related sources and insert it into either the slow data store or the fast one.
- The Moonstream API, which lets users sign up to Moonstream and subscribe to different sources of data in Moonstream, and which serves their requests for this data.
- The Moonstream frontend (live) through which users can interact with Moonstream in their browsers.
- Moonstream client libraries through which users can interact with Moonstream from the programming environment of their choice.
This document gives a brief explanation of the role of each of these components and points you to more detailed information about whichever components you are particularly interested in.
It also tries to answer any questions you may have about why certain decisions/trade-offs were made.
Data storage
Fast vs. slow
Blockchains like Ethereum and Solana implement smart contract functionality by recording the state of accounts on the blockchain at every block. This record of state grows over time. Ethereum state already takes hundreds of gigabytes of storage. Solana state is even larger, and its historical state is hosted centrally on a Google Bigtable instance.
Moonstream is an open source project, and we intend for people to host Moonstream themselves. We cannot assume that someone hosting Moonstream has tons of cash to spend on high-quality storage (e.g. latest generation SSD). The most cost-effective way to store the large amount of state data (without relying on cloud object storage) is on a magnetic hard disk.
Although this makes storage cheaper, it also makes reads from and writes to the data store slower. Since some of our crawlers collect volatile data, like the data in the Ethereum transaction pool, we also need a storage layer that supports fast writes and fast reads.
This is why we have two different classes of storage in Moonstream.
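The TTL behavior of the fast data store can be illustrated with a minimal sketch. This is a toy in-memory model, not Moonstream's actual storage code: the EphemeralStore class, its methods, and the TTL values are all hypothetical and exist only to show how entries with a data-specific TTL expire.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class EphemeralStore:
    """Toy in-memory stand-in for a fast store whose entries expire."""

    # Hypothetical per-type TTLs in seconds; not Moonstream's real settings.
    ttl_seconds: Dict[str, float]
    # The clock is injectable so that expiry can be exercised without sleeping.
    clock: Callable[[], float] = time.monotonic
    _items: Dict[str, Tuple[str, Any, float]] = field(default_factory=dict)

    def put(self, key: str, data_type: str, value: Any) -> None:
        """Store a value that expires after the TTL of its data type."""
        expires_at = self.clock() + self.ttl_seconds[data_type]
        self._items[key] = (data_type, value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        """Return the value for key, or None if it is missing or expired."""
        item = self._items.get(key)
        if item is None:
            return None
        _, value, expires_at = item
        if self.clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._items[key]
            return None
        return value

    def purge(self) -> int:
        """Remove every expired entry and return how many were removed."""
        now = self.clock()
        expired = [k for k, (_, _, exp) in self._items.items() if now >= exp]
        for k in expired:
            del self._items[k]
        return len(expired)
```

A store created with ttl_seconds={"ethereum_txpool": 10.0} would return a transaction pool entry for ten seconds after put and None afterwards. Data that must outlive its TTL belongs in the slow data store instead.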
Slow data store: Postgres
We use a Postgres database as the slow data store. The code in the db/ directory defines the schema for this Postgres database as well as migrations that you can use to set up a similar database yourself.
The db/ directory contains:
- A Python package called moonstreamdb which defines the database schema and can be used as a Python library to interact with the data store.
- Alembic migrations, which can be run against a Postgres database server via the alembic.sh shell script.
The Ethereum blockchain crawler (accessed through the ethcrawler blocks command) stores Ethereum state in the slow database.
We also have other crawlers (e.g. the CoinMarketCap crawler) which store address and transaction metadata in the slow database. This is because the slow database is permanent whereas the fast database is assumed to be ephemeral.
Fast data store: Bugout
Since different crawlers store data in the fast data store using different schemas, we use Bugout as our fast data store with no extra assumptions about schema.
Bugout is open source and can be self-hosted as well from the following repositories:
Our Bugout instance also uses a Postgres database as the underlying data store. This Postgres server is provisioned on high-throughput SSD.
The crawlers that use the fast data store write to a single Bugout journal using a write-only token. Each crawler tags the data it writes with the type and any additional schema information.
The API reads from that journal using a read token. Queries are resolved using the tags that the crawlers created.
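The write-with-tags, read-by-tags flow described above can be sketched as follows. This is a toy in-memory journal and does not use the Bugout API or client library. The Journal and Entry classes and the tag names (such as "type:ethereum_txpool") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class Entry:
    """A single journal entry: schemaless content plus descriptive tags."""

    content: Dict[str, Any]
    tags: Set[str]


@dataclass
class Journal:
    """Toy in-memory journal. This is NOT the Bugout API."""

    entries: List[Entry] = field(default_factory=list)

    def write(self, content: Dict[str, Any], tags: Iterable[str]) -> None:
        """Crawler side: append an entry tagged with its type and schema info."""
        self.entries.append(Entry(content=content, tags=set(tags)))

    def search(self, required_tags: Iterable[str]) -> List[Entry]:
        """API side: return the entries that carry every required tag."""
        required = set(required_tags)
        return [entry for entry in self.entries if required <= entry.tags]


journal = Journal()
# Hypothetical tags; each crawler decides how to describe its own data.
journal.write({"hash": "0x1"}, ["type:ethereum_txpool", "client:example"])
journal.write({"symbol": "ETH"}, ["type:example_metadata"])
txpool_entries = journal.search(["type:ethereum_txpool"])
```

Because the journal makes no assumptions about the shape of content, each crawler can use its own schema, and readers only need to agree with writers on the tags.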
Crawlers
Many of the Moonstream crawlers are written in Python. These are all packaged together in a single Python package called mooncrawl.
Crawlers can be written in any programming language, and some languages are better suited to certain kinds of data. For example, we plan to write our Solana crawlers in Rust because library support for "Solana programs" (Solana's version of smart contracts) is much better in Rust, the language they are natively written in.
The Ethereum transaction pool crawler, for example, is written in Go.
Moonstream API
The Moonstream API is written in Python and uses the FastAPI framework.
API routes are defined in backend/moonstream/api.py, and that file is the right entrypoint into understanding the API codebase.
The API uses Bugout for authentication and to manage resources like user subscriptions to different types of data.
It also defines event providers, which are responsible for retrieving data of each available type (e.g. ethereum_blockchain, ethereum_txpool, etc.) from the fast and/or slow data stores and serving it to Moonstream users.
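One way to picture event providers is as a registry that maps each event type to a function that knows which data store to read. The sketch below is an assumption about the general shape of such a design, not the code in the Moonstream backend. The function names, the PROVIDERS registry, and the sample rows are hypothetical; only the event type names come from this document.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]
EventProvider = Callable[[str], List[Event]]

# Hypothetical sample rows standing in for the slow and fast data stores.
SLOW_STORE: List[Event] = [{"address": "0xabc", "block_number": 1}]
FAST_STORE: List[Event] = [{"address": "0xabc", "pending": True}]


def ethereum_blockchain_events(address: str) -> List[Event]:
    """Read matching rows from the slow store and label them with their type."""
    return [
        dict(row, event_type="ethereum_blockchain")
        for row in SLOW_STORE
        if row["address"] == address
    ]


def ethereum_txpool_events(address: str) -> List[Event]:
    """Read matching rows from the fast store and label them with their type."""
    return [
        dict(row, event_type="ethereum_txpool")
        for row in FAST_STORE
        if row["address"] == address
    ]


# Registry keyed by event type; supporting a new type means adding one entry.
PROVIDERS: Dict[str, EventProvider] = {
    "ethereum_blockchain": ethereum_blockchain_events,
    "ethereum_txpool": ethereum_txpool_events,
}


def get_events(address: str, event_types: Iterable[str]) -> List[Event]:
    """Collect events for an address from the provider of each requested type."""
    events: List[Event] = []
    for event_type in event_types:
        provider = PROVIDERS.get(event_type)
        if provider is None:
            raise ValueError(f"Unknown event type: {event_type}")
        events.extend(provider(address))
    return events
```

With this shape, the API layer does not need to know whether a given type lives in the slow or the fast data store; that choice stays inside each provider.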
Frontend
The Moonstream frontend is a React application. It uses the Chakra UI component library and react-query to manage data.
Client libraries
These are still under development. If you would like to build a Moonstream client library for your favorite language, reach out to @zomglings on Discord.
These are the languages we currently have libraries for:
Python
This is a work in progress. Pull request.