Mirror of https://github.com/bugout-dev/moonstream

Add readme.

parent 34e13c1519
commit 8e0ce97989

@@ -0,0 +1,180 @@

# Metadata Crawler Architecture

## Overview

The metadata crawler fetches and stores metadata for NFTs (Non-Fungible Tokens) from various blockchains. It supports both traditional database queries based on the TokenURI view method and Spire journal-based job configurations, and it can handle both v2 and v3 database structures.

## Core Components

### 1. Update Strategies

#### Leak-Based Strategy (Legacy v2)
- Uses a probabilistic approach to determine which tokens to update
- Controlled by the `max_recrawl` parameter
- Suitable for large collections with infrequent updates

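The probabilistic selection described above can be pictured with a minimal sketch. It assumes that each already-crawled token is re-selected with probability `max_recrawl / len(crawled_ids)`, so that roughly `max_recrawl` tokens are recrawled per run. The function name, the `seed` argument, and the exact formula are illustrative assumptions, not the crawler's actual code.

```python
import random
from typing import List, Optional, Sequence


def select_tokens_to_recrawl(
    crawled_ids: Sequence[str],
    max_recrawl: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a random subset of already-crawled tokens to refresh.

    Hypothetical helper: the leak rate formula below is an assumption.
    """
    if not crawled_ids or max_recrawl <= 0:
        return []
    rng = random.Random(seed)
    # Probability that any single token "leaks" back into the crawl queue.
    leak_rate = min(1.0, max_recrawl / len(crawled_ids))
    return [token_id for token_id in crawled_ids if rng.random() < leak_rate]
```

With a small collection the leak rate saturates at 1.0 and every token is recrawled; with a large collection only a fraction is refreshed on each run, which is why this approach fits collections that change rarely.
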
#### SQL-Based Strategy (v3)
- Uses SQL queries to determine which tokens need updates
- Tracks token updates more precisely
- Better suited to active collections

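As an illustration of the SQL-based idea, the sketch below selects tokens that either have no stored metadata or whose stored metadata is older than the latest known token URI. The table names (`token_uris`, `metadata_labels`), their columns, and the use of an in-memory SQLite database are all assumptions made for the example; the real v3 schema and queries may look quite different.

```python
import sqlite3
from typing import List

# Hypothetical query: a token needs an update when it has no metadata label
# or when its label was written before the latest token URI was observed.
STALE_TOKENS_QUERY = """
SELECT t.token_id
FROM token_uris AS t
LEFT JOIN metadata_labels AS m ON m.token_id = t.token_id
WHERE m.token_id IS NULL OR m.block_number < t.block_number
ORDER BY t.token_id
"""


def tokens_needing_update(conn: sqlite3.Connection) -> List[str]:
    """Return the IDs of tokens whose metadata is missing or stale."""
    return [row[0] for row in conn.execute(STALE_TOKENS_QUERY)]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE token_uris (token_id TEXT PRIMARY KEY, block_number INTEGER);
        CREATE TABLE metadata_labels (token_id TEXT PRIMARY KEY, block_number INTEGER);
        INSERT INTO token_uris VALUES ('1', 100), ('2', 200), ('3', 300);
        INSERT INTO metadata_labels VALUES ('1', 100), ('2', 150);
        """
    )
    return conn
```

In the sample data, token `1` is up to date, token `2` has stale metadata, and token `3` has never been crawled, so only `2` and `3` are selected.
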
### 2. Database Connections

The crawler supports multiple database connection strategies:
- Default Moonstream database connection
- Custom database URI via `--custom-db-uri`
- Per-customer instance connections (v3)

```json
{
    "customer_id": "...",
    "instance_id": "...",
    "blockchain": "ethereum",
    "v3": true
}
```

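One possible way to choose among these connection strategies is sketched below. The precedence order (custom URI first, then per-customer instance, then the default connection) is an assumption, as are the function name `resolve_db_uri` and the `instance_uri_lookup` callback, which stands in for whatever service maps a customer instance to a database URI.

```python
from typing import Callable, Optional


def resolve_db_uri(
    job: dict,
    default_uri: str,
    custom_db_uri: Optional[str] = None,
    instance_uri_lookup: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Decide which database a job should write to (hypothetical helper)."""
    # Assumed precedence 1: an explicit --custom-db-uri overrides everything.
    if custom_db_uri:
        return custom_db_uri
    # Assumed precedence 2: v3 jobs that name a customer instance.
    if job.get("v3") and job.get("customer_id") and job.get("instance_id"):
        if instance_uri_lookup is None:
            raise ValueError("per-customer job given but no instance lookup provided")
        return instance_uri_lookup(job["customer_id"], job["instance_id"])
    # Fallback: the default Moonstream database connection.
    return default_uri
```
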
### 3. Job Configuration
Jobs can be configured in two ways:
- Through Spire journal entries with tags `#metadata-job #{blockchain}`
- Direct database queries (legacy mode) using the TokenURI view method

Example Spire journal entry:

```json
{
    "type": "metadata-job",
    "query_api": {
        "name": "new_tokens_to_crawl",
        "params": {
            "address": "0x...",
            "blockchain": "ethereum"
        }
    },
    "contract_address": "0x...",
    "blockchain": "ethereum",
    "update_existing": false,
    "v3": true,
    "customer_id": "...",
    "instance_id": "..."
}
```

`customer_id` and `instance_id` are optional; they are only needed when the job targets a custom database.

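To show how a journal entry of this shape might be consumed, here is a minimal parsing sketch. The `MetadataJob` dataclass and `parse_job` function are hypothetical names introduced for illustration; the defaults chosen for the optional fields are also assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataJob:
    """Typed view of a metadata-job journal entry (hypothetical structure)."""

    contract_address: str
    blockchain: str
    query_name: str
    query_params: dict = field(default_factory=dict)
    update_existing: bool = False
    v3: bool = False
    customer_id: Optional[str] = None  # optional, for a custom database
    instance_id: Optional[str] = None  # optional, for a custom database


def parse_job(raw: str) -> MetadataJob:
    """Parse the JSON content of a journal entry into a MetadataJob."""
    entry = json.loads(raw)
    if entry.get("type") != "metadata-job":
        raise ValueError("not a metadata-job entry")
    query = entry["query_api"]
    return MetadataJob(
        contract_address=entry["contract_address"],
        blockchain=entry["blockchain"],
        query_name=query["name"],
        query_params=query.get("params", {}),
        update_existing=entry.get("update_existing", False),
        v3=entry.get("v3", False),
        customer_id=entry.get("customer_id"),
        instance_id=entry.get("instance_id"),
    )
```

Rejecting entries whose `type` is not `metadata-job` early keeps malformed journal content from reaching the crawl loop.
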
### 4. Data Flow
1. **Token Discovery**
   - Query API integration for dynamic token discovery
   - Database queries for existing tokens
   - Support for multiple addresses per job

2. **Metadata Fetching**
   - Parallel processing with `ThreadPoolExecutor`
   - IPFS gateway support
   - Automatic retry mechanism
   - Rate limiting and batch processing

3. **Storage**
   - Supports both v2 and v3 database structures
   - Batch upsert operations
   - Efficient cleaning of old labels

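The metadata fetching step can be sketched roughly as follows. This is a simplified illustration, not the crawler's code: the `fetch` callable is injected so the example stays independent of any HTTP library, the public `https://ipfs.io/ipfs/` gateway is an assumed default, and the helper names are hypothetical. Rate limiting is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional

IPFS_GATEWAY = "https://ipfs.io/ipfs/"  # assumed default gateway


def resolve_uri(uri: str, gateway: str = IPFS_GATEWAY) -> str:
    """Rewrite ipfs:// URIs so they can be fetched through an HTTP gateway."""
    if uri.startswith("ipfs://"):
        return gateway + uri[len("ipfs://"):]
    return uri


def fetch_with_retry(
    fetch: Callable[[str], Any], uri: str, retries: int = 3
) -> Optional[Any]:
    """Call fetch up to `retries` times; return None if every attempt fails."""
    for _ in range(retries):
        try:
            return fetch(resolve_uri(uri))
        except Exception:
            continue  # simple retry; a real crawler would likely log and back off
    return None


def fetch_all(
    fetch: Callable[[str], Any],
    token_uris: Dict[str, str],
    threads: int = 4,
    retries: int = 3,
) -> Dict[str, Optional[Any]]:
    """Fetch metadata for every token in parallel, keyed by token ID."""
    token_ids = list(token_uris)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(
            lambda token_id: fetch_with_retry(fetch, token_uris[token_id], retries),
            token_ids,
        )
        return dict(zip(token_ids, results))
```

Tokens whose fetch fails on every attempt map to `None`, which lets the storage step skip them or record the failure without aborting the whole batch.
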
### 5. Database Structures

v2:

```python
{
    "label": METADATA_CRAWLER_LABEL,
    "label_data": {
        "type": "metadata",
        "token_id": "...",
        "metadata": {...}
    },
    "block_number": 1234567890,
    "block_timestamp": 456
}
```

v3:

```python
{
    "label": METADATA_CRAWLER_LABEL,
    "label_type": "metadata",
    "label_data": {
        "token_id": "...",
        "metadata": {...}
    },
    "address": "0x...",
    "block_number": 123,
    "block_timestamp": 456,
    "block_hash": "0x..."
}
```

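The difference between the two layouts is easier to see when both are built from the same inputs. The helpers below are a hedged sketch: the function names are hypothetical, and the fallback label value `"metadata-crawler"` is a placeholder assumption used only when `METADATA_CRAWLER_LABEL` is not set in the environment.

```python
import os

# Placeholder fallback; the real default label is not specified in this document.
METADATA_CRAWLER_LABEL = os.environ.get("METADATA_CRAWLER_LABEL", "metadata-crawler")


def build_v2_label(token_id, metadata, block_number, block_timestamp):
    """v2 layout: the record type lives inside label_data."""
    return {
        "label": METADATA_CRAWLER_LABEL,
        "label_data": {
            "type": "metadata",
            "token_id": token_id,
            "metadata": metadata,
        },
        "block_number": block_number,
        "block_timestamp": block_timestamp,
    }


def build_v3_label(token_id, metadata, address, block_number, block_timestamp, block_hash):
    """v3 layout: label_type is a top-level field, plus address and block_hash."""
    return {
        "label": METADATA_CRAWLER_LABEL,
        "label_type": "metadata",
        "label_data": {
            "token_id": token_id,
            "metadata": metadata,
        },
        "address": address,
        "block_number": block_number,
        "block_timestamp": block_timestamp,
        "block_hash": block_hash,
    }
```
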
## Key Features

1. **Flexible Token Selection**
   - Query API integration
   - Support for multiple addresses
   - Configurable update strategies

2. **Efficient Processing**
   - Batch processing
   - Parallel metadata fetching
   - Optimized database operations

3. **Error Handling**
   - Retry mechanism for failed requests
   - Transaction management
   - Detailed logging

4. **Database Management**
   - Efficient upsert operations
   - Label cleaning
   - Version compatibility (v2/v3)

## Usage

### CLI Options

```bash
metadata-crawler crawl \
    --blockchain ethereum \
    --commit-batch-size 50 \
    --max-recrawl 300 \
    --threads 4 \
    --spire true \
    --custom-db-uri "postgresql://..."  # Optional
```

### Environment Variables
- `MOONSTREAM_ADMIN_ACCESS_TOKEN`: Required for API access
- `METADATA_CRAWLER_LABEL`: Label for database entries
- `METADATA_TASKS_JOURNAL_ID`: Journal ID for metadata tasks

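A small sketch of reading these variables at startup is shown below. The `load_settings` function, the keys of the returned dictionary, and the fallback label `"metadata-crawler"` are assumptions for illustration; only the variable names come from the list above.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect crawler settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("MOONSTREAM_ADMIN_ACCESS_TOKEN")
    if not token:
        # Fail fast: without the token no API access is possible.
        raise RuntimeError("MOONSTREAM_ADMIN_ACCESS_TOKEN is required for API access")
    return {
        "access_token": token,
        # Placeholder fallback label, not a documented default.
        "label": env.get("METADATA_CRAWLER_LABEL", "metadata-crawler"),
        "journal_id": env.get("METADATA_TASKS_JOURNAL_ID"),
    }
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real process environment.
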
### Database Modes

1. **Legacy Mode (v2)**
   - Uses leak-based update strategy
   - Single database connection
   - Simple metadata structure

2. **Modern Mode (v3)**
   - SQL-based update tracking
   - Support for multiple database instances
   - Enhanced metadata structure
   - Per-customer database isolation

## Best Practices

1. **Job Configuration**
   - Use descriptive job names
   - Group related addresses
   - Set appropriate update intervals

2. **Performance Optimization**
   - Adjust batch sizes based on network conditions
   - Tune the thread count against observed performance
   - Use appropriate IPFS gateways

3. **Maintenance**
   - Clean old labels regularly
   - Monitor database size
   - Check for failed metadata fetches