# mastodon-stream

# Setup virtual python environment
Optionally, you can use a [virtual python](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) environment to keep dependencies separate. The _venv_ module is the preferred way to create and manage virtual environments.
```console
python3 -m venv env
```
Before you can start installing or using packages in your virtual environment, you'll need to activate it.
```console
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
# Federated timeline
These are the most recent public posts from people on this and other servers of the decentralized network that this server knows about.
https://data-folks.masto.host/public
# Producer
```console
python mastodonlisten.py --baseURL https://data-folks.masto.host/ --enableKafka
```
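The producer subscribes to the public streaming timeline and forwards each status to Kafka. A minimal sketch of the flattening step it presumably performs (field names here are illustrative, not taken from `mastodonlisten.py`; the real script uses the Mastodon streaming API):

```python
def flatten_status(status: dict) -> dict:
    """Pick the fields from a Mastodon status object worth producing
    to the Kafka topic. Shape mirrors the Mastodon API status entity."""
    return {
        "id": status["id"],
        "created_at": status["created_at"],
        "account": status["account"]["acct"],
        "content": status["content"],
        "url": status.get("url"),
    }

# Example payload shaped like a Mastodon status object:
status = {
    "id": "109794",
    "created_at": "2023-02-02T08:11:50Z",
    "account": {"acct": "someone@data-folks.masto.host"},
    "content": "<p>hello fediverse</p>",
    "url": "https://data-folks.masto.host/@someone/109794",
}
record = flatten_status(status)
```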
# Kafka Connect
```console
confluent-hub install confluentinc/kafka-connect-s3:10.3.0
curl -X PUT -H "Content-Type:application/json" localhost:8083/connectors/mastodon-sink-s3/config -d '@./config/mastodon-sink-s3.json'
curl -X PUT -H "Content-Type:application/json" localhost:8083/connectors/mastodon-sink-s3-aws/config -d '@./config/mastodon-sink-s3-aws.json'
```
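The connector submission can also be scripted. A sketch using only the Python standard library; the `/connectors/<name>/config` path is the standard Kafka Connect REST API, everything else (function names, arguments) is illustrative:

```python
import json
import urllib.request


def connector_config_url(connect_url: str, connector: str) -> str:
    """Build the Kafka Connect REST endpoint for a connector's config."""
    return f"{connect_url}/connectors/{connector}/config"


def put_connector_config(connect_url: str, connector: str, config_path: str):
    """PUT a connector config JSON, mirroring the curl commands above."""
    with open(config_path) as f:
        payload = json.dumps(json.load(f)).encode()
    req = urllib.request.Request(
        connector_config_url(connect_url, connector),
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read())
```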
# DuckDB
```console
duckdb --init duckdb/init.sql
```
```sql
SELECT * FROM read_parquet('s3://mastodon/topics/mastodon-topic*');
SELECT 'epoch'::TIMESTAMP + INTERVAL 1675325510 seconds;
```
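The `INTERVAL` expression above converts a Unix epoch value (seconds since `'epoch'`, i.e. 1970-01-01) into a timestamp. The same conversion in Python, as a quick cross-check:

```python
from datetime import datetime, timezone

# Interpret the epoch value from the DuckDB query above as a UTC timestamp.
ts = datetime.fromtimestamp(1675325510, tz=timezone.utc)
print(ts.isoformat())  # 2023-02-02T08:11:50+00:00
```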
# OLD Notes
- https://martinheinz.dev/blog/86
- https://github.com/morsapaes/hex-data-council/tree/main/data-generator
- https://redpanda.com/blog/kafka-streaming-data-pipeline-from-postgres-to-duckdb
# Docker Notes
```
docker-compose up -d postgres datagen
```
Connect with the password `postgres`:
```
psql -h localhost -U postgres -d postgres
select * from public.user limit 3;
```
```
docker-compose up -d redpanda redpanda-console connect
```
Redpanda Console at http://localhost:8080
```
docker exec -it connect /bin/bash
curl -X PUT -H "Content-Type:application/json" localhost:8083/connectors/pg-src/config -d '@/connectors/pg-src.json'
curl -X PUT -H "Content-Type:application/json" localhost:8083/connectors/s3-sink/config -d '@/connectors/s3-sink.json'
curl -X PUT -H "Content-Type:application/json" localhost:8083/connectors/s3-sink-m/config -d '@/connectors/s3-sink-m.json'
```
```
docker-compose up -d minio mc
```
MinIO console at http://localhost:9000. Login with `minio / minio123`.
```
docker-compose up -d duckdb
```
```
docker-compose exec duckdb bash
duckdb --init duckdb/init.sql
SELECT count(value.after.id) as user_count FROM read_parquet('s3://user-payments/debezium.public.user-*');
```
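The `value.after.id` path in the query above follows the Debezium change-event envelope: each record carries the new row state under `after` (and the prior state under `before` for updates and deletes), which the S3 sink writes through to Parquet. A sketch of that shape (field values are illustrative):

```python
# Illustrative Debezium change event for the public.user table; the sink
# writes these envelopes to Parquet, hence value.after.id in the query.
event = {
    "before": None,                                 # no prior state for an insert
    "after": {"id": 1, "email": "a@example.com"},   # new row state
    "source": {"table": "user", "db": "postgres"},  # origin metadata
    "op": "c",                                      # c=create, u=update, d=delete
}

user_id = event["after"]["id"]
```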
## Kafka notes
```console
python avro-producer.py -b "localhost:9092" -s "http://localhost:8081" -t aubury.mytopic
```
## LakeFS
Run lakeFS against AWS S3:
```console
docker run --pull always -p 8000:8000 \
  -e LAKEFS_BLOCKSTORE_TYPE='s3' \
  -e AWS_ACCESS_KEY_ID='YourAccessKeyValue' \
  -e AWS_SECRET_ACCESS_KEY='YourSecretKeyValue' \
  treeverse/lakefs run --local-settings
```
Or against the local MinIO:
```console
docker run --pull always -p 8000:8000 \
  -e LAKEFS_BLOCKSTORE_TYPE='s3' \
  -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE='true' \
  -e LAKEFS_BLOCKSTORE_S3_ENDPOINT='http://minio:9000' \
  -e LAKEFS_BLOCKSTORE_S3_DISCOVER_BUCKET_REGION='false' \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID='minio' \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY='minio123' \
  treeverse/lakefs run --local-settings
```
DuckDB S3 settings for MinIO:
```sql
SET s3_endpoint='minio:9000';
SET s3_access_key_id='minio';
SET s3_secret_access_key='minio123';
SET s3_use_ssl=false;
SET s3_region='us-east-1';
SET s3_url_style='path';
```
### Installing packages
Now that you're in your virtual environment, you can install packages.
```console
python -m pip install --requirement requirements.txt
```
### JupyterLab
Once installed, launch JupyterLab with:
```console
jupyter-lab
```
### Cleanup of virtual environment
If you want to switch projects or otherwise leave your virtual environment, simply run:
```console
deactivate
```
If you want to re-enter the virtual environment, just follow the same instructions above about activating a virtual environment. There's no need to re-create the virtual environment.