ETL to QE, Update 70, Embarrassing Over-Engineering
Date: 2025-04-20
Let's review where we are versus where we wanted to be.
By now I am supposed to be halfway through the RBAC LDAP-Like Content Addressable Storage System within the DDaemon 2025 Roadmap Rev. 0.0.3, but in reality I am still stuck on the Nostr Scraping Project, which I continue to over-engineer again and again.
Workflow Engines are Not Easy
I started with the idea of developing my own workflow engine within Postgres. A task would be loaded into Postgres, a worker would process it and update its progress and results in Postgres, and when the job completed, other jobs would start or get triggered.
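For the curious, the shape I had in mind was roughly this (a minimal sketch with made-up names, not the actual schema; the FOR UPDATE SKIP LOCKED subquery is the standard Postgres job-claiming trick):

    -- Hypothetical task queue table, names are illustrative only.
    CREATE TABLE tasks (
        id         bigserial PRIMARY KEY,
        kind       text NOT NULL,                  -- what the worker should do
        payload    jsonb NOT NULL,                 -- e.g. the Nostr filter to scrape
        status     text NOT NULL DEFAULT 'queued', -- queued | running | done | failed
        progress   jsonb,                          -- worker-reported progress
        result     jsonb,                          -- final output of the job
        updated_at timestamptz NOT NULL DEFAULT now()
    );

    -- A worker claims the next queued task without blocking other workers.
    UPDATE tasks
    SET status = 'running', updated_at = now()
    WHERE id = (
        SELECT id FROM tasks
        WHERE status = 'queued'
        ORDER BY id
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING id, kind, payload;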
Turns out this was all a cope to avoid Doing The Thing. All I really wanted was 1,000,000 Nostr events in a database so I could start making sense of things; I didn't need an entire workflow engine with signed hash chains for the I/O of every function run on the system.
When it came to developing a dependency-aware job queue for the workflow engine, which would mean actually representing the DAG in Postgres, I realized I had a harder project on my hands than just scraping a ton of Nostr events.
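To give a sense of the scope, representing the DAG would have meant at least an edges table plus a "which jobs are runnable" query, something like this sketch (hypothetical, building on the tasks table above):

    -- Hypothetical DAG edges: a child cannot start until every parent is done.
    CREATE TABLE task_dependencies (
        child_id  bigint NOT NULL REFERENCES tasks(id),
        parent_id bigint NOT NULL REFERENCES tasks(id),
        PRIMARY KEY (child_id, parent_id)
    );

    -- Queued tasks whose parents have all finished, i.e. safe to dispatch.
    SELECT t.id
    FROM tasks t
    WHERE t.status = 'queued'
      AND NOT EXISTS (
          SELECT 1
          FROM task_dependencies d
          JOIN tasks p ON p.id = d.parent_id
          WHERE d.child_id = t.id
            AND p.status <> 'done'
      );

And that is before retries, failure propagation, and cycle checks, which is where it stops being a scraping project.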
You Don't Need Separate Logs
Earlier today I tried to create a separate SQL table for the raw Nostr filters that were to be scraped via paginated timestamps, rather than just logging everything to the logging table.
Then I realized that adding the correct label to the log entries accomplished the same thing as having a separate table for the raw Nostr filters themselves.
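Concretely, the idea is just a label column and, if lookups matter, a partial index. A sketch assuming a generic logs table, not the real one:

    -- Hypothetical logging table; a label stands in for the dedicated filters table.
    CREATE TABLE logs (
        id         bigserial PRIMARY KEY,
        label      text NOT NULL,    -- e.g. 'nostr_filter'
        body       jsonb NOT NULL,   -- the raw Nostr filter, or any other payload
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Partial index makes reading the filters back as cheap as a dedicated table.
    CREATE INDEX logs_nostr_filter_idx ON logs (created_at)
        WHERE label = 'nostr_filter';

    SELECT body FROM logs WHERE label = 'nostr_filter' ORDER BY created_at;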
Turns Out There Is Better Tooling
After seeing how slow paginating 100 events at a time through Postgres is, I realized I could have been using nosdump the entire time. Every filter from every relay could just be a good old NDJSON file, stored and uploaded to S3 or Google Drive, and then a nice simple ingestion script could do some epic SQL transactions to ingest the data quickly and easily.
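That ingestion script could be almost nothing: a staging table, a COPY, and one transaction. A sketch with made-up table and file names, assuming an events table keyed by the Nostr event id; the odd CSV quote/delimiter bytes are a common trick to stop COPY from mangling backslashes inside the JSON lines:

    -- Hypothetical ingestion path: NDJSON dump -> staging table -> events table.
    CREATE TABLE staging_events (raw jsonb);

    -- In psql: one JSON object per line goes straight into jsonb.
    \copy staging_events (raw) FROM 'dump.ndjson' WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02')

    BEGIN;
    INSERT INTO events (id, pubkey, kind, created_at, content)
    SELECT raw->>'id',
           raw->>'pubkey',
           (raw->>'kind')::int,
           to_timestamp((raw->>'created_at')::bigint),
           raw->>'content'
    FROM staging_events
    ON CONFLICT (id) DO NOTHING;  -- the same event from two relays only lands once
    TRUNCATE staging_events;
    COMMIT;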