Nostr Scraping Project
TODO
- Figure out how many live connections we can support
- Error debugging for nosdump-ingest
- We need to use the token indexing feature in Postgres (full-text search, presumably tsvector with a GIN index) to search the text field better
- Map out the logging I want to do, and write a blog post about it
- Write a blog post about the nature of Nostr relay filters: how they have IDs, how they have limits, how they are closed, etc.
- Write a blog post about the problem of Activities versus Workflows, and relate it to what we were trying to do with CGFS
- We ought to start using fractal terminology to describe our scraping. With CGFS we are supposed to reference the root of a discussion, and in Nostr we are likewise supposed to reference the root event of a discussion. I believe that in the future raw Nostr events posted without context are going to be rare and not adopted by default in Nostr clients
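For the relay filter item above, here is a minimal sketch of the subscription lifecycle as NIP-01 describes it: the client opens a subscription with a REQ message carrying an ID and one or more filters, the relay answers with EVENT messages and an EOSE marker, and either side can end it (CLOSE from the client, CLOSED from the relay). The helper names `buildReq`, `buildClose`, and `classifyRelayMessage` are made up for illustration and are not part of nosdump or any library.

```typescript
// A subset of the filter fields defined in NIP-01; all of them are optional.
interface NostrFilter {
  ids?: string[];
  authors?: string[]; // hex pubkeys, not bech32 npubs
  kinds?: number[];
  since?: number; // unix seconds, inclusive lower bound on created_at
  until?: number; // unix seconds, inclusive upper bound on created_at
  limit?: number; // cap on stored events returned by the initial query
}

type ReqMessage = ["REQ", string, ...NostrFilter[]];
type CloseMessage = ["CLOSE", string];

// Open a subscription. The client picks the ID and the relay echoes it back.
function buildReq(subscriptionId: string, ...filters: NostrFilter[]): ReqMessage {
  if (subscriptionId.length === 0 || subscriptionId.length > 64) {
    throw new Error("subscription ID must be a non-empty string of at most 64 chars");
  }
  return ["REQ", subscriptionId, ...filters];
}

// Tell the relay we are done with a subscription.
function buildClose(subscriptionId: string): CloseMessage {
  return ["CLOSE", subscriptionId];
}

// Hypothetical helper: decide what a raw relay message means for one subscription.
type SubscriptionSignal = "event" | "caught-up" | "closed-by-relay" | "ignore";

function classifyRelayMessage(raw: string, subscriptionId: string): SubscriptionSignal {
  const msg: unknown = JSON.parse(raw);
  if (!Array.isArray(msg) || msg[1] !== subscriptionId) return "ignore";
  switch (msg[0]) {
    case "EVENT":
      return "event"; // a matching event
    case "EOSE":
      return "caught-up"; // end of stored events, live events may follow
    case "CLOSED":
      return "closed-by-relay"; // the relay ended the subscription
    default:
      return "ignore";
  }
}
```

Sending `JSON.stringify(buildReq("sub-1", { kinds: [1], limit: 10 }))` over the websocket would open the subscription, and `buildClose("sub-1")` would end it from our side.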
Job States
- My Job States
- TODO
- RUNNING
- COMPLETED
- ERROR
- FAILED
- Temporal Activity States
- Running
- Cancelled
- Completed
- Failed
- Terminated
- Timed Out
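One way to line the two state lists up is sketched below. The notes do not say how ERROR differs from FAILED or which transitions are legal, so everything here is an assumption: ERROR is treated as retryable, FAILED as terminal, and the mapping from the Temporal-style states is just one plausible choice.

```typescript
type JobState = "TODO" | "RUNNING" | "COMPLETED" | "ERROR" | "FAILED";
type TemporalState =
  | "Running"
  | "Cancelled"
  | "Completed"
  | "Failed"
  | "Terminated"
  | "Timed Out";

// ASSUMPTION: ERROR means "retryable" and FAILED means "gave up".
// The notes list the states but do not define the transitions.
const allowedTransitions: Record<JobState, JobState[]> = {
  TODO: ["RUNNING"],
  RUNNING: ["COMPLETED", "ERROR", "FAILED"],
  ERROR: ["RUNNING", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// ASSUMPTION: one possible mapping onto my job states.
// Note there is no Temporal-side equivalent of TODO in the list above.
const fromTemporal: Record<TemporalState, JobState> = {
  "Running": "RUNNING",
  "Completed": "COMPLETED",
  "Failed": "FAILED",
  "Cancelled": "FAILED",
  "Terminated": "FAILED",
  "Timed Out": "ERROR", // treated as retryable here
};
```

Keeping the transitions in a table means a bad update such as COMPLETED back to RUNNING can be rejected before it ever reaches the database.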
As per Nostr Scraping Plan 0.0.1, we have several different things we need to scrape separately:
- Events from a User from a Specific Relay
- scrape.pubkey.from.relay.0.0.1
- Replies to a thread
- scrape.replies.to.thread.from.relay.0.0.1
- Reactions to a Thread
- scrape.reactions.to.thread.from.relay.0.0.1
- Follows of an NPUB
- scrape.follows.of.pubkey.from.relay.0.0.1
- Badges sent to a User
- scrape.badges.to.pubkey.from.relay.0.0.1
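Each of the scrape jobs above boils down to a different relay filter. The sketch below shows one plausible mapping, assuming the usual kinds: 1 for text notes, 7 for reactions (NIP-25), 3 for follow lists (NIP-02), and 8 for badge awards (NIP-58). The `ScrapeJob` type and `filterFor` function are hypothetical names, and pubkeys are expected in hex rather than as NPUBs.

```typescript
// Only the filter fields this sketch needs.
interface NostrFilter {
  authors?: string[];
  kinds?: number[];
  "#e"?: string[];
  "#p"?: string[];
}

// Hypothetical job shapes keyed by the job names from the plan.
type ScrapeJob =
  | { name: "scrape.pubkey.from.relay.0.0.1"; pubkey: string }
  | { name: "scrape.replies.to.thread.from.relay.0.0.1"; rootEventId: string }
  | { name: "scrape.reactions.to.thread.from.relay.0.0.1"; rootEventId: string }
  | { name: "scrape.follows.of.pubkey.from.relay.0.0.1"; pubkey: string }
  | { name: "scrape.badges.to.pubkey.from.relay.0.0.1"; pubkey: string };

function filterFor(job: ScrapeJob): NostrFilter {
  switch (job.name) {
    case "scrape.pubkey.from.relay.0.0.1":
      // Everything the user authored, all kinds.
      return { authors: [job.pubkey] };
    case "scrape.replies.to.thread.from.relay.0.0.1":
      // Text notes that e-tag the root event.
      return { kinds: [1], "#e": [job.rootEventId] };
    case "scrape.reactions.to.thread.from.relay.0.0.1":
      // Reactions to the root event only; reactions to individual replies
      // would need the reply IDs added to "#e" as well.
      return { kinds: [7], "#e": [job.rootEventId] };
    case "scrape.follows.of.pubkey.from.relay.0.0.1":
      // Kind 3 is the follow list; followed pubkeys are in its p tags.
      return { kinds: [3], authors: [job.pubkey] };
    case "scrape.badges.to.pubkey.from.relay.0.0.1":
      // Badge awards p-tag the recipient.
      return { kinds: [8], "#p": [job.pubkey] };
  }
}
```

The filter returned here would be dropped into a REQ message for whichever relay the job targets.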
NIP05 Stuff
- scrape.nip05.0.0.1
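A rough sketch of what scrape.nip05 has to do, going by NIP-05: turn an identifier like name@domain into the well-known URL, then read the pubkey and the optional relay list out of the JSON document that URL serves. The function names are made up for illustration, and the actual HTTP fetch is left out so the sketch stays self-contained.

```typescript
// Shape of /.well-known/nostr.json as described by NIP-05.
interface Nip05Document {
  names: Record<string, string>; // local name -> hex pubkey
  relays?: Record<string, string[]>; // hex pubkey -> relay URLs (optional)
}

// Build the lookup URL for an identifier such as "bob@example.com".
function nip05Url(identifier: string): string {
  const parts = identifier.split("@");
  const local = parts[0] ?? "";
  const domain = parts[1] ?? "";
  if (parts.length !== 2 || local === "" || domain === "") {
    throw new Error("expected an identifier of the form name@domain");
  }
  return `https://${domain}/.well-known/nostr.json?name=${encodeURIComponent(local)}`;
}

// Pull the pubkey and any advertised relays for one name out of the document.
function relaysFor(
  doc: Nip05Document,
  name: string,
): { pubkey: string; relays: string[] } | null {
  const pubkey = doc.names[name];
  if (pubkey === undefined) return null;
  const relays = doc.relays?.[pubkey] ?? []; // the relays map is optional
  return { pubkey, relays };
}
```

Since the relays field is optional, an empty list here just means the identity document does not advertise any relays, not that the user has none.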
We start with a single NPUB of a popular Nostr user
- We scrape the user's NIP05 identity for other relays they use
- We scrape all of that user's events from every relay they say they publish to
- We then grab all the
- We then look at their follow list
- We add every NPUB to a backlog of Nostr events to scrape
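The steps above amount to a breadth-first crawl over follow lists. Below is a minimal in-memory sketch of the backlog part, assuming a `followsOf` callback that stands in for the real follow-list scrape; `ScrapeBacklog`, `crawl`, and `maxUsers` are hypothetical names, and a real version would be async and keep the backlog in Postgres.

```typescript
// A backlog that never queues the same pubkey twice.
class ScrapeBacklog {
  private readonly seen = new Set<string>();
  private readonly queue: string[] = [];

  // Returns true if the pubkey was new and got queued.
  add(pubkey: string): boolean {
    if (this.seen.has(pubkey)) return false;
    this.seen.add(pubkey);
    this.queue.push(pubkey);
    return true;
  }

  // Oldest queued pubkey first, or undefined when the backlog is empty.
  next(): string | undefined {
    return this.queue.shift();
  }
}

// Start from one seed user and walk follow lists breadth-first.
// ASSUMPTION: followsOf stands in for scraping a user's follow list.
function crawl(
  seed: string,
  followsOf: (pubkey: string) => string[],
  maxUsers: number,
): string[] {
  const backlog = new ScrapeBacklog();
  backlog.add(seed);
  const visited: string[] = [];
  let current = backlog.next();
  while (current !== undefined && visited.length < maxUsers) {
    visited.push(current); // this is where the user's events would be scraped
    for (const followed of followsOf(current)) {
      backlog.add(followed);
    }
    current = backlog.next();
  }
  return visited;
}
```

The `maxUsers` cap is there because a follow graph seeded from a popular user grows very quickly; without some limit the backlog would keep expanding for a long time.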
Logs
- 2025-04-05-14-01-15
- Looked at Workflow SQL Schemas
- Looked at how to easily install and run Postgres with extensions
- 2025-04-06-13-45-53
- Designed a Workflow SQL Schema
- Made a list of Activities for the Workflow Engine to do
- 2025-04-06-23-17-55
- Actually installed and tested our little workflow engine
- 2025-04-08-16-42-40
- Worked on writing "ETL to QE, Update 67, Nostr Scraping via a Custom Workflow Engine in SQL"
- 2025-04-11-00-11-05
- Spent my time developing a plan inside "ETL to QE, Update 67, Nostr Scraping via a Custom Workflow Engine in SQL"
- 2025-04-13-21-21-43
- Spent time writing "ETL to QE, Update 68, Thinking Through how a Workflow Engine Works"
- 2025-04-15-00-24-10
- Created what would become the over-engineered schema, the one that was supposed to be an entire workflow engine
- 2025-04-16-16-06-12
- Coded up scripts that would actually input data into the SQL schema, and realized that the workflow engine stuff was a little too much
- 2025-04-20-01-46-09
- Got the recursive scraping for Event 0 and the authors figured out
- 2025-04-21-16-42-18
- We ended up getting Postgres.js working with much simpler code. I finally have a schema I want and am proud of, as well as the ingesting system for Nostr events.
- 2025-04-22-20-37-24
- 2025-04-23-17-53-46
- 2025-04-24-19-46-28
- 2025-04-25-15-47-53
- 2025-04-28-16-44-54