Nostr Scraping Project

TODO

How many live connections can we support?
Debug errors in nosdump-ingest
We need to use Postgres token indexing (full-text search) to search the Text field better
Map out the logging I want to do and write a blog post about it
Write a blog post about the nature of Nostr relay filters: how they have IDs, how they have limits, how they are closed, etc.
Write a blog post about the problem of Activities versus Workflows, and relate it to what we were trying to do with CGFS
We ought to start using fractal terminology to describe our scraping. With CGFS we are supposed to reference the root of a discussion, and in Nostr we are also supposed to reference the root event of a discussion. I believe that in the future, raw Nostr events posted without context will be rare and will not be adopted by default in Nostr clients
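For the relay-filter post, a minimal sketch of the NIP-01 client messages a filter travels in may help: a REQ carries a subscription ID plus one or more filters (each of which may set a `limit`), and a CLOSE ends that subscription by the same ID. The helper names `reqMessage` and `closeMessage` are hypothetical; only the message shapes come from NIP-01.

```typescript
// A subset of the NIP-01 filter fields. Tag filters use "#<letter>" keys.
type Filter = {
  ids?: string[];
  authors?: string[];
  kinds?: number[];
  since?: number; // unix timestamp, seconds
  until?: number; // unix timestamp, seconds
  limit?: number; // cap on events returned by the initial query
  "#e"?: string[];
  "#p"?: string[];
};

// ["REQ", <subscription_id>, <filter>, ...]: opens a subscription on a relay.
function reqMessage(subId: string, ...filters: Filter[]): string {
  return JSON.stringify(["REQ", subId, ...filters]);
}

// ["CLOSE", <subscription_id>]: tells the relay to stop that subscription.
function closeMessage(subId: string): string {
  return JSON.stringify(["CLOSE", subId]);
}

console.log(reqMessage("replies-1", { kinds: [1], "#e": ["<root event id>"], limit: 500 }));
// ["REQ","replies-1",{"kinds":[1],"#e":["<root event id>"],"limit":500}]
console.log(closeMessage("replies-1"));
// ["CLOSE","replies-1"]
```

Relays answer a REQ with EVENT messages tagged with the subscription ID, then send EOSE once stored events are exhausted; NIP-01 also lets a relay send CLOSED when it ends a subscription on its own.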

Job States

My Job States
- TODO
- RUNNING
- COMPLETED
- ERROR
- FAILED
Temporal Activity States
- Running
- Cancelled
- Completed
- Failed
- Terminated
- Timed Out
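The two lists do not line up one-to-one (Temporal's Cancelled, Terminated, and Timed Out have no counterpart in My Job States), so a small sketch of how my states might be enforced can help. It assumes ERROR means a retryable failure that goes back to TODO and FAILED means a terminal one; neither rule is stated above, and `canTransition` / `isTerminal` are hypothetical helpers.

```typescript
type JobState = "TODO" | "RUNNING" | "COMPLETED" | "ERROR" | "FAILED";

// Assumed transitions: ERROR can be retried (back to TODO) or escalated
// to FAILED; COMPLETED and FAILED are terminal.
const transitions: Record<JobState, readonly JobState[]> = {
  TODO: ["RUNNING"],
  RUNNING: ["COMPLETED", "ERROR"],
  ERROR: ["TODO", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return transitions[from].includes(to);
}

// A state is terminal when nothing can follow it.
function isTerminal(state: JobState): boolean {
  return transitions[state].length === 0;
}
```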

As per Nostr Scraping Plan 0.0.1, there are several different things we need to scrape separately:

Events from a User from a Specific Relay
- scrape.pubkey.from.relay.0.0.1
Replies to a thread
- scrape.replies.to.thread.from.relay.0.0.1
Reactions to a Thread
- scrape.reactions.to.thread.from.relay.0.0.1
Follows of an NPUB
- scrape.follows.of.pubkey.from.relay.0.0.1
Badges sent to a User
- scrape.badges.to.pubkey.from.relay.0.0.1
NIP-05 Identity
- scrape.nip05.0.0.1
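Each relay job above boils down to a NIP-01 filter sent to one relay. The sketch below is one plausible mapping, assuming the usual event kinds: kind 1 text notes for replies (threaded with `e` tags per NIP-10), kind 7 reactions (NIP-25), kind 3 follow lists (NIP-02), and kind 8 badge awards that `p`-tag the recipient (NIP-58). The `ScrapeJob` type and `filterFor` function are hypothetical. The NIP-05 job is left out because a NIP-05 lookup is an HTTP request to the user's domain, not a relay filter.

```typescript
// Minimal subset of a NIP-01 filter.
type Filter = {
  authors?: string[];
  kinds?: number[];
  limit?: number;
  "#e"?: string[];
  "#p"?: string[];
};

// Hypothetical job descriptions; pubkeys and event IDs are hex strings.
type ScrapeJob =
  | { type: "pubkey"; pubkey: string }
  | { type: "replies"; rootId: string }
  | { type: "reactions"; rootId: string }
  | { type: "follows"; pubkey: string }
  | { type: "badges"; pubkey: string };

function filterFor(job: ScrapeJob): Filter {
  switch (job.type) {
    case "pubkey": // every event authored by the user
      return { authors: [job.pubkey] };
    case "replies": // kind 1 notes that reference the root event
      return { kinds: [1], "#e": [job.rootId] };
    case "reactions": // kind 7 reactions that reference the root event
      return { kinds: [7], "#e": [job.rootId] };
    case "follows": // the user's own kind 3 follow list
      return { kinds: [3], authors: [job.pubkey] };
    case "badges": // kind 8 badge awards that tag the user
      return { kinds: [8], "#p": [job.pubkey] };
    default: {
      // Compile-time exhaustiveness check over ScrapeJob.
      const unreachable: never = job;
      throw new Error(`unknown job: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Followers, by contrast, are other users' kind 3 lists that tag the pubkey, i.e. `{ kinds: [3], "#p": [pubkey] }`, and general mentions are just `{ "#p": [pubkey] }`.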
We start with a single NPUB of a popular Nostr user
We scrape the user's NIP-05 identity for the other relays they use
We scrape all of that user's events from every relay they say they publish to
We then grab all of the following:
- events that mention the pubkey using a p tag
- reactions (NIP-25, kind 7) to the NPUB
- replies (NIP-01 kind 1, threaded per NIP-10) to the NPUB
- followers (NIP-02) of the NPUB
- badges (NIP-58) to the NPUB
We then look at their follow list
We add every NPUB on it to the backlog of Nostr users to scrape
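The crawl above is a breadth-first walk over pubkeys. Here is a minimal sketch with the network calls stubbed out: `relaysFor` stands in for the NIP-05 relay lookup, `scrapeUser` for pulling one user's events, mentions, reactions, replies, followers, and badges from one relay, and `followsOf` for reading their follow list. These hooks, `crawl` itself, and its `maxUsers` cap are all hypothetical names, not part of the existing code.

```typescript
// Hypothetical hooks; a real crawler would back these with relay and HTTP calls.
interface CrawlHooks {
  relaysFor(pubkey: string): Promise<string[]>;
  scrapeUser(pubkey: string, relay: string): Promise<void>;
  followsOf(pubkey: string): Promise<string[]>;
}

// Breadth-first crawl from one seed pubkey. Returns the pubkeys visited,
// in order; maxUsers bounds the crawl so it terminates.
async function crawl(seed: string, hooks: CrawlHooks, maxUsers: number): Promise<string[]> {
  const backlog: string[] = [seed];
  const seen = new Set<string>([seed]);
  const visited: string[] = [];

  while (backlog.length > 0 && visited.length < maxUsers) {
    const pubkey = backlog.shift()!;
    visited.push(pubkey);

    // Scrape the user from every relay they say they publish to.
    for (const relay of await hooks.relaysFor(pubkey)) {
      await hooks.scrapeUser(pubkey, relay);
    }

    // Queue everyone they follow that has not been queued already.
    for (const next of await hooks.followsOf(pubkey)) {
      if (!seen.has(next)) {
        seen.add(next);
        backlog.push(next);
      }
    }
  }
  return visited;
}

// Usage with an in-memory follow graph instead of real relays.
const follows: Record<string, string[]> = {
  alice: ["bob", "carol"],
  bob: ["carol", "dave"],
  carol: ["alice"],
};
crawl("alice", {
  relaysFor: async () => ["wss://relay.example"],
  scrapeUser: async () => {},
  followsOf: async (pk) => follows[pk] ?? [],
}, 10).then((order) => console.log(order));
// [ 'alice', 'bob', 'carol', 'dave' ]
```

Marking a pubkey as seen when it is queued, rather than when it is visited, keeps a popular NPUB from landing in the backlog once per follower.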

Logs

2025-04-05-14-01-15
- Looked at Workflow SQL Schemas
- Looked at how to easily install and run Postgres with extensions
2025-04-06-13-45-53
- Designed a Workflow SQL Schema
- Made a list of Activities for the Workflow Engine to do
2025-04-06-23-17-55
- Actually installed and tested our little workflow engine
2025-04-08-16-42-40
- Worked on writing ETL to QE, Update 67, Nostr Scraping via a Custom Workflow Engine in SQL
2025-04-11-00-11-05
- Spent my time developing a plan inside ETL to QE, Update 67, Nostr Scraping via a Custom Workflow Engine in SQL
2025-04-13-21-21-43
- Spent time writing ETL to QE, Update 68, Thinking Through how a Workflow Engine Works
2025-04-15-00-24-10
- Created what would become the over-engineered schema, the one that was supposed to be an entire workflow engine
2025-04-16-16-06-12
- Coded up scripts that would actually insert data into the SQL schema, and realized that the workflow engine stuff was a little too much
2025-04-20-01-46-09
- Got the recursive scraping for Event 0 and the authors figured out
2025-04-21-16-42-18
- We ended up getting Postgres.js working with much simpler code. I finally have the schema I want and am proud of, as well as the ingestion system for Nostr events.
2025-04-22-20-37-24
- TLDR, nostr-fetch all the things
- ETL to QE, Update 71, Nostr SQL Over Engineering Complete, Time for Websockets
2025-04-23-17-53-46
- ETL to QE, Update 72, Minimum Viable Workflow Engine for Nostr Scraping
2025-04-24-19-46-28
- ETL to QE, Update 73, The SQL Schema was Still Over Engineered
2025-04-25-15-47-53
2025-04-28-16-44-54