Nostr Scraping Project

TODO

  • Find out how many live connections we can support
  • Error debugging for nosdump-ingest
  • We need to use Postgres's token-based indexing (full-text search) to search the text field better
  • Map out the logging I want to do, and write a blog post about it
  • Write a blog post about the nature of Nostr relay filters: how they have IDs, how they have limits, how they are closed, etc.
  • Write a blog post about the problem of Activities versus Workflows, and relate it to what we were trying to do with CGFS
  • We ought to start using fractal terminology to describe our scraping and related work. With CGFS we are supposed to reference the root of a discussion, and in Nostr we are also supposed to reference the root event of a discussion. I believe that in the future raw Nostr events posted without context are going to be rare and not shown by default in Nostr clients

Job States

  • My Job States
    • TODO
    • RUNNING
    • COMPLETED
    • ERROR
    • FAILED
  • Temporal Activity States
    • Running
    • Cancelled
    • Completed
    • Failed
    • Terminated
    • Timed Out
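
The two state lists above could be kept side by side as enums. This is only a rough sketch: the names `JobState`, `TemporalState`, `TEMPORAL_TO_JOB` and `to_job_state` are made up for illustration, and the mapping between the two lists is an assumption, since these notes do not define one. In particular, the split between ERROR and FAILED below is a guess.

```python
from enum import Enum


class JobState(Enum):
    """My job states, as listed in these notes."""
    TODO = "TODO"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    FAILED = "FAILED"


class TemporalState(Enum):
    """Temporal activity states, as listed in these notes."""
    RUNNING = "Running"
    CANCELLED = "Cancelled"
    COMPLETED = "Completed"
    FAILED = "Failed"
    TERMINATED = "Terminated"
    TIMED_OUT = "Timed Out"


# ASSUMPTION: this mapping is illustrative only, the notes do not define it.
# ERROR is read here as "stopped abnormally, could be retried" and FAILED as
# "the work itself failed". TODO has no Temporal counterpart because a job in
# TODO has not been handed to Temporal yet.
TEMPORAL_TO_JOB = {
    TemporalState.RUNNING: JobState.RUNNING,
    TemporalState.COMPLETED: JobState.COMPLETED,
    TemporalState.FAILED: JobState.FAILED,
    TemporalState.CANCELLED: JobState.ERROR,
    TemporalState.TERMINATED: JobState.ERROR,
    TemporalState.TIMED_OUT: JobState.ERROR,
}


def to_job_state(temporal_state: TemporalState) -> JobState:
    """Translate a Temporal activity state into one of my job states."""
    return TEMPORAL_TO_JOB[temporal_state]


if __name__ == "__main__":
    for state in TemporalState:
        print(f"{state.value:>10} -> {to_job_state(state).value}")
```

Keeping the mapping in one table means there is a single place to change once the real meaning of ERROR versus FAILED is settled.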

As per Nostr Scraping Plan 0.0.1, we have several different things we need to scrape separately:

  • Events from a User from a Specific Relay
    • scrape.pubkey.from.relay.0.0.1
  • Replies to a thread
    • scrape.replies.to.thread.from.relay.0.0.1
  • Reactions to a Thread
    • scrape.reactions.to.thread.from.relay.0.0.1
  • Follows of an NPUB
    • scrape.follows.of.pubkey.from.relay.0.0.1
  • Badges sent to a User
    • scrape.badges.to.pubkey.from.relay.0.0.1
  • NIP05 Stuff
    • scrape.nip05.0.0.1
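
Each of the scrape jobs above roughly corresponds to one relay filter, or to one HTTP lookup in the NIP05 case. Here is a minimal sketch of what those could look like. The function names and the default `limit` of 500 are assumptions made for illustration and are not part of the plan. The event kinds (1 for notes, 3 for follow lists, 7 for reactions, 8 for badge awards) and the `REQ` / `CLOSE` message shapes come from the Nostr NIPs. Note that filters take hex pubkeys, not bech32 `npub...` strings.

```python
import json
from typing import Any
from urllib.parse import quote

Filter = dict[str, Any]


def pubkey_events_filter(pubkey_hex: str, limit: int = 500) -> Filter:
    """Events authored by one pubkey."""
    return {"authors": [pubkey_hex], "limit": limit}


def thread_replies_filter(root_event_id: str, limit: int = 500) -> Filter:
    """Kind 1 notes that reference the thread root through an e tag."""
    return {"kinds": [1], "#e": [root_event_id], "limit": limit}


def thread_reactions_filter(root_event_id: str, limit: int = 500) -> Filter:
    """Kind 7 reactions that reference the thread root through an e tag."""
    return {"kinds": [7], "#e": [root_event_id], "limit": limit}


def follows_filter(pubkey_hex: str) -> Filter:
    """Kind 3 follow list of one pubkey; only the newest one is needed."""
    return {"kinds": [3], "authors": [pubkey_hex], "limit": 1}


def badges_filter(pubkey_hex: str, limit: int = 500) -> Filter:
    """Kind 8 badge awards that tag the pubkey through a p tag."""
    return {"kinds": [8], "#p": [pubkey_hex], "limit": limit}


def nip05_url(identifier: str) -> str:
    """Well-known URL to fetch for a NIP05 identifier such as bob@example.com."""
    local, _, domain = identifier.partition("@")
    return f"https://{domain}/.well-known/nostr.json?name={quote(local)}"


def req_message(subscription_id: str, *filters: Filter) -> str:
    """Message that opens a subscription on a relay."""
    return json.dumps(["REQ", subscription_id, *filters])


def close_message(subscription_id: str) -> str:
    """Message that closes the subscription again."""
    return json.dumps(["CLOSE", subscription_id])


if __name__ == "__main__":
    fake_pubkey = "ab" * 32  # placeholder hex pubkey, not a real user
    print(req_message("follows-1", follows_filter(fake_pubkey)))
    print(close_message("follows-1"))
    print(nip05_url("bob@example.com"))
```

The subscription ID is chosen by the client and is what ties a `REQ` to its later `CLOSE`, so one ID per scrape job run is probably the simplest scheme.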
  • We start with a single NPUB of a popular Nostr user
  • We scrape the user's NIP05 identity for other relays they use
  • We scrape all of that user's events from every relay they say they publish to
  • We then grab all of the following:
    • events that mention the pubkey using a p tag
    • reactions (NIP-25) to the NPUB
    • replies (NIP-10) to the NPUB
    • followers (NIP-02) of the NPUB
    • badges (NIP-58) awarded to the NPUB
  • We then look at their follow list
  • We add every NPUB to a backlog of Nostr events to scrape
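
The steps above amount to a breadth-first crawl over follow lists, with a backlog and a set of already-seen pubkeys so nobody is scraped twice. A minimal sketch, assuming a hypothetical `fetch_follows` callable that returns the pubkeys a user follows; here it is backed by a small in-memory dict (`FAKE_FOLLOWS`) instead of real relays, and `max_pubkeys` is an invented safety cap.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    seed: str,
    fetch_follows: Callable[[str], Iterable[str]],
    max_pubkeys: int = 1000,
) -> list[str]:
    """Return pubkeys in the order they would be scraped, starting at seed."""
    backlog = deque([seed])
    seen = {seed}
    order: list[str] = []
    while backlog and len(order) < max_pubkeys:
        pubkey = backlog.popleft()
        # A real run would start the per-pubkey scrape jobs here
        # (events, mentions, reactions, replies, followers, badges).
        order.append(pubkey)
        for followed in fetch_follows(pubkey):
            if followed not in seen:
                seen.add(followed)
                backlog.append(followed)
    return order


# Hypothetical follow graph standing in for real follow lists.
FAKE_FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": [],
    "dave": ["alice"],
}

if __name__ == "__main__":
    print(crawl("alice", lambda pk: FAKE_FOLLOWS.get(pk, [])))
    # prints ['alice', 'bob', 'carol', 'dave']
```

The cap matters because follow graphs are highly connected; without it a single popular seed can pull in a very large share of the network.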

Logs