ETL to QE, Update 42, LevelDB All The Things

Date: 2024-07-25

I want CGFS to be a browser-native file system that can be used across all devices and apps.

Browsers have a variety of storage options:

Cookies, about 4 KB each and a capped number per domain
Web Storage API, about 5 MB per origin
IndexedDB, 50 MB limit but expandable
File System Access API, not supported in Firefox
Origin Private File System
WebSQL, killed by bureaucrats

The two real options

IndexedDB and the Origin Private File System are the two options that can store large amounts of data.

IndexedDB can be used to create a series of sorted key-value stores.
The Origin Private File System stores data in files and directories.
IndexedDB stores and returns whole blobs of data.
The Origin Private File System allows byte-level read and write access to files.

In comes LevelDB

Chrome's IndexedDB runs on LevelDB, while Firefox's IndexedDB runs on SQLite.

IndexedDB can be wrapped in a browser-specific adapter that exposes the LevelDB API.

LevelDB is also available to JavaScript backends through npm, and it has libraries for other programming languages such as Python and Rust.

This means the same JavaScript code that runs CGFS in the frontend can run it in the backend, and CGFS can even run from the browser against a remote S3 bucket using s3leveldown. LevelDB also opens up the possibility of sharing the same database between separate programs written in different programming languages (one at a time, since only one process can hold a LevelDB database open).

LevelDB versus Origin Private File System + File System Code

If we use the Origin Private File System, we will need separate sets of write, read, update, and list functions for the frontend and the backend, and it might make CGFS implementations in other programming languages more difficult. On top of that, there is the whole problem of a remote file system.
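
Here is a minimal sketch of the one-codebase idea, assuming a hypothetical `MemoryKV` class as a stand-in for LevelDB. It is not the `level` package's API, and it is synchronous for brevity where the JavaScript LevelDB bindings return promises. The hypothetical file functions `writeFile`, `readFile`, `deleteFile`, and `listDir` only call `put`, `get`, `del`, and `range`, so any backend exposing those four methods (IndexedDB in the browser, LevelDB on a server, or S3) could sit underneath them unchanged.

```javascript
// Hypothetical in-memory stand-in for a sorted KV store such as LevelDB.
// A browser (IndexedDB), Node (LevelDB) or S3 backend exposing the same
// four methods could replace it without changing the file functions.
class MemoryKV {
  constructor() {
    this.map = new Map();
  }
  put(key, value) {
    this.map.set(key, value);
  }
  get(key) {
    return this.map.get(key);
  }
  del(key) {
    this.map.delete(key);
  }
  // Return [key, value] pairs with gte <= key < lt, in ascending key order.
  range(gte, lt) {
    return [...this.map.keys()]
      .filter((k) => k >= gte && k < lt)
      .sort()
      .map((k) => [k, this.map.get(k)]);
  }
}

// Hypothetical CGFS-style file functions, written once against the interface.
const FILE = "file!";

function writeFile(db, path, contents) {
  db.put(FILE + path, contents); // writing an existing path is the "update"
}

function readFile(db, path) {
  return db.get(FILE + path);
}

function deleteFile(db, path) {
  db.del(FILE + path);
}

// List every path under a directory prefix with one range scan.
function listDir(db, dir) {
  const start = FILE + dir;
  return db.range(start, start + "\uffff").map(([key]) => key.slice(FILE.length));
}

const db = new MemoryKV();
writeFile(db, "/notes/b.txt", "second");
writeFile(db, "/notes/a.txt", "first");
writeFile(db, "/todo.txt", "buy milk");
console.log(listDir(db, "/notes/")); // [ '/notes/a.txt', '/notes/b.txt' ]
```

The `file!` prefix plays the role a sublevel would in LevelDB, and the `\uffff` upper bound assumes no path contains the character U+FFFF.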

Fundamentally, the only real difference between the Origin Private File System and LevelDB is that LevelDB keeps its keys sorted, so we can easily iterate through them in order, or through just one range of them.
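
To show why sorted keys make iteration easy, here is a sketch of a hypothetical `SortedKV` that keeps its keys in order with binary search. A range scan is then just a walk over neighbouring entries, and a page of results can resume right after the last key seen. The option names `gt`, `lt`, and `limit` mirror range options the JavaScript LevelDB bindings accept, but the class itself is an in-memory illustration, not LevelDB.

```javascript
// Hypothetical sorted key space: keys are kept in order with binary search,
// so a range scan walks contiguous entries, much like a LevelDB iterator.
// Not the `level` API; synchronous for brevity.
class SortedKV {
  constructor() {
    this.keys = []; // always sorted
    this.values = new Map();
  }

  // Index of the first key >= target (binary-search lower bound).
  lowerBound(target) {
    let lo = 0;
    let hi = this.keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.keys[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  put(key, value) {
    if (!this.values.has(key)) this.keys.splice(this.lowerBound(key), 0, key);
    this.values.set(key, value);
  }

  // Yield up to `limit` entries with gt < key < lt, in ascending order.
  *scan({ gt, lt = "\uffff", limit = Infinity } = {}) {
    let i = gt === undefined ? 0 : this.lowerBound(gt);
    if (gt !== undefined && this.keys[i] === gt) i++; // exclusive lower bound
    for (let n = 0; i < this.keys.length && this.keys[i] < lt && n < limit; i++, n++) {
      yield [this.keys[i], this.values.get(this.keys[i])];
    }
  }
}

// Page through one namespace two keys at a time, resuming after the last key.
const db = new SortedKV();
for (const name of ["cherry", "apple", "elder", "banana", "date"]) {
  db.put("fruit!" + name, name.length);
}
const end = "fruit!\uffff";
const page1 = [...db.scan({ gt: "fruit!", lt: end, limit: 2 })];
const cursor = page1[page1.length - 1][0];
const page2 = [...db.scan({ gt: cursor, lt: end, limit: 2 })];
console.log(page1.map(([k]) => k)); // [ 'fruit!apple', 'fruit!banana' ]
console.log(page2.map(([k]) => k)); // [ 'fruit!cherry', 'fruit!date' ]
```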

The Real Question

The real question we are trying to answer here is just how far one can go with a KV store rather than a dataframe database such as RxDB or SQLite.

Token Accounting App

The first app I want to develop on CGFS is a token app where people can mint, transfer, and redeem tokens.

Tokens have two main kinds of accounting model: account-based, where everyone's balance is stored in a KV store, and UTxO, where people trade collections of receipts amongst one another.

To validate a token transfer, one simply has to look up the account balance or the UTxO receipts, all of which can be stored in a KV store. I now realize that even with a UTxO model you can still calculate the balances somewhere.
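
A minimal sketch of both models on one sorted key space, assuming hypothetical key layouts `balance!<holder>` and `utxo!<holder>!<receiptId>` and a `Map` standing in for LevelDB. Validating a transfer is two point lookups in the account model and one prefix scan in the UTxO model, and a UTxO balance falls out of summing a holder's receipts. A real version would apply each transfer's writes in one atomic LevelDB batch; here they are separate `set` calls for brevity.

```javascript
// Hypothetical sketch: both accounting models on one sorted key space.
// A Map stands in for LevelDB; prefixScan imitates a range iterator.
// Holder names are assumed not to contain the "!" separator.
const kv = new Map();

function prefixScan(prefix) {
  return [...kv.keys()]
    .filter((k) => k.startsWith(prefix))
    .sort()
    .map((k) => [k, kv.get(k)]);
}

// Account model: one key per holder, "balance!<holder>" -> amount.
// Validation is two point lookups.
function transferAccount(from, to, amount) {
  const fromBalance = kv.get(`balance!${from}`) ?? 0;
  if (!(amount > 0) || fromBalance < amount) throw new Error("insufficient balance");
  kv.set(`balance!${from}`, fromBalance - amount);
  kv.set(`balance!${to}`, (kv.get(`balance!${to}`) ?? 0) + amount);
}

// UTxO model: one key per unspent receipt, "utxo!<holder>!<id>" -> amount.
// Spending consumes whole receipts and issues a change receipt.
let nextReceipt = 0;
function transferUtxo(from, to, amount) {
  const spent = [];
  let total = 0;
  for (const [key, value] of prefixScan(`utxo!${from}!`)) {
    if (total >= amount) break;
    spent.push(key);
    total += value;
  }
  if (!(amount > 0) || total < amount) throw new Error("insufficient receipts");
  for (const key of spent) kv.delete(key);
  kv.set(`utxo!${to}!r${nextReceipt++}`, amount);
  if (total > amount) kv.set(`utxo!${from}!r${nextReceipt++}`, total - amount);
}

// Even under UTxO, a balance is just a sum over the holder's receipts.
function utxoBalance(holder) {
  return prefixScan(`utxo!${holder}!`).reduce((sum, [, value]) => sum + value, 0);
}

kv.set("balance!alice", 100);
transferAccount("alice", "bob", 30);
console.log(kv.get("balance!alice"), kv.get("balance!bob")); // 70 30

kv.set("utxo!alice!g1", 60);
kv.set("utxo!alice!g2", 40);
transferUtxo("alice", "bob", 70);
console.log(utxoBalance("alice"), utxoBalance("bob")); // 30 70
```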

The issue comes when we want to do analytics. Seeing how someone's balance changes over time, or summing the tokens owned by a group, requires either a lot of hand-written KV store scans or a proper dataframe system.
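
To show what "a lot of use of the KV store" looks like in practice, here is a sketch assuming a hypothetical history log keyed as `history!<account>!<zero-padded milliseconds>`, again with a `Map` standing in for LevelDB. A balance-over-time series is one range scan plus a running sum, but a group total already means one scan per member, aggregated by hand.

```javascript
// Hypothetical layout: every balance change is logged under
// "history!<account>!<zero-padded milliseconds>" -> signed amount.
// Zero-padding makes lexicographic key order match time order.
const kv = new Map();
const pad = (ms) => String(ms).padStart(15, "0");

function logChange(account, ms, delta) {
  kv.set(`history!${account}!${pad(ms)}`, delta);
}

// Stand-in for a LevelDB range iterator over one key prefix.
function scanPrefix(prefix) {
  return [...kv.keys()]
    .filter((k) => k.startsWith(prefix))
    .sort()
    .map((k) => [k, kv.get(k)]);
}

// Balance over time: replay one account's deltas in timestamp order.
function balanceSeries(account) {
  let balance = 0;
  return scanPrefix(`history!${account}!`).map(([key, delta]) => {
    balance += delta;
    return [Number(key.slice(key.lastIndexOf("!") + 1)), balance];
  });
}

// Group total: a separate scan per member, summed by hand. A dataframe or
// SQL engine would express this as a single aggregate query.
function groupTotal(accounts) {
  let total = 0;
  for (const account of accounts) {
    const series = balanceSeries(account);
    if (series.length > 0) total += series[series.length - 1][1];
  }
  return total;
}

logChange("alice", 1000, 50);
logChange("alice", 3000, -20);
logChange("alice", 2000, 10);
logChange("bob", 1500, 5);
console.log(JSON.stringify(balanceSeries("alice"))); // [[1000,50],[2000,60],[3000,40]]
console.log(groupTotal(["alice", "bob"])); // 45
```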

The point is that the core functionality of the token app can run without dataframe queries.

Bookmarking App

I use Raindrop.io religiously every day to store tagged bookmarks. The application has two components: the saved bookmarks with their descriptions, and the tags attached to those bookmarks.

Tags and bookmarks have a many-to-many relationship. A tag can be renamed collectively, changing the tag name across all bookmarks at once. This means the tag itself functions as its own document, file, or index that just points at the various bookmarks.

The bookmarks themselves are easy to store in a KV store. The tags can be stored in a LevelDB sublevel whose keys start with the tag's text, so all the bookmarks for one tag can be iterated through as a single key range. The sublevel (or whatever namespace) that stores the bookmarks can likewise be iterated through to find which tags a bookmark is linked to.

When we add a bookmark to a tag or remove it, we just add or remove two keys: one under the bookmark itself and one in the "tag" sublevel namespace.
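
A sketch of that two-key layout, assuming hypothetical prefixes `bookmark!`, `bmtag!`, and `tag!` standing in for LevelDB sublevels and a `Map` standing in for the store. Linking and unlinking each touch exactly two keys, both directions of the relationship are a single prefix scan, and a tag rename is a walk over the old tag's range that re-links every bookmark in it.

```javascript
// Hypothetical key layout (prefixes stand in for LevelDB sublevels):
//   "bookmark!<id>"     -> { url, description }
//   "bmtag!<id>!<tag>"  -> true   (tags of one bookmark)
//   "tag!<tag>!<id>"    -> true   (bookmarks under one tag)
// Ids and tags are assumed not to contain the "!" separator.
const kv = new Map();

// Keys under a prefix, in sorted order, like iterating a sublevel.
function scanPrefix(prefix) {
  return [...kv.keys()].filter((k) => k.startsWith(prefix)).sort();
}
const lastPart = (key) => key.slice(key.lastIndexOf("!") + 1);

function saveBookmark(id, url, description) {
  kv.set(`bookmark!${id}`, { url, description });
}

// Linking or unlinking touches exactly two keys.
function tagBookmark(id, tag) {
  kv.set(`bmtag!${id}!${tag}`, true);
  kv.set(`tag!${tag}!${id}`, true);
}
function untagBookmark(id, tag) {
  kv.delete(`bmtag!${id}!${tag}`);
  kv.delete(`tag!${tag}!${id}`);
}

const bookmarksWithTag = (tag) => scanPrefix(`tag!${tag}!`).map(lastPart);
const tagsOfBookmark = (id) => scanPrefix(`bmtag!${id}!`).map(lastPart);

// Renaming a tag: walk the old tag's range and re-link every bookmark in it.
function renameTag(oldTag, newTag) {
  for (const id of bookmarksWithTag(oldTag)) {
    untagBookmark(id, oldTag);
    tagBookmark(id, newTag);
  }
}

saveBookmark("b1", "https://example.com/a", "LevelDB notes");
saveBookmark("b2", "https://example.com/b", "Sublevels");
tagBookmark("b1", "js");
tagBookmark("b2", "js");
tagBookmark("b1", "db");
renameTag("js", "javascript");
console.log(bookmarksWithTag("javascript")); // [ 'b1', 'b2' ]
console.log(tagsOfBookmark("b1")); // [ 'db', 'javascript' ]
```

In a real LevelDB version, each pair of writes would go into one batch so the two directions of the index cannot drift apart.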

The issue comes when we want to search through bookmarks. We can list the bookmarks that have a specific tag, but we can't search bookmarks by when they were last edited or first added. That requires a dataframe.

The point is that the core functionality of the bookmarking app can run without dataframe queries.

PKMS Knowledge Graph

A Personal Knowledge Graph Management System is basically the same thing as the bookmarking app mentioned above.

Kanban Board

A Kanban board so large and complex that it does not fit into a single key-value pair strikes me as unlikely. Even if we want to log the changes to the board one by one, that is just an event stream that can be stored separately.
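
A sketch of that split, assuming hypothetical key names and a `Map` standing in for LevelDB: the whole board is one JSON value that is read, modified, and written back, and every change is also appended to a separate event stream whose zero-padded sequence numbers keep it in order.

```javascript
// Hypothetical layout: the whole board is one JSON value under
// "board!<id>", and each change is appended to a separate event stream
// under "boardlog!<id>!<zero-padded sequence number>".
const kv = new Map();

function createBoard(boardId, columns) {
  const board = { columns: Object.fromEntries(columns.map((c) => [c, []])) };
  kv.set(`board!${boardId}`, JSON.stringify(board));
  kv.set(`boardseq!${boardId}`, 0);
}

function appendEvent(boardId, event) {
  const seq = kv.get(`boardseq!${boardId}`) + 1;
  kv.set(`boardseq!${boardId}`, seq);
  kv.set(`boardlog!${boardId}!${String(seq).padStart(10, "0")}`, JSON.stringify(event));
}

// Read-modify-write of the single board value, plus one log entry.
function updateBoard(boardId, mutate, event) {
  const board = JSON.parse(kv.get(`board!${boardId}`));
  mutate(board);
  kv.set(`board!${boardId}`, JSON.stringify(board));
  appendEvent(boardId, event);
}

function addCard(boardId, column, cardId) {
  updateBoard(boardId, (b) => b.columns[column].push(cardId), { type: "add", cardId, column });
}

function moveCard(boardId, cardId, to) {
  updateBoard(
    boardId,
    (b) => {
      for (const cards of Object.values(b.columns)) {
        const i = cards.indexOf(cardId);
        if (i !== -1) cards.splice(i, 1);
      }
      b.columns[to].push(cardId);
    },
    { type: "move", cardId, to },
  );
}

// Replay the event stream in order with a prefix scan.
function eventLog(boardId) {
  const prefix = `boardlog!${boardId}!`;
  return [...kv.keys()]
    .filter((k) => k.startsWith(prefix))
    .sort()
    .map((k) => JSON.parse(kv.get(k)));
}

createBoard("team", ["todo", "doing", "done"]);
addCard("team", "todo", "c1");
addCard("team", "todo", "c2");
moveCard("team", "c1", "doing");
console.log(JSON.parse(kv.get("board!team")).columns); // { todo: [ 'c2' ], doing: [ 'c1' ], done: [] }
console.log(eventLog("team").map((e) => e.type)); // [ 'add', 'add', 'move' ]
```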

Issue Tracker

An issue tracker is the same thing as the Personal Knowledge Graph Management System, except self-contained and with typed forms instead of raw documents.

Application Notifications

Application notifications are just an event stream with timestamps. Timestamps can easily be stored as part of a key in LevelDB, so merging event streams is as simple as iterating through two LevelDB sublevels and dumping all the keys into a new sublevel (as long as the timestamps are written at a fixed width, so that key order matches time order).
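
A sketch of that merge, assuming hypothetical stream names and key layout `<stream>!<zero-padded ms>!<id>`, with key prefixes standing in for sublevels and a `Map` standing in for LevelDB. Because the timestamp stays at the front of each copied key, the sorted store interleaves the streams by time on its own.

```javascript
// Hypothetical layout: each notification stream is a key prefix (standing in
// for a LevelDB sublevel) with keys "<stream>!<zero-padded ms>!<id>".
// Fixed-width timestamps make lexicographic key order equal time order.
const kv = new Map();
const pad = (ms) => String(ms).padStart(15, "0");

function notify(stream, ms, id, message) {
  kv.set(`${stream}!${pad(ms)}!${id}`, message);
}

// Stand-in for iterating one sublevel in key order.
function scanPrefix(prefix) {
  return [...kv.keys()]
    .filter((k) => k.startsWith(prefix))
    .sort()
    .map((k) => [k, kv.get(k)]);
}

// Copy every event from each source stream into the target stream. The
// timestamp stays at the front of the key, so no merge logic is needed.
// The source name is appended so equal timestamps and ids from different
// streams don't collide.
function mergeStreams(sources, target) {
  for (const source of sources) {
    for (const [key, message] of scanPrefix(`${source}!`)) {
      const rest = key.slice(source.length + 1); // "<ms>!<id>"
      kv.set(`${target}!${rest}!${source}`, message);
    }
  }
}

notify("email", 2000, "e1", "Invoice ready");
notify("chat", 1000, "c1", "Hi");
notify("chat", 3000, "c2", "See you");
mergeStreams(["email", "chat"], "inbox");
console.log(scanPrefix("inbox!").map(([, message]) => message));
// [ 'Hi', 'Invoice ready', 'See you' ]
```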

Moral of the Story

It is possible to build core applications on LevelDB without complex dataframe systems such as absurd-sql, SQLite, RxDB, MongoDB, or Polars, leaving complex app features such as search, analytics, and graph queries to separate applications built on the extensibility of the core applications.
