Schema Approaches for Personal Knowledge

This document surveys approaches to structuring meaning in personal data operations. Each approach represents different tradeoffs between flexibility and queryability, a core tension identified in our principles analysis.

Centralized Ontologies

What it solves: Shared semantics, interoperability

Key implementations: schema.org, FOAF, Dublin Core

Description: Standardized vocabularies where everyone agrees on what entities and relationships mean. Apps can reliably interpret each other's data.
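To make "everyone agrees on what entities mean" concrete, here is a minimal sketch of two apps sharing a record that uses schema.org terms in a JSON-LD-style document. The `expand` helper is a hypothetical and heavily simplified stand-in for JSON-LD expansion. It only handles an inline `@vocab` context and nested objects, and it is not a JSON-LD processor.

```python
from typing import Optional

SCHEMA_ORG = "https://schema.org/"

# A record written by "app A" in a JSON-LD-like shape using schema.org terms.
note_from_app_a = {
    "@context": {"@vocab": SCHEMA_ORG},
    "@type": "CreativeWork",
    "name": "Schema approaches",
    "author": {"@type": "Person", "name": "Ada"},
}


def expand(doc: dict, vocab: Optional[str] = None) -> dict:
    """Hypothetical, simplified expansion: turn bare terms into full IRIs.

    Only understands an inline {"@vocab": ...} context and nested objects.
    """
    context = doc.get("@context", {})
    if isinstance(context, dict):
        vocab = context.get("@vocab", vocab)
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = vocab + value if vocab else value
        elif key.startswith("@") or vocab is None:
            expanded[key] = value
        else:
            expanded[vocab + key] = expand(value, vocab) if isinstance(value, dict) else value
    return expanded


def read_title(doc: dict) -> str:
    """App B looks properties up by global IRI, not by app A's local key names."""
    return expand(doc)[SCHEMA_ORG + "name"]


print(read_title(note_from_app_a))  # → Schema approaches
```

Because both apps resolve `name` to the same schema.org IRI, app B needs no knowledge of app A. The flip side, which is the P4 conflict, is that both must accept schema.org's definition of every term they use.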

Principle Alignment:

Supports P3 (Semantic Richness) - formal semantics enable inference
Supports P6 (Interoperability) - shared vocabulary enables portability
Conflicts with P4 (Schema Pluralism) - assumes a single correct schema
Conflicts with P5 (Friction Minimization) - requires upfront modeling

Requirements Addressed:

R10 (Tool-independent representation)
R11 (Relation preservation across transformations)

Tradeoffs:

Strengths: Strong interoperability, rich existing vocabularies, tooling support (validators, reasoners)
Weaknesses: Lowest-common-denominator problem, slow evolution, assumes consensus exists

Questions:

Does personal knowledge need to be interoperable across individuals?
What if my concept of "importance" differs from schema.org?

Related Terms: See glossary-engineering - RDF, SPARQL, Ontology

Lexicon Systems (Namespaced Schemas)

What it solves: Schema evolution without global coordination

Key implementations: atproto Lexicons, JSON-LD contexts

Description: Schemas are versioned, namespaced documents. Anyone can define new schemas without central approval. Example: com.myapp.note.v2
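Here is a minimal sketch of the namespaced-schema idea, assuming a hypothetical in-memory registry. Each schema lives under its own reverse-DNS-style identifier, anyone can register a new one without touching the others, and each record names the schema it follows in a `$type` field. The identifiers other than com.myapp.note.v2 and the dict-of-Python-types schema format are illustrative only. Real atproto Lexicons are JSON documents with a much richer type system.

```python
from typing import Any

# Hypothetical registry mapping namespaced identifiers to field definitions.
# Each publisher owns its namespace, so no central approval is needed.
REGISTRY: dict[str, dict[str, type]] = {
    "com.myapp.note.v2": {"title": str, "body": str, "tags": list},
    "org.otherapp.note": {"text": str},
}


def register(nsid: str, fields: dict[str, type]) -> None:
    """Publish a new schema; refuse to silently redefine an existing identifier."""
    if nsid in REGISTRY:
        raise ValueError(f"{nsid} is already defined")
    REGISTRY[nsid] = fields


def validate(record: dict[str, Any]) -> list[str]:
    """Check a record against whichever schema its $type field names."""
    nsid = record.get("$type")
    if nsid not in REGISTRY:
        return [f"unknown schema: {nsid}"]
    errors = []
    for name, expected in REGISTRY[nsid].items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


# A third party adds its own schema without coordinating with anyone.
register("net.alice.readinglog", {"book": str, "finished": bool})
print(validate({"$type": "net.alice.readinglog", "book": "Dune", "finished": True}))  # → []
print(validate({"$type": "com.myapp.note.v2", "title": "t", "body": 3, "tags": []}))
# → ['body should be str']
```

The fragmentation risk shows up immediately. Both com.myapp.note.v2 and org.otherapp.note describe "notes", but nothing in the registry says how their fields correspond.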

Principle Alignment:

Strongly supports P4 (Schema Pluralism) - multiple concurrent schemas by design
Supports P6 (Interoperability) - schemas are portable definitions
Addresses the gap between centralized ontologies (too rigid) and emergent structure (too loose)

Requirements Addressed:

R12 (Schema evolution without data loss)
R11 (Relation preservation)

Tradeoffs:

Strengths: Fast iteration, no central authority needed, versioning built-in
Weaknesses: Fragmentation risk (many "note" schemas), apps must handle multiple versions, discovery problem
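In practice, "apps must handle multiple versions" usually means normalizing older records when they are read. The following is a minimal sketch. It assumes a hypothetical v1 note that stored a single `content` string and a v2 note that splits that string into `title` and `body`. The upgrade keeps the original text in those two fields, in the spirit of R12.

```python
from typing import Any, Callable

Record = dict[str, Any]
LATEST = "com.myapp.note.v2"


def v1_to_v2(rec: Record) -> Record:
    """Hypothetical migration: first line becomes the title, the rest the body."""
    title, _, body = rec["content"].partition("\n")
    return {"$type": LATEST, "title": title, "body": body, "tags": rec.get("tags", [])}


# Each entry upgrades one version to the next; chaining handles older records.
UPGRADES: dict[str, Callable[[Record], Record]] = {
    "com.myapp.note.v1": v1_to_v2,
}


def normalize(rec: Record) -> Record:
    """Apply upgrades until the record reaches the latest known version."""
    while rec.get("$type") != LATEST:
        upgrade = UPGRADES.get(rec.get("$type"))
        if upgrade is None:
            raise ValueError(f"no upgrade path from {rec.get('$type')}")
        rec = upgrade(rec)
    return rec


old = {"$type": "com.myapp.note.v1", "content": "Lexicons\nNamespaced schemas"}
print(normalize(old)["title"])  # → Lexicons
```

Upgrading on read leaves stored data untouched. The alternative is a one-off migration script that rewrites every stored record.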

System Examples:

atproto uses lexicons for all record types
JSON-LD contexts serve a similar function in Solid

Questions:

How do you merge knowledge from different lexicons?
Who defines vocabulary for personal concepts like "thinking process"?

Related Terms: See glossary-engineering - Lexicon System, JSON-LD

Property Graphs

What it solves: Relationships as first-class entities, expressive queries

Key implementations: Neo4j, labeled property graph databases, Roam Research

Description: Nodes and edges both have properties. Queries traverse relationships naturally using graph query languages.
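The following is a minimal in-memory sketch of the labeled property graph model. Nodes and edges both carry a label and a property dict, and a traversal follows only edges of a chosen type. The node IDs and the `extends`/`contradicts` edge types are illustrative. A database such as Neo4j would express the same traversal declaratively in Cypher, roughly as shown in the comment.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    label: str
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    type: str
    props: dict[str, Any] = field(default_factory=dict)  # edges have properties too


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.out_edges: dict[str, list[Edge]] = {}

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node
        self.out_edges.setdefault(node.id, [])

    def add_edge(self, edge: Edge) -> None:
        self.out_edges.setdefault(edge.src, []).append(edge)

    def reachable(self, start: str, edge_type: str) -> list[str]:
        """Breadth-first traversal along edges of one type.

        Roughly: MATCH (:Note {id: 'n1'})-[:extends*1..]->(b) RETURN DISTINCT b.id
        """
        seen, order, queue = {start}, [], deque([start])
        while queue:
            current = queue.popleft()
            for edge in self.out_edges.get(current, []):
                if edge.type == edge_type and edge.dst not in seen:
                    seen.add(edge.dst)
                    order.append(edge.dst)
                    queue.append(edge.dst)
        return order


g = PropertyGraph()
for nid in ("n1", "n2", "n3"):
    g.add_node(Node(nid, "Note", {"title": f"Note {nid}"}))
g.add_edge(Edge("n1", "n2", "extends", {"confidence": 0.8}))
g.add_edge(Edge("n2", "n3", "extends"))
g.add_edge(Edge("n1", "n3", "contradicts"))
print(g.reachable("n1", "extends"))  # → ['n2', 'n3']
```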

Principle Alignment:

Strongly supports P3 (Semantic Richness) - typed, directional relationships
Supports P9 (Performance Pragmatism) - graph databases optimized for traversal
Often faster than RDF triple stores for traversal-heavy (multi-hop) queries

Gap Addressed:

Partially addresses GAP-4 (Proactive Surfacing) - graph algorithms enable discovery
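One simple graph algorithm behind proactive surfacing is link prediction by common neighbors. If two notes are not linked but share several neighbors, they are candidates for a connection. The sketch below implements only that one heuristic, on an undirected adjacency map with made-up note names. Real systems might use other scores, such as Adamic-Adar or embedding similarity. Because it compares every pair of notes, it is O(n²) and only suits small vaults.

```python
from itertools import combinations


def undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build a symmetric adjacency map from a list of links."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def suggest_links(adj: dict[str, set[str]], min_shared: int = 2) -> list[tuple[str, str, int]]:
    """Rank unlinked node pairs by how many neighbors they share."""
    suggestions = []
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already connected, nothing to surface
        shared = len(adj[a] & adj[b])
        if shared >= min_shared:
            suggestions.append((a, b, shared))
    # Most shared neighbors first; ties broken alphabetically for stable output.
    return sorted(suggestions, key=lambda s: (-s[2], s[0], s[1]))


edges = [("sleep", "memory"), ("sleep", "mood"), ("exercise", "memory"),
         ("exercise", "mood"), ("memory", "mood")]
print(suggest_links(undirected(edges)))  # → [('exercise', 'sleep', 2)]
```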

Requirements Addressed:

R14 (Graph traversal and pattern detection)
R17 (Non-obvious connection discovery)
R3 (Semantic query support)

Tradeoffs:

Strengths: Natural for knowledge graphs, powerful query languages (Cypher), schema can be emergent
Weaknesses: Can become unstructured without discipline, version/migration challenges, performance degrades with sprawl

Questions:

How structured should relationship types be?
Is a generic related-to link sufficient, or do you need typed relations such as challenges, extends, and contradicts?

Related Terms: See glossary-engineering - Property Graph, Graph Traversal

Document-Oriented / Flexible JSON

What it solves: Schema flexibility, developer ergonomics

Key implementations: MongoDB, JSON-LD, Obsidian frontmatter

Description: Documents can have arbitrary structure. Schema validation at use-time, not storage-time. "Schema-on-read" rather than "schema-on-write."
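Schema-on-read can be sketched in a few lines. Documents are stored exactly as written, and each reader applies its own expectations, such as defaults and type coercion, at query time. The field names and the `READ_SCHEMA` structure are hypothetical.

```python
from typing import Any, Callable

# Storage accepts anything: no validation happens at write time.
store: list[dict[str, Any]] = [
    {"title": "Reading list", "rating": "4"},      # rating stored as a string
    {"title": "Ideas", "rating": 5, "mood": "ok"},  # unexpected extra property
    {"title": "Untitled draft"},                    # rating missing entirely
]

# Hypothetical reader-side schema: field -> (coercion function, default).
READ_SCHEMA: dict[str, tuple[Callable[[Any], Any], Any]] = {
    "title": (str, ""),
    "rating": (int, 0),
}


def read(doc: dict[str, Any]) -> dict[str, Any]:
    """Project a stored document onto the reader's schema at query time."""
    view = {}
    for name, (coerce, default) in READ_SCHEMA.items():
        try:
            view[name] = coerce(doc[name]) if name in doc else default
        except (TypeError, ValueError):
            view[name] = default  # unreadable value: fall back instead of failing
    return view


top = [d["title"] for d in map(read, store) if d["rating"] >= 4]
print(top)  # → ['Reading list', 'Ideas']
```

The cost is that every reader repeats this work, and nothing stops a later document from storing a rating of "four".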

Principle Alignment:

Supports P4 (Schema Pluralism) - accommodates unexpected properties
Supports P5 (Friction Minimization) - low cognitive overhead to start
Weakens P3 (Semantic Richness) - relationships less explicit
Weakens P9 (Performance Pragmatism) - query optimization difficult

Requirements Addressed:

R37 (Low-friction integration)
R12 (Schema evolution)

Tradeoffs:

Strengths: Easy to start, accommodates unexpected properties, familiar to developers
Weaknesses: Query optimization difficult, schema drift over time, validation is optional
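Schema drift is easy to detect once you look for it. This sketch uses made-up note properties. It reports every property key whose values appear with more than one Python type across a collection. It also reports near-duplicate key spellings that differ only in case, hyphens, or underscores.

```python
from collections import defaultdict
from typing import Any


def drift_report(docs: list[dict[str, Any]]) -> dict[str, list[str]]:
    types_by_key: dict[str, set[str]] = defaultdict(set)
    spellings: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for key, value in doc.items():
            types_by_key[key].add(type(value).__name__)
            # Normalize spelling so "due-date", "dueDate", "due_date" collide.
            spellings[key.lower().replace("-", "").replace("_", "")].add(key)
    return {
        "mixed_types": sorted(k for k, t in types_by_key.items() if len(t) > 1),
        "variant_spellings": sorted(
            "/".join(sorted(v)) for v in spellings.values() if len(v) > 1
        ),
    }


notes = [
    {"status": "done", "due-date": "2024-05-01"},
    {"status": ["done", "archived"], "dueDate": "2024-06-01"},
    {"status": "open", "due_date": None},
]
print(drift_report(notes))
# → {'mixed_types': ['status'], 'variant_spellings': ['due-date/dueDate/due_date']}
```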

Questions:

How do you prevent a vault from becoming a junk drawer?
When do you need schema discipline?

System Examples:

Obsidian uses Markdown with YAML frontmatter
Notion uses flexible block-based documents
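Obsidian-style frontmatter is a YAML block fenced by --- lines at the top of a Markdown file. This sketch splits a note into frontmatter and body and reads only flat key: value pairs. It is a deliberately simplified stand-in for a real YAML parser such as PyYAML's `yaml.safe_load`, which a real tool would need in order to handle lists, nesting, and quoting. The note content is made up.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (properties, body). Handles only flat `key: value` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is body
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # unterminated block: treat as plain Markdown
    props = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:  # skip lines without a colon instead of guessing
            props[key.strip()] = value.strip()
    return props, "\n".join(lines[end + 1:])


note = "---\ntype: book\nrating: 4\n---\n# Reading notes\nNotes here."
props, body = split_frontmatter(note)
print(props)  # → {'type': 'book', 'rating': '4'}
```

Every value comes back as a string, so each consumer must decide for itself whether "4" is a number. This is one small way validation stays optional.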

Schema-less / Emergent Structure

What it solves: Pure flexibility, structure emerges from use

Key implementations: Roam Research, Obsidian (in practice), Zettelkasten method

Description: No predefined structure. Meaning emerges from links, backlinks, and tags. Structure is implicit, not explicit.
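In a schema-less vault, the only machine-readable structure is the links and tags themselves. The following minimal sketch assumes Roam/Obsidian-style [[wikilinks]] and #tags in made-up notes, and recovers a backlink index and a tag index from raw text. The regular expressions are simplifications. They strip aliases and heading anchors from link targets, but they also count embeds and links inside code spans.

```python
import re
from collections import defaultdict

LINK = re.compile(r"\[\[([^\]|#]+)")    # [[Target]], [[Target|alias]], [[Target#Heading]]
TAG = re.compile(r"(?<!\S)#([\w/-]+)")  # #tag or #nested/tag, but not "# Heading"


def index_vault(notes: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (backlinks, notes_by_tag) inferred purely from note text."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    by_tag: dict[str, set[str]] = defaultdict(set)
    for name, text in notes.items():
        for target in LINK.findall(text):
            backlinks[target.strip()].add(name)
        for tag in TAG.findall(text):
            by_tag[tag].add(name)
    return dict(backlinks), dict(by_tag)


vault = {
    "Zettelkasten": "A method for [[Atomic notes]]. #method",
    "Atomic notes": "One idea per note; see [[Zettelkasten|the method]]. #method",
    "Daily 2024-05-01": "Read about [[Atomic notes#Size]] today. #journal",
}
backlinks, by_tag = index_vault(vault)
print(sorted(backlinks["Atomic notes"]))  # → ['Daily 2024-05-01', 'Zettelkasten']
```

The index cannot distinguish a note that extends another from one that contradicts it, because every backlink is untyped. That is the P3 conflict noted below.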

Principle Alignment:

Maximizes P4 (Schema Pluralism) - no schema at all
Maximizes P5 (Friction Minimization) - zero upfront modeling
Strongly conflicts with P3 (Semantic Richness) - relationships are untyped
Conflicts with P9 (Performance Pragmatism) - structured queries difficult

Requirements Challenged:

R3 (Semantic query support) - no formal semantics
R14 (Graph traversal) - relationships are implicit
R29 (Presentation-ready export) - structure must be inferred

Tradeoffs:

Strengths: Zero cognitive overhead, organic evolution, resists premature optimization
Weaknesses: Structured queries difficult, consistency hard to enforce, export/migration pain

Philosophy: Classic Zettelkasten - let structure emerge from connections rather than imposing hierarchy.

Questions:

Can you have emergence AND queryability?
What is minimal structure needed for useful computation?

Related Terms: Zettelkasten, emergent order

Comparative Analysis

Flexibility vs Queryability Spectrum:

Schema-less ← Document-Oriented ← Property Graph ← Lexicons ← Centralized Ontologies
  (most flexible)                                                   (most queryable)

Which Approach for Which Principle:

P3 (Semantic Richness): Centralized Ontologies, Property Graphs
P4 (Schema Pluralism): Lexicons, Document-Oriented, Schema-less
P5 (Friction Minimization): Schema-less, Document-Oriented
P9 (Performance Pragmatism): Property Graphs, Lexicons

System Scores by Approach:

Solid (Centralized Ontologies via RDF): 19/30 - rich semantics but poor performance
atproto (Lexicons): 20/30 - good balance, designed for evolution
Obsidian (Schema-less): 18/30 - low friction but weak semantics
Roam (Property Graph-ish): 11/30 - good surfacing but vendor lock-in

Cross-Domain Term Mapping

Understanding how different communities name similar concepts:

Concept               Semantic Web          Data Ops             PKM
Structure definition  Ontology              Schema               Template
Relationship          Predicate/Property    Foreign key/Edge     Link/Connection
Validation            SHACL/ShEx            Schema validation    Linting
Evolution             Versioned namespaces  Migration scripts    Refactoring
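To make the term mapping concrete, here is one fact, "note B extends note A", expressed in each community's idiom. It appears as an RDF-style triple, as a relational row whose foreign keys point at note IDs (readable equally as a typed graph edge), and as a PKM wikilink. The example.org IRIs, the row layout, and the `extends` predicate are hypothetical. The conversion functions show what survives translation: the typed relation carries over from triple to row, but a bare link only yields an untyped "links-to".

```python
import re

EX = "https://example.org/"  # hypothetical namespace for this illustration

# Semantic Web: subject-predicate-object triple with global IRIs.
triple = (EX + "note/B", EX + "extends", EX + "note/A")

# Data ops: a row in a hypothetical link table; src_id/dst_id are foreign keys
# into a notes table, and the same row can be read as a typed graph edge.
row = {"src_id": "B", "dst_id": "A", "kind": "extends"}

# PKM: a link inside note B's text; the relation type lives only in prose.
note_b_text = "Builds on [[A]] by adding versioning."


def triple_to_row(t: tuple[str, str, str]) -> dict[str, str]:
    """Map a triple onto the row shape by keeping the last IRI path segment."""

    def local_name(iri: str) -> str:
        return iri.rstrip("/").rsplit("/", 1)[-1]

    s, p, o = t
    return {"src_id": local_name(s), "dst_id": local_name(o), "kind": local_name(p)}


def link_to_rows(source: str, text: str) -> list[dict[str, str]]:
    """A bare wikilink carries no type, so the best we can record is links-to."""
    return [{"src_id": source, "dst_id": target, "kind": "links-to"}
            for target in re.findall(r"\[\[([^\]|#]+)", text)]


print(triple_to_row(triple) == row)     # → True
print(link_to_rows("B", note_b_text))  # → [{'src_id': 'B', 'dst_id': 'A', 'kind': 'links-to'}]
```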

Selection Criteria

Based on our principles and requirements:

Choose Centralized Ontologies if:

Interoperability with external systems is critical (P6)
Formal reasoning and inference are needed (P3)
Willing to accept upfront modeling cost

Choose Lexicons if:

Schema evolution is frequent (P4, R12)
Multiple concurrent schemas needed
Decentralized schema development required

Choose Property Graphs if:

Relationship queries are primary use case (R14, R17)
Performance at scale matters (P9)
Willing to impose some schema discipline

Choose Document-Oriented if:

Rapid iteration and experimentation (P5)
Schema is still emerging
Developer ergonomics are a priority

Choose Schema-less if:

Absolute minimum friction required (P5)
User base is non-technical
Structure can emerge organically

Open Questions

Should personal schemas be standardized or bespoke?
How do you balance expressive power with query performance?
What is the right level of formality for "notes to self" versus shared knowledge?
Can AI help with schema suggestion/emergence?
Can hybrid approaches combine flexibility and queryability effectively?

Cross-References

principles - P3 (Semantic Richness), P4 (Schema Pluralism)
system-evaluation - How real systems score on these dimensions
gap-analysis - GAP-3 (Semantic richness vs performance tradeoff)
glossary-engineering - Technical term definitions
atproto-analysis - Lexicon system in practice
