Architecture
This page explains how TypeGraph works under the hood, the design decisions that shaped it, and why certain tradeoffs were made.
High-Level Architecture
Section titled “High-Level Architecture”┌────────────────────────────────────────────────────────┐│ Your Application ││ ││ ┌──────────────────────────────────────────────────┐ ││ │ TypeGraph Library │ ││ │ │ ││ │ ┌────────────┐ ┌────────────┐ │ ││ │ │ Schema │ │ Query │ │ ││ │ │ DSL │ │ Builder │ │ ││ │ └──────┬─────┘ └─────┬──────┘ │ ││ │ │ │ │ ││ │ └──────────────┴───────────────┘ │ ││ │ │ │ ││ │ ▼ │ ││ │ ┌──────────────────┐ │ ││ │ │ Ontology Layer │ │ ││ │ └──────────────────┘ │ ││ └─────────────────────────┬────────────────────────┘ ││ │ ││ ▼ ││ ┌──────────────┐ ││ │ Drizzle │ ││ │ ORM │ ││ └──────┬───────┘ ││ │ │└───────────────────────────┼────────────────────────────┘ │ ▼ ┌─────────────────┐ │ Your Database │ └─────────────────┘TypeGraph is an embedded library, not a database. It runs in your application process, uses your existing database connection, and compiles queries to SQL.
Core Design Principles
Section titled “Core Design Principles”1. Embedded, Not External
Section titled “1. Embedded, Not External”Decision: TypeGraph is a library dependency, not a separate service.
Why: Graph databases like Neo4j require managing another piece of infrastructure. For many use cases—knowledge bases, organizational structures, content relationships—the graph is part of your application, not a standalone system.
Tradeoff: You don’t get Neo4j’s specialized graph algorithms (PageRank, community detection), but you avoid:
- Additional deployment complexity
- Network latency between app and graph
- Separate scaling and monitoring
- Data synchronization challenges
2. Schema-First, Type-Driven
Section titled “2. Schema-First, Type-Driven”Decision: Zod schemas are the single source of truth. TypeScript types are inferred, not duplicated.
Why: In many graph systems, you define types in one place, validation in another, and database schemas in a third. This leads to drift and bugs.
With TypeGraph:
const Person = defineNode("Person", { schema: z.object({ name: z.string().min(1), email: z.string().email().optional(), }),});
// TypeScript type is inferred automaticallytype PersonProps = z.infer<typeof Person.schema>;// { name: string; email?: string }The schema drives:
- Runtime validation on create/update
- TypeScript types for compile-time safety
- Database storage format
- Query builder type constraints
3. SQL as the Execution Engine
Section titled “3. SQL as the Execution Engine”Decision: Compile graph queries to SQL, don’t implement a custom query engine.
Why: SQLite and PostgreSQL are battle-tested, highly optimized query engines. Rather than building another one:
// Your querystore.query() .from("Person", "p") .traverse("worksAt", "e") .to("Company", "c") .select((ctx) => ({ person: ctx.p.name, company: ctx.c.name }))
// Compiles to SQL with CTEsWITH person_cte AS ( SELECT * FROM typegraph_nodes WHERE kind = 'Person' AND deleted_at IS NULL),edge_cte AS ( SELECT * FROM typegraph_edges WHERE kind = 'worksAt' AND deleted_at IS NULL),company_cte AS ( SELECT * FROM typegraph_nodes WHERE kind = 'Company' AND deleted_at IS NULL)SELECT p.props->>'name' as person, c.props->>'name' as companyFROM person_cte pJOIN edge_cte e ON e.from_id = p.idJOIN company_cte c ON c.id = e.to_idThis means:
- You get database-level query optimization
- Indexes work as expected
- Transactions are ACID
- You can analyze queries with EXPLAIN
4. Precomputed Ontology
Section titled “4. Precomputed Ontology”Decision: Compute transitive closures at store initialization, not query time.
Why: Semantic relationships like subClassOf and implies form hierarchies. Computing “all subclasses of
Media” during every query would be expensive.
Instead, when you create a store:
const store = createStore(graph, backend);// ↑ Computes:// - subClassOf closure: Media → [Media, Podcast, Article, Video]// - implies closure: marriedTo → [marriedTo, partneredWith, knows]// - disjoint sets: Person ⊥ Organization ⊥ ProductThese closures are stored in the TypeRegistry and used during query compilation:
.from("Media", "m", { includeSubClasses: true })// At compile time, expands to: WHERE kind IN ('Media', 'Podcast', 'Article', 'Video')Tradeoff: Changing the ontology requires recreating the store. But ontologies typically change rarely compared to instance data.
5. Homoiconic Schema Storage
Section titled “5. Homoiconic Schema Storage”Decision: Store the graph schema and ontology as data in the database itself.
Why: Most ORMs and graph libraries define schemas only in application code. The database stores data but has no record of what the data means. This creates problems:
- You can’t understand the database without reading the application source
- Schema changes are invisible—no history, no diff, no audit trail
- Exports require the application to interpret the data
- Multiple applications can’t share schema understanding
TypeGraph takes a different approach: the schema is data. When you initialize a store, the complete schema
(node types, edge types, property definitions, ontology relations, precomputed closures) is serialized to JSON
and stored in typegraph_schema_versions:
SELECT schema_doc FROM typegraph_schema_versionsWHERE graph_id = 'my_graph' AND is_active = TRUE;The stored schema includes everything needed to understand the graph:
{ graphId: "my_graph", version: 3, nodes: { Person: { properties: { /* JSON Schema */ }, ... }, Company: { ... } }, edges: { worksAt: { fromKinds: ["Person"], toKinds: ["Company"], ... } }, ontology: { relations: [{ metaEdge: "subClassOf", from: "Engineer", to: "Person" }], closures: { subClassAncestors: { Engineer: ["Person"] }, // ... precomputed inference data } }}This enables:
| Capability | How It Works |
|---|---|
| Self-describing database | Query the schema without application code—useful for debugging, admin tools, and data exploration |
| Schema versioning | Every schema change creates a new version; previous versions are preserved for auditing |
| Change detection | Compare stored schema to code schema to detect additions, removals, and breaking changes |
| Portable exports | The interchange format is self-contained—importers know what the data means |
| Runtime introspection | Applications can query the schema at runtime for dynamic UI, validation, or documentation |
import { getActiveSchema, getSchemaChanges } from "@nicia-ai/typegraph/schema";
// Query the active schema at runtimeconst schema = await getActiveSchema(backend, "my_graph");console.log("Node types:", Object.keys(schema.nodes));console.log("Edge types:", Object.keys(schema.edges));
// Detect pending changes before deploymentconst diff = await getSchemaChanges(backend, graph);if (!diff.isBackwardsCompatible) { console.error("Breaking changes require migration");}Tradeoff: Schema storage adds a small amount of database overhead (one JSON document per version). The benefit is a database that explains itself.
Data Model
Section titled “Data Model”Storage Schema
Section titled “Storage Schema”TypeGraph uses two core tables:
-- Nodes tableCREATE TABLE typegraph_nodes ( graph_id TEXT NOT NULL, kind TEXT NOT NULL, id TEXT NOT NULL, props JSON NOT NULL, -- Properties as JSON version INTEGER NOT NULL, -- Optimistic concurrency valid_from TEXT NOT NULL, -- Temporal validity valid_to TEXT, created_at TEXT NOT NULL, updated_at TEXT NOT NULL, deleted_at TEXT, -- Soft delete PRIMARY KEY (graph_id, kind, id, valid_from));
-- Edges tableCREATE TABLE typegraph_edges ( graph_id TEXT NOT NULL, kind TEXT NOT NULL, id TEXT NOT NULL, from_kind TEXT NOT NULL, from_id TEXT NOT NULL, to_kind TEXT NOT NULL, to_id TEXT NOT NULL, props JSON NOT NULL, version INTEGER NOT NULL, valid_from TEXT NOT NULL, valid_to TEXT, created_at TEXT NOT NULL, updated_at TEXT NOT NULL, deleted_at TEXT, PRIMARY KEY (graph_id, kind, id, valid_from));Why JSON for Properties?
Section titled “Why JSON for Properties?”Decision: Store node/edge properties as JSON, not as columns.
Why:
- Schema flexibility: Adding a property doesn’t require ALTER TABLE
- Heterogeneous nodes: Different node kinds have different schemas
- Query simplicity: One table for all nodes, not one per kind
Both SQLite (JSON1 extension) and PostgreSQL (JSONB) have efficient JSON operators:
-- PostgreSQLSELECT props->>'name' FROM typegraph_nodes WHERE props->>'status' = 'active';
-- SQLiteSELECT json_extract(props, '$.name') FROM typegraph_nodes WHERE json_extract(props, '$.status') = 'active';Tradeoff: You can’t create a B-tree index on a JSON property as easily as a column. For high-cardinality filtering, consider:
- PostgreSQL: Expression indexes on JSONB paths
- SQLite: Expression indexes on
json_extract(...)(or generated columns)
See Indexes for TypeGraph utilities to define and create these indexes.
Temporal Model
Section titled “Temporal Model”Every node and edge tracks temporal validity:
┌──────────────────────────────────────────────────────────────┐│ Node: Article#123 │├──────────────────────────────────────────────────────────────┤│ Version 1: "Draft" │ valid_from: 2024-01-01 ││ │ valid_to: 2024-01-15 │├─────────────────────────┼────────────────────────────────────┤│ Version 2: "Published" │ valid_from: 2024-01-15 ││ │ valid_to: NULL (current) │└─────────────────────────┴────────────────────────────────────┘When you update a node:
- The current row’s
valid_tois set to now - A new row is inserted with
valid_from = now,valid_to = NULL
This enables:
- Point-in-time queries: “What did the graph look like on January 10th?”
- Audit trails: “What were all the versions of this article?”
- Soft deletes:
deleted_atmarks deletion without losing history
Query Compilation
Section titled “Query Compilation”The Query Pipeline
Section titled “The Query Pipeline”Query Builder API → Query AST → SQL Generator → Drizzle → Database- Query Builder: Fluent API that constructs a typed AST
- Query AST: A data structure representing the query (nodes, edges, predicates, projections)
- SQL Generator: Transforms AST to SQL using CTEs for each step
- Drizzle: Executes the SQL and returns typed results
Common Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”TypeGraph compiles traversals to CTEs, which databases optimize well:
store.query() .from("Person", "p") .traverse("authored", "e") .to("Document", "d") .whereNode("d", (d) => d.status.eq("published"))Becomes:
WITH step_0 AS ( -- Start: all Person nodes SELECT * FROM typegraph_nodes WHERE graph_id = $1 AND kind = 'Person' AND deleted_at IS NULL ), step_1 AS ( -- Traverse: follow 'authored' edges SELECT e.*, s.id as _from_step FROM typegraph_edges e JOIN step_0 s ON e.from_id = s.id WHERE e.kind = 'authored' AND e.deleted_at IS NULL ), step_2 AS ( -- Arrive: at Document nodes SELECT n.*, s.id as _edge_id FROM typegraph_nodes n JOIN step_1 s ON n.id = s.to_id WHERE n.kind = 'Document' AND n.deleted_at IS NULL )SELECT step_0.props->>'name' as person, step_2.props->>'title' as documentFROM step_0JOIN step_1 ON step_1._from_step = step_0.idJOIN step_2 ON step_2._edge_id = step_1.idWHERE step_2.props->>'status' = 'published';Recursive CTEs for Variable-Length Paths
Section titled “Recursive CTEs for Variable-Length Paths”For recursive() traversals with cycle prevention enabled (the default),
TypeGraph generates recursive CTEs like:
WITH RECURSIVE path AS ( -- Base case: starting nodes SELECT id, 1 as depth, ARRAY[id] as path FROM typegraph_nodes WHERE kind = 'Person' AND id = $1
UNION ALL
-- Recursive case: follow edges SELECT n.id, p.depth + 1, p.path || n.id FROM path p JOIN typegraph_edges e ON e.from_id = p.id JOIN typegraph_nodes n ON n.id = e.to_id WHERE e.kind = 'reportsTo' AND p.depth < 100 -- Implicit cap for unbounded traversal AND NOT n.id = ANY(p.path) -- Cycle detection)SELECT * FROM path;When you opt into cyclePolicy: "allow" and do not project a path column,
TypeGraph can use a lighter recursive shape without path-array state and
cycle predicates.
Vector Search Architecture
Section titled “Vector Search Architecture”For semantic search with embeddings:
Storage
Section titled “Storage”-- PostgreSQL with pgvectorCREATE TABLE typegraph_embeddings ( graph_id TEXT NOT NULL, node_kind TEXT NOT NULL, node_id TEXT NOT NULL, field TEXT NOT NULL, embedding vector(1536), -- pgvector type PRIMARY KEY (graph_id, node_kind, node_id, field));
CREATE INDEX ON typegraph_embeddings USING hnsw (embedding vector_cosine_ops); -- HNSW index for fast similarityQuery Flow
Section titled “Query Flow”.whereNode("d", (d) => d.embedding.similarTo(queryVector, 10))Compiles to:
-- PostgreSQLSELECT * FROM typegraph_nodes nJOIN typegraph_embeddings e ON e.node_id = n.idORDER BY e.embedding <=> $1 -- Cosine distanceLIMIT 10;The database’s vector index (HNSW or IVFFlat) handles approximate nearest neighbor search efficiently.
Performance Characteristics
Section titled “Performance Characteristics”What’s Fast
Section titled “What’s Fast”- Point lookups by ID: O(1) with primary key index
- Traversals: Single SQL query with JOINs, optimized by the database
- Ontology expansion: Precomputed at initialization, O(1) at query time
- Semantic search: HNSW indexes provide sub-linear search
What’s Slower
Section titled “What’s Slower”- Deep recursive traversals: Recursive CTEs are more expensive than simple JOINs
- Large property filtering without indexes: JSON extraction is slower than column access
- Cross-kind queries:
includeSubClasses: trueincreases the WHERE IN set
Optimization Strategies
Section titled “Optimization Strategies”- Filter early: Apply predicates as close to the source as possible
- Limit results: Always paginate large result sets
- Use specific kinds: Avoid
includeSubClassesunless needed - Index JSON paths: For frequently-filtered properties, add expression indexes
- Batch writes: Use transactions to reduce disk syncs and round-trips
Why These Tradeoffs?
Section titled “Why These Tradeoffs?”Why Not a Native Graph Database?
Section titled “Why Not a Native Graph Database?”Native graph databases (Neo4j, Amazon Neptune) excel at:
- Very deep traversals (10+ hops)
- Graph algorithms (shortest path, PageRank)
- Massive scale (billions of nodes)
TypeGraph is designed for:
- Knowledge bases with thousands to millions of nodes
- Shallow to medium traversals (1-5 hops typically)
- Applications that already use SQL databases
- Teams that want one database to manage
Why Drizzle ORM?
Section titled “Why Drizzle ORM?”- Type safety: Full TypeScript inference
- Multiple dialects: Same API for SQLite and PostgreSQL
- Raw SQL access: When needed for performance
- Active ecosystem: Well-maintained, growing community
Why Zod for Schemas?
Section titled “Why Zod for Schemas?”- Runtime validation: Not just types, but actual validation
- Inference:
z.infer<T>eliminates type duplication - Composition: Build complex schemas from simple ones
- Ecosystem: Widely used, lots of integrations
Next Steps
Section titled “Next Steps”- Performance - Benchmarks and optimization tips
- Schemas & Stores - Complete function signatures
- Integration Patterns - How to integrate with your stack