Research Copilot

A single runnable example that exercises nearly every TypeGraph capability (typed schema, ontology, vector embeddings, recursive traversals, and the graph algorithms) against a corpus of landmark ML papers. It produces an explainable literature-review digest in one run against an in-memory SQLite database, with no external services.

A natural-language query comes in. The copilot returns a ranked, chronological reading list with citation counts, authors, and topics — all computed against a single in-memory SQLite database:

Query: "contrastive self-supervised representation learning for vision"
Recommended reading order (chronological among top-ranked):
  2012  ImageNet Classification with Deep Convolutional Neural Networks [3 citations]
        Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
        topics: CNN, ComputerVision, DeepLearning
        why: semantic 0.449 · topic match: DeepLearning · 3 incoming citations
  2014  Adam: A Method for Stochastic Optimization [1 citation]
        Diederik Kingma, Jimmy Ba
        topics: Optimization
        why: semantic 0.429 · 1 incoming citation
  2019  Momentum Contrast for Unsupervised Visual Representation Learning [1 citation]
        Kaiming He, Haoqi Fan, Yuxin Wu, et al.
        topics: Contrastive, SelfSupervised, ComputerVision
        why: semantic 0.523 · topic match: SelfSupervised, Contrastive · 1 incoming citation
  2020  A Simple Framework for Contrastive Learning of Visual Representations [1 citation]
        Ting Chen, Simon Kornblith, Mohammad Norouzi, et al.
        topics: Contrastive, SelfSupervised, ComputerVision
        why: semantic 0.436 · topic match: SelfSupervised, Contrastive · 1 incoming citation
  2020  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [1 citation]
        Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al.
        topics: Transformer, ComputerVision, DeepLearning
        why: semantic 0.350 · topic match: DeepLearning · 1 incoming citation

Each moving part maps to a single TypeGraph primitive:

Feature                         TypeGraph capability
──────────────────────────────────────────────────────────────────────────
Semantic paper retrieval        embedding() fields + cosine similarity
Topic hierarchy expansion       Ontology + store.algorithms.reachable()
Citation-authority ranking      store.algorithms.degree() over cites
Explainable paper lineage       store.algorithms.shortestPath() over cites
“Does this trace back to X?”    store.algorithms.canReach()
Co-author discovery (2-hop)     store.algorithms.neighbors()
Reading-list assembly           Query builder with typed traversals

Three node kinds and four edges model the citation graph plus a topic hierarchy that supports query expansion:

const Paper = defineNode("Paper", {
  schema: z.object({
    // Title + abstract are `searchable()` so BM25 ranks papers by
    // keyword hits (rare technical terms, author surnames, dataset
    // names), exactly the queries where embeddings are least
    // discriminative.
    title: searchable({ language: "english" }),
    year: z.number().int(),
    abstract: searchable({ language: "english" }),
    embedding: embedding(128),
  }),
});

const Author = defineNode("Author", {
  schema: z.object({ name: z.string() }),
});

const Topic = defineNode("Topic", {
  schema: z.object({ name: z.string() }),
});

const cites = defineEdge("cites", { schema: z.object({}) });
const authoredBy = defineEdge("authored_by", { schema: z.object({}) });
const coversTopic = defineEdge("covers_topic", { schema: z.object({}) });

// Topic hierarchy: `CNN broader_than DL` reads "CNN is a more specific
// concept than DL". Recursive traversal expands narrow query terms into
// their ancestor concepts for higher recall.
const broaderThan = defineEdge("broader_than", { schema: z.object({}) });

const graph = defineGraph({
  id: "research_copilot",
  nodes: { Paper: { type: Paper }, Author: { type: Author }, Topic: { type: Topic } },
  edges: {
    cites: { type: cites, from: [Paper], to: [Paper] },
    authored_by: { type: authoredBy, from: [Paper], to: [Author] },
    covers_topic: { type: coversTopic, from: [Paper], to: [Topic] },
    broader_than: { type: broaderThan, from: [Topic], to: [Topic] },
  },
});

The example walks through five capabilities end-to-end. Each produces real console output against the seeded corpus of 18 landmark papers.

Every paper has a 128-dimensional embedding. Rank the corpus against a query embedding and take the top hits:

const queryEmbedding = mockEmbedding(query);
const allPapers = await store.nodes.Paper.find();
const ranked = allPapers
  .map((paper) => ({
    paper,
    similarity: cosine(queryEmbedding, paper.embedding),
  }))
  .sort((a, b) => b.similarity - a.similarity);

In production, swap the in-JS ranking for p.embedding.similarTo(queryEmbedding, k) in a query builder predicate — backed by pgvector or sqlite-vec — to do the scoring in SQL. See Semantic Search.
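
The snippet above calls two helpers, mockEmbedding and cosine, without showing them. A minimal sketch of what they could look like follows. The token-hashing scheme is an assumption made for illustration (the example's real helper may differ), and it only yields a deterministic bag-of-words vector, not a learned embedding.

```typescript
// Hypothetical stand-ins for the helpers referenced above; not TypeGraph APIs.
const DIMENSIONS = 128;

// Deterministic bag-of-words "embedding": each token is hashed into one of
// `dimensions` buckets, then the vector is L2-normalized.
function mockEmbedding(text: string, dimensions: number = DIMENSIONS): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    let hash = 0;
    for (let i = 0; i < token.length; i++) {
      hash = (hash * 31 + token.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
    }
    vector[hash % dimensions] += 1;
  }
  const norm = Math.hypot(...vector);
  return norm === 0 ? vector : vector.map((x) => x / norm);
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is zero.
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}
```

Because the vectors are normalized up front, cosine reduces to a dot product for any two non-empty texts; the explicit norms are kept so the function also works on unnormalized input.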

title and abstract are declared searchable(), so the same corpus is also indexed for BM25 via SQLite’s FTS5. The example runs a rare-token query against the fulltext index to show where BM25 wins — dataset names, method acronyms, proper nouns — exactly the queries embeddings smooth out:

const fulltextHits = await store.search.fulltext("Paper", {
  query: "Dropout",
  limit: 3,
  includeSnippets: true,
});

Output:

─── Fulltext retrieval (BM25 via FTS5) for: "Dropout" ───
2.619 Dropout: A Simple Way to Prevent Neural Networks from Overfitting
<mark>Dropout</mark>: A Simple Way to Prevent Neural Networks from Overfitting
Randomly zeroing unit activations during training prevents co-adaptation and…

In production you’d fuse the two via store.search.hybrid(), which runs both retrievers and blends them with Reciprocal Rank Fusion at the SQL layer:

const hits = await store.search.hybrid("Paper", {
  limit: 10,
  vector: { fieldPath: "embedding", queryEmbedding, metric: "cosine" },
  fulltext: { query, includeSnippets: true },
  // Weight fulltext slightly higher for the entity-heavy queries
  // typical of literature search.
  fusion: { method: "rrf", k: 60, weights: { vector: 1, fulltext: 1.25 } },
});

See the Fulltext Search guide for tuning and Example 15 for an end-to-end hybrid walkthrough.
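
Reciprocal Rank Fusion itself is small enough to sketch in a few lines. The standalone function below illustrates the usual weighted formula, score(d) = Σ w_r / (k + rank_r(d)) with 1-based ranks, using the same k and weights as the call above. It illustrates the technique rather than TypeGraph's implementation, and the names rrfFuse and RankedList are hypothetical.

```typescript
// Illustrative weighted RRF; names are hypothetical, not TypeGraph APIs.
type RankedList = { ids: readonly string[]; weight: number };

function rrfFuse(
  lists: readonly RankedList[],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Toy input: "b" appears in both lists, so it accumulates two contributions.
const fused = rrfFuse([
  { ids: ["a", "b", "c"], weight: 1 }, // vector ranking
  { ids: ["b", "d"], weight: 1.25 }, // fulltext ranking
]);
console.log(fused.map((hit) => hit.id)); // [ 'b', 'd', 'a', 'c' ]
```

RRF only looks at ranks, never at raw scores, which is why it can blend cosine similarities and BM25 scores even though the two live on unrelated scales.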

A query for the narrow topic Contrastive should also return papers tagged with its ancestors (SelfSupervised, DeepLearning). reachable() walks the broader_than edge recursively and returns every ancestor topic:

const topicAncestors = await store.algorithms.reachable(contrastiveTopic, {
  edges: ["broader_than"],
  maxHops: 10,
  excludeSource: true,
});

Then filter papers whose covers_topic edge lands in the expanded set:

const topicMatches = await store
  .query()
  .from("Paper", "p")
  .traverse("covers_topic", "e")
  .to("Topic", "t")
  .whereNode("t", (t) => t.id.in([...expandedTopicIds]))
  .select((ctx) => ({ id: ctx.p.id, title: ctx.p.title, topic: ctx.t.name }))
  .execute();

Output:

Expanded set: {Contrastive, SelfSupervised, DeepLearning}
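
Conceptually, this expansion is a bounded breadth-first walk along broader_than. The in-memory sketch below mirrors the maxHops and excludeSource options from the call above on a plain adjacency map. The function name reachableTopics and the exact edges in the map are assumptions chosen to reproduce the expanded set shown; how TypeGraph executes the traversal internally is not shown here.

```typescript
// Child topic -> its broader (ancestor) topics. Assumed edges for illustration.
const broaderThanEdges = new Map<string, string[]>([
  ["Contrastive", ["SelfSupervised"]],
  ["SelfSupervised", ["DeepLearning"]],
  ["CNN", ["DeepLearning"]],
]);

function reachableTopics(
  source: string,
  edges: ReadonlyMap<string, readonly string[]>,
  options: { maxHops: number; excludeSource: boolean },
): string[] {
  const visited = new Set<string>([source]);
  let frontier = [source];
  // Expand one hop per iteration until the hop budget or the frontier runs out.
  for (let hop = 0; hop < options.maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const topic of frontier) {
      for (const parent of edges.get(topic) ?? []) {
        if (!visited.has(parent)) {
          visited.add(parent);
          next.push(parent);
        }
      }
    }
    frontier = next;
  }
  if (options.excludeSource) visited.delete(source);
  return Array.from(visited);
}

const ancestors = reachableTopics("Contrastive", broaderThanEdges, {
  maxHops: 10,
  excludeSource: true,
});
const expandedTopics = ["Contrastive", ...ancestors];
console.log(expandedTopics); // [ 'Contrastive', 'SelfSupervised', 'DeepLearning' ]
```

The visited set doubles as cycle protection, so a malformed hierarchy with a loop still terminates.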

Pure vector similarity is noisy. Fuse it with in-degree on the cites edge so highly-cited papers bubble up:

const citationCount = await store.algorithms.degree(paperId, {
  edges: ["cites"],
  direction: "in",
});

const score = similarity + topicBonus + Math.log(citationCount + 1) / 10;

Output:

score = similarity + 0.05 * topicMatches + log(1 + citations) / 10
rank score sim topic cites title
───────────────────────────────────────────────────────────────────
1 0.692 0.523 2 1 Momentum Contrast for Unsupervised Visual Representation Learning
2 0.638 0.449 1 3 ImageNet Classification with Deep Convolutional Neural Networks
3 0.606 0.436 2 1 A Simple Framework for Contrastive Learning of Visual Representations
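
The formula in the output header can be checked by hand against the table. The sketch below recomputes the first two rows from their printed columns; fusedScore is a hypothetical helper name. Because the table prints similarities rounded to three decimals, a score recomputed this way can differ from the printed one in the last digit.

```typescript
// score = similarity + 0.05 * topicMatches + ln(1 + citations) / 10
function fusedScore(similarity: number, topicMatches: number, citations: number): number {
  return similarity + 0.05 * topicMatches + Math.log(1 + citations) / 10;
}

// Rank 1 (Momentum Contrast): 0.523 + 0.10 + ln(2) / 10
console.log(fusedScore(0.523, 2, 1).toFixed(3)); // 0.692
// Rank 2 (ImageNet Classification...): 0.449 + 0.05 + ln(4) / 10
console.log(fusedScore(0.449, 1, 3).toFixed(3)); // 0.638
```

The logarithm damps citation counts: three citations add about 0.139 while one adds about 0.069, so citation authority nudges the ranking without overwhelming semantic similarity.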

“You’ve read AlexNet — how does SimCLR trace back to it?” shortestPath returns an ordered list of nodes, which the example formats as a tree:

const lineage = await store.algorithms.shortestPath(simclr.id, alex.id, {
  edges: ["cites"],
  maxHops: 6,
});

Output:

2-hop citation lineage:
A Simple Framework for Contrastive Learning of Visual Representations
└─▶ Deep Residual Learning for Image Recognition
    └─▶ ImageNet Classification with Deep Convolutional Neural Networks
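
For intuition, a shortest path over unweighted cites edges is a breadth-first search that remembers each node's predecessor. The sketch below runs on a hand-written adjacency map containing only the three papers from the output above, abbreviated as SimCLR, ResNet, and AlexNet. shortestCitationPath is a hypothetical name, and this is not TypeGraph's implementation.

```typescript
// Paper -> papers it cites (toy subset of the corpus; keys are abbreviations).
const citesEdges = new Map<string, string[]>([
  ["SimCLR", ["ResNet"]],
  ["ResNet", ["AlexNet"]],
  ["AlexNet", []],
]);

function shortestCitationPath(
  from: string,
  to: string,
  edges: ReadonlyMap<string, readonly string[]>,
  maxHops: number,
): string[] | undefined {
  if (from === to) return [from];
  const previous = new Map<string, string>();
  const visited = new Set<string>([from]);
  let frontier = [from];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const paper of frontier) {
      for (const cited of edges.get(paper) ?? []) {
        if (visited.has(cited)) continue;
        visited.add(cited);
        previous.set(cited, paper);
        if (cited === to) {
          // Walk predecessors back to the start, then reverse.
          const path = [to];
          let current = to;
          while (current !== from) {
            current = previous.get(current)!;
            path.push(current);
          }
          return path.reverse();
        }
        next.push(cited);
      }
    }
    frontier = next;
  }
  return undefined; // no path within maxHops
}

const path = shortestCitationPath("SimCLR", "AlexNet", citesEdges, 6);
console.log(path); // [ 'SimCLR', 'ResNet', 'AlexNet' ]
```

The hop count reported in the output is the path length minus one: three nodes, two cites edges.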

canReach is the boolean sibling of shortestPath, useful when you don’t need the path, just the answer. Here: “which of these modern papers still trace back to Rumelhart’s 1986 backprop paper?”

const reaches = await store.algorithms.canReach(paper.id, backprop.id, {
  edges: ["cites"],
  maxHops: 10,
});

Output:

✓ "LLaMA: Open and Efficient Foundation Language Models" traces to Rumelhart 1986
✓ "Learning Transferable Visual Models From Natural Language" traces to Rumelhart 1986
✓ "Chain-of-Thought Prompting Elicits Reasoning in Large LMs" traces to Rumelhart 1986
✓ "A Simple Framework for Contrastive Learning of Visual Reps" traces to Rumelhart 1986

neighbors returns the direct neighborhood of a node. Compose it — authors of CLIP → their other papers → co-authors on those papers — to rank natural collaborators by shared-paper count:

const clipAuthors = await store.algorithms.neighbors(clip.id, {
  edges: ["authored_by"],
  depth: 1,
});

// For each CLIP author: walk authored_by backwards to all their papers,
// then forwards to all their co-authors.
const perAuthorPapers = await Promise.all(
  clipAuthors.map((author) =>
    store.algorithms.neighbors(author.id, {
      edges: ["authored_by"],
      direction: "in",
      depth: 1,
    }),
  ),
);

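Promise.all issues the per-author lookups for each level concurrently, so wall-clock latency grows with the number of levels, O(depth) sequential round-trips, rather than with the O(authors × papers) individual lookups.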

Seed paper authors: Ilya Sutskever, Jong Wook Kim, Aditya Ramesh, Alec Radford, Chris Hallacy
Nearby collaborators beyond the original CLIP paper:
2× shared papers with CLIP authors Alex Krizhevsky
2× shared papers with CLIP authors Geoffrey Hinton
2× shared papers with CLIP authors Rewon Child
2× shared papers with CLIP authors Jeffrey Wu
1× shared papers with CLIP authors Nitish Srivastava
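
The ranking step that follows the two neighbors passes is a plain tally. Below is a minimal in-memory sketch under assumed data: the paper and author identifiers are placeholders rather than the seeded corpus, and rankCollaborators is a hypothetical helper, not part of TypeGraph.

```typescript
// Paper -> authors, standing in for the authored_by edges (placeholder ids).
const authoredByEdges = new Map<string, string[]>([
  ["seed", ["ann", "bob"]],
  ["p1", ["ann", "cat"]],
  ["p2", ["bob", "cat", "dan"]],
  ["p3", ["eve"]],
]);

function rankCollaborators(
  seedPaper: string,
  edges: ReadonlyMap<string, readonly string[]>,
): Array<{ author: string; sharedPapers: number }> {
  const seedAuthors = new Set(edges.get(seedPaper) ?? []);
  const counts = new Map<string, number>();
  edges.forEach((authors, paper) => {
    if (paper === seedPaper) return;
    // Only papers with at least one seed author count as "shared".
    if (!authors.some((author) => seedAuthors.has(author))) return;
    for (const author of authors) {
      if (seedAuthors.has(author)) continue; // skip the seed authors themselves
      counts.set(author, (counts.get(author) ?? 0) + 1);
    }
  });
  // Highest shared-paper count first; ties broken alphabetically.
  return Array.from(counts, ([author, sharedPapers]) => ({ author, sharedPapers })).sort(
    (a, b) => b.sharedPapers - a.sharedPapers || a.author.localeCompare(b.author),
  );
}

const collaborators = rankCollaborators("seed", authoredByEdges);
console.log(collaborators);
// [ { author: 'cat', sharedPapers: 2 }, { author: 'dan', sharedPapers: 1 } ]
```

Here eve never appears because p3 has no seed author, and ann and bob are filtered out as members of the seed paper itself.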

The full source lives at packages/typegraph/examples/14-research-copilot.ts. From a checkout of the repository:

pnpm install
npx tsx packages/typegraph/examples/14-research-copilot.ts

The example builds the graph, runs every scene, and tears down — all against an in-memory SQLite database. To persist it, point createExampleBackend() at a file path. To run it on Postgres, swap the import to createPostgresBackend — see Backend Setup.

  • Graph Algorithms — the full API for shortestPath, reachable, canReach, neighbors, and degree
  • Knowledge Graph for RAG — entity linking, chunk traversal, and hybrid vector + fulltext retrieval
  • Ontology & Reasoning — inverse edges, subclass hierarchies, and other ontology primitives beyond broader_than
  • Semantic Search — production vector search with pgvector and sqlite-vec