glove-scratchpad exposes an agent's capabilities as a relational database it queries with a single execute_sql tool. Instead of loading dozens of tool definitions into the context window, the model writes SQL — which it knows fluently, at every model size — to discover, invoke, and compose capabilities.
The idea: resources become tables. A resource is an entity / data type — github_pr, linear_issue, emails, time, images — and its CRUD verbs map to (possibly different) underlying tools. The model never sees the plumbing; it sees a schema and writes queries against it.
-- discover what you can do
SELECT table_name FROM information_schema.tables;
SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tasks';
-- invoke a tool by querying its table; push arguments through WHERE
SELECT id, name FROM tasks WHERE due_date = (SELECT tomorrow FROM time);
-- compose across services in one statement — no intermediate rows in context
INSERT INTO notion_page (title, body)
SELECT title, body FROM github_pr WHERE merged = true AND base = 'main';
-- stage an outbound effect, preview it, then commit (or roll back — a dry run)
BEGIN;
INSERT INTO emails (to_addr, subject, body) VALUES ('a@b.com', 'hi', 'yo');
COMMIT;It is, at heart, a SQL interpreter: every statement is parsed and inspected before any tool runs. That buys discovery (information_schema), composition (joins / INSERT … SELECT), preview (EXPLAIN, transactions), and a real security surface (a syntax tree you can reject) — for free, because the database solved them decades ago.
pnpm scratchpad:db (drives Database.execute directly so the transcript is deterministic).The obvious question is whether this actually pays off, or just moves the complexity around. So we built an agentic A/B benchmark: ten in-process MCP servers mirroring a real product team's stack — GitHub, Linear, Email, Slack, Notion, Jira, Sentry, PagerDuty, Calendar, Filesystem, 32 tools over one deterministic, cross-linked seed world — run through the real glove-core agent loop twice per model: once with all 32 tools folded directly (baseline), once with a single execute_sql over the same capabilities as tables (scratchpad). Runs are graded deterministically; writes are graded on the real side-effect outbox, which can't be faked.
Three findings fell out. (1) The scratchpad always cut peak context 2–4× — 32 schemas never enter the window and only selected rows do — but the first pass was a tradeoff, not a blowout: weak models spiralled in the open SQL surface (74% pass, six 30-turn runaways). (2) Nearly every weak-model failure traced to a platform gap, not a model limit — most damningly, places where the engine silently mis-answered where Postgres would error. Five rounds of fixes (engine bugs, prompt discipline, SQL-discoverable metadata, read-your-writes, Postgres parity) took the same five budget models from 74% → 100% with zero spirals. (3) At production scale — 40 servers, 367 tools, ~95% noise — the tool-folding baseline inverts: it becomes the least accurate arm at 12× the context and ~6× the cost, while the scratchpad's footprint is statistically unchanged.
The context win is a property of the data, not the model. Measured deterministically (no API key) — asking one cross-service question as the world grows:
| rows / service | naive (est. tokens) | scratchpad (est. tokens) | reduction |
|---|---|---|---|
| 100 | 20.3k | 5.6k | 3.6× |
| 1,000 | 202.8k | 5.8k | 35.0× |
| 5,000 | 1.01M | 5.8k | 174.5× |
| 20,000 | 4.06M | 5.8k | 696.1× |
| 50,000 | 10.14M | 5.9k | 1731.2× |
The transferable lesson is a design stance: the scratchpad only works if it behaves like the database the model already knows. Every place it silently deviated from Postgres muscle-memory was a place a weak model failed; every fix that made truth cheaper to see — command tags, read-your-writes, loud errors, in-band discovery — bought more capability than any prompt instruction. The full study, with all transcripts, figures, and the v1→v5 hardening arc, is in the repo: The Scratchpad Is a Database.
pnpm add glove-scratchpad
# zero runtime dependencies — the query engine (glove-sql) is bundled.
# OPTIONAL — a full Postgres dialect (WASM) instead of the bundled subset:
pnpm add @electric-sql/pglite
# OPTIONAL — bridge MCP servers in as tables:
pnpm add glove-mcpimport { Database, resourceFromTool, defineResource, mountDatabase } from "glove-scratchpad";
const db = await Database.create({ policy: { writes: true } });
// A read-only tool → a one-row `time` table.
db.register(resourceFromTool(getTimeTool, {
name: "time", volatility: "stable",
columns: [{ name: "now", type: "timestamptz" }, { name: "tomorrow", type: "text" }],
}));
// A search tool → a `web` table whose required `query` column is a pushed-down argument.
db.register(resourceFromTool(searchTool, {
name: "web", volatility: "volatile",
columns: [{ name: "title", type: "text" }, { name: "url", type: "text" }],
}));
// Fold the single tool + prime the model to discover → invoke → act → stage.
mountDatabase(agent, { db });Now the agent works entirely in SQL through execute_sql (and explain_sql).
A resource is an entity with columns and any subset of CRUD verbs, each wired independently. defineResource is the explicit contract; resourceFromTool is the convenience for the single-verb case.
const githubPr = defineResource({
name: "github_pr",
volatility: "stable",
columns: [
{ name: "number", type: "bigint", requiredKey: true }, // an API argument
{ name: "title", type: "text" },
{ name: "merged", type: "boolean" },
],
select: (b) => listPrs({ number: b.one("number") }), // SELECT → a list/get tool
insert: (rows) => createPr(rows[0]), // INSERT → a create tool
update: (set, b) => updatePr(b.one("number"), set), // UPDATE → an update tool
delete: (b) => closePr(b.one("number")), // DELETE → a close tool
});A read-only time has only select; an emails (send) is insert-only; an images generator is a select-shaped but volatile function-as-relation (SELECT url FROM images WHERE prompt = '…' — prompt is an argument). Verb presence is the capability gate: SELECTing a write-only resource, or writing one with no writer, is a clear error.
Every resource declares Postgres's immutable | stable | volatile. It governs caching and protects effectful tools from being called the wrong number of times.
| Volatility | Cached | For |
|---|---|---|
| immutable | database lifetime | Pure lookups. |
| stable | within one execute | Turn-stable reads (e.g. time). |
| volatile | never | Effectful / nondeterministic reads & all writes. Invoked exactly once per statement. |
The query engine (glove-sql) is synchronous; resources are async and effectful. So Database.execute can't hook resolution inside the engine — it pre-resolves:
CREATE/DROP refused, multi-statement only as a BEGIN … COMMIT/ROLLBACK script.INSERT … SELECT source) and classify each as a resource or not.WHERE / JOIN-ON equalities scoped to each resource — these are arguments, not just filters; a missing required key is a clear error.Resolving once, up front, is what makes the volatility guarantee hold: the engine evaluates FROM-resolution lazily and repeatedly (once per correlated-subquery row), so an inline async hook would invoke an effectful tool N times. Pre-resolution invokes it once.
There is no separate discovery step. Resources are advertised in information_schema.tables / .columns (engine-agnostically, via a catalog callback), so the agent lands in an unfamiliar database, lists its tables, inspects the relevant ones, and figures out its own capabilities — exactly how SQL has always done progressive disclosure.
A write against a resource is a side-effecting tool call. Inside a transaction it is staged, not fired — recorded with the exact resolver + arguments it will invoke. db.preview() (and the staged field on the result) is the approval surface; COMMIT fires the staged writes in order; ROLLBACK discards them — a true dry run. Writes are off unless the database is created with policy: { writes: true }.
await db.execute(`BEGIN`);
const staged = await db.execute(
`INSERT INTO notion_page (title) SELECT title FROM github_pr WHERE merged = true`,
);
staged.staged; // → the writes about to fire (preview them)
await db.execute(`COMMIT`); // fires in order (or ROLLBACK to discard)db.explain(sql) (and the explain_sql tool, and EXPLAIN <stmt> through execute_sql) runs the pre-pass only — no resolver calls — and reports which resources a statement will hit, each one's volatility, read/write access, and the arguments it resolved. Explaining a generate_image query costs nothing.
Most MCP tools are CRUD over some resource type, so decompose a server into resources and give each a table. glove-mcp is an optional peer dependency.
import { connectMcp } from "glove-mcp";
import { mountMcpDatabase } from "glove-scratchpad/mcp";
const conn = await connectMcp({ namespace: "github", url });
await mountMcpDatabase(db, conn, {
table: (t) => t.name === "list_pull_requests"
? { name: "github_pr", op: "select", volatility: "stable",
columns: [{ name: "title", type: "text" }, { name: "merged", type: "boolean" }],
rows: (d) => JSON.parse(d as string) }
: null, // skip the rest, or map them too
});
// → INSERT INTO linear_issue SELECT … FROM github_pr WHERE merged = true
// composes two servers in one statement.A read tool (readOnlyHint) defaults to a select resource; others default to a volatile insert. Declare columns (and a rows extractor) via table(tool) to make a server's data genuinely queryable.
The manipulation surface is a defined Postgres subset; the backend behind it is swappable (ScratchpadBackend).
| Backend | Import | When |
|---|---|---|
| glove-sql (default) | bundled | Zero-dependency pure-JS Postgres subset. Covers the SQL agents write. |
| PgliteBackend | glove-scratchpad/pglite | Embedded Postgres (WASM) for a full dialect. Optional peer. |
| Bring your own | ScratchpadBackend | Real Postgres / SQLite / a remote service. |
const db = await Database.create({ policy?: { writes }, backend?, actor? });
db.register(resource); // or registerAll([...])
await db.execute(sql, { params?, limit?, allowWrites?, signal? });
// → { rows, truncated, touched, staged?, committed?, message? }
await db.explain(sql, { params? }); // → { statementKind, readOnly, relations, staged? }
db.preview(); // staged writes in the open transaction
mountDatabase(glove, { db, prime?, explain?, allowWrites? });JOIN are yours to handle in the resolver.The query engine lives in glove-sql — a standalone, zero-dependency Postgres-subset engine you can use on its own.