Build a PDF-Extraction Glovebox

In this tutorial you will package a PDF-extraction agent as a Glovebox — a sandboxed, network-addressable Glove runtime that ships with pdftk, pandoc, and pdftotext baked in. The host process never touches a PDF; it hands a file to the box, the agent does the work in isolation, and the host gets back extracted text plus a structured outline.

This is the most compelling use of Glovebox: factor out an environment that would be painful to install on every web server, run it once behind a stable WebSocket endpoint, and let your host app talk to it through the regular client SDK. The agent inside the box is an ordinary Glove agent — same builder, same tools, same subscribers.

Prerequisites: read Glovebox for the surface area, and Server-Side Agents for the kind of agent you wrap. The example sources live at examples/glovebox-pdf-extractor/.

What you will build

A box that takes a single PDF on /input and returns two artefacts: extracted.txt (the body text) and outline.json (page-numbered headings). The agent decides which CLI to invoke based on the document: pure-text PDFs go through pdftotext, and scans get a fallback path through pdftk + pandoc. All three binaries ship in glovebox/docs:1.2, so no extra packages are needed.

The host serialises a PDF as a FileRef (inline below 1MB, otherwise wrapped through client storage).
The kit materialises it onto /input/document.pdf before invoking the agent
The agent calls extract_text, which shells out to pdftotext and writes /output/extracted.txt
The agent calls extract_outline, which uses pdftk to dump bookmarks and writes /output/outline.json
The kit lists /output, applies the outputs policy, and ships back a complete message with the resolved FileRefs

1. The agent

The agent is a plain Glove runnable. Two tools, an Anthropic adapter, an in-memory store, and the standard Displaymanager. Nothing here knows about Glovebox yet.

examples/glovebox-pdf-extractor/agent.ts (typescript)

import { Glove, Displaymanager, createAdapter } from "glove-core";
import { exec } from "node:child_process";
import { promisify } from "node:util";
import { writeFile } from "node:fs/promises";
import path from "node:path";
import { z } from "zod";

const run = promisify(exec);

class MemoryStore {
  identifier = "pdf";
  private msgs: any[] = [];
  private tokens = 0;
  private turns = 0;
  async getMessages() { return this.msgs; }
  async appendMessages(m: any[]) { this.msgs.push(...m); }
  async getTokenCount() { return this.tokens; }
  async addTokens(n: number) { this.tokens += n; }
  async getTurnCount() { return this.turns; }
  async incrementTurn() { this.turns++; }
  async resetCounters() { this.tokens = 0; this.turns = 0; }
}

export const agent = new Glove({
  store: new MemoryStore(),
  model: createAdapter({ provider: "anthropic", model: "claude-sonnet-4.5", stream: true }),
  displayManager: new Displaymanager(),
  serverMode: true,
  systemPrompt:
    "You extract structured data from PDFs. The user uploads one PDF " +
    "to /input. Use extract_text for the body and extract_outline for " +
    "the table of contents. Always write results into /output and " +
    "summarise what you produced in one paragraph.",
  compaction_config: { compaction_instructions: "Summarise extraction findings." },
})
  .fold({
    name: "extract_text",
    description: "Run pdftotext on a PDF in /input. Writes plain text to /output/<name>.txt.",
    inputSchema: z.object({
      file: z.string().describe("Filename inside /input, e.g. 'document.pdf'."),
      outputName: z.string().describe("Output filename, e.g. 'extracted.txt'."),
    }),
    async do(input) {
      const src = path.join("/input", input.file);
      const dest = path.join("/output", input.outputName);
      await run(`pdftotext -layout '${src}' '${dest}'`);
      return { status: "success", data: `Wrote ${dest}` };
    },
  })
  .fold({
    name: "extract_outline",
    description: "Dump the PDF's bookmark tree as JSON via pdftk and write it to /output.",
    inputSchema: z.object({
      file: z.string(),
      outputName: z.string(),
    }),
    async do(input) {
      const src = path.join("/input", input.file);
      const dest = path.join("/output", input.outputName);
      const { stdout } = await run(`pdftk '${src}' dump_data_utf8`);
      const headings = stdout
        .split("\n")
        .filter((l) => l.startsWith("BookmarkTitle:") || l.startsWith("BookmarkPageNumber:"));
      const outline: { title: string; page: number }[] = [];
      for (let i = 0; i < headings.length; i += 2) {
        const title = headings[i]?.replace("BookmarkTitle: ", "") ?? "";
        const page = Number(headings[i + 1]?.replace("BookmarkPageNumber: ", "") ?? "0");
        outline.push({ title, page });
      }
      await writeFile(dest, JSON.stringify(outline, null, 2));
      return { status: "success", data: `Wrote ${dest} (${outline.length} entries).` };
    },
  })
  .build();

Notice the agent uses serverMode: true and never touches the display manager. This is the headless shape — no permission gating, no UI checkpoints, just tools that read files and write files. The Displaymanager is still required by GloveConfig but stays empty.
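
The pairing loop in extract_outline matches title and page lines by position, so a bookmark that lacks either field would shift every later pair. A minimal sketch of a record-based alternative follows. It assumes the usual pdftk dump layout, where each bookmark starts with a BookmarkBegin line. The names parseBookmarks and OutlineEntry are hypothetical and not part of the example sources.

```typescript
// Hypothetical helper; not part of examples/glovebox-pdf-extractor.
export interface OutlineEntry {
  title: string;
  level: number;
  page: number;
}

// Parse the stdout of `pdftk <file> dump_data_utf8` into outline entries.
// Assumes each bookmark record opens with a `BookmarkBegin` line.
export function parseBookmarks(dump: string): OutlineEntry[] {
  const entries: OutlineEntry[] = [];
  let current: OutlineEntry | null = null;

  for (const raw of dump.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "BookmarkBegin") {
      // Start a new record with defaults in case a field is missing.
      current = { title: "", level: 1, page: 0 };
      entries.push(current);
      continue;
    }
    if (current === null) continue; // Metadata before the first bookmark.

    // Split on the first colon only, so titles may contain colons.
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep);
    const value = line.slice(sep + 1).trim();

    if (key === "BookmarkTitle") current.title = value;
    else if (key === "BookmarkLevel") current.level = Number(value) || 1;
    else if (key === "BookmarkPageNumber") current.page = Number(value) || 0;
  }
  return entries;
}
```

Keeping the parser as a pure function over a string means it can be unit-tested without pdftk installed; the tool body would only need to pass stdout through it before writing the JSON.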

The tools deliberately reach paths through /input and /output. Those mounts come from the default fs map the wrap config inherits — read-only inputs, writable outputs, and a writable /work if the agent ever wants scratch space.
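
Because the model supplies the filenames, it can help to confirm that a resolved path stays inside its mount and to avoid passing it through a shell. The sketch below is one way to do that, not the example's actual code. The names resolveInside and pdfToText are hypothetical, and it swaps exec for Node's execFile, which takes arguments as an array.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import * as path from "node:path";

const runFile = promisify(execFile);

// Resolve `name` under `root` and reject anything that escapes it,
// such as "../etc/passwd" or an absolute path.
// Assumes `root` is absolute with no trailing slash (e.g. "/input").
export function resolveInside(root: string, name: string): string {
  const resolved = path.posix.resolve(root, name);
  if (!resolved.startsWith(root + "/")) {
    throw new Error(`Path escapes ${root}: ${name}`);
  }
  return resolved;
}

// Hypothetical variant of the extract_text body. execFile hands the
// arguments to pdftotext directly, so quotes or spaces in a filename
// are never interpreted by a shell.
export async function pdfToText(file: string, outputName: string): Promise<string> {
  const src = resolveInside("/input", file);
  const dest = resolveInside("/output", outputName);
  await runFile("pdftotext", ["-layout", src, dest]);
  return dest;
}
```

The read-only /input mount already limits the damage a bad path can do; the check mainly turns a confusing CLI failure into a clear tool error the model can react to.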

2. The wrap config

glovebox.wrap turns the runnable into a deployable app. The base image carries every binary the tools call out to, so the packages map stays empty.

examples/glovebox-pdf-extractor/glovebox.ts (typescript)

import { glovebox, rule, composite } from "glovebox-core";
import { agent } from "./agent";

export default glovebox.wrap(agent, {
  name: "pdf-extractor",
  version: "0.1.0",
  base: "glovebox/docs",
  env: {
    ANTHROPIC_API_KEY: { required: true, secret: true },
  },
  storage: {
    // Inputs default to url-then-inline; explicit here for clarity.
    inputs: composite([rule.url(), rule.inline()]),
    // Small extracts inline, anything larger stays on the box for an hour.
    outputs: composite([
      rule.inline({ below: "256KB" }),
      rule.localServer({ ttl: "1h" }),
    ]),
  },
  limits: { cpu: "1", memory: "1Gi", timeout: "2m" },
});

This is everything the build CLI needs. The default fs layout is fine. The kit's injected environment and workspace skills appear automatically, and the /output hook gives the agent an escape hatch if a tool ever writes outside /output and still wants the file shipped back.
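
To make the outputs policy concrete, here is a toy model of one plausible reading of composite: rules are tried in order and the first one that accepts the file wins. Everything in it is an illustrative assumption rather than the glovebox-core API, including the Rule and Placement types, the parseSize helper, and the choice of binary units (1KB = 1024 bytes).

```typescript
// Illustrative model only; these types are NOT the glovebox-core API.
type Placement =
  | { kind: "inline" }
  | { kind: "localServer"; ttlSeconds: number };

type Rule = (sizeBytes: number) => Placement | null;

// Assumes binary units (1KB = 1024 bytes); the real parser may differ.
function parseSize(text: string): number {
  const match = /^(\d+)(B|KB|MB|GB)$/.exec(text.trim().toUpperCase());
  if (!match) throw new Error(`Unrecognised size: ${text}`);
  const units: Record<string, number> = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  const [, digits = "0", unit = "B"] = match;
  return Number(digits) * (units[unit] ?? 1);
}

// Accept files strictly smaller than the limit.
const inlineBelow = (limit: string): Rule => {
  const max = parseSize(limit);
  return (size) => (size < max ? { kind: "inline" } : null);
};

// Accept everything; intended as the last rule in a chain.
const localServer = (ttlSeconds: number): Rule => () => ({ kind: "localServer", ttlSeconds });

// The first rule that returns a placement wins.
function composite(rules: Rule[]): Rule {
  return (size) => {
    for (const rule of rules) {
      const placement = rule(size);
      if (placement !== null) return placement;
    }
    return null;
  };
}

// Mirrors the shape of the wrap config: small files inline, the rest served for an hour.
const outputs = composite([inlineBelow("256KB"), localServer(3600)]);
```

Under this model a 10KB outline.json would travel inline with the complete message, while a multi-megabyte extracted.txt would stay on the box until its TTL expires.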

3. Build it

terminal (bash)

pnpm exec glovebox build ./glovebox.ts
# ✓ Resolved base image: ghcr.io/porkytheblack/glovebox/docs:1.2
# ✓ Resolved packages (0 apt, 0 pip, 0 npm)
# ✓ Generated Dockerfile
# ✓ Generated nixpacks.toml
# ✓ Generated server bundle
# ✓ Generated auth key (fingerprint: 9f3a…b1c2)
# ✓ Wrote dist/

The dist/ directory is now self-contained: a Dockerfile that builds FROM ghcr.io/porkytheblack/glovebox/docs:1.2, a nixpacks.toml, an esbuild bundle of the agent + the kit, the manifest, and a single-use auth key. Running it takes two docker commands.

terminal (bash)

docker build -t pdf-extractor dist/
GLOVEBOX_KEY=$(cat dist/glovebox.key) docker run \
  -p 8080:8080 \
  -e GLOVEBOX_KEY \
  -e ANTHROPIC_API_KEY \
  pdf-extractor

4. Call it from the host

The host script is a thin GloveboxClient wrapper. It reads a PDF off disk, hands it to the box as a named input, streams deltas as the agent works, and writes the extracted artefacts to the local filesystem when the prompt completes.

examples/glovebox-pdf-extractor/extract.ts (typescript)

import { GloveboxClient } from "glovebox-client";
import { readFile, writeFile } from "node:fs/promises";

const client = GloveboxClient.make({
  endpoints: {
    pdf: {
      url: process.env.PDF_BOX_URL ?? "ws://localhost:8080",
      key: process.env.PDF_BOX_KEY!,
    },
  },
});

async function extract(localPath: string) {
  const box = client.box("pdf");
  const bytes = await readFile(localPath);

  const result = box.prompt(
    "Extract the body text and the table of contents from /input/document.pdf. " +
    "Write extracted.txt and outline.json into /output.",
    {
      files: {
        "document.pdf": { mime: "application/pdf", bytes },
      },
    },
  );

  // Stream subscriber events as the agent works.
  for await (const ev of result.events) {
    if (ev.event_type === "tool_use") {
      const e = ev.data as { name: string; input: unknown };
      console.log(`[tool] ${e.name}`);
    } else if (ev.event_type === "text_delta") {
      process.stdout.write((ev.data as { text: string }).text);
    }
  }

  const summary = await result.message;
  console.log(`\n--\n${summary}`);

  // Pull each output through the configured ClientStorage.
  await writeFile("./extracted.txt", await result.read("extracted.txt"));
  await writeFile("./outline.json", await result.read("outline.json"));
}

await extract(process.argv[2]!);
await client.close();

box.prompt(...) returns immediately. The async iterables (events, display) drain as messages arrive on the WebSocket; the promises (message, outputs) settle when the kit sends complete. result.read(name) dispatches through ClientStorage: inline refs decode in place, and server refs hit GET /files/:id with the bearer token. The host code never has to know which adapter the kit picked; the policy decides on the box side.
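
The read dispatch can be pictured as a small pure function. The sketch below is a guess at the shape, not the client SDK's implementation. The FileRef and ReadPlan types and the planRead name are hypothetical, and it assumes the file route is served over HTTP on the same host and port as the WebSocket endpoint.

```typescript
// Hypothetical ref shapes; the real FileRef type lives in the client SDK.
type FileRef =
  | { kind: "inline"; base64: string }
  | { kind: "server"; id: string };

// Either the bytes are already in hand, or a request still has to be made.
type ReadPlan =
  | { kind: "bytes"; bytes: Uint8Array }
  | { kind: "request"; url: string; headers: Record<string, string> };

export function planRead(ref: FileRef, boxUrl: string, token: string): ReadPlan {
  if (ref.kind === "inline") {
    // Inline refs carry their payload, so decoding needs no network call.
    return { kind: "bytes", bytes: Buffer.from(ref.base64, "base64") };
  }
  // Assumption: ws:// maps to http:// and wss:// to https:// on the same host.
  const base = new URL(boxUrl);
  const scheme = base.protocol === "wss:" ? "https:" : "http:";
  return {
    kind: "request",
    url: `${scheme}//${base.host}/files/${encodeURIComponent(ref.id)}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

Separating the plan from the fetch keeps the branching testable; a caller would execute the request variant with whatever HTTP client the host already uses.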

5. What the kit injected

Everything ran on top of four extensions the kit folded onto the agent at boot — without touching the agent source.

The environment skill let the model ask "what's installed?" mid-turn (it returns the manifest spec — base image, fs layout, packages, limits).
The workspace skill listed /input dynamically so the model could verify the upload landed before shelling out.
The /output hook would have caught any path the agent wanted shipped from outside /output — both tools here write inside that mount, so it stays unused.
The /clear-workspace hook is available if you turn this into a long-lived box that processes many PDFs in sequence; sending /clear-workspace between turns empties /work.

On boot the kit also prepended an environment block to the existing system prompt — the agent now knows it is running in a glovebox, what version, what fs mounts exist, and what the limits are, before any user prompt arrives.

Where each piece runs

Piece	Where it runs	Why
`agent.ts` + tools	Inside the container	Calls `pdftotext` / `pdftk`; needs the docs base image.
`glovebox.ts` (wrap)	Build step only	Resolved at `glovebox build`; the runtime reads its config from the bundle.
`startGlovebox` (kit)	Inside the container	HTTP + WS endpoint, storage adapters, file routes, injections.
`extract.ts` (client)	Host machine / worker / CI	Holds the PDF, drives the prompt, writes the extracted artefacts to disk.

Next steps

Glovebox reference — full authoring + protocol surface
Server-Side Agents — the headless agent shape the box wraps
Hooks, Skills & Mentions — how the kit's injections compose with your own
Build a Coding Agent — the in-process counterpart, where the tools live next to the UI