In this showcase you will explore a working coffee ordering assistant that supports both text and voice interaction. The user types or speaks to a friendly barista AI, browses specialty coffees through interactive cards, adds items to a bag, and checks out through a form — all inside a single chat interface.
This is a real, runnable example in the Glove monorepo at examples/coffee/. Unlike the other showcases, which walk through conceptual builds, this page explains an app you can launch and use today. It demonstrates three capabilities that work together: interactive tool cards (the display stack), voice-driven conversation (ElevenLabs STT and TTS with Silero VAD), and an unAbortable checkout pattern that prevents voice interruptions from killing a critical form.
Prerequisites: You should have completed Getting Started and read The Display Stack. If you plan to set up voice, read the Voice docs as well.
A coffee ordering assistant where a user can say “I want something fruity and light” and the app will:
- Ask preference questions one at a time through clickable option chips (`pushAndWait`)
- Show a carousel of matching coffees to browse (`pushAndWait`)
- Show expanded product detail cards without blocking (`pushAndForget`)
- Show cart summaries as items are added (`pushAndForget`)
- Present a checkout form that is `unAbortable`, meaning it survives voice interruptions (`pushAndWait`)
- Show an order confirmation card (`pushAndForget`)

In voice mode, the same flow works hands-free. The AI narrates product details instead of showing clickable cards, uses voice-only tools (`get_products`, `get_cart`), and asks preference questions verbally instead of through option chips.
The coffee shop is a Next.js application. It uses glove-react for the display stack and chat loop, glove-next for the LLM proxy, and glove-voice for the voice pipeline. Here is how the pieces connect:
- `/api/chat` — a `createChatHandler` route that proxies to the LLM provider. It sends tool schemas to the AI and streams back responses. It does not execute tools.
- `/api/voice/stt-token` and `/api/voice/tts-token` — server routes that generate short-lived ElevenLabs tokens. The browser calls these before starting the voice pipeline so that API keys never leave the server.
- Tool `do` functions — run in the browser. When the AI requests a tool call, `useGlove` executes the `do` function client-side. The function uses `display.pushAndWait()` or `display.pushAndForget()` to show React components in the chat.
- Cart state — lives in `useState` in the browser. A `CartOps` interface (`add`, `get`, `clear`) is passed to tool factories so they can read and modify the bag.
- Chat history — persisted with `createRemoteStore`, so refreshing the page restores the full history.

The app also has a text mode and a voice mode. In text mode, the AI uses interactive tools with clickable UI. In voice mode, it swaps to voice-friendly tools that return plain text for the AI to narrate. The system prompt itself changes when voice activates — more on this in the dynamic system prompts section.
The coffee shop has 9 tools organized into three categories based on how they interact with the user. Understanding these categories is key to designing tools that work in both text and voice modes.
These tools show a UI component and block until the user interacts. The AI pauses while the card is on screen. Think of them as questions that need a click to answer.
The ask_preference tool presents a question with multiple-choice options. The AI calls it to gather brew method, taste preference, or occasion — one question at a time, progressively.
import React from "react";
import { defineTool } from "glove-react";
import { z } from "zod";
const inputSchema = z.object({
question: z.string().describe("The question to display"),
options: z
.array(
z.object({
label: z.string().describe("Display text"),
value: z.string().describe("Value returned when selected"),
}),
)
.describe("2-6 options to present"),
});
export function createAskPreferenceTool() {
return defineTool({
name: "ask_preference",
description:
"Present the user with a set of options to choose from. " +
"Blocks until they pick one. Use for brew method, roast " +
"preference, mood, or any multiple-choice question.",
inputSchema,
displayPropsSchema: inputSchema,
resolveSchema: z.string(),
displayStrategy: "hide-on-complete",
async do(input, display) {
const selected = await display.pushAndWait(input);
const selectedOption = input.options.find((o) => o.value === selected);
return {
status: "success" as const,
data: `User selected: ${selected}`,
renderData: {
question: input.question,
selected: selectedOption ?? { label: selected, value: selected },
},
};
},
render({ props, resolve }) {
return (
<div style={{ padding: 20, background: "#fefdfb", border: "1px dashed #8fa88f" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 500, color: "#1e2e1e" }}>
{props.question}
</p>
<div style={{ display: "flex", gap: 8, flexWrap: "wrap" }}>
{props.options.map((opt) => (
<button
key={opt.value}
onClick={() => resolve(opt.value)}
style={{
padding: "8px 16px",
background: "transparent",
border: "1px solid #b8cab8",
color: "#2d422d",
fontFamily: "'DM Sans', sans-serif",
fontSize: 13,
cursor: "pointer",
}}
>
{opt.label}
</button>
))}
</div>
</div>
);
},
renderResult({ data }) {
const { question, selected } = data as {
question: string;
selected: { label: string; value: string };
};
return (
<div style={{ padding: 20, background: "#fefdfb", border: "1px dashed #8fa88f" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 500, color: "#1e2e1e" }}>
{question}
</p>
<div style={{ display: "inline-block", padding: "8px 16px", background: "#111a11", color: "#fefdfb", fontFamily: "'DM Sans', sans-serif", fontSize: 13 }}>
{selected.label}
</div>
</div>
);
},
});
}

When the user clicks an option, resolve(opt.value) sends the string back to the do function. The displayStrategy: "hide-on-complete" makes the option chips disappear after the user picks, replaced by a compact renderResult showing just the selected option. This keeps the conversation clean when multiple preference questions are asked in sequence.
The show_products tool shows a horizontal carousel of product cards. Each card has a “Details” button (to drill into a product) and an “Add to bag” button (to add directly). The AI passes an array of product IDs, or ["all"] for the full catalog:
export function createShowProductsTool(cartOps: CartOps) {
return defineTool({
name: "show_products",
description:
"Display a carousel of coffee products for the user to browse " +
"and select from. Blocks until the user picks a product.",
inputSchema: z.object({
product_ids: z
.array(z.string())
.describe('Array of product IDs to show, or ["all"] for the full catalog'),
prompt: z.string().optional().describe("Optional text shown above the products"),
}),
displayPropsSchema: inputSchema,
resolveSchema: z.object({
productId: z.string(),
action: z.enum(["select", "add"]),
}),
displayStrategy: "hide-on-complete",
async do(input, display) {
const selected = await display.pushAndWait(input);
const product = getProductById(selected.productId);
if (!product) return "Product not found.";
// If the user clicked "Add to bag", update the cart immediately
const resultText =
selected.action === "add"
? (() => {
cartOps.add(selected.productId);
const cart = cartOps.get();
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
return `User added ${product.name} to their bag. Cart now has ${cart.length} item(s), total ${formatPrice(total)}.`;
})()
: `User selected ${product.name} (${product.origin}, ${product.roast} roast, ${formatPrice(product.price)}).`;
return {
status: "success" as const,
data: resultText,
renderData: {
productId: selected.productId,
action: selected.action,
productName: product.name,
price: product.price,
},
};
},
// render() shows a scrollable carousel of ProductCard components
// renderResult() shows a compact "Added Yirgacheffe - $22.00" label
});
}

The resolve schema is an object with productId and action ("select" or "add"). This lets the AI know whether the user wants to see details or add straight to the bag. If the action is "add", the do function updates the cart via cartOps.add() before returning the result to the AI.
These tools show a card and do not block. The AI keeps talking immediately after the card appears. Use them for information the user should see but does not need to act on.
The show_product_detail tool shows an expanded card with the full product description, origin, roast level, tasting notes, and intensity bar. The show_cart tool shows a summary of the shopping bag with line items, quantities, and totals. The show_info tool shows general information cards for sourcing details, brewing tips, or order confirmations.
export function createShowCartTool(cartOps: CartOps) {
return defineTool({
name: "show_cart",
description:
"Display the current shopping bag contents as a summary card. Non-blocking.",
inputSchema: z.object({}),
displayPropsSchema: z.object({ items: z.array(z.any()) }),
displayStrategy: "hide-on-new",
async do(_input, display) {
const cart = cartOps.get();
if (cart.length === 0) return "The bag is empty.";
await display.pushAndForget({ items: cart });
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
return {
status: "success" as const,
data: `Displayed cart: ${cart.length} item(s), ${formatPrice(total)}.`,
renderData: { items: cart },
};
},
render({ props }) {
return <CartSummary items={props.items as CartItem[]} />;
},
renderResult({ data }) {
const { items } = data as { items: CartItem[] };
return <CartSummary items={items} />;
},
});
}

The cart tool uses displayStrategy: "hide-on-new". This means when the AI calls show_cart again (after the user adds another item), the previous cart card disappears and the new one takes its place. Without this strategy, the conversation would accumulate stale cart snapshots.
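The replacement behavior can be modeled as a pure function over the on-screen cards. This is a conceptual sketch of what the strategy means, not glove-react's internals:

```typescript
// Conceptual model of display strategies (not glove-react internals):
// a card tagged "hide-on-new" evicts the previous card from the same
// tool before it is shown, while "stay" cards accumulate.
interface Card {
  tool: string;
  strategy: "stay" | "hide-on-new";
  id: number;
}

function pushCard(cards: Card[], card: Card): Card[] {
  const kept =
    card.strategy === "hide-on-new"
      ? cards.filter((c) => c.tool !== card.tool)
      : cards;
  return [...kept, card];
}

// Two show_cart pushes leave only the latest snapshot on screen:
let screen: Card[] = [];
screen = pushCard(screen, { tool: "show_cart", strategy: "hide-on-new", id: 1 });
screen = pushCard(screen, { tool: "show_cart", strategy: "hide-on-new", id: 2 });
// screen now holds a single card, the id: 2 snapshot
```

Info cards use "stay", so pushing several of them through the same function would accumulate rather than replace.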
The show_info tool supports a variant field — "info" for general cards and "success" for order confirmations. The success variant shows a green accent bar on the left:
export function createShowInfoTool() {
return defineTool({
name: "show_info",
description:
"Display a persistent information card in the chat. Use for " +
"sourcing details, brewing tips, order confirmations, or general info.",
inputSchema: z.object({
title: z.string().describe("Card title"),
content: z.string().describe("Card body text"),
variant: z
.enum(["info", "success"])
.optional()
.describe("info = general, success = confirmation/order placed"),
}),
displayPropsSchema: z.object({
title: z.string(),
content: z.string(),
variant: z.string(),
}),
async do(input, display) {
const variant = input.variant ?? "info";
await display.pushAndForget({ title: input.title, content: input.content, variant });
return {
status: "success" as const,
data: `Displayed info card: ${input.title}`,
renderData: { title: input.title, content: input.content, variant },
};
},
render({ props }) {
const accentColor = props.variant === "success" ? "#4ade80" : "#6b8a6b";
return (
<div style={{
background: "#fefdfb",
border: "1px solid #dce5dc",
borderLeft: `3px solid ${accentColor}`,
padding: 16,
maxWidth: 400,
}}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 600, color: "#111a11" }}>
{props.title}
</p>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 13, lineHeight: 1.6, color: "#3d5a3d", whiteSpace: "pre-wrap" }}>
{props.content}
</p>
</div>
);
},
});
}

These tools return plain text. They have no render function, no display stack involvement. The AI calls them, gets text back, and speaks it to the user. They exist so the voice mode has equivalents for the interactive tools that require clicking.
import type { ToolConfig } from "glove-react";
import { z } from "zod";
import { formatPrice, getProductsByIds } from "../products";
export function createGetProductsTool(): ToolConfig {
return {
name: "get_products",
description:
"Look up product details and return them as text. " +
"Use this in voice mode instead of show_products.",
inputSchema: z.object({
product_ids: z
.array(z.string())
.describe('Array of product IDs to look up, or ["all"] for everything'),
}),
async do(input) {
const ids = (input as { product_ids: string[] }).product_ids;
const products = getProductsByIds(ids.includes("all") ? "all" : ids);
if (products.length === 0) {
return { status: "error" as const, data: "No products found for the given IDs." };
}
const lines = products.map(
(p) =>
`- ${p.name} (${p.id}): ${p.origin}, ${p.roast} roast, ${formatPrice(p.price)}/${p.weight}. Notes: ${p.notes.join(", ")}. Intensity: ${p.intensity}/10. ${p.description}`,
);
return { status: "success" as const, data: lines.join("\n") };
},
};
}

import type { ToolConfig } from "glove-react";
import { z } from "zod";
import { formatPrice } from "../products";
import type { CartOps } from "../theme";
export function createGetCartTool(cartOps: CartOps): ToolConfig {
return {
name: "get_cart",
description:
"Look up the current shopping bag contents and return them as text. " +
"Use this in voice mode instead of show_cart.",
inputSchema: z.object({}),
async do() {
const cart = cartOps.get();
if (cart.length === 0) {
return { status: "success" as const, data: "The bag is empty." };
}
const lines = cart.map(
(item) => `- ${item.name} x${item.qty} — ${formatPrice(item.price * item.qty)}`,
);
const subtotal = cart.reduce((s, i) => s + i.price * i.qty, 0);
const totalItems = cart.reduce((s, i) => s + i.qty, 0);
return {
status: "success" as const,
data: `${totalItems} item(s) in bag:\n${lines.join("\n")}\nSubtotal: ${formatPrice(subtotal)}`,
};
},
};
}

Notice these are plain ToolConfig objects, not defineTool calls. Since they have no display UI, they do not need typed display props or resolve schemas. They are pure data lookups — the AI gets text back and narrates it aloud.
All 9 tools are assembled through a factory function that receives the CartOps interface:
import type { ToolConfig } from "glove-react";
import type { CartOps } from "../theme";
import { createAskPreferenceTool } from "./ask-preference";
import { createShowProductsTool } from "./show-products";
import { createShowProductDetailTool } from "./show-product-detail";
import { createAddToCartTool } from "./add-to-cart";
import { createShowCartTool } from "./show-cart";
import { createCheckoutTool } from "./checkout";
import { createShowInfoTool } from "./show-info";
import { createGetProductsTool } from "./get-products";
import { createGetCartTool } from "./get-cart";
export function createCoffeeTools(cartOps: CartOps): ToolConfig[] {
return [
createAskPreferenceTool(),
createShowProductsTool(cartOps),
createShowProductDetailTool(),
createAddToCartTool(cartOps),
createShowCartTool(cartOps),
createCheckoutTool(cartOps),
createShowInfoTool(),
// Voice-friendly tools — return data as text for the LLM to narrate
createGetProductsTool(),
createGetCartTool(cartOps),
];
}

The tool factory pattern lets tools that need cart access (like show_products, checkout, and get_cart) share the same CartOps instance. Cart state lives in React, not on the server, so it updates instantly when the user adds an item.
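The CartOps contract is small enough to sketch framework-free. The real app backs it with React useState; here a plain closure stands in, and the one-entry catalog is placeholder data rather than the example's real product list:

```typescript
// Framework-free sketch of the CartOps contract the tool factories
// consume. The catalog entry below is illustrative placeholder data.
interface CartItem {
  id: string;
  name: string;
  price: number;
  qty: number;
}

interface CartOps {
  add(productId: string): void;
  get(): CartItem[];
  clear(): void;
}

const CATALOG: Record<string, { name: string; price: number }> = {
  yirgacheffe: { name: "Yirgacheffe", price: 22 },
};

function createCartOps(): CartOps {
  let items: CartItem[] = [];
  return {
    add(productId) {
      const existing = items.find((i) => i.id === productId);
      if (existing) {
        existing.qty += 1; // repeat adds bump the quantity
        return;
      }
      const product = CATALOG[productId];
      if (product) items = [...items, { id: productId, ...product, qty: 1 }];
    },
    get: () => items,
    clear() {
      items = [];
    },
  };
}
```

Because every factory receives the same instance, show_products can add an item and a subsequent show_cart call immediately sees it.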
This is the key pattern in the coffee shop example. The checkout tool uses unAbortable: true, which means the tool keeps running even when the user interrupts it.
Why does this matter? In voice mode, the user might speak while the checkout form is on screen. Normally, speaking triggers a barge-in — the voice pipeline interrupts the current AI turn, aborts any running tools, and starts listening for the new utterance. This is great for casual conversation (the user can say “actually, never mind” mid-response), but it is terrible for checkout. If the user accidentally makes a sound — a cough, a background noise, even saying “let me type my email” — the checkout form would vanish and the cart data would be lost.
The unAbortable flag provides two layers of protection:
1. Voice-layer suppression. While a pushAndWait resolver is active (the display manager's resolver store has entries), the voice pipeline suppresses barge-in. The user's microphone still picks up audio, but the pipeline will not interrupt the current turn. This means speaking during checkout does not trigger a new AI response.
2. Core-level protection. If an abort signal does fire, the core checks tool.unAbortable before killing the tool. If the flag is set, the tool keeps running to completion. The pushAndWait promise resolves normally when the user submits the form.

Here is the checkout tool implementation:
import React, { useState } from "react";
import { defineTool } from "glove-react";
import { z } from "zod";
import { formatPrice, GRIND_OPTIONS, type CartItem } from "../products";
import type { CartOps } from "../theme";
export function createCheckoutTool(cartOps: CartOps) {
return defineTool({
name: "checkout",
description:
"Present the checkout form with the current cart, grind selection, " +
"and email input. Blocks until the user submits or cancels. " +
"Only call when the user is ready to checkout.",
inputSchema: z.object({}),
displayPropsSchema: z.object({ items: z.array(z.any()) }),
resolveSchema: z.union([
z.object({ grind: z.string(), email: z.string() }),
z.null(),
]),
unAbortable: true,
displayStrategy: "hide-on-complete",
async do(_input, display) {
const cart = cartOps.get();
if (cart.length === 0) return "Cannot checkout — the bag is empty.";
const result = await display.pushAndWait({ items: cart });
if (!result) {
return {
status: "success" as const,
data: "User cancelled checkout and wants to continue shopping.",
renderData: { cancelled: true },
};
}
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
cartOps.clear();
return {
status: "success" as const,
data: `Order placed! Grind: ${result.grind}. Cart cleared. Total items ordered: ${cart.length}.`,
renderData: {
grind: result.grind,
email: result.email,
items: cart,
total,
},
};
},
render({ props, resolve }) {
return <CheckoutForm items={props.items as CartItem[]} onSubmit={resolve} />;
},
renderResult({ data }) {
const result = data as
| { cancelled: true }
| { grind: string; email: string; items: CartItem[]; total: number };
if ("cancelled" in result) {
return (
<div style={{ padding: 16, background: "#fefdfb", border: "1px solid #dce5dc" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 13, color: "#6b8a6b", fontStyle: "italic" }}>
Checkout cancelled — continued shopping.
</p>
</div>
);
}
return (
<div style={{ background: "#fefdfb", border: "1px solid #dce5dc", borderLeft: "3px solid #4ade80", padding: 16 }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 600, color: "#111a11" }}>
Order Confirmed
</p>
{result.items.map((item) => (
<div key={item.id} style={{ display: "flex", justifyContent: "space-between", padding: "4px 0", fontSize: 12, color: "#3d5a3d" }}>
<span>{item.name} x{item.qty}</span>
<span style={{ fontFamily: "'DM Mono', monospace" }}>{formatPrice(item.price * item.qty)}</span>
</div>
))}
<div style={{ marginTop: 8, paddingTop: 8, borderTop: "1px solid #dce5dc", display: "flex", justifyContent: "space-between" }}>
<span style={{ fontFamily: "'DM Mono', monospace", fontSize: 13, fontWeight: 600 }}>Total</span>
<span style={{ fontFamily: "'DM Mono', monospace", fontSize: 13, fontWeight: 600 }}>{formatPrice(result.total)}</span>
</div>
</div>
);
},
});
}

The CheckoutForm component is a regular React form with useState for grind selection and email input. It shows the bag items, a grind picker (Whole Bean, French Press, Pour Over, Espresso, Aeropress), an email field, subtotal, shipping (free over $40), and total. The “Place Order” button calls resolve({ grind, email }) and the “Continue shopping” link calls resolve(null).
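The totals the form displays reduce to a pure helper. The "free shipping over $40" rule comes from the description above; the $5 flat rate below is an assumption for illustration, not the example's real value:

```typescript
// Totals logic sketched as a pure function. Free shipping at or above
// $40 per the form description; the $5 flat rate is an assumption.
interface LineItem {
  price: number;
  qty: number;
}

function orderTotals(items: LineItem[]) {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.qty, 0);
  const shipping = subtotal >= 40 ? 0 : 5;
  return { subtotal, shipping, total: subtotal + shipping };
}
```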
The critical line is unAbortable: true. Without it, a voice interrupt during checkout would abort the tool, dismiss the form, and lose the cart data. With it, the form stays on screen no matter what happens in the voice pipeline.
Here is the conceptual flow of what happens when the user speaks during checkout:
// 1. User says "let me check out"
// 2. AI calls checkout tool
// 3. do() calls display.pushAndWait({ items: cart })
// 4. CheckoutForm renders on screen — resolver is registered
// 5. User accidentally speaks while filling in email
// 6. Voice pipeline detects speech...
// 7. Voice layer checks: resolverStore.size > 0? YES
// -> Barge-in SUPPRESSED. Speech is ignored.
// 8. Even if an abort signal fires from another source:
// 9. Core checks: tool.unAbortable? YES
// -> Tool keeps running. pushAndWait stays active.
// 10. User fills in email, clicks "Place Order"
// 11. resolve({ grind: "Pour Over", email: "..." })
// 12. do() receives the result, clears the cart, returns to AI
// 13. AI confirms the order with show_info variant="success"

The voice pipeline has four components: speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), and the React hook that ties them together. Here is how to set up each piece.
ElevenLabs uses token-based authentication. Your server generates short-lived tokens, and the browser uses them to connect directly to ElevenLabs. This keeps your API key on the server. Glove provides a createVoiceTokenHandler helper that handles the token exchange:
import { createVoiceTokenHandler } from "glove-next";
export const GET = createVoiceTokenHandler({ provider: "elevenlabs", type: "stt" });

import { createVoiceTokenHandler } from "glove-next";
export const GET = createVoiceTokenHandler({ provider: "elevenlabs", type: "tts" });

These routes read your ELEVENLABS_API_KEY from the server environment and return a temporary token. The browser calls them before starting each voice session.
Create a client-side module that configures the ElevenLabs adapters and the Silero VAD. The VAD is dynamically imported to avoid pulling onnxruntime-web (a WASM dependency) into the Next.js server bundle during SSR or prerendering:
import { createElevenLabsAdapters } from "glove-voice";
async function fetchToken(path: string): Promise<string> {
const res = await fetch(path);
const data = (await res.json()) as { token?: string; error?: string };
if (!res.ok || !data.token) {
throw new Error(data.error ?? `Token fetch failed (${res.status})`);
}
return data.token;
}
// ElevenLabs STT (Scribe) + TTS adapters
export const { stt, createTTS } = createElevenLabsAdapters({
getSTTToken: () => fetchToken("/api/voice/stt-token"),
getTTSToken: () => fetchToken("/api/voice/tts-token"),
voiceId: "56bWURjYFHyYyVf490Dp", // "George" — warm, friendly barista persona
});
// Silero VAD — dynamically imported to avoid WASM in SSR
export async function createSileroVAD() {
const { SileroVADAdapter } = await import("glove-voice/silero-vad");
const vad = new SileroVADAdapter({
positiveSpeechThreshold: 0.5,
negativeSpeechThreshold: 0.35,
wasm: { type: "cdn" },
});
await vad.init();
return vad;
}

The voiceId selects the TTS voice. The coffee shop uses “George” — a warm, conversational voice that fits the friendly barista persona. The VAD thresholds control how sensitive the turn detection is: positiveSpeechThreshold is the confidence needed to start detecting speech, and negativeSpeechThreshold is when it decides the user has stopped talking.
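The two thresholds form a hysteresis band: speech starts only above 0.5 and ends only below 0.35, so a probability wobbling between them cannot chatter the detector on and off. A toy gate illustrates the idea (this is not the SileroVADAdapter's actual implementation):

```typescript
// Toy hysteresis gate illustrating the two VAD thresholds (not the
// SileroVADAdapter's internals). Frames whose probability falls between
// the thresholds keep whatever state the gate is already in.
function createSpeechGate(positive = 0.5, negative = 0.35) {
  let speaking = false;
  return (probability: number): boolean => {
    if (!speaking && probability >= positive) speaking = true;
    else if (speaking && probability <= negative) speaking = false;
    return speaking;
  };
}

const gate = createSpeechGate();
gate(0.4); // below the positive threshold: still silent
gate(0.6); // crosses 0.5: speech starts
gate(0.4); // inside the band: still counted as speech
gate(0.3); // drops below 0.35: speech ends
```

Raising negativeSpeechThreshold makes the gate end turns sooner; lowering positiveSpeechThreshold makes it trigger on quieter speech.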
In the chat component, initialize the VAD on mount and pass everything to useGloveVoice:
import { useGlove, Render } from "glove-react";
import { useGloveVoice } from "glove-react/voice";
import type { TurnMode } from "glove-react/voice";
import { stt, createTTS, createSileroVAD } from "../lib/voice";
import { systemPrompt, voiceSystemPrompt } from "../lib/system-prompt";
export default function Chat({ sessionId }: { sessionId: string }) {
const [turnMode, setTurnMode] = useState<TurnMode>("vad");
// Cart state, tools, and glove hook setup...
const tools = useMemo(() => createCoffeeTools(cartOps), [cartOps]);
const glove = useGlove({ tools, sessionId });
const { runnable } = glove;
// Initialize Silero VAD model on mount (dynamic import avoids SSR issues)
const [vadReady, setVadReady] = useState(false);
const vadRef = useRef<Awaited<ReturnType<typeof createSileroVAD>> | null>(null);
useEffect(() => {
createSileroVAD().then((v) => {
vadRef.current = v;
setVadReady(true);
});
}, []);
// Build voice config — only include VAD once it has loaded
const voiceConfig = useMemo(
() => ({
stt,
createTTS,
vad: vadReady ? vadRef.current ?? undefined : undefined,
turnMode,
}),
[vadReady, turnMode],
);
const voice = useGloveVoice({ runnable, voice: voiceConfig });
// Swap system prompt when voice activates
useEffect(() => {
if (!runnable) return;
if (voice.isActive) {
runnable.setSystemPrompt(voiceSystemPrompt);
} else {
runnable.setSystemPrompt(systemPrompt);
}
}, [voice.isActive, runnable]);
// voice.start() — requests mic, opens STT, begins listening
// voice.stop() — releases mic, closes STT and TTS
// voice.mode — "idle" | "listening" | "thinking" | "speaking"
// voice.isActive — true when mode is not "idle"
}

The useGloveVoice hook returns a simple state machine. It cycles through four modes: idle (not started), listening (microphone active, waiting for speech), thinking (user finished speaking, waiting for AI response), and speaking (TTS playing back the AI response). After speaking finishes, it returns to listening automatically.
The hook also supports two turn modes: "vad" (hands-free — the VAD detects when the user stops talking and auto-commits the turn) and "manual" (push-to-talk — the user holds a button or spacebar to record, and the turn commits on release).
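The cycle can be written down as a transition table. This is a conceptual model of the modes described above; the event names are invented for illustration and are not useGloveVoice's real API:

```typescript
// Conceptual transition table for the voice mode cycle. Event names are
// hypothetical; useGloveVoice manages these transitions internally.
type VoiceMode = "idle" | "listening" | "thinking" | "speaking";
type VoiceEvent =
  | "start"          // voice.start(): mic opens
  | "stop"           // voice.stop(): session ends
  | "turnCommitted"  // VAD end-of-speech, or push-to-talk release
  | "responseReady"  // AI reply arrives, TTS playback begins
  | "playbackDone"   // TTS finishes
  | "bargeIn";       // user interrupts during playback

function next(mode: VoiceMode, event: VoiceEvent): VoiceMode {
  switch (event) {
    case "start":
      return mode === "idle" ? "listening" : mode;
    case "stop":
      return "idle";
    case "turnCommitted":
      return mode === "listening" ? "thinking" : mode;
    case "responseReady":
      return mode === "thinking" ? "speaking" : mode;
    case "playbackDone":
    case "bargeIn":
      // both return to listening, matching the auto-resume behavior
      return mode === "speaking" ? "listening" : mode;
  }
}
```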
The coffee shop displays an animated orb that communicates voice state through motion: one animation each for listening, thinking, speaking, and (in manual mode) recording.
In VAD mode, tapping the orb ends the voice session. During speaking, tapping triggers barge-in (interrupt) and snaps back to listening. In manual mode, the orb acts as the push-to-talk button — click to start recording, click again to stop and commit. The CSS animations are driven by a class that changes with the voice mode: voice-orb--listening, voice-orb--thinking, voice-orb--speaking, and voice-orb--recording for manual mode.
The coffee shop uses two system prompts — one for text mode and one for voice mode. When the user activates the microphone, the component calls runnable.setSystemPrompt(voiceSystemPrompt) to swap the prompt. When voice ends, it swaps back.
The voice prompt adds a section at the end that tells the AI which tools to avoid and which to use instead:
export const voiceSystemPrompt = `${systemPrompt}
## Voice Mode — IMPORTANT
The user is interacting via voice. They CANNOT click buttons or interact
with visual elements. You must adapt your tool usage and speaking style.
### Tool Substitutions (voice mode)
These tools block on user clicks and MUST NOT be used in voice mode:
- **show_products** -> use **get_products** instead (returns product data
as text for you to narrate)
- **show_cart** -> use **get_cart** instead (returns full cart breakdown
as text)
- **ask_preference** -> DO NOT use. Instead, ask the user verbally and
let them respond by speaking.
These tools still work in voice mode (non-blocking):
- **get_products** — look up products and narrate the results.
- **get_cart** — look up cart contents and read them back.
- **add_to_cart** — works normally. Confirm verbally what you added.
- **show_product_detail** — still displays a card, but describe the
product verbally too.
- **show_info** — still displays a card, but speak the key info aloud.
- **checkout** — still works (the form will appear on screen).
### Speaking Style
- Be conversational — speak naturally, as if chatting at a coffee counter.
- Describe products verbally — mention name, origin, roast, key tasting
notes, and price.
- Keep it concise — voice responses should be shorter than text.
- Ask one thing at a time.`;

The tool substitution pattern is the heart of multimodal tool design. Every interactive tool (show_products, show_cart, ask_preference) has a voice-friendly counterpart that either returns text data or is replaced by natural conversation. The AI is smart enough to follow these instructions consistently — when the system prompt says “use get_products instead of show_products”, it does.
The checkout tool is the interesting exception. It works in both modes because even in voice mode, the user needs a visual form to enter their email address and select a grind. The system prompt says checkout “still works (the form will appear on screen)” so the AI knows to use it normally.
| Tool | Pattern | Display Strategy | Why |
|---|---|---|---|
| ask_preference | pushAndWait | hide-on-complete | Multi-choice chips disappear after user picks an option |
| show_products | pushAndWait | hide-on-complete | Product carousel disappears after user selects or adds |
| checkout | pushAndWait + unAbortable | hide-on-complete | Order form stays on screen even during voice interrupts |
| show_product_detail | pushAndForget | stay | Product detail card persists in the conversation |
| show_cart | pushAndForget | hide-on-new | Old cart card replaced when updated cart appears |
| show_info | pushAndForget | stay | Info cards (sourcing, brewing, confirmations) persist |
| add_to_cart | No display | n/a | Pure data — updates cart state, returns confirmation text |
| get_products | No display | n/a | Voice-only — returns product data as text for narration |
| get_cart | No display | n/a | Voice-only — returns cart contents as text for narration |
The coffee shop is a working example in the Glove monorepo. To run it locally:
# Clone the repo and install dependencies
git clone https://github.com/your-org/glove.git
cd glove
pnpm install

Create a .env.local file in the examples/coffee/ directory with your API keys:
# Required — LLM provider
OPENROUTER_API_KEY=your-openrouter-key
# Optional — only needed for voice mode
ELEVENLABS_API_KEY=your-elevenlabs-key

Then start the dev server:
pnpm --filter glove-coffee run dev

Try these conversations in text mode:

- “I want something fruity and light” — triggers preference questions and a filtered product carousel
- “Show me everything you have” — opens the full catalog carousel
- “What's in my bag?” — shows the cart summary card
- “I'm ready to check out” — opens the checkout form

Try these in voice mode (click the microphone button):

- Ask for a recommendation and let the AI narrate the matching coffees
- Ask what's in your bag and hear the cart read back
- Say “let me check out” — the form appears on screen and survives accidental speech
The coffee shop demonstrates several patterns worth learning from:
- Shared cart state through a CartOps interface lets multiple tools read and modify the same cart without prop drilling or global state.
- unAbortable for critical flows. The checkout form cannot be dismissed by voice interrupts — two layers of protection ensure the user's form data is safe.
- Display strategies matched to content. hide-on-complete for interactive tools keeps the conversation clean, while hide-on-new for the cart prevents stale data, and stay for info cards keeps useful context visible.

To go deeper, see the docs covering:

- pushAndWait, pushAndForget, and display strategies
- using glove-core directly without React or Next.js
- displayPropsSchema, resolveSchema, and unAbortable