In this showcase you will explore a working coffee ordering assistant that supports both text and voice interaction. The user types or speaks to a friendly barista AI, browses specialty coffees through interactive cards, adds items to a bag, and checks out through a form — all inside a single chat interface.
This is a real, runnable example in the Glove monorepo at examples/coffee/. Unlike the other showcases, which walk through conceptual builds, this page explains an app you can launch and use today. It demonstrates three capabilities that work together: interactive tool cards (the display stack), voice-driven conversation (ElevenLabs STT and TTS with Silero VAD), and an unAbortable checkout pattern that prevents voice interruptions from killing a critical form.
Prerequisites: You should have completed Getting Started and read The Display Stack. If you plan to set up voice, read the Voice docs as well.
A coffee ordering assistant where a user can say “I want something fruity and light” and the app will:
- Ask preference questions one at a time through clickable option chips (`pushAndWait`)
- Show a carousel of matching coffees to browse (`pushAndWait`)
- Show expanded product detail cards without blocking (`pushAndForget`)
- Show cart summaries as items are added (`pushAndForget`)
- Present a checkout form that is `unAbortable`, meaning it survives voice interruptions (`pushAndWait`)
- Show an order confirmation card (`pushAndForget`)

In voice mode, the same flow works hands-free. The AI narrates product details instead of showing clickable cards, uses voice-only tools (`get_products`, `get_cart`), and asks preference questions verbally instead of through option chips.
The coffee shop is a Next.js application. It uses glove-react for the display stack and chat loop, glove-next for the LLM proxy, and glove-voice for the voice pipeline. Here is how the pieces connect:
- `/api/chat` — a `createChatHandler` route that proxies to the LLM provider. It sends tool schemas to the AI and streams back responses. It does not execute tools.
- `/api/voice/stt-token` and `/api/voice/tts-token` — server routes that generate short-lived ElevenLabs tokens. The browser calls these before starting the voice pipeline so that API keys never leave the server.
- Tool `do` functions — run in the browser. When the AI requests a tool call, `useGlove` executes the `do` function client-side. The function uses `display.pushAndWait()` or `display.pushAndForget()` to show React components in the chat.
- Cart state — lives in `useState` in the browser. A `CartOps` interface (`add`, `get`, `clear`) is passed to tool factories so they can read and modify the bag.
- Chat history — persisted with `createRemoteStore`, so refreshing the page restores the full history.

The app also has a text mode and a voice mode. In text mode, the AI uses interactive tools with clickable UI. In voice mode, it swaps to voice-friendly tools that return plain text for the AI to narrate. The system prompt itself changes when voice activates — more on this in the dynamic system prompts section.
The coffee shop has 9 tools organized into three categories based on how they interact with the user. Understanding these categories is key to designing tools that work in both text and voice modes.
These tools show a UI component and block until the user interacts. The AI pauses while the card is on screen. Think of them as questions that need a click to answer.
The ask_preference tool presents a question with multiple-choice options. The AI calls it to gather brew method, taste preference, or occasion — one question at a time, progressively.
import React from "react";
import { defineTool } from "glove-react";
import { z } from "zod";
const inputSchema = z.object({
question: z.string().describe("The question to display"),
options: z
.array(
z.object({
label: z.string().describe("Display text"),
value: z.string().describe("Value returned when selected"),
}),
)
.describe("2-6 options to present"),
});
export function createAskPreferenceTool() {
return defineTool({
name: "ask_preference",
description:
"Present the user with a set of options to choose from. " +
"Blocks until they pick one. Use for brew method, roast " +
"preference, mood, or any multiple-choice question.",
inputSchema,
displayPropsSchema: inputSchema,
resolveSchema: z.string(),
displayStrategy: "hide-on-complete",
async do(input, display) {
const selected = await display.pushAndWait(input);
const selectedOption = input.options.find((o) => o.value === selected);
return {
status: "success" as const,
data: `User selected: ${selected}`,
renderData: {
question: input.question,
selected: selectedOption ?? { label: selected, value: selected },
},
};
},
render({ props, resolve }) {
return (
<div style={{ padding: 20, background: "#fefdfb", border: "1px dashed #8fa88f" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 500, color: "#1e2e1e" }}>
{props.question}
</p>
<div style={{ display: "flex", gap: 8, flexWrap: "wrap" }}>
{props.options.map((opt) => (
<button
key={opt.value}
onClick={() => resolve(opt.value)}
style={{
padding: "8px 16px",
background: "transparent",
border: "1px solid #b8cab8",
color: "#2d422d",
fontFamily: "'DM Sans', sans-serif",
fontSize: 13,
cursor: "pointer",
}}
>
{opt.label}
</button>
))}
</div>
</div>
);
},
renderResult({ data }) {
const { question, selected } = data as {
question: string;
selected: { label: string; value: string };
};
return (
<div style={{ padding: 20, background: "#fefdfb", border: "1px dashed #8fa88f" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 500, color: "#1e2e1e" }}>
{question}
</p>
<div style={{ display: "inline-block", padding: "8px 16px", background: "#111a11", color: "#fefdfb", fontFamily: "'DM Sans', sans-serif", fontSize: 13 }}>
{selected.label}
</div>
</div>
);
},
});
}

When the user clicks an option, resolve(opt.value) sends the string back to the do function. The displayStrategy: "hide-on-complete" makes the option chips disappear after the user picks, replaced by a compact renderResult showing just the selected option. This keeps the conversation clean when multiple preference questions are asked in sequence.
The show_products tool shows a horizontal carousel of product cards. Each card has a “Details” button (to drill into a product) and an “Add to bag” button (to add directly). The AI passes an array of product IDs, or ["all"] for the full catalog:
export function createShowProductsTool(cartOps: CartOps) {
return defineTool({
name: "show_products",
description:
"Display a carousel of coffee products for the user to browse " +
"and select from. Blocks until the user picks a product.",
inputSchema: z.object({
product_ids: z
.array(z.string())
.describe('Array of product IDs to show, or ["all"] for the full catalog'),
prompt: z.string().optional().describe("Optional text shown above the products"),
}),
displayPropsSchema: inputSchema,
resolveSchema: z.object({
productId: z.string(),
action: z.enum(["select", "add"]),
}),
displayStrategy: "hide-on-complete",
async do(input, display) {
const selected = await display.pushAndWait(input);
const product = getProductById(selected.productId);
if (!product) return "Product not found.";
// If the user clicked "Add to bag", update the cart immediately
const resultText =
selected.action === "add"
? (() => {
cartOps.add(selected.productId);
const cart = cartOps.get();
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
return `User added ${product.name} to their bag. Cart now has ${cart.length} item(s), total ${formatPrice(total)}.`;
})()
: `User selected ${product.name} (${product.origin}, ${product.roast} roast, ${formatPrice(product.price)}).`;
return {
status: "success" as const,
data: resultText,
renderData: {
productId: selected.productId,
action: selected.action,
productName: product.name,
price: product.price,
},
};
},
// render() shows a scrollable carousel of ProductCard components
// renderResult() shows a compact "Added Yirgacheffe - $22.00" label
});
}

The resolve schema is an object with productId and action ("select" or "add"). This lets the AI know whether the user wants to see details or add straight to the bag. If the action is "add", the do function updates the cart via cartOps.add() before returning the result to the AI.
These tools show a card and do not block. The AI keeps talking immediately after the card appears. Use them for information the user should see but does not need to act on.
The show_product_detail tool shows an expanded card with the full product description, origin, roast level, tasting notes, and intensity bar. The show_cart tool shows a summary of the shopping bag with line items, quantities, and totals. The show_info tool shows general information cards for sourcing details, brewing tips, or order confirmations.
export function createShowCartTool(cartOps: CartOps) {
return defineTool({
name: "show_cart",
description:
"Display the current shopping bag contents as a summary card. Non-blocking.",
inputSchema: z.object({}),
displayPropsSchema: z.object({ items: z.array(z.any()) }),
displayStrategy: "hide-on-new",
async do(_input, display) {
const cart = cartOps.get();
if (cart.length === 0) return "The bag is empty.";
await display.pushAndForget({ items: cart });
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
return {
status: "success" as const,
data: `Displayed cart: ${cart.length} item(s), ${formatPrice(total)}.`,
renderData: { items: cart },
};
},
render({ props }) {
return <CartSummary items={props.items as CartItem[]} />;
},
renderResult({ data }) {
const { items } = data as { items: CartItem[] };
return <CartSummary items={items} />;
},
});
}

The cart tool uses displayStrategy: "hide-on-new". This means when the AI calls show_cart again (after the user adds another item), the previous cart card disappears and the new one takes its place. Without this strategy, the conversation would accumulate stale cart snapshots.
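The replacement behavior can be modeled as a pure function over the on-screen cards. This is a conceptual sketch of what the strategy means, not glove-react's internals:

```typescript
// Conceptual model of display strategies (not glove-react internals):
// a card tagged "hide-on-new" evicts the previous card from the same
// tool before it is shown, while "stay" cards accumulate.
interface Card {
  tool: string;
  strategy: "stay" | "hide-on-new";
  id: number;
}

function pushCard(cards: Card[], card: Card): Card[] {
  const kept =
    card.strategy === "hide-on-new"
      ? cards.filter((c) => c.tool !== card.tool)
      : cards;
  return [...kept, card];
}

// Two show_cart pushes leave only the latest snapshot on screen:
let screen: Card[] = [];
screen = pushCard(screen, { tool: "show_cart", strategy: "hide-on-new", id: 1 });
screen = pushCard(screen, { tool: "show_cart", strategy: "hide-on-new", id: 2 });
// screen now holds a single card, the id: 2 snapshot
```

Info cards use "stay", so pushing several of them through the same function would accumulate rather than replace.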
The show_info tool supports a variant field — "info" for general cards and "success" for order confirmations. The success variant shows a green accent bar on the left:
export function createShowInfoTool() {
return defineTool({
name: "show_info",
description:
"Display a persistent information card in the chat. Use for " +
"sourcing details, brewing tips, order confirmations, or general info.",
inputSchema: z.object({
title: z.string().describe("Card title"),
content: z.string().describe("Card body text"),
variant: z
.enum(["info", "success"])
.optional()
.describe("info = general, success = confirmation/order placed"),
}),
displayPropsSchema: z.object({
title: z.string(),
content: z.string(),
variant: z.string(),
}),
async do(input, display) {
const variant = input.variant ?? "info";
await display.pushAndForget({ title: input.title, content: input.content, variant });
return {
status: "success" as const,
data: `Displayed info card: ${input.title}`,
renderData: { title: input.title, content: input.content, variant },
};
},
render({ props }) {
const accentColor = props.variant === "success" ? "#4ade80" : "#6b8a6b";
return (
<div style={{
background: "#fefdfb",
border: "1px solid #dce5dc",
borderLeft: `3px solid ${accentColor}`,
padding: 16,
maxWidth: 400,
}}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 600, color: "#111a11" }}>
{props.title}
</p>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 13, lineHeight: 1.6, color: "#3d5a3d", whiteSpace: "pre-wrap" }}>
{props.content}
</p>
</div>
);
},
});
}

These tools return plain text. They have no render function, no display stack involvement. The AI calls them, gets text back, and speaks it to the user. They exist so the voice mode has equivalents for the interactive tools that require clicking.
import type { ToolConfig } from "glove-react";
import { z } from "zod";
import { formatPrice, getProductsByIds } from "../products";
export function createGetProductsTool(): ToolConfig {
return {
name: "get_products",
description:
"Look up product details and return them as text. " +
"Use this in voice mode instead of show_products.",
inputSchema: z.object({
product_ids: z
.array(z.string())
.describe('Array of product IDs to look up, or ["all"] for everything'),
}),
async do(input) {
const ids = (input as { product_ids: string[] }).product_ids;
const products = getProductsByIds(ids.includes("all") ? "all" : ids);
if (products.length === 0) {
return { status: "error" as const, data: "No products found for the given IDs." };
}
const lines = products.map(
(p) =>
`- ${p.name} (${p.id}): ${p.origin}, ${p.roast} roast, ${formatPrice(p.price)}/${p.weight}. Notes: ${p.notes.join(", ")}. Intensity: ${p.intensity}/10. ${p.description}`,
);
return { status: "success" as const, data: lines.join("\n") };
},
};
}

import type { ToolConfig } from "glove-react";
import { z } from "zod";
import { formatPrice } from "../products";
import type { CartOps } from "../theme";
export function createGetCartTool(cartOps: CartOps): ToolConfig {
return {
name: "get_cart",
description:
"Look up the current shopping bag contents and return them as text. " +
"Use this in voice mode instead of show_cart.",
inputSchema: z.object({}),
async do() {
const cart = cartOps.get();
if (cart.length === 0) {
return { status: "success" as const, data: "The bag is empty." };
}
const lines = cart.map(
(item) => `- ${item.name} x${item.qty} — ${formatPrice(item.price * item.qty)}`,
);
const subtotal = cart.reduce((s, i) => s + i.price * i.qty, 0);
const totalItems = cart.reduce((s, i) => s + i.qty, 0);
return {
status: "success" as const,
data: `${totalItems} item(s) in bag:\n${lines.join("\n")}\nSubtotal: ${formatPrice(subtotal)}`,
};
},
};
}

Notice these are plain ToolConfig objects, not defineTool calls. Since they have no display UI, they do not need typed display props or resolve schemas. They are pure data lookups — the AI gets text back and narrates it aloud.
All 9 tools are assembled through a factory function that receives the CartOps interface:
import type { ToolConfig } from "glove-react";
import type { CartOps } from "../theme";
import { createAskPreferenceTool } from "./ask-preference";
import { createShowProductsTool } from "./show-products";
import { createShowProductDetailTool } from "./show-product-detail";
import { createAddToCartTool } from "./add-to-cart";
import { createShowCartTool } from "./show-cart";
import { createCheckoutTool } from "./checkout";
import { createShowInfoTool } from "./show-info";
import { createGetProductsTool } from "./get-products";
import { createGetCartTool } from "./get-cart";
export function createCoffeeTools(cartOps: CartOps): ToolConfig[] {
return [
createAskPreferenceTool(),
createShowProductsTool(cartOps),
createShowProductDetailTool(),
createAddToCartTool(cartOps),
createShowCartTool(cartOps),
createCheckoutTool(cartOps),
createShowInfoTool(),
// Voice-friendly tools — return data as text for the LLM to narrate
createGetProductsTool(),
createGetCartTool(cartOps),
];
}

The tool factory pattern lets tools that need cart access (like show_products, checkout, and get_cart) share the same CartOps instance. Cart state lives in React, not on the server, so it updates instantly when the user adds an item.
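The CartOps contract is small enough to sketch framework-free. The real app backs it with React useState; here a plain closure stands in, and the one-entry catalog is placeholder data rather than the example's real product list:

```typescript
// Framework-free sketch of the CartOps contract the tool factories
// consume. The catalog entry below is illustrative placeholder data.
interface CartItem {
  id: string;
  name: string;
  price: number;
  qty: number;
}

interface CartOps {
  add(productId: string): void;
  get(): CartItem[];
  clear(): void;
}

const CATALOG: Record<string, { name: string; price: number }> = {
  yirgacheffe: { name: "Yirgacheffe", price: 22 },
};

function createCartOps(): CartOps {
  let items: CartItem[] = [];
  return {
    add(productId) {
      const existing = items.find((i) => i.id === productId);
      if (existing) {
        existing.qty += 1; // repeat adds bump the quantity
        return;
      }
      const product = CATALOG[productId];
      if (product) items = [...items, { id: productId, ...product, qty: 1 }];
    },
    get: () => items,
    clear() {
      items = [];
    },
  };
}
```

Because every factory receives the same instance, show_products can add an item and a subsequent show_cart call immediately sees it.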
This is the key pattern in the coffee shop example. The checkout tool uses unAbortable: true, which means the tool keeps running even when the user interrupts it.
Why does this matter? In voice mode, the user might speak while the checkout form is on screen. Normally, speaking triggers a barge-in — the voice pipeline interrupts the current AI turn, aborts any running tools, and starts listening for the new utterance. This is great for casual conversation (the user can say “actually, never mind” mid-response), but it is terrible for checkout. If the user accidentally makes a sound — a cough, a background noise, even saying “let me type my email” — the checkout form would vanish and the cart data would be lost.
The unAbortable flag provides two layers of protection:
1. Voice-layer suppression. While a pushAndWait resolver is active (the display manager's resolver store has entries), the voice pipeline suppresses barge-in. The user's microphone still picks up audio, but the pipeline will not interrupt the current turn. This means speaking during checkout does not trigger a new AI response.
2. Core-level protection. If an abort signal does fire, the core checks tool.unAbortable before killing the tool. If the flag is set, the tool keeps running to completion. The pushAndWait promise resolves normally when the user submits the form.

Here is the checkout tool implementation:
import React, { useState } from "react";
import { defineTool } from "glove-react";
import { z } from "zod";
import { formatPrice, GRIND_OPTIONS, type CartItem } from "../products";
import type { CartOps } from "../theme";
export function createCheckoutTool(cartOps: CartOps) {
return defineTool({
name: "checkout",
description:
"Present the checkout form with the current cart, grind selection, " +
"and email input. Blocks until the user submits or cancels. " +
"Only call when the user is ready to checkout.",
inputSchema: z.object({}),
displayPropsSchema: z.object({ items: z.array(z.any()) }),
resolveSchema: z.union([
z.object({ grind: z.string(), email: z.string() }),
z.null(),
]),
unAbortable: true,
displayStrategy: "hide-on-complete",
async do(_input, display) {
const cart = cartOps.get();
if (cart.length === 0) return "Cannot checkout — the bag is empty.";
const result = await display.pushAndWait({ items: cart });
if (!result) {
return {
status: "success" as const,
data: "User cancelled checkout and wants to continue shopping.",
renderData: { cancelled: true },
};
}
const total = cart.reduce((s, i) => s + i.price * i.qty, 0);
cartOps.clear();
return {
status: "success" as const,
data: `Order placed! Grind: ${result.grind}. Cart cleared. Total items ordered: ${cart.length}.`,
renderData: {
grind: result.grind,
email: result.email,
items: cart,
total,
},
};
},
render({ props, resolve }) {
return <CheckoutForm items={props.items as CartItem[]} onSubmit={resolve} />;
},
renderResult({ data }) {
const result = data as
| { cancelled: true }
| { grind: string; email: string; items: CartItem[]; total: number };
if ("cancelled" in result) {
return (
<div style={{ padding: 16, background: "#fefdfb", border: "1px solid #dce5dc" }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 13, color: "#6b8a6b", fontStyle: "italic" }}>
Checkout cancelled — continued shopping.
</p>
</div>
);
}
return (
<div style={{ background: "#fefdfb", border: "1px solid #dce5dc", borderLeft: "3px solid #4ade80", padding: 16 }}>
<p style={{ fontFamily: "'DM Sans', sans-serif", fontSize: 14, fontWeight: 600, color: "#111a11" }}>
Order Confirmed
</p>
{result.items.map((item) => (
<div key={item.id} style={{ display: "flex", justifyContent: "space-between", padding: "4px 0", fontSize: 12, color: "#3d5a3d" }}>
<span>{item.name} x{item.qty}</span>
<span style={{ fontFamily: "'DM Mono', monospace" }}>{formatPrice(item.price * item.qty)}</span>
</div>
))}
<div style={{ marginTop: 8, paddingTop: 8, borderTop: "1px solid #dce5dc", display: "flex", justifyContent: "space-between" }}>
<span style={{ fontFamily: "'DM Mono', monospace", fontSize: 13, fontWeight: 600 }}>Total</span>
<span style={{ fontFamily: "'DM Mono', monospace", fontSize: 13, fontWeight: 600 }}>{formatPrice(result.total)}</span>
</div>
</div>
);
},
});
}

The CheckoutForm component is a regular React form with useState for grind selection and email input. It shows the bag items, a grind picker (Whole Bean, French Press, Pour Over, Espresso, Aeropress), an email field, subtotal, shipping (free over $40), and total. The “Place Order” button calls resolve({ grind, email }) and the “Continue shopping” link calls resolve(null).
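The totals the form displays reduce to a pure helper. The "free shipping over $40" rule comes from the description above; the $5 flat rate below is an assumption for illustration, not the example's real value:

```typescript
// Totals logic sketched as a pure function. Free shipping at or above
// $40 per the form description; the $5 flat rate is an assumption.
interface LineItem {
  price: number;
  qty: number;
}

function orderTotals(items: LineItem[]) {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.qty, 0);
  const shipping = subtotal >= 40 ? 0 : 5;
  return { subtotal, shipping, total: subtotal + shipping };
}
```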
The critical line is unAbortable: true. Without it, a voice interrupt during checkout would abort the tool, dismiss the form, and lose the cart data. With it, the form stays on screen no matter what happens in the voice pipeline.
Here is the conceptual flow of what happens when the user speaks during checkout:
// 1. User says "let me check out"
// 2. AI calls checkout tool
// 3. do() calls display.pushAndWait({ items: cart })
// 4. CheckoutForm renders on screen — resolver is registered
// 5. User accidentally speaks while filling in email
// 6. Voice pipeline detects speech...
// 7. Voice layer checks: resolverStore.size > 0? YES
// -> Barge-in SUPPRESSED. Speech is ignored.
// 8. Even if an abort signal fires from another source:
// 9. Core checks: tool.unAbortable? YES
// -> Tool keeps running. pushAndWait stays active.
// 10. User fills in email, clicks "Place Order"
// 11. resolve({ grind: "Pour Over", email: "..." })
// 12. do() receives the result, clears the cart, returns to AI
// 13. AI confirms the order with show_info variant="success"

The voice pipeline has four components: speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), and the React hook that ties them together. Here is how to set up each piece.
ElevenLabs uses token-based authentication. Your server generates short-lived tokens, and the browser uses them to connect directly to ElevenLabs. This keeps your API key on the server. Glove provides a createVoiceTokenHandler helper that handles the token exchange:
import { createVoiceTokenHandler } from "glove-next";
export const GET = createVoiceTokenHandler({ provider: "elevenlabs", type: "stt" });

import { createVoiceTokenHandler } from "glove-next";
export const GET = createVoiceTokenHandler({ provider: "elevenlabs", type: "tts" });

These routes read your ELEVENLABS_API_KEY from the server environment and return a temporary token. The browser calls them before starting each voice session.
Create a client-side module that configures the ElevenLabs adapters and the Silero VAD. The VAD is dynamically imported to avoid pulling onnxruntime-web (a WASM dependency) into the Next.js server bundle during SSR or prerendering:
import { createElevenLabsAdapters } from "glove-voice";
async function fetchToken(path: string): Promise<string> {
const res = await fetch(path);
const data = (await res.json()) as { token?: string; error?: string };
if (!res.ok || !data.token) {
throw new Error(data.error ?? `Token fetch failed (${res.status})`);
}
return data.token;
}
// ElevenLabs STT (Scribe) + TTS adapters
export const { stt, createTTS } = createElevenLabsAdapters({
getSTTToken: () => fetchToken("/api/voice/stt-token"),
getTTSToken: () => fetchToken("/api/voice/tts-token"),
voiceId: "56bWURjYFHyYyVf490Dp", // "George" — warm, friendly barista persona
});
// Silero VAD — dynamically imported to avoid WASM in SSR
export async function createSileroVAD() {
const { SileroVADAdapter } = await import("glove-voice/silero-vad");
const vad = new SileroVADAdapter({
positiveSpeechThreshold: 0.5,
negativeSpeechThreshold: 0.35,
wasm: { type: "cdn" },
});
await vad.init();
return vad;
}

The voiceId selects the TTS voice. The coffee shop uses “George” — a warm, conversational voice that fits the friendly barista persona. The VAD thresholds control how sensitive the turn detection is: positiveSpeechThreshold is the confidence needed to start detecting speech, and negativeSpeechThreshold is when it decides the user has stopped talking.
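The two thresholds form a hysteresis band: speech starts only above 0.5 and ends only below 0.35, so a probability wobbling between them cannot chatter the detector on and off. A toy gate illustrates the idea (this is not the SileroVADAdapter's actual implementation):

```typescript
// Toy hysteresis gate illustrating the two VAD thresholds (not the
// SileroVADAdapter's internals). Frames whose probability falls between
// the thresholds keep whatever state the gate is already in.
function createSpeechGate(positive = 0.5, negative = 0.35) {
  let speaking = false;
  return (probability: number): boolean => {
    if (!speaking && probability >= positive) speaking = true;
    else if (speaking && probability <= negative) speaking = false;
    return speaking;
  };
}

const gate = createSpeechGate();
gate(0.4); // below the positive threshold: still silent
gate(0.6); // crosses 0.5: speech starts
gate(0.4); // inside the band: still counted as speech
gate(0.3); // drops below 0.35: speech ends
```

Raising negativeSpeechThreshold makes the gate end turns sooner; lowering positiveSpeechThreshold makes it trigger on quieter speech.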
In the chat component, initialize the VAD on mount and pass everything to useGloveVoice:
import { useGlove, Render } from "glove-react";
import { useGloveVoice } from "glove-react/voice";
import type { TurnMode } from "glove-react/voice";
import { stt, createTTS, createSileroVAD } from "../lib/voice";
import { systemPrompt, voiceSystemPrompt } from "../lib/system-prompt";
export default function Chat({ sessionId }: { sessionId: string }) {
const [turnMode, setTurnMode] = useState<TurnMode>("vad");
// Cart state, tools, and glove hook setup...
const tools = useMemo(() => createCoffeeTools(cartOps), [cartOps]);
const glove = useGlove({ tools, sessionId });
const { runnable } = glove;
// Initialize Silero VAD model on mount (dynamic import avoids SSR issues)
const [vadReady, setVadReady] = useState(false);
const vadRef = useRef<Awaited<ReturnType<typeof createSileroVAD>> | null>(null);
useEffect(() => {
createSileroVAD().then((v) => {
vadRef.current = v;
setVadReady(true);
});
}, []);
// Build voice config — only include VAD once it has loaded
const voiceConfig = useMemo(
() => ({
stt,
createTTS,
vad: vadReady ? vadRef.current ?? undefined : undefined,
turnMode,
}),
[vadReady, turnMode],
);
const voice = useGloveVoice({ runnable, voice: voiceConfig });
// Swap system prompt when voice activates
useEffect(() => {
if (!runnable) return;
if (voice.isActive) {
runnable.setSystemPrompt(voiceSystemPrompt);
} else {
runnable.setSystemPrompt(systemPrompt);
}
}, [voice.isActive, runnable]);
// voice.start() — requests mic, opens STT, begins listening
// voice.stop() — releases mic, closes STT and TTS
// voice.mode — "idle" | "listening" | "thinking" | "speaking"
// voice.isActive — true when mode is not "idle"
}

The useGloveVoice hook returns a simple state machine. It cycles through four modes: idle (not started), listening (microphone active, waiting for speech), thinking (user finished speaking, waiting for AI response), and speaking (TTS playing back the AI response). After speaking finishes, it returns to listening automatically.
The hook also supports two turn modes: "vad" (hands-free — the VAD detects when the user stops talking and auto-commits the turn) and "manual" (push-to-talk — the user holds a button or spacebar to record, and the turn commits on release).
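The cycle can be written down as a transition table. This is a conceptual model of the modes described above; the event names are invented for illustration and are not useGloveVoice's real API:

```typescript
// Conceptual transition table for the voice mode cycle. Event names are
// hypothetical; useGloveVoice manages these transitions internally.
type VoiceMode = "idle" | "listening" | "thinking" | "speaking";
type VoiceEvent =
  | "start"          // voice.start(): mic opens
  | "stop"           // voice.stop(): session ends
  | "turnCommitted"  // VAD end-of-speech, or push-to-talk release
  | "responseReady"  // AI reply arrives, TTS playback begins
  | "playbackDone"   // TTS finishes
  | "bargeIn";       // user interrupts during playback

function next(mode: VoiceMode, event: VoiceEvent): VoiceMode {
  switch (event) {
    case "start":
      return mode === "idle" ? "listening" : mode;
    case "stop":
      return "idle";
    case "turnCommitted":
      return mode === "listening" ? "thinking" : mode;
    case "responseReady":
      return mode === "thinking" ? "speaking" : mode;
    case "playbackDone":
    case "bargeIn":
      // both return to listening, matching the auto-resume behavior
      return mode === "speaking" ? "listening" : mode;
  }
}
```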
The coffee shop displays an animated orb that communicates voice state through motion: one animation each for listening, thinking, speaking, and (in manual mode) recording.
In VAD mode, tapping the orb ends the voice session. During speaking, tapping triggers barge-in (interrupt) and snaps back to listening. In manual mode, the orb acts as the push-to-talk button — click to start recording, click again to stop and commit. The CSS animations are driven by a class that changes with the voice mode: voice-orb--listening, voice-orb--thinking, voice-orb--speaking, and voice-orb--recording for manual mode.
The coffee shop uses two system prompts — one for text mode and one for voice mode. When the user activates the microphone, the component calls runnable.setSystemPrompt(voiceSystemPrompt) to swap the prompt. When voice ends, it swaps back.
The voice prompt adds a section at the end that tells the AI which tools to avoid and which to use instead:
export const voiceSystemPrompt = `${systemPrompt}
## Voice Mode — IMPORTANT
The user is interacting via voice. They CANNOT click buttons or interact
with visual elements. You must adapt your tool usage and speaking style.
### Tool Substitutions (voice mode)
These tools block on user clicks and MUST NOT be used in voice mode:
- **show_products** -> use **get_products** instead (returns product data
as text for you to narrate)
- **show_cart** -> use **get_cart** instead (returns full cart breakdown
as text)
- **ask_preference** -> DO NOT use. Instead, ask the user verbally and
let them respond by speaking.
These tools still work in voice mode (non-blocking):
- **get_products** — look up products and narrate the results.
- **get_cart** — look up cart contents and read them back.
- **add_to_cart** — works normally. Confirm verbally what you added.
- **show_product_detail** — still displays a card, but describe the
product verbally too.
- **show_info** — still displays a card, but speak the key info aloud.
- **checkout** — still works (the form will appear on screen).
### Speaking Style
- Be conversational — speak naturally, as if chatting at a coffee counter.
- Describe products verbally — mention name, origin, roast, key tasting
notes, and price.
- Keep it concise — voice responses should be shorter than text.
- Ask one thing at a time.`;

The tool substitution pattern is the heart of multimodal tool design. Every interactive tool (show_products, show_cart, ask_preference) has a voice-friendly counterpart that either returns text data or is replaced by natural conversation. The AI is smart enough to follow these instructions consistently — when the system prompt says “use get_products instead of show_products”, it does.
The checkout tool is the interesting exception. It works in both modes because even in voice mode, the user needs a visual form to enter their email address and select a grind. The system prompt says checkout “still works (the form will appear on screen)” so the AI knows to use it normally.
| Tool | Pattern | Display Strategy | Why |
|---|---|---|---|
| ask_preference | pushAndWait | hide-on-complete | Multi-choice chips disappear after user picks an option |
| show_products | pushAndWait | hide-on-complete | Product carousel disappears after user selects or adds |
| checkout | pushAndWait + unAbortable | hide-on-complete | Order form stays on screen even during voice interrupts |
| show_product_detail | pushAndForget | stay | Product detail card persists in the conversation |
| show_cart | pushAndForget | hide-on-new | Old cart card replaced when updated cart appears |
| show_info | pushAndForget | stay | Info cards (sourcing, brewing, confirmations) persist |
| add_to_cart | No display | n/a | Pure data — updates cart state, returns confirmation text |
| get_products | No display | n/a | Voice-only — returns product data as text for narration |
| get_cart | No display | n/a | Voice-only — returns cart contents as text for narration |
The coffee shop is a working example in the Glove monorepo. To run it locally:
# Clone the repo and install dependencies
git clone https://github.com/your-org/glove.git
cd glove
pnpm install

Create a .env.local file in the examples/coffee/ directory with your API keys:
# Required — LLM provider
OPENROUTER_API_KEY=your-openrouter-key
# Optional — only needed for voice mode
ELEVENLABS_API_KEY=your-elevenlabs-key

Then start the dev server:
pnpm --filter glove-coffee run dev

Try these conversations in text mode:

- “I want something fruity and light” — triggers preference questions and a filtered product carousel
- “Show me everything you have” — opens the full catalog carousel
- “What's in my bag?” — shows the cart summary card
- “I'm ready to check out” — opens the checkout form

Try these in voice mode (click the microphone button):

- Ask for a recommendation and let the AI narrate the matching coffees
- Ask what's in your bag and hear the cart read back
- Say “let me check out” — the form appears on screen and survives accidental speech
The coffee shop demonstrates several patterns worth learning from:
- Shared cart state through a CartOps interface lets multiple tools read and modify the same cart without prop drilling or global state.
- unAbortable for critical flows. The checkout form cannot be dismissed by voice interrupts — two layers of protection ensure the user's form data is safe.
- Display strategies matched to content. hide-on-complete for interactive tools keeps the conversation clean, while hide-on-new for the cart prevents stale data, and stay for info cards keeps useful context visible.

To go deeper, see the docs covering:

- pushAndWait, pushAndForget, and display strategies
- using glove-core directly without React or Next.js
- displayPropsSchema, resolveSchema, and unAbortable