Documentation Index

Fetch the complete documentation index at: https://libretto.sh/docs/llms.txt

Use this file to discover all available pages before exploring further.

What happens under the hood when your agent builds a browser automation.
Whether the agent is generating a workflow from scratch, recording your actions, or debugging a broken script, it follows the same four-phase process. Each phase is designed around a specific constraint of how AI agents interact with browsers.

1. Browser session

The agent starts by launching a browser and registering it as a named session. The session becomes the agent’s handle to the browser for everything that follows: snapshots, interactions, and validation all target the session by name. Sessions track all state (network traffic, actions, snapshots) in .libretto/sessions/<name>/. This gives the agent a persistent record of what it has seen and done, so it can refer back to earlier observations without re-inspecting the page. If the site requires authentication, the agent opens the browser in headed mode and asks you to log in. It cannot handle login flows on its own because most login pages include CAPTCHAs or multi-factor prompts that require a human.
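The session record described above can be sketched as a small helper. The `.libretto/sessions/<name>/` path comes from the doc; the entry shape and method names below are illustrative assumptions, not Libretto's actual API.

```typescript
import * as path from "node:path";

// One observation or action in a session's persistent record.
// The shape is an assumption for illustration.
interface SessionEntry {
  kind: "snapshot" | "action" | "network";
  timestamp: number;
  detail: string;
}

class SessionLog {
  readonly dir: string;
  private entries: SessionEntry[] = [];

  constructor(readonly name: string, root = ".libretto") {
    // Per the doc, all session state lives under .libretto/sessions/<name>/
    this.dir = path.join(root, "sessions", name);
  }

  record(kind: SessionEntry["kind"], detail: string): void {
    this.entries.push({ kind, timestamp: Date.now(), detail });
  }

  // Lets the agent refer back to earlier observations
  // without re-inspecting the page.
  history(kind?: SessionEntry["kind"]): SessionEntry[] {
    return kind
      ? this.entries.filter((e) => e.kind === kind)
      : [...this.entries];
  }
}
```

Because snapshots, interactions, and validation all target the session by name, a single named record like this is enough for the agent to reconstruct what it has already seen.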

2. Page snapshot

Rather than reading the full page DOM, the agent captures a screenshot and a compact accessibility snapshot. The snapshot output gives the agent a token-efficient view of selectors, interactive elements, and visible state. The compact snapshot strips away irrelevant markup while preserving the information the agent needs to reason about what’s on the page and how to interact with it. This keeps the agent’s context window small while still giving it a detailed understanding of the page. The agent snapshots frequently: before its first interaction, after navigation, and any time it needs to understand what changed. For large pages, it can inspect a subtree from the latest full snapshot by passing a printed ref back to snapshot.
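One way to picture the compaction step: keep interactive nodes plus any ancestors needed to reach them, and drop purely structural wrappers. The node shape, role list, and filtering rule below are assumptions for illustration, not Libretto's actual snapshot format.

```typescript
interface AXNode {
  role: string;       // e.g. "button", "textbox", "generic"
  name?: string;      // accessible name
  ref?: string;       // printed ref that could be passed back to snapshot
  children?: AXNode[];
}

// Roles treated as interactive in this sketch (an assumption).
const INTERACTIVE = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Compact an accessibility tree: keep interactive elements and any
// ancestor on the path to one; prune everything else.
function compact(node: AXNode): AXNode | null {
  const kept = (node.children ?? [])
    .map(compact)
    .filter((c): c is AXNode => c !== null);
  if (INTERACTIVE.has(node.role) || kept.length > 0) {
    return { ...node, children: kept };
  }
  return null; // purely structural wrapper with nothing useful beneath it
}
```

Pruning subtrees that contain no interactive content is what makes the snapshot token-efficient: most wrapper markup disappears while every actionable element, and the path to it, survives.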

3. Interaction prototyping

Before writing any code, the agent tests its plan against the live browser. It runs Playwright expressions one at a time through exec, validating that each selector actually targets the right element and each action produces the expected result. The agent iterates between snapshot and exec during this phase. A typical loop looks like: snapshot the page to find a selector, exec an action using that selector, snapshot again to confirm the result. This cycle repeats until the agent has working selectors for each step of the workflow. Prototyping against the live page is what separates Libretto from writing a Playwright script from scratch. Instead of guessing at selectors and hoping they work, the agent confirms each one interactively before committing it to the final script.
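The snapshot → exec → snapshot loop can be sketched with stubbed tool calls. `snapshot` and `exec` here are hypothetical stand-ins for the tools the doc names, and the candidate-selector validation logic is an assumption about how the loop might work.

```typescript
// Stand-ins for the snapshot and exec tools (assumed shapes).
type Snapshot = { selectors: string[] };
type Exec = (expression: string) => boolean; // true if the action succeeded

// Try candidate selectors one at a time, confirming each against the
// live page before committing it to the generated script.
function prototypeStep(
  snapshot: () => Snapshot,
  exec: Exec,
  buildExpr: (selector: string) => string, // builds a Playwright expression
): string | null {
  // Snapshot the page to find candidate selectors.
  for (const selector of snapshot().selectors) {
    // exec an action using that selector; keep it only if it worked.
    if (exec(buildExpr(selector))) return selector;
  }
  return null; // nothing validated: re-snapshot and widen the search
}
```

The point of the loop is that a selector only enters the final script after a real action through it has succeeded; a `null` result sends the agent back to phase 2 rather than into guesswork.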

4. Code generation and validation

Once the agent has validated all the interactions, it writes a TypeScript file that exports a workflow(). The workflow uses the same Playwright selectors the agent already tested, so the script is grounded in what actually worked on the live page. The agent then runs the workflow headless to verify it works end-to-end without any manual intervention. If the run fails, the agent re-enters the snapshot-and-prototype loop (phases 2 and 3) to diagnose and fix the issue, then tries the headless run again. Validation requires a successful headless run with correct output. The agent does not consider a workflow complete until it has passed this check.
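The generated file might look like the sketch below. The doc only says it is a TypeScript file exporting a `workflow()` built from validated Playwright selectors; the exact signature, the `MinimalPage` interface, and the URL and selectors are assumptions for illustration.

```typescript
// A minimal slice of Playwright's Page interface so the sketch stays
// self-contained; a real generated file would use playwright's Page.
interface MinimalPage {
  goto(url: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// The exported workflow reuses selectors already confirmed during
// interaction prototyping, so nothing here is guessed.
export async function workflow(page: MinimalPage): Promise<void> {
  await page.goto("https://example.com/search");
  await page.fill("input[name=q]", "libretto");
  await page.click("button[type=submit]");
}
```

Typing against a narrow page interface also makes the headless validation run easy to reason about: the same `workflow()` runs unchanged whether the browser is headed or headless.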

How the workflow guides fit in

Each workflow guide follows this same four-phase process with different starting points:
  • One-shot generation: The agent drives the entire process. You provide a prompt; the agent handles all four phases.
  • Interactive building: You drive phases 1 and 2 by performing the workflow in the browser while Libretto records your actions. The agent handles phases 3 and 4.
  • Debugging: You start at a failed workflow (phase 4) and the agent uses phases 2 and 3 to diagnose and fix it.