Choose the right integration strategy and understand bot detection trade-offs.

Overview

Libretto supports four distinct approaches to capturing data and automating web interactions. Each makes different trade-offs between detection risk, setup complexity, data quality, and control. Knowing the trade-offs helps you pick the right approach for your target site.
| Approach | Bot detection risk | Best for |
| --- | --- | --- |
| Regular Playwright | Low-Moderate | Simple DOM extraction, server-rendered sites |
| Passive interception (page.on('response')) | Low | SPAs that load data via API calls during navigation |
| In-browser fetch (pageRequest()) | Low-Moderate | Deep pagination, bulk queries without UI clicking |
| Direct HTTP from Node.js | Very high | Sites with no bot detection where API speed matters |

Assessing bot detection

Bot detection avoidance is best-effort. Libretto cannot guarantee your automation will go undetected. Using Libretto with authenticated accounts may violate the terms of service of those services. Understand the risks before automating against any site you don’t control.
Libretto captures all network traffic and page state during a session, which lets you (or your agent) check what bot detection measures a site uses before committing to an automation approach:
  • Network log inspection. Query .libretto/sessions/<session>/network.jsonl with jq to review all requests and responses. Look for calls to bot detection services (Cloudflare, Akamai, PerimeterX, DataDome) or challenge endpoints.
  • Fetch patching check. Run npx libretto exec to evaluate window.fetch.toString() in the browser console. If it returns actual JavaScript (not "[native code]"), the site monkey-patches fetch and you should prefer passive interception over in-browser fetch.
  • Snapshot analysis. Use npx libretto snapshot to check for challenge pages, CAPTCHAs, or interstitials that indicate detection.
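The first two checks can be scripted. This is a minimal sketch: it assumes each line of network.jsonl is a JSON object with a `url` field (verify against your actual log schema), and the `isFetchPatched` helper applies the `[native code]` heuristic described above to whatever `window.fetch.toString()` returned:

```javascript
// Vendors whose presence in the network log suggests active bot detection.
const DETECTION_VENDORS = ['cloudflare', 'akamai', 'perimeterx', 'datadome', 'captcha'];

// Scan JSONL network-log text for requests to known detection vendors.
// The `url` field name is an assumption -- check your network.jsonl schema.
function findDetectionCalls(jsonlText) {
  return jsonlText
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    .filter((entry) =>
      DETECTION_VENDORS.some((v) => (entry.url || '').toLowerCase().includes(v))
    );
}

// Decide whether the site monkey-patches fetch, given window.fetch.toString().
function isFetchPatched(fetchSource) {
  return !fetchSource.includes('[native code]');
}

// Example against a fabricated two-line log excerpt:
const sample = [
  JSON.stringify({ url: 'https://example.com/api/search' }),
  JSON.stringify({ url: 'https://challenges.cloudflare.com/turnstile/v0/api.js' }),
].join('\n');

console.log(findDetectionCalls(sample).length); // 1 -- the Cloudflare challenge script
console.log(isFetchPatched('function fetch() { [native code] }')); // false -- unpatched
```

If `findDetectionCalls` returns anything, weigh that heavily in the decision guide below; if `isFetchPatched` returns true, prefer passive interception over in-browser fetch.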

Approach details

Standard Playwright usage: navigate pages, click elements, fill forms, and read DOM content using selectors and page.evaluate().
// Navigate and interact
await page.goto('https://example.com/search');
await page.fill('#query', 'search term');
await page.click('#submit');
await page.waitForSelector('.results');

// Extract data from the DOM
const results = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.result-item')).map(el => ({
    title: el.querySelector('h2')?.textContent,
    price: el.querySelector('.price')?.textContent,
  }));
});
Pros:
  • Simplest approach, uses Playwright as intended
  • No need to understand the site’s API structure
  • Works with any site regardless of how data is rendered (server-side, client-side, or hybrid)
  • Data extraction is visual/DOM-based, which maps naturally to what a user sees
  • Easy to debug with headless: false and Playwright’s trace viewer
  • Integrates directly with Libretto’s step-based workflow, recovery, and extraction features
Cons:
  • Slower than API-based approaches because it requires full page rendering
  • Fragile against DOM changes, since selectors break when the site updates its markup
  • Harder to get structured data because you’re scraping rendered HTML rather than clean API responses
  • Cannot access data that isn’t rendered in the DOM (e.g., API responses with fields the UI doesn’t display)
Bot detection risk: LOW-MODERATE
Plain Playwright is detectable by browser fingerprinting (Layer 1). Sites with any enterprise bot protection will likely flag it. Sites without active detection won’t notice.
Use playwright-extra with the stealth plugin to patch common fingerprint leaks, or run Playwright with a persistent browser context that looks more like a real browser profile.
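A minimal sketch of both mitigations. playwright-extra delegates fingerprint patching to puppeteer-extra-plugin-stealth (both installed separately), and a persistent context reuses a real profile directory; the `'./profile'` path and target URL are illustrative:

```javascript
// Mitigation 1: playwright-extra + stealth plugin patches common
// fingerprint leaks (navigator.webdriver, missing plugins, etc.).
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

(async () => {
  // Mitigation 2: a persistent context reuses a real profile directory,
  // so cookies, storage, and cache look like a returning user.
  const context = await chromium.launchPersistentContext('./profile', {
    headless: false,
  });
  const page = context.pages()[0] || (await context.newPage());
  await page.goto('https://example.com');
  await context.close();
})();
```

Neither mitigation is a guarantee; treat this as reducing, not eliminating, fingerprint-based detection.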

Decision guide

Recommended approach: Use in-browser fetch (pageRequest()) for most sites. It gives you full control over which endpoints you call, structured JSON responses, and the real browser’s network fingerprint.

For high-security sites with aggressive bot detection (Cloudflare, Akamai, PerimeterX), use Regular Playwright for navigation and passive page.on('response') interception for data capture. This avoids making any extra requests that could trigger detection.
Use Regular Playwright when:
  • The data you need is visible in the DOM and straightforward to extract with selectors
  • The site doesn’t have aggressive bot protection, or you’re using stealth plugins
  • You want the simplest implementation that integrates with Libretto’s recovery and extraction features
  • The data is rendered server-side and doesn’t come from a separate API call
Use passive interception (page.on('response')) when:
  • The site loads data via API calls during normal navigation (most modern SPAs)
  • You want structured JSON data without reverse-engineering the full API
  • Minimizing detection risk is important
  • You’re already navigating through the UI and want to passively capture data along the way
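The passive pattern can be sketched as below. The `/api/` substring filter and telemetry exclusion are assumptions; adapt them to whatever endpoints you actually saw in network.jsonl. The URL filter is kept as a pure function so it can be tested without a browser:

```javascript
// Pure predicate: which responses carry the data we want to capture.
// The '/api/' and '/telemetry' fragments are illustrative assumptions.
function isDataResponse(url) {
  return url.includes('/api/') && !url.includes('/telemetry');
}

// Attach a listener that passively captures JSON bodies while you navigate
// normally. No extra requests are made, so detection risk stays low.
function attachCapture(page) {
  const captured = [];
  page.on('response', async (response) => {
    if (!isDataResponse(response.url()) || !response.ok()) return;
    const type = response.headers()['content-type'] || '';
    if (type.includes('application/json')) {
      captured.push({ url: response.url(), body: await response.json() });
    }
  });
  return captured;
}

// Usage: attach before navigating, then read `captured` afterwards.
// const captured = attachCapture(page);
// await page.goto('https://example.com/search');
```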
Use in-browser fetch (pageRequest()) when:
  • You need data from API endpoints that the UI doesn’t naturally trigger (e.g., deep pagination, bulk exports)
  • You’ve verified the site doesn’t monkey-patch fetch (or you can work around it)
  • You want maximum control over which data you fetch and when
  • You’ve already reverse-engineered the relevant API endpoints
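The underlying pattern here is running window.fetch inside the page so the request carries the real browser’s cookies, headers, and TLS fingerprint. Libretto’s pageRequest() wraps this idea, but its exact signature is not shown in this doc, so the sketch below uses plain Playwright page.evaluate(); the endpoint in the usage comment is illustrative:

```javascript
// Fetch JSON from inside the page context. The request is made by the
// browser itself, so it is indistinguishable (at the network layer) from
// one the site's own frontend would make.
async function fetchFromPage(page, url) {
  return page.evaluate(async (target) => {
    const res = await fetch(target, { credentials: 'include' });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  }, url);
}

// Usage -- deep pagination the UI never triggers (endpoint is illustrative):
// const page3 = await fetchFromPage(page, 'https://example.com/api/items?page=3');
```

Remember the caveat from the assessment section: if the site monkey-patches fetch, calls made this way may be observed by the site’s own instrumentation.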
Use Direct Node.js HTTP when:
  • The target site has zero bot detection
  • Speed and resource efficiency are the primary concerns
  • You’re hitting a public/documented API (not scraping a website)
  • You need to make thousands of concurrent requests
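A minimal sketch of the direct approach, assuming Node 18+ (global fetch) and a site with zero bot detection; the retry/backoff policy and usage URL are illustrative choices, not Libretto APIs:

```javascript
// Direct HTTP from Node.js: fastest and cheapest, but carries none of a
// real browser's TLS or header fingerprint -- any serious bot detection
// will flag it immediately.
async function fetchJson(url, { retries = 3 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    const res = await fetch(url, {
      headers: { Accept: 'application/json' },
    });
    if (res.ok) return res.json();
    if (res.status === 429) {
      // Back off linearly on rate limits before retrying.
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
      continue;
    }
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  throw new Error(`Rate-limited after ${retries} attempts: ${url}`);
}

// Usage (illustrative public API endpoint):
// const data = await fetchJson('https://api.example.com/items?page=1');
```

For thousands of concurrent requests, pair this with a concurrency limiter so you do not trip ordinary rate limiting even on undefended sites.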