Playwright CLI

Token-Efficient Browser Automation for AI Coding Agents

From Context Bloat to Lean Automation

github.com/microsoft/playwright-cli

AI Agents Need a Browser

AI coding agents like Claude Code, GitHub Copilot, and Cursor are transforming how we build software. But there's a gap…

What agents can do

  • Read & write code
  • Run terminal commands
  • Reason about architecture

What they struggle with

  • See what a web page looks like
  • Click buttons, fill forms
  • Verify UI changes visually

How do we give an AI agent "eyes and hands" in a browser—without overwhelming it?

The Context Window Crunch

Think of the context window as the agent's working memory. It's finite.

  • Browser automation generates massive data per interaction
  • A single page navigation can return thousands of tokens of accessibility tree data
  • Tool schemas, DOM snapshots, console logs—all consuming precious context
  • Over long sessions, context fills up → reasoning degrades
  • "LLMs start to struggle at ~30 tools for large models, ~19 for smaller ones"

First, Let's Understand: Agent Skills

Before we solve the browser problem, meet the platform that makes solutions composable.

  • Skills = folders with a SKILL.md file containing instructions
  • Think of them as "expertise packages" an agent can load on demand
  • Progressive Disclosure: metadata → core instructions → supporting files
  • Only loads what's needed, when it's needed—no context waste
  • Open standard (agentskills.io)—works across Claude Code, Copilot, and more

How Skills Work

Skill Structure


my-skill/
├── SKILL.md          # Instructions (required)
├── references/       # Loaded as needed
├── scripts/          # Executable helpers
└── assets/           # Templates, etc.
            

Frontmatter


---
name: playwright-cli
description: Browser automation via
  CLI commands. Use when testing or
  interacting with web pages.
---
            

3 Layers of Loading

  1. Metadata (~100 words): always in context. Name + description.
  2. SKILL.md body (<5K words): loads when the skill is triggered.
  3. References & scripts (unlimited): loaded on demand by the agent.
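
The three layers can be illustrated with a small shell sketch. It is not how any particular agent loads skills. It only shows that the frontmatter (layer 1) can be read separately from the body (layer 2). The scratch directory, the body text, and the awk filters are assumptions made for illustration.

```shell
set -eu

# Hypothetical scratch skill that mirrors the folder layout shown earlier.
skill_dir="$(mktemp -d)/my-skill"
mkdir -p "$skill_dir/references" "$skill_dir/scripts" "$skill_dir/assets"

cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: playwright-cli
description: Browser automation via CLI commands. Use when testing or interacting with web pages.
---

Full instructions live here and are only needed once the skill is triggered.
Run `playwright-cli snapshot` before interacting with a page.
EOF

# Layer 1: only the lines between the first pair of '---' markers.
metadata="$(awk '/^---$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$skill_dir/SKILL.md")"

# Layer 2: everything after the closing '---' marker.
body="$(awk 'n >= 2 { print } /^---$/ { n++ }' "$skill_dir/SKILL.md")"

echo "Always in context:"
echo "$metadata"
```

Layer 3 would be the files under references/ and scripts/, which this sketch creates but leaves empty.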

Enter Playwright MCP

The first mainstream answer: a Model Context Protocol server for browser automation.

  • 26 tools, including browser_navigate, browser_click, browser_snapshot, browser_type
  • Streams full accessibility trees and console output into context
  • Uses structured snapshots—no screenshots needed for interaction
  • Works great for sandboxed environments (Claude Desktop, chat interfaces)
  • But… there's a catch

The MCP Token Tax

~114K tokens per task (MCP)
26 tool schemas in context
1000s of tokens per navigation

  • Every browser_navigate returns the full accessibility tree
  • All 26 tool schemas loaded at startup—always consuming context
  • Long sessions → context degradation → confused reasoning
"MCP sends the whole library. What if we just sent the page number?"

The Solution: Playwright CLI

A fundamentally different architecture.

  • CLI-first, disk-based architecture
  • Instead of streaming data into the model—save it to disk
  • Return only file paths and minimal confirmations
  • 50+ commands available without context overhead
  • Installed as a Skill—no schema bloat

~27K tokens per task (CLI)
4x fewer tokens than MCP

The agent gets a toolbelt, not a library.
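
As a quick sanity check on the figures quoted in this deck, the arithmetic can be reproduced in a few lines of shell. The per-task numbers come from the slides. The 200K context window is a hypothetical value chosen only to make the comparison concrete, and the calculation ignores everything else that occupies context.

```shell
set -eu

# Per-task token figures as quoted in this deck.
mcp_tokens=114000
cli_tokens=27000

# Hypothetical context window size, used for illustration only.
window=200000

# Ratio to one decimal place (shell arithmetic is integer-only, so use awk).
ratio="$(awk -v a="$mcp_tokens" -v b="$cli_tokens" 'BEGIN { printf "%.1f", a / b }')"
saved=$((mcp_tokens - cli_tokens))

# Whole tasks that would fit in the assumed window.
mcp_tasks=$((window / mcp_tokens))
cli_tasks=$((window / cli_tokens))

echo "MCP/CLI ratio: ${ratio}x, tokens saved per task: ${saved}"
echo "Tasks per ${window}-token window: MCP=${mcp_tasks}, CLI=${cli_tasks}"
```

With these inputs the ratio comes out at 4.2, which is where the rounded "4x" figure comes from.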

How Playwright CLI Works


# 1. Open a browser
playwright-cli open https://example.com --headed

# 2. Take a snapshot (saved to disk!)
playwright-cli snapshot
# → .playwright-cli/page-2026-02-12.yml

# 3. Interact using compact element references
playwright-cli fill e8 "Write tests"
playwright-cli press Enter
playwright-cli check e21

# 4. Verify with a screenshot
playwright-cli screenshot
# → .playwright-cli/screenshot-2026-02-12.png
        

Element references like e8, e21 are compact identifiers—no verbose CSS selectors needed!

The Snapshot System

The secret sauce behind token efficiency.

How it works
  1. playwright-cli snapshot captures page state
  2. Saves a YAML file to .playwright-cli/
  3. Assigns compact refs: e8, e21, e255
  4. Agent reads the file only when needed
Key properties
  • Deterministic—same page = same refs
  • Compact—refs, not full selectors
  • Expiring—page changes = new snapshot
  • Disk-based—not in context window

State lives on disk. The context stays clean.
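
To make "reads the file only when needed" concrete, here is a sketch of pulling a single element out of a snapshot file instead of loading the whole thing. The file name and the snapshot contents, including the `[ref=e8]` notation, are assumptions for illustration. Check a real snapshot in .playwright-cli/ for the actual layout.

```shell
set -eu

# Hypothetical snapshot file; real snapshots are written by `playwright-cli snapshot`.
snap_dir="$(mktemp -d)/.playwright-cli"
mkdir -p "$snap_dir"
cat > "$snap_dir/page-example.yml" <<'EOF'
- heading "todos" [ref=e5]
- textbox "What needs to be done?" [ref=e8]
- checkbox "Toggle Todo" [ref=e21]
EOF

# Return only the line for one ref (fixed-string match, so e8 never matches e80).
find_ref() {
  grep -F "[ref=$1]" "$2"
}

line="$(find_ref e8 "$snap_dir/page-example.yml")"
echo "$line"

# Count the refs in the file without printing its contents.
ref_count="$(grep -c 'ref=e[0-9]' "$snap_dir/page-example.yml")"
echo "refs in snapshot: $ref_count"
```

The point of the design is that a lookup like this costs one line of output, while the full tree stays on disk.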

CLI + Skills = Perfect Match


# Install skills for your coding agent
playwright-cli install --skills

# This creates:
# .claude/skills/playwright-cli/SKILL.md
        

What happens

  • Agent reads SKILL.md to learn commands
  • Uses Bash tool to execute CLI commands
  • Reads snapshot files from disk as needed
  • No MCP server running, no tool proliferation

What doesn't happen

  • No 26 tool schemas loaded at startup
  • No accessibility trees in context
  • No context window bloat over time
  • No MCP configuration required

Head-to-Head: CLI vs MCP

Aspect              Playwright CLI          Playwright MCP
Token Usage         ~27K per task           ~114K per task
Data Delivery       Disk (file paths)       Into LLM context
Commands Available  50+                     26 tools
Long Sessions       Sustainable             Context degrades
Best For            Coding agents (shell)   Sandboxed envs
Determinism         High                    Variable
Setup               npm install -g          MCP config in IDE

Use Case: Automated E2E Testing

An agent explores, tests, and verifies—all in one session.


playwright-cli open https://demo.playwright.dev/todomvc --headed
playwright-cli snapshot
playwright-cli fill e8 "Write tests"
playwright-cli press Enter
playwright-cli check e21
playwright-cli screenshot
        
What the agent can do in one session: Explore the app → Identify bugs → Write test code → Run the tests → Verify fixes—all without context overflow.

Use Case: UI Review in the Agent Loop

Low token cost makes iterative visual review practical.

%%{init: {'theme': 'dark'}}%%
flowchart TD
    A["1. Agent edits React component"] --> B["2. Starts dev server"]
    B --> C["3. Opens browser via CLI"]
    C --> D["4. Takes screenshot"]
    D --> E{"5. Looks correct?"}
    E -- No --> A
    E -- Yes --> F["6. Commits changes"]

Why this works with CLI

  • Each loop iteration costs minimal tokens
  • Agent can iterate 5-10 times without context pressure
  • Screenshots saved to disk for comparison
  • Snapshots let agent verify structure + visuals
With MCP, each iteration floods context. With CLI, iterations are nearly free.
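
The review loop can be sketched as a bounded shell loop. In practice the agent itself decides whether the UI looks right. Here that judgement is stood in for by `looks_correct`, and the edit step by `apply_edit`, both hypothetical placeholders. The `PLAYWRIGHT_CLI` variable and the iteration cap are also assumptions. Only the `screenshot` command comes from the slides.

```shell
# Command to invoke; overridable so the loop can be dry-run without a browser.
PLAYWRIGHT_CLI="${PLAYWRIGHT_CLI:-playwright-cli}"
MAX_ITERATIONS=10

# Hypothetical placeholders: in a real loop the agent inspects the screenshot
# and edits the component itself.
looks_correct() { return 1; }
apply_edit() { :; }

review_loop() {
  i=1
  while [ "$i" -le "$MAX_ITERATIONS" ]; do
    # The screenshot lands on disk; only a short confirmation comes back.
    "$PLAYWRIGHT_CLI" screenshot >/dev/null
    if looks_correct "$i"; then
      echo "accepted after $i iteration(s)"
      return 0
    fi
    apply_edit "$i"
    i=$((i + 1))
  done
  echo "gave up after $MAX_ITERATIONS iterations"
  return 1
}
```

Calling `review_loop` after opening a page runs the cycle. The cap keeps a stuck loop from running indefinitely.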

Use Case: Multi-Session Testing

Test cross-role interactions with named sessions.


# Launch parallel sessions for different roles
playwright-cli -s=admin open https://app.com/admin
playwright-cli -s=user open https://app.com/dashboard

# Admin creates a resource
playwright-cli -s=admin snapshot
playwright-cli -s=admin click e15

# User verifies it appears
playwright-cli -s=user snapshot
playwright-cli -s=user click e8
        
Real-world scenario: Test that when an admin publishes a post, a regular user can see it—using two independent browser sessions controlled by the same agent.

Advanced Features

Network & State

  • Network mocking
    playwright-cli route "https://api.com/*" \
      --status=200 --body='{"mock": true}'
  • State persistence
    playwright-cli state-save logged-in.json
    playwright-cli state-load logged-in.json

Recording & Debug

  • Video recording
    playwright-cli video-start
    playwright-cli video-stop demo.webm
  • Tracing
    playwright-cli tracing-start
    playwright-cli tracing-stop
  • Multi-browser
    Chromium, Firefox, WebKit, Edge

Getting Started


# Install
npm install -g @playwright/cli@latest

# Install browser binaries
playwright-cli install-browser

# Install skills for your coding agent
playwright-cli install --skills

# Start automating!
playwright-cli open https://your-app.com --headed
        
Requirements: Node.js 18+ • A coding agent (Claude Code, GitHub Copilot, Cursor, Windsurf)

Or go skills-less: your agent can read commands directly from playwright-cli --help

When to Use What

They're complementary, not competing.

Use CLI when

  • Coding agent with shell access
  • Token efficiency matters
  • Long automation sessions
  • CI/CD integration
  • Alongside existing test suites

Use MCP when

  • Sandboxed environment (no filesystem)
  • Short exploratory sessions
  • Need rich page awareness in context
  • Self-healing test workflows
  • Chat-based AI interfaces

Key Takeaways

  1. AI agents need browser access, but context windows are precious
  2. Agent Skills provide a lean, composable way to extend capabilities
  3. Playwright CLI saves state to disk instead of flooding the context
  4. About 4x fewer tokens than MCP for the same browser tasks (~27K vs ~114K per task)
  5. 50+ commands available through simple shell execution
  6. The agent loop era demands token-efficient tools—CLI delivers
"Don't send the whole library into the model. Send the page number—and let the agent read the page when it's ready."

Thank You

Resources

Playwright CLI: github.com/microsoft/playwright-cli

Playwright MCP: github.com/microsoft/playwright-mcp

Agent Skills Standard: agentskills.io

Claude Code Skills Docs: code.claude.com/docs/en/skills
