Playwright CLI

Token-Efficient Browser Automation for AI Coding Agents

From Context Bloat to Lean Automation

github.com/microsoft/playwright-cli

AI Agents Need a Browser

AI coding agents like Claude Code, GitHub Copilot, and Cursor are transforming how we build software. But there's a gap…

What agents can do

Read & write code
Run terminal commands
Reason about architecture

What they struggle with

See what a web page looks like
Click buttons, fill forms
Verify UI changes visually

How do we give an AI agent "eyes and hands" in a browser—without overwhelming it?

The Context Window Crunch

Think of the context window as the agent's working memory. It's finite.

Browser automation generates large amounts of data with every interaction
A single page navigation can return thousands of tokens of accessibility tree data
Tool schemas, DOM snapshots, console logs—all consuming precious context
Over long sessions, context fills up → reasoning degrades
"LLMs start to struggle at ~30 tools for large models, ~19 for smaller ones"

First, Let's Understand: Agent Skills

Before we solve the browser problem, meet the platform that makes solutions composable.

Skills = folders with a SKILL.md file containing instructions
Think of them as "expertise packages" an agent can load on demand
Progressive Disclosure: metadata → core instructions → supporting files
Only loads what's needed, when it's needed—no context waste
Open standard (agentskills.io)—works across Claude Code, Copilot, and more

How Skills Work

Skill Structure


my-skill/
├── SKILL.md          # Instructions (required)
├── references/       # Loaded as needed
├── scripts/          # Executable helpers
└── assets/           # Templates, etc.

Frontmatter


---
name: playwright-cli
description: Browser automation via
  CLI commands. Use when testing or
  interacting with web pages.
---
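The layout and frontmatter above can be scaffolded by hand with standard shell tools. This is a minimal sketch for illustration only. `scaffold_skill` is a hypothetical helper, the body text is a placeholder, and `playwright-cli install --skills` generates its own SKILL.md. The sketch sets `name` to the folder name so the two stay consistent.

```shell
# Hypothetical helper: create a skill folder with the layout shown above.
# The frontmatter (name + description) is layer 1; the text after it is layer 2.
scaffold_skill() {
  local dir=$1
  local name
  name=$(basename "$dir")
  mkdir -p "$dir/references" "$dir/scripts" "$dir/assets"
  cat > "$dir/SKILL.md" <<EOF
---
name: $name
description: Browser automation via
  CLI commands. Use when testing or
  interacting with web pages.
---

# $name

Run playwright-cli --help to list the available commands.
EOF
}
```

For example, `scaffold_skill playwright-cli` creates `playwright-cli/SKILL.md` plus the three empty support folders in the current directory.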

3 Layers of Loading

1. Metadata (~100 words)
Always in context: name + description.

2. SKILL.md body (<5K words)
Loads when the skill is triggered.

3. References & scripts (unlimited)
Loaded on demand by the agent.
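To make layer 1 concrete: at startup a host only needs the frontmatter. The sketch below prints just the lines between the two `---` markers and stops reading there, so the body and any supporting files are never touched. It is an illustration of progressive disclosure under that assumption, not how Claude Code or any particular agent actually implements loading; `skill_metadata` is a hypothetical name.

```shell
# Hypothetical helper: print only the YAML frontmatter (layer 1) of a SKILL.md.
# awk exits at the closing '---', so the instruction body is never read.
skill_metadata() {
  awk 'NR == 1 && $0 == "---" { inside = 1; next }
       inside && $0 == "---"  { exit }
       inside                 { print }' "$1"
}
```

Running `skill_metadata my-skill/SKILL.md` on the example frontmatter prints the `name` and `description` lines and nothing else.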

Enter Playwright MCP

The first mainstream answer: a Model Context Protocol server for browser automation.

26 tools: browser_navigate, browser_click, browser_snapshot, browser_type…
Streams full accessibility trees and console output into context
Uses structured snapshots—no screenshots needed for interaction
Works great for sandboxed environments (Claude Desktop, chat interfaces)
But… there's a catch

The MCP Token Tax

~114K tokens per task (MCP)

26 tool schemas in context

1000s of tokens per navigation

Every browser_navigate returns the full accessibility tree
All 26 tool schemas loaded at startup—always consuming context
Long sessions → context degradation → confused reasoning

"MCP sends the whole library. What if we just sent the page number?"

The Solution: Playwright CLI

A fundamentally different architecture.

CLI-first, disk-based architecture
Instead of streaming data into the model—save it to disk
Return only file paths and minimal confirmations
50+ commands available without context overhead
Installed as a Skill—no schema bloat

~27K tokens per task (CLI)

~4x fewer tokens than MCP

The agent gets a toolbelt, not a library.
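The gap between the two per-task figures on these slides (~114K for MCP, ~27K for CLI) can be checked with a line of arithmetic. A small shell sketch, using awk for the floating-point division; `token_savings` is just an illustrative name.

```shell
# Print the ratio and the relative reduction between two token counts.
token_savings() {
  awk -v mcp="$1" -v cli="$2" 'BEGIN {
    printf "ratio: %.1fx\n", mcp / cli
    printf "reduction: %.0f%%\n", (mcp - cli) / mcp * 100
  }'
}

# Approximate per-task figures quoted on these slides.
token_savings 114000 27000
# ratio: 4.2x
# reduction: 76%
```

That is a ratio of about 4.2, or roughly three quarters fewer tokens per task.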

How Playwright CLI Works


# 1. Open a browser
playwright-cli open https://demo.playwright.dev/todomvc --headed

# 2. Take a snapshot (saved to disk!)
playwright-cli snapshot
# → .playwright-cli/page-2026-02-12.yml

# 3. Interact using compact element references
playwright-cli fill e8 "Write tests"
playwright-cli press Enter
playwright-cli check e21

# 4. Verify with a screenshot
playwright-cli screenshot
# → .playwright-cli/screenshot-2026-02-12.png

Element references like e8 and e21 are compact identifiers taken from the snapshot, so no verbose CSS selectors are needed!

The Snapshot System

The secret sauce behind token efficiency.

How it works
playwright-cli snapshot captures page state
Saves a YAML file to .playwright-cli/
Assigns compact refs: e8, e21, e255
Agent reads the file only when needed

Key properties
Deterministic: same page = same refs
Compact: refs, not full selectors
Expiring: page changes = new snapshot
Disk-based: not in context window

State lives on disk. The context stays clean.
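Because the snapshot sits on disk, an agent can pull out one line instead of reading the whole tree into context. This sketch assumes snapshot lines tagged in the form `- textbox "What needs to be done?" [ref=e8]` (the `[ref=…]` style used by Playwright MCP snapshots); that format is an assumption here, so check a real file under `.playwright-cli/` first. `find_ref` is a hypothetical helper, not a playwright-cli command.

```shell
# Hypothetical helper: print the first ref on a snapshot line containing LABEL.
# Assumes lines such as:  - textbox "What needs to be done?" [ref=e8]
find_ref() {
  local snapshot=$1 label=$2
  grep -F -- "$label" "$snapshot" | grep -o 'ref=e[0-9]*' | head -n 1 | cut -d= -f2
}
```

For example, `find_ref .playwright-cli/page-2026-02-12.yml "What needs to be done?"` would print `e8` if the file contains the line above, and only that one matching line ever reaches the agent.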

CLI + Skills = Perfect Match


# Install skills for your coding agent
playwright-cli install --skills

# This creates:
# .claude/skills/playwright-cli/SKILL.md

What happens

Agent reads SKILL.md to learn commands
Uses Bash tool to execute CLI commands
Reads snapshot files from disk as needed
No MCP server running, no tool proliferation

What doesn't happen

No 26 tool schemas loaded at startup
No accessibility trees in context
No context window bloat over time
No MCP configuration required

Head-to-Head: CLI vs MCP

Aspect	Playwright CLI	Playwright MCP
Token Usage	~27K per task	~114K per task
Data Delivery	Disk (file paths)	Into LLM context
Commands Available	50+	26 tools
Long Sessions	Sustainable	Context degrades
Best For	Coding agents (shell)	Sandboxed envs
Determinism	High	Variable
Setup	`npm install -g`	MCP config in IDE

Use Case: Automated E2E Testing

An agent explores, tests, and verifies—all in one session.


playwright-cli open https://demo.playwright.dev/todomvc --headed
playwright-cli snapshot
playwright-cli fill e8 "Write tests"
playwright-cli press Enter
playwright-cli check e21
playwright-cli screenshot

What the agent can do in one session:
Explore the app → Identify bugs → Write test code → Run the tests → Verify fixes, all without context overflow.
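Once the agent has found working refs, it can replay the same steps after a fix. A minimal sketch that chains the commands above with `&&`, so the run stops at the first command that fails; `todo_smoke` is a hypothetical name, and e8 and e21 are only valid for the snapshot they came from.

```shell
# Hypothetical wrapper: replay the TodoMVC steps above.
# && stops at the first failing command; refs may change between snapshots.
todo_smoke() {
  playwright-cli open https://demo.playwright.dev/todomvc --headed &&
    playwright-cli snapshot &&
    playwright-cli fill e8 "Write tests" &&
    playwright-cli press Enter &&
    playwright-cli check e21 &&
    playwright-cli screenshot
}
```

A non-zero exit status from `todo_smoke` tells the agent which step to investigate, without re-reading the whole transcript.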

Use Case: UI Review in the Agent Loop

Low token cost makes iterative visual review practical.

%%{init: {'theme': 'dark'}}%%
flowchart TD
  A["1. Agent edits React component"] --> B["2. Starts dev server"]
  B --> C["3. Opens browser via CLI"]
  C --> D["4. Takes screenshot"]
  D --> E{"5. Looks correct?"}
  E -- No --> A
  E -- Yes --> F["6. Commits changes"]

Why this works with CLI

Each loop iteration costs minimal tokens
Agent can iterate 5-10 times without context pressure
Screenshots saved to disk for comparison
Snapshots let agent verify structure + visuals

With MCP, each iteration floods context. With CLI, iterations are nearly free.
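Because screenshots land on disk, the agent can keep one file per loop iteration and cheaply check whether an edit changed the rendering at all. A minimal sketch: `archive_screenshot` and the `review/` folder are hypothetical, and it assumes screenshots are written as `.png` files under `.playwright-cli/`, as in the earlier example output. Identical bytes mean nothing changed; different bytes do not prove a visible change, so the agent still has to look at the image.

```shell
# Hypothetical helper: copy the newest screenshot into review/iter-N.png
# and report whether it is byte-identical to the previous iteration.
archive_screenshot() {
  local n=$1 dir=${2:-.playwright-cli}
  local latest
  latest=$(ls -t "$dir"/*.png 2>/dev/null | head -n 1)
  [ -n "$latest" ] || { echo "no screenshot found in $dir" >&2; return 1; }
  mkdir -p review
  cp "$latest" "review/iter-$n.png"
  if [ "$n" -le 1 ]; then
    echo "iteration $n: saved"
  elif cmp -s "review/iter-$((n - 1)).png" "review/iter-$n.png"; then
    echo "iteration $n: unchanged since previous"
  else
    echo "iteration $n: changed"
  fi
}
```

An "unchanged" result after an edit is a useful hint that the dev server did not reload or the edit missed its target.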

Use Case: Multi-Session Testing

Test cross-role interactions with named sessions.


# Launch parallel sessions for different roles
playwright-cli -s=admin open https://app.com/admin
playwright-cli -s=user open https://app.com/dashboard

# Admin creates a resource
playwright-cli -s=admin snapshot
playwright-cli -s=admin click e15

# User verifies it appears
playwright-cli -s=user snapshot
playwright-cli -s=user click e8

Real-world scenario: test that when an admin publishes a post, a regular user can see it, using two independent browser sessions controlled by the same agent.
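With several roles in play, a small wrapper keeps the session flag from being dropped on a single command. `as_role` is a hypothetical helper; all it does is prepend the `-s=` flag shown above.

```shell
# Hypothetical wrapper: run a playwright-cli command in a named session.
# Usage: as_role <session> <command> [args...]
as_role() {
  local session=$1
  shift
  playwright-cli -s="$session" "$@"
}
```

With it, the admin/user flow reads as `as_role admin click e15` followed by `as_role user snapshot`.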

Advanced Features

Network & State

Network mocking

playwright-cli route "https://api.com/*" \
  --status=200 --body='{"mock": true}'

State persistence

playwright-cli state-save logged-in.json
playwright-cli state-load logged-in.json
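A common pattern is to log in once, save the state, and reuse it in later runs. This sketch uses only the two state commands above; `ensure_session_state` and the default file name are placeholders, and the login steps themselves are left to the agent.

```shell
# Hypothetical helper: load saved browser state if the file exists.
# Returns 1 when there is no state yet, so the caller knows to log in
# and then run: playwright-cli state-save <file>
ensure_session_state() {
  local state_file=${1:-logged-in.json}
  if [ -f "$state_file" ]; then
    playwright-cli state-load "$state_file"
  else
    echo "no saved state at $state_file; log in, then state-save it" >&2
    return 1
  fi
}
```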

Recording & Debug

Video recording

playwright-cli video-start
playwright-cli video-stop demo.webm

Tracing

playwright-cli tracing-start
playwright-cli tracing-stop

Multi-browser
Chromium, Firefox, WebKit, Edge

Getting Started


# Install
npm install -g @playwright/cli@latest

# Install browser binaries
playwright-cli install-browser

# Install skills for your coding agent
playwright-cli install --skills

# Start automating!
playwright-cli open https://your-app.com --headed

Requirements: Node.js 18+ • A coding agent (Claude Code, GitHub Copilot, Cursor, Windsurf)

Or go skills-less: your agent can read commands directly from playwright-cli --help
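The Node.js 18+ requirement can be checked before running `npm install`. A minimal sketch that parses the string printed by `node --version` (for example `v20.11.1`); `node_ok` is a hypothetical name.

```shell
# Succeed if a Node.js version string such as "v20.11.1" is 18 or newer.
node_ok() {
  local version=${1#v}        # strip the leading "v"
  local major=${version%%.*}  # keep everything before the first dot
  [ "$major" -ge 18 ] 2>/dev/null
}

if node_ok "$(node --version 2>/dev/null)"; then
  echo "Node.js is new enough for @playwright/cli"
else
  echo "Node.js 18+ is required" >&2
fi
```

If `node` is missing, the version string is empty and the numeric test fails, so the script reports the requirement instead of erroring out.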

When to Use What

They're complementary, not competing.

Use CLI when
Coding agent with shell access
Token efficiency matters
Long automation sessions
CI/CD integration
Alongside existing test suites

Use MCP when
Sandboxed environment (no filesystem)
Short exploratory sessions
Need rich page awareness in context
Self-healing test workflows
Chat-based AI interfaces

Key Takeaways

AI agents need browser access, but context windows are precious
Agent Skills provide a lean, composable way to extend capabilities
Playwright CLI saves state to disk instead of flooding the context
4x fewer tokens than MCP for the same browser tasks
50+ commands available through simple shell execution
The agent loop era demands token-efficient tools—CLI delivers

"Don't send the whole library into the model. Send the page number—and let the agent read the page when it's ready."

Thank You

Resources

Playwright CLI — github.com/microsoft/playwright-cli

Playwright MCP — github.com/microsoft/playwright-mcp

Agent Skills Standard — agentskills.io

Claude Code Skills Docs — code.claude.com/docs/en/skills