Open Source · Apache 2.0

Stop Writing Brittle
RPA Scripts.
Just Describe What You Want.

ScreenPilot combines AI vision with an LLM to automate any desktop application. No CSS selectors. No XPath. No scripting. It sees the screen like a human and adapts to UI changes automatically.

screenpilot — ~/projects
# Automate anything with natural language
$ screenpilot run "Open Chrome, go to github.com, and star the screenpilot repo"
 
ScreenPilot v0.1.0 | Claude Sonnet | Ready
 
Step 1 find_and_click "Chrome icon on taskbar" OK
Step 2 find_and_click "address bar" OK
Step 3 type "github.com/pphouse/screenpilot" OK
Step 4 key "enter" OK
Step 5 wait 2s for page load OK
Step 6 find_and_click "Star button" OK
Step 7 done Task completed!
 
✓ Completed in 7 steps (12.3s)
$
Why Switch?

Traditional RPA Is Broken

Your automations break every time the UI updates. ScreenPilot sees the screen like a human — and adapts.

✘ Traditional RPA

  • Relies on CSS selectors & XPath that break on every UI update
  • Days of scripting to build a single workflow
  • Requires per-application connectors
  • $50K+ annual license costs per enterprise
  • Constant maintenance as UIs change
  • Team of RPA developers needed

✔ ScreenPilot

  • Vision-based — sees the screen, adapts to changes automatically
  • Minutes to automate with plain English descriptions
  • Works with any application — no connectors needed
  • Open source — free forever (Apache 2.0)
  • Self-healing with error recovery & LLM re-planning
  • Anyone can create automations — no code required
Capabilities

Enterprise-Grade Features

Everything you need to replace brittle RPA with intelligent, adaptive automation.

👁

Vision + LLM Intelligence

Uses Claude, GPT-4, or any model supported by LiteLLM to understand screenshots and determine actions. No selectors needed.

🎯

Set-of-Mark Prompting

Overlays numbered markers on screenshots for precise element grounding — based on cutting-edge research (SoM, OmniParser).
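
The bookkeeping behind Set-of-Mark prompting can be sketched as follows. This is a minimal illustration, not ScreenPilot's actual implementation: `Box`, `assign_marks`, and `resolve_mark` are hypothetical names, and the bounding boxes are assumed to come from some upstream element detector. Each detected element gets a number, the model answers with a number, and the number is mapped back to a click point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in screenshot pixel coordinates."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        return (self.left + self.width // 2, self.top + self.height // 2)


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Number boxes in reading order (top to bottom, then left to right)."""
    ordered = sorted(boxes, key=lambda b: (b.top, b.left))
    return {mark: box for mark, box in enumerate(ordered, start=1)}


def resolve_mark(marks: dict[int, Box], chosen: int) -> tuple[int, int]:
    """Translate the marker number picked by the model into a click point."""
    if chosen not in marks:
        raise KeyError(f"model chose unknown marker {chosen}")
    return marks[chosen].center()


# Three detected elements (made-up coordinates for illustration).
boxes = [Box(200, 40, 100, 20), Box(10, 40, 60, 20), Box(10, 300, 80, 30)]
marks = assign_marks(boxes)
click_point = resolve_mark(marks, 2)  # the model answered "marker 2"
```

Answering with a marker number rather than raw pixel coordinates means the model only has to pick from a small discrete set, which is the main appeal of the technique.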

🧠

Hierarchical Planning

Two-level task decomposition with Task Memory Tree for coherent execution of complex, multi-step workflows.

🛡

Self-Healing Recovery

Automatic error recovery with escalating strategies: retry, relocate element, scroll, dismiss dialogs, LLM re-planning.
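
An escalation ladder of this kind could look roughly like the sketch below. It assumes each strategy is a callable that reports whether it did anything useful; `recover` and the strategy names are hypothetical and only mirror the list above, not ScreenPilot's internals.

```python
from typing import Callable, Optional

# Each strategy returns True if it believes it changed something useful.
Strategy = Callable[[], bool]


def recover(action: Callable[[], bool],
            strategies: list[tuple[str, Strategy]]) -> Optional[str]:
    """Try the action, escalating through strategies until it succeeds.

    Returns "none" if the action worked first time, the name of the
    strategy that led to success, or None if every strategy was exhausted.
    """
    if action():
        return "none"
    for name, strategy in strategies:
        # Apply the strategy, then re-attempt the original action.
        if strategy() and action():
            return name
    return None


# Simulated scenario: a modal dialog blocks the click until dismissed.
state = {"dialog_open": True}


def click_save() -> bool:
    return not state["dialog_open"]


def retry() -> bool:
    return True  # nothing changes, the action is simply attempted again


def dismiss_dialog() -> bool:
    state["dialog_open"] = False
    return True


ladder = [("retry", retry), ("dismiss_dialogs", dismiss_dialog)]
outcome = recover(click_save, ladder)
```

Ordering the ladder from cheapest to most expensive keeps LLM re-planning as a last resort.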

📅

Task Scheduling

Cron-like scheduling for unattended automation. Run tasks daily, weekly, on intervals, or with custom cron expressions.
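
To make the daily and interval cases concrete, here is a small sketch of how a next run time might be computed. The function names are hypothetical and time zones are ignored for brevity; the "HH:MM" format follows the `--time 09:00` flag shown in the CLI example.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, time_of_day: str) -> datetime:
    """Return the next datetime at which a daily "HH:MM" task should fire."""
    hour, minute = (int(part) for part in time_of_day.split(":"))
    candidate = datetime.combine(now.date(), time(hour, minute))
    # If today's slot has already passed (or is right now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_interval_run(last_run: datetime, every_seconds: int) -> datetime:
    """Return the next fire time for a fixed-interval task."""
    return last_run + timedelta(seconds=every_seconds)


morning = next_daily_run(datetime(2025, 1, 1, 8, 0), "09:00")
rolled_over = next_daily_run(datetime(2025, 1, 1, 9, 30), "09:00")
```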

📄

Workflow Templates

6 built-in templates for common business tasks. Create custom templates for your organization's specific workflows.

🔔

Multi-Channel Notifications

Get alerts via Slack, Microsoft Teams, email, or custom webhooks when automations succeed or fail.

📊

Execution Reports

JSON and HTML reports with step-by-step details, success rates, and aggregate analytics for business stakeholders.
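
As an illustration of the aggregate side of such a report, the sketch below summarizes a list of run records into a JSON document. The record shape (`success`, `steps`) and the `summarize` helper are assumptions made for this example, not ScreenPilot's report schema.

```python
import json


def summarize(runs: list[dict]) -> dict:
    """Aggregate per-run records into totals and a success rate."""
    total = len(runs)
    succeeded = sum(1 for run in runs if run["success"])
    return {
        "total_runs": total,
        "succeeded": succeeded,
        "success_rate": succeeded / total if total else 0.0,
        "total_steps": sum(len(run["steps"]) for run in runs),
    }


runs = [
    {"goal": "Generate report", "success": True, "steps": ["click", "type", "key"]},
    {"goal": "Fill form", "success": False, "steps": ["click"]},
]
summary = summarize(runs)
report_json = json.dumps(summary, indent=2)
```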

🔌

Plugin System

Extend ScreenPilot with custom plugins. Hook into task lifecycle events, modify actions, add integrations.
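
One common way to build such lifecycle hooks is a small registry that passes each action through the registered callbacks in order. The sketch below assumes that design; `HookRegistry`, the `before_action` event name, and the action dictionary shape are all hypothetical and not taken from ScreenPilot's plugin API.

```python
from collections import defaultdict
from typing import Any, Callable

Hook = Callable[[dict[str, Any]], dict[str, Any]]


class HookRegistry:
    """Hypothetical registry mapping lifecycle event names to hooks."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Hook], Hook]:
        """Decorator that registers a hook for the given event."""
        def register(hook: Hook) -> Hook:
            self._hooks[event].append(hook)
            return hook
        return register

    def emit(self, event: str, payload: dict[str, Any]) -> dict[str, Any]:
        """Pass the payload through every hook, in registration order."""
        for hook in self._hooks[event]:
            payload = hook(payload)
        return payload


registry = HookRegistry()


@registry.on("before_action")
def slow_down_typing(action: dict[str, Any]) -> dict[str, Any]:
    # Modify typing actions only; leave everything else untouched.
    if action.get("type") == "type":
        return {**action, "interval": 0.05}
    return action


modified = registry.emit("before_action", {"type": "type", "text": "hello"})
untouched = registry.emit("before_action", {"type": "click", "target": "Save"})
```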

How It Works

Three Lines of Code. Any Application.

ScreenPilot's core loop is deceptively simple. The AI does the hard work.

1

Capture Screenshot

Cross-platform screen capture using mss. Supports multi-monitor, resolution scaling, and region capture.

2

LLM Analyzes Screen + Plans

The screenshot is sent to the LLM with SoM markers overlaid. The hierarchical planner decomposes goals into concrete actions with precise coordinates.

3

Execute Action

Click, type, scroll, drag, keyboard shortcuts — all executed via pyautogui with safety checks and action logging.

4

Verify & Adapt

A post-action screenshot is captured and compared. If something went wrong, the self-healing system kicks in with recovery strategies.
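
The page does not say how screenshots are compared, so the following is only one plausible approach: measure what fraction of raw frame bytes changed and treat a very small change as "the action had no visible effect". The function names and the 1% threshold are assumptions chosen for illustration.

```python
def changed_fraction(before: bytes, after: bytes) -> float:
    """Fraction of byte positions that differ between two raw frames."""
    if len(before) != len(after):
        raise ValueError("frames must have the same size")
    if not before:
        return 0.0
    differing = sum(a != b for a, b in zip(before, after))
    return differing / len(before)


def action_had_effect(before: bytes, after: bytes,
                      threshold: float = 0.01) -> bool:
    """Assume the action worked if at least `threshold` of the frame changed."""
    return changed_fraction(before, after) >= threshold


# Synthetic 100-byte "frames": the last 10 bytes change after the action.
before = bytes(100)
after = bytes(90) + bytes([255]) * 10
fraction = changed_fraction(before, after)
```

A byte-level diff is crude (a blinking cursor counts as change), so a real verifier would likely combine it with a second look by the LLM.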

5

Repeat Until Done

The loop continues until the LLM determines the goal is achieved or reports it cannot be completed.
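
Taken together, the five steps above amount to a loop like the one sketched here. The capture, planning, and execution functions are injected as plain callables so the example stays self-contained; `run_loop`, `Action`, and `RunResult` are hypothetical names, and the `max_steps` default of 20 simply mirrors the value used in the REST example on this page.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str            # e.g. "find_and_click", "type", "key", "done", "fail"
    argument: str = ""


@dataclass
class RunResult:
    success: bool
    steps: list[Action] = field(default_factory=list)


def run_loop(goal: str,
             capture: Callable[[], bytes],
             plan: Callable[[str, bytes, list[Action]], Action],
             execute: Callable[[Action], None],
             max_steps: int = 20) -> RunResult:
    """Capture, plan, execute, and repeat until the planner says stop."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                    # step 1
        action = plan(goal, screenshot, history)  # step 2
        if action.kind == "done":
            return RunResult(True, history)
        if action.kind == "fail":
            return RunResult(False, history)
        execute(action)                           # step 3
        history.append(action)                    # steps 4 and 5: next pass re-captures
    return RunResult(False, history)              # step budget exhausted


# Scripted stand-ins for the LLM planner and the input driver.
script = iter([Action("find_and_click", "address bar"),
               Action("type", "github.com"),
               Action("done")])
executed: list[Action] = []
result = run_loop("open github",
                  capture=lambda: b"",
                  plan=lambda goal, shot, hist: next(script),
                  execute=executed.append)
```

Bounding the loop with `max_steps` prevents a confused planner from acting indefinitely.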

Applications

Built for Real Business Needs

ScreenPilot targets the $30B+ RPA market with a fundamentally better approach.

💼 Business Process Automation

Automate data entry, form filling, report generation, and file management across any desktop application.

80% reduction in manual data entry time

🔍 QA & Visual Testing

Create visual test scripts that adapt to UI changes. Capture screenshots across application states for documentation.

10x more resilient than selector-based tests

🔗 Legacy System Integration

Bridge modern systems with legacy applications that lack APIs. ScreenPilot interacts with any GUI, including mainframe terminals.

0 API connectors needed

🚀 DevOps & CI/CD

Integrate desktop automation into CI/CD pipelines via the Python SDK and REST API. Run automations as pipeline steps.

156 tests passing in the framework itself
Developer Experience

Multiple Ways to Automate

CLI, Python API, REST API, or SDK — pick what works for your workflow.

CLI

# Run a task with natural language
$ screenpilot run "Open Excel, create a chart from column A data"

# Record a workflow and replay it later
$ screenpilot record "daily-report"
$ screenpilot replay ~/.screenpilot/recordings/daily-report/workflow.json

# Schedule for daily execution
$ screenpilot schedule add daily_report "Daily Report" \
    "Generate sales report in Excel" --type daily --time 09:00

# Start the web dashboard
$ screenpilot serve --port 8420

Python

from screenpilot import ScreenPilotAgent
from screenpilot.config import ScreenPilotConfig, LLMConfig

# Configure with your preferred LLM
config = ScreenPilotConfig(
    llm=LLMConfig(provider="anthropic", model="claude-sonnet-4-5-20250929")
)

# Run an automation task
agent = ScreenPilotAgent(config)
result = agent.run("Open Chrome and search for 'AI automation'")

print(f"Success: {result.success}")
print(f"Steps: {result.num_steps} | Time: {result.total_time:.1f}s")

SDK

from screenpilot.sdk import ScreenPilotClient

# Connect to running ScreenPilot server
client = ScreenPilotClient("http://localhost:8420")

# Run task and wait for completion
task = client.run_task("Fill out the customer form with test data")
task.wait(timeout=120)

# Direct actions
client.find_and_click("Submit button")
client.type_text("john@example.com")

# Use pre-built templates
task = client.run_template("web_form_fill", {
    "url": "https://example.com/form",
    "form_data": "name=John, email=john@example.com",
})

REST API

# Start the server
$ screenpilot serve

# Run a task via REST API
$ curl -X POST http://localhost:8420/task \
    -H "Content-Type: application/json" \
    -d '{"goal": "Open calculator and compute 42 * 17", "max_steps": 20}'

# Find a UI element
$ curl -X POST http://localhost:8420/find \
    -d '{"target": "the save button"}'

# Schedule a recurring task
$ curl -X POST http://localhost:8420/schedules \
    -d '{"id":"daily","name":"Report","goal":"Generate report","schedule_type":"daily","time_of_day":"09:00"}'

13 Core Modules
156 Tests Passing
7.8K Lines of Code
3 LLM Providers

Ready to Automate
Without the Fragility?

Open source, free forever. Start automating in minutes.
