ScreenPilot combines AI vision with an LLM to automate any desktop application. No CSS selectors. No XPath. No scripting. It sees the screen like a human and adapts to UI changes automatically.
Your automations break every time the UI updates. ScreenPilot sees the screen like a human — and adapts.
Everything you need to replace brittle RPA with intelligent, adaptive automation.
Uses Claude, GPT-4, or any LiteLLM model to understand screenshots and determine actions. No selectors needed.
Overlays numbered markers on screenshots for precise element grounding, based on Set-of-Mark (SoM) prompting and OmniParser research.
Two-level task decomposition with a Task Memory Tree for coherent execution of complex, multi-step workflows.
Automatic error recovery with escalating strategies: retry, relocate element, scroll, dismiss dialogs, LLM re-planning.
Cron-like scheduling for unattended automation. Run tasks daily, weekly, on intervals, or with custom cron expressions.
6 built-in templates for common business tasks. Create custom templates for your organization's specific workflows.
Get alerts via Slack, Microsoft Teams, email, or custom webhooks when automations succeed or fail.
JSON and HTML reports with step-by-step details, success rates, and aggregate analytics for business stakeholders.
Extend ScreenPilot with custom plugins. Hook into task lifecycle events, modify actions, add integrations.
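The escalating recovery described above (retry, relocate element, scroll, dismiss dialogs, LLM re-planning) can be pictured as a ladder of strategies tried in order, cheapest first. The following is a minimal sketch of that idea, not ScreenPilot's actual implementation: the `recover` function, the `Strategy` type, and the lambda stand-ins are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: a strategy is a (name, callable) pair,
# where the callable returns True if it fixed the problem.
Strategy = Tuple[str, Callable[[], bool]]


def recover(strategies: Iterable[Strategy]) -> Optional[str]:
    """Try each strategy in order; return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except Exception:
            # A strategy that crashes counts as a failed one: escalate.
            continue
    return None  # every strategy failed


# Stand-in strategies (assumptions): real ones would re-run the action,
# re-detect the element, scroll the view, close dialogs, or ask the LLM
# for a new plan. Here retry and relocate fail, scroll succeeds.
ladder = [
    ("retry", lambda: False),
    ("relocate_element", lambda: False),
    ("scroll", lambda: True),
    ("dismiss_dialogs", lambda: True),
    ("llm_replan", lambda: True),
]

print(recover(ladder))  # scroll
```

Ordering the ladder from cheapest to most expensive means an LLM re-plan is only paid for when the simpler fixes have already failed.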
ScreenPilot's core loop is deceptively simple. The AI does the hard work.
Cross-platform screen capture using mss. Supports multi-monitor, resolution scaling, and region capture.
The screenshot is sent to the LLM with SoM markers. The hierarchical planner decomposes goals into concrete actions with precise coordinates.
Click, type, scroll, drag, keyboard shortcuts: all executed via pyautogui with safety checks and action logging.
A post-action screenshot is captured and compared. If something went wrong, the self-healing system kicks in with recovery strategies.
The loop continues until the LLM determines the goal is achieved or reports it cannot be completed.
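The capture, plan, execute, verify cycle described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not ScreenPilot's real API: `run_loop`, `Action`, and every callable passed in are hypothetical. In a real setup `capture` might wrap mss and `execute` might wrap pyautogui; here they are stubs so the sketch runs without a display.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    # "click", "type", ... or the terminal kinds "done" / "fail" (assumed names)
    kind: str
    payload: dict = field(default_factory=dict)


def run_loop(goal: str,
             capture: Callable[[], bytes],
             plan: Callable[[str, bytes], Action],
             execute: Callable[[Action], None],
             verify: Callable[[bytes, bytes, Action], bool],
             recover: Callable[[Action], bool],
             max_steps: int = 20) -> bool:
    """Repeat capture -> plan -> execute -> verify until the planner stops."""
    for _ in range(max_steps):
        before = capture()
        action = plan(goal, before)
        if action.kind == "done":   # planner judges the goal achieved
            return True
        if action.kind == "fail":   # planner reports it cannot complete
            return False
        execute(action)
        after = capture()
        # If verification fails, give the recovery hook one chance.
        if not verify(before, after, action) and not recover(action):
            return False
    return False  # step budget exhausted


# Stubbed demo: the "screen" is just a counter that each click advances.
screen = {"clicks": 0}


def fake_capture() -> bytes:
    return str(screen["clicks"]).encode()


def fake_plan(goal: str, shot: bytes) -> Action:
    return Action("done") if shot == b"2" else Action("click", {"x": 10, "y": 20})


def fake_execute(action: Action) -> None:
    screen["clicks"] += 1


result = run_loop(
    "click twice", fake_capture, fake_plan, fake_execute,
    verify=lambda before, after, action: before != after,
    recover=lambda action: False,
)
print(result)  # True
```

Passing each stage in as a callable keeps the loop itself testable without a screen, and the `max_steps` cap is an added safeguard in this sketch so a confused planner cannot run forever.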
ScreenPilot targets the $30B+ RPA market with a fundamentally better approach.
Automate data entry, form filling, report generation, and file management across any desktop application.
Create visual test scripts that adapt to UI changes. Capture screenshots across application states for documentation.
Bridge modern systems with legacy applications that lack APIs. ScreenPilot interacts with any GUI, including mainframe terminals.
Integrate desktop automation into CI/CD pipelines via the Python SDK and REST API. Run automations as pipeline steps.
CLI, Python API, REST API, or SDK — pick what works for your workflow.
Open source, free forever. Start automating in minutes.