An MCP (Model Context Protocol) server for visual web access through Safari.
- Visual Web Access: Capture viewport screenshots for visual page inspection
- Full Authentication State: Controls the user's actual Safari session with cookies and login state preserved
- Viewport Scrolling: Scroll by pixel amount or jump to viewport-sized pages for precise positioning
- Interaction Tools: Click elements by visible text and type into input fields with framework reactivity support
- Browser History: Navigate back and forward through browser history
- Console Error Capture: Two-phase injection catches console errors and warnings during and after page load
- macOS → System Settings → Desktop & Dock → Windows → "Prefer tabs when opening documents" option must be set to Always
- macOS → System Settings → Privacy & Security → Screen & System Audio Recording → Terminal app must be enabled
- Safari → Settings → Developer → Automation → "Allow JavaScript from Apple Events" option must be enabled
Add the following to the `mcp.json` servers configuration:

    {
      "mcpServers": {
        "safari": {
          "command": "npx",
          "args": ["-y", "@axivo/mcp-safari"],
          "env": {
            "SAFARI_WINDOW_HEIGHT": "1600"
          }
        }
      }
    }

All variables are optional; defaults apply if not set:
- `SAFARI_PAGE_TIMEOUT`: Page load and selector wait timeout, in milliseconds (default: `10000`)
- `SAFARI_WINDOW_BOUNDS`: Browser window margin offset from top-left corner, in pixels (default: `20`)
- `SAFARI_WINDOW_HEIGHT`: Browser window height, in pixels (default: `1024`)
- `SAFARI_WINDOW_WIDTH`: Browser window width, in pixels (default: `1280`)
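
The way unset variables fall back to their defaults can be sketched as follows. This is an illustration only, not the server's actual code: `DEFAULTS` and `resolve_settings` are hypothetical names, and parsing every value as an integer is an assumption based on the documented units.

```python
# Illustrative sketch only; DEFAULTS and resolve_settings are hypothetical
# names, not part of the server. Default values are taken from the list above.
DEFAULTS = {
    "SAFARI_PAGE_TIMEOUT": 10000,  # milliseconds
    "SAFARI_WINDOW_BOUNDS": 20,    # pixels from the top-left corner
    "SAFARI_WINDOW_HEIGHT": 1024,  # pixels
    "SAFARI_WINDOW_WIDTH": 1280,   # pixels
}


def resolve_settings(env):
    """Return the effective settings for an "env" mapping (assumed integer values)."""
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}


# The "env" block from the mcp.json example overrides only the window height.
settings = resolve_settings({"SAFARI_WINDOW_HEIGHT": "1600"})
print(settings)
```

Only the variables present in `env` change; everything else keeps its documented default.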
Here are practical examples of how to use the Safari MCP server with natural language prompts:
- "Open Safari and review the tools usage, then go to example.com"
- "Open Safari and review the tools usage, then search for example query"
- "Take a screenshot of the current page"
- "Read the page content to understand what's on the page"
- "Click the 'Sign In' button"
- "Type my email into the login form and submit"
- "Refresh the page to see the latest changes"
- "Go back to the previous page"
- "Navigate forward two steps in browser history"
- "Scroll down 500 pixels"
- "Scroll to page 3 of this article"
- "Search for 'Claude AI' and click the first result"
- "List all open browser tabs"
- "Open a new browser tab and go to example.com"
- "Switch to the first browser tab"
- "Close the second browser tab"
Note: The "review the tools usage" instruction helps Claude pause and process the `_meta.usage` guidelines before interacting with the browser.
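
To make the link between prompts and tools concrete, here is a rough sketch of how a few of the prompts above might translate into tool calls. The tool and parameter names come from the tool reference in this README; `EXAMPLE_CALLS` is a hypothetical name, and which call the model actually makes for a given prompt is an assumption.

```python
# Hypothetical mapping from example prompts to (tool, arguments) pairs.
# Tool and parameter names follow this README's tool reference; the choice
# of call for each prompt is an assumption, since the model decides at runtime.
EXAMPLE_CALLS = {
    "Take a screenshot of the current page": ("screenshot", {}),
    "Click the 'Sign In' button": ("click", {"text": "Sign In"}),
    "Go back to the previous page": ("navigate", {"direction": "back"}),
    "Navigate forward two steps in browser history": (
        "navigate",
        {"direction": "forward", "steps": 2},
    ),
    "Scroll down 500 pixels": ("scroll", {"direction": "down", "pixels": 500}),
    "Scroll to page 3 of this article": ("scroll", {"page": 3}),
    "List all open browser tabs": ("window", {"action": "list"}),
}

for prompt, (tool, arguments) in EXAMPLE_CALLS.items():
    print(f"{prompt!r} -> {tool} {arguments}")
```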
- `click` - Click an element on the browser window
  - Optional inputs:
    - `key` (string): Key to press (e.g., Escape, ArrowRight, ArrowLeft, Enter, Tab)
    - `selector` (string): CSS selector to click when no text is provided, or to scope the text search
    - `text` (string): Text to match - visible text, image alt text, or aria-label
    - `wait` (string): CSS selector to wait for after click
    - `x` (number): X coordinate (pixels from left of viewport) to click at
    - `y` (number): Y coordinate (pixels from top of viewport) to click at
  - Returns: Click result with page title, URL, viewport pages, tabs, and detected changes
- `close` - Close the browser window
  - Returns: Session closure confirmation
- `execute` - Execute JavaScript in the browser context
  - Required inputs:
    - `script` (string): JavaScript code to execute
  - Returns: Script execution result
- `navigate` - Navigate to a URL or through browser history (back/forward)
  - Optional inputs:
    - `direction` (string: `back` or `forward`): Navigate back or forward in browser history
    - `selector` (string): CSS selector to wait for after page load
    - `steps` (number, default: 1): Number of steps for back/forward navigation
    - `url` (string): URL to navigate to
  - Returns: Page title, URL, viewport pages, viewport dimensions, and tab count
- `open` - Open a browser window and read `_meta.usage` tools guidance
  - Returns: Tab count and complete tool definitions with usage guidance
- `read` - Get the page title, URL, full text content, and count for viewport-sized screenshots
  - Optional inputs:
    - `selector` (string): CSS selector to scope text extraction
  - Returns: Page title, URL, text content, viewport pages, and any captured console errors/warnings
- `refresh` - Refresh the current browser page
  - Optional inputs:
    - `hard` (boolean, default: false): Bypass browser cache with hard refresh
    - `selector` (string): CSS selector to wait for after reload
  - Returns: Page title, URL, viewport pages, viewport dimensions, and tab count
- `screenshot` - Capture a screenshot of the current browser viewport
  - Returns: Base64-encoded PNG screenshot with viewport dimensions
- `scroll` - Scroll to a specific viewport page, or by direction with an optional pixel amount
  - Optional inputs:
    - `direction` (string: `up` or `down`): Scroll direction (scrolls one viewport page when used alone)
    - `page` (number): Scroll to a specific viewport-sized page number
    - `pixels` (number): Number of pixels to scroll (used with direction for fine-grained control)
  - Returns: Viewport dimensions, scroll offset, and viewport pages
- `search` - Search the web using the browser's default search engine
  - Required inputs:
    - `text` (string): Search query
  - Returns: Page title, URL, viewport pages, viewport dimensions, and tab count
- `type` - Type text into a page input field
  - Required inputs:
    - `text` (string): Text to type
  - Optional inputs:
    - `append` (boolean, default: false): Append to existing value instead of replacing
    - `selector` (string): CSS selector for the target input
    - `submit` (boolean, default: false): Submit form by pressing Enter after typing
  - Returns: Description of the action taken
- `window` - Manage browser window tabs
  - Required inputs:
    - `action` (string: `close`, `list`, `open`, `switch`): Tab action to perform
  - Optional inputs:
    - `index` (number): Tab index for close and switch actions
    - `url` (string): URL to open in a new tab (open action only)
  - Returns: Array of tabs with active status, index, title, and URL
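
For readers who want to see what a call to one of these tools looks like on the wire, the sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message. `build_tool_call` and `REQUIRED_INPUTS` are hypothetical helpers written for this example, the `#q` selector is made up, and the required-input check simply mirrors the reference above rather than the server's own validation.

```python
import json

# Required inputs per tool, copied from the tool reference above. Tools not
# listed here have no required inputs. This table is a hypothetical helper,
# not something the server exposes.
REQUIRED_INPUTS = {
    "execute": {"script"},
    "search": {"text"},
    "type": {"text"},
    "window": {"action"},
}


def build_tool_call(request_id, tool, arguments=None):
    """Build an MCP "tools/call" JSON-RPC 2.0 request (illustrative helper)."""
    arguments = dict(arguments or {})
    missing = REQUIRED_INPUTS.get(tool, set()) - set(arguments)
    if missing:
        raise ValueError(f"{tool} is missing required inputs: {sorted(missing)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Type into a (made-up) search box and submit the form.
request = build_tool_call(1, "type", {"text": "hello", "selector": "#q", "submit": True})
print(json.dumps(request, indent=2))
```

In practice an MCP client builds and sends these messages for you; the sketch is only meant to show how the tool name and its inputs fit into a request.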