diff --git a/IMPLEMENTATION_SUMMARY.md b/IMPLEMENTATION_SUMMARY.md
new file mode 100644
index 0000000..86cea63
--- /dev/null
+++ b/IMPLEMENTATION_SUMMARY.md
@@ -0,0 +1,33 @@
+# Implementation Summary: Multi-Tab/Window Support
+
+This document summarizes the implementation of multi-tab/window support in the MCP Selenium server.
+
+## Changes
+
+1. **Added five new tools to `src/lib/server.js` for window management:**
+    * `get_window_handles`: Retrieves all active window handles.
+    * `get_current_window_handle`: Gets the handle of the currently focused window.
+    * `switch_to_window`: Switches focus to a specific window by its handle.
+    * `switch_to_latest_window`: Switches to the most recently opened window.
+    * `close_current_window`: Closes the currently active window without ending the session.
+
+2. **Created `docs/MULTI_TAB_USAGE.md`:**
+    * Provides detailed usage examples and best practices for the new window management tools.
+
+3. **Created `docs/CHANGELOG_TAB_SUPPORT.md`:**
+    * Documents the new features and explains how they remain backward compatible.
+
+4. **Updated `README.md`:**
+    * Added a new section documenting the multi-tab/window management tools.
+
+## Testing Guidance
+
+To ensure the new tools function correctly, follow these testing steps:
+
+1. **Start a browser session** using the `start_browser` tool.
+2. **Open a new tab/window** by clicking a link that opens in a new tab (e.g., `<a target="_blank">`).
+3. **Use `get_window_handles`** to verify that multiple handles are returned.
+4. **Use `switch_to_latest_window`** to switch to the new tab.
+5. **Perform an action** (e.g., `get_element_text`) to confirm the context has switched.
+6. **Use `close_current_window`** to close the new tab.
+7. **Verify that the original tab** is still active and responsive.
diff --git a/README.md b/README.md
index 407a071..3fb0a06 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ A Model Context Protocol (MCP) server implementation for Selenium WebDriver, ena
 - Handle keyboard input
 - Take screenshots
 - Upload files
+- Window Management
 - Support for headless mode
 
 ## Supported Browsers
@@ -433,6 +434,78 @@ None required
 }
 ```
 
+### get_window_handles
+Gets all window handles.
+
+**Parameters:**
+None required
+
+**Example:**
+```json
+{
+  "tool": "get_window_handles",
+  "parameters": {}
+}
+```
+
+### get_current_window_handle
+Gets the current window handle.
+
+**Parameters:**
+None required
+
+**Example:**
+```json
+{
+  "tool": "get_current_window_handle",
+  "parameters": {}
+}
+```
+
+### switch_to_window
+Switches to a window by its handle.
+
+**Parameters:**
+- `handle` (required): The handle of the window to switch to
+  - Type: string
+
+**Example:**
+```json
+{
+  "tool": "switch_to_window",
+  "parameters": {
+    "handle": "CDwindow-ABC"
+  }
+}
+```
+
+### switch_to_latest_window
+Switches to the most recently opened window.
+
+**Parameters:**
+None required
+
+**Example:**
+```json
+{
+  "tool": "switch_to_latest_window",
+  "parameters": {}
+}
+```
+
+### close_current_window
+Closes the currently active window.
+
+**Parameters:**
+None required
+
+**Example:**
+```json
+{
+  "tool": "close_current_window",
+  "parameters": {}
+}
+```
 
 ## License
 
diff --git a/docs/CHANGELOG_TAB_SUPPORT.md b/docs/CHANGELOG_TAB_SUPPORT.md
new file mode 100644
index 0000000..cd9e887
--- /dev/null
+++ b/docs/CHANGELOG_TAB_SUPPORT.md
@@ -0,0 +1,19 @@
+# Changelog: Multi-Tab/Window Support
+
+## New Features
+
+- **Added five new tools for multi-tab/window management:**
+  - `get_window_handles`: Retrieves all active window handles.
+  - `get_current_window_handle`: Gets the handle of the currently focused window.
+  - `switch_to_window`: Switches focus to a specific window by its handle.
+  - `switch_to_latest_window`: Switches to the most recently opened window.
+  - `close_current_window`: Closes the currently active window.
+
+## Backward Compatibility
+
+This update is fully backward compatible. Existing tools are unaffected.
+
+- The `close_session` tool still closes the entire browser session, including all tabs.
+- All element interaction tools (`click_element`, `send_keys`, etc.) operate on the currently focused tab, preserving existing behavior.
+
+Workflows that do not involve multiple tabs will continue to function as before without any changes.
diff --git a/docs/MULTI_TAB_USAGE.md b/docs/MULTI_TAB_USAGE.md
new file mode 100644
index 0000000..e69dcab
--- /dev/null
+++ b/docs/MULTI_TAB_USAGE.md
@@ -0,0 +1,78 @@
+# Multi-Tab/Window Usage Guide
+
+This guide provides examples and best practices for using the new multi-tab/window management tools.
+
+## Available Tools
+
+- `get_window_handles`: Retrieves all active window handles.
+- `get_current_window_handle`: Gets the handle of the currently focused window.
+- `switch_to_window`: Switches focus to a specific window by its handle.
+- `switch_to_latest_window`: Switches to the most recently opened window.
+- `close_current_window`: Closes the currently active window.
+
+## Example Workflow
+
+Here’s a common workflow for handling multiple tabs:
+
+1. **Start a browser and navigate to a page.**
+   ```json
+   {
+     "tool": "start_browser",
+     "browser": "chrome"
+   }
+   {
+     "tool": "navigate",
+     "url": "https://example.com"
+   }
+   ```
+
+2. **Click a link that opens a new tab.**
+   ```json
+   {
+     "tool": "click_element",
+     "by": "css",
+     "value": "a[target='_blank']"
+   }
+   ```
+
+3. **Get all window handles to see the new tab's handle.**
+   ```json
+   {
+     "tool": "get_window_handles"
+   }
+   ```
+   *Output might look like: `Window handles: CDwindow-ABC, CDwindow-DEF`*
+
+4. **Switch to the new tab.**
+   You can either switch by the specific handle or use `switch_to_latest_window`.
+   ```json
+   {
+     "tool": "switch_to_latest_window"
+   }
+   ```
+
+5. **Perform actions in the new tab.**
+   ```json
+   {
+     "tool": "get_element_text",
+     "by": "css",
+     "value": "h1"
+   }
+   ```
+
+6. **Close the new tab and switch back to the original.**
+   ```json
+   {
+     "tool": "close_current_window"
+   }
+   {
+     "tool": "switch_to_window",
+     "handle": "CDwindow-ABC"
+   }
+   ```
+
+## Best Practices
+
+- **Always get handles after opening a new tab:** Don't assume the handle format. Call `get_window_handles` to get the correct identifiers.
+- **Use `switch_to_latest_window` for simplicity:** It's the easiest way to switch to a newly opened tab without needing to manage handles manually.
+- **Be mindful of context:** After closing a tab, the driver's focus may be lost. Always switch back to a valid window handle to continue working.
diff --git a/src/lib/server.js b/src/lib/server.js
index 78d7514..1ae9b77 100755
--- a/src/lib/server.js
+++ b/src/lib/server.js
@@ -422,6 +422,112 @@ server.tool(
     }
 );
 
+// Window Management Tools
+server.tool(
+    "get_window_handles",
+    "gets all window handles",
+    {},
+    async () => {
+        try {
+            const driver = getDriver();
+            const handles = await driver.getAllWindowHandles();
+            return {
+                content: [{ type: 'text', text: `Window handles: ${handles.join(', ')}` }]
+            };
+        } catch (e) {
+            return {
+                content: [{ type: 'text', text: `Error getting window handles: ${e.message}` }]
+            };
+        }
+    }
+);
+
+server.tool(
+    "get_current_window_handle",
+    "gets the current window handle",
+    {},
+    async () => {
+        try {
+            const driver = getDriver();
+            const handle = await driver.getWindowHandle();
+            return {
+                content: [{ type: 'text', text: `Current window handle: ${handle}` }]
+            };
+        } catch (e) {
+            return {
+                content: [{ type: 'text', text: `Error getting current window handle: ${e.message}` }]
+            };
+        }
+    }
+);
+
+server.tool(
+    "switch_to_window",
+    "switches to a window by its handle",
+    {
+        handle: z.string().describe("The handle of the window to switch to")
+    },
+    async ({ handle }) => {
+        try {
+            const driver = getDriver();
+            await driver.switchTo().window(handle);
+            return {
+                content: [{ type: 'text', text: `Switched to window: ${handle}` }]
+            };
+        } catch (e) {
+            return {
+                content: [{ type: 'text', text: `Error switching to window: ${e.message}` }]
+            };
+        }
+    }
+);
+
+server.tool(
+    "switch_to_latest_window",
+    "switches to the most recently opened window",
+    {},
+    async () => {
+        try {
+            const driver = getDriver();
+            const handles = await driver.getAllWindowHandles();
+            if (handles.length > 0) {
+                const latestHandle = handles[handles.length - 1];
+                await driver.switchTo().window(latestHandle);
+                return {
+                    content: [{ type: 'text', text: `Switched to latest window: ${latestHandle}` }]
+                };
+            } else {
+                return {
+                    content: [{ type: 'text', text: 'No windows available to switch to' }]
+                };
+            }
+        } catch (e) {
+            return {
+                content: [{ type: 'text', text: `Error switching to latest window: ${e.message}` }]
+            };
+        }
+    }
+);
+
+server.tool(
+    "close_current_window",
+    "closes the currently active window",
+    {},
+    async () => {
+        try {
+            const driver = getDriver();
+            await driver.close();
+            return {
+                content: [{ type: 'text', text: 'Current window closed' }]
+            };
+        } catch (e) {
+            return {
+                content: [{ type: 'text', text: `Error closing current window: ${e.message}` }]
+            };
+        }
+    }
+);
+
 server.tool(
     "close_session",
     "closes the current browser session",
@@ -477,4 +583,4 @@ process.on('SIGINT', cleanup);
 
 // Start the server
 const transport = new StdioServerTransport();
-await server.connect(transport);
\ No newline at end of file
+await server.connect(transport);
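
The `switch_to_latest_window` tool in this patch treats the last entry of `getAllWindowHandles()` as the newest window, and the usage guide notes that focus may be lost after `close_current_window`. The W3C WebDriver specification describes the order of returned handles as arbitrary, so one possible refinement is to compare the handle list before and after the action that opens a tab. The sketch below illustrates that idea with plain arrays. `findNewHandles` and `pickFallbackHandle` are hypothetical helpers that are not part of this patch, and the handle strings are made-up values in the style of the docs.

```javascript
// Hypothetical helper (not part of the patch): returns the handles that appear
// in `after` but not in `before`, i.e. the windows opened in between.
function findNewHandles(before, after) {
    const known = new Set(before);
    return after.filter((handle) => !known.has(handle));
}

// Hypothetical helper (not part of the patch): chooses which window to focus
// once the current one has been closed. Prefers `preferred` if it is still
// open, otherwise falls back to the first remaining handle, or null if none.
function pickFallbackHandle(remaining, preferred) {
    if (remaining.length === 0) {
        return null;
    }
    return remaining.includes(preferred) ? preferred : remaining[0];
}

// Made-up handles: the new tab is deliberately NOT the last entry.
const before = ['CDwindow-ABC'];
const after = ['CDwindow-DEF', 'CDwindow-ABC'];

const opened = findNewHandles(before, after);
console.log(opened[0]); // logs the new tab's handle, CDwindow-DEF

// After closing CDwindow-DEF, only the original tab remains.
console.log(pickFallbackHandle(['CDwindow-ABC'], 'CDwindow-ABC'));
```

In a handler, `before` and `after` would come from two `driver.getAllWindowHandles()` calls, and the chosen handle would be passed to `driver.switchTo().window(...)`, both of which the patch already uses. Whether the server should remember the previous handle list between tool calls is a design decision left open here.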