CVE-2026-25244
Description
WebdriverIO is a test automation framework for unit, e2e and component testing using WebDriver, WebDriver BiDi and Appium. Versions below 9.24.0 contain a command injection vulnerability leading to remote code execution (RCE) in test orchestration. Git permits branch names containing shell metacharacters, and getGitMetadataForAISelection() interpolates these names directly into execSync() calls without sanitization. An attacker can exploit this by supplying a malicious repository (via testOrchestrationOptions.runSmartSelection.source, or the current directory if unset) whose branch name carries a payload, causing the shell to execute arbitrary code. This enables remote code execution on CI/CD servers and developer machines, leading to credential and secret disclosure, source code and SSH key exfiltration, system compromise, and supply chain attacks via tampered build artifacts. The issue has been fixed in version 9.24.0.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
WebdriverIO below 9.24.0 suffers from command injection in getGitMetadataForAISelection() via unsanitized git branch names, enabling RCE on CI/CD and developer machines.
Vulnerability
In @wdio/browserstack-service versions below 9.24.0, the function getGitMetadataForAISelection() in helpers.ts [2] interpolates git branch names directly into execSync() calls without sanitization. Git permits branch names containing shell metacharacters [1], so a branch name chosen by an attacker reaches the shell verbatim. The vulnerable code path is reached through the test orchestration feature, which reads the repository given in testOrchestrationOptions.runSmartSelection.source, or the current directory if that option is unset.
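The difference between the vulnerable and the patched calling convention can be sketched as follows. This is a minimal illustration, not the service's code: `buildShellCommand` and `buildArgv` are hypothetical helpers, and the branch name is an example payload. Nothing is executed; the sketch only shows what would be handed to execSync() (one string parsed by a shell) versus spawnSync() (an argument array that bypasses the shell).

```typescript
// Hypothetical branch name as an attacker-controlled repository could report it.
// Single quotes: this is a plain string, not a template literal.
const maliciousBranch = 'main;touch${IFS}/tmp/pwned.txt'

// Vulnerable pattern: the ref is spliced into one string that a shell will parse,
// so `;` ends the git command and starts a second one.
function buildShellCommand(baseBranch: string, currentBranch: string): string {
    return `git diff --name-only ${baseBranch}..${currentBranch}`
}

// Safer pattern: each argument is a separate array element. When passed to
// spawnSync('git', argv) without a shell, metacharacters stay literal.
function buildArgv(baseBranch: string, currentBranch: string): string[] {
    return ['diff', '--name-only', `${baseBranch}..${currentBranch}`]
}

const shellCommand = buildShellCommand('origin/main', maliciousBranch)
const argv = buildArgv('origin/main', maliciousBranch)

console.log(shellCommand)
console.log(JSON.stringify(argv))
```

In the first form the payload is part of the command text itself; in the second it is confined to a single argument that git receives as an (invalid) revision range.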
Exploitation
An attacker must supply a malicious git repository whose branch name contains a shell command injection payload (e.g., git checkout -b "main;touch\${IFS}/tmp/pwned.txt") [3][4]. The attacker can set the repository source via testOrchestrationOptions.runSmartSelection.source or rely on the default behavior. When the test framework executes getGitMetadataForAISelection(), it extracts the branch name and passes it to execSync(), causing the shell to interpret the injected commands.
Impact
Successful exploitation achieves remote code execution (RCE) on CI/CD servers and developer machines running the vulnerable service. An attacker can disclose credentials, secrets, source code, and SSH keys; compromise the system; and tamper with build artifacts, enabling supply-chain attacks [1][3].
Mitigation
The issue is fixed in version 9.24.0 of @wdio/browserstack-service. Users should upgrade immediately. There is no known workaround [3][4].
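To check whether an installed version falls in the affected range, a comparison along these lines may help. It is a simplified sketch: `isAffected` is a hypothetical helper, it assumes plain `major.minor.patch` strings, and it rejects pre-release tags rather than ordering them, so a full semver library is preferable in real tooling.

```typescript
// First fixed release, per the advisory.
const FIXED_VERSION = [9, 24, 0]

// Hypothetical helper: true if `version` (plain "major.minor.patch") is below 9.24.0.
function isAffected(version: string): boolean {
    const parts = version.split('.').map((part) => Number.parseInt(part, 10))
    if (parts.length !== 3 || parts.some((part) => Number.isNaN(part))) {
        throw new Error(`Unsupported version string: ${version}`)
    }
    for (let i = 0; i < 3; i++) {
        if (parts[i] !== FIXED_VERSION[i]) {
            return parts[i] < FIXED_VERSION[i]
        }
    }
    return false // exactly 9.24.0, the fixed release
}

console.log(isAffected('9.23.2'))
console.log(isAffected('9.24.0'))
```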
AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
- (no CPE) range: <= 9.23.2
- (no CPE) range: < 9.24.0
Patches
30e6748ecdb11 fix(browserstack): better handle malicious branch names (#15069)
1 file changed · +102 −22
packages/wdio-browserstack-service/src/testorchestration/helpers.ts +102 −22 modified
@@ -1,10 +1,45 @@
 import os from 'node:os'
 import path from 'node:path'
-import { execSync } from 'node:child_process'
+import { spawnSync } from 'node:child_process'
 import logger from '@wdio/logger'

 const log = logger('wdio-browserstack-service:helpers')

+/**
+ * Validate that a git ref (branch name, commit hash, etc.) contains only safe characters
+ * to prevent command injection when used in shell commands.
+ *
+ * Git refs can contain alphanumeric characters, forward slashes, dots, underscores, and hyphens.
+ * We explicitly reject any characters that could be used for shell injection.
+ */
+const SAFE_GIT_REF_PATTERN = /^[a-zA-Z0-9_./-]+$/
+
+function isValidGitRef(ref: string): boolean {
+    if (!ref || ref.length === 0 || ref.length > 256) {
+        return false
+    }
+    return SAFE_GIT_REF_PATTERN.test(ref)
+}
+
+/**
+ * Safely execute a git command using spawnSync to avoid shell injection.
+ * This function uses array arguments instead of string interpolation.
+ */
+function safeGitCommand(args: string[], cwd?: string): string {
+    const result = spawnSync('git', args, {
+        cwd,
+        encoding: 'utf-8',
+        maxBuffer: 10 * 1024 * 1024 // 10MB buffer for large diffs
+    })
+    if (result.error) {
+        throw result.error
+    }
+    if (result.status !== 0) {
+        throw new Error(result.stderr || `Git command failed with status ${result.status}`)
+    }
+    return result.stdout.trim()
+}
+
 type GitRemote = {
     name: string
     url: string
@@ -77,35 +112,45 @@ function getBaseBranch(): string | null {
     try {
         // Try to get the default branch from origin/HEAD symbolic ref (works for most providers)
         try {
-            const originHeadOutput = execSync('git symbolic-ref refs/remotes/origin/HEAD').toString().trim()
+            const originHeadOutput = safeGitCommand(['symbolic-ref', 'refs/remotes/origin/HEAD'])
             if (originHeadOutput.startsWith('refs/remotes/origin/')) {
-                return originHeadOutput.replace('refs/remotes/', '')
+                const branch = originHeadOutput.replace('refs/remotes/', '')
+                if (isValidGitRef(branch)) {
+                    return branch
+                }
+                log.debug(`Invalid branch name detected: ${branch}`)
             }
         } catch {
             log.debug('Could not determine base branch from origin/HEAD')
         }

         // Fallback: use the first branch in local heads
         try {
-            const branchesOutput = execSync('git branch').toString().trim()
+            const branchesOutput = safeGitCommand(['branch'])
             const branches = branchesOutput.split('\n').filter(Boolean)
             if (branches.length > 0) {
                 // Remove the '* ' from current branch if present and return first branch
                 const firstBranch = branches[0].replace(/^\*\s+/, '').trim()
-                return firstBranch
+                if (isValidGitRef(firstBranch)) {
+                    return firstBranch
+                }
+                log.debug(`Invalid branch name detected: ${firstBranch}`)
             }
         } catch {
             log.debug('Could not determine base branch from local branches')
         }

         // Fallback: use the first remote branch if available
         try {
-            const remoteBranchesOutput = execSync('git branch -r').toString().trim()
+            const remoteBranchesOutput = safeGitCommand(['branch', '-r'])
             const remoteBranches = remoteBranchesOutput.split('\n').filter(Boolean)
             for (const branch of remoteBranches) {
                 const cleanBranch = branch.trim()
                 if (cleanBranch.startsWith('origin/') && !cleanBranch.includes('HEAD')) {
-                    return cleanBranch
+                    if (isValidGitRef(cleanBranch)) {
+                        return cleanBranch
+                    }
+                    log.debug(`Invalid branch name detected: ${cleanBranch}`)
                 }
             }
         } catch {
@@ -126,13 +171,25 @@ function getChangedFilesFromCommits(commitHashes: string[]): string[] {

     try {
         for (const commit of commitHashes) {
+            // Validate commit hash to prevent injection
+            if (!isValidGitRef(commit)) {
+                log.debug(`Skipping invalid commit hash: ${commit}`)
+                continue
+            }
+
             try {
                 // Check if commit has parents
-                const parentsOutput = execSync(`git log -1 --pretty=%P ${commit}`).toString().trim()
+                const parentsOutput = safeGitCommand(['log', '-1', '--pretty=%P', '--', commit])
                 const parents = parentsOutput.split(' ').filter(Boolean)

                 for (const parent of parents) {
-                    const diffOutput = execSync(`git diff --name-only ${parent} ${commit}`).toString().trim()
+                    // Validate parent hash
+                    if (!isValidGitRef(parent)) {
+                        log.debug(`Skipping invalid parent hash: ${parent}`)
+                        continue
+                    }
+
+                    const diffOutput = safeGitCommand(['diff', '--name-only', parent, commit])
                     const files = diffOutput.split('\n').filter(Boolean)

                     for (const file of files) {
@@ -188,39 +245,56 @@ export function getGitMetadataForAISelection(folders: string[] | null = []): Git
             process.chdir(folder)

             // Get current branch and latest commit
-            const currentBranch = execSync('git rev-parse --abbrev-ref HEAD').toString().trim()
-            const latestCommit = execSync('git rev-parse HEAD').toString().trim()
+            const currentBranch = safeGitCommand(['rev-parse', '--abbrev-ref', 'HEAD'])
+            const latestCommit = safeGitCommand(['rev-parse', 'HEAD'])
             result.prId = latestCommit

+            // Validate branch names to prevent command injection
+            if (!isValidGitRef(currentBranch)) {
+                log.warn(`Invalid current branch name detected: ${currentBranch}. Skipping this folder for security reasons.`)
+                process.chdir(originalDir)
+                continue
+            }
+
+            if (!isValidGitRef(latestCommit)) {
+                log.warn(`Invalid commit hash detected: ${latestCommit}. Skipping this folder for security reasons.`)
+                process.chdir(originalDir)
+                continue
+            }
+
             // Find base branch
             const baseBranch = getBaseBranch()
             log.debug(`Base branch for comparison: ${baseBranch}`)

             let commits: string[] = []

-            if (baseBranch) {
+            if (baseBranch && isValidGitRef(baseBranch)) {
                 try {
                     // Get changed files between base branch and current branch
-                    const changedFilesOutput = execSync(`git diff --name-only ${baseBranch}..${currentBranch}`).toString().trim()
+                    // Using spawnSync with array arguments to prevent command injection
+                    const changedFilesOutput = safeGitCommand(['diff', '--name-only', `${baseBranch}..${currentBranch}`])
                     log.debug(`Changed files between ${baseBranch} and ${currentBranch}: ${changedFilesOutput}`)
                     result.filesChanged = changedFilesOutput.split('\n').filter(f => f.trim())

                     // Get commits between base branch and current branch
-                    const commitsOutput = execSync(`git log ${baseBranch}..${currentBranch} --pretty=%H`).toString().trim()
+                    const commitsOutput = safeGitCommand(['log', `${baseBranch}..${currentBranch}`, '--pretty=%H'])
                     commits = commitsOutput.split('\n').filter(Boolean)
                 } catch (error) {
                     log.debug(`Failed to get changed files from branch comparison. Falling back to recent commits. Error: ${error}`)
                     // Fallback to recent commits
-                    const recentCommitsOutput = execSync('git log -10 --pretty=%H').toString().trim()
+                    const recentCommitsOutput = safeGitCommand(['log', '-10', '--pretty=%H'])
                     commits = recentCommitsOutput.split('\n').filter(Boolean)

                     if (commits.length > 0) {
                         result.filesChanged = getChangedFilesFromCommits(commits.slice(0, 5))
                     }
                 }
             } else {
+                if (baseBranch && !isValidGitRef(baseBranch)) {
+                    log.warn(`Invalid base branch name detected: ${baseBranch}. Falling back to recent commits.`)
+                }
                 // Fallback to recent commits
-                const recentCommitsOutput = execSync('git log -10 --pretty=%H').toString().trim()
+                const recentCommitsOutput = safeGitCommand(['log', '-10', '--pretty=%H'])
                 commits = recentCommitsOutput.split('\n').filter(Boolean)

                 if (commits.length > 0) {
@@ -235,11 +309,17 @@ export function getGitMetadataForAISelection(folders: string[] | null = []): Git
             // Only process commits if we have them
             if (commits.length > 0) {
                 for (const commit of commits) {
+                    // Validate commit hash
+                    if (!isValidGitRef(commit)) {
+                        log.debug(`Skipping invalid commit hash: ${commit}`)
+                        continue
+                    }
+
                     try {
-                        const commitMessage = execSync(`git log -1 --pretty=%B ${commit}`).toString().trim()
+                        const commitMessage = safeGitCommand(['log', '-1', '--pretty=%B', '--', commit])
                         log.debug(`Processing commit: ${commitMessage}`)

-                        const authorName = execSync(`git log -1 --pretty=%an ${commit}`).toString().trim()
+                        const authorName = safeGitCommand(['log', '-1', '--pretty=%an', '--', commit])
                         authorsSet.add(authorName || 'Unknown')

                         commitMessages.push({
@@ -256,7 +336,7 @@ export function getGitMetadataForAISelection(folders: string[] | null = []): Git
             if (commits.length === 0 && result.filesChanged.length > 0) {
                 try {
                     // Try to get current git user as fallback
-                    const fallbackAuthor = execSync('git config user.name').toString().trim() || 'Unknown'
+                    const fallbackAuthor = safeGitCommand(['config', 'user.name']) || 'Unknown'
                     authorsSet.add(fallbackAuthor)
                     log.debug(`Added fallback author: ${fallbackAuthor}`)
                 } catch (error) {
@@ -268,16 +348,16 @@ export function getGitMetadataForAISelection(folders: string[] | null = []): Git
             result.authors = Array.from(authorsSet)
             result.commitMessages = commitMessages

-            // Get commit date
+            // Get commit date (latestCommit already validated above)
             if (latestCommit) {
-                const commitDate = execSync(`git log -1 --pretty=%cd --date=format:'%Y-%m-%d' ${latestCommit}`).toString().trim()
+                const commitDate = safeGitCommand(['log', '-1', '--pretty=%cd', '--date=format:%Y-%m-%d', '--', latestCommit])
                 result.prDate = commitDate.replace(/'/g, '')
             }

             // Set PR title and description from latest commit if not already set
             if ((!result.prTitle || result.prTitle.trim() === '') && latestCommit) {
                 try {
-                    const latestCommitMessage = execSync(`git log -1 --pretty=%B ${latestCommit}`).toString().trim()
+                    const latestCommitMessage = safeGitCommand(['log', '-1', '--pretty=%B', '--', latestCommit])
                     const messageLines = latestCommitMessage.trim().split('\n')
                     result.prTitle = messageLines[0] || ''
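The allow-list check introduced by this patch can be exercised in isolation. The sketch below copies `SAFE_GIT_REF_PATTERN` and `isValidGitRef` from the diff above; the sample ref names are illustrative assumptions and are not taken from the project's tests.

```typescript
// Pattern and length limit copied from the patch above.
const SAFE_GIT_REF_PATTERN = /^[a-zA-Z0-9_./-]+$/

function isValidGitRef(ref: string): boolean {
    if (!ref || ref.length === 0 || ref.length > 256) {
        return false
    }
    return SAFE_GIT_REF_PATTERN.test(ref)
}

// Illustrative inputs (assumed for this example).
const samples = [
    'main',
    'origin/feature/login-form',
    'release-9.24.0',
    'main;touch${IFS}/tmp/pwned.txt', // `;`, `$`, `{`, `}` are outside the allow-list
    '$(id)',
    ''
]

// Print each sample next to the verdict of the check.
for (const ref of samples) {
    console.log(JSON.stringify(ref), isValidGitRef(ref))
}
```

An allow-list of this kind also rejects some names that git itself accepts, for example ones containing `@` or `+`. According to the diff, the patched code handles a rejected current branch by skipping the folder and a rejected base branch by falling back to recent commits.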
fccc665cb7df fix(browserstack): bump tar - fixes #15027
3 files changed · +12 −13
packages/wdio-browserstack-service/package.json +1 −1 modified
@@ -51,7 +51,7 @@
     "git-repo-info": "^2.1.1",
     "gitconfiglocal": "^2.1.0",
     "glob": "^11.0.0",
-    "tar": "^6.1.15",
+    "tar": "^7.5.7",
     "undici": "^6.21.3",
     "uuid": "^11.1.0",
     "webdriverio": "workspace:*",
packages/wdio-browserstack-service/src/util.ts +2 −2 modified
@@ -49,7 +49,7 @@ import UsageStats from './testOps/usageStats.js'
 import TestOpsConfig from './testOps/testOpsConfig.js'
 import type { StartBinSessionResponse } from '@browserstack/wdio-browserstack-service'
 import APIUtils from './cli/apiUtils.js'
-import tar from 'tar'
+import { create } from 'tar'
 import { fileFromPath } from 'formdata-node/file-from-path'
 import AccessibilityScripts from './scripts/accessibility-scripts.js'

@@ -1387,7 +1387,7 @@ export async function uploadLogs(user: string | undefined, key: string | undefin
         copiedFileNames.push(path.basename(f))
     }

-    await tar.create(
+    await create(
         {
             file: tarPath,
             cwd: tmpDir,
pnpm-lock.yaml +9 −10 modified
@@ -787,8 +787,8 @@ importers:
         specifier: ^11.0.0
         version: 11.1.0
       tar:
-        specifier: ^6.1.15
-        version: 6.2.1
+        specifier: ^7.5.7
+        version: 7.5.7
       undici:
         specifier: ^6.21.3
         version: 6.23.0
@@ -13997,10 +13997,9 @@ packages:
     engines: {node: '>=10'}
     deprecated: Old versions of tar are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exhorbitant rates) by contacting i@izs.me

-  tar@7.5.2:
-    resolution: {integrity: sha512-7NyxrTE4Anh8km8iEy7o0QYPs+0JKBTj5ZaqHg6B39erLg0qYXN3BijtShwbsNSvQ+LN75+KV+C4QR/f6Gwnpg==}
+  tar@7.5.7:
+    resolution: {integrity: sha512-fov56fJiRuThVFXD6o6/Q354S7pnWMJIVlDBYijsTNx6jKSE4pvrDTs6lUnmGvNyfJwFQQwWy3owKz1ucIhveQ==}
     engines: {node: '>=18'}
-    deprecated: Old versions of tar are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. Support for old versions may be purchased (at exhorbitant rates) by contacting i@izs.me

   temp-dir@1.0.0:
     resolution: {integrity: sha512-xZFXEGbG7SNC3itwBzI3RYjq/cEhBkx2hJuKGIUOcEULmkQExXiHat2z/qkISYsuR+IKumhEfKKbV5qXmhICFQ==}
@@ -22580,7 +22579,7 @@ snapshots:
       minipass-pipeline: 1.2.4
       p-map: 7.0.4
       ssri: 12.0.0
-      tar: 7.5.2
+      tar: 7.5.7
       unique-filename: 4.0.0

   cacache@20.0.3:
@@ -27763,7 +27762,7 @@ snapshots:
       nopt: 8.1.0
       proc-log: 5.0.0
       semver: 7.7.3
-      tar: 7.5.2
+      tar: 7.5.7
       tinyglobby: 0.2.15
       which: 5.0.0
     transitivePeerDependencies:
@@ -28237,7 +28236,7 @@ snapshots:
       promise-retry: 2.0.1
       sigstore: 4.0.0
       ssri: 12.0.0
-      tar: 7.5.2
+      tar: 7.5.7
     transitivePeerDependencies:
       - supports-color

@@ -28259,7 +28258,7 @@ snapshots:
       promise-retry: 2.0.1
       sigstore: 4.0.0
       ssri: 13.0.0
-      tar: 7.5.2
+      tar: 7.5.7
     transitivePeerDependencies:
       - supports-color

@@ -30625,7 +30624,7 @@ snapshots:
       mkdirp: 1.0.4
       yallist: 4.0.0

-  tar@7.5.2:
+  tar@7.5.7:
     dependencies:
       '@isaacs/fs-minipass': 4.0.1
       chownr: 3.0.0
8f3b10523a81 docs(mcp): Introduce Model Context Protocol documentation on website (#15040)
6 files changed · +2572 −0
website/docs/mcp/configuration.md+609 −0 added@@ -0,0 +1,609 @@ +--- +id: configuration +title: Configuration +--- + +This page documents all configuration options for the WebdriverIO MCP server. + +## MCP Server Configuration + +The MCP server is configured through the Claude Desktop or Claude Code configuration files. + +### Basic Configuration + +#### macOS + +Edit `~/Library/Application Support/Claude/claude_desktop_config.json`: + +```json +{ + "mcpServers": { + "wdio-mcp": { + "command": "npx", + "args": ["-y", "@wdio/mcp"] + } + } +} +``` + +#### Windows + +Edit `%APPDATA%\Claude\claude_desktop_config.json`: + +```json +{ + "mcpServers": { + "wdio-mcp": { + "command": "npx", + "args": ["-y", "@wdio/mcp"] + } + } +} +``` + +#### Claude Code + +Edit your project's `.claude/settings.json`: + +```json +{ + "mcpServers": { + "wdio-mcp": { + "command": "npx", + "args": ["-y", "@wdio/mcp"] + } + } +} +``` + +--- + +## Environment Variables + +Configure the Appium server connection and other settings via environment variables. + +### Appium Connection + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `APPIUM_URL` | string | `127.0.0.1` | Appium server hostname | +| `APPIUM_URL_PORT` | number | `4723` | Appium server port | +| `APPIUM_PATH` | string | `/` | Appium server path | + +### Example with Environment Variables + +```json +{ + "mcpServers": { + "wdio-mcp": { + "command": "npx", + "args": ["-y", "@wdio/mcp"], + "env": { + "APPIUM_URL": "192.168.1.100", + "APPIUM_URL_PORT": "4724", + "APPIUM_PATH": "/wd/hub" + } + } + } +} +``` + +--- + +## Browser Session Options + +Options available when starting a browser session via the `start_browser` tool. + +### `headless` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `false` + +Run Chrome in headless mode (no visible browser window). Useful for CI/CD environments or when you don't need to see the browser. 
+ +### `windowWidth` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `1920` +- **Range:** `400` - `3840` + +Initial browser window width in pixels. + +### `windowHeight` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `1080` +- **Range:** `400` - `2160` + +Initial browser window height in pixels. + +### `navigationUrl` + +- **Type:** `string` +- **Mandatory:** No + +URL to navigate to immediately after starting the browser. This is more efficient than calling `start_browser` followed by `navigate` separately. + +**Example:** Start browser and navigate in one call: +``` +Start Chrome and navigate to https://webdriver.io +``` + +--- + +## Mobile Session Options + +Options available when starting a mobile app session via the `start_app_session` tool. + +### Platform Options + +#### `platform` + +- **Type:** `string` +- **Mandatory:** Yes +- **Values:** `iOS` | `Android` + +The mobile platform to automate. + +#### `platformVersion` + +- **Type:** `string` +- **Mandatory:** No + +The OS version of the device/simulator/emulator (e.g., `17.0` for iOS, `14` for Android). + +#### `automationName` + +- **Type:** `string` +- **Mandatory:** No +- **Values:** `XCUITest` (iOS), `UiAutomator2` | `Espresso` (Android) + +The automation driver to use. Defaults to `XCUITest` for iOS and `UiAutomator2` for Android. + +### Device Options + +#### `deviceName` + +- **Type:** `string` +- **Mandatory:** Yes + +Name of the device, simulator, or emulator to use. + +**Examples:** +- iOS Simulator: `iPhone 15 Pro`, `iPad Air (5th generation)` +- Android Emulator: `Pixel 7`, `Nexus 5X` +- Real Device: The device name as shown in your system + +#### `udid` + +- **Type:** `string` +- **Mandatory:** No (Required for real iOS devices) + +Unique Device Identifier. Required for real iOS devices (40-character identifier) and recommended for Android real devices. 
+ +**Finding UDID:** +- **iOS:** Connect device, open Finder/iTunes, click on device → Serial Number (click to reveal UDID) +- **Android:** Run `adb devices` in terminal + +### App Options + +#### `appPath` + +- **Type:** `string` +- **Mandatory:** No* + +Path to the application file to install and launch. + +**Supported formats:** +- iOS Simulator: `.app` directory +- iOS Real Device: `.ipa` file +- Android: `.apk` file + +*Either `appPath` must be provided, or `noReset: true` to connect to an already-running app. + +#### `appWaitActivity` + +- **Type:** `string` +- **Mandatory:** No (Android only) + +Activity to wait for on app launch. If not specified, the app's main/launcher activity is used. + +**Example:** `com.example.app.MainActivity` + +### Session State Options + +#### `noReset` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `false` + +Preserve the app state between sessions. When `true`: +- App data is preserved (login state, preferences, etc.) +- Session will **detach** instead of close (keeps app running) +- Useful for testing user journeys across multiple sessions +- Can be used without `appPath` to connect to an already-running app + +#### `fullReset` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `true` + +Completely reset the app before the session. When `true`: +- iOS: Uninstalls and reinstalls the app +- Android: Clears app data and cache +- Useful for starting with a clean state + +Set `fullReset: false` with `noReset: true` to preserve app state completely. + +### Session Timeout + +#### `newCommandTimeout` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `60` + +How long (in seconds) Appium will wait for a new command before assuming the client has quit and ending the session. Increase this value for longer debugging sessions. 
+ +**Examples:** +- `60` - Default, suitable for most automation +- `300` - 5 minutes, for debugging or slower operations +- `600` - 10 minutes, for very long-running tests + +### Automatic Handling Options + +#### `autoGrantPermissions` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `true` + +Automatically grant app permissions on install/launch. When `true`: +- Camera, microphone, location, etc. permissions are auto-granted +- No manual permission dialog handling needed +- Streamlines automation by avoiding permission popups + +:::note Android Only +This option primarily affects Android. iOS permissions must be handled differently due to system restrictions. +::: + +#### `autoAcceptAlerts` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `true` + +Automatically accept system alerts (dialogs) that appear during automation. + +**Examples of auto-accepted alerts:** +- "Allow notifications?" +- "App would like to access your location" +- "Allow app to access photos?" + +#### `autoDismissAlerts` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `false` + +Dismiss (cancel) system alerts instead of accepting them. Takes precedence over `autoAcceptAlerts` when set to `true`. + +### Appium Server Override + +You can override the Appium server connection on a per-session basis: + +#### `appiumHost` + +- **Type:** `string` +- **Mandatory:** No + +Appium server hostname. Overrides `APPIUM_URL` environment variable. + +#### `appiumPort` + +- **Type:** `number` +- **Mandatory:** No + +Appium server port. Overrides `APPIUM_URL_PORT` environment variable. + +#### `appiumPath` + +- **Type:** `string` +- **Mandatory:** No + +Appium server path. Overrides `APPIUM_PATH` environment variable. + +--- + +## Element Detection Options + +Options for the `get_visible_elements` tool. 
+ +### `elementType` + +- **Type:** `string` +- **Mandatory:** No +- **Default:** `interactable` +- **Values:** `interactable` | `visual` | `all` + +Type of elements to return: +- `interactable`: Buttons, links, inputs, and other clickable elements +- `visual`: Images, SVGs, and visual elements +- `all`: Both interactable and visual elements + +### `inViewportOnly` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `true` + +Only return elements that are visible within the current viewport. When `false`, returns all elements in the view hierarchy (useful for finding off-screen elements). + +### `includeContainers` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `false` + +Include container/layout elements in the results. When `true`: + +**Android containers included:** +- `ViewGroup`, `FrameLayout`, `LinearLayout` +- `RelativeLayout`, `ConstraintLayout` +- `ScrollView`, `RecyclerView` + +**iOS containers included:** +- `View`, `StackView`, `CollectionView` +- `ScrollView`, `TableView` + +Useful for debugging layout issues or understanding the view hierarchy. + +### `includeBounds` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `false` + +Include element bounds/coordinates (x, y, width, height) in the response. Set to `true` for: +- Coordinate-based interactions +- Layout debugging +- Visual element positioning + +### Pagination Options + +For large pages with many elements, use pagination to reduce token usage: + +#### `limit` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `0` (unlimited) + +Maximum number of elements to return. + +#### `offset` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `0` + +Number of elements to skip before returning results. + +**Example:** Get elements 21-40: +``` +Get visible elements with limit 20 and offset 20 +``` + +--- + +## Accessibility Tree Options + +Options for the `get_accessibility` tool (browser-only). 
+ +### `limit` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `100` + +Maximum number of nodes to return. Use `0` for unlimited (not recommended for large pages). + +### `offset` + +- **Type:** `number` +- **Mandatory:** No +- **Default:** `0` + +Number of nodes to skip for pagination. + +### `roles` + +- **Type:** `string[]` +- **Mandatory:** No +- **Default:** All roles + +Filter to specific accessibility roles. + +**Common roles:** `button`, `link`, `textbox`, `checkbox`, `radio`, `heading`, `img`, `listitem` + +**Example:** Get only buttons and links: +``` +Get accessibility tree filtered to button and link roles +``` + +### `namedOnly` + +- **Type:** `boolean` +- **Mandatory:** No +- **Default:** `true` + +Only return nodes that have a name/label. Filters out anonymous containers and reduces noise in the results. + +--- + +## Screenshot Options + +Options for the `take_screenshot` tool. + +### `outputPath` + +- **Type:** `string` +- **Mandatory:** No + +Path where to save the screenshot file. If not provided, returns base64-encoded image data. + +### Automatic Optimization + +Screenshots are automatically processed to optimize for LLM consumption: + +| Optimization | Value | Description | +|--------------|-------|-------------| +| Max dimension | 2000px | Images larger than 2000px are scaled down | +| Max file size | 1MB | Images are compressed to stay under 1MB | +| Format | PNG/JPEG | PNG with max compression; JPEG if needed for size | + +This optimization ensures screenshots can be efficiently processed without exceeding token limits. 
+ +--- + +## Session Behavior + +### Session Types + +The MCP server tracks session types to provide appropriate tools and behavior: + +| Type | Description | Auto-Detach | +|------|-------------|-------------| +| `browser` | Chrome browser session | No | +| `ios` | iOS app session | Yes (if `noReset: true` or no `appPath`) | +| `android` | Android app session | Yes (if `noReset: true` or no `appPath`) | + +### Single-Session Model + +The MCP server operates with a **single-session model**: + +- Only one browser OR app session can be active at a time +- Starting a new session will close/detach the current session +- Session state is maintained globally across tool calls + +### Detach vs Close + +| Action | `detach: false` (Close) | `detach: true` (Detach) | +|--------|-------------------------|-------------------------| +| Browser | Closes Chrome completely | Keeps Chrome running, disconnects WebDriver | +| Mobile App | Terminates app | Keeps app running in current state | +| Use Case | Clean slate for next session | Preserve state, manual inspection | + +--- + +## Performance Considerations + +The MCP server is optimized for efficient LLM communication using **TOON (Token-Oriented Object Notation)** format, which minimizes token usage when sending data to Claude. 
+ +### Browser Automation + +- **Headless mode** is faster but doesn't render visual elements +- **Smaller window sizes** reduce screenshot capture time +- **Element detection** is optimized with a single script execution +- **Screenshot optimization** keeps images under 1MB for efficient processing +- **`inViewportOnly: true`** (default) filters to only visible elements + +### Mobile Automation + +- **XML page source parsing** uses only 2 HTTP calls (vs 600+ for traditional element queries) +- **Accessibility ID selectors** are fastest and most reliable +- **XPath selectors** are slowest - use only as a last resort +- **`inViewportOnly: true`** (default) significantly reduces element count +- **Pagination** (`limit` and `offset`) reduces token usage for screens with many elements +- **`includeBounds: false`** (default) omits coordinate data unless needed + +### Token Usage Tips + +| Setting | Impact | +|---------|--------| +| `inViewportOnly: true` | Filters off-screen elements, reducing response size | +| `includeContainers: false` | Excludes layout elements (ViewGroup, etc.) | +| `includeBounds: false` | Omits x/y/width/height data | +| `limit` with pagination | Process elements in batches instead of all at once | +| `namedOnly: true` (accessibility) | Filters anonymous nodes | + +--- + +## Appium Server Setup + +Before using mobile automation, ensure Appium is properly configured. 
+ +### Basic Setup + +```sh +# Install Appium globally +npm install -g appium + +# Install drivers +appium driver install xcuitest # iOS +appium driver install uiautomator2 # Android + +# Start the server +appium +``` + +### Custom Server Configuration + +```sh +# Start with custom host and port +appium --address 0.0.0.0 --port 4724 + +# Start with logging +appium --log-level debug + +# Start with specific base path +appium --base-path /wd/hub +``` + +### Verify Installation + +```sh +# Check installed drivers +appium driver list --installed + +# Check Appium version +appium --version + +# Test connection +curl http://localhost:4723/status +``` + +--- + +## Troubleshooting Configuration + +### MCP Server Not Starting + +1. Verify npm/npx is installed: `npm --version` +2. Try running manually: `npx @wdio/mcp` +3. Check Claude Desktop logs for errors + +### Appium Connection Issues + +1. Verify Appium is running: `curl http://localhost:4723/status` +2. Check environment variables match Appium server settings +3. Ensure firewall allows connections on the Appium port + +### Session Won't Start + +1. **Browser:** Ensure Chrome is installed +2. **iOS:** Verify Xcode and simulators are available +3. **Android:** Check `ANDROID_HOME` and emulator is running +4. Review Appium server logs for detailed error messages + +### Session Timeouts + +If sessions are timing out during debugging: +1. Increase `newCommandTimeout` when starting the session +2. Use `noReset: true` to preserve state between sessions +3. Use `detach: true` when closing to keep the app running
website/docs/mcp/faq.md (+437 −0, added)
@@ -0,0 +1,437 @@
+---
+id: faq
+title: FAQ
+---
+
+Frequently asked questions about WebdriverIO MCP.
+
+## General
+
+### What is MCP?
+
+MCP (Model Context Protocol) is an open protocol that enables AI assistants like Claude to interact with external tools and services. WebdriverIO MCP implements this protocol to provide browser and mobile automation capabilities to Claude Desktop and Claude Code.
+
+### What can I automate with WebdriverIO MCP?
+
+You can automate:
+- **Desktop browsers** (Chrome) - navigation, clicking, typing, screenshots
+- **iOS apps** - on simulators or real devices
+- **Android apps** - on emulators or real devices
+- **Hybrid apps** - switching between native and web contexts
+
+### Do I need to write code?
+
+No! That's the main benefit of MCP. You can describe what you want to do in natural language, and Claude will use the appropriate tools to accomplish the task.
+
+**Example prompts:**
+- "Open Chrome and navigate to webdriver.io"
+- "Click the Get Started button"
+- "Take a screenshot of the current page"
+- "Start my iOS app and log in as test user"
+
+---
+
+## Installation & Setup
+
+### How do I install WebdriverIO MCP?
+
+You don't need to install it separately. The MCP server runs automatically via npx when you configure it in Claude Desktop or Claude Code.
+
+Add this to your Claude Desktop config:
+
+```json
+{
+  "mcpServers": {
+    "wdio-mcp": {
+      "command": "npx",
+      "args": ["-y", "@wdio/mcp"]
+    }
+  }
+}
+```
+
+### Where is the Claude Desktop config file?
+
+- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
+- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
+
+### Do I need Appium for browser automation?
+
+No. Browser automation only requires Chrome to be installed. WebdriverIO manages ChromeDriver automatically.
+
+### Do I need Appium for mobile automation?
+
+Yes. Mobile automation requires:
+1. Appium server running (`npm install -g appium && appium`)
+2. Platform drivers installed (`appium driver install xcuitest` for iOS, `appium driver install uiautomator2` for Android)
+3. Appropriate development tools (Xcode for iOS, Android SDK for Android)
+
+---
+
+## Browser Automation
+
+### Which browsers are supported?
+
+Currently, only **Chrome** is supported. Support for other browsers may be added in future versions.
+
+### Can I run Chrome in headless mode?
+
+Yes! Ask Claude to start the browser in headless mode:
+
+"Start Chrome in headless mode"
+
+Claude may also choose this option on its own when appropriate (e.g., in CI/CD contexts).
+
+### Can I set the browser window size?
+
+Yes. You can specify dimensions when starting the browser:
+
+"Start Chrome with a window size of 1920x1080"
+
+Supported dimensions: 400-3840 pixels wide, 400-2160 pixels tall. Default is 1920x1080.
+
+### Can I start the browser and navigate in one step?
+
+Yes! Use the `navigationUrl` parameter:
+
+"Start Chrome and navigate to https://webdriver.io"
+
+This is more efficient than starting the browser and then navigating separately.
+
+### How do I take screenshots?
+
+Simply ask Claude:
+
+"Take a screenshot of the current page"
+
+Screenshots are automatically optimized:
+- Scaled to max 2000px dimension
+- Compressed to max 1MB file size
+- Format: PNG or JPEG (automatically selected for optimal quality)
+
+### Can I interact with iframes?
+
+Currently, the MCP server operates on the main document. iframe interaction may be added in future versions.
+
+### Can I execute custom JavaScript?
+
+Yes! Use the `execute_script` tool:
+
+"Execute script to get the page title"
+"Execute script: return document.querySelectorAll('button').length"
+
+---
+
+## Mobile Automation
+
+### How do I start an iOS app?
+
+Ask Claude with the necessary details:
+
+"Start my iOS app located at /path/to/MyApp.app on the iPhone 15 simulator"
+
+Or for an installed app:
+
+"Start the app with noReset enabled on the iPhone 15 simulator"
+
+### How do I start an Android app?
+
+"Start my Android app at /path/to/app.apk on the Pixel 7 emulator"
+
+Or for an installed app:
+
+"Start the app with noReset enabled on the Pixel 7 emulator"
+
+### Can I test on real devices?
+
+Yes! For real devices, you'll need the device UDID:
+
+- **iOS:** Connect device, open Finder, click device, click serial number to reveal UDID
+- **Android:** Run `adb devices` in terminal
+
+Then ask Claude:
+
+"Start my iOS app on the real device with UDID abc123..."
+
+### How do I handle permission dialogs?
+
+By default, permissions are automatically granted (`autoGrantPermissions: true`). If you need to test permission flows, you can disable this:
+
+"Start my app without automatically granting permissions"
+
+### What gestures are supported?
+
+- **Tap:** Tap on elements or coordinates
+- **Swipe:** Swipe up, down, left, or right
+- **Drag and Drop:** Drag from one element to another or to coordinates
+
+Note: `long_press` is available through `execute_script` with Appium mobile commands.
+
+### How do I scroll in mobile apps?
+
+Use swipe gestures:
+
+"Swipe up to scroll down"
+"Swipe down to scroll up"
+
+### Can I rotate the device?
+
+Yes:
+
+"Rotate the device to landscape"
+"Rotate the device to portrait"
+
+### How do I handle hybrid apps?
+
+For apps with webviews, you can switch contexts:
+
+"Get available contexts"
+"Switch to the webview context"
+"Switch back to native context"
+
+### Can I execute Appium mobile commands?
+
+Yes! Use the `execute_script` tool:
+
+```
+Execute script "mobile: pressKey" with args [{ keycode: 4 }] // Press BACK on Android
+Execute script "mobile: activateApp" with args [{ appId: "com.example.app" }]
+Execute script "mobile: terminateApp" with args [{ bundleId: "com.example.app" }]
+```
+
+---
+
+## Element Selection
+
+### How does Claude know which element to interact with?
+
+Claude uses the `get_visible_elements` tool to identify interactive elements on the page/screen. Each element comes with multiple selector strategies.
+
+### What if there are too many elements on the page?
+
+Use pagination to manage large element lists:
+
+"Get the first 20 visible elements"
+"Get visible elements with offset 20 and limit 20"
+
+The response includes `total`, `showing`, and `hasMore` to help navigate through elements.
+
+### Can I get only specific types of elements?
+
+Yes! Use the `elementType` parameter:
+
+- `interactable` (default): Buttons, links, inputs
+- `visual`: Images, SVGs
+- `all`: Both types
+
+"Get visible visual elements on the page"
+
+### What if Claude clicks the wrong element?
+
+You can be more specific:
+
+- Provide exact text: "Click the button that says 'Submit Order'"
+- Provide selector: "Click the element with selector #submit-btn"
+- Provide accessibility ID: "Click the element with accessibility ID loginButton"
+
+### What's the best selector strategy for mobile?
+
+1. **Accessibility ID** (best) - `~loginButton`
+2. **Resource ID** (Android) - `id=login_button`
+3. **Predicate String** (iOS) - `-ios predicate string:label == "Login"`
+4. **XPath** (last resort) - slower but works everywhere
+
+### What is the accessibility tree and when should I use it?
+
+The accessibility tree provides semantic information about page elements (roles, names, states). Use `get_accessibility` when:
+- `get_visible_elements` doesn't return expected elements
+- You need to find elements by accessibility role (button, link, textbox, etc.)
+- You need detailed semantic information about elements
+
+"Get accessibility tree filtered to button and link roles"
+
+---
+
+## Session Management
+
+### Can I have multiple sessions at once?
+
+No. The MCP server uses a single-session model. Only one browser or app session can be active at a time.
+
+### What happens when I close a session?
+
+It depends on the session type and settings:
+
+- **Browser:** Chrome closes completely
+- **Mobile with `noReset: false`:** App terminates
+- **Mobile with `noReset: true` or no `appPath`:** App stays open (session detaches automatically)
+
+### Can I preserve app state between sessions?
+
+Yes! Use the `noReset` option:
+
+"Start my app with noReset enabled"
+
+This preserves login state, preferences, and other app data.
+
+### What's the difference between close and detach?
+
+- **Close:** Terminates the browser/app completely
+- **Detach:** Disconnects automation but keeps browser/app running
+
+Detach is useful when you want to manually inspect the state after automation.
+
+### My session keeps timing out during debugging
+
+Increase the command timeout:
+
+"Start my app with newCommandTimeout of 300 seconds"
+
+Default is 60 seconds. For long debugging sessions, try 300-600 seconds.
+
+---
+
+## Troubleshooting
+
+### "Session not found" error
+
+This means no active session exists. Start a browser or app session first:
+
+"Start Chrome and navigate to google.com"
+
+### "Element not found" error
+
+The element might not be visible or might have a different selector. Try:
+
+1. Asking Claude to get all visible elements first
+2. Providing a more specific selector
+3. Waiting for the page/app to fully load
+4. Using `inViewportOnly: false` to find off-screen elements
+
+### Browser won't start
+
+1. Ensure Chrome is installed
+2. Check if another process is using the debugging port (9222)
+3. Try headless mode
+
+### Appium connection failed
+
+This is the most common issue when starting mobile automation.
+
+1. **Verify Appium is running**: `curl http://localhost:4723/status`
+2. Start Appium if needed: `appium`
+3. Check your Appium URL configuration matches the server
+4. Ensure drivers are installed: `appium driver list --installed`
+
+:::tip
+The MCP server requires Appium to be running before starting mobile sessions. Make sure to start Appium first:
+```sh
+appium
+```
+Future versions may include automatic Appium service management.
+:::
+
+### iOS Simulator won't start
+
+1. Ensure Xcode is installed, and install its Command Line Tools if needed: `xcode-select --install`
+2. List available simulators: `xcrun simctl list devices`
+3. Check for specific simulator errors in Console.app
+
+### Android Emulator won't start
+
+1. Set `ANDROID_HOME`: `export ANDROID_HOME=$HOME/Library/Android/sdk`
+2. Check emulators: `emulator -list-avds`
+3. Start emulator manually: `emulator -avd <avd-name>`
+4. Verify device is connected: `adb devices`
+
+### Screenshots aren't working
+
+1. For mobile, ensure the session is active
+2. For browser, try a different page (some pages block screenshots)
+3. Check Claude Desktop logs for errors
+
+Screenshots are automatically compressed to max 1MB, so large screenshots will work but may be lower quality.
+
+---
+
+## Performance
+
+### Why is mobile automation slow?
+
+Mobile automation involves:
+1. Network communication with Appium server
+2. Appium communicating with the device/simulator
+3. Device rendering and response
+
+Tips for faster automation:
+- Use emulators/simulators instead of real devices for development
+- Use accessibility IDs instead of XPath
+- Enable `inViewportOnly: true` for element detection
+- Use pagination (`limit`) to reduce token usage
+
+### How can I speed up element detection?
+
+The MCP server already optimizes element detection using XML page source parsing (2 HTTP calls vs 600+ for traditional element queries). Additional tips:
+
+- Keep `inViewportOnly: true` (default)
+- Set `includeContainers: false` (default)
+- Use `limit` and `offset` for pagination on large screens
+- Use specific selectors instead of finding all elements
+
+### Screenshots are slow or failing
+
+Screenshots are automatically optimized:
+- Resized if larger than 2000px
+- Compressed to stay under 1MB
+- Converted to JPEG if PNG is too large
+
+This optimization reduces processing time and ensures Claude can handle the image.
+
+---
+
+## Limitations
+
+### What are the current limitations?
+
+- **Single session:** Only one browser/app at a time
+- **Browser support:** Chrome only (for now)
+- **iframe support:** Limited support for iframes
+- **File uploads:** Not directly supported via tools
+- **Audio/Video:** Cannot interact with media playback
+- **Browser extensions:** Not supported
+
+### Can I use this for production testing?
+
+WebdriverIO MCP is designed for interactive AI-assisted automation. For production CI/CD testing, consider using WebdriverIO's traditional test runner with full programmatic control.
+
+---
+
+## Security
+
+### Is my data secure?
+
+The MCP server runs locally on your machine. All automation happens through local browser/Appium connections. No data is sent to external servers beyond what you explicitly navigate to.
+
+### Can Claude access my passwords?
+
+Claude can see page content and interact with elements, but:
+- Passwords in `<input type="password">` fields are masked
+- You should avoid automating sensitive credentials
+- Use test accounts for automation
+
+---
+
+## Contributing
+
+### How can I contribute?
+
+Visit the [GitHub repository](https://github.com/webdriverio/mcp) to:
+- Report bugs
+- Request features
+- Submit pull requests
+
+### Where can I get help?
+
+- [WebdriverIO Discord](https://discord.webdriver.io/)
+- [GitHub Issues](https://github.com/webdriverio/mcp/issues)
+- [WebdriverIO Documentation](https://webdriver.io/)
website/docs/MCP.md (+437 −0, added)
@@ -0,0 +1,437 @@
+---
+id: mcp
+title: MCP (Model Context Protocol)
+---
+
+## What can it do?
+
+WebdriverIO MCP is a **Model Context Protocol (MCP) server** that enables AI assistants like Claude Desktop and Claude Code to automate and interact with web browsers and mobile applications.
+
+### Why WebdriverIO MCP?
+
+- **Mobile-First**: Unlike browser-only MCP servers, WebdriverIO MCP supports iOS and Android native app automation via Appium
+- **Cross-Platform Selectors**: Smart element detection generates multiple locator strategies (accessibility ID, XPath, UiAutomator, iOS predicates) automatically
+- **WebdriverIO Ecosystem**: Built on the battle-tested WebdriverIO framework with its rich ecosystem of services and reporters
+
+It provides a unified interface for:
+
+- 🖥️ **Desktop Browsers** (Chrome - headed or headless mode)
+- 📱 **Native Mobile Apps** (iOS Simulators / Android Emulators / Real Devices via Appium)
+- 📳 **Hybrid Mobile Apps** (Native + WebView context switching via Appium)
+
+through the [`@wdio/mcp`](https://www.npmjs.com/package/@wdio/mcp) package.
+
+This allows AI assistants to:
+
+- **Launch and control browsers** with configurable dimensions, headless mode, and optional initial navigation
+- **Navigate websites** and interact with elements (click, type, scroll)
+- **Analyze page content** via accessibility tree and visible elements detection with pagination support
+- **Take screenshots** automatically optimized (resized, compressed to max 1MB)
+- **Manage cookies** for session handling
+- **Control mobile devices** including gestures (tap, swipe, drag and drop)
+- **Switch contexts** in hybrid apps between native and webview
+- **Execute scripts** - JavaScript in browsers, Appium mobile commands on devices
+- **Handle device features** like rotation, keyboard, geolocation
+- and much more, see the [Tools](./mcp/tools) and [Configuration](./mcp/configuration) options
+
+:::info
+
+NOTE For Mobile Apps
+Mobile automation requires a running Appium server with the appropriate drivers installed. See [Prerequisites](#prerequisites) for setup instructions.
+
+:::
+
+## Installation
+
+The easiest way to use `@wdio/mcp` is via npx without any local installation:
+
+```sh
+npx @wdio/mcp
+```
+
+Or install it globally:
+
+```sh
+npm install -g @wdio/mcp
+```
+
+## Usage with Claude
+
+To use WebdriverIO MCP with Claude, modify the configuration file:
+
+```json
+{
+  "mcpServers": {
+    "wdio-mcp": {
+      "command": "npx",
+      "args": ["-y", "@wdio/mcp"]
+    }
+  }
+}
+```
+
+After adding the configuration, restart Claude. The WebdriverIO MCP tools will be available for browser and mobile automation tasks.
+
+### Usage with Claude Code
+
+Claude Code automatically detects MCP servers. You can configure it in your project's `.claude/settings.json` or `.mcp.json`.
+
+Or add it globally to `.claude.json` by running:
+```bash
+claude mcp add --transport stdio wdio-mcp -- npx -y @wdio/mcp
+```
+Validate it by running the `/mcp` command inside Claude Code.
+
+## Quick Start Examples
+
+### Browser Automation
+
+Ask Claude to automate browser tasks:
+
+```
+"Open Chrome and navigate to https://webdriver.io"
+"Click the 'Get Started' button"
+"Take a screenshot of the page"
+"Find all visible links on the page"
+```
+
+### Mobile App Automation
+
+Ask Claude to automate mobile apps:
+
+```
+"Start my iOS app on the iPhone 15 simulator"
+"Tap the login button"
+"Swipe up to scroll down"
+"Take a screenshot of the current screen"
+```
+
+## Capabilities
+
+### Browser Automation (Chrome)
+
+| Feature | Description |
+|---------|-------------|
+| **Session Management** | Launch Chrome in headed/headless mode with custom dimensions and optional navigation URL |
+| **Navigation** | Navigate to URLs |
+| **Element Interaction** | Click elements, type text, find elements by various selectors |
+| **Page Analysis** | Get visible elements (with pagination), accessibility tree (with filtering) |
+| **Screenshots** | Capture screenshots (auto-optimized to max 1MB) |
+| **Scrolling** | Scroll up/down by configurable pixel amounts |
+| **Cookie Management** | Get, set, and delete cookies |
+| **Script Execution** | Execute custom JavaScript in browser context |
+
+### Mobile App Automation (iOS/Android)
+
+| Feature | Description |
+|---------|-------------|
+| **Session Management** | Launch apps on simulators, emulators, or real devices |
+| **Touch Gestures** | Tap, swipe, drag and drop |
+| **Element Detection** | Smart element detection with multiple locator strategies and pagination |
+| **App Lifecycle** | Get app state (via `execute_script` for activate/terminate) |
+| **Context Switching** | Switch between native and webview contexts in hybrid apps |
+| **Device Control** | Rotate device, keyboard control |
+| **Geolocation** | Get and set device GPS coordinates |
+| **Permissions** | Automatic permission and alert handling |
+| **Script Execution** | Execute Appium mobile commands (pressKey, deepLink, shell, etc.) |
+
+## Prerequisites
+
+### Browser Automation
+
+- **Chrome** must be installed on your system
+- WebdriverIO manages ChromeDriver automatically
+
+### Mobile Automation
+
+#### iOS
+
+1. **Install Xcode** from the Mac App Store
+2. **Install Xcode Command Line Tools**:
+   ```sh
+   xcode-select --install
+   ```
+3. **Install Appium**:
+   ```sh
+   npm install -g appium
+   ```
+4. **Install the XCUITest driver**:
+   ```sh
+   appium driver install xcuitest
+   ```
+5. **Start the Appium server**:
+   ```sh
+   appium
+   ```
+6. **For Simulators**: Open Xcode → Window → Devices and Simulators to create/manage simulators
+7. **For Real Devices**: You'll need the device UDID (40-character unique identifier)
+
+#### Android
+
+1. **Install Android Studio** and set up Android SDK
+2. **Set environment variables**:
+   ```sh
+   export ANDROID_HOME=$HOME/Library/Android/sdk
+   export PATH=$PATH:$ANDROID_HOME/emulator
+   export PATH=$PATH:$ANDROID_HOME/platform-tools
+   ```
+3. **Install Appium**:
+   ```sh
+   npm install -g appium
+   ```
+4. **Install the UiAutomator2 driver**:
+   ```sh
+   appium driver install uiautomator2
+   ```
+5. **Start the Appium server**:
+   ```sh
+   appium
+   ```
+6. **Create an emulator** via Android Studio → Virtual Device Manager
+7. **Start the emulator** before running tests
+
+## Architecture
+
+### How It Works
+
+WebdriverIO MCP acts as a bridge between AI assistants and browser/mobile automation:
+
+```
+┌─────────────────┐     MCP Protocol      ┌─────────────────┐
+│ Claude Desktop  │ ◄──────────────────►  │    @wdio/mcp    │
+│ or Claude Code  │       (stdio)         │     Server      │
+└─────────────────┘                       └────────┬────────┘
+                                                   │
+                                           WebdriverIO API
+                                                   │
+                    ┌──────────────────────────────┼──────────────────────────────┐
+                    │                              │                              │
+            ┌───────▼───────┐              ┌───────▼───────┐              ┌───────▼───────┐
+            │    Chrome     │              │    Appium     │              │    Appium     │
+            │   (Browser)   │              │     (iOS)     │              │   (Android)   │
+            └───────────────┘              └───────────────┘              └───────────────┘
+```
+
+### Session Management
+
+- **Single-session model**: Only one browser OR app session can be active at a time
+- **Session state** is maintained globally across tool calls
+- **Auto-detach**: Sessions with preserved state (`noReset: true`) automatically detach on close
+
+### Element Detection
+
+#### Browser (Web)
+
+- Uses an optimized browser script to find all visible, interactable elements
+- Returns elements with CSS selectors, IDs, classes, and ARIA information
+- Filters to viewport-visible elements by default
+
+#### Mobile (Native Apps)
+
+- Uses efficient XML page source parsing (2 HTTP calls vs 600+ for traditional queries)
+- Platform-specific element classification for Android and iOS
+- Generates multiple locator strategies per element:
+  - Accessibility ID (cross-platform, most stable)
+  - Resource ID / Name attribute
+  - Text / Label matching
+  - XPath (full and simplified)
+  - UiAutomator (Android) / Predicates (iOS)
+
+## Selector Syntax
+
+The MCP server supports multiple selector strategies. See [Selectors](./mcp/selectors) for detailed documentation.
+
+### Web (CSS/XPath)
+
+```
+# CSS Selectors
+button.my-class
+#element-id
+[data-testid="login"]
+
+# XPath
+//button[@class='submit']
+//a[contains(text(), 'Click')]
+
+# Text Selectors (WebdriverIO specific)
+button=Exact Button Text
+a*=Partial Link Text
+```
+
+### Mobile (Cross-Platform)
+
+```
+# Accessibility ID (recommended - works on iOS & Android)
+~loginButton
+
+# Android UiAutomator
+android=new UiSelector().text("Login")
+
+# iOS Predicate String
+-ios predicate string:label == "Login"
+
+# iOS Class Chain
+-ios class chain:**/XCUIElementTypeButton[`label == "Login"`]
+
+# XPath (works on both platforms)
+//android.widget.Button[@text="Login"]
+//XCUIElementTypeButton[@label="Login"]
+```
+
+## Available Tools
+
+The MCP server provides 25 tools for browser and mobile automation. See [Tools](./mcp/tools) for the complete reference.
+
+### Browser Tools
+
+| Tool | Description |
+|------|-------------|
+| `start_browser` | Launch Chrome browser (with optional initial URL) |
+| `close_session` | Close or detach from session |
+| `navigate` | Navigate to a URL |
+| `click_element` | Click an element |
+| `set_value` | Type text into input |
+| `get_visible_elements` | Get visible/interactable elements (with pagination) |
+| `get_accessibility` | Get accessibility tree (with filtering) |
+| `take_screenshot` | Capture screenshot (auto-optimized) |
+| `scroll` | Scroll the page up or down |
+| `get_cookies` / `set_cookie` / `delete_cookies` | Cookie management |
+| `execute_script` | Execute JavaScript in browser |
+
+### Mobile Tools
+
+| Tool | Description |
+|------|-------------|
+| `start_app_session` | Launch iOS/Android app |
+| `tap_element` | Tap element or coordinates |
+| `swipe` | Swipe in a direction |
+| `drag_and_drop` | Drag between locations |
+| `get_app_state` | Check if app is running |
+| `get_contexts` / `switch_context` | Hybrid app context switching |
+| `rotate_device` | Rotate to portrait/landscape |
+| `get_geolocation` / `set_geolocation` | Get or set GPS coordinates |
+| `hide_keyboard` | Dismiss on-screen keyboard |
+| `execute_script` | Execute Appium mobile commands |
+
+## Automatic Handling
+
+### Permissions
+
+By default, the MCP server automatically grants app permissions (`autoGrantPermissions: true`), eliminating the need to manually handle permission dialogs during automation.
+
+### System Alerts
+
+System alerts (like "Allow notifications?") are automatically accepted by default (`autoAcceptAlerts: true`). This can be configured to dismiss instead with `autoDismissAlerts: true`.
+
+## Configuration
+
+### Environment Variables
+
+Configure the Appium server connection:
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `APPIUM_URL` | `127.0.0.1` | Appium server hostname |
+| `APPIUM_URL_PORT` | `4723` | Appium server port |
+| `APPIUM_PATH` | `/` | Appium server path |
+
+### Example with Custom Appium Server
+
+```json
+{
+  "mcpServers": {
+    "wdio-mcp": {
+      "command": "npx",
+      "args": ["-y", "@wdio/mcp"],
+      "env": {
+        "APPIUM_URL": "192.168.1.100",
+        "APPIUM_URL_PORT": "4724"
+      }
+    }
+  }
+}
+```
+
+## Performance Optimization
+
+The MCP server is optimized for efficient AI assistant communication:
+
+- **TOON Format**: Uses Token-Oriented Object Notation for minimal token usage
+- **XML Parsing**: Mobile element detection uses 2 HTTP calls (vs 600+ traditionally)
+- **Screenshot Compression**: Images auto-compressed to max 1MB using Sharp
+- **Viewport Filtering**: Only visible elements returned by default
+- **Pagination**: Large element lists can be paginated to reduce response size
+
+## TypeScript Support
+
+The MCP server is written in TypeScript and includes full type definitions. If you're extending or integrating with the server programmatically, you'll benefit from auto-completion and type safety.
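As one illustration of that type safety, the sketch below models the `mcpServers` entry and the three Appium environment variables from the Configuration section as TypeScript types. The names `AppiumEnv`, `McpServerEntry`, and `wdioMcpEntry` are hypothetical and are not exported by `@wdio/mcp`; only the JSON shape and the variable names come from this page.

```typescript
// Hypothetical types for illustration; @wdio/mcp does not export these.
interface AppiumEnv {
  APPIUM_URL?: string;      // Appium server hostname
  APPIUM_URL_PORT?: string; // Appium server port
  APPIUM_PATH?: string;     // Appium server path
}

interface McpServerEntry {
  command: string;
  args: string[];
  env?: AppiumEnv;
}

// Build the "wdio-mcp" entry, adding `env` only when overrides are given.
function wdioMcpEntry(env: AppiumEnv = {}): McpServerEntry {
  const entry: McpServerEntry = { command: "npx", args: ["-y", "@wdio/mcp"] };
  if (Object.keys(env).length > 0) {
    entry.env = env;
  }
  return entry;
}

const config = {
  mcpServers: {
    "wdio-mcp": wdioMcpEntry({ APPIUM_URL: "192.168.1.100", APPIUM_URL_PORT: "4724" }),
  },
};

// Serialize in the same layout as the JSON config examples.
console.log(JSON.stringify(config, null, 2));
```

Typing the environment block this way means a misspelled variable name is rejected at compile time instead of being silently ignored at runtime.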
+
+## Error Handling
+
+All tools are designed with robust error handling:
+
+- Errors are returned as text content (never thrown), maintaining MCP protocol stability
+- Descriptive error messages help diagnose issues
+- Session state is preserved even when individual operations fail
+
+## Use Cases
+
+### Quality Assurance
+
+- AI-powered test case execution
+- Visual regression testing with screenshots
+- Accessibility auditing via accessibility tree analysis
+
+### Web Scraping & Data Extraction
+
+- Navigate complex multi-page flows
+- Extract structured data from dynamic content
+- Handle authentication and session management
+
+### Mobile App Testing
+
+- Cross-platform test automation (iOS + Android)
+- Onboarding flow validation
+- Deep linking and navigation testing
+
+### Integration Testing
+
+- End-to-end workflow testing
+- API + UI integration verification
+- Multi-platform consistency checks
+
+## Troubleshooting
+
+### Browser won't start
+
+- Ensure Chrome is installed
+- Check that no other process is using the default debugging port (9222)
+- Try headless mode if display issues occur
+
+### Appium connection failed
+
+- Verify Appium server is running (`appium`)
+- Check the Appium URL and port configuration
+- Ensure the appropriate driver is installed (`appium driver list`)
+
+### iOS Simulator issues
+
+- Ensure Xcode is installed and up to date
+- Check that simulators are available (`xcrun simctl list devices`)
+- For real devices, verify the UDID is correct
+
+### Android Emulator issues
+
+- Ensure Android SDK is properly configured
+- Verify emulator is running (`adb devices`)
+- Check that `ANDROID_HOME` environment variable is set
+
+## Resources
+
+- [Tools Reference](./mcp/tools) - Complete list of available tools
+- [Selectors Guide](./mcp/selectors) - Selector syntax documentation
+- [Configuration](./mcp/configuration) - Configuration options
+- [FAQ](./mcp/faq) - Frequently asked questions
+- [GitHub Repository](https://github.com/webdriverio/mcp) - Source code and issues
+- [NPM Package](https://www.npmjs.com/package/@wdio/mcp) - Package on npm
+- [Model Context Protocol](https://modelcontextprotocol.io/) - MCP specification
website/docs/mcp/selectors.md (+394 −0, added)
@@ -0,0 +1,394 @@
+---
+id: selectors
+title: Selectors
+---
+
+The WebdriverIO MCP server supports multiple selector strategies for locating elements on web pages and mobile apps.
+
+:::info
+
+For comprehensive selector documentation including all WebdriverIO selector strategies, see the main [Selectors](/docs/selectors) guide. This page focuses on selectors commonly used with the MCP server.
+
+:::
+
+## Web Selectors
+
+For browser automation, the MCP server supports all standard WebdriverIO selectors. The most commonly used include:
+
+| Selector | Example | Description |
+|----------|---------|-------------|
+| CSS | `#login-button`, `.submit-btn` | Standard CSS selectors |
+| XPath | `//button[@id='submit']` | XPath expressions |
+| Text | `button=Submit`, `a*=Click` | WebdriverIO text selectors |
+| ARIA | `aria/Submit Button` | Accessibility name selectors |
+| Test ID | `[data-testid="submit"]` | Recommended for testing |
+
+For detailed examples and best practices, see the [Selectors](/docs/selectors) documentation.
+
+---
+
+## Mobile Selectors
+
+Mobile selectors work with both iOS and Android platforms through Appium.
+
+### Accessibility ID (Recommended)
+
+Accessibility IDs are the **most reliable cross-platform selector**. They work on both iOS and Android and are stable across app updates.
+
+```
+# Syntax
+~accessibilityId
+
+# Examples
+~loginButton
+~submitForm
+~usernameField
+```
+
+:::tip Best Practice
+Always prefer accessibility IDs when available. They provide:
+- Cross-platform compatibility (iOS + Android)
+- Stability across UI changes
+- Better test maintainability
+- Improved accessibility of your app
+:::
+
+### Android Selectors
+
+#### UiAutomator
+
+UiAutomator selectors are powerful and fast for Android.
+
+```
+# By Text
+android=new UiSelector().text("Login")
+
+# By Partial Text
+android=new UiSelector().textContains("Log")
+
+# By Resource ID
+android=new UiSelector().resourceId("com.example:id/login_button")
+
+# By Class Name
+android=new UiSelector().className("android.widget.Button")
+
+# By Description (Accessibility)
+android=new UiSelector().description("Login button")
+
+# Combined Conditions
+android=new UiSelector().className("android.widget.Button").text("Login")
+
+# Scrollable Container
+android=new UiScrollable(new UiSelector().scrollable(true)).scrollIntoView(new UiSelector().text("Item"))
+```
+
+#### Resource ID
+
+Resource IDs provide stable element identification on Android.
+
+```
+# Full Resource ID
+id=com.example.app:id/login_button
+
+# Partial ID (app package inferred)
+id=login_button
+```
+
+#### XPath (Android)
+
+XPath works on Android but is slower than UiAutomator.
+
+```
+# By Class and Text
+//android.widget.Button[@text='Login']
+
+# By Resource ID
+//android.widget.EditText[@resource-id='com.example:id/username']
+
+# By Content Description
+//android.widget.ImageButton[@content-desc='Menu']
+
+# Hierarchical
+//android.widget.LinearLayout/android.widget.Button[1]
+```
+
+### iOS Selectors
+
+#### Predicate String
+
+iOS Predicate Strings are fast and powerful for iOS automation.
+
+```
+# By Label
+-ios predicate string:label == "Login"
+
+# By Partial Label
+-ios predicate string:label CONTAINS "Log"
+
+# By Name
+-ios predicate string:name == "loginButton"
+
+# By Type
+-ios predicate string:type == "XCUIElementTypeButton"
+
+# By Value
+-ios predicate string:value == "ON"
+
+# Combined Conditions
+-ios predicate string:type == "XCUIElementTypeButton" AND label == "Login"
+
+# Visibility
+-ios predicate string:label == "Login" AND visible == 1
+
+# Case Insensitive
+-ios predicate string:label ==[c] "login"
+```
+
+**Predicate Operators:**
+
+| Operator | Description |
+|----------|-------------|
+| `==` | Equals |
+| `!=` | Not equals |
+| `CONTAINS` | Contains substring |
+| `BEGINSWITH` | Starts with |
+| `ENDSWITH` | Ends with |
+| `LIKE` | Wildcard match |
+| `MATCHES` | Regex match |
+| `AND` | Logical AND |
+| `OR` | Logical OR |
+
+#### Class Chain
+
+iOS Class Chains provide hierarchical element location with good performance.
+
+```
+# By Label (the leading **/ matches at any depth)
+-ios class chain:**/XCUIElementTypeButton[`label == "Login"`]
+
+# Any Descendant
+-ios class chain:**/XCUIElementTypeButton
+
+# By Index
+-ios class chain:**/XCUIElementTypeCell[3]
+
+# Combined with Predicate
+-ios class chain:**/XCUIElementTypeButton[`name == "submit" AND visible == 1`]
+
+# Hierarchical
+-ios class chain:**/XCUIElementTypeTable/XCUIElementTypeCell[`label == "Settings"`]
+
+# Last Element
+-ios class chain:**/XCUIElementTypeButton[-1]
+```
+
+#### XPath (iOS)
+
+XPath works on iOS but is slower than predicate strings.
+
+```
+# By Type and Label
+//XCUIElementTypeButton[@label='Login']
+
+# By Name
+//XCUIElementTypeTextField[@name='username']
+
+# By Value
+//XCUIElementTypeSwitch[@value='1']
+
+# Hierarchical
+//XCUIElementTypeTable/XCUIElementTypeCell[1]
+```
+
+---
+
+## Cross-Platform Selector Strategy
+
+When writing tests that need to work on both iOS and Android, use this priority order:
+
+### 1. Accessibility ID (Best)
+
+```
+# Works on both platforms
+~loginButton
+```
+
+### 2. Platform-Specific with Conditional Logic
+
+When accessibility IDs aren't available, use platform-specific selectors:
+
+**Android:**
+```
+android=new UiSelector().text("Login")
+```
+
+**iOS:**
+```
+-ios predicate string:label == "Login"
+```
+
+### 3. XPath (Last Resort)
+
+XPath works on both platforms but with different element types:
+
+**Android:**
+```
+//android.widget.Button[@text='Login']
+```
+
+**iOS:**
+```
+//XCUIElementTypeButton[@label='Login']
+```
+
+---
+
+## Element Types Reference
+
+### Android Element Types
+
+| Type | Description |
+|------|-------------|
+| `android.widget.Button` | Button |
+| `android.widget.EditText` | Text input |
+| `android.widget.TextView` | Text label |
+| `android.widget.ImageView` | Image |
+| `android.widget.ImageButton` | Image button |
+| `android.widget.CheckBox` | Checkbox |
+| `android.widget.RadioButton` | Radio button |
+| `android.widget.Switch` | Toggle switch |
+| `android.widget.Spinner` | Dropdown |
+| `android.widget.ListView` | List view |
+| `android.widget.RecyclerView` | Recycler view |
+| `android.widget.ScrollView` | Scroll container |
+
+### iOS Element Types
+
+| Type | Description |
+|------|-------------|
+| `XCUIElementTypeButton` | Button |
+| `XCUIElementTypeTextField` | Text input |
+| `XCUIElementTypeSecureTextField` | Password input |
+| `XCUIElementTypeStaticText` | Text label |
+| `XCUIElementTypeImage` | Image |
+| `XCUIElementTypeSwitch` | Toggle switch |
+| `XCUIElementTypeSlider` | Slider |
+| `XCUIElementTypePicker` | Picker wheel |
+| `XCUIElementTypeTable` | Table view |
+| `XCUIElementTypeCell` | Table cell |
+| `XCUIElementTypeCollectionView` | Collection view |
+| `XCUIElementTypeScrollView` | Scroll view |
+
+---
+
+## Best Practices
+
+### Do
+
+- **Use accessibility IDs** for stable, cross-platform selectors
+- **Add data-testid attributes** to web elements for testing
+- **Use resource IDs** on Android when accessibility IDs aren't available
+- **Prefer predicate strings** over XPath on iOS
+- **Keep selectors simple** and specific
+
+### Don't
+
+- **Write long XPath expressions** - they're slow and fragile
+- **Rely on indices** for dynamic lists
+- **Use text-based selectors** for localized apps
+- **Use absolute XPath** (starting from root)
+
+### Examples of Good vs Bad Selectors
+
+```
+# Good - Stable accessibility ID
+~loginButton
+
+# Bad - Fragile XPath with indices
+//div[3]/form/button[2]
+
+# Good - Specific CSS with test ID
+[data-testid="submit-button"]
+
+# Bad - Class that might change
+.btn-primary-lg-v2
+
+# Good - UiAutomator with resource ID
+android=new UiSelector().resourceId("com.app:id/submit")
+
+# Bad - Text that might be localized
+android=new UiSelector().text("Submit")
+```
+
+---
+
+## Debugging Selectors
+
+### Web (Chrome DevTools)
+
+1. Open Chrome DevTools (F12)
+2. Use the Elements panel to inspect elements
+3. Right-click an element → Copy → Copy selector
+4. Test selectors in Console: `document.querySelector('your-selector')`
+
+### Mobile (Appium Inspector)
+
+1. Start Appium Inspector
+2. Connect to your running session
+3. Click on elements to see all available attributes
+4. Use the "Search for element" feature to test selectors
+
+### Using `get_visible_elements`
+
+The MCP server's `get_visible_elements` tool returns multiple selector strategies for each element:
+
+```
+Ask Claude: "Get all visible elements on the screen"
+```
+
+This returns elements with pre-generated selectors you can use directly.
+ +#### Advanced Options + +For more control over element discovery: + +``` +# Get only images and visual elements +Get visible elements with elementType "visual" + +# Get elements with their coordinates for layout debugging +Get visible elements with includeBounds enabled + +# Get the next 20 elements (pagination) +Get visible elements with limit 20 and offset 20 + +# Include layout containers for debugging +Get visible elements with includeContainers enabled +``` + +The tool returns a paginated response: +```json +{ + "total": 42, + "showing": 20, + "hasMore": true, + "elements": [...] +} +``` + +### Using `get_accessibility` (Browser Only) + +For browser automation, the `get_accessibility` tool provides semantic information about page elements: + +``` +# Get all named accessibility nodes +Get accessibility tree + +# Filter to only buttons and links +Get accessibility tree filtered to button and link roles + +# Get next page of results +Get accessibility tree with limit 50 and offset 50 +``` + +This is useful when `get_visible_elements` doesn't return expected elements, as it queries the browser's native accessibility API.
website/docs/mcp/tools.md+681 −0 added@@ -0,0 +1,681 @@ +--- +id: tools +title: Tools +--- + +The following tools are available through the WebdriverIO MCP server. These tools enable AI assistants to automate browsers and mobile applications. + +## Session Management + +### `start_browser` + +Launches a Chrome browser session. + +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `headless` | boolean | No | `false` | Run Chrome in headless mode | +| `windowWidth` | number | No | `1920` | Browser window width (400-3840) | +| `windowHeight` | number | No | `1080` | Browser window height (400-2160) | +| `navigationUrl` | string | No | - | URL to navigate to after starting the browser | + +#### Example + +``` +Start a browser with 1920x1080 resolution and navigate to webdriver.io +``` + +#### Support + +- Desktop Browsers + +--- + +### `start_app_session` + +Launches a mobile app session on iOS or Android via Appium. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `platform` | string | Yes | - | Platform to automate: `iOS` or `Android` | +| `deviceName` | string | Yes | - | Name of the device or simulator/emulator | +| `appPath` | string | No* | - | Path to the app file (.app, .ipa, or .apk) | +| `platformVersion` | string | No | - | OS version (e.g., `17.0`, `14`) | +| `automationName` | string | No | Auto | `XCUITest` (iOS), `UiAutomator2` or `Espresso` (Android) | +| `udid` | string | No | - | Unique device identifier (required for real iOS devices) | +| `noReset` | boolean | No | `false` | Preserve app state between sessions | +| `fullReset` | boolean | No | `true` | Uninstall and reinstall app before session | +| `autoGrantPermissions` | boolean | No | `true` | Automatically grant app permissions | +| `autoAcceptAlerts` | boolean | No | `true` | Automatically accept system alerts | +| `autoDismissAlerts` | boolean | No | `false` | Dismiss (instead of accept) alerts | +| `appWaitActivity` | string | No | - | Activity to wait for on launch (Android only) | +| `newCommandTimeout` | number | No | `60` | Seconds before session times out due to inactivity | +| `appiumHost` | string | No | `127.0.0.1` | Appium server hostname | +| `appiumPort` | number | No | `4723` | Appium server port | +| `appiumPath` | string | No | `/` | Appium server path | + +*Either `appPath` must be provided, or `noReset: true` to connect to an already-running app. + +#### Example + +``` +Start an iOS app session on iPhone 15 simulator with my app at /path/to/app.app +``` + +#### Support + +- iOS Simulators +- iOS Real Devices +- Android Emulators +- Android Real Devices + +--- + +### `close_session` + +Closes the current browser or app session. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `detach` | boolean | No | `false` | Detach from session instead of closing (keeps browser/app running) | + +#### Notes + +Sessions with `noReset: true` or without `appPath` automatically detach on close to preserve state. + +#### Support + +- Desktop Browsers +- Mobile Apps + +--- + +## Navigation + +### `navigate` + +Navigates to a URL. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `url` | string | Yes | The URL to navigate to | + +#### Example + +``` +Navigate to https://webdriver.io +``` + +#### Support + +- Desktop Browsers + +--- + +## Element Interaction + +### `click_element` + +Clicks an element identified by a selector. + +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `selector` | string | Yes | - | CSS selector, XPath, or mobile selector | +| `scrollToView` | boolean | No | `true` | Scroll element into view before clicking | +| `timeout` | number | No | `3000` | Max time to wait for element (ms) | + +#### Notes + +- Supports WebdriverIO text selectors: `button=Exact text` or `a*=Contains text` +- Uses center alignment for scroll positioning + +#### Example + +``` +Click the element with selector "#submit-button" +``` + +#### Support + +- Desktop Browsers +- Mobile Native Apps + +--- + +### `set_value` + +Types text into an input field. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `selector` | string | Yes | - | Selector for the input element | +| `value` | string | Yes | - | Text to type | +| `scrollToView` | boolean | No | `true` | Scroll element into view before typing | +| `timeout` | number | No | `3000` | Max time to wait for element (ms) | + +#### Notes + +Clears existing value before typing new text. + +#### Example + +``` +Set the value "john@example.com" in the element with selector "#email" +``` + +#### Support + +- Desktop Browsers +- Mobile Native Apps + +--- + +## Page Analysis + +### `get_visible_elements` + +Gets visible and interactable elements on the current page or screen. This is the primary tool for discovering what elements are available for interaction. + +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `elementType` | string | No | `interactable` | Type of elements: `interactable` (buttons/links/inputs), `visual` (images/SVGs), or `all` | +| `inViewportOnly` | boolean | No | `true` | Only return elements visible in the viewport | +| `includeContainers` | boolean | No | `false` | Include layout containers (ViewGroup, ScrollView, etc.) | +| `includeBounds` | boolean | No | `false` | Include element coordinates (x, y, width, height) | +| `limit` | number | No | `0` | Maximum elements to return (0 = unlimited) | +| `offset` | number | No | `0` | Number of elements to skip (for pagination) | + +#### Returns + +```json +{ + "total": 42, + "showing": 20, + "hasMore": true, + "elements": [...] 
+} +``` + +**Web elements include:** tagName, type, id, className, textContent, value, placeholder, href, ariaLabel, role, cssSelector, isInViewport + +**Mobile elements include:** Multiple locator strategies (accessibility ID, resource ID, XPath, UiAutomator/predicates), element type, text, and optionally bounds + +#### Notes + +- **Web**: Uses an optimized browser script for fast element detection +- **Mobile**: Uses efficient XML page source parsing (2 HTTP calls vs 600+ for element queries) +- Use pagination (`limit` and `offset`) for large pages to reduce token usage + +#### Example + +``` +Get all visible elements on the page with their coordinates +``` + +#### Support + +- Desktop Browsers +- Mobile Apps + +--- + +### `get_accessibility` + +Gets the accessibility tree of the current page with semantic information about roles, names, and states. + +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `limit` | number | No | `100` | Maximum nodes to return (0 = unlimited) | +| `offset` | number | No | `0` | Number of nodes to skip (for pagination) | +| `roles` | string[] | No | All | Filter to specific roles (e.g., `["button", "link", "textbox"]`) | +| `namedOnly` | boolean | No | `true` | Only return nodes with a name/label | + +#### Returns + +```json +{ + "total": 85, + "showing": 100, + "hasMore": false, + "nodes": [ + { "role": "button", "name": "Submit" }, + { "role": "link", "name": "Home" } + ] +} +``` + +#### Notes + +- Browser-only. For mobile apps, use `get_visible_elements` instead +- Useful when `get_visible_elements` doesn't return expected elements +- `namedOnly: true` filters out anonymous containers and reduces noise + +#### Support + +- Desktop Browsers + +--- + +## Screenshots + +### `take_screenshot` + +Captures a screenshot of the current viewport. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `outputPath` | string | No | Path to save screenshot file. If omitted, returns base64 data | + +#### Returns + +Base64-encoded image data (PNG or JPEG) with size information. + +#### Notes + +Screenshots are automatically optimized: +- Maximum dimension: 2000px (scaled down if larger) +- Maximum file size: 1MB +- Format: PNG with max compression, or JPEG if needed to meet size limit + +#### Support + +- Desktop Browsers +- Mobile Apps + +--- + +## Scrolling + +### `scroll` + +Scrolls the page up or down by a specified number of pixels. + +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `direction` | string | Yes | - | Scroll direction: `up` or `down` | +| `pixels` | number | No | `500` | Number of pixels to scroll | + +#### Notes + +Browser-only. For mobile scrolling, use the `swipe` tool instead. + +#### Support + +- Desktop Browsers + +--- + +## Cookie Management + +### `get_cookies` + +Gets cookies from the current session. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `name` | string | No | Specific cookie name to retrieve (omit for all cookies) | + +#### Returns + +Cookie objects with name, value, domain, path, expiry, secure, and httpOnly properties. + +#### Support + +- Desktop Browsers + +--- + +### `set_cookie` + +Sets a cookie in the current session. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `name` | string | Yes | - | Cookie name | +| `value` | string | Yes | - | Cookie value | +| `domain` | string | No | Current | Cookie domain | +| `path` | string | No | `/` | Cookie path | +| `expiry` | number | No | - | Expiration as Unix timestamp (seconds) | +| `secure` | boolean | No | - | Secure flag | +| `httpOnly` | boolean | No | - | HttpOnly flag | +| `sameSite` | string | No | - | SameSite attribute: `strict`, `lax`, or `none` | + +#### Support + +- Desktop Browsers + +--- + +### `delete_cookies` + +Deletes cookies from the current session. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `name` | string | No | Specific cookie name to delete (omit to delete all) | + +#### Support + +- Desktop Browsers + +--- + +## Touch Gestures (Mobile) + +### `tap_element` + +Taps on an element or screen coordinates. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `selector` | string | No* | Selector for the element to tap | +| `x` | number | No* | X coordinate for tap | +| `y` | number | No* | Y coordinate for tap | + +*Either `selector` or both `x` and `y` are required. + +#### Support + +- Mobile Apps + +--- + +### `swipe` + +Performs a swipe gesture in the specified direction. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Default | Description | +|-----------|------|-----------|---------|-------------| +| `direction` | string | Yes | - | Swipe direction: `up`, `down`, `left`, `right` | +| `duration` | number | No | `500` | Swipe duration in milliseconds (100-5000) | +| `percent` | number | No | 0.5/0.95 | Percentage of screen to swipe (0-1) | + +#### Notes + +- Default percent: 0.5 for vertical swipes, 0.95 for horizontal swipes +- Direction indicates content movement: "swipe up" scrolls content up + +#### Example + +``` +Swipe up to scroll down the screen +``` + +#### Support + +- Mobile Apps + +--- + +### `drag_and_drop` + +Drags an element to another element or coordinates. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `sourceSelector` | string | Yes | Source element selector to drag | +| `targetSelector` | string | No* | Target element selector to drop onto | +| `x` | number | No* | Target X offset (if no targetSelector) | +| `y` | number | No* | Target Y offset (if no targetSelector) | +| `duration` | number | No | Default | Drag duration in milliseconds (100-5000) | + +*Either `targetSelector` or both `x` and `y` are required. + +#### Support + +- Mobile Apps + +--- + +## App Lifecycle (Mobile) + +### `get_app_state` + +Gets the current state of an app. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `bundleId` | string | Yes | App identifier (bundle ID for iOS, package name for Android) | + +#### Returns + +App state: `not installed`, `not running`, `running in background (suspended)`, `running in background`, or `running in foreground`. + +#### Support + +- Mobile Apps + +--- + +## Context Switching (Hybrid Apps) + +### `get_contexts` + +Lists all available contexts (native and webviews). 
+ +#### Parameters + +None + +#### Returns + +Array of context names (e.g., `["NATIVE_APP", "WEBVIEW_com.example.app"]`). + +#### Support + +- Mobile Hybrid Apps + +--- + +### `get_current_context` + +Gets the currently active context. + +#### Parameters + +None + +#### Returns + +Current context name (e.g., `NATIVE_APP` or `WEBVIEW_*`). + +#### Support + +- Mobile Hybrid Apps + +--- + +### `switch_context` + +Switches between native and webview contexts. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `context` | string | Yes | Context name or index (1-based) from `get_contexts` | + +#### Example + +``` +Switch to the WEBVIEW_com.example.app context +``` + +#### Support + +- Mobile Hybrid Apps + +--- + +## Device Control (Mobile) + +### `rotate_device` + +Rotates the device to a specific orientation. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `orientation` | string | Yes | `PORTRAIT` or `LANDSCAPE` | + +#### Support + +- Mobile Apps + +--- + +### `hide_keyboard` + +Hides the on-screen keyboard. + +#### Parameters + +None + +#### Support + +- Mobile Apps + +--- + +### `get_geolocation` + +Gets the current GPS coordinates. + +#### Parameters + +None + +#### Returns + +Object with `latitude`, `longitude`, and `altitude`. + +#### Support + +- Mobile Apps + +--- + +### `set_geolocation` + +Sets the device GPS coordinates. 
+ +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `latitude` | number | Yes | Latitude coordinate (-90 to 90) | +| `longitude` | number | Yes | Longitude coordinate (-180 to 180) | +| `altitude` | number | No | Altitude in meters | + +#### Example + +``` +Set geolocation to San Francisco (37.7749, -122.4194) +``` + +#### Support + +- Mobile Apps + +--- + +## Script Execution + +### `execute_script` + +Executes JavaScript in the browser or mobile commands via Appium. + +#### Parameters + +| Parameter | Type | Mandatory | Description | +|-----------|------|-----------|-------------| +| `script` | string | Yes | JavaScript code (browser) or mobile command (e.g., `mobile: pressKey`) | +| `args` | array | No | Arguments for the script | + +#### Browser Examples + +```javascript +// Get page title +execute_script({ script: "return document.title" }) + +// Get scroll position +execute_script({ script: "return window.scrollY" }) + +// Click element by selector +execute_script({ script: "arguments[0].click()", args: ["#myButton"] }) +``` + +#### Mobile (Appium) Examples + +```javascript +// Press back key (Android) +execute_script({ script: "mobile: pressKey", args: [{ keycode: 4 }] }) + +// Activate app +execute_script({ script: "mobile: activateApp", args: [{ appId: "com.example" }] }) + +// Terminate app +execute_script({ script: "mobile: terminateApp", args: [{ appId: "com.example" }] }) + +// Deep link +execute_script({ script: "mobile: deepLink", args: [{ url: "myapp://screen", package: "com.example" }] }) + +// Shell command (Android) +execute_script({ script: "mobile: shell", args: [{ command: "dumpsys", args: ["battery"] }] }) +``` + +#### Common Android Key Codes + +| Key | Code | +|-----|------| +| BACK | 4 | +| HOME | 3 | +| ENTER | 66 | +| MENU | 82 | +| SEARCH | 84 | + +#### More Mobile Commands + +For a complete list of available Appium mobile commands, see: +- [XCUITest Mobile 
Commands](https://appium.github.io/appium-xcuitest-driver/latest/reference/execute-methods/) (iOS) +- [UiAutomator2 Mobile Commands](https://github.com/appium/appium-uiautomator2-driver#mobile-commands) (Android) + +#### Support + +- Desktop Browsers +- Mobile Apps (via Appium mobile commands)
website/_sidebars.json +14 −0 modified
@@ -224,6 +224,20 @@
             "id": "visual-testing"
         }
     },
+    {
+        "type": "category",
+        "label": "MCP",
+        "items": [
+            "mcp/tools",
+            "mcp/selectors",
+            "mcp/configuration",
+            "mcp/faq"
+        ],
+        "link": {
+            "type": "doc",
+            "id": "mcp"
+        }
+    },
     {
         "type": "category",
         "label": "Migrate",
Vulnerability mechanics
Root cause
"The function getGitMetadataForAISelection() passes unsanitized git branch names and commit hashes directly into execSync() shell calls, enabling OS command injection."
Attack vector
An attacker supplies a malicious repository (via testOrchestrationOptions.runSmartSelection.source, or by placing the victim in a directory with a crafted .git) whose branch name contains shell metacharacters such as backticks or $(). When getGitMetadataForAISelection() calls execSync() with the unsanitized branch name [CWE-78], the injected metacharacters are interpreted by the shell, allowing arbitrary command execution. No authentication is required (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H). The payload is triggered during test orchestration metadata collection, which runs on CI/CD servers and developer workstations.
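To make the mechanism concrete, the following is a minimal sketch of what string interpolation produces when a branch name carries a payload. The helper `buildCommand` and the `git rev-parse` command are hypothetical stand-ins chosen for illustration; the exact commands used by the vulnerable function are not reproduced here. The sketch only builds the command text and never executes it.

```typescript
// Hypothetical helper (not from the WebdriverIO source): it only builds the
// command text that an interpolating execSync() call would hand to the shell.
function buildCommand(branch: string): string {
    return `git rev-parse ${branch}`
}

// A normal branch name yields a single git command.
const benign = buildCommand('main')

// Branch name taken from the example payload in the advisory text.
// Single quotes keep ${IFS} literal in TypeScript.
const hostile = buildCommand('main;touch${IFS}/tmp/pwned.txt')

console.log(benign)
console.log(hostile)
```

In the second string the `;` ends the git command, so a shell would treat everything after it as a separate command. That is why interpolating a branch name into a shell command string is unsafe.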
Affected code
The vulnerable code is in packages/wdio-browserstack-service/src/testorchestration/helpers.ts, specifically the getGitMetadataForAISelection() function and its helpers getBaseBranch() and getChangedFilesFromCommits(). These functions used execSync() with string interpolation of git branch names, commit hashes, and other git refs without sanitization.
What the fix does
The patch replaces all execSync() calls with spawnSync() using array arguments, which prevents shell interpretation of special characters [patch_id=811667]. It also introduces a SAFE_GIT_REF_PATTERN (/^[a-zA-Z0-9_./-]+$/) and an isValidGitRef() validation function that rejects any git ref containing shell metacharacters. Every branch name, commit hash, and parent hash is validated before use, and invalid values cause the operation to be skipped with a warning rather than executed unsafely.
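The two measures described above can be sketched roughly as follows. This is an illustrative approximation, not the patch itself: the pattern and the name `isValidGitRef` come from the description, while the helper `resolveRef`, its `git rev-parse --verify` command, and the warning text are assumptions made for this example.

```typescript
import { spawnSync } from 'node:child_process'

// Pattern as quoted in the fix description: letters, digits, '_', '.', '/', '-'.
const SAFE_GIT_REF_PATTERN = /^[a-zA-Z0-9_./-]+$/

// Returns true only when every character of the ref is in the allowed set.
function isValidGitRef(ref: string): boolean {
    return SAFE_GIT_REF_PATTERN.test(ref)
}

// Hypothetical helper (an assumption, not the patched function): resolves a
// ref to a commit hash, or returns null if the ref is rejected or git fails.
function resolveRef(ref: string, cwd: string = process.cwd()): string | null {
    if (!isValidGitRef(ref)) {
        // Skip with a warning instead of running anything with the bad value.
        console.warn(`Skipping invalid git ref: ${JSON.stringify(ref)}`)
        return null
    }
    // Array arguments go straight to the git binary; no shell parses them.
    const result = spawnSync('git', ['rev-parse', '--verify', ref], {
        cwd,
        encoding: 'utf8'
    })
    if (result.status !== 0) {
        return null
    }
    return result.stdout.trim()
}
```

With this shape, a ref such as `main;touch${IFS}/tmp/pwned.txt` fails validation and is never passed to a process. Even a value that slipped past validation would reach git as a single argument rather than as shell syntax.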
Preconditions
- config: The victim must run test orchestration with AI selection enabled (testOrchestrationOptions.runSmartSelection) or be in a directory controlled by the attacker.
- input: The attacker must control a git repository whose branch name contains shell metacharacters (e.g., $(malicious_command)).
- auth: No authentication is required; the attack can be triggered remotely by pointing the tool at a malicious repository.
Reproduction
The advisory at https://github.com/webdriverio/webdriverio/security/advisories/GHSA-5c46-x3qw-q7j7 describes the vulnerability but does not include a standalone reproduction script or PoC code. No public exploit/PoC references beyond the advisory itself are provided in the bundle.
Generated on May 20, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References (5)
- github.com/webdriverio/webdriverio/security/advisories/GHSA-5c46-x3qw-q7j7 (nvd: Exploit, Vendor Advisory)
- github.com/advisories/GHSA-5c46-x3qw-q7j7 (ghsa: Advisory)
- github.com/webdriverio/webdriverio/blob/ea0e3e00288abced4c739ff9e46c46977b7cdbd2/packages/wdio-browserstack-service/src/testorchestration/helpers.ts (nvd: Product)
- github.com/webdriverio/webdriverio/releases/tag/v9.24.0 (nvd: Product, Release Notes)
- nvd.nist.gov/vuln/detail/CVE-2026-25244 (ghsa)