🔥 AI-Driven Android Device Intelligent Manager - Understand and control your phone like Doubao! 🔥
This is an AI-powered Android ADB server developed in Java, integrated with advanced AI vision models. It can see your phone screen and control the device through natural language commands. It's like a "simplified Doubao phone assistant" running on your computer, enabling intelligent interaction with your phone via command line.
- Screen Content Recognition: Integrates advanced AI vision models like GPT-4V/Qwen2-VL to understand text, images, and interface elements on phone screens like humans do
- Intelligent Element Identification: Automatically recognizes UI elements such as buttons, input fields, and app icons, and labels their positions and functions
- Contextual Description: Intelligently describes current phone interface scenes based on context (e.g., "WeChat chat interface", "Taobao product details page")
- Voice-Level Commands: Supports natural language commands like "tap WeChat icon", "input verification code 123456", "return to home screen"
- Multi-Step Task Execution: Understands and executes complex multi-step commands (e.g., "open camera, take a photo, then share to Moments")
- Intelligent Decision Making: Makes reasonable inferences based on context when encountering ambiguities
- App Manager: Install/uninstall/launch/stop apps, manage app permissions and data
- Screen Control: Unlock/lock screen, get resolution, real-time screenshots
- Input Simulation: Simulate text input, key presses, taps, swipes, and complex gestures
- Device Monitoring: Get device information, running status, battery level, and other system parameters
- Vision Model Integration: Supports multiple AI vision models including OpenAI GPT-4V, Qwen2-VL
- Context Understanding: Model context management based on MCP protocol
- Natural Language Processing: Intelligently parses user commands and generates execution plans
- Java 11: High-performance backend implementation
- Appium Java Client: Stable Android device communication layer
- OpenAI API: AI vision model interface
- SLF4J/Logback: Enterprise-level logging system
- Maven: Standardized project build
- Java 11+: Runtime environment
- Android Device: USB debugging enabled (Android 5.0+)
- ADB Tool: Android SDK Platform Tools
- AI API Key: OpenAI API key (for vision features, optional)
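The vision features depend on an OpenAI-style API and an API key. As a rough illustration only, the sketch below shows how a screenshot might be packaged into a chat-completions request for a vision model using the HTTP client built into Java 11. The class and method names (`VisionRequestSketch`, `buildBody`, `buildRequest`) are hypothetical, and the `/chat/completions` path is an assumption based on the default base URL `https://api.openai.com/v1`; the project's actual client code may look quite different.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch: builds an OpenAI-style vision request for one screenshot. */
final class VisionRequestSketch {

    /** Minimal JSON string escaping for the prompt and model name. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Embeds the PNG bytes as a base64 data URL next to the text prompt. */
    static String buildBody(String model, String prompt, byte[] pngBytes) {
        String dataUrl = "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
        return "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":\"" + jsonEscape(prompt) + "\"},"
                + "{\"type\":\"image_url\",\"image_url\":{\"url\":\"" + dataUrl + "\"}}"
                + "]}]}";
    }

    /** Assumes an OpenAI-compatible endpoint at <apiBaseUrl>/chat/completions. */
    static HttpRequest buildRequest(String apiBaseUrl, String apiKey, String body) {
        return HttpRequest.newBuilder(URI.create(apiBaseUrl + "/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        byte[] fakeScreenshot = {1, 2, 3}; // placeholder bytes, not a real PNG
        String body = buildBody("qwen2.5-vl-7b-instruct", "Describe this screen.", fakeScreenshot);
        HttpRequest request = buildRequest("https://api.openai.com/v1", "your-openai-api-key", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request is only built here, not sent. Sending it would need `java.net.http.HttpClient` plus error handling and response parsing, which are left out to keep the sketch short.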
git clone <repository-url>
cd mcp-easy-doubao-phone
mvn clean package
After a successful build, the AI assistant JAR package will be located in the target directory.
export OPENAI_API_KEY="your-openai-api-key"
export VISION_MODEL="qwen2.5-vl-7b-instruct" # Or use gpt-4-vision-preview
java -jar target/mcp-easy-doubao-phone-1.0.0-jar-with-dependencies.jar -d <device-id>
# Start the AI assistant
java -jar target/mcp-easy-doubao-phone-1.0.0-jar-with-dependencies.jar -d emulator-5554
# View current screen content
> describe-screen
# Identify and click WeChat icon
> find-and-click "WeChat"
# Input text
> input-text "Hello, this is Doubao Phone Assistant!"
# Swipe screen
> swipe 100 500 900 500
# Describe current interface
> describe-screen
# AI Response: "Currently displaying WeChat chat interface, top shows chat contact 'Zhang San', middle shows chat history, bottom shows input field"
# Intelligently find elements
> find-element "send button"
# AI Response: "Found send button, position: (900, 1700), size: (100x50)"
# Execute complex tasks
> execute "open settings, find Wi-Fi option, connect to network named 'HomeWiFi'"
# Assistant automatically executes: unlock screen → open settings → tap Wi-Fi → select HomeWiFi → wait for connection

- describe-screen: Describe current screen content in natural language
- annotate-screen: Annotate all interactive elements on the screen
- find-element <description>: Find specific screen elements based on description
- find-and-click <description>: Find and click specified elements

- get-device-info: Get detailed device information
- get-battery-status: Check battery status
- unlock-screen: Unlock the screen
- lock-screen: Lock the screen

- install-app <apk-path>: Install applications
- uninstall-app <package-name>: Uninstall applications
- launch-app <package-name>: Launch applications
- stop-app <package-name>: Stop applications
- clear-app-data <package-name>: Clear application data
- get-running-apps: Get list of running applications

- input-text <text>: Input text
- press-key <keycode>: Simulate key presses
- tap <x> <y>: Simulate taps
- swipe <x1> <y1> <x2> <y2>: Simulate swipes
- long-press <x> <y> <duration>: Simulate long presses
- pinch <x> <y> <scale>: Simulate pinch gestures

- screenshot: Take and save screenshots
- get-screen-resolution: Get screen resolution
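For readers curious what the input commands above correspond to at the ADB level, here is a minimal sketch that maps a few of them onto standard `adb shell input` invocations. It is not taken from the project, which lists Appium Java Client as its device communication layer; the class `AdbInputSketch` and its methods are hypothetical, and the long press is approximated by a swipe that starts and ends at the same point.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mapping from assistant input commands to `adb shell input` argument lists. */
final class AdbInputSketch {

    private final String adbPath;  // e.g. the ADB_PATH setting
    private final String deviceId; // e.g. the DEVICE_ID setting

    AdbInputSketch(String adbPath, String deviceId) {
        this.adbPath = adbPath;
        this.deviceId = deviceId;
    }

    /** Prefixes every command with "<adb> -s <device> shell input". */
    private List<String> input(Object... inputArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(adbPath, "-s", deviceId, "shell", "input"));
        for (Object arg : inputArgs) {
            cmd.add(String.valueOf(arg));
        }
        return cmd;
    }

    List<String> tap(int x, int y) {
        return input("tap", x, y);
    }

    List<String> swipe(int x1, int y1, int x2, int y2) {
        return input("swipe", x1, y1, x2, y2);
    }

    /** Approximates a long press as a zero-distance swipe held for durationMs. */
    List<String> longPress(int x, int y, int durationMs) {
        return input("swipe", x, y, x, y, durationMs);
    }

    List<String> pressKey(int keycode) {
        return input("keyevent", keycode);
    }

    /** `input text` treats %s as a space; other shell metacharacters are NOT escaped here. */
    List<String> text(String text) {
        return input("text", text.replace(" ", "%s"));
    }

    /** Runs a command and returns its exit code; requires adb and a connected device. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        AdbInputSketch adb = new AdbInputSketch("adb", "emulator-5554");
        // Only prints the commands, so this runs without a device attached.
        System.out.println(String.join(" ", adb.tap(900, 1700)));
        System.out.println(String.join(" ", adb.swipe(100, 500, 900, 500)));
    }
}
```

Multi-touch gestures such as pinch have no direct `input` subcommand, so they are left out of this sketch.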
| Variable Name | Description | Default Value | Purpose |
|---|---|---|---|
| DEVICE_ID | Device ID (obtained from adb devices) | None | Required, specifies the device to control |
| ADB_PATH | ADB tool path | adb | Optional, specifies ADB location |
| OPENAI_API_KEY | AI API key | None | Optional, enables vision features |
| VISION_MODEL | Vision model name | qwen2.5-vl-7b-instruct | Optional, selects AI model |
| API_BASE_URL | API base URL | https://api.openai.com/v1 | Optional, customizes API address |
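The defaults in the table can be read as a simple lookup with fallbacks. The sketch below is one possible way to resolve these settings in Java. The class name `AssistantConfigSketch` and its fields are hypothetical, and treating a missing `DEVICE_ID` as an error is an assumption based on the "Required" note in the table.

```java
import java.util.Map;

/** Hypothetical sketch: resolves the documented settings with their default values. */
final class AssistantConfigSketch {

    final String deviceId;
    final String adbPath;
    final String apiKey; // may be null, in which case vision features stay off
    final String visionModel;
    final String apiBaseUrl;

    private AssistantConfigSketch(String deviceId, String adbPath, String apiKey,
                                  String visionModel, String apiBaseUrl) {
        this.deviceId = deviceId;
        this.adbPath = adbPath;
        this.apiKey = apiKey;
        this.visionModel = visionModel;
        this.apiBaseUrl = apiBaseUrl;
    }

    /** Pass System.getenv() in real use; a plain map keeps the sketch easy to try out. */
    static AssistantConfigSketch fromEnv(Map<String, String> env) {
        String deviceId = env.get("DEVICE_ID");
        if (deviceId == null || deviceId.isBlank()) {
            throw new IllegalArgumentException("DEVICE_ID is required (see `adb devices`)");
        }
        return new AssistantConfigSketch(
                deviceId,
                env.getOrDefault("ADB_PATH", "adb"),
                env.get("OPENAI_API_KEY"),
                env.getOrDefault("VISION_MODEL", "qwen2.5-vl-7b-instruct"),
                env.getOrDefault("API_BASE_URL", "https://api.openai.com/v1"));
    }

    /** Vision features are optional and only enabled when an API key is present. */
    boolean visionEnabled() {
        return apiKey != null && !apiKey.isBlank();
    }

    public static void main(String[] args) {
        AssistantConfigSketch cfg = fromEnv(Map.of("DEVICE_ID", "emulator-5554"));
        System.out.println(cfg.adbPath + " " + cfg.visionModel + " vision=" + cfg.visionEnabled());
    }
}
```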
java -jar mcp-easy-doubao-phone-1.0.0-jar-with-dependencies.jar \
-d <device-id> # Device ID
-a <adb-path> # ADB path
-k <api-key> # AI API key
-m <vision-model> # Vision model
-u <api-url> # API base URL
- Intelligently identifies UI elements without hardcoding coordinates
- Describes test steps in natural language, reducing maintenance costs
- Provides voice descriptions of screen content for visually impaired users
- Simplifies complex phone operation processes
- Application of AI vision models in mobile interface understanding
- Bridge between natural language commands and device control
- Intelligently recognizes game interface elements
- Automatically completes repetitive operations
- Voice Interaction: Support voice input and output
- Multi-Device Management: Control multiple Android devices simultaneously
- More Powerful AI: Support locally deployed vision models
- Web Interface: Provide visual control panel
- Task Automation: Support recording and playback of operation sequences
MIT License - See LICENSE file for details
Issues and Pull Requests are welcome. Let's build a more powerful Doubao phone assistant together!
💡 Tip: This project requires an Android device. Ensure ADB is correctly installed and USB debugging is enabled. Vision features require a valid OpenAI API key.
Technical Support: Running into problems? Check the project documentation or submit an Issue for help.