Skip to content

feat(video): add video recording and transcription tools for mobile devices#160

Merged
plfavreau merged 15 commits intomainfrom
feat/start-stop-video-transcription
Jan 7, 2026
Merged

feat(video): add video recording and transcription tools for mobile devices#160
plfavreau merged 15 commits intomainfrom
feat/start-stop-video-transcription

Conversation

@plfavreau
Copy link
Copy Markdown
Collaborator

@plfavreau plfavreau commented Jan 6, 2026

🚀 What's new?

Introduces opt-in video recording and transcription capabilities for mobile devices, enabling agents to record screen interactions and analyze them using LLM-based transcription. Includes CLI flag activation, video compression for API limits, ffmpeg validation, and full Android/iOS controller support with example implementation.

🤔 Type of Change

What kind of change is this? Mark with an x

  • Bug fix (non-breaking change that solves an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update to the docs)

✅ Checklist

Before you submit, please make sure you've done the following. If you have any questions, we're here to help!

  • I have read the Contributing Guide.
  • My code follows the project's style guidelines (ruff check . and ruff format . pass).
  • I have added necessary documentation (if applicable).

💬 Any questions or comments?

Have a question or need some help? Join us on Discord!

Summary by CodeRabbit

  • New Features

    • Mobile video recording for Android and iOS with start/stop workflows, session management, and tooling integration
    • AI-powered video analysis utility with SDK/tools support and example transcription workflow
    • CLI option to enable video recording tools with FFmpeg availability check
  • Configuration

    • New configurable video_analyzer block (provider, model, fallback) and guard flags to enable/disable recording
    • Recommended model defaults updated to newer Gemini previews and adjusted fallbacks

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

Adds end-to-end optional video recording and analysis: new video_analyzer config, FFmpeg-backed utilities, device recording APIs for Android/iOS with segmenting, start/stop tools and wrappers, planner/executor integration to include video tools when enabled, CLI/builder flags, and an async analyze_video agent that sends video attachments to an LLM with fallback.

Changes

Cohort / File(s) Summary
LLM config & templates
llm-config.defaults.jsonc, llm-config.override.template.jsonc
Add optional utils.video_analyzer block (provider/model/fallback) in defaults/recommended and update docs to reference llm-config.override.jsonc.
Config model
minitap/mobile_use/config.py
Add video_analyzer to LLMConfigUtils; conditional validation, include Video Analyzer in str, and error on missing requested utils.
Context & SDK types
minitap/mobile_use/context.py, minitap/mobile_use/sdk/types/agent.py
Add video_recording_enabled: bool to MobileUseContext and AgentConfig.
Builder / CLI / Runtime wiring
minitap/mobile_use/sdk/builders/agent_config_builder.py, minitap/mobile_use/main.py, minitap/mobile_use/sdk/agent.py
New with_video_recording_tools() on builder (checks FFmpeg), CLI flag --with-video-recording-tools, pass video_recording_enabled through run_automation and AgentConfig; SDK enforces video_analyzer presence when enabling recording.
Video utilities & session management
minitap/mobile_use/utils/video.py
New module: RecordingSession, VideoRecordingResult, session registry APIs, FFmpeg checks/exception, concatenate/compress/cleanup helpers, duration/size constants.
Device controller interface & implementations
minitap/mobile_use/controllers/device_controller.py, .../android_controller.py, .../ios_controller.py, .../controller_factory.py
Add abstract start_video_recording/stop_video_recording; Android: segmented recording with auto-restart loop, pull/concatenate segments; iOS: idb-based recording, add device_id ctor param; factory updated to pass device_id.
Tools & tool wrappers
minitap/mobile_use/tools/mobile/video_recording.py, minitap/mobile_use/tools/index.py
New start/stop recording BaseTool factories and ToolWrapper entries; export VIDEO_RECORDING_WRAPPERS and integrate into tool selection paths when enabled.
Agent video analyzer
minitap/mobile_use/agents/video_analyzer/__init__.py, .../video_analyzer.py, video_analyzer.md, human.md
New package exposing analyze_video(ctx, video_path, prompt) that compresses/encodes video, loads prompts, builds LLMSession with system+human messages and attachment, uses primary+fallback LLM configs, sends request with timeout, and returns text.
Planner / executor / cortex / graph
minitap/mobile_use/agents/planner/planner.py, planner.md, minitap/mobile_use/agents/cortex/cortex.py, minitap/mobile_use/agents/executor/executor.py, minitap/mobile_use/graph/graph.py
Conditionally include VIDEO_RECORDING_WRAPPERS into executor wrappers when ctx.video_recording_enabled; planner templates receive flag; add planning guideline for start→interact→stop recording.
Examples & docs
minitap/mobile_use/sdk/examples/video_transcription_example.py
New example demonstrating building a video-capable profile, enabling recording tools, recording flow, and using video analyzer to transcribe playback.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent / Planner
    participant Executor as Executor (tools)
    participant Controller as Device Controller
    participant Device as Mobile Device
    participant Video as Local Video File
    participant Analyzer as Video Analyzer
    participant LLM as LLM (primary/fallback)

    Agent->>Executor: start_video_recording tool
    Executor->>Controller: start_video_recording()
    Controller->>Device: start screenrecord (adb/idb)
    Device-->>Controller: recording started
    Controller-->>Executor: VideoRecordingResult(success)
    Executor-->>Agent: tool result

    Agent->>Device: perform interactions
    Device-->>Agent: observations/state

    Agent->>Executor: stop_video_recording tool
    Executor->>Controller: stop_video_recording()
    Controller->>Device: stop & pull segments
    Controller->>Video: concatenate/compress -> final file
    Controller-->>Executor: VideoRecordingResult(video_path)
    Executor-->>Agent: tool result with video_path

    Agent->>Analyzer: analyze_video(ctx, video_path, prompt)
    Analyzer->>Analyzer: compress & encode
    Analyzer->>LLM: SystemMessage + HumanMessage(with attachment)
    alt primary succeeds
      LLM-->>Analyzer: analysis response
    else primary fails
      Analyzer->>LLM: retry with fallback config
      LLM-->>Analyzer: analysis response
    end
    Analyzer-->>Agent: analysis text
    Note over Video,Agent: temporary files cleaned up
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Suggested reviewers

  • KYN4
  • cguiguet

Poem

🐰 I hopped through frames to stitch and see,
I trimmed and baked each clip to fit,
Gemini read, with fallback lit,
Start, interact, stop — then send to me,
I sing your screens — hooray! 🎬✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.10% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically summarizes the main feature addition: video recording and transcription tools for mobile devices, which aligns with the primary changes throughout the changeset.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 075a674 and 6cc7ebf.

📒 Files selected for processing (1)
  • llm-config.defaults.jsonc
🚧 Files skipped from review as they are similar to previous changes (1)
  • llm-config.defaults.jsonc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ios-tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@plfavreau plfavreau force-pushed the feat/start-stop-video-transcription branch from fa16703 to 94a324a Compare January 6, 2026 12:09
Comment on lines +36 to +45

Use this tool when you need to capture what happens on the screen over time,
such as watching a video, observing animations, or recording a user flow.

The recording will continue until stop_video_recording is called.
Maximum recording time is 15 minutes (or 3 minutes on Android).

Note: Audio is typically NOT recorded. On Android, screenrecord does not capture
audio. On iOS simulators, audio is also not captured.
"""
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Python docstring was pushed by mistake and will be fixed in the next commit. Only the tool description itself is needed here, no guidance about when to use it should be included

Comment thread minitap/all-projects.code-workspace Outdated
Comment on lines +1 to +10
{
"folders": [
{
"path": ".."
}
],
"settings": {
"python.languageServer": "None"
}
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has been pushed by mistake and is removed in a next commit x)


# Get video duration using ffprobe
duration_cmd = [
"ffprobe",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes with FFmpeg ;)

Comment on lines +87 to +91
def get_openrouter_video_capable_llm_config() -> LLMConfig:
"""
Returns the LLM config from the override file
"""
return initialize_llm_config()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own tests, I pushed this by mistake and it's fixed in next commits

@plfavreau plfavreau force-pushed the feat/start-stop-video-transcription branch from 087dd92 to dbe6c80 Compare January 6, 2026 12:33
@plfavreau plfavreau changed the title feat: add video recording and transcription tools for mobile devices feat(video): add video recording and transcription tools for mobile devices Jan 6, 2026
@plfavreau plfavreau marked this pull request as ready for review January 6, 2026 12:34
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Fix all issues with AI Agents
In @llm-config.defaults.jsonc:
- Around line 61-72: The closing brace for the "outputter" object is missing a
trailing comma, which will break the JSONC if the commented "video_analyzer"
block is uncommented; add a trailing comma after the "outputter" closing brace
so the subsequent "video_analyzer" block (provider/model/fallback) can be
enabled safely and remain consistent with the "recommended" section.

In @minitap/mobile_use/agents/video_analyzer/video_analyzer.py:
- Around line 43-55: The code derives MIME type from video_path.suffix but may
be using a different file returned by compress_video_for_api (compressed_path)
and also never removes a temporary compressed file; update the MIME type
computation to use compressed_path.suffix (not video_path.suffix) and ensure
cleanup by deleting compressed_path (e.g., Path.unlink()) in a finally block
when compress_video_for_api returns a different path than the original,
referencing the compress_video_for_api, compressed_path and video_path symbols
to locate and modify the logic.

In @minitap/mobile_use/config.py:
- Around line 96-105: The AgentNode literal incorrectly includes
"video_analyzer"—remove "video_analyzer" from the AgentNode definition so it
only lists actual agent attributes; keep "video_analyzer" in LLMUtilsNode and
ensure any code expecting the util continues to access it via get_utils() /
LLMConfigUtils (as done in video_analyzer.py where it's retrieved with
is_utils=True) rather than via get_agent() which uses getattr(self, item).

In @minitap/mobile_use/controllers/android_controller.py:
- Around line 302-306: The code currently constructs a shell string with f'adb
-s {self.device_id} shell "{cmd}"' and calls asyncio.create_subprocess_shell,
which risks shell injection via self.device_id; change to
asyncio.create_subprocess_exec and pass arguments as a list to avoid a shell:
call asyncio.create_subprocess_exec('adb', '-s', self.device_id, 'shell', cmd,
stdout=..., stderr=...); keep the same stdout/stderr pipes and update any
await/process handling accordingly so you no longer rely on shell interpolation
in the create_subprocess_shell call.
- Around line 471-472: The code uses Path.rename() on all_segments[0] which can
fail across filesystems; replace the rename call with shutil.move to reliably
move the file between temp and output locations (e.g., use
shutil.move(str(all_segments[0]), str(output_path))). Add an import for shutil
if missing and wrap the move in a try/except to surface/log errors consistently
with existing error handling; keep using the same variables all_segments and
output_path so the change is localized to that branch.

In @minitap/mobile_use/controllers/ios_controller.py:
- Around line 334-378: start_video_recording accepts max_duration_seconds but
never enforces it; either enforce it or stop advertising it in the success
message. To enforce: when creating the RecordingSession in
start_video_recording, spawn an asyncio background task (e.g., via
asyncio.create_task) that awaits asyncio.sleep(max_duration_seconds) and then
calls stop_video_recording(device_id) (guarded to ensure the session is still
active), store that task on the RecordingSession (e.g., session.timeout_task)
and cancel it in stop_video_recording when stopping normally; ensure exceptions
in the background task are caught and logged. Alternatively, if you choose not
to enforce it, remove max_duration_seconds from the success message and adjust
the start_video_recording signature/docs to mark it advisory only.

In @minitap/mobile_use/sdk/builders/agent_config_builder.py:
- Around line 210-239: Update the docstring for
AgentConfigBuilder.with_video_recording_tools to correct the timing of
validation: replace the line claiming "ValueError: At build() time if any
profile lacks video_analyzer config" with a statement that the video_analyzer
config is validated later when the Agent is initialized (e.g., in the Agent
class) or simply say a ValueError will be raised when the agent is created/used
if profiles lack video_analyzer, and keep the reference to FFmpeg and the
supported models unchanged.

In @minitap/mobile_use/tools/mobile/video_recording.py:
- Around line 137-144: The finally block currently tries to remove the temp
video file and then calls video_path.parent.rmdir(), which will raise if the
directory is not empty (e.g., compressed file left by compress_video_for_api)
and silently leaks because exceptions are swallowed; update the cleanup in the
finally block (referencing video_path and its parent) to robustly remove the
entire temp directory using shutil.rmtree(video_path.parent, ignore_errors=True)
or explicitly delete all files in that directory before removing it, and remove
the bare except/pass so failures are handled or at least logged via the module
logger.

In @minitap/mobile_use/utils/video.py:
- Around line 128-154: The temporary list file created as list_file is only
unlinked after awaiting process.wait() and is not guaranteed to be removed if
asyncio.create_subprocess_exec or process.wait() raises, so move the cleanup
into a finally block: wrap the subprocess invocation and await in a try/except
as needed but ensure a finally block calls list_file.unlink() (guarding for
existence and errors), and keep returning output_path.exists() on success or
False on error; reference the existing list_file variable,
asyncio.create_subprocess_exec/ffmpeg invocation, process.wait(), and the
logger.error path when implementing the change.
- Line 125: Replace the use of Path.rename on the temp segment with shutil.move
to handle cross-filesystem moves: import shutil at the top and call shutil.move
with the source segment (segments[0]) and output_path (convert to str if
necessary) instead of segments[0].rename(output_path) so the operation succeeds
even when source and destination are on different filesystems.
🧹 Nitpick comments (6)
minitap/mobile_use/controllers/device_controller.py (1)

150-164: Use the DEFAULT_MAX_DURATION_SECONDS constant for consistency.

Both AndroidDeviceController and iOSDeviceController use DEFAULT_MAX_DURATION_SECONDS from minitap.mobile_use.utils.video as the default value. Using the same constant here ensures consistency and maintains a single source of truth for the default duration.

🔎 Proposed change
+from minitap.mobile_use.utils.video import DEFAULT_MAX_DURATION_SECONDS, VideoRecordingResult
-from minitap.mobile_use.utils.video import VideoRecordingResult

     @abstractmethod
     async def start_video_recording(
         self,
-        max_duration_seconds: int = 900,
+        max_duration_seconds: int = DEFAULT_MAX_DURATION_SECONDS,
     ) -> VideoRecordingResult:
minitap/mobile_use/sdk/agent.py (2)

608-616: Consider using a more specific exception type.

The validation correctly guards against misconfiguration, but using ValueError is generic. Consider using a more specific exception from the SDK's exception module (e.g., AgentError or AgentProfileNotFoundError) for better error categorization and handling.

🔎 Proposed refactor
-        raise ValueError(
+        raise AgentError(
             f"with_video_recording_tools() requires 'video_analyzer' in utils for "
             f"profile '{agent_profile.name}'. Add 'video_analyzer' with a "
             f"video-capable model (e.g., gemini-3-flash-preview)."
         )

608-665: Minor redundancy in video recording enablement logic.

The validation at lines 608-616 raises an error when video_recording_enabled is True but video_analyzer is None. Subsequently, lines 662-665 check both conditions again when setting the context flag. While this defensive approach ensures safety, the second check of video_analyzer is not None is redundant given the earlier validation would have already raised an exception.

This is not a functional issue, just a maintainability note for potential simplification.

minitap/mobile_use/graph/graph.py (1)

113-121: LGTM with a note on code pattern.

The dynamic tool construction correctly implements video recording capability integration. The pattern of building executor_wrappers (starting with EXECUTOR_WRAPPERS_TOOLS, then conditionally extending with VIDEO_RECORDING_WRAPPERS) is consistent across planner.py, graph.py, and executor.py files. While this consistency is good, consider extracting this pattern into a helper function if it needs to be modified in the future.

minitap/mobile_use/main.py (1)

23-23: Redundant FFmpeg check, but acceptable for fail-fast behavior.

check_ffmpeg_available() is called here in main() and again inside config.with_video_recording_tools() (as shown in agent_config_builder.py lines 235-236). The duplication is harmless and provides earlier CLI feedback, but you could consider removing the check here if you want to consolidate validation in the builder.

Also applies to: 198-200

minitap/mobile_use/sdk/examples/video_transcription_example.py (1)

20-20: Unused import.

AgentConfig is imported but only used as a type hint for config on line 82. The type annotation is technically correct but redundant since build() already returns AgentConfig. Consider removing the import and annotation, or keep it for documentation purposes.

🔎 Suggested fix (if removing)
 from minitap.mobile_use.sdk.builders.agent_config_builder import AgentConfigBuilder
-from minitap.mobile_use.sdk.types.agent import AgentConfig
 from minitap.mobile_use.sdk.types.task import AgentProfile, TaskRequest
 async def main():
-    config: AgentConfig = (
+    config = (
         AgentConfigBuilder()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0c5eca and dbe6c80.

⛔ Files ignored due to path filters (1)
  • doc/graph.png is excluded by !**/*.png
📒 Files selected for processing (25)
  • llm-config.defaults.jsonc
  • llm-config.override.template.jsonc
  • minitap/mobile_use/agents/cortex/cortex.py
  • minitap/mobile_use/agents/executor/executor.py
  • minitap/mobile_use/agents/planner/planner.md
  • minitap/mobile_use/agents/planner/planner.py
  • minitap/mobile_use/agents/video_analyzer/__init__.py
  • minitap/mobile_use/agents/video_analyzer/human.md
  • minitap/mobile_use/agents/video_analyzer/video_analyzer.md
  • minitap/mobile_use/agents/video_analyzer/video_analyzer.py
  • minitap/mobile_use/config.py
  • minitap/mobile_use/context.py
  • minitap/mobile_use/controllers/android_controller.py
  • minitap/mobile_use/controllers/controller_factory.py
  • minitap/mobile_use/controllers/device_controller.py
  • minitap/mobile_use/controllers/ios_controller.py
  • minitap/mobile_use/graph/graph.py
  • minitap/mobile_use/main.py
  • minitap/mobile_use/sdk/agent.py
  • minitap/mobile_use/sdk/builders/agent_config_builder.py
  • minitap/mobile_use/sdk/examples/video_transcription_example.py
  • minitap/mobile_use/sdk/types/agent.py
  • minitap/mobile_use/tools/index.py
  • minitap/mobile_use/tools/mobile/video_recording.py
  • minitap/mobile_use/utils/video.py
🧰 Additional context used
🧬 Code graph analysis (16)
minitap/mobile_use/agents/planner/planner.py (2)
minitap/mobile_use/tools/index.py (1)
  • format_tools_list (57-58)
minitap/mobile_use/controllers/android_controller.py (1)
  • device (53-56)
minitap/mobile_use/sdk/examples/video_transcription_example.py (4)
minitap/mobile_use/config.py (4)
  • LLM (124-149)
  • LLMConfig (165-207)
  • LLMConfigUtils (159-162)
  • LLMWithFallback (152-156)
minitap/mobile_use/sdk/agent.py (2)
  • Agent (80-1259)
  • init (110-134)
minitap/mobile_use/sdk/builders/agent_config_builder.py (3)
  • AgentConfigBuilder (17-296)
  • with_video_recording_tools (210-239)
  • build (241-296)
minitap/mobile_use/sdk/types/agent.py (1)
  • AgentConfig (56-84)
minitap/mobile_use/controllers/device_controller.py (3)
minitap/mobile_use/utils/video.py (1)
  • VideoRecordingResult (45-52)
minitap/mobile_use/controllers/android_controller.py (2)
  • start_video_recording (372-414)
  • stop_video_recording (416-507)
minitap/mobile_use/controllers/ios_controller.py (2)
  • start_video_recording (334-378)
  • stop_video_recording (380-427)
minitap/mobile_use/sdk/builders/agent_config_builder.py (1)
minitap/mobile_use/utils/video.py (1)
  • check_ffmpeg_available (108-116)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (4)
minitap/mobile_use/context.py (1)
  • MobileUseContext (77-106)
minitap/mobile_use/services/llm.py (2)
  • invoke_llm_with_timeout_message (29-55)
  • with_fallback (234-247)
minitap/mobile_use/utils/logger.py (2)
  • get_logger (118-147)
  • info (79-80)
minitap/mobile_use/utils/video.py (1)
  • compress_video_for_api (169-279)
minitap/mobile_use/tools/mobile/video_recording.py (7)
minitap/mobile_use/context.py (2)
  • DevicePlatform (30-34)
  • MobileUseContext (77-106)
minitap/mobile_use/controllers/controller_factory.py (1)
  • get_controller (45-46)
minitap/mobile_use/graph/state.py (2)
  • State (20-88)
  • asanitize_update (61-88)
minitap/mobile_use/controllers/android_controller.py (3)
  • start_video_recording (372-414)
  • device (53-56)
  • stop_video_recording (416-507)
minitap/mobile_use/controllers/device_controller.py (2)
  • start_video_recording (151-164)
  • stop_video_recording (167-174)
minitap/mobile_use/utils/video.py (1)
  • VideoRecordingResult (45-52)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
  • analyze_video (22-103)
minitap/mobile_use/agents/cortex/cortex.py (1)
minitap/mobile_use/tools/index.py (1)
  • format_tools_list (57-58)
minitap/mobile_use/controllers/controller_factory.py (1)
minitap/mobile_use/controllers/android_controller.py (1)
  • device (53-56)
minitap/mobile_use/controllers/ios_controller.py (1)
minitap/mobile_use/utils/video.py (6)
  • RecordingSession (29-42)
  • VideoRecordingResult (45-52)
  • get_active_session (59-61)
  • has_active_session (74-76)
  • remove_active_session (69-71)
  • set_active_session (64-66)
minitap/mobile_use/main.py (2)
minitap/mobile_use/utils/video.py (1)
  • check_ffmpeg_available (108-116)
minitap/mobile_use/sdk/builders/agent_config_builder.py (1)
  • with_video_recording_tools (210-239)
minitap/mobile_use/graph/graph.py (2)
minitap/mobile_use/tools/index.py (1)
  • get_tools_from_wrappers (43-54)
minitap/mobile_use/agents/executor/tool_node.py (1)
  • ExecutorToolNode (18-152)
minitap/mobile_use/config.py (1)
minitap/mobile_use/sdk/examples/smart_notification_assistant.py (1)
  • get_agent (60-159)
minitap/mobile_use/agents/video_analyzer/__init__.py (1)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
  • analyze_video (22-103)
minitap/mobile_use/controllers/android_controller.py (1)
minitap/mobile_use/utils/video.py (8)
  • RecordingSession (29-42)
  • VideoRecordingResult (45-52)
  • cleanup_video_segments (157-166)
  • concatenate_videos (119-154)
  • get_active_session (59-61)
  • has_active_session (74-76)
  • remove_active_session (69-71)
  • set_active_session (64-66)
minitap/mobile_use/agents/executor/executor.py (1)
minitap/mobile_use/tools/index.py (1)
  • get_tools_from_wrappers (43-54)
minitap/mobile_use/utils/video.py (1)
minitap/mobile_use/utils/logger.py (4)
  • get_logger (118-147)
  • success (82-83)
  • error (88-89)
  • info (79-80)
🔇 Additional comments (43)
minitap/mobile_use/agents/video_analyzer/human.md (1)

1-5: LGTM!

Clean and concise prompt template with proper Jinja2 variable syntax. The structure clearly separates the instruction from the user's specific request.

minitap/mobile_use/agents/video_analyzer/video_analyzer.md (1)

1-37: LGTM!

Comprehensive system prompt with well-organized sections covering focus areas, guidelines, and response formats. The emphasis on timestamps for sequences and noting uncertainty are good practices for video analysis tasks.

llm-config.defaults.jsonc (1)

135-146: LGTM!

The video_analyzer configuration follows the established pattern with primary and fallback models. Using Gemini models for video analysis is appropriate given their multimodal capabilities.

minitap/mobile_use/context.py (1)

90-90: LGTM!

Clean addition of the video_recording_enabled flag with sensible default (False) ensuring opt-in behavior. The field integrates well with the conditional tool inclusion logic across the planner, executor, and graph components.

minitap/mobile_use/controllers/device_controller.py (1)

166-174: LGTM!

The stop_video_recording interface is clean and aligns with the implementations in both Android and iOS controllers. The docstring correctly describes the expected return value structure.

minitap/mobile_use/sdk/types/agent.py (1)

69-82: LGTM!

The video_recording_enabled field is properly documented in the class docstring and follows the existing field patterns with an appropriate default value. This cleanly extends the configuration surface for the new video recording feature.

minitap/mobile_use/agents/planner/planner.md (1)

29-45: LGTM!

Excellent addition to the planning guidelines. The emphasis on making each recording action its own dedicated subgoal is crucial—this prevents truncation issues due to inference latency. The execution order (start → interact → stop) and the practical example provide clear guidance for the planner.

minitap/mobile_use/controllers/controller_factory.py (1)

34-39: LGTM!

The device_id parameter addition correctly aligns with the updated iOSDeviceController constructor signature and maintains consistency with the Android controller instantiation pattern.

minitap/mobile_use/agents/planner/planner.py (2)

14-18: LGTM!

The import statement correctly includes VIDEO_RECORDING_WRAPPERS for the new video recording capability.


42-54: LGTM!

The dynamic construction of executor_wrappers correctly implements the conditional video recording tool inclusion. The implementation properly:

  • Starts with base executor tools
  • Extends with video recording tools when enabled
  • Passes both the combined tool list and the feature flag to the template
minitap/mobile_use/graph/graph.py (1)

24-28: LGTM!

The imports correctly include the video recording wrappers and helper function.

minitap/mobile_use/agents/executor/executor.py (2)

12-16: LGTM!

The imports correctly include the video recording wrappers for conditional tool integration.


62-67: LGTM!

The dynamic tool construction correctly enables video recording tools when the feature is enabled, maintaining consistency with the pattern used in planner.py and graph.py.

llm-config.override.template.jsonc (1)

42-49: LGTM! Configuration structure is consistent.

The video_analyzer configuration block follows the established pattern for agents with fallback support, matching the structure used by the cortex agent.

minitap/mobile_use/agents/video_analyzer/__init__.py (1)

1-5: LGTM! Clean package initialization.

The package follows standard Python conventions with a clear docstring, targeted re-export, and explicit __all__ declaration.

minitap/mobile_use/tools/index.py (1)

15-18: LGTM! Proper separation of opt-in video recording tools.

The video recording wrappers are correctly defined in a separate list rather than being included in EXECUTOR_WRAPPERS_TOOLS, allowing them to be conditionally added when the feature is enabled.

Also applies to: 37-40

minitap/mobile_use/agents/cortex/cortex.py (3)

22-26: LGTM! Clean import organization.

The imports are well-organized and include all necessary components for the video recording feature integration.


50-52: LGTM! Correct conditional tool inclusion.

The implementation properly creates a copy of EXECUTOR_WRAPPERS_TOOLS before conditionally extending it, preventing unintended mutation of the base tool list.


62-62: LGTM! Correct usage of extended tool list.

The call to format_tools_list correctly uses executor_wrappers which now includes video recording tools when enabled.

minitap/mobile_use/sdk/builders/agent_config_builder.py (2)

48-48: LGTM! Appropriate default for opt-in feature.

The _video_recording_enabled flag is correctly initialized to False, ensuring video recording remains an opt-in feature.


295-295: LGTM! Proper flag propagation to AgentConfig.

The video_recording_enabled flag is correctly passed to the AgentConfig constructor, ensuring the configuration reflects the builder's state.

minitap/mobile_use/main.py (2)

29-50: LGTM!

The new video_recording_tools_enabled parameter is cleanly integrated into run_automation, and the conditional builder call follows the established pattern for optional features.


186-193: LGTM!

The CLI option is well-defined with a clear help message, and the value is correctly propagated to run_automation.

Also applies to: 240-240

minitap/mobile_use/agents/video_analyzer/video_analyzer.py (2)

57-81: LGTM!

Template loading and message construction are clean. The multi-modal message format with inline base64 video follows LangChain patterns correctly.


83-103: LGTM!

LLM invocation with fallback and 120-second timeout is appropriate for video analysis. Response extraction handles both object and string responses gracefully.

minitap/mobile_use/sdk/examples/video_transcription_example.py (2)

24-78: LGTM!

The get_video_capable_llm_config() function provides a complete, well-documented example of configuring video analysis with Gemini models. The fallback chain is properly configured.


81-117: LGTM!

The example demonstrates the video recording workflow clearly with proper resource cleanup in finally. The goal steps are well-structured to showcase the feature.

minitap/mobile_use/config.py (3)

159-162: LGTM!

The optional video_analyzer field with default None maintains backward compatibility.


181-182: LGTM!

Good defensive programming: conditional validation when video_analyzer is configured, and get_utils now raises a clear ValueError when a required utility is not configured. This provides actionable error messages to users.

Also applies to: 200-207


194-194: LGTM!

Clean display formatting with fallback to "Not configured" when video_analyzer is absent.

minitap/mobile_use/tools/mobile/video_recording.py (3)

27-74: LGTM!

The start_video_recording tool is well-structured with proper platform handling, result messaging via ToolWrapper callbacks, and state updates.


166-176: LGTM!

The ToolWrapper definitions provide clear success/failure message formatting.


63-72: The agent="video_analyzer" parameter is valid. The AgentNode type explicitly includes "video_analyzer" as a valid value alongside "planner", "orchestrator", "contextor", "cortex", and "executor". The usage in asanitize_update is type-safe and consistent with the codebase design.

minitap/mobile_use/controllers/ios_controller.py (2)

39-50: LGTM!

The device_id parameter addition to __init__ is clean and enables proper session tracking for video recording.


380-427: LGTM!

The stop_video_recording method handles process termination gracefully with timeout fallback to kill, properly cleans up the session, and validates file existence before returning success.

minitap/mobile_use/controllers/android_controller.py (2)

312-370: LGTM!

The auto-restart loop is well-implemented with proper cancellation handling, segment tracking, and error collection. The segment rotation logic correctly handles Android's 3-minute recording limit.


372-414: LGTM!

The start_video_recording and stop_video_recording methods are comprehensive with proper session management, background task handling, segment concatenation, and cleanup. Error handling and status messaging are thorough.

Also applies to: 416-507

minitap/mobile_use/utils/video.py (6)

1-27: LGTM!

The module structure, imports, and constants are well-organized. The comment explaining the Gemini API size limit and base64 overhead is helpful for understanding the 14MB target.


29-53: LGTM!

Both data models are well-structured. The use of arbitrary_types_allowed is appropriate for handling asyncio process and task objects, and Android-specific fields are clearly prefixed.


79-117: LGTM!

The FFmpeg availability checking is well-implemented with clear, platform-specific installation guidance. The user-facing error messages are helpful and actionable.


157-167: LGTM!

The cleanup logic correctly handles segment removal with an optional keep path, and appropriately suppresses exceptions during cleanup operations.


169-279: LGTM!

The compression implementation is comprehensive and robust:

  • Two-pass approach with size checking
  • Dynamic bitrate calculation based on video duration
  • Reasonable fallback behavior (returns original on failure)
  • Good logging and error handling
  • The fallback duration of 120 seconds (line 222) is a sensible default if ffprobe fails

55-77: The session management functions are only accessed from async context within AndroidController and IOSController. Since asyncio operates in a single event loop (single thread), the unprotected dictionary access is not a practical concern for the current implementation. No removal of locks is necessary.

Comment thread llm-config.defaults.jsonc
Comment thread minitap/mobile_use/agents/video_analyzer/video_analyzer.py Outdated
Comment thread minitap/mobile_use/config.py
Comment thread minitap/mobile_use/controllers/android_controller.py
Comment thread minitap/mobile_use/controllers/android_controller.py
Comment thread minitap/mobile_use/controllers/ios_controller.py
Comment thread minitap/mobile_use/sdk/builders/agent_config_builder.py
Comment thread minitap/mobile_use/tools/mobile/video_recording.py
Comment thread minitap/mobile_use/utils/video.py Outdated
Comment thread minitap/mobile_use/utils/video.py
@plfavreau plfavreau self-assigned this Jan 6, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)

100-101: Minor: Redundant type check on line 101.

Line 100 already ensures content is a string (either from response.content or str(response)), making the isinstance check on line 101 unnecessary.

🔎 Simplification
 content = response.content if hasattr(response, "content") else str(response)
-result = content if isinstance(content, str) else str(content)
+result = str(content)  # Or just: result = content
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbe6c80 and 075a674.

📒 Files selected for processing (5)
  • minitap/mobile_use/agents/video_analyzer/video_analyzer.py
  • minitap/mobile_use/config.py
  • minitap/mobile_use/sdk/builders/agent_config_builder.py
  • minitap/mobile_use/tools/mobile/video_recording.py
  • minitap/mobile_use/utils/video.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • minitap/mobile_use/config.py
🧰 Additional context used
🧬 Code graph analysis (2)
minitap/mobile_use/utils/video.py (1)
minitap/mobile_use/utils/logger.py (1)
  • get_logger (118-147)
minitap/mobile_use/tools/mobile/video_recording.py (9)
minitap/mobile_use/context.py (2)
  • DevicePlatform (30-34)
  • MobileUseContext (77-106)
minitap/mobile_use/controllers/controller_factory.py (1)
  • get_controller (45-46)
minitap/mobile_use/graph/state.py (2)
  • State (20-88)
  • asanitize_update (61-88)
minitap/mobile_use/tools/tool_wrapper.py (1)
  • ToolWrapper (9-12)
minitap/mobile_use/controllers/android_controller.py (3)
  • start_video_recording (372-414)
  • device (53-56)
  • stop_video_recording (416-507)
minitap/mobile_use/controllers/ios_controller.py (2)
  • start_video_recording (334-378)
  • stop_video_recording (380-427)
minitap/mobile_use/controllers/unified_controller.py (1)
  • controller (23-24)
minitap/mobile_use/utils/video.py (1)
  • VideoRecordingResult (45-52)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
  • analyze_video (22-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ios-tests
🔇 Additional comments (4)
minitap/mobile_use/sdk/builders/agent_config_builder.py (1)

48-48: LGTM! Clean builder pattern implementation with proper prerequisite validation.

The implementation correctly:

  • Validates FFmpeg availability when the method is called (not at build time)
  • Uses lazy import for the validation function
  • Maintains the fluent interface pattern
  • Provides comprehensive documentation

The docstring timing issue from the past review has been properly addressed (line 233 now states "When the agent is initialized" rather than "at build() time").

Also applies to: 210-239, 295-295

minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)

43-111: LGTM! Past review issues have been properly addressed.

The implementation correctly:

  • Uses compressed_path.suffix for MIME type derivation (line 55) instead of video_path.suffix
  • Cleans up the compressed file in the finally block when it differs from the original (lines 105-111)

Both concerns from the previous review have been resolved.

minitap/mobile_use/tools/mobile/video_recording.py (1)

28-75: LGTM! Clean tool implementation.

The start recording tool properly:

  • Delegates to platform-specific controllers
  • Handles unsupported platforms
  • Uses the wrapper pattern for success/failure messages
  • Updates state correctly via Command
minitap/mobile_use/utils/video.py (1)

18-281: LGTM! Comprehensive video utility implementation with past issues resolved.

The implementation properly addresses both previous review concerns:

  • Line 125: Uses shutil.move() instead of Path.rename() for cross-filesystem compatibility
  • Lines 154-156: Ensures cleanup of temporary list file in a finally block

Additional strengths:

  • Excellent user experience with OS-specific FFmpeg installation guidance (lines 84-105)
  • Robust error handling with fallbacks (e.g., duration estimation, compression failure recovery)
  • Clean session management with simple dict-based storage
  • Comprehensive logging throughout

Copy link
Copy Markdown
Collaborator

@KYN4 KYN4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insane work!!!
Congrats 🥳

@plfavreau plfavreau merged commit c7a9380 into main Jan 7, 2026
7 checks passed
@plfavreau plfavreau deleted the feat/start-stop-video-transcription branch January 12, 2026 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants