feat(video): add video recording and transcription tools for mobile devices#160
feat(video): add video recording and transcription tools for mobile devices#160
Conversation
📝 WalkthroughWalkthroughAdds end-to-end optional video recording and analysis: new video_analyzer config, FFmpeg-backed utilities, device recording APIs for Android/iOS with segmenting, start/stop tools and wrappers, planner/executor integration to include video tools when enabled, CLI/builder flags, and an async analyze_video agent that sends video attachments to an LLM with fallback. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as Agent / Planner
participant Executor as Executor (tools)
participant Controller as Device Controller
participant Device as Mobile Device
participant Video as Local Video File
participant Analyzer as Video Analyzer
participant LLM as LLM (primary/fallback)
Agent->>Executor: start_video_recording tool
Executor->>Controller: start_video_recording()
Controller->>Device: start screenrecord (adb/idb)
Device-->>Controller: recording started
Controller-->>Executor: VideoRecordingResult(success)
Executor-->>Agent: tool result
Agent->>Device: perform interactions
Device-->>Agent: observations/state
Agent->>Executor: stop_video_recording tool
Executor->>Controller: stop_video_recording()
Controller->>Device: stop & pull segments
Controller->>Video: concatenate/compress -> final file
Controller-->>Executor: VideoRecordingResult(video_path)
Executor-->>Agent: tool result with video_path
Agent->>Analyzer: analyze_video(ctx, video_path, prompt)
Analyzer->>Analyzer: compress & encode
Analyzer->>LLM: SystemMessage + HumanMessage(with attachment)
alt primary succeeds
LLM-->>Analyzer: analysis response
else primary fails
Analyzer->>LLM: retry with fallback config
LLM-->>Analyzer: analysis response
end
Analyzer-->>Agent: analysis text
Note over Video,Agent: temporary files cleaned up
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Comment |
fa16703 to
94a324a
Compare
|
|
||
| Use this tool when you need to capture what happens on the screen over time, | ||
| such as watching a video, observing animations, or recording a user flow. | ||
|
|
||
| The recording will continue until stop_video_recording is called. | ||
| Maximum recording time is 15 minutes (or 3 minutes on Android). | ||
|
|
||
| Note: Audio is typically NOT recorded. On Android, screenrecord does not capture | ||
| audio. On iOS simulators, audio is also not captured. | ||
| """ |
There was a problem hiding this comment.
This Python docstring was pushed by mistake and will be fixed in the next commit. Only the tool description itself is needed here, no guidance about when to use it should be included
| { | ||
| "folders": [ | ||
| { | ||
| "path": ".." | ||
| } | ||
| ], | ||
| "settings": { | ||
| "python.languageServer": "None" | ||
| } | ||
| } |
There was a problem hiding this comment.
This file has been pushed by mistake and is removed in a next commit x)
|
|
||
| # Get video duration using ffprobe | ||
| duration_cmd = [ | ||
| "ffprobe", |
There was a problem hiding this comment.
It comes with FFmpeg ;)
| def get_openrouter_video_capable_llm_config() -> LLMConfig: | ||
| """ | ||
| Returns the LLM config from the override file | ||
| """ | ||
| return initialize_llm_config() |
There was a problem hiding this comment.
For my own tests, I pushed this by mistake and it's fixed in next commits
…tallation instructions
…tools for clarity
087dd92 to
dbe6c80
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Fix all issues with AI Agents
In @llm-config.defaults.jsonc:
- Around line 61-72: The closing brace for the "outputter" object is missing a
trailing comma, which will break the JSONC if the commented "video_analyzer"
block is uncommented; add a trailing comma after the "outputter" closing brace
so the subsequent "video_analyzer" block (provider/model/fallback) can be
enabled safely and remain consistent with the "recommended" section.
In @minitap/mobile_use/agents/video_analyzer/video_analyzer.py:
- Around line 43-55: The code derives MIME type from video_path.suffix but may
be using a different file returned by compress_video_for_api (compressed_path)
and also never removes a temporary compressed file; update the MIME type
computation to use compressed_path.suffix (not video_path.suffix) and ensure
cleanup by deleting compressed_path (e.g., Path.unlink()) in a finally block
when compress_video_for_api returns a different path than the original,
referencing the compress_video_for_api, compressed_path and video_path symbols
to locate and modify the logic.
In @minitap/mobile_use/config.py:
- Around line 96-105: The AgentNode literal incorrectly includes
"video_analyzer"—remove "video_analyzer" from the AgentNode definition so it
only lists actual agent attributes; keep "video_analyzer" in LLMUtilsNode and
ensure any code expecting the util continues to access it via get_utils() /
LLMConfigUtils (as done in video_analyzer.py where it's retrieved with
is_utils=True) rather than via get_agent() which uses getattr(self, item).
In @minitap/mobile_use/controllers/android_controller.py:
- Around line 302-306: The code currently constructs a shell string with f'adb
-s {self.device_id} shell "{cmd}"' and calls asyncio.create_subprocess_shell,
which risks shell injection via self.device_id; change to
asyncio.create_subprocess_exec and pass arguments as a list to avoid a shell:
call asyncio.create_subprocess_exec('adb', '-s', self.device_id, 'shell', cmd,
stdout=..., stderr=...); keep the same stdout/stderr pipes and update any
await/process handling accordingly so you no longer rely on shell interpolation
in the create_subprocess_shell call.
- Around line 471-472: The code uses Path.rename() on all_segments[0] which can
fail across filesystems; replace the rename call with shutil.move to reliably
move the file between temp and output locations (e.g., use
shutil.move(str(all_segments[0]), str(output_path))). Add an import for shutil
if missing and wrap the move in a try/except to surface/log errors consistently
with existing error handling; keep using the same variables all_segments and
output_path so the change is localized to that branch.
In @minitap/mobile_use/controllers/ios_controller.py:
- Around line 334-378: start_video_recording accepts max_duration_seconds but
never enforces it; either enforce it or stop advertising it in the success
message. To enforce: when creating the RecordingSession in
start_video_recording, spawn an asyncio background task (e.g., via
asyncio.create_task) that awaits asyncio.sleep(max_duration_seconds) and then
calls stop_video_recording(device_id) (guarded to ensure the session is still
active), store that task on the RecordingSession (e.g., session.timeout_task)
and cancel it in stop_video_recording when stopping normally; ensure exceptions
in the background task are caught and logged. Alternatively, if you choose not
to enforce it, remove max_duration_seconds from the success message and adjust
the start_video_recording signature/docs to mark it advisory only.
In @minitap/mobile_use/sdk/builders/agent_config_builder.py:
- Around line 210-239: Update the docstring for
AgentConfigBuilder.with_video_recording_tools to correct the timing of
validation: replace the line claiming "ValueError: At build() time if any
profile lacks video_analyzer config" with a statement that the video_analyzer
config is validated later when the Agent is initialized (e.g., in the Agent
class) or simply say a ValueError will be raised when the agent is created/used
if profiles lack video_analyzer, and keep the reference to FFmpeg and the
supported models unchanged.
In @minitap/mobile_use/tools/mobile/video_recording.py:
- Around line 137-144: The finally block currently tries to remove the temp
video file and then calls video_path.parent.rmdir(), which will raise if the
directory is not empty (e.g., compressed file left by compress_video_for_api)
and silently leaks because exceptions are swallowed; update the cleanup in the
finally block (referencing video_path and its parent) to robustly remove the
entire temp directory using shutil.rmtree(video_path.parent, ignore_errors=True)
or explicitly delete all files in that directory before removing it, and remove
the bare except/pass so failures are handled or at least logged via the module
logger.
In @minitap/mobile_use/utils/video.py:
- Around line 128-154: The temporary list file created as list_file is only
unlinked after awaiting process.wait() and is not guaranteed to be removed if
asyncio.create_subprocess_exec or process.wait() raises, so move the cleanup
into a finally block: wrap the subprocess invocation and await in a try/except
as needed but ensure a finally block calls list_file.unlink() (guarding for
existence and errors), and keep returning output_path.exists() on success or
False on error; reference the existing list_file variable,
asyncio.create_subprocess_exec/ffmpeg invocation, process.wait(), and the
logger.error path when implementing the change.
- Line 125: Replace the use of Path.rename on the temp segment with shutil.move
to handle cross-filesystem moves: import shutil at the top and call shutil.move
with the source segment (segments[0]) and output_path (convert to str if
necessary) instead of segments[0].rename(output_path) so the operation succeeds
even when source and destination are on different filesystems.
🧹 Nitpick comments (6)
minitap/mobile_use/controllers/device_controller.py (1)
150-164: Use theDEFAULT_MAX_DURATION_SECONDSconstant for consistency.Both
AndroidDeviceControllerandiOSDeviceControlleruseDEFAULT_MAX_DURATION_SECONDSfromminitap.mobile_use.utils.videoas the default value. Using the same constant here ensures consistency and maintains a single source of truth for the default duration.🔎 Proposed change
+from minitap.mobile_use.utils.video import DEFAULT_MAX_DURATION_SECONDS, VideoRecordingResult -from minitap.mobile_use.utils.video import VideoRecordingResult @abstractmethod async def start_video_recording( self, - max_duration_seconds: int = 900, + max_duration_seconds: int = DEFAULT_MAX_DURATION_SECONDS, ) -> VideoRecordingResult:minitap/mobile_use/sdk/agent.py (2)
608-616: Consider using a more specific exception type.The validation correctly guards against misconfiguration, but using
ValueErroris generic. Consider using a more specific exception from the SDK's exception module (e.g.,AgentErrororAgentProfileNotFoundError) for better error categorization and handling.🔎 Proposed refactor
- raise ValueError( + raise AgentError( f"with_video_recording_tools() requires 'video_analyzer' in utils for " f"profile '{agent_profile.name}'. Add 'video_analyzer' with a " f"video-capable model (e.g., gemini-3-flash-preview)." )
608-665: Minor redundancy in video recording enablement logic.The validation at lines 608-616 raises an error when
video_recording_enabledis True butvideo_analyzeris None. Subsequently, lines 662-665 check both conditions again when setting the context flag. While this defensive approach ensures safety, the second check ofvideo_analyzer is not Noneis redundant given the earlier validation would have already raised an exception.This is not a functional issue, just a maintainability note for potential simplification.
minitap/mobile_use/graph/graph.py (1)
113-121: LGTM with a note on code pattern.The dynamic tool construction correctly implements video recording capability integration. The pattern of building
executor_wrappers(starting withEXECUTOR_WRAPPERS_TOOLS, then conditionally extending withVIDEO_RECORDING_WRAPPERS) is consistent across planner.py, graph.py, and executor.py files. While this consistency is good, consider extracting this pattern into a helper function if it needs to be modified in the future.minitap/mobile_use/main.py (1)
23-23: Redundant FFmpeg check, but acceptable for fail-fast behavior.
check_ffmpeg_available()is called here inmain()and again insideconfig.with_video_recording_tools()(as shown inagent_config_builder.pylines 235-236). The duplication is harmless and provides earlier CLI feedback, but you could consider removing the check here if you want to consolidate validation in the builder.Also applies to: 198-200
minitap/mobile_use/sdk/examples/video_transcription_example.py (1)
20-20: Unused import.
AgentConfigis imported but only used as a type hint forconfigon line 82. The type annotation is technically correct but redundant sincebuild()already returnsAgentConfig. Consider removing the import and annotation, or keep it for documentation purposes.🔎 Suggested fix (if removing)
from minitap.mobile_use.sdk.builders.agent_config_builder import AgentConfigBuilder -from minitap.mobile_use.sdk.types.agent import AgentConfig from minitap.mobile_use.sdk.types.task import AgentProfile, TaskRequestasync def main(): - config: AgentConfig = ( + config = ( AgentConfigBuilder()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
doc/graph.pngis excluded by!**/*.png
📒 Files selected for processing (25)
llm-config.defaults.jsoncllm-config.override.template.jsoncminitap/mobile_use/agents/cortex/cortex.pyminitap/mobile_use/agents/executor/executor.pyminitap/mobile_use/agents/planner/planner.mdminitap/mobile_use/agents/planner/planner.pyminitap/mobile_use/agents/video_analyzer/__init__.pyminitap/mobile_use/agents/video_analyzer/human.mdminitap/mobile_use/agents/video_analyzer/video_analyzer.mdminitap/mobile_use/agents/video_analyzer/video_analyzer.pyminitap/mobile_use/config.pyminitap/mobile_use/context.pyminitap/mobile_use/controllers/android_controller.pyminitap/mobile_use/controllers/controller_factory.pyminitap/mobile_use/controllers/device_controller.pyminitap/mobile_use/controllers/ios_controller.pyminitap/mobile_use/graph/graph.pyminitap/mobile_use/main.pyminitap/mobile_use/sdk/agent.pyminitap/mobile_use/sdk/builders/agent_config_builder.pyminitap/mobile_use/sdk/examples/video_transcription_example.pyminitap/mobile_use/sdk/types/agent.pyminitap/mobile_use/tools/index.pyminitap/mobile_use/tools/mobile/video_recording.pyminitap/mobile_use/utils/video.py
🧰 Additional context used
🧬 Code graph analysis (16)
minitap/mobile_use/agents/planner/planner.py (2)
minitap/mobile_use/tools/index.py (1)
format_tools_list(57-58)minitap/mobile_use/controllers/android_controller.py (1)
device(53-56)
minitap/mobile_use/sdk/examples/video_transcription_example.py (4)
minitap/mobile_use/config.py (4)
LLM(124-149)LLMConfig(165-207)LLMConfigUtils(159-162)LLMWithFallback(152-156)minitap/mobile_use/sdk/agent.py (2)
Agent(80-1259)init(110-134)minitap/mobile_use/sdk/builders/agent_config_builder.py (3)
AgentConfigBuilder(17-296)with_video_recording_tools(210-239)build(241-296)minitap/mobile_use/sdk/types/agent.py (1)
AgentConfig(56-84)
minitap/mobile_use/controllers/device_controller.py (3)
minitap/mobile_use/utils/video.py (1)
VideoRecordingResult(45-52)minitap/mobile_use/controllers/android_controller.py (2)
start_video_recording(372-414)stop_video_recording(416-507)minitap/mobile_use/controllers/ios_controller.py (2)
start_video_recording(334-378)stop_video_recording(380-427)
minitap/mobile_use/sdk/builders/agent_config_builder.py (1)
minitap/mobile_use/utils/video.py (1)
check_ffmpeg_available(108-116)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (4)
minitap/mobile_use/context.py (1)
MobileUseContext(77-106)minitap/mobile_use/services/llm.py (2)
invoke_llm_with_timeout_message(29-55)with_fallback(234-247)minitap/mobile_use/utils/logger.py (2)
get_logger(118-147)info(79-80)minitap/mobile_use/utils/video.py (1)
compress_video_for_api(169-279)
minitap/mobile_use/tools/mobile/video_recording.py (7)
minitap/mobile_use/context.py (2)
DevicePlatform(30-34)MobileUseContext(77-106)minitap/mobile_use/controllers/controller_factory.py (1)
get_controller(45-46)minitap/mobile_use/graph/state.py (2)
State(20-88)asanitize_update(61-88)minitap/mobile_use/controllers/android_controller.py (3)
start_video_recording(372-414)device(53-56)stop_video_recording(416-507)minitap/mobile_use/controllers/device_controller.py (2)
start_video_recording(151-164)stop_video_recording(167-174)minitap/mobile_use/utils/video.py (1)
VideoRecordingResult(45-52)minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
analyze_video(22-103)
minitap/mobile_use/agents/cortex/cortex.py (1)
minitap/mobile_use/tools/index.py (1)
format_tools_list(57-58)
minitap/mobile_use/controllers/controller_factory.py (1)
minitap/mobile_use/controllers/android_controller.py (1)
device(53-56)
minitap/mobile_use/controllers/ios_controller.py (1)
minitap/mobile_use/utils/video.py (6)
RecordingSession(29-42)VideoRecordingResult(45-52)get_active_session(59-61)has_active_session(74-76)remove_active_session(69-71)set_active_session(64-66)
minitap/mobile_use/main.py (2)
minitap/mobile_use/utils/video.py (1)
check_ffmpeg_available(108-116)minitap/mobile_use/sdk/builders/agent_config_builder.py (1)
with_video_recording_tools(210-239)
minitap/mobile_use/graph/graph.py (2)
minitap/mobile_use/tools/index.py (1)
get_tools_from_wrappers(43-54)minitap/mobile_use/agents/executor/tool_node.py (1)
ExecutorToolNode(18-152)
minitap/mobile_use/config.py (1)
minitap/mobile_use/sdk/examples/smart_notification_assistant.py (1)
get_agent(60-159)
minitap/mobile_use/agents/video_analyzer/__init__.py (1)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
analyze_video(22-103)
minitap/mobile_use/controllers/android_controller.py (1)
minitap/mobile_use/utils/video.py (8)
RecordingSession(29-42)VideoRecordingResult(45-52)cleanup_video_segments(157-166)concatenate_videos(119-154)get_active_session(59-61)has_active_session(74-76)remove_active_session(69-71)set_active_session(64-66)
minitap/mobile_use/agents/executor/executor.py (1)
minitap/mobile_use/tools/index.py (1)
get_tools_from_wrappers(43-54)
minitap/mobile_use/utils/video.py (1)
minitap/mobile_use/utils/logger.py (4)
get_logger(118-147)success(82-83)error(88-89)info(79-80)
🔇 Additional comments (43)
minitap/mobile_use/agents/video_analyzer/human.md (1)
1-5: LGTM!Clean and concise prompt template with proper Jinja2 variable syntax. The structure clearly separates the instruction from the user's specific request.
minitap/mobile_use/agents/video_analyzer/video_analyzer.md (1)
1-37: LGTM!Comprehensive system prompt with well-organized sections covering focus areas, guidelines, and response formats. The emphasis on timestamps for sequences and noting uncertainty are good practices for video analysis tasks.
llm-config.defaults.jsonc (1)
135-146: LGTM!The
video_analyzerconfiguration follows the established pattern with primary and fallback models. Using Gemini models for video analysis is appropriate given their multimodal capabilities.minitap/mobile_use/context.py (1)
90-90: LGTM!Clean addition of the
video_recording_enabledflag with sensible default (False) ensuring opt-in behavior. The field integrates well with the conditional tool inclusion logic across the planner, executor, and graph components.minitap/mobile_use/controllers/device_controller.py (1)
166-174: LGTM!The
stop_video_recordinginterface is clean and aligns with the implementations in both Android and iOS controllers. The docstring correctly describes the expected return value structure.minitap/mobile_use/sdk/types/agent.py (1)
69-82: LGTM!The
video_recording_enabledfield is properly documented in the class docstring and follows the existing field patterns with an appropriate default value. This cleanly extends the configuration surface for the new video recording feature.minitap/mobile_use/agents/planner/planner.md (1)
29-45: LGTM!Excellent addition to the planning guidelines. The emphasis on making each recording action its own dedicated subgoal is crucial—this prevents truncation issues due to inference latency. The execution order (start → interact → stop) and the practical example provide clear guidance for the planner.
minitap/mobile_use/controllers/controller_factory.py (1)
34-39: LGTM!The
device_idparameter addition correctly aligns with the updatediOSDeviceControllerconstructor signature and maintains consistency with the Android controller instantiation pattern.minitap/mobile_use/agents/planner/planner.py (2)
14-18: LGTM!The import statement correctly includes
VIDEO_RECORDING_WRAPPERSfor the new video recording capability.
42-54: LGTM!The dynamic construction of
executor_wrapperscorrectly implements the conditional video recording tool inclusion. The implementation properly:
- Starts with base executor tools
- Extends with video recording tools when enabled
- Passes both the combined tool list and the feature flag to the template
minitap/mobile_use/graph/graph.py (1)
24-28: LGTM!The imports correctly include the video recording wrappers and helper function.
minitap/mobile_use/agents/executor/executor.py (2)
12-16: LGTM!The imports correctly include the video recording wrappers for conditional tool integration.
62-67: LGTM!The dynamic tool construction correctly enables video recording tools when the feature is enabled, maintaining consistency with the pattern used in planner.py and graph.py.
llm-config.override.template.jsonc (1)
42-49: LGTM! Configuration structure is consistent.The
video_analyzerconfiguration block follows the established pattern for agents with fallback support, matching the structure used by thecortexagent.minitap/mobile_use/agents/video_analyzer/__init__.py (1)
1-5: LGTM! Clean package initialization.The package follows standard Python conventions with a clear docstring, targeted re-export, and explicit
__all__declaration.minitap/mobile_use/tools/index.py (1)
15-18: LGTM! Proper separation of opt-in video recording tools.The video recording wrappers are correctly defined in a separate list rather than being included in
EXECUTOR_WRAPPERS_TOOLS, allowing them to be conditionally added when the feature is enabled.Also applies to: 37-40
minitap/mobile_use/agents/cortex/cortex.py (3)
22-26: LGTM! Clean import organization.The imports are well-organized and include all necessary components for the video recording feature integration.
50-52: LGTM! Correct conditional tool inclusion.The implementation properly creates a copy of
EXECUTOR_WRAPPERS_TOOLSbefore conditionally extending it, preventing unintended mutation of the base tool list.
62-62: LGTM! Correct usage of extended tool list.The call to
format_tools_listcorrectly usesexecutor_wrapperswhich now includes video recording tools when enabled.minitap/mobile_use/sdk/builders/agent_config_builder.py (2)
48-48: LGTM! Appropriate default for opt-in feature.The
_video_recording_enabledflag is correctly initialized toFalse, ensuring video recording remains an opt-in feature.
295-295: LGTM! Proper flag propagation to AgentConfig.The
video_recording_enabledflag is correctly passed to theAgentConfigconstructor, ensuring the configuration reflects the builder's state.minitap/mobile_use/main.py (2)
29-50: LGTM!The new
video_recording_tools_enabledparameter is cleanly integrated intorun_automation, and the conditional builder call follows the established pattern for optional features.
186-193: LGTM!The CLI option is well-defined with a clear help message, and the value is correctly propagated to
run_automation.Also applies to: 240-240
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (2)
57-81: LGTM!Template loading and message construction are clean. The multi-modal message format with inline base64 video follows LangChain patterns correctly.
83-103: LGTM!LLM invocation with fallback and 120-second timeout is appropriate for video analysis. Response extraction handles both object and string responses gracefully.
minitap/mobile_use/sdk/examples/video_transcription_example.py (2)
24-78: LGTM!The
get_video_capable_llm_config()function provides a complete, well-documented example of configuring video analysis with Gemini models. The fallback chain is properly configured.
81-117: LGTM!The example demonstrates the video recording workflow clearly with proper resource cleanup in
finally. The goal steps are well-structured to showcase the feature.minitap/mobile_use/config.py (3)
159-162: LGTM!The optional
video_analyzerfield with defaultNonemaintains backward compatibility.
181-182: LGTM!Good defensive programming: conditional validation when
video_analyzeris configured, andget_utilsnow raises a clearValueErrorwhen a required utility is not configured. This provides actionable error messages to users.Also applies to: 200-207
194-194: LGTM!Clean display formatting with fallback to "Not configured" when video_analyzer is absent.
minitap/mobile_use/tools/mobile/video_recording.py (3)
27-74: LGTM!The
start_video_recordingtool is well-structured with proper platform handling, result messaging via ToolWrapper callbacks, and state updates.
166-176: LGTM!The
ToolWrapperdefinitions provide clear success/failure message formatting.
63-72: Theagent="video_analyzer"parameter is valid. TheAgentNodetype explicitly includes"video_analyzer"as a valid value alongside"planner","orchestrator","contextor","cortex", and"executor". The usage inasanitize_updateis type-safe and consistent with the codebase design.minitap/mobile_use/controllers/ios_controller.py (2)
39-50: LGTM!The
device_idparameter addition to__init__is clean and enables proper session tracking for video recording.
380-427: LGTM!The
stop_video_recordingmethod handles process termination gracefully with timeout fallback to kill, properly cleans up the session, and validates file existence before returning success.minitap/mobile_use/controllers/android_controller.py (2)
312-370: LGTM!The auto-restart loop is well-implemented with proper cancellation handling, segment tracking, and error collection. The segment rotation logic correctly handles Android's 3-minute recording limit.
372-414: LGTM!The
start_video_recordingandstop_video_recordingmethods are comprehensive with proper session management, background task handling, segment concatenation, and cleanup. Error handling and status messaging are thorough.Also applies to: 416-507
minitap/mobile_use/utils/video.py (6)
1-27: LGTM!The module structure, imports, and constants are well-organized. The comment explaining the Gemini API size limit and base64 overhead is helpful for understanding the 14MB target.
29-53: LGTM!Both data models are well-structured. The use of
arbitrary_types_allowedis appropriate for handling asyncio process and task objects, and Android-specific fields are clearly prefixed.
79-117: LGTM!The FFmpeg availability checking is well-implemented with clear, platform-specific installation guidance. The user-facing error messages are helpful and actionable.
157-167: LGTM!The cleanup logic correctly handles segment removal with an optional keep path, and appropriately suppresses exceptions during cleanup operations.
169-279: LGTM!The compression implementation is comprehensive and robust:
- Two-pass approach with size checking
- Dynamic bitrate calculation based on video duration
- Reasonable fallback behavior (returns original on failure)
- Good logging and error handling
- The fallback duration of 120 seconds (line 222) is a sensible default if ffprobe fails
55-77: The session management functions are only accessed from async context within AndroidController and IOSController. Since asyncio operates in a single event loop (single thread), the unprotected dictionary access is not a practical concern for the current implementation. No removal of locks is necessary.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
100-101: Minor: Redundant type check on line 101.Line 100 already ensures
contentis a string (either fromresponse.contentorstr(response)), making theisinstancecheck on line 101 unnecessary.🔎 Simplification
content = response.content if hasattr(response, "content") else str(response) -result = content if isinstance(content, str) else str(content) +result = str(content) # Or just: result = content
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
minitap/mobile_use/agents/video_analyzer/video_analyzer.pyminitap/mobile_use/config.pyminitap/mobile_use/sdk/builders/agent_config_builder.pyminitap/mobile_use/tools/mobile/video_recording.pyminitap/mobile_use/utils/video.py
🚧 Files skipped from review as they are similar to previous changes (1)
- minitap/mobile_use/config.py
🧰 Additional context used
🧬 Code graph analysis (2)
minitap/mobile_use/utils/video.py (1)
minitap/mobile_use/utils/logger.py (1)
get_logger(118-147)
minitap/mobile_use/tools/mobile/video_recording.py (9)
minitap/mobile_use/context.py (2)
DevicePlatform(30-34)MobileUseContext(77-106)minitap/mobile_use/controllers/controller_factory.py (1)
get_controller(45-46)minitap/mobile_use/graph/state.py (2)
State(20-88)asanitize_update(61-88)minitap/mobile_use/tools/tool_wrapper.py (1)
ToolWrapper(9-12)minitap/mobile_use/controllers/android_controller.py (3)
start_video_recording(372-414)device(53-56)stop_video_recording(416-507)minitap/mobile_use/controllers/ios_controller.py (2)
start_video_recording(334-378)stop_video_recording(380-427)minitap/mobile_use/controllers/unified_controller.py (1)
controller(23-24)minitap/mobile_use/utils/video.py (1)
VideoRecordingResult(45-52)minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
analyze_video(22-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ios-tests
🔇 Additional comments (4)
minitap/mobile_use/sdk/builders/agent_config_builder.py (1)
48-48: LGTM! Clean builder pattern implementation with proper prerequisite validation.The implementation correctly:
- Validates FFmpeg availability when the method is called (not at build time)
- Uses lazy import for the validation function
- Maintains the fluent interface pattern
- Provides comprehensive documentation
The docstring timing issue from the past review has been properly addressed (line 233 now states "When the agent is initialized" rather than "at build() time").
Also applies to: 210-239, 295-295
minitap/mobile_use/agents/video_analyzer/video_analyzer.py (1)
43-111: LGTM! Past review issues have been properly addressed.The implementation correctly:
- Uses
compressed_path.suffixfor MIME type derivation (line 55) instead ofvideo_path.suffix- Cleans up the compressed file in the finally block when it differs from the original (lines 105-111)
Both concerns from the previous review have been resolved.
minitap/mobile_use/tools/mobile/video_recording.py (1)
28-75: LGTM! Clean tool implementation.The start recording tool properly:
- Delegates to platform-specific controllers
- Handles unsupported platforms
- Uses the wrapper pattern for success/failure messages
- Updates state correctly via Command
minitap/mobile_use/utils/video.py (1)
18-281: LGTM! Comprehensive video utility implementation with past issues resolved.The implementation properly addresses both previous review concerns:
- Line 125: Uses
shutil.move()instead ofPath.rename()for cross-filesystem compatibility- Lines 154-156: Ensures cleanup of temporary list file in a finally block
Additional strengths:
- Excellent user experience with OS-specific FFmpeg installation guidance (lines 84-105)
- Robust error handling with fallbacks (e.g., duration estimation, compression failure recovery)
- Clean session management with simple dict-based storage
- Comprehensive logging throughout
…i-2.5-pro fallback
🚀 What's new?
Introduces opt-in video recording and transcription capabilities for mobile devices, enabling agents to record screen interactions and analyze them using LLM-based transcription. Includes CLI flag activation, video compression for API limits,
ffmpegvalidation, and full Android/iOS controller support with example implementation.🤔 Type of Change
What kind of change is this? Mark with an
x✅ Checklist
Before you submit, please make sure you've done the following. If you have any questions, we're here to help!
ruff check .andruff format .pass).💬 Any questions or comments?
Have a question or need some help? Join us on Discord!
Summary by CodeRabbit
New Features
Configuration
✏️ Tip: You can customize this high-level summary in your review settings.