A Modern System Recipe for Situated Embodied Human–Robot Conversation with Real-Time Multimodal LLMs and Tool-Calling
About
We present a minimal system recipe for situated embodied conversation that pairs a real-time multimodal language model with a small set of tools for attention and active perception, enabling robots to interleave dialogue with “what to look at, when to look, and what to say” under tight latency constraints.
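As a concrete illustration of the "small set of tools for attention and active perception," the sketch below shows what such a tool interface might look like. The tool names (`look_at`, `capture_frame`) and schema shapes are hypothetical assumptions for illustration, not the repository's actual API:

```python
# Hypothetical tool schemas for a real-time multimodal LLM tool-calling loop.
# Names and fields are illustrative assumptions, not this project's actual API.
ATTENTION_TOOLS = [
    {
        "name": "look_at",  # assumed tool: direct the robot's gaze
        "description": "Orient the robot's camera toward a named target.",
        "parameters": {
            "type": "object",
            "properties": {
                "target": {
                    "type": "string",
                    "description": "Object label or interlocutor to attend to.",
                }
            },
            "required": ["target"],
        },
    },
    {
        "name": "capture_frame",  # assumed tool: active perception request
        "description": "Grab the current camera frame for visual grounding.",
        "parameters": {"type": "object", "properties": {}},
    },
]

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to the matching robot action (stubbed)."""
    name = tool_call["name"]
    args = tool_call.get("arguments", {})
    if name == "look_at":
        return f"gaze -> {args['target']}"
    if name == "capture_frame":
        return "frame captured"
    raise ValueError(f"unknown tool: {name}")

# Example: the model decides mid-dialogue to attend to the speaker.
print(dispatch({"name": "look_at", "arguments": {"target": "speaker"}}))
```

Keeping the tool set this small is what makes tight latency budgets feasible: each call maps to a fast, local perception or gaze action rather than a long-running pipeline.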