Google's Gemini 3.5 Flash can now operate computer interfaces directly, interpreting screenshots and issuing keyboard and mouse actions to navigate software and automate complex workflows. The update shifts the model from a passive chatbot to an active agent capable of operating third-party applications.
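The screenshot-in, action-out cycle described here can be pictured as a simple loop. The sketch below is illustrative only and does not use any Google API: `Action`, `run_agent`, `FakeDesktop`, and `scripted_policy` are hypothetical names, the "screenshot" is a text stand-in for pixels, and a hard-coded policy takes the place of the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI action. The field names are assumptions made for this sketch."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the policy for an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                 # observe
        action = decide(screenshot, history)   # a real model would be called here
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                        # act
    return history


class FakeDesktop:
    """Toy environment: a single text field that must be clicked before typing."""

    def __init__(self) -> None:
        self.focused = False
        self.field = ""

    def capture(self) -> bytes:
        # Stand-in for a screenshot: the UI state serialized as bytes.
        return f"focused={self.focused};field={self.field}".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.field += action.text


def scripted_policy(screenshot: bytes, history: List[Action]) -> Action:
    """Hard-coded stand-in for the model: focus the field, type, then stop."""
    state = screenshot.decode()
    if "focused=False" in state:
        return Action("click", x=100, y=200)
    if state.endswith("field="):
        return Action("type", text="hello")
    return Action("done")


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop.capture, scripted_policy, desktop.execute)
    print([a.kind for a in steps], desktop.field)
```

The `max_steps` cap is a design choice for the sketch: an agent that acts on its own observations needs some bound so that a policy that never returns `done` cannot loop forever.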