The VegAS framework replaces single-action decoding with an ensemble sampling and verification process. At test time, a generative verifier filters the sampled candidate actions, reducing errors in out-of-distribution scenarios. The approach targets the brittleness of Multimodal Large Language Models (MLLMs) in physical environments and gives practitioners a more robust method for action selection than committing to a single decoded action.
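The sample-then-verify idea can be illustrated with a minimal sketch. This is not the VegAS implementation: the names `sample_candidates`, `select_verified`, `toy_policy`, and `toy_verifier` are hypothetical, the policy and verifier are stand-in functions rather than real models, and picking the highest-scoring candidate is an assumed selection rule chosen for simplicity.

```python
import random
from typing import Callable, Dict, List

# Hypothetical types for illustration only; real actions and observations
# in an embodied setting would be richer (images, joint targets, etc.).
Action = str
Observation = Dict[str, str]
Policy = Callable[[Observation, random.Random], Action]
Verifier = Callable[[Observation, Action], float]


def sample_candidates(policy: Policy, observation: Observation,
                      n: int, rng: random.Random) -> List[Action]:
    """Draw n candidate actions instead of decoding a single one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [policy(observation, rng) for _ in range(n)]


def select_verified(candidates: List[Action], verifier: Verifier,
                    observation: Observation) -> Action:
    """Score each distinct candidate and return the highest-scoring one.

    Ties are broken in favor of the candidate that was sampled first.
    """
    if not candidates:
        raise ValueError("no candidates to verify")
    unique = list(dict.fromkeys(candidates))  # dedupe, keep sampling order
    scores = [verifier(observation, action) for action in unique]
    best = max(range(len(unique)), key=scores.__getitem__)
    return unique[best]


def toy_policy(observation: Observation, rng: random.Random) -> Action:
    """Stand-in for a stochastic policy: picks a move at random."""
    return rng.choice(["left", "right", "forward"])


def toy_verifier(observation: Observation, action: Action) -> float:
    """Stand-in for a verifier: scores 1.0 only for the goal action."""
    return 1.0 if action == observation["goal"] else 0.0


if __name__ == "__main__":
    obs = {"goal": "right"}
    cands = sample_candidates(toy_policy, obs, n=8, rng=random.Random(0))
    print("candidates:", cands)
    print("selected:", select_verified(cands, toy_verifier, obs))
```

In this sketch the verifier is called once per distinct candidate, so the extra test-time cost grows with the number of unique samples rather than with the raw sample count.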