The CONCORD framework uses speaker verification to capture only the owner's voice, which yields transcripts that preserve privacy but are incomplete, because the other speakers' turns are missing. To fill these gaps, agents rely on spatio-temporal resolution and minimal agent-to-agent (A2A) queries. This lets an assistant understand a conversation without recording non-consenting speakers, and it provides a technical blueprint for deploying always-listening AI agents in social settings.
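The flow described above can be illustrated with a minimal sketch. It is not CONCORD's actual implementation: the names `Segment`, `TranscriptEntry`, `owner_only_transcript`, and `gap_queries`, the cosine-similarity gate, the 0.7 threshold, and the query fields are all assumptions made for illustration. It shows a speaker-verification gate that keeps only the owner's words, and a query for each gap that carries only a time window and a coarse place label.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Sequence


@dataclass
class Segment:
    start: float                 # segment start, in seconds
    end: float                   # segment end, in seconds
    embedding: Sequence[float]   # hypothetical speaker embedding
    text: str                    # recognized words (kept only for the owner)


@dataclass
class TranscriptEntry:
    start: float
    end: float
    text: Optional[str]          # None marks a gap left by another speaker


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def owner_only_transcript(
    segments: List[Segment],
    owner_embedding: Sequence[float],
    threshold: float = 0.7,      # assumed value, not taken from the source
) -> List[TranscriptEntry]:
    """Keep text only for segments verified as the owner; others become gaps."""
    entries = []
    for seg in segments:
        is_owner = cosine(seg.embedding, owner_embedding) >= threshold
        # Non-owner content is discarded; only the timing of the gap survives.
        entries.append(
            TranscriptEntry(seg.start, seg.end, seg.text if is_owner else None)
        )
    return entries


def gap_queries(entries: List[TranscriptEntry], place: str) -> List[Dict]:
    """Build one minimal query per gap: when and where, nothing else."""
    return [
        {"start": e.start, "end": e.end, "place": place}
        for e in entries
        if e.text is None
    ]
```

In this sketch each query exposes only a time window and a place, so a peer agent could decide whether its own owner spoke in that window and answer from its own consented transcript. In a real pipeline the gate would presumably run before speech recognition, so that non-owner audio is never transcribed at all.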