The CONCORD framework uses real-time speaker verification to capture only the owner's speech, which intentionally leaves gaps in the transcript where other people spoke. To fill these gaps, agents rely on spatio-temporal resolution and minimal assistant-to-assistant queries. This lets AI agents follow a conversation without recording speakers who have not consented, and it offers a technical path toward deploying always-listening assistants in social settings.
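
The owner-only capture step can be illustrated with a minimal sketch. It assumes that verification compares a per-segment speaker embedding against an enrolled owner embedding using cosine similarity; the summary above does not say which verification model CONCORD uses. The names `Segment`, `TranscriptEntry`, and `owner_only_transcript`, and the threshold of 0.8, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence


@dataclass
class Segment:
    """One diarized audio segment (hypothetical structure)."""
    start: float                 # seconds
    end: float                   # seconds
    embedding: Sequence[float]   # speaker embedding for this segment
    text: str                    # stands in for the audio content


@dataclass
class TranscriptEntry:
    start: float
    end: float
    text: Optional[str]          # None marks a gap (non-owner speech)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def owner_only_transcript(
    segments: List[Segment],
    owner_embedding: Sequence[float],
    threshold: float = 0.8,      # assumed value, not from the source
) -> List[TranscriptEntry]:
    """Keep text only for segments verified as the owner.

    Non-owner segments become gaps that carry timing but no content;
    consecutive gaps are merged into a single entry.
    """
    entries: List[TranscriptEntry] = []
    for seg in segments:
        if cosine(seg.embedding, owner_embedding) >= threshold:
            entries.append(TranscriptEntry(seg.start, seg.end, seg.text))
        elif entries and entries[-1].text is None:
            entries[-1].end = seg.end   # extend the open gap
        else:
            entries.append(TranscriptEntry(seg.start, seg.end, None))
    return entries
```

In this sketch a gap keeps only its start and end times, which is the information a later gap-filling step (spatio-temporal resolution or an assistant-to-assistant query) would have to work from. A real system would presumably discard non-owner audio before transcription rather than carry its text in a `Segment`; the `text` field is here only to keep the example self-contained.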