A new theoretical framework from Apple defines phonetic similarity in terms of probabilistic distances between acoustic neighbor embeddings, which are fixed-dimensional representations of variable-width audio content. The research argues that these representations exhibit uniform cluster-wise isotropy. Because distances between embeddings then carry a probabilistic meaning, developers can interpret audio-text embeddings in a principled way instead of treating them as opaque vectors. The result is a stable embedding space into which audio of any width can be mapped and compared.
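To make the idea of a probabilistic distance concrete, here is a minimal sketch, not the method from the research itself. It assumes that every word's cluster is an isotropic Gaussian with the same variance and that all words are equally likely a priori. Under those assumptions, the probability of each candidate word is a softmax over negative squared Euclidean distances. The function name `word_posteriors`, the `sigma` parameter, and the toy 2-D vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


def word_posteriors(query, centers, sigma=1.0):
    """Posterior over candidate words for one query embedding.

    Illustrative assumptions (not taken from the source): each word's
    cluster is an isotropic Gaussian with shared variance sigma**2, and
    all words have equal prior probability.
    """
    query = np.asarray(query, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared Euclidean distance from the query to each cluster center.
    sq_dist = np.sum((centers - query) ** 2, axis=1)
    # Gaussian log-likelihood up to a constant shared by all clusters.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy, made-up embeddings: three reference words and one audio query.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 3.0]])
query = np.array([0.2, 0.1])
probs = word_posteriors(query, centers, sigma=0.5)
print(probs)
```

In this sketch the nearest center always receives the highest probability, and `sigma` controls how sharply probability mass concentrates on it. If the clusters were not isotropic or did not share a variance, a plain Euclidean distance would no longer be sufficient and a per-cluster covariance would be needed.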