I am computing embeddings for a video and an audio clip, and their inner product came out to 1. Just when I thought everything was going smoothly, I discovered that the cosine similarity between the two embeddings was only 0.2391. It turned out that the norm of the audio embedding was not 1. The inner product equals the cosine similarity only when both vectors have unit norm, so despite the inner product being 1, the actual similarity was quite low.
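To make the gap concrete, here is a minimal sketch in plain Python. The two vectors are made-up 2-D stand-ins, not my real embeddings, and the helper names (`dot`, `l2_norm`, `cosine_similarity`, `normalize`) are just illustrative. It shows an inner product of 1 alongside a much lower cosine similarity, and that normalizing both vectors first makes the two numbers agree.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_norm(a):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Inner product divided by the product of the two L2 norms."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

def normalize(a):
    """Scale a vector to unit L2 norm."""
    n = l2_norm(a)
    return [x / n for x in a]

# Hypothetical vectors chosen for illustration only.
video = [1.0, 0.0]  # unit norm
audio = [1.0, 4.0]  # norm is sqrt(17), not 1

inner = dot(video, audio)
cos = cosine_similarity(video, audio)

# After L2-normalizing both vectors, the inner product is the cosine similarity.
inner_normalized = dot(normalize(video), normalize(audio))

print(f"inner product:             {inner:.4f}")
print(f"audio norm:                {l2_norm(audio):.4f}")
print(f"cosine similarity:         {cos:.4f}")
print(f"inner product (unit norm): {inner_normalized:.4f}")
```

The takeaway from the sketch is to either normalize both embeddings before taking the inner product, or to compute cosine similarity directly, and to check the norms whenever the two numbers disagree.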