Main takeaways
Images = grids of numbers: AI learns to spot patterns, from edges to objects
Sound = pictures of vibrations: We turn audio into spectrograms (pictures of how strong each pitch is over time) and then use image AI on them!
The big insight: Text, images, and audio can all be turned into embeddings, which are lists of numbers of the same kind
Multimodal AI: ChatGPT, Claude, and Gemini can work with images as well as text because every input is converted into the same mathematical “language” of embeddings
Hands-on: You trained your own AI with Teachable Machine!
Critical thinking: Deepfakes, bias, and privacy are real concerns we must address
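
To make the first two takeaways concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the 6x6 image, the 440 Hz tone, the 8000 Hz sample rate, and the 256-sample frame length are all made up for this example. It does not train a model or produce learned embeddings; it only shows that an image and a sound both end up as grids of numbers.

```python
import numpy as np

# --- Images = grids of numbers ---
# A made-up 6x6 grayscale image: dark left half (0), bright right half (255).
image = np.zeros((6, 6), dtype=float)
image[:, 3:] = 255.0

# A very crude "edge detector": how much does brightness change
# between horizontally neighboring pixels?
edges = np.abs(np.diff(image, axis=1))   # shape (6, 5)
edge_column = int(edges[0].argmax())     # edge sits between columns 2 and 3

# --- Sound = pictures of vibrations ---
# One second of a pure 440 Hz tone (assumed sample rate: 8000 Hz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

# Simple spectrogram: cut the audio into short frames and measure
# how strong each frequency is inside each frame.
frame_len = 256
n_frames = len(audio) // frame_len
frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
window = np.hanning(frame_len)           # smooths the frame edges
spectrogram = np.abs(np.fft.rfft(frames * window, axis=1))

# The spectrogram is a 2D grid, just like the image above.
peak_bin = int(spectrogram.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / frame_len

print("image grid shape:      ", image.shape)
print("edge found after column", edge_column)
print("spectrogram grid shape:", spectrogram.shape)
print("strongest frequency is near", peak_hz, "Hz")
```

Because the spectrogram is a 2D grid just like the image, the same kind of pattern-spotting model can be pointed at either one. Real systems learn their edge detectors and embeddings from data rather than using hand-written rules like the ones here.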