DATASCI 101: Introduction to AI Applications

Lecture 08: Quiz 01 Review Session

Danilo Freire

Department of Data and Decision Sciences
Emory University

Welcome back! 🧠

Today’s plan

  • Quiz 01 is on Thursday, September 24 (50 minutes, in class)
  • Open laptop and notes; you can search the web
  • The quiz tests understanding and critical evaluation, not memorisation
  • Today is your session, not mine. I will ask, you will answer
  • We will work through:
    1. A rapid recap of the seven lectures
    2. One realistic scenario, broken down together
    3. Practice questions in pairs
    4. Open Q&A
  • Stop me whenever something is unclear. That is the whole point of today

Quiz logistics

  • Format: 5 short essay questions plus one bonus
  • Length: 50 minutes, in person
  • Material allowed: laptops, notes, lecture slides, web search, AI assistants
  • What is graded: your reasoning, not the polish of your AI’s prose
  • Honour code: write your own answers; if you used an AI, say which one
  • Coverage: Lectures 01 to 07

Why open laptop?

The job is not to remember definitions. The job is to recognise problems, judge trade-offs, and write something useful faster than the AI can.

Part 1: rapid recap 🚀

How this part works

  • I will name a concept from one of the seven lectures
  • One of you explains it in your own words (about 30 seconds)
  • The rest of us push back, add nuance, or correct
  • I will keep it moving. Wrong answers are useful; silence is not
  • We have about 12 minutes for this section

Concept 1: hallucination

  • Volunteer or victim? 🙋
  • In one sentence: what is an AI hallucination, and why does it happen?
  • Follow-ups for the room:
    • Is a hallucination the same as a lie?
    • Is a hallucination the same as a factual error?
    • Why might an open-laptop quiz punish someone who copy-pastes an LLM answer?

Concept 2: dataset and labels

  • Pick someone new
  • What does it mean to “label” a dataset, and why is it usually the most expensive step?
  • Quick room poll: in each case, is the label easy, hard, or contested?
    • “Is this email spam?”
    • “Is this tweet toxic?”
    • “Is this CT scan showing pneumonia?”
    • “Is this résumé a good fit for the role?”
  • Connect to: garbage in, garbage out

Concept 3: the three learning paradigms

  • Three students, one paradigm each. One sentence per person:
    • Supervised learning — ?
    • Unsupervised learning — ?
    • Reinforcement learning — ?
  • Then back to the room: name one real product that uses each
  • Trick question: where does training a chatbot with thumbs up / thumbs down fit? (answer: reinforcement learning from human feedback, RLHF; a minimal sketch of all three paradigms follows this list)
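
If it helps to see the three side by side, here is a minimal sketch. It assumes numpy and scikit-learn are installed; the two-armed bandit at the end is a toy stand-in for reinforcement learning, not RLHF itself.

    # Minimal sketch of the three paradigms (toy data; assumes numpy and scikit-learn)
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                 # 200 examples, 2 features

    # 1. Supervised: labels y exist, and the model learns the map X -> y
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy label
    clf = LogisticRegression().fit(X, y)

    # 2. Unsupervised: no labels; the model finds structure in X alone
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

    # 3. Reinforcement: no labels, only rewards after actions
    #    (toy two-armed bandit with an epsilon-greedy policy)
    true_payout = [0.3, 0.7]                      # hidden reward probabilities
    value, count = [0.0, 0.0], [0, 0]
    for _ in range(1000):
        arm = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(value))
        reward = float(rng.random() < true_payout[arm])
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]  # running mean of reward

    print(value)                                  # estimates approach the true payouts

The thumbs up / thumbs down case is the same loop with one swap: the reward comes from a human rating rather than a fixed payout.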

Concept 4: precision vs recall

  • This is the single most quiz-worthy concept of the block. 🎯
  • Volunteer to draw the confusion matrix on the board (or describe it)
  • Then the room debates:
    • A cancer screening model — should you optimise precision or recall? Why?
    • A spam filter for your inbox — same question
    • A criminal sentencing risk score — same question
  • Punchline: the answer depends on which mistake is more costly, and who pays the cost (the sketch below puts numbers on it)
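
To pin the definitions down, here is a back-of-the-envelope example. The numbers are invented for a screening-style classifier; the point is the formulas, not the data.

    # Precision and recall from a confusion matrix (invented numbers)
    # A screening model evaluated on 1,000 patients, 50 of whom are sick
    tp = 40    # sick and flagged          (true positives)
    fn = 10    # sick but missed           (false negatives)
    fp = 90    # healthy but flagged       (false positives)
    tn = 860   # healthy and not flagged   (true negatives)

    precision = tp / (tp + fp)  # of everything flagged, how much was right?
    recall    = tp / (tp + fn)  # of everything that should be flagged, how much was caught?
    accuracy  = (tp + tn) / (tp + fn + fp + tn)

    print(f"precision={precision:.2f}  recall={recall:.2f}  accuracy={accuracy:.2f}")
    # precision=0.31  recall=0.80  accuracy=0.90

Notice what the numbers do: accuracy sits at a comfortable 0.90 while precision says most flags are false alarms. Which metric you report is itself a choice about whose mistakes count.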

Concept 5: overfitting and validation

  • Cold call: what is overfitting, in plain English?
  • Follow-up: why do we split data into training and validation sets? (see the sketch after this list)
  • Room scenario:
    • You train a model on Emory student grades from 2018-2024
    • It scores 99% accuracy on those students
    • Should you trust it on the 2026 cohort? Why or why not?
  • Connect to: Goodhart’s law — when a measure becomes a target, it ceases to be a good measure
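
Here is the split in miniature, assuming numpy and scikit-learn. The labels are deliberately pure noise, so any training accuracy above chance is memorisation, and the validation set exposes it.

    # Overfitting caught by a validation split (labels are noise on purpose)
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = rng.integers(0, 2, size=300)       # random labels: there is nothing to learn

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    model = DecisionTreeClassifier().fit(X_train, y_train)  # an unconstrained tree memorises

    print("training accuracy:  ", model.score(X_train, y_train))  # ~1.00
    print("validation accuracy:", model.score(X_val, y_val))      # ~0.50, i.e. chance

That is the grades scenario in two lines of output: the 99% you saw on the students you trained on is the first number, and it tells you nothing about the second.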

Concept 6: tokens and embeddings

  • Pair up with the person next to you. One minute. 🗣️
    • One of you explains tokenisation
    • The other explains embeddings
  • Then I pick two pairs to share
  • Rapid-fire room questions (a short sketch follows this list):
    • Why do LLM providers charge by the token and not by the word?
    • Why does the order of words in a sentence matter, even after embedding?
    • What is a context window, and why does it cost so much to make it bigger?
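
A minimal sketch of both halves of the pair exercise. It assumes the tiktoken package is installed (pip install tiktoken); the embedding vectors are invented four-dimensional toys, not weights from any real model.

    # Tokenisation with tiktoken, plus a toy picture of embeddings
    import numpy as np
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")    # encoding used by GPT-4-era models
    ids = enc.encode("Emory students love embeddings")
    print(ids)                                    # integer token ids, not words
    print([enc.decode([i]) for i in ids])         # the text piece behind each id

    # Embeddings: each token id maps to a vector; related meanings end up nearby
    cat = np.array([0.9, 0.1, 0.30, 0.00])        # invented for illustration
    dog = np.array([0.8, 0.2, 0.35, 0.05])
    car = np.array([0.0, 0.9, 0.10, 0.70])

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(cat, dog))   # high: related concepts sit close together
    print(cosine(cat, car))   # low: unrelated concepts sit far apart

This also answers the billing question: the unit of work is the token id, and one word is often several tokens. A context window is a cap on how many of those ids the model can attend to at once; in a standard transformer, attention cost grows roughly quadratically with that count, which is why bigger windows are expensive.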

Concept 7: multimodal

  • Last concept of the recap
  • What does “multimodal” mean for an AI system?
  • Room:
    • Name three modalities we discussed
    • Why is audio harder than text in some ways and easier in others?
    • When you upload a photo to ChatGPT, what is happening under the hood? (see the sketch below)
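
One common recipe, sketched here with random weights purely to check the shapes: a ViT-style encoder cuts the image into patches, flattens each patch, and projects it into the same embedding space the language model reads. Real systems use trained weights and positional information; take only the general shape of the pipeline from this sketch.

    # Toy ViT-style patch embedding (random weights, illustration only)
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3))                 # stand-in for the uploaded photo

    patch = 16                                        # 224 / 16 = 14 patches per side
    patches = image.reshape(14, patch, 14, patch, 3).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, patch * patch * 3)  # 196 patches, 768 numbers each

    W = rng.normal(size=(patch * patch * 3, 1024))    # a real encoder learns this matrix
    tokens = patches @ W

    print(tokens.shape)   # (196, 1024): a sequence the language model can attend to

From the language model's point of view, your photo becomes roughly 196 extra tokens in the context window, which is also why an image costs more to process than a short sentence.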

Part 2: a real scenario 🏥

The setup

A startup, HealthScribe, sells an AI tool to hospitals. The tool listens to doctor-patient conversations and writes the clinical note.

  • The model is a fine-tuned LLM
  • Training data: 80,000 doctor-patient transcripts from a single large hospital network in California, 2019-2024
  • The labels are the final clinical notes that doctors wrote and signed off on
  • HealthScribe reports 92% agreement with doctor-written notes on a held-out test set
  • They are now selling the product to hospitals across the country

🩺

We will work through this together.

I will not move on until I get answers from at least three different people on each question.

Question A: the dataset

  • What problems can you spot with the training data?
  • Hints, if the room is quiet:
    • Where did the data come from? Who is in it? Who is not in it?
    • The labels are doctor-written notes — is that the truth, or one person’s interpretation?
    • The data is from 2019-2024 — does medicine change over five years?
  • Try to name at least three distinct issues

Question B: the metric

  • HealthScribe reports 92% agreement with doctor-written notes
  • Volunteer: what is being measured here?
  • Then the room:
    • Is “agreement with the doctor” the same as getting it right?
    • What are the model and the doctor agreeing on when they agree?
    • On what kind of patient might the 8% disagreement concentrate?
  • This is a classic case of Goodhart’s law: optimising agreement is not the same as optimising care (the toy arithmetic below shows how)
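
A quick illustration with invented numbers, nothing from HealthScribe’s actual reporting: an aggregate agreement score can sit at 92% while the hardest cases do far worse.

    # Aggregate agreement can hide subgroup failure (all numbers invented)
    # Suppose 90% of test transcripts are routine visits and 10% are complex cases
    routine_share, complex_share = 0.90, 0.10
    routine_agreement, complex_agreement = 0.96, 0.56   # hypothetical subgroup scores

    overall = routine_share * routine_agreement + complex_share * complex_agreement
    print(overall)   # 0.92: the headline number, with the errors piled on the hardest patients

So the quiz-worthy move is to ask for the breakdown, not the headline.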

Question C: precision and recall

  • Imagine the AI’s job is to flag every mention of an allergy in the conversation, so it can be added to the patient record
  • The room decides:
    • Would you optimise precision (only flag what you are very sure of) or recall (flag everything that might be an allergy)?
    • Defend your choice with a real harm that comes from each kind of mistake
  • This is the single most likely structure for a quiz question. ✏️ The sketch below shows the dial you would actually turn.
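
In practice, choosing precision or recall is not choosing a different model; it is choosing a threshold on the model’s confidence. A toy sweep with invented scores:

    # The precision/recall trade-off is a threshold choice (invented scores)
    # Ten snippets with the model's allergy score; 1 = genuinely mentions an allergy
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
    truth  = [1,    1,    0,    1,    1,    0,    0,    1,    0,    0]

    def precision_recall(threshold):
        flagged = [t for s, t in zip(scores, truth) if s >= threshold]
        tp = sum(flagged)
        precision = tp / len(flagged) if flagged else 1.0
        recall = tp / sum(truth)
        return precision, recall

    for th in (0.85, 0.50, 0.15):
        p, r = precision_recall(th)
        print(f"threshold={th:.2f}  precision={p:.2f}  recall={r:.2f}")
    # threshold=0.85: precision 1.00, recall 0.40
    # threshold=0.15: precision 0.62, recall 1.00

Arguing for recall on the allergy flagger means arguing for the low threshold and accepting the false alarms that come with it; that trade is what your answer should defend.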

Question D: a deployment decision

  • Your hospital is considering buying HealthScribe
  • The room votes by show of hands:
    • Yes, buy it
    • No, do not buy it
    • Buy it, but with conditions
  • Whatever the room votes, I want one defender of each position to argue for 60 seconds
  • Then the room: what evidence would change your mind?

Part 3: practice questions 📝

How this part works

  • I will show you three practice questions, one at a time
  • They are written in the same style as the real Quiz 01
  • Pair up with the person next to you
  • You have two minutes per question to discuss your answer
  • Then I will pick a pair to share, and the room critiques
  • These are not the real quiz questions, but the format is identical

Practice question 1

A friend tells you: “ChatGPT is just autocomplete on steroids. It doesn’t actually understand anything.” Another friend says: “It got 90% on the bar exam, so clearly it understands law.” 🤷

Your task: who is closer to the truth, and why? In your answer:

  1. Explain what tokenisation and next-token prediction have to do with the first friend’s claim
  2. Explain why a high benchmark score is not the same as understanding (cite Goodhart’s law if you can)
  3. Take a position. Defend it in 3-4 sentences

Two minutes. Discuss with your partner. ⏱️

Practice question 2

A university wants to build an AI that flags students who are “at risk of dropping out.” They have:

  • Five years of student records (grades, attendance, financial aid status)
  • The label is “did this student drop out by the end of year 2”

Your task:

  1. What kind of learning problem is this (supervised, unsupervised, reinforcement)?
  2. What is the difference between precision and recall for this model, and which would you optimise? Defend your choice
  3. Name one ethical concern that does not appear anywhere in the technical metric

Two minutes. ⏱️

Practice question 3

You upload a photo of a handwritten recipe to ChatGPT and ask it to convert it to a typed shopping list. It gets the recipe mostly right, but lists “2 cups of sugar” when the recipe clearly says “2 cups of salt.” 🧂

Your task:

  1. What is happening at the tokeniser and vision encoder stage that could cause this kind of mistake?
  2. Is this a hallucination, a factual error, or both? Explain
  3. If you were Anthropic or OpenAI, what is one thing you would change to make this kind of error less likely?

Two minutes. ⏱️

Part 4: open Q&A 💬

Anything unclear

  • This is your last chance before Thursday
  • No question is too small. If you are unsure, others are too
  • Topics most often asked about in past semesters:
    • Embeddings — what does the vector “mean”?
    • Validation vs test set — what is the difference?
    • Multimodal — how does the model “see” an image?
    • The proxy problem — when is a metric a bad proxy?

Last reminders before Thursday

  • Thursday, September 24, 2:30 pm, this room
  • Bring your laptop and your charger 🔌
  • The quiz is on Canvas — log in before you arrive
  • Read each question twice before you start writing
  • If you are stuck, skip and come back — do not lose the easy points
  • Cite the AI tool you used (if any). Honesty over polish
  • You are ready. 💪

See you Thursday! 👋