DATASCI 185: Introduction to AI Applications

Lecture 10: Creativity and Hallucination

Danilo Freire

Department of Data and Decision Sciences
Emory University

Welcome back! 🎭

Recap of last class

  • Last time, we explored prompt engineering
  • Key techniques:
    • PTCF Framework: Purpose, Task, Context, Format
    • Chain-of-Thought: “Let’s think step by step”
    • Personas: Giving AI expert roles
    • AI Agents: LLMs that use tools
  • Practical tips for better prompts, iterative design
  • Today: When LLMs go creatively wrong 🎭
  • Are hallucinations AI’s biggest problem?

Lecture overview

Today’s agenda

Part 1: What Are Hallucinations?

  • When AI confidently lies
  • Why this happens (it’s not a bug!)
  • Types of hallucinations

Part 2: Real-World Failures

  • Lawyers and fake citations
  • Medical misinformation
  • Academic integrity issues

Part 3: Creativity vs. Accuracy

  • The creativity-accuracy tradeoff
  • Temperature and randomness
  • When hallucinations are… useful?

Part 4: Detection and Prevention

  • Spotting hallucinations
  • Prompting techniques that help
  • Preview: RAG as a solution

Meme of the day 😄

Source: X.com

What Are Hallucinations? 🤥

When AI confidently makes things up

What is an AI hallucination?

When an AI system generates content that is plausible but factually incorrect, presented with high confidence.

Key characteristics:

  • ✅ Sounds completely reasonable
  • ✅ Uses proper grammar and tone
  • ✅ Fits the context of conversation
  • ❌ Contains fabricated information
  • ❌ May invent citations, facts, or events
  • ❌ AI doesn’t “know” it’s lying

The unsettling part:

There’s often no signal that the AI is hallucinating. It sounds just as confident as when it’s correct!

Hallucination example

The AI sounds confident, but is completely wrong

Why do LLMs hallucinate?

It’s not a bug, it’s how LLMs work!

Remember: LLMs predict the next word

  • Input: “The capital of France is”
  • Output: “Paris” (high probability)
  • Output: “Lyon” (lower probability)
  • Output: “Atlantis” (still some probability!)

The problem:

  • Plausibility ≠ truth
  • LLMs have no “fact-checking” mechanism
  • They can’t access the internet during generation (unless designed to)
  • No way to say “I genuinely don’t know”
  • LLMs are optimised to sound right, not to be right
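The next-word mechanics above can be sketched in a few lines. The probabilities below are invented for illustration (a real model scores every token in its vocabulary), but the point carries over: sampling picks plausible continuations, and nothing in the loop checks for truth.

```python
import random

# Toy next-token distribution for "The capital of France is ..."
# These probabilities are made up for illustration; real models
# assign some probability to EVERY token in their vocabulary.
next_token_probs = {
    "Paris": 0.90,      # plausible and true
    "Lyon": 0.07,       # plausible but false
    "Atlantis": 0.03,   # implausible and false, yet never zero
}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(42)  # fixed seed so the demo is reproducible
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]
print(samples.count("Paris"), samples.count("Lyon"), samples.count("Atlantis"))
```

Even with a 90% favourite, the low-probability continuations still get sampled sometimes; plausibility, not truth, drives every choice.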

Why hallucinations happen

Source: Medium

The sycophancy problem 🙇

LLMs are trained to be helpful…sometimes too helpful!

What is sycophancy?

The tendency to give users the answer they seem to want, rather than the accurate answer.

How this causes hallucinations:

  • User asks a question → model doesn’t know → but saying “I don’t know” feels unhelpful → model makes something up
  • User pushes back → model changes its (correct!) answer to agree
  • User asks leading questions → model confirms false premises
  • According to Geng et al. (2025), GPT-5 changed its answers up to 55% of the time after 10 rounds of discussion!

Training data problems 📚

Garbage in, hallucination out! (remember lecture 03 about data?)

LLMs learn from trillions of tokens scraped from the internet. This creates problems:

Problem What Happens
Conflicting sources Wikipedia says X, a blog says Y → model learns both as “plausible”
Outdated information Training data has a cutoff → model treats old facts as current
Rare topics Few examples → model “interpolates” (often wrongly)
Errors in sources Mistakes on the internet become mistakes in the model
Fictional content Novels, fan fiction, speculation all look like “text” to the model

The interpolation problem:

For rare topics, the model fills in gaps using patterns from similar-but-different topics. This is where confident nonsense comes from!

Example: Rare topic hallucination

Q: “Tell me about the 1987 Liechtenstein general election.”

The model might:

  1. Know Liechtenstein has elections ✓
  2. Know 1987 election patterns in Europe ✓
  3. Invent specific parties, candidates, results

It sounds plausible because it’s statistically consistent with European elections…but it’s all fabricated!

Rule of thumb:

The more obscure or specific your question, the higher the hallucination risk. Models are most reliable on topics with abundant, consistent training data.

Hallucination rates by domain 📊

Not all domains are equally risky.

Research shows hallucination rates vary dramatically by topic:

Domain Hallucination Risk Why?
Common knowledge Low Abundant, consistent training data
Coding Low-Medium Syntax is verifiable; patterns are clear
Recent events High Beyond training cutoff
Medical/Legal High Requires precision; errors are costly
Obscure history Very High Sparse data → interpolation
Citations Very High Specific strings rarely memorised

Hallucination rates vary by domain (lower is better)

Source: Chakraborty et al (2025)

Types of hallucinations

Type Description Example
Factual Wrong facts “Einstein was born in France”
Fabrication Invented information Made-up statistics, fake quotes
Citation Non-existent sources “According to Smith (2023)…” (doesn’t exist)
Logical Contradicts itself “It’s 5pm now… earlier today at 7pm…”
Temporal Wrong timelines Mixing up historical dates
Entity Confusing people/things Attributing quotes to wrong person

What do you think is the most dangerous type?

Types of hallucinations

Source: The Cloud Girl Blog

Each type presents different risks

Real-World Failures 💥

The lawyer who trusted ChatGPT

Mata v. Avianca Airlines (2023)

  • Lawyer Steven Schwartz used ChatGPT for legal research
  • Submitted a brief citing 6 cases that didn’t exist
  • ChatGPT had invented case names, citations, and quotes
  • Cases sounded completely plausible!
  • Judge sanctioned both attorneys

The wake-up call:

“Is Varghese a real case?” Schwartz asked the chatbot. “Yes,” ChatGPT doubled down, “it is a real case.”

Lawyer ChatGPT case

Source: CNN

Warning: Asking AI “Is this true?” doesn’t work: it will usually confirm its own hallucinations!

Medical misinformation

Healthcare hallucinations can be lethal:

Examples documented in research:

  • AI suggested wrong drug dosages
  • Invented drug interactions that don’t exist
  • Recommended treatments for wrong conditions
  • Created fake medical citations

A 2024 study found:

  • ChatGPT gave incorrect medical advice 51% of the time
  • Many errors could cause serious patient harm
  • Model couldn’t identify its own errors
  • Source: Hadi et al. (2024)

Medical hallucination dangers

Source: The New York Times (2025)

  • Many people can’t afford doctors
  • Easy to believe confident AI, hard to verify
  • Medical jargon sounds authoritative

Academic integrity issues

Fake citations are a major problem:

For students:

  • AI invents realistic-sounding paper titles
  • Creates plausible author names
  • Fabricates journals and conferences
  • Very hard to detect without checking

For academia more broadly:

  • AI-generated papers being submitted
  • Fake peer reviews
  • Fabricated data
  • “Paper mills” using AI

How to avoid this:

  1. Never trust AI citations
  2. Verify EVERY reference
  3. Use academic databases (Google Scholar, PubMed)
  4. Look up authors and journals

Red flags:

  • Citation format is slightly off
  • Can’t find paper anywhere
  • Author has no other publications
  • Journal name doesn’t exist
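As a toy illustration of the red-flags idea, you could screen a citation string with a few checks. These heuristics are mine, not a real detector, and passing them proves nothing; the database lookup (Google Scholar, PubMed) is still mandatory.

```python
import re

def citation_red_flags(citation):
    """Toy formatting checks inspired by the red flags above.
    Illustrative only: a clean result does NOT mean the paper exists."""
    flags = []
    # Proper citations almost always carry a parenthesised year.
    if not re.search(r"\((19|20)\d{2}\)", citation):
        flags.append("no parenthesised year")
    # "et al" without the period is a common tell of a sloppy fake.
    if "et al" in citation and "et al." not in citation:
        flags.append("'et al' missing period")
    # Journal articles usually include a page range like 45-67.
    if not re.search(r"\d+\s*[-–]\s*\d+", citation):
        flags.append("no page range")
    return flags

print(citation_red_flags("Smith et al (2023). Fake Journal of AI, 12, 45-67."))
```

A fabricated citation can of course be perfectly formatted, which is exactly why format checks can only raise suspicion, never clear a reference.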

Discussion: Who is responsible? 🤔

When AI hallucinations cause harm…

Who bears responsibility?

A. The user who relied on AI

B. The company that built the AI

C. The AI itself (!)

D. No one, it’s just a tool

E. Society for not regulating AI

Consider:

  • Lawyer submits fake citations
  • Doctor follows wrong AI advice
  • Student uses fabricated sources
  • Journalist publishes AI-generated misinformation

Discuss with a neighbour:

Where do you stand?

⏱️ 3 minutes to debate!

Creativity vs. Accuracy ⚖️

The creativity-accuracy tradeoff

However…

The same mechanism that causes hallucinations also enables creativity!

Consider:

  • A very “safe” AI only says things it’s 100% sure of
  • A creative AI takes risks, combines ideas in new ways
  • You can’t have unlimited creativity with perfect accuracy

The spectrum:

←—————————————————————————————→
BORING                     CREATIVE
but accurate           but risky

"I don't know"         New ideas!
Refuses often          Sometimes wrong
Very literal           Imaginative

Different tasks need different points on this spectrum!
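The “temperature” dial from today’s agenda is exactly this spectrum made numeric: it rescales the model’s raw scores before sampling. A minimal sketch, with made-up scores for three continuations:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities. Temperature < 1 sharpens
    the distribution (safer, more boring); > 1 flattens it (more
    creative, more risky)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for three candidate continuations,
# e.g. "Paris", "Lyon", "Atlantis"
logits = [4.0, 2.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature the favourite takes almost all the probability mass; at high temperature the long-shot options get sampled far more often, which is where both novel ideas and confident nonsense come from.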

Creativity-accuracy tradeoff

Source: Kevin Kelly (2024)

Different tasks require different balances

When hallucinations are… useful? 🤔

Controversial take: Hallucinations can be great, too!

Consider these use cases:

  1. Creative writing
    • We WANT the AI to invent characters, plots, dialogue
    • “Hallucinating” is literally the job!
  2. Brainstorming
    • Novel combinations of ideas
    • “What if…” scenarios
  3. Role-playing games
    • Improvising characters and worlds
    • Making up stories on the fly
  4. Art and design
    • Imagining things that don’t exist
    • Creating impossible visuals

The problem isn’t hallucination per se; it’s hallucination at the wrong time

Sometimes we want the AI to make things up!

Source: Medium

Match the task to the tool. Use high-creativity settings for creative work, high-accuracy approaches for factual work (obviously!)

Activity: Confidence Calibration Test 📊

Does AI know when it’s wrong?

Let’s ask https://chat.z.ai/ these questions and request a confidence rating (1-10) for each answer. Z.ai was chosen because you can turn off web access.

  1. “What day of the week was January 14, 1642? Confidence?”
    • Tuesday (I checked on Google Calendar!)
  2. “Spell Inconstitucionalissimamente backwards. Confidence?”
    • etnemamissilanoicutitsnocnI (well, me!)
  3. “Name the third person to walk on the Moon. Confidence?”
    • Charles “Pete” Conrad Jr (Wikipedia)
  4. “What is the exact population of Tuvalu as of 2024? Confidence?”
    • 9,646 (World Bank estimate)
  5. “Who is the current mayor of Reykjavik, Iceland? Confidence?”
    • Heiða Björg Hilmisdóttir

Then let’s verify each answer!
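Two of the five answers can be verified mechanically rather than by asking another chatbot. One caveat: Python’s `datetime` uses the proleptic Gregorian calendar, which is also what most online date tools report for 1642.

```python
from datetime import date

# The day-of-week question: computed, not guessed.
# (Proleptic Gregorian calendar, same convention as most date tools.)
print(date(1642, 1, 14).strftime("%A"))   # Tuesday

# The spelling question: string slicing can't hallucinate.
word = "Inconstitucionalissimamente"
print(word[::-1])
```

The other three questions (moonwalker, population, mayor) need a trusted external source; that distinction is exactly the detection lesson of today’s class.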

What to observe:

Question AI Confidence Actually Correct?
Day of the week ? / 10 ✓ or ✗
Spell backwards ? / 10 ✓ or ✗
Third moonwalker ? / 10 ✓ or ✗
Tuvalu population ? / 10 ✓ or ✗
Reykjavik mayor ? / 10 ✓ or ✗

Discussion:

  • Is confidence correlated with accuracy?
  • Did the AI ever give high confidence but wrong answers?
  • Can you trust AI’s self-assessment?
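If you record the class results, the discussion questions can be answered with a quick calculation. The numbers below are hypothetical placeholders; swap in what your group actually observed.

```python
# Hypothetical activity results: (question, confidence /10, correct?)
# Replace these with the values your group recorded.
results = [
    ("day of week", 9, True),
    ("spell backwards", 10, True),
    ("third moonwalker", 8, True),
    ("Tuvalu population", 9, False),   # high confidence, wrong!
    ("Reykjavik mayor", 7, False),
]

# For a well-calibrated model, mean confidence on wrong answers
# should be clearly lower than on right answers.
right = [c for _, c, ok in results if ok]
wrong = [c for _, c, ok in results if not ok]
print("mean confidence when right:", sum(right) / len(right))
print("mean confidence when wrong:", sum(wrong) / len(wrong))
```

If the two means come out nearly equal, as in this made-up data, the model’s self-reported confidence tells you almost nothing about accuracy.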

Detection and Prevention 🛡️

How to spot hallucinations

Red flags to watch for:

Warning Sign Example
Very specific numbers “73.2% of studies show…”
Precise citations “Smith et al. (2022), p. 47”
Obscure details Historical minutiae
Too-perfect answers Exactly what you wanted
Confident tone No hedging or uncertainty

Verification strategies:

  1. Google the claim: does it appear elsewhere?
  2. Check citations: do they exist?
  3. Ask for sources: then verify them!
  4. Cross-reference: use multiple sources
  5. Trust your expertise: if something seems off, it might be

Specific claim, impressive detail, completely wrong! The answer is the Hubble telescope

Learn to recognise warning signs

Golden rule: The more specific and confident the AI sounds, the more you should verify!

Prompting techniques that help

You can reduce hallucinations through prompting:

Technique What to Add to Your Prompt
Ask for uncertainty “If you’re not sure, say so. Don’t guess.”
Request sources “Only cite sources you’re certain exist.”
Chain-of-thought “Think step by step. Show your reasoning.”
Confidence levels “Rate your confidence 1-10 for each claim.”
Output constraints “Be precise and factual. Avoid speculation.”

Does this eliminate hallucinations?

No! But it can reduce them significantly. The key is stacking multiple techniques together.

Basic anti-hallucination prompt:

Answer my question factually. 

Rules:
- Only state things you're 
  confident about
- Say "I'm not sure" when 
  uncertain
- Don't invent citations
- If you don't know, admit it
- Show your reasoning step 
  by step

Question: [your question]
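The “stacking” idea can be sketched as a plain string builder that mixes and matches the rules from the table above. This is just string assembly, not tied to any particular chatbot API.

```python
# Rule texts mirror the anti-hallucination table above.
RULES = {
    "uncertainty": "If you're not sure, say so. Don't guess.",
    "sources": "Only cite sources you're certain exist.",
    "reasoning": "Think step by step. Show your reasoning.",
    "confidence": "Rate your confidence 1-10 for each claim.",
    "constraints": "Be precise and factual. Avoid speculation.",
}

def build_prompt(question, techniques):
    """Stack the chosen techniques into one prompt string."""
    rules = "\n".join(f"- {RULES[t]}" for t in techniques)
    return f"Answer factually.\n\nRules:\n{rules}\n\nQuestion: {question}"

print(build_prompt("Who won the 1987 Liechtenstein general election?",
                   ["uncertainty", "sources", "confidence"]))
```

Swapping technique lists in and out makes it easy to test which combination works best for your task.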

Next slides: We’ll build more sophisticated templates using PTCF and meta-prompting from Lecture 09!

Using PTCF to reduce hallucinations

Remember the PTCF framework from last class?

Element Anti-Hallucination Version
Persona “You are a careful fact-checker who never guesses”
Task “Verify and summarise only confirmed facts”
Context “Base answers ONLY on the provided text”
Format “Use [Verified], [Uncertain], or [Unknown] tags”

Why PTCF helps:

  • Persona activates cautious, precise patterns from training
  • Task constrains scope, no room for invention
  • Context grounds responses in real information
  • Format forces explicit uncertainty signals

PTCF anti-hallucination template:

PERSONA: You are a research 
assistant who prioritises 
accuracy over completeness. 
Never guess or fabricate.

TASK: Answer the following 
question using ONLY information 
you are highly confident about.

CONTEXT: This is for an academic 
paper. Incorrect information 
could damage my credibility.

FORMAT: 
- Start each fact with [Verified] 
  or [Uncertain]
- If you don't know, say 
  "I don't have reliable 
  information on this"
- Never invent citations

QUESTION: [your question]

Meta-prompting: Let AI improve your prompts

Meta-prompting means asking AI to help you write better prompts. This is especially useful for reducing hallucinations!

Why it works:

  • The model has seen millions of prompts in training
  • It knows which phrasings activate careful vs. creative modes
  • It can identify ambiguities you might miss

When to use meta-prompting:

  • You’re getting unreliable outputs
  • You’re not sure what constraints to add
  • You want to catch edge cases
  • You need prompts for high-stakes tasks

Key insight: The AI can help you build prompts that make it less likely to hallucinate!

Meta-prompting example:

I'm building a prompt to ask 
you about historical events.

I'm worried about hallucinations
—you might invent dates, names, 
or events that didn't happen.

Help me write a prompt that:
1. Minimises hallucination risk
2. Makes you flag uncertainty
3. Prevents invented citations

What constraints should I add? 
What phrasing reduces the 
chance you'll make things up?

Try it! Ask ChatGPT or Claude to critique and improve your prompts. You’ll often get surprisingly useful suggestions.

The knowledge cutoff problem

LLMs only know what they learned during training

  • ChatGPT’s training data has a cutoff date
  • Ask about yesterday’s news → doesn’t know
  • Ask about your company’s policies → doesn’t know
  • Ask about your course materials → doesn’t know

What happens when you ask about recent events?

Option What AI Does
Honest “I don’t have that information”
Hallucinating Makes something up!
Confused Gives outdated info as current

This is why chatbots now have:

  • Web browsing capabilities
  • File upload features
  • Knowledge retrieval systems (RAG!)

Knowledge cutoff

LLMs are “frozen in time” at their training date

Preview of next class: RAG solves this by giving AI access to real, up-to-date documents!

Preview: RAG as a solution

RAG = Retrieval-Augmented Generation

The core idea:

  1. When you ask a question…
  2. First search your documents for relevant information
  3. Give that information to the LLM
  4. LLM generates an answer using your data

Why this reduces hallucinations:

  • AI has real information to work with
  • Answers are grounded in documents
  • Can cite sources, so you can verify!
  • Reduces need to “make stuff up”
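The four steps above can be sketched end-to-end in toy form. Real RAG systems use embeddings and vector search; this version ranks documents by shared words, purely to show the retrieve-then-ground flow.

```python
# Toy document store: in practice this would be your own files.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm, Monday to Friday.",
    "Shipping is free on orders over $50.",
]

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query (toy retrieval)."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, docs):
    """Step 2-3: search the documents, then ground the LLM in them."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using ONLY the context below. If the answer is not "
            f"in the context, say so.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

print(build_rag_prompt("What is the refund policy?", documents))
```

Because the prompt tells the model to answer only from retrieved text, and to say so when the context is silent, the incentive to invent an answer drops sharply.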

Next class, we’ll explore RAG in depth!

RAG preview

Source: Towards AI

Teaser: Tools like NotebookLM, ChatPDF, and Perplexity all use RAG!

Summary 📚

Main takeaways

  • Hallucinations: AI confidently states false information

  • Why it happens: LLMs predict plausible, not true

  • Temperature: The creativity-accuracy dial

  • Real failures: Lawyers, doctors, academics affected

  • Detection: Verify specific claims, check citations

  • Prevention: Careful prompting helps but doesn’t eliminate them

Your anti-hallucination toolkit

Always do:

  • ✅ Verify specific facts
  • ✅ Check every citation
  • ✅ Cross-reference with trusted sources
  • ✅ Be sceptical of confident tones
  • ✅ Ask AI to admit uncertainty

Never do:

  • ❌ Trust AI for critical decisions alone
  • ❌ Submit AI citations without checking
  • ❌ Assume confident = correct
  • ❌ Skip verification for “obvious” answers

Prompting tips:

"If you're not sure, say so"

"Only cite sources that exist"

"Rate your confidence 1-10"

"Show your reasoning step by step"

… or just embrace randomness for creative tasks! 😄

Further reading

Academic:

Accessible:

Cases and news:

Tools to explore:

…and that’s all for today! 🎉