DATASCI 185: Introduction to AI Applications

Lecture 10: Creativity and Hallucination

Danilo Freire

Department of Data and Decision Sciences
Emory University

Welcome back! 🎭

Recap of last class

  • Last time, we explored prompt engineering
  • Key techniques:
    • PTCF Framework: Purpose, Task, Context, Format
    • Chain-of-Thought: “Let’s think step by step”
    • Personas: Giving AI expert roles
    • AI Agents: LLMs that use tools
  • Practical tips for better prompts, iterative design
  • Today: When LLMs go creatively wrong 🎭
  • Are hallucinations AI’s biggest problem?

Lecture overview

Today’s agenda

Part 1: What Are Hallucinations?

  • When AI confidently lies
  • Why this happens (it’s not a bug!)
  • Types of hallucinations

Part 2: Real-World Failures

  • Lawyers and fake citations
  • Medical misinformation
  • Academic integrity issues

Part 3: Creativity vs. Accuracy

  • The creativity-accuracy tradeoff
  • Temperature and randomness
  • When hallucinations are… useful?

Part 4: Detection and Prevention

  • Spotting hallucinations
  • Prompting techniques that help
  • Preview: RAG as a solution

Meme of the day 😄

Source: X.com

What Are Hallucinations? 🤥

When AI confidently makes things up

What is an AI hallucination?

When an AI system generates content that is plausible but factually incorrect, presented with high confidence.

Key characteristics:

  • ✅ Sounds completely reasonable
  • ✅ Uses proper grammar and tone
  • ✅ Fits the context of conversation
  • ❌ Contains fabricated information
  • ❌ May invent citations, facts, or events
  • ❌ AI doesn’t “know” it’s lying

The unsettling part:

There’s often no signal that the AI is hallucinating. It sounds just as confident as when it’s correct!

Hallucination example

The AI sounds confident, but is completely wrong

Why do LLMs hallucinate?

It’s not a bug, it’s how LLMs work!

Remember: LLMs predict the next word

  • Input: “The capital of France is”
  • Output: “Paris” (high probability)
  • Output: “Lyon” (lower probability)
  • Output: “Atlantis” (still some probability!)

The problem:

  • Plausibility ≠ truth
  • LLMs have no “fact-checking” mechanism
  • They can’t access the internet during generation (unless designed to)
  • No way to say “I genuinely don’t know”
  • LLMs are optimised to sound right, not to be right
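The next-word mechanics above can be sketched in a few lines. The probabilities below are invented for illustration (a real model scores every token in its vocabulary), but the point carries over: sampling picks plausible continuations, and nothing in the loop checks for truth.

```python
import random

# Toy next-token distribution for "The capital of France is ..."
# These probabilities are made up for illustration; real models
# assign some probability to EVERY token in their vocabulary.
next_token_probs = {
    "Paris": 0.90,      # plausible and true
    "Lyon": 0.07,       # plausible but false
    "Atlantis": 0.03,   # implausible and false, yet never zero
}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(42)  # fixed seed so the demo is reproducible
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]
print(samples.count("Paris"), samples.count("Lyon"), samples.count("Atlantis"))
```

Even with a 90% favourite, the low-probability continuations still get sampled sometimes; plausibility, not truth, drives every choice.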

Why hallucinations happen

Source: Medium

The sycophancy problem 🙇

LLMs are trained to be helpful…sometimes too helpful!

What is sycophancy?

The tendency to give users the answer they seem to want, rather than the accurate answer.

How this causes hallucinations:

  • User asks a question → model doesn’t know → but saying “I don’t know” feels unhelpful → model makes something up
  • User pushes back → model changes its (correct!) answer to agree
  • User asks leading questions → model confirms false premises
  • According to Geng et al. (2025), GPT-5 changed its answers up to 55% of the time after 10 rounds of discussion!

Training data problems 📚

Garbage in, hallucination out! (remember lecture 03 about data?)

LLMs learn from trillions of tokens scraped from the internet. This creates problems:

Problem What Happens
Conflicting sources Wikipedia says X, a blog says Y → model learns both as “plausible”
Outdated information Training data has a cutoff → model treats old facts as current
Rare topics Few examples → model “interpolates” (often wrongly)
Errors in sources Mistakes on the internet become mistakes in the model
Fictional content Novels, fan fiction, speculation all look like “text” to the model

The interpolation problem:

For rare topics, the model fills in gaps using patterns from similar-but-different topics. This is where confident nonsense comes from!

Example: Rare topic hallucination

Q: “Tell me about the 1987 Liechtenstein general election.”

The model might:

  1. Know Liechtenstein has elections ✓
  2. Know 1987 election patterns in Europe ✓
  3. Invent specific parties, candidates, results

It sounds plausible because it’s statistically consistent with European elections…but it’s all fabricated!

Rule of thumb:

The more obscure or specific your question, the higher the hallucination risk. Models are most reliable on topics with abundant, consistent training data.

Hallucination rates by domain 📊

Not all domains are equally risky.

Research shows hallucination rates vary dramatically by topic:

Domain Hallucination Risk Why?
Common knowledge Low Abundant, consistent training data
Coding Low-Medium Syntax is verifiable; patterns are clear
Recent events High Beyond training cutoff
Medical/Legal High Requires precision; errors are costly
Obscure history Very High Sparse data → interpolation
Citations Very High Specific strings rarely memorised

Hallucination rates vary by domain (lower is better)

Source: Chakraborty et al (2025)

Types of hallucinations

Type Description Example
Factual Wrong facts “Einstein was born in France”
Fabrication Invented information Made-up statistics, fake quotes
Citation Non-existent sources “According to Smith (2023)…” (doesn’t exist)
Logical Contradicts itself “It’s 5pm now… earlier today at 7pm…”
Temporal Wrong timelines Mixing up historical dates
Entity Confusing people/things Attributing quotes to wrong person

What do you think is the most dangerous type?

Types of hallucinations

Source: The Cloud Girl Blog

Each type presents different risks

Real-World Failures 💥

The lawyer who trusted ChatGPT

Mata v. Avianca Airlines (2023)

  • Lawyer Steven Schwartz used ChatGPT for legal research
  • Submitted a brief citing 6 cases that didn’t exist
  • ChatGPT had invented case names, citations, and quotes
  • Cases sounded completely plausible!
  • Judge sanctioned both attorneys

The wake-up call:

“Is Varghese a real case?” Schwartz asked the chatbot. “Yes,” ChatGPT doubled down, “it is a real case.”

Lawyer ChatGPT case

Source: CNN

Warning: Asking AI “Is this true?” doesn’t work: it will usually confirm its own hallucinations!

Medical misinformation

Healthcare hallucinations can be lethal:

Examples documented in research:

  • AI suggested wrong drug dosages
  • Invented drug interactions that don’t exist
  • Recommended treatments for wrong conditions
  • Created fake medical citations

A 2024 study found:

  • ChatGPT gave incorrect medical advice 51% of the time
  • Many errors could cause serious patient harm
  • Model couldn’t identify its own errors
  • Source: Hadi et al. (2024)

Medical hallucination dangers

Source: The New York Times (2025)

  • Many people can’t afford doctors
  • Easy to believe confident AI, hard to verify
  • Medical jargon sounds authoritative

Academic integrity issues

Fake citations are a major problem:

For students:

  • AI invents realistic-sounding paper titles
  • Creates plausible author names
  • Fabricates journals and conferences
  • Very hard to detect without checking

For academia more broadly:

  • AI-generated papers being submitted
  • Fake peer reviews
  • Fabricated data
  • “Paper mills” using AI

How to avoid this:

  1. Never trust AI citations
  2. Verify EVERY reference
  3. Use academic databases (Google Scholar, PubMed)
  4. Look up authors and journals

Red flags:

  • Citation format is slightly off
  • Can’t find paper anywhere
  • Author has no other publications
  • Journal name doesn’t exist
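As a toy illustration of the red-flags idea, you could screen a citation string with a few checks. These heuristics are mine, not a real detector, and passing them proves nothing; the database lookup (Google Scholar, PubMed) is still mandatory.

```python
import re

def citation_red_flags(citation):
    """Toy formatting checks inspired by the red flags above.
    Illustrative only: a clean result does NOT mean the paper exists."""
    flags = []
    # Proper citations almost always carry a parenthesised year.
    if not re.search(r"\((19|20)\d{2}\)", citation):
        flags.append("no parenthesised year")
    # "et al" without the period is a common tell of a sloppy fake.
    if "et al" in citation and "et al." not in citation:
        flags.append("'et al' missing period")
    # Journal articles usually include a page range like 45-67.
    if not re.search(r"\d+\s*[-–]\s*\d+", citation):
        flags.append("no page range")
    return flags

print(citation_red_flags("Smith et al (2023). Fake Journal of AI, 12, 45-67."))
```

A fabricated citation can of course be perfectly formatted, which is exactly why format checks can only raise suspicion, never clear a reference.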

Discussion: Who is responsible? 🤔

When AI hallucinations cause harm…

Who bears responsibility?

A. The user who relied on AI

B. The company that built the AI

C. The AI itself (!)

D. No one, it’s just a tool

E. Society for not regulating AI

Consider:

  • Lawyer submits fake citations
  • Doctor follows wrong AI advice
  • Student uses fabricated sources
  • Journalist publishes AI-generated misinformation

Discuss with a neighbour:

Where do you stand?

⏱️ 3 minutes to debate!

Creativity vs. Accuracy ⚖️

The creativity-accuracy tradeoff

However…

The same mechanism that causes hallucinations also enables creativity!

Consider:

  • A very “safe” AI only says things it’s 100% sure of
  • A creative AI takes risks, combines ideas in new ways
  • You can’t have unlimited creativity with perfect accuracy

The spectrum:

←—————————————————————————————→
BORING                     CREATIVE
but accurate           but risky

"I don't know"         New ideas!
Refuses often          Sometimes wrong
Very literal           Imaginative

Different tasks need different points on this spectrum!
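The “temperature” dial from today’s agenda is exactly this spectrum made numeric: it rescales the model’s raw scores before sampling. A minimal sketch, with made-up scores for three continuations:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities. Temperature < 1 sharpens
    the distribution (safer, more boring); > 1 flattens it (more
    creative, more risky)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for three candidate continuations,
# e.g. "Paris", "Lyon", "Atlantis"
logits = [4.0, 2.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature the favourite takes almost all the probability mass; at high temperature the long-shot options get sampled far more often, which is where both novel ideas and confident nonsense come from.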

Creativity-accuracy tradeoff

Source: Kevin Kelly (2024)

Different tasks require different balances

When hallucinations are… useful? 🤔

Controversial take: Hallucinations can be great, too!

Consider these use cases:

  1. Creative writing
    • We WANT the AI to invent characters, plots, dialogue
    • “Hallucinating” is literally the job!
  2. Brainstorming
    • Novel combinations of ideas
    • “What if…” scenarios
  3. Role-playing games
    • Improvising characters and worlds
    • Making up stories on the fly
  4. Art and design
    • Imagining things that don’t exist
    • Creating impossible visuals

The problem isn’t hallucination per se; it’s hallucination at the wrong time

Sometimes we want the AI to make things up!

Source: Medium

Match the task to the tool. Use high-creativity settings for creative work, high-accuracy approaches for factual work (obviously!)

Activity: Confidence Calibration Test 📊

Does AI know when it’s wrong?

Let’s ask https://chat.z.ai/ these questions and request a confidence rating (1-10) for each answer. Z.ai was chosen because you can turn off web access.

  1. “What day of the week was January 14, 1642? Confidence?”
    • Tuesday (I checked on Google Calendar!)
  2. “Spell Inconstitucionalissimamente backwards. Confidence?”
    • etnemamissilanoicutitsnocnI (well, me!)
  3. “Name the third person to walk on the Moon. Confidence?”
    • Charles “Pete” Conrad Jr (Wikipedia)
  4. “What is the exact population of Tuvalu as of 2024? Confidence?”
    • 9,646 (World Bank estimate)
  5. “Who is the current mayor of Reykjavik, Iceland? Confidence?”
    • Heiða Björg Hilmisdóttir

Then let’s verify each answer!
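Two of the five answers can be verified mechanically rather than by asking another chatbot. One caveat: Python’s `datetime` uses the proleptic Gregorian calendar, which is also what most online date tools report for 1642.

```python
from datetime import date

# The day-of-week question: computed, not guessed.
# (Proleptic Gregorian calendar, same convention as most date tools.)
print(date(1642, 1, 14).strftime("%A"))   # Tuesday

# The spelling question: string slicing can't hallucinate.
word = "Inconstitucionalissimamente"
print(word[::-1])
```

The other three questions (moonwalker, population, mayor) need a trusted external source; that distinction is exactly the detection lesson of today’s class.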

What to observe:

Question AI Confidence Actually Correct?
Day of the week ? / 10 ✓ or ✗
Spell backwards ? / 10 ✓ or ✗
Third moonwalker ? / 10 ✓ or ✗
Tuvalu population ? / 10 ✓ or ✗
Reykjavik mayor ? / 10 ✓ or ✗

Discussion:

  • Is confidence correlated with accuracy?
  • Did the AI ever give high confidence but wrong answers?
  • Can you trust AI’s self-assessment?
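If you record the class results, the discussion questions can be answered with a quick calculation. The numbers below are hypothetical placeholders; swap in what your group actually observed.

```python
# Hypothetical activity results: (question, confidence /10, correct?)
# Replace these with the values your group recorded.
results = [
    ("day of week", 9, True),
    ("spell backwards", 10, True),
    ("third moonwalker", 8, True),
    ("Tuvalu population", 9, False),   # high confidence, wrong!
    ("Reykjavik mayor", 7, False),
]

# For a well-calibrated model, mean confidence on wrong answers
# should be clearly lower than on right answers.
right = [c for _, c, ok in results if ok]
wrong = [c for _, c, ok in results if not ok]
print("mean confidence when right:", sum(right) / len(right))
print("mean confidence when wrong:", sum(wrong) / len(wrong))
```

If the two means come out nearly equal, as in this made-up data, the model’s self-reported confidence tells you almost nothing about accuracy.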

Detection and Prevention 🛡️

How to spot hallucinations

Red flags to watch for:

Warning Sign Example
Very specific numbers “73.2% of studies show…”
Precise citations “Smith et al. (2022), p. 47”
Obscure details Historical minutiae
Too-perfect answers Exactly what you wanted
Confident tone No hedging or uncertainty

Verification strategies:

  1. Google the claim: does it appear elsewhere?
  2. Check citations: do they exist?
  3. Ask for sources: then verify them!
  4. Cross-reference: use multiple sources
  5. Trust your expertise: if something seems off, it might be

Specific claim, impressive detail, completely wrong! The answer is the Hubble telescope

Learn to recognise warning signs

Golden rule: The more specific and confident the AI sounds, the more you should verify!

Prompting techniques that help

You can reduce hallucinations through prompting:

Technique What to Add to Your Prompt
Ask for uncertainty “If you’re not sure, say so. Don’t guess.”
Request sources “Only cite sources you’re certain exist.”
Chain-of-thought “Think step by step. Show your reasoning.”
Confidence levels “Rate your confidence 1-10 for each claim.”
Output constraints “Be precise and factual. Avoid speculation.”

Does this eliminate hallucinations?

No! But it can reduce them significantly. The key is stacking multiple techniques together.

Basic anti-hallucination prompt:

Answer my question factually. 

Rules:
- Only state things you're 
  confident about
- Say "I'm not sure" when 
  uncertain
- Don't invent citations
- If you don't know, admit it
- Show your reasoning step 
  by step

Question: [your question]
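The “stacking” idea can be sketched as a plain string builder that mixes and matches the rules from the table above. This is just string assembly, not tied to any particular chatbot API.

```python
# Rule texts mirror the anti-hallucination table above.
RULES = {
    "uncertainty": "If you're not sure, say so. Don't guess.",
    "sources": "Only cite sources you're certain exist.",
    "reasoning": "Think step by step. Show your reasoning.",
    "confidence": "Rate your confidence 1-10 for each claim.",
    "constraints": "Be precise and factual. Avoid speculation.",
}

def build_prompt(question, techniques):
    """Stack the chosen techniques into one prompt string."""
    rules = "\n".join(f"- {RULES[t]}" for t in techniques)
    return f"Answer factually.\n\nRules:\n{rules}\n\nQuestion: {question}"

print(build_prompt("Who won the 1987 Liechtenstein general election?",
                   ["uncertainty", "sources", "confidence"]))
```

Swapping technique lists in and out makes it easy to test which combination works best for your task.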

Next slides: We’ll build more sophisticated templates using PTCF and meta-prompting from Lecture 09!

Using PTCF to reduce hallucinations

Remember the PTCF framework from last class?

Element Anti-Hallucination Version
Persona “You are a careful fact-checker who never guesses”
Task “Verify and summarise only confirmed facts”
Context “Base answers ONLY on the provided text”
Format “Use [Verified], [Uncertain], or [Unknown] tags”

Why PTCF helps:

  • Persona activates cautious, precise patterns from training
  • Task constrains scope, no room for invention
  • Context grounds responses in real information
  • Format forces explicit uncertainty signals

PTCF anti-hallucination template:

PERSONA: You are a research 
assistant who prioritises 
accuracy over completeness. 
Never guess or fabricate.

TASK: Answer the following 
question using ONLY information 
you are highly confident about.

CONTEXT: This is for an academic 
paper. Incorrect information 
could damage my credibility.

FORMAT: 
- Start each fact with [Verified] 
  or [Uncertain]
- If you don't know, say 
  "I don't have reliable 
  information on this"
- Never invent citations

QUESTION: [your question]

Meta-prompting: Let AI improve your prompts

Meta-prompting means asking AI to help you write better prompts. This is especially useful for reducing hallucinations!

Why it works:

  • The model has seen millions of prompts in training
  • It knows which phrasings activate careful vs. creative modes
  • It can identify ambiguities you might miss

When to use meta-prompting:

  • You’re getting unreliable outputs
  • You’re not sure what constraints to add
  • You want to catch edge cases
  • You need prompts for high-stakes tasks

Key insight: The AI can help you build prompts that make it less likely to hallucinate!

Meta-prompting example:

I'm building a prompt to ask 
you about historical events.

I'm worried about hallucinations
—you might invent dates, names, 
or events that didn't happen.

Help me write a prompt that:
1. Minimises hallucination risk
2. Makes you flag uncertainty
3. Prevents invented citations

What constraints should I add? 
What phrasing reduces the 
chance you'll make things up?

Try it! Ask ChatGPT or Claude to critique and improve your prompts. You’ll often get surprisingly useful suggestions.

The knowledge cutoff problem

LLMs only know what they learned during training

  • ChatGPT’s training data has a cutoff date
  • Ask about yesterday’s news → doesn’t know
  • Ask about your company’s policies → doesn’t know
  • Ask about your course materials → doesn’t know

What happens when you ask about recent events?

Option What AI Does
Honest “I don’t have that information”
Hallucinating Makes something up!
Confused Gives outdated info as current

This is why chatbots now have:

  • Web browsing capabilities
  • File upload features
  • Knowledge retrieval systems (RAG!)

Knowledge cutoff

LLMs are “frozen in time” at their training date

Preview of next class: RAG solves this by giving AI access to real, up-to-date documents!

Preview: RAG as a solution

RAG = Retrieval-Augmented Generation

The core idea:

  1. When you ask a question…
  2. First search your documents for relevant information
  3. Give that information to the LLM
  4. LLM generates an answer using your data

Why this reduces hallucinations:

  • AI has real information to work with
  • Answers are grounded in documents
  • Can cite sources, so you can verify!
  • Reduces need to “make stuff up”
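The four steps above can be sketched end-to-end in toy form. Real RAG systems use embeddings and vector search; this version ranks documents by shared words, purely to show the retrieve-then-ground flow.

```python
# Toy document store: in practice this would be your own files.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm, Monday to Friday.",
    "Shipping is free on orders over $50.",
]

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query (toy retrieval)."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, docs):
    """Step 2-3: search the documents, then ground the LLM in them."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using ONLY the context below. If the answer is not "
            f"in the context, say so.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

print(build_rag_prompt("What is the refund policy?", documents))
```

Because the prompt tells the model to answer only from retrieved text, and to say so when the context is silent, the incentive to invent an answer drops sharply.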

Next class, we’ll explore RAG in depth!

RAG preview

Source: Towards AI

Teaser: Tools like NotebookLM, ChatPDF, and Perplexity all use RAG!

Summary 📚

Main takeaways

  • Hallucinations: AI confidently states false information

  • Why it happens: LLMs predict plausible, not true

  • Temperature: The creativity-accuracy dial

  • Real failures: Lawyers, doctors, academics affected

  • Detection: Verify specific claims, check citations

  • Prevention: Careful prompting helps but doesn’t eliminate them

Your anti-hallucination toolkit

Always do:

  • ✅ Verify specific facts
  • ✅ Check every citation
  • ✅ Cross-reference with trusted sources
  • ✅ Be sceptical of confident tones
  • ✅ Ask AI to admit uncertainty

Never do:

  • ❌ Trust AI for critical decisions alone
  • ❌ Submit AI citations without checking
  • ❌ Assume confident = correct
  • ❌ Skip verification for “obvious” answers

Prompting tips:

"If you're not sure, say so"

"Only cite sources that exist"

"Rate your confidence 1-10"

"Show your reasoning step by step"

… or just embrace randomness for creative tasks! 😄

Further reading

Academic:

Accessible:

Cases and news:

Tools to explore:

…and that’s all for today! 🎉