DATASCI 185: Introduction to AI Applications

Lecture 15: Types of Bias and How They Arise

Danilo Freire

Department of Data and Decision Sciences
Emory University

Welcome back! ⚖️

Recap of last class

  • Last time we explored data documentation and governance
  • Datasheets for datasets, model cards for models, system cards for AI systems
  • Data lineage: tracking where data comes from
  • The consent problem in AI training data
  • Today: Types of bias and how they arise
  • Perhaps one of the most important topics in this course!
  • AI is reshaping society, but whose values does it encode?

Source: Anthropic

Lecture overview

Today’s agenda

Part 1: The Big Picture

  • What is bias? (It’s complicated!)
  • The mirror problem: AI reflects us
  • Real-world harms: Lives affected

Part 2: Types of Bias

  • Historical bias: The past encoded
  • Representation bias: Who’s missing?
  • Measurement bias: Proxies gone wrong
  • Aggregation and evaluation bias

Part 3: Applying the Framework

  • Activity: Spot the bias!
  • Where bias shows up (preview)

Part 4: The Hard Questions

  • The impossibility theorem: Can we be fair?
  • Fairness tradeoffs: Uncomfortable choices
  • Is AI fixable, or do we need new approaches?

Meme of the day

Source: r/sciencememes

A story to start with

Robert Julian-Borchak Williams, Detroit, 2020

  • Arrested in front of his family
  • Accused of stealing watches
  • Held for 30 hours in jail
  • Later released: wrong person

What happened?

  • A facial recognition system matched his driver’s licence photo to grainy surveillance footage
  • The algorithm was wrong
  • Robert is Black. Research shows facial recognition has higher error rates for darker-skinned faces

This wasn’t a one-off AI bug; it was a consequence of how the system was built

Robert Williams case

Source: ACLU

What is Bias? 🤔

Defining bias: It’s complicated

“Bias” means different things:

| Context | Meaning | Example |
|---|---|---|
| Statistical | Systematic deviation | Biased estimator under/overestimates |
| Cognitive | Mental shortcuts | Confirmation bias |
| Cultural | Learned assumptions | “Doctors are male” |
| Algorithmic | Systematic unfairness | Different error rates by group |
| Historical | Past inequalities | Less data on minorities |

In AI, bias typically means:

A system that produces systematically unfair outcomes for certain groups of people.

But who defines “unfair”? That’s where it gets hard…

The mirror problem: AI reflects us

A provocative question:

Is AI biased… or is it just showing us what we already are?

The uncomfortable truth:

  • AI learns from human-generated data
  • Historical data contains historical discrimination
  • If humans made biased decisions, AI learns those patterns
  • AI can amplify existing biases at scale

Example: Word embeddings

  • “Man” is to “Doctor” as “Woman” is to… “Nurse”
  • The AI learned this from millions of human texts
  • It’s reflecting our own sexism
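This stereotyped completion can be reproduced with simple vector arithmetic. The sketch below uses hand-crafted 2-D vectors, not real learned embeddings, purely to show the mechanism: subtract the “man” direction from “doctor”, add “woman”, and look up the nearest word. In real embeddings the gender direction emerges from co-occurrence patterns in human-written text; here it is baked in by hand for illustration.

```python
import numpy as np

# Toy 2-D "embedding" space: axis 0 ~ gender, axis 1 ~ occupation.
# These hand-crafted vectors are illustrative, not real embeddings.
vocab = {
    "man":    np.array([ 1.0, 0.0]),
    "woman":  np.array([-1.0, 0.0]),
    "doctor": np.array([ 1.0, 2.0]),
    "nurse":  np.array([-1.0, 2.0]),
    "king":   np.array([ 1.0, 3.0]),
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vector arithmetic."""
    target = vocab[b] - vocab[a] + vocab[c]

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Nearest word by cosine similarity, excluding the three inputs.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "doctor", "woman"))  # → nurse
```

The point is not that the arithmetic is wrong; it is that it faithfully recovers whatever associations the vectors encode.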

AI trained on human data inherits human biases

Source: UNESCO

Debate point: Does this make AI less accountable, or more? 🤔

Discussion: Who’s responsible? 🎯

Take a position:

If an AI hiring tool discriminates against women…

Who is responsible?

A. The company using the tool

B. The company that built the tool

C. The data scientists who trained it

D. The society that created biased data

E. All of the above—but in what proportion?

Think about:

  • Who had the power to prevent this?
  • Who benefits from using AI?
  • Who bears the harm?

Some perspectives:

Tech companies say: “We just provide tools; users are responsible for how they use them”

Regulators say: “Those who deploy AI must ensure it’s fair”

Scholars say: “Responsibility is distributed across the entire pipeline”

Affected individuals say: “I was harmed and I don’t care who’s technically responsible”

Discuss with your neighbour (or think about it!)

⏱️ 2 minutes

Why this matters: The stakes are high

AI systems are making decisions about:

| Domain | Decision | Affected |
|---|---|---|
| Employment | Who gets hired, promoted, fired | Millions of applicants |
| Finance | Who gets loans, credit, insurance | Billions globally |
| Healthcare | Who gets treatment, care priority | Life and death |
| Criminal justice | Bail, sentencing, parole | Freedom and families |
| Education | Admissions, resources, grading | Future opportunities |

The scale is unprecedented:

  • Algorithms make decisions faster than humans can review
  • A single biased system can affect millions of people
  • Errors compound: one bad decision leads to worse data for future decisions

Types of Bias 📊

A taxonomy of bias

| Bias Type | When It Occurs | Example |
|---|---|---|
| Historical | Past decisions encoded in data | Loan data from discriminatory era |
| Representation | Some groups underrepresented | Few dark-skinned faces in training |
| Measurement | Proxies used for unmeasurable concepts | Using ZIP code for creditworthiness |
| Aggregation | Treating diverse groups as one | “One model fits all” fails |
| Evaluation | Wrong benchmarks for testing | Testing on unrepresentative data |
| Deployment | Model used in wrong context | US model applied globally |

Bias can enter at every stage of the AI lifecycle:

Data collection → Data labelling → Model training → Model evaluation → Deployment → Use

Historical bias: The past encoded

What is it?

Historical bias occurs when past discrimination is baked into the training data, even if the data accurately reflects the real world at the time

The Amazon hiring case (2018):

  • Amazon built an AI to screen CVs
  • Trained on 10 years of past hiring decisions
  • Past hiring was male-dominated
  • AI learned: penalise CVs mentioning “women’s”
    • “Women’s chess club captain” → downgraded
    • All-women’s college → downgraded
  • Amazon scrapped the tool

The irony: The data was “accurate”: it did reflect Amazon’s historical hiring. But that history itself was biased!
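The mechanism is easy to reproduce in miniature. The sketch below uses entirely made-up “historical” hiring records: a naive screener that scores CV tokens by past hire rates learns to penalise “women’s”, even though gender itself never appears as a feature.

```python
# Hypothetical historical hiring records: (tokens in CV, hired?).
# The bias lives in the past decisions, not in the applicants.
history = [
    ({"chess", "python"}, True),
    ({"python", "leadership"}, True),
    ({"chess", "leadership"}, True),
    ({"women's", "chess", "python"}, False),
    ({"women's", "leadership"}, False),
    ({"python"}, True),
]

def token_hire_rate(token):
    """Fraction of historical CVs containing `token` that were hired."""
    outcomes = [hired for tokens, hired in history if token in tokens]
    return sum(outcomes) / len(outcomes)

# A naive screener learns that "women's" predicts rejection:
print(token_hire_rate("women's"))  # 0.0
print(token_hire_rate("python"))   # 0.75
```

Any model trained to imitate these labels will reproduce the same penalty, regardless of how sophisticated it is.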

Amazon hiring AI

Source: Reuters

Lesson: Accurate data ≠ fair data

Representation bias: Who’s missing?

When certain groups are underrepresented in training data, the model performs poorly for them

Example: Voice assistants

  • Voice recognition systems trained primarily on:
    • American and British accents
    • Male voices (often from tech employees)
    • Native speakers
  • Result: Higher error rates for:
    • Non-native English speakers
    • Regional accents (Scottish, Indian, Nigerian)
    • Women’s voices in some systems
    • Children and elderly users

Why?

  • Whoever collects the data determines who’s in it
  • Convenience samples from developers themselves
  • “Edge cases” are actually most of the world

Voice recognition bias

Source: Axios

Lesson: If you’re not in the data, you’re invisible to the AI!

Measurement bias: Proxies gone wrong

Using a measurable proxy for something you actually care about… but the proxy doesn’t work equally for everyone

Example: ZIP code as a credit proxy

  • Banks can’t legally use race to make loan decisions
  • But they can use ZIP codes
  • ZIP codes correlate strongly with race due to housing discrimination
  • Result: A “race-neutral” variable that encodes race

Other problematic proxies:

| What We Want | Proxy Used | Problem |
|---|---|---|
| Intelligence | Standardised tests | Reflects access to prep |
| Job quality | Tenure | Penalises caregivers |
| Creditworthiness | Payment history | Assumes equal opportunity |
| Health needs | Past spending | Reflects access barriers |

Measurement bias

Source: Harvard Law Review

Lesson: The proxy you choose can encode systemic inequality… even when the “bad” variable isn’t in the data.

Aggregation bias: One size doesn’t fit all

Treating diverse populations as homogeneous when the underlying relationships differ across groups

Example: Diabetes prediction

  • A single model is trained on data from all patients
  • But diabetes manifests differently across ethnicities:
    • Different genetic risk factors
    • Different dietary patterns
    • Different symptom presentations
  • A single model may work well on average but poorly for specific groups
  • This is related to Simpson’s paradox: patterns that hold within each subgroup can disappear, or even reverse, in the aggregate

The maths:

| Model | Overall Accuracy | Group A | Group B |
|---|---|---|---|
| Single model | 85% | 90% | 75% |
| Group-specific | 88% | 89% | 86% |

Average performance can hide disparate impact
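The headline accuracy in the table is just a weighted average. Assuming, hypothetically, that Group A makes up two thirds of the data (the slide does not state group shares), the arithmetic works out as:

```python
# Per-group accuracies from the table; group shares are an assumption.
acc_a, acc_b = 0.90, 0.75
share_a, share_b = 2/3, 1/3

overall = share_a * acc_a + share_b * acc_b
print(round(overall, 2))  # 0.85 -- the headline number hides a 15-point gap
```

The larger group dominates the average, so the overall metric can look healthy while the minority group is badly served.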

Source: Wikipedia

Question: Should we build separate models for different groups? What are the tradeoffs?

Evaluation bias: Testing on the wrong people

When the benchmark dataset used to test a model doesn’t represent the population it will be used on

The benchmark problem:

  • Standard benchmarks become industry standards
  • Everyone optimises for the same tests
  • If the test is biased, success on the test means nothing

Example: ImageNet

  • For years, the gold standard in computer vision
  • But images were predominantly from the US and Europe
  • Models trained and tested on it worked great… in the US
  • Deployed globally: failures on everyday objects from other cultures

Evaluation bias

Source: Wired

Lesson: A model that passes biased tests is a biased model that looks good on paper

Deployment bias: Right model, wrong context

What is it?

When a model is used in a different context than it was designed for.

Example: US model in India

  • Credit scoring model trained on US financial data
  • Deployed in India to assess loan applications
  • Problem: Different banking systems, income patterns, credit histories
  • Result: Inappropriate decisions for the new context

Example: COVID-19 detection

  • Models trained on hospital data from one region
  • Deployed in different regions with different equipment
  • X-ray machines, demographics, disease prevalence all differed
  • Performance dropped significantly

Deployment bias

Source: Yang et al (2024)

Lesson: A model is only valid where it was tested

Bias through the AI lifecycle


Fixing one stage doesn’t guarantee a fair system. You need to check the entire pipeline

Feedback loops: Bias that amplifies itself

What is a feedback loop?

When an algorithm’s predictions influence the data it will be trained on in the future

Example: Predictive policing

  1. Algorithm predicts crime hotspots based on past arrest data
  2. Police patrol those areas more heavily
  3. More patrols → more arrests (whether or not crime rates differ)
  4. New arrest data confirms the algorithm’s predictions
  5. Algorithm becomes more confident in biased patterns
  6. Cycle continues…
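The steps above can be sketched as a toy simulation (all numbers hypothetical). Both areas have the same true crime rate; the only thing driving the divergence is that patrols chase the past arrest record:

```python
# Two areas with the SAME true crime rate; the algorithm sends the
# extra patrol units wherever recorded arrests are highest.
true_crime_rate = [0.1, 0.1]
arrests = [6, 4]              # a small initial imbalance in the records

for step in range(20):
    hotspot = 0 if arrests[0] >= arrests[1] else 1
    patrols = [1, 1]
    patrols[hotspot] += 2     # "predicted hotspot" gets extra patrols
    for area in range(2):
        # Arrests grow with patrol presence, not with any real
        # difference in crime between the two areas.
        arrests[area] += patrols[area] * true_crime_rate[area] * 100

share = arrests[0] / sum(arrests)
print(round(share, 2))  # → 0.75, up from the initial 0.6
```

Area 0’s share of recorded arrests climbs even though nothing about the underlying crime differs; the record then “confirms” the prediction.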

The problem:

  • The algorithm creates evidence for its own predictions
  • Bias compounds over time
  • Becomes impossible to know what “ground truth” is

Feedback loop: Hot-spots policing

Source: SpotCrime

Other feedback loops:

  • Loan denials → worse credit → more denials
  • Resume filters → homogeneous workforce → more biased training data

Activity: Spot the bias! 🔍

Scenario: University admissions

An AI recommends admissions based on SAT scores, high school GPA, and extracurriculars

  • What types of bias might this contain?
  • Who might be disadvantaged?
  • What proxies are being used?

Discuss the scenario with a neighbour:

  1. Identify at least 2 types of bias
  2. Propose how you might mitigate them
  3. What tradeoffs would you face?

⏱️ 2 minutes

The Hard Questions ❓

The impossibility theorem

You can’t have it all

Three fairness criteria (simplified):

  1. Calibration: Among those given the same score, outcomes should be similar across groups

  2. Equal false positive rates: Groups should have equal rates of being wrongly flagged

  3. Equal false negative rates: Groups should have equal rates of being wrongly missed

The impossibility theorem:

If base rates differ between groups, you cannot satisfy all three simultaneously

Translation: If Group A reoffends at 40% and Group B at 20%, you must choose which type of error to equalise

There is no mathematically “fair” solution.
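The arithmetic behind this can be checked directly. Chouldechova (2017) showed that the base rate p, positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR) are tied together by the identity FPR = p/(1−p) · (1−PPV)/PPV · (1−FNR). Hold PPV (predictive parity, closely related to the calibration criterion above) and FNR equal across groups, and differing base rates force differing FPRs. The specific PPV and FNR values below are illustrative:

```python
def implied_fpr(p, ppv, fnr):
    """FPR forced by the identity, given base rate p, PPV, and FNR."""
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

ppv, fnr = 0.7, 0.3                   # equal for both groups by construction
fpr_a = implied_fpr(0.40, ppv, fnr)   # Group A: 40% base rate
fpr_b = implied_fpr(0.20, ppv, fnr)   # Group B: 20% base rate

print(round(fpr_a, 3), round(fpr_b, 3))  # 0.2 0.075 -- they cannot match
```

Equalising the first two criteria mathematically prevents equalising the third, exactly as the theorem says.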

Impossibility theorem

Implication: Fairness is a values choice, not a technical problem.

Who defines “fair”? 🤔

Different definitions reflect different values:

| Definition | Prioritises | Drawback |
|---|---|---|
| Equal treatment | Consistency | Ignores context |
| Equal outcomes | Equity | May require discrimination |
| Equal error rates | Group parity | May sacrifice accuracy |
| Calibration | Individual accuracy | Hides disparity |

The uncomfortable truth:

  • There is no neutral choice
  • Whoever chooses the definition shapes the outcome
  • “We just use the data” is itself a choice

Fairness tradeoffs: Uncomfortable choices

Tradeoff 1: Accuracy vs. Fairness

  • Making predictions equally accurate across groups may reduce overall accuracy
  • Who pays the cost?

Tradeoff 2: Individual vs. Group Fairness

  • Treating individuals identically (blind to group) may produce unequal group outcomes
  • Treating groups equally may disadvantage qualified individuals

Tradeoff 3: Transparency vs. Gaming

  • Revealing how the algorithm works allows people to game it
  • Keeping it secret prevents accountability

Tradeoff 4: Short-term vs. Long-term

  • Using current data perpetuates historical inequalities
  • Ignoring current data may reduce accuracy today

There are no easy answers.

Different stakeholders will prioritise differently:

  • Affected communities: Equalise outcomes
  • Companies: Maximise accuracy
  • Regulators: Ensure process fairness
  • Courts: Protect individual rights

Who gets to decide which tradeoff to make?

Activity: Make the tradeoff! ⚖️

You’re designing a loan approval algorithm

The data shows:

  • Group A: 80% repay loans
  • Group B: 60% repay loans (due to historical economic disadvantage)

You must choose ONE approach:

  1. Same threshold for all: 70% predicted repayment = approved
    • Result: More Group B rejected
  2. Equal approval rates: Adjust thresholds so both groups have 50% approval
    • Result: More defaults, higher risk
  3. Equal false rejection rates: Ensure same % of good borrowers rejected across groups
    • Result: Different approval rates
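A quick way to feel the tradeoff is to compute outcomes under policies 1 and 2 for hypothetical applicant pools. The repayment probabilities below are invented so the group means roughly match 80% and 60%, and scores are assumed to be perfectly calibrated:

```python
# Hypothetical applicant pools; each number is a (perfectly predicted)
# repayment probability. Means roughly match the slide: A ~ 0.8, B ~ 0.6.
pools = {
    "A": [0.9] * 60 + [0.75] * 20 + [0.5] * 20,
    "B": [0.9] * 20 + [0.75] * 20 + [0.5] * 60,
}

def outcome(approved):
    """Approval count and expected default rate among the approved."""
    if not approved:
        return 0, 0.0
    return len(approved), sum(1 - p for p in approved) / len(approved)

for name, pool in pools.items():
    # Policy 1: one shared threshold (0.7 predicted repayment).
    n1, d1 = outcome([p for p in pool if p >= 0.7])
    # Policy 2: equal approval rates (approve the top half of each group).
    n2, d2 = outcome(sorted(pool, reverse=True)[: len(pool) // 2])
    print(name, n1, round(d1, 3), n2, round(d2, 3))
```

Under the shared threshold, Group B’s approval rate is half of Group A’s; under equal approval rates, Group B’s expected default rate rises well above Group A’s. Each “fair” policy shifts the cost onto someone.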

Questions to discuss:

  • Which approach is “fairest”?
  • Who benefits and who is harmed by each?
  • Would your answer change if Group B’s lower rate was due to discrimination (not ability)?
  • Should the algorithm try to correct historical injustice, or just not perpetuate it?

Vote with your class!

Which approach would you choose and why?

⏱️ 3 minutes to debate!

Perspectives: Is AI bias fixable?

View 1: Optimists 🌟

  • AI bias is a technical problem with technical solutions
  • Better data, better algorithms, better audits
  • AI might be less biased than humans
    • Humans are inconsistent, AI is consistent
    • AI can be audited, humans are opaque
  • Progress is being made: datasets improving, regulations emerging

Key argument:

“A biased AI can be fixed. A biased human is much harder to change.”

View 2: Sceptics 🤔

  • AI bias reflects societal problems that can’t be coded away
  • “Fair AI” is a distraction from addressing root causes
  • Who defines fairness? (Usually those in power)
  • AI obscures human accountability
  • Some decisions shouldn’t be automated at all

Key argument:

“Fixing AI bias without fixing society is like putting a bandage on a broken bone.”

What do YOU think? 🎙️

Summary 📚

Main takeaways

  • AI reflects us: Bias in data → bias in outputs

  • Historical bias: Past discrimination gets encoded

  • Representation bias: Who’s missing matters

  • Measurement bias: Proxies can perpetuate inequality

  • Impossibility theorem: You can’t satisfy all fairness criteria

  • The hard questions: Fairness is values, not just maths

…and that’s all for today! 🎉