Promises and problems of Data Science for crime science

Advanced Crime Analysis UCL

Bennett Kleinberg

4 March 2019

Advances, Promises and Problems

Today

problematic trends in data science
fallacies in data science
ethical considerations of data science for crime scientists
an outlook
“R Markdown” talk (Isabelle)

What do you think? Could there be problems?

Problematic trends

Extreme view: current academic data science is catering to hype to compensate for Google envy.

Problematic trends

Assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions. Everywhere assumptions.

Pfeffer et al. (2018)

cognition -> language assumption
online behaviour -> real behaviour assumption
methodological flaws: random sampling
even if sampling were random: population bias remains!
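
A minimal sketch of the last point, under assumed numbers: even a perfectly random sample of platform users estimates the platform's rate, not the rate in the population we care about. The group shares and behaviour rates below are hypothetical and chosen only for illustration; they are not taken from Pfeffer et al. (2018).

```r
# Hypothetical numbers for illustration only (not from Pfeffer et al.)
general_pop <- c(young = 0.30, old = 0.70)  # group shares in the general population
platform    <- c(young = 0.70, old = 0.30)  # group shares among platform users
trait_rate  <- c(young = 0.20, old = 0.05)  # rate of some behaviour within each group

# Expected estimate from a perfectly random sample of platform users
platform_estimate <- sum(platform * trait_rate)

# The value in the population we actually want to make claims about
population_value <- sum(general_pop * trait_rate)

print(c(platform = platform_estimate, population = population_value))
```

Fixing the sampling procedure does not close this gap; only knowing (and correcting for) who is on the platform does.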

Why is this a problem?

Intermezzo: Reproducibility crisis

If we care about data science, we need to do a better job.

The technology fallacy


The technology fallacy


The technology fallacy

Popular belief: technology will solve all problems.

esp. true for data
“so we just need more data”
so why not use it for all the difficult problems?

The technology fallacy

Recent case:

Full article, Exposing YouTube video

The technology fallacy

Mmh, that’s strange…?

apparently not a solved problem
and there’s more
- Facebook
- Twitter, etc. and content removal
still: very much relying on humans

The technology fallacy

Problem:

this creates unrealistic expectations
biggest challenge for data science: expectation management

The naïveté fallacy


The naïveté fallacy

Reality	Alarm: terrorist	Alarm: passenger	Total
Terrorist	950	50	1,000
Passenger	4,950	94,050	99,000
Total	5,900	94,100	100,000

P(terrorist|alarm) = 950/5900 = 16.10%
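
The same arithmetic as a short R sketch. The rates are read off the table above (a 1% base rate, with 95% of terrorists and 5% of passengers raising the alarm); the variable names are illustrative assumptions.

```r
# Inputs implied by the table above
n_total     <- 100000  # people screened
base_rate   <- 0.01    # 1,000 of 100,000 are terrorists
sensitivity <- 0.95    # 950 of 1,000 terrorists raise the alarm
specificity <- 0.95    # 94,050 of 99,000 passengers raise no alarm

n_terrorists <- n_total * base_rate
n_passengers <- n_total - n_terrorists

true_alarms  <- n_terrorists * sensitivity        # terrorists who raise the alarm
false_alarms <- n_passengers * (1 - specificity)  # passengers who raise the alarm

# Bayes' rule in counts: the share of all alarms that are actual terrorists
p_terrorist_given_alarm <- true_alarms / (true_alarms + false_alarms)
print(round(100 * p_terrorist_given_alarm, 2))
```

Lowering `base_rate` to something more realistic makes the share of true alarms collapse further, even with the detection rates unchanged.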

The naïveté fallacy

Put simply: you can sell anything.

Here’s an idea

# A "detector" that labels every person 'no terrorist', whatever the input
ai_terrorism_detection <- function(person) {
  person_classification <- 'no terrorist'
  return(person_classification)
}

“UCL RESEARCHERS USE AI TO FIGHT TERRORISM!”

“AI 99.9999% ACCURATE IN SPOTTING TERRORISTS!”
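
A rough sketch of why the headline can be technically true. It assumes a hypothetical population with 1 terrorist among 1,000,000 people (an illustrative figure, not data) and scores the do-nothing function above on accuracy and on the share of terrorists it actually catches.

```r
# The "classifier" from the slide: everyone is labelled 'no terrorist'
ai_terrorism_detection <- function(person) {
  'no terrorist'
}

# Hypothetical population (assumed numbers): 1 terrorist in 1,000,000 people
truth <- c('terrorist', rep('no terrorist', 999999))

# Apply the classifier to every person
predicted <- vapply(truth, ai_terrorism_detection, character(1), USE.NAMES = FALSE)

# Accuracy: share of people labelled correctly
accuracy <- mean(predicted == truth)

# Recall: share of actual terrorists that were flagged
recall <- sum(predicted == 'terrorist' & truth == 'terrorist') / sum(truth == 'terrorist')

print(c(accuracy = accuracy, recall = recall))
```

Accuracy works out to 999,999 / 1,000,000 while recall is zero: with rare events, accuracy alone says almost nothing about whether a system detects anything.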

Data science headlines

Guide to data science headlines

“UK government reveals new AI tool for flagging extremist content”

“UK government ~~reveals new AI tool for flagging extremist content~~ buys snake oil”

The naïveté fallacy

What to do about it:

avoid the hype
there is no rocket science here
95% is just (a type of) regression
if it sounds too good to be true, it is

Beware of the hype!

The category mistake of data science

Category mistake

https://www.youtube.com/watch?v=fCLI6kxFFTE

Category mistake

So we are getting there with self-driving cars.
Hence: we can also address the other challenges.

!!!!

Category mistake

Geller, 1999, 538 article

“I would not be at all surprised if earthquakes are just practically, inherently unpredictable.”

(Ned Field)

Category mistake

Building a sophisticated visual recognition system != predicting everything
Static phenomena vs. complex systems

Human behaviour might be the ultimate frontier in prediction.

If you only read one book in 2019…

Read: “The Signal and the Noise”, Nate Silver

Ethical issues

Ethics & data science

data sources
(machine) learning systems
reinforcing systems
responsible practices

Ethics & data science

Your turn: do you see problems for these aspects?

data sources
(machine) learning systems

Ethics & data science

What about “reinforcing systems”?

Ethics & data science

Choose 1:

FP/FN issue in the hands of practitioners
academics’ responsibility

An outlook

What would an ideal Data Science look like?

Be specific…

Academic data science

“Industry” data science

Extreme view:

current academic data science is catering to hype to compensate for Google envy.

Academic data science

What it is doing	What it should be doing
creating “cool” studies	testing assumptions
pumping out non-reproducible papers	investing in fundamental data science research
hiring people to do cool things with our data	starting with the problem
getting on the data science train	focusing on methods of data science

Outlook

we need boring studies!
- longitudinal studies
- assumption checks
- replications
we need to accept that Google & Co. are a different league in applying things
we need to focus on the “ACADEMIC” part
we need unis as a control mechanism, not as a player

For the future

Assumptions, assumptions, , assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions. Everywhere assumptions.

Test them!

This week

FEEDBACK submission + revision + your project

Next week

Lecture: The Applied Data Science pipeline
Tutorial: full pipeline + your project
