Promises and problems of Data Science for crime science

Advanced Crime Analysis UCL

Bennett Kleinberg

4 March 2019

Advances, Promises and Problems

Today

  • problematic trends in data science
  • fallacies in data science
  • ethical considerations of data science for crime scientists
  • an outlook

  • “R Markdown” talk (Isabelle)

What do you think? Could there be problems?

Pfeffer et al. (2018)

  • cognition -> language assumption
  • online behaviour -> real behaviour assumption
  • methodological flaws: random sampling
  • even if the sampling were truly random: a biased population remains!
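A minimal R sketch of the last point, with made-up numbers: even a perfectly random sample of platform users inherits the bias of the platform population. The 30% prevalence and the 60%/20% platform-membership probabilities are assumptions for illustration only, not figures from Pfeffer et al. (2018).

```r
set.seed(1)

# Hypothetical general population: 30% hold some attitude (assumed value)
population <- rbinom(100000, 1, 0.30)

# Assumption: people with the attitude are more likely to use the platform
on_platform_prob <- ifelse(population == 1, 0.60, 0.20)
on_platform <- rbinom(length(population), 1, on_platform_prob) == 1
platform_users <- population[on_platform]

# A perfectly random sample, but drawn from platform users only
sampled <- sample(platform_users, 5000)

mean(population)  # prevalence in the general population (about 0.30 by construction)
mean(sampled)     # prevalence in the sample; expected value under these assumptions: 0.5625
```

Random sampling fixes how we draw from the platform, not who is on the platform in the first place.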

Why is this a problem?

Intermezzo: Reproducibility crisis

If we care about data science, we need to do a better job.

The technology fallacy

Popular belief: technology will solve all problems.

  • esp. true for data
  • “so we just need more data”
  • so why not use it for all the difficult problems?

The technology fallacy

Recent case:

Full article, Exposing YouTube video

The technology fallacy

Mmh, that’s strange…?

  • apparently not a solved problem
  • and there’s more
    • Facebook
    • Twitter, etc. and content removal
  • still: very much relying on humans

The technology fallacy

Problem:

  • this creates unrealistic expectations
  • biggest challenge for data science: expectation management

The naïveté fallacy

Reality \ Prediction    Terrorist    Passenger      Total
Terrorist                     950           50      1,000
Passenger                   4,950       94,050     99,000
Total                       5,900       94,100    100,000

P(terrorist|alarm) = 950/5900 = 16.10%
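The same number can be recovered from rates instead of counts. The R sketch below assumes that the rows of the table are reality and the columns are the system's prediction, which matches the 950/5900 calculation; the variable names are illustrative.

```r
# Counts from the table (rows: reality, columns: prediction)
confusion <- matrix(
  c(950,    50,
    4950, 94050),
  nrow = 2, byrow = TRUE,
  dimnames = list(reality    = c("terrorist", "passenger"),
                  prediction = c("terrorist", "passenger"))
)

# Rates implied by the counts
base_rate        <- sum(confusion["terrorist", ]) / sum(confusion)                       # 0.01
hit_rate         <- confusion["terrorist", "terrorist"] / sum(confusion["terrorist", ])  # 0.95
false_alarm_rate <- confusion["passenger", "terrorist"] / sum(confusion["passenger", ])  # 0.05

# Bayes' rule: P(terrorist | alarm)
p_terrorist_given_alarm <- (hit_rate * base_rate) /
  (hit_rate * base_rate + false_alarm_rate * (1 - base_rate))

round(100 * p_terrorist_given_alarm, 2)  # 16.1
```

A system that is right 95% of the time in both groups still produces mostly false alarms, because the base rate is only 1%.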

The naïveté fallacy

Put simply: you can sell anything.

Here’s an idea

# "AI" that labels every person as 'no terrorist', whatever the input
ai_terrorism_detection <- function(person) {
  person_classification <- 'no terrorist'
  return(person_classification)
}

“UCL RESEARCHERS USE AI TO FIGHT TERRORISM!”

“AI 99.9999% ACCURATE IN SPOTTING TERRORISTS!”
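Why the headline can be technically true: a rough R sketch that scores the constant classifier on a hypothetical population of one terrorist among 1,000,000 people. The population size and base rate are assumptions chosen to match the 99.9999% figure; the function is restated so the block runs on its own.

```r
# The constant classifier from the slide
ai_terrorism_detection <- function(person) {
  "no terrorist"
}

# Hypothetical population: 1 terrorist among 1,000,000 people (assumed numbers)
truth <- c("terrorist", rep("no terrorist", 999999))

# Classify everyone
predicted <- vapply(seq_along(truth), ai_terrorism_detection, character(1))

# Accuracy: share of correct labels
accuracy <- mean(predicted == truth)

# Recall: share of actual terrorists that were detected
recall <- sum(predicted == "terrorist" & truth == "terrorist") /
  sum(truth == "terrorist")

accuracy  # 0.999999, i.e. 99.9999%
recall    # 0
```

Accuracy alone says very little when the base rate is this low: the classifier never detects anyone.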

Data science headlines

Guide to data science headlines

“UK government reveals new AI tool for flagging extremist content”

=

“UK government buys snake oil”

The naïveté fallacy

What to do about it:

  • avoid the hype
  • there is no rocket science here
  • 95% is just (a type of) regression
  • if it sounds too good to be true, it is

Beware of the hype!

The category mistake of data science

Category mistake

https://www.youtube.com/watch?v=fCLI6kxFFTE

Category mistake

  • So we are getting there with self-driving cars.
  • Hence: we can also address the other challenges.

!!!!

Category mistake

Geller, 1999, 538 article

“I would not be at all surprised if earthquakes are just practically, inherently unpredictable.”

(Ned Field)

Category mistake

  • Building a sophisticated visual recognition system != predicting everything
  • Static phenomena vs. complex systems

Human behaviour might be the ultimate frontier in prediction.

If you only read one book in 2019…

Read: “The Signal and the Noise”, Nate Silver

Ethical issues

Ethics & data science

  • data sources
  • (machine) learning systems
  • reinforcing systems
  • responsible practices

Ethics & data science

Your turn: do you see problems for these aspects?

  • data sources
  • (machine) learning systems

Ethics & data science

What about “reinforcing systems”?

Ethics & data science

Choose 1:

  1. the false positive/false negative (FP/FN) issue in the hands of practitioners
  2. academics’ responsibility

An outlook

What would an ideal Data Science look like?

Be specific…

Academic data science

vs

“Industry” data science

Extreme view:

current academic data science is catering to the hype to compensate for its Google envy.

Academic data science

What it is doing                                 What it should be doing
creating “cool” studies                          testing assumptions
pumping out non-reproducible papers              investing in fundamental data science research
hiring people to do cool things with our data    starting with the problem
getting on the data science train                focusing on methods of data science

Outlook

  • we need boring studies!
    • longitudinal studies
    • assumption checks
    • replications
  • we need to accept that Google & Co. are in a different league when it comes to applying things
  • we need to focus on the “ACADEMIC” part
  • we need universities as a control mechanism, not as a player

For the future

Assumptions, assumptions, , assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions, assumptions. Everywhere assumptions.

Test them!

This week

FEEDBACK submission + revision + your project

Next week

  • Lecture: The Applied Data Science pipeline
  • Tutorial: full pipeline + your project

END