
Ethics ✕ Data Science

Machine learning 101

Simon Munzert and Johannes Himmelreich

1 / 45

Table of contents



  1. Machine learning, deep learning, AI

  2. Basic concepts in machine learning

  3. Overview of ML landscape

  4. Performance metrics

  5. AI for public policy

2 / 45

Machine learning, deep learning, AI


3 / 45

What is AI?

Artificial intelligence

"Artificial intelligence (AI) is intelligence - perceiving, synthesizing, and inferring information - demonstrated by machines, as opposed to intelligence displayed by non-human animals and humans. Example tasks in which this is done include speech recognition, computer vision, translation between (natural) languages, as well as other mappings of inputs."

Wikipedia, Artificial intelligence

"The effort to automate intellectual tasks normally performed humans."

Chollet and Allaire, 2018, Deep Learning with R

Source Wikipedia

4 / 45

What is AI?

Machine learning

"Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn' (...) It is seen as a part of artificial intelligence."

Wikipedia, Machine learning

"Machine learning is a specific subfield of AI that aims at automatically developing programs (called models) purely from exposure to training data. This process of turning models data into a program is called learning."

Chollet and Allaire, 2018, Deep Learning with R

Source Wikipedia

5 / 45

What is AI?

Data mining

"Application of machine learning methods to large databases is called data mining. The analogy is that a large volume of earth and raw material is extracted from a mine, which when processed leads to a small amount of very precious material; similarly, in data mining, a large volume of data is processed to construct a simple model with valuable use, for example, having high predictive accuracy."

Alpaydin, 2014, Introduction to Machine Learning

Source Wikipedia

6 / 45

What is AI?

Deep learning

"Deep learning is the subset of machine learning methods based on neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network."

Wikipedia, Deep learning

Source Wikipedia
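To make the "multiple layers" idea concrete, here is a minimal sketch (not from the slides) of a forward pass through a small stack of dense layers, using NumPy and random, untrained weights; the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, n_out):
    # One dense layer with a ReLU non-linearity; weights are random and
    # untrained, since this only illustrates the layered structure.
    w = rng.normal(size=(x.shape[1], n_out))
    return np.maximum(0, x @ w)

x = rng.normal(size=(5, 10))          # 5 examples, 10 input features
h1 = dense_relu(x, 8)                 # first hidden layer: 8-dim representation
h2 = dense_relu(h1, 4)                # second hidden layer: 4-dim representation
y_hat = h2 @ rng.normal(size=(4, 1))  # output layer: one prediction per example
print(y_hat.shape)                    # (5, 1)
```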

7 / 45

Corporate investment in AI

8 / 45

Basic concepts in machine learning


9 / 45

Regression vs. classification

Regression

  • Predicts a continuous outcome
  • Example: Predicting house prices, GDP growth, temperature

Classification

  • Predicts a categorical outcome
  • Example: Predicting whether a person will default on a loan, whether an email is spam, whether a patient has a disease

Classification problems in the wild

Classification problems occur often, perhaps even more so than regression problems, e.g.:

  1. A woman arrives at the emergency room with a set of symptoms. Which condition does she have?
  2. An online banking service must be able to determine whether or not a transaction is fraudulent, on the basis of the user’s IP address, past transaction history, and so forth.
  3. On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.

Decision-making problems are often classification problems!
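As a minimal sketch of the distinction (simulated data, illustrative model choices; scikit-learn assumed to be installed): a regression model returns a number, a classifier returns a class label or class probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                    # three made-up predictors

# Regression: continuous outcome (e.g., a price-like quantity)
y_continuous = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=200)
reg = LinearRegression().fit(X, y_continuous)
print(reg.predict(X[:3]))                        # numeric predictions

# Classification: categorical outcome (e.g., default: yes/no)
y_class = (X[:, 0] + X[:, 2] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:3]))                        # predicted labels 0/1
print(clf.predict_proba(X[:3])[:, 1])            # predicted probability of class 1
```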

10 / 45

Supervised and unsupervised learning

Supervised learning

  • The algorithm learns from labeled data, i.e., data with known outcomes
  • The algorithm is trained on a training dataset and evaluated on a test dataset
  • The goal is to predict unobserved outcomes

Unsupervised learning

  • The algorithm learns from unlabeled data
  • There are inputs but no supervising output; we can still learn about relationships and structure from such data

Analogies

  • Supervised: Child in school learns math (with teacher’s input)
  • Unsupervised: Child at home plays with toys (without teacher’s input)
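A minimal sketch of the difference (simulated two-cluster data; scikit-learn assumed): the supervised model sees the labels, the unsupervised model only sees the inputs and has to find structure on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),   # group 0
               rng.normal(3, 1, size=(100, 2))])  # group 1
y = np.repeat([0, 1], 100)                        # labels exist only for the supervised case

# Supervised: learn the mapping from inputs X to known labels y
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))                            # accuracy on the labeled data

# Unsupervised: no labels; look for structure (here: two clusters) in X alone
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])                            # cluster assignments, not "true" labels
```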


11 / 45

Training, validation and test dataset
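A minimal sketch of a train/validation/test split (simulated data; scikit-learn's train_test_split assumed): train on the training set, tune on the validation set, and touch the test set only once for the final evaluation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# First split off a held-out test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

print(len(X_train), len(X_val), len(X_test))   # roughly 60% / 20% / 20%: 600 200 200
```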

12 / 45

Overfitting

13 / 45

Overfitting in classification

20 / 45

Overfitting in classification

Explained: The green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on those data and is likely to have a higher error rate on new, unseen data than the black line.
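The same point as a minimal sketch (simulated data, scikit-learn assumed): a very flexible model drives the training error down but does worse on held-out data than a simpler one.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=40)   # noisy signal
x_train, y_train = x[:30, None], y[:30]                   # training data
x_test, y_test = x[30:, None], y[30:]                     # unseen data

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    mse_train = mean_squared_error(y_train, model.predict(x_train))
    mse_test = mean_squared_error(y_test, model.predict(x_test))
    # The degree-15 fit hugs the training points but generalizes worst.
    print(degree, round(mse_train, 3), round(mse_test, 3))
```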

22 / 45

Overfitting, ultimately explained



23 / 45

Overview of ML landscape


24 / 45

The ML landscape (Microsoft.com)

25 / 45

ML decision tree

Source Sundararajan et al. 2021

26 / 45

Affiliation of AI researchers

27 / 45

Performance metrics


28 / 45

AI performance in knowledge tests

29 / 45

AI capabilities vs. human performance

30 / 45

ML performance benchmarking in the wild

Source Oueslati, 2024, Watching the Watchers: A Comparative Audit of Cloud‑Based Commercial Content Moderation Services.

31 / 45

ML performance benchmarking in the wild

Source Wiik, 2024, GPT-4o vs. GPT-4 vs. Gemini 1.5 — Performance Analysis (Accuracy).

32 / 45

Assessing classification performance

Accuracy

  • Accuracy = $\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$
  • Error rate = $1 - \text{Accuracy}$

Usefulness

  • Accuracy is a simple and intuitive metric.
  • But it can be misleading, especially in imbalanced datasets where the classes are not evenly represented.
  • Example: In a dataset with 90% of class A and 10% of class B, a model that predicts all instances as class A will have an accuracy of 90%, but it will not be useful for predicting class B instances.

Example


What is the accuracy of our recidivism classifier?
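The recidivism confusion matrix on the slide is an image, so here is a worked sketch with hypothetical counts (the TP/TN/FP/FN values are made up):

```python
# Hypothetical confusion-matrix counts, not the slide's actual numbers
TP, TN, FP, FN = 300, 500, 150, 50

accuracy = (TP + TN) / (TP + TN + FP + FN)
error_rate = 1 - accuracy
print(round(accuracy, 3), round(error_rate, 3))   # 0.8 0.2
```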

33 / 45

Assessing classification performance

Precision

  • Precision = $\frac{\text{Number of true positive predictions}}{\text{Number of positive predictions}} = \frac{TP}{TP + FP}$

Usefulness

  • Precision focuses on the accuracy of positive predictions and is useful when the cost of false positives is high.
34 / 45

Assessing classification performance

Precision

  • Precision = $\frac{\text{Number of true positive predictions}}{\text{Number of positive predictions}} = \frac{TP}{TP + FP}$

Usefulness

  • Precision focuses on the accuracy of positive predictions and is useful when the cost of false positives is high.

Example


What is the precision of our recidivism classifier?
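Again using the hypothetical counts from the accuracy sketch (not the slide's actual table):

```python
# Hypothetical counts, not the slide's actual numbers
TP, FP = 300, 150

precision = TP / (TP + FP)
print(round(precision, 3))   # 0.667: two thirds of predicted recidivists were correct
```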

35 / 45

Assessing classification performance

Recall (Sensitivity)

  • Recall = $\frac{\text{Number of true positive predictions}}{\text{Number of actual positive instances}} = \frac{TP}{TP + FN}$
  • "True positive rate"

Usefulness

  • Recall focuses on capturing all positive instances and is important when the cost of false negatives is high.
  • Example: In a medical diagnosis, recall is important to ensure that all patients with a disease are correctly identified.
  • The complementary measure is specificity (true negative rate; e.g. how many healthy people are identified as not having the condition)
36 / 45

Assessing classification performance

Recall (Sensitivity)

  • Recall = $\frac{\text{Number of true positive predictions}}{\text{Number of actual positive instances}} = \frac{TP}{TP + FN}$
  • "True positive rate"

Usefulness

  • Recall focuses on capturing all positive instances and is important when the cost of false negatives is high.
  • Example: In a medical diagnosis, recall is important to ensure that all patients with a disease are correctly identified.
  • The complementary measure is specificity (true negative rate; e.g. how many healthy people are identified as not having the condition)

Example


What is the recall of our recidivism classifier?
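With the same hypothetical counts (not the slide's actual table), recall and its complement specificity look like this:

```python
# Hypothetical counts, not the slide's actual numbers
TP, FN = 300, 50    # actual positives (recidivists)
TN, FP = 500, 150   # actual negatives (non-recidivists)

recall = TP / (TP + FN)        # true positive rate / sensitivity
specificity = TN / (TN + FP)   # true negative rate
print(round(recall, 3), round(specificity, 3))   # 0.857 0.769
```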

37 / 45

Assessing classification performance

F1 score

  • F1 score = $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2TP}{2TP + FP + FN}$
  • F1 score is the harmonic mean of precision and recall.

Usefulness

  • It provides a balance between precision and recall, especially when there is an imbalance between the classes.
  • F1 score ranges from 0 to 1, where 1 indicates perfect precision and recall, and 0 indicates poor performance.

Illustration


Normalised harmonic mean plot where x is precision, y is recall and the vertical axis is F1 score, in % points

Source Andong87

38 / 45

Assessing classification performance

F1 score

  • F1 score = $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2TP}{2TP + FP + FN}$
  • F1 score is the harmonic mean of precision and recall.

Usefulness

  • It provides a balance between precision and recall, especially when there is an imbalance between the classes.
  • F1 score ranges from 0 to 1, where 1 indicates perfect precision and recall, and 0 indicates poor performance.

Example


What is the F1 score of our recidivism classifier?
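With the same hypothetical counts, both forms of the F1 formula give the same value:

```python
# Hypothetical counts, not the slide's actual numbers
TP, FP, FN = 300, 150, 50

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
f1_direct = 2 * TP / (2 * TP + FP + FN)              # equivalent form
print(round(f1, 3), round(f1_direct, 3))             # 0.75 0.75
```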

39 / 45

Why performance metrics can matter

Scenario

  • Outcome: Recidivism where individual recidivates (1) or not (0)
  • False Positive (FP): Model predicts an individual will recidivate when they actually do not.
    • This could result in individuals being detained or supervised more strictly than necessary.
  • False Negative (FN): Model predicts an individual will not recidivate when they actually do.
    • This could result in at-risk individuals being released without proper intervention, potentially leading to repeat offenses.

Assigning costs

  • What are downstream costs of FP and FN?
  • At which level do the costs apply - individual, societal, ...?

Ethical and economic reasoning

  • How should we weigh the costs of FP and FN?
  • What should we prioritize in our model - reducing FP, FN, or balancing both?
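A minimal sketch of the reasoning (the cost values are made up, purely to show how assigned costs change which model looks preferable):

```python
# Illustrative, made-up costs per error type
cost_fp = 1.0   # cost of wrongly predicting recidivism (e.g., unnecessary intervention)
cost_fn = 5.0   # cost of missing an actual recidivist (e.g., repeat offense)

def total_cost(fp, fn):
    return cost_fp * fp + cost_fn * fn

# Two hypothetical classifiers with the same total number of errors
print(total_cost(fp=150, fn=50))   # 400.0: errs toward false positives
print(total_cost(fp=50, fn=150))   # 800.0: errs toward false negatives, costlier here
```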
40 / 45

AI for public policy


41 / 45

The COMPAS algorithm to predict criminals' recidivism

Background

  • Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a decision support tool developed by Northpointe (now Equivant) used by U.S. courts to assess the likelihood of recidivism
  • Produces several scales (Pretrial release risk, General recidivism, Violent recidivism) based on factors such as age, criminal history, and substance abuse
  • The algorithm is proprietary and its inner workings are not public




Source Practitioner's Guide to COMPAS Core

42 / 45

The COMPAS algorithm to predict criminals' recidivism

The ProPublica and other investigations

  • In 2016, ProPublica published an investigation showing that COMPAS was biased against African Americans
  • Bias: The algorithm was more likely to wrongly predict that African American defendants would re-offend.
  • Accuracy: Only 20% of people predicted to commit violent crimes actually went on to do so (a later study estimated accuracy at around 65%, still worse than a group of humans with little expertise)
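The kind of disparity ProPublica reported can be checked by comparing false positive rates across groups; here is a minimal sketch with simulated data (not the COMPAS data):

```python
import numpy as np

rng = np.random.default_rng(7)
group = rng.choice(["A", "B"], size=1000)      # simulated group membership
y_true = rng.integers(0, 2, size=1000)         # 1 = actually re-offended
y_pred = rng.integers(0, 2, size=1000)         # classifier's prediction (simulated)

for g in ("A", "B"):
    mask = (group == g) & (y_true == 0)        # actual non-recidivists in group g
    fpr = (y_pred[mask] == 1).mean()           # share wrongly predicted to re-offend
    # With real data, a large gap between the two rates would indicate
    # the kind of disparity reported for COMPAS.
    print(g, round(float(fpr), 3))
```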






Source ProPublica 2016

43 / 45

The COMPAS algorithm to predict criminals' recidivism

The ProPublica and other investigations

  • In 2016, ProPublica published an investigation showing that COMPAS was biased against African Americans
  • Bias: The algorithm was more likely to wrongly predict that African American defendants would re-offend.
  • Accuracy: Only 20% of people predicted to commit violent crimes actually went on to do so (a later study estimated accuracy at around 65%, still worse than a group of humans with little expertise)





Source Dressel and Farid, 2018, Science Advances

44 / 45

Discussion



  1. Where do you see potential for AI in public policy-making?

  2. Are there applications of AI in Georgian government that you are aware of?

  3. What role does AI play in your personal (professional) life?



45 / 45
