QTM 151 - Introduction to Statistical Computing II

Lecture 03 - Maths Operations, Arrays, and Boolean Logic

Danilo Freire

Emory University

Welcome to another lecture! 😊

Today’s agenda

Introducing NumPy, random, and Boolean logic

  • First, a brief overview of NumPy
  • NumPy (short for “Numerical Python”) is a library that provides support for large, multi-dimensional arrays and matrices
  • An array is a collection of numbers arranged in a regular grid: a vector (one dimension), a matrix (two dimensions), or a higher-dimensional array (a tensor)
  • In simpler terms, NumPy arrays are a “super-powered list of numbers”
  • NumPy is the backbone of many other libraries in Python, such as pandas and scikit-learn
  • We will also learn about the random module, which generates random numbers
  • Then, we will see how to use Boolean logic in Python

Let’s get started! 🚀

Importing NumPy and Matplotlib

  • As usual, we start by importing the libraries we will use
  • NumPy provides many mathematical functions
  • For instance, \(\ln(x), e^x, \sin(x), \cos(x), \sqrt{x}\)
  • Remember that exponentiation in Python is done using **, not ^


  • You can check a list of NumPy functions here (there are many!)
# Importing packages
import numpy as np
import matplotlib.pyplot as plt

# log(x): natural logarithm (base e)
# exp(x): Euler's number e raised to the power x
# sin(x): sine of x (x in radians)
# cos(x): cosine of x (x in radians)
# sqrt(x): square root of x
# We're substituting x = 1
print(np.log(1))
print(np.exp(1))
print(np.sin(1))
print(np.cos(1))
print(np.sqrt(1))
0.0
2.718281828459045
0.8414709848078965
0.5403023058681398
1.0

Try it yourself! 🧠

  • Create a new variable, \(x = 5\)
  • Compute \(\pi x^2\)
  • Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)
    • This resembles the probability density function of the standard normal distribution, whose exponent is \(-x^2/2\) rather than \(-x^2\) (don’t worry if you don’t know what it is yet! 🤓)
  • Don’t forget how to exponentiate in Python 😉
x = 10
x ** 5

# Not x^5
100000
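
To see why ^ gives the wrong answer, the short sketch below compares the two operators. In Python, ^ is the bitwise XOR operator, so it runs without an error but returns an unrelated number. The variable names are illustrative only.

```python
x = 10

# ** raises x to a power
power_result = x ** 5

# ^ is bitwise XOR: it compares the binary digits of 10 (1010) and 5 (0101)
xor_result = x ^ 5

print(power_result)  # 100000
print(xor_result)    # 15
```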

Vector arrays with NumPy 📊

Creating arrays from lists

  • NumPy arrays are created using the np.array() function
  • We can create arrays from lists
  • We can also create arrays with a sequence of numbers using np.arange()
  • We can create arrays with zeros or ones using np.zeros() and np.ones()
  • And we’re going to learn all of this in this lecture! 🤓
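
As a quick preview of np.arange(), np.zeros(), and np.ones(), here is a minimal sketch. The variable names and the chosen sizes are illustrative assumptions, not part of the lecture code.

```python
import numpy as np

# Sequence from 0 up to (but not including) 10, in steps of 2
vec_seq = np.arange(0, 10, 2)

# Arrays filled with zeros or ones (floats by default)
vec_zeros = np.zeros(3)
vec_ones = np.ones(4)

print(vec_seq)    # [0 2 4 6 8]
print(vec_zeros)  # [0. 0. 0.]
print(vec_ones)   # [1. 1. 1. 1.]
```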

Creating arrays from lists

  • Create an array from a list

  • \(a = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\)

  • \(b = \begin{pmatrix} 0 \\ 1 \\ 0\end{pmatrix}\)

  • \(c = \begin{pmatrix} 10 \\ 100 \\ 1000 \\ 2000 \\ 5000 \end{pmatrix}\)

  • \(d = \begin{pmatrix} 4 \\ 2 \end{pmatrix}\)

vec_a  = np.array([1,2,3])
vec_b  = np.array([0,1,0])
vec_c  = np.array([10,100,1000,2000,5000])
vec_d  = np.array([4,2])

Accessing an element of an array

  • We can access elements of an array using square brackets []

  • Remember that Python is zero-indexed

  • Access the first and the third element of \(a\)

print(vec_a)
print(vec_a[0])
print(vec_a[2])
[1 2 3]
1
3
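
Because counting starts at zero, the last element of an array with n elements sits at position n - 1. The sketch below illustrates this with the vector \(c\); the names n_elements, first_element, and last_element are illustrative.

```python
import numpy as np

vec_c = np.array([10, 100, 1000, 2000, 5000])

# The array has 5 elements, so valid positions run from 0 to 4
n_elements = len(vec_c)
first_element = vec_c[0]
last_element = vec_c[n_elements - 1]

print(n_elements)     # 5
print(first_element)  # 10
print(last_element)   # 5000
```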

Operations with a single array and a scalar

  • We can perform operations with a single array and a scalar
  • For instance, we can add or multiply a scalar to an array


  • Add 2 to each element of \(a\)
  • \(a + 2 = \begin{pmatrix} a_1 + 2 \\ a_2 + 2 \\ a_3 + 2 \end{pmatrix}\)
# Print the original array
print(vec_a)

# Adding 2 to each element of a
print(vec_a + 2)
[1 2 3]
[3 4 5]
  • A scalar is a single number, either an int or a float
  • We can apply many common operations between an array and a scalar: multiplication, division, addition, and exponentiation
print(vec_a * 2)
print(vec_a / 2)
print(vec_a + 2)
print(vec_a ** 2)
[2 4 6]
[0.5 1.  1.5]
[3 4 5]
[1 4 9]

Element-by-element addition between two arrays of the same size

\(a + b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} +\) \(\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =\) \(\begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you add two arrays of the same size,
# NumPy adds the individual elements in each position
print(vec_a + vec_b)
[1 2 3]
[0 1 0]
[1 3 3]

Element-by-element multiplication between two arrays of the same size

\(a * b = \begin{pmatrix} a_1 * b_1 \\ a_2 * b_2 \\ a_3 * b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you multiply two arrays of the same size,
# NumPy multiplies the individual elements in each position
print(vec_a * vec_b)

# We can do other similar element-by-element operations,
# such as subtraction and division.
print(vec_a - vec_b)
# Dividing by zero (as in 1/0 and 3/0 here) gives inf and a RuntimeWarning
print(vec_a / vec_b)
[1 2 3]
[0 1 0]
[0 2 0]
[1 1 3]
[inf  2. inf]
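
The inf values come from dividing by zero, and NumPy also prints a RuntimeWarning in that case. As a minimal sketch (np.errstate and np.isinf are standard NumPy tools, but they are not part of the original lecture code), we can silence the warning and flag the affected positions:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([0, 1, 0])

# Temporarily ignore the divide-by-zero warning
with np.errstate(divide="ignore"):
    ratio = vec_a / vec_b

# Mark which positions are infinite
is_infinite = np.isinf(ratio)

print(ratio)        # [inf  2. inf]
print(is_infinite)  # [ True False  True]
```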

Summary statistics 📊

Summary statistics of an array

  • NumPy provides several functions to compute summary statistics of an array
  • For instance, we can compute the mean, median, standard deviation, variance, minimum, and maximum
  • We can also compute the sum, product, and cumulative sum


print(np.mean(vec_a))
print(np.std(vec_a))
print(np.min(vec_a))
print(np.median(vec_a))
print(np.max(vec_a))
2.0
0.816496580927726
1
2.0
3
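
The remaining statistics (variance, sum, product, and cumulative sum) work the same way. A minimal sketch using the same vector \(a\) follows; note that np.var and np.std divide by n by default, and the ddof=1 argument switches to the sample version.

```python
import numpy as np

vec_a = np.array([1, 2, 3])

variance = np.var(vec_a)       # population variance (divides by n)
total = np.sum(vec_a)          # 1 + 2 + 3
product = np.prod(vec_a)       # 1 * 2 * 3
running_total = np.cumsum(vec_a)

# Sample standard deviation (divides by n - 1)
sample_sd = np.std(vec_a, ddof=1)

print(variance)
print(total)          # 6
print(product)        # 6
print(running_total)  # [1 3 6]
print(sample_sd)      # 1.0
```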

Try it yourself! 🧠

  • Compute the mean of:

\(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)

Common pitfall

Make sure that the arrays are of the same size!

print(vec_a)
print(vec_c)

# Print the shape of the arrays
print(vec_a.shape)
print(vec_c.shape)
[1 2 3]
[  10  100 1000 2000 5000]
(3,)
(5,)
# When you add two arrays of incompatible sizes,
# NumPy raises a ValueError
print(vec_a + vec_c)
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    print(vec_a + vec_c)
          ~~~~~~^~~~~~~
ValueError: operands could not be broadcast together with shapes (3,) (5,) 
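
One way to avoid this error is to compare the shapes before doing the operation. The sketch below is one possible approach, not the only one; the name same_shape is illustrative.

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_c = np.array([10, 100, 1000, 2000, 5000])

# Compare the shapes before adding
same_shape = vec_a.shape == vec_c.shape

if same_shape:
    print(vec_a + vec_c)
else:
    print("Cannot add: shapes are", vec_a.shape, "and", vec_c.shape)
```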

Questions?

Random numbers with Python 🎲

Generating random numbers

  • Why randomness?
    • Simulate different scenarios: high risk or low risk
    • Study properties of a complex system and/or estimator
    • In medicine, randomly assign subjects to treatment or control
    • In finance, simulate stock prices
    • In sports, simulate outcomes of games, etc
  • The code below draws a vector of random values from a normal distribution
  • The distribution has mean loc (location) and standard deviation scale
  • The number of values drawn is size
# Generate 10 random variables 
# from a normal distribution
# with mean = 0 and sd = 1
randomvar_a = np.random.normal(loc=0, scale=1, size=10)
print(randomvar_a)
[ 1.63578806  0.17780578  0.70001686 -0.42550348 -2.17372052  0.12463876
  0.57423812 -0.29389136  1.30200639 -0.902976  ]

Random numbers differ every time!

  • Avoid this problem by fixing the starting state of the random number generator
  • This starting value is called a seed, and it is set using np.random.seed()
  • This allows for reproducibility of results
np.random.seed(151)

random_var_b = np.random.normal(loc=0, scale=1, size=10)
print(random_var_b)
[-0.63673759  0.53155853  0.99020835 -0.6241344   1.46778078  0.40501276
  1.29817371 -2.61363271  1.35643373  1.87316055]
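
A minimal sketch of what reproducibility means in practice: resetting the same seed gives the same draws, while continuing without a reset gives new ones. The variable names and the size of 5 are illustrative choices.

```python
import numpy as np

# Same seed, same draws
np.random.seed(151)
first_draw = np.random.normal(loc=0, scale=1, size=5)

np.random.seed(151)
second_draw = np.random.normal(loc=0, scale=1, size=5)

# No seed reset here, so the generator continues and the values differ
third_draw = np.random.normal(loc=0, scale=1, size=5)

print(np.array_equal(first_draw, second_draw))  # True
print(np.array_equal(first_draw, third_draw))   # False
```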

Plot a histogram with the results

  • We can use the plt.hist() function to plot a histogram
# Create a histogram of the random variable
randomvar_x = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(x = randomvar_x)
plt.xlabel("Variable x")
plt.ylabel("Frequency")
plt.title("Histogram of random variable x")
plt.show()

Try it yourself!

  • Try this again with \(size = 100, 1000, 10000\) and see how the histogram changes
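
As a numerical companion to the histograms, the sketch below prints the sample mean and standard deviation for each size. With more draws, both should get closer to loc = 0 and scale = 1. The seed value and the dictionary names are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(151)

sample_means = {}
sample_sds = {}

for size in [100, 1000, 10000]:
    draws = np.random.normal(loc=0, scale=1, size=size)
    sample_means[size] = np.mean(draws)
    sample_sds[size] = np.std(draws)
    # Print the size, the sample mean, and the sample standard deviation
    print(size, round(sample_means[size], 3), round(sample_sds[size], 3))
```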

Boolean logic 🧠

Boolean logic

A bit of history

  • Named after George Boole, a British mathematician and philosopher
  • Boole’s work on logic laid the foundation for modern computer science
  • Boolean logic is a branch of algebra that deals with true and false values
  • It is useful for computer programming because it is based on binary values
  • In Python, Boolean values are True and False
  • We use them to make decisions in our code

Testing expressions with text 🐍

True and False values

  • We can test whether two values are equal using the == operator (two equal signs)

  • For example, 5 == 5 returns True

  • We can also test whether two values are not equal using the != operator (an exclamation mark followed by an equal sign)

  • Let’s see how this works in Python

  • First, let’s load the matplotlib and numpy libraries

import matplotlib.pyplot as plt
import numpy as np

# You can compare two strings by using a double "equal sign"
# This can be useful if you're trying to evaluate whether data was entered correctly

"Is this the real life?" == "is this just fantasy?" 
False

Note: the order of the characters in a string matters!

"ab" == "ba" 
False

True and False values

  • Equality of strings is most useful when you’re comparing an unknown variable to a benchmark

  • Below, try switching the value of any_questions

any_questions = "no"
print(any_questions == "no")
True
any_questions = "yes" 
print(any_questions == "no")
False
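
String comparisons are also case-sensitive, which matters when checking data entered by hand. A minimal sketch, assuming we want "No" and "no" to count as the same answer (the names exact_match and normalized_match are illustrative):

```python
any_questions = "No"

# The comparison is case-sensitive, so "No" and "no" are different strings
exact_match = any_questions == "no"

# Converting to lower case first makes the check more forgiving
normalized_match = any_questions.lower() == "no"

print(exact_match)       # False
print(normalized_match)  # True
```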

Test for the presence of keywords in a sentence (in operator)

  • We can test whether a keyword is present in a sentence or a list using the in operator
  • For example, "apple" in "I like apples" returns True
  • Let’s see how this works in Python
"apple" in "I like apples"
True
keyword = "economic"
sentence = "The Federal Reserve makes forecasts about many economic outcomes"

keyword in sentence

# Try changing the keyword!
True
  • Now, let’s test whether a keyword is present in a list
current_month = "September"
list_summer_months = ["June","July","August"]

print(current_month in list_summer_months)
print('June' in list_summer_months)
False
True
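
The in operator behaves differently for strings and lists: in a string it looks for a substring, while in a list it looks for a whole element. The sketch below uses made-up example values to show the difference.

```python
fruit_sentence = "I like apples"
fruit_list = ["apples", "bananas"]

# In a string, "in" looks for a substring
in_sentence = "apple" in fruit_sentence

# In a list, "in" looks for a whole element
in_list = "apple" in fruit_list
in_list_exact = "apples" in fruit_list

print(in_sentence)    # True
print(in_list)        # False
print(in_list_exact)  # True
```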

Testing expressions with numbers 🐍

Testing expressions with numbers

  • Tests with numbers
    • Strictly less than (<), less than or equal to (<=)
    • Equal to (==)
    • Strictly greater than (>), greater than or equal to (>=)
  • Let’s see how this works in Python
x = 5

print(x < 5)
print(x <= 5)
print(x == 5)
print(x >= 5)
print(x > 5)
False
True
True
True
False
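
One caveat with == applies to decimal numbers: floats are stored with small rounding errors, so an exact comparison can fail unexpectedly. The sketch below shows the issue and one common workaround, np.isclose, which is a standard NumPy function but not part of the original lecture code.

```python
import numpy as np

total = 0.1 + 0.2

# Exact equality fails because of rounding in binary floating point
exact_equal = total == 0.3

# np.isclose compares within a small tolerance
close_equal = np.isclose(total, 0.3)

print(exact_equal)  # False
print(close_equal)  # True
```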

Validate a data type

  • We can test whether a variable is of a certain data type using the isinstance() function
  • For example, isinstance(5, int) returns True
  • Other data types include float, str, list, tuple, dict, set, bool
y = 10

print(isinstance(y,int))
print(isinstance(y,float))
print(isinstance(y,str))
True
False
False
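
isinstance() also accepts a tuple of types, which is handy when several types are acceptable. A minimal sketch that checks whether a value is a number (int or float); the variable names are illustrative.

```python
y = 10
z = 2.5

# Pass a tuple of types to accept any of them
y_is_number = isinstance(y, (int, float))
z_is_number = isinstance(z, (int, float))
text_is_number = isinstance("10", (int, float))

print(y_is_number)     # True
print(z_is_number)     # True
print(text_is_number)  # False
```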

Equality of vectors

  • We can test whether two vectors are equal using the == operator
  • For Python lists, [1,2,3] == [1,2,3] returns a single True
  • For NumPy arrays, the comparison is done element-wise and returns an array of Boolean values
vec_a = np.array([1,2,3])
vec_b = np.array([1,2,4])

vec_a == vec_b
array([ True,  True, False])
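
To reduce the element-wise result to a single True or False, we can use the .all() and .any() methods or np.array_equal(). These are standard NumPy tools; the sketch below is one way to use them.

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([1, 2, 4])

elementwise = vec_a == vec_b

all_equal = elementwise.all()   # True only if every position matches
any_equal = elementwise.any()   # True if at least one position matches
same_array = np.array_equal(vec_a, vec_b)

print(all_equal)   # False
print(any_equal)   # True
print(same_array)  # False
```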

Try it out! 🧠

  • Define \(x= -1\). Check whether \(x^2 + 2x + 1 = 0\) is true
    • Please remember that \(x^2\) is written as x**2 in Python
    • Please note the difference between == and =
  • Appendix 03

Testing multiple conditions 🐍

Testing multiple conditions

  • We can test multiple conditions using the and and or operators
  • The and operator returns True if both conditions are True
  • The or operator returns True if at least one condition is True
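
A minimal sketch of and and or with a single number follows. The age value and the variable names are illustrative.

```python
age = 31

# Both conditions must hold
between_20_and_30 = (age >= 20) and (age <= 30)

# At least one condition must hold
under_20_or_over_30 = (age < 20) or (age > 30)

print(between_20_and_30)    # False
print(under_20_or_over_30)  # True
```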

not: the negation operator

  • We can negate a condition using the not operator
  • The not operator returns True if the condition is False and vice versa
    • Yes, it’s a bit confusing at first, but it’s intuitive once you see it in action
  • For instance, imagine that you want to know whether someone can vote in the US
    • Here let’s assume the person is a national and we only care about age
    • The person can vote if they are at least 18 years old
age  = 22

# Can this person legally vote in the US?
not (age < 18)
True
  • The not operator is followed by a space, and parentheses around the condition are not required
  • But parentheses can be helpful to organize your code logically
not age < 18
True

Condition A and B need to be satisfied: & operator

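  • We can also test multiple conditions using the & operator
  • The & operator returns True if both conditions are True
  • For example, (5 > 3) & (5 < 10) returns True
  • Wrap each condition in parentheses, because & is evaluated before comparison operators such as > and <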
# We use the "&" symbol to separate "AND" conditions
age = 31

# Is this age between 20 and 30 (including these ages)?
(age >= 20) & (age <= 30)
False
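
The & operator also works position by position on NumPy arrays, which lets us test many values at once. A minimal sketch with a made-up array of ages:

```python
import numpy as np

ages = np.array([18, 25, 31])

# & combines the two Boolean arrays position by position
in_range = (ages >= 20) & (ages <= 30)

print(in_range)  # [False  True False]
```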

Condition A or B needs to be satisfied: | operator

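  • To test whether at least one condition is True, we use the | operator
  • For example, (5 > 3) | (5 > 10) returns True
  • Keep the parentheses: without them, Python evaluates 3 | 5 first and the expression returns False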
# We use the "|" symbol to separate "OR" conditions.
age = 31

# Is this age at least 20 or at most 30?
(age >= 20) | (age <= 30)
True
# Another example
student_status = "freshman" 

# Is the student in the first two years of undergrad?
(student_status == "freshman") | (student_status == "sophomore")
True
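
Parentheses around each condition matter when using & and |, because these operators are evaluated before comparisons such as > and <. The sketch below shows how the result changes when the parentheses are dropped.

```python
# Without parentheses, | is evaluated before the comparisons:
# 3 | 5 equals 7, so this becomes the chained comparison 5 > 7 > 10
without_parentheses = 5 > 3 | 5 > 10

# With parentheses, each comparison is evaluated first
with_parentheses = (5 > 3) | (5 > 10)

print(without_parentheses)  # False
print(with_parentheses)     # True
```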

Try it out! 🚀

  • Now, let’s test whether you can identify the correct expression
  • Write code that checks the following conditions:
    • Whether age is strictly less than 20, or greater than 30
    • Not in the age range 25-27
  • Appendix 04

And that’s all for today! 🥳

Thank you very much and see you next time! 🙏

Appendix 01

Mathematical functions in NumPy

  • Create a new variable, \(x = 5\)
  • Compute \(\pi x^2\)
  • Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)
x = 5
print(np.pi * x ** 2)
print(1 / np.sqrt(2 * np.pi) * np.exp(-x ** 2))
78.53981633974483
5.540487995575833e-12

Back to the main text

Appendix 02

Summary statistics of an array

  • Compute the mean of
  • \(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)
vec_e = np.array([10, 8, 15, 0, 24])
print(np.mean(vec_e))
11.4

Back to the main text

Appendix 03

  • Check whether \(x = -1\) is a solution to the equation \(x^2 + 2x + 1 = 0\)
  • Please remember that \(x^2\) is written as x**2 in Python
x = -1
print(x ** 2 + 2 * x + 1 == 0)
True

Back to the main text

Appendix 04

  • Check whether age (age = 31) is strictly less than 20, or greater than 30
  • Check whether age is not in the range 25-27
age = 31

(age < 20) | (age > 30) 
True
(age < 25) | (age > 27)
True

The second answer uses | because & requires both conditions to be true at once, and no age can be both less than 25 and greater than 27. An age outside the range 25-27 satisfies exactly one of the two conditions, so we need |.
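
An equivalent way to write "not in the range 25-27" is to test membership in the range with & and then negate it with not. The sketch below checks that both versions agree for a few made-up ages; the names test_ages and results are illustrative.

```python
# Check both ways of writing "not in the range 25-27" for several ages
test_ages = [20, 25, 26, 27, 31]
results = []

for age in test_ages:
    with_or = (age < 25) | (age > 27)
    with_not = not ((age >= 25) & (age <= 27))
    results.append((age, with_or, with_not))
    print(age, with_or, with_not)
```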

Back to the main text