QTM 151 - Introduction to Statistical Computing II

Lecture 03 - Maths Operations, Arrays, and Boolean Logic

Danilo Freire

danilo.freire@emory.edu

Emory University

Welcome to another lecture! 😊

Today’s agenda

Introducing NumPy, `random`, and Boolean logic

First, a brief overview of NumPy
NumPy (short for “Numerical Python”) is a library that provides support for large, multi-dimensional arrays and matrices
An array is a collection of numbers arranged in a regular grid (a vector, a matrix, or a higher-dimensional array, also called a tensor)
In simpler terms, NumPy arrays are a “super-powered list of numbers”
NumPy is the backbone of many other libraries in Python, such as pandas and scikit-learn
We will also learn about NumPy's random module (np.random), which generates random numbers
Then, we will see how to use Boolean logic in Python

Let’s get started! 🚀

Importing NumPy and Matplotlib

As usual, we start by importing the libraries we will use
NumPy has many mathematical functions
For instance, \(\ln(x), e^x, \sin(x), \cos(x), \sqrt{x}\)
Remember that exponentiation in Python is done using **, not ^

You can check the full list of mathematical functions in the NumPy documentation (there are many!)

# Importing packages
import numpy as np
import matplotlib.pyplot as plt

# log(x): natural logarithm (base e)
# exp(x): e raised to the power x
# sin(x): sine of x
# cos(x): cosine of x
# sqrt(x): square root of x
# We're substituting x = 1
print(np.log(1))
print(np.exp(1))
print(np.sin(1))
print(np.cos(1))
print(np.sqrt(1))

0.0
2.718281828459045
0.8414709848078965
0.5403023058681398
1.0
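NumPy also stores common constants and offers logarithms in other bases. A minimal sketch, using only the standard NumPy names np.pi, np.e, np.log10(), and np.log2():

```python
import numpy as np

# Constants stored in NumPy
print(np.pi)   # 3.141592653589793
print(np.e)    # 2.718281828459045

# log() and exp() undo each other: log(exp(2)) gives back 2 (up to rounding)
log_exp = np.log(np.exp(2))
print(log_exp)

# Logarithms in base 10 and base 2
log10_100 = np.log10(100)
log2_8 = np.log2(8)
print(log10_100)
print(log2_8)
```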

Try it yourself! 🧠

Create a new variable, \(x = 5\)
Compute \(\pi x^2\)
Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)
- This expression is closely related to the probability density function of the standard normal distribution, \(\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\) (don't worry if you don't know what it is yet! 🤓)
Don’t forget how to exponentiate in Python 😉

x = 10
x ** 5    # 10 to the power of 5

# Not x^5: in Python, ^ is the bitwise XOR operator, not exponentiation
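Why does ^ not raise an error? Because it is a valid operator with a different meaning. A short sketch comparing the two operators on the same numbers:

```python
x = 10

# ** is exponentiation: 10 to the power of 5
power = x ** 5
print(power)   # 100000

# ^ is bitwise XOR, not exponentiation:
# 10 is 1010 in binary and 5 is 0101, so XOR gives 1111, i.e. 15
xor = x ^ 5
print(xor)     # 15
```

Because x ^ 5 silently returns a number, this mistake does not crash your code, which makes it easy to miss.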

Appendix 01

Vector arrays with NumPy 📊

Creating arrays from lists

NumPy arrays are created using the np.array() function
We can create arrays from lists
We can also create arrays with a sequence of numbers using np.arange()
We can create arrays with zeros or ones using np.zeros() and np.ones()
And we’re going to learn all of this in this lecture! 🤓
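A quick sketch of np.arange(), np.zeros(), and np.ones(); the particular start, stop, step, and lengths below are arbitrary choices for illustration:

```python
import numpy as np

# np.arange(start, stop, step): numbers from start up to (but excluding) stop
seq = np.arange(0, 10, 2)
print(seq)      # [0 2 4 6 8]

# np.zeros(n) and np.ones(n): arrays of n zeros or n ones (floats by default)
zeros = np.zeros(3)
ones = np.ones(3)
print(zeros)    # [0. 0. 0.]
print(ones)     # [1. 1. 1.]
```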

Creating arrays from lists

Create an array from a list
\(a = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\)
\(b = \begin{pmatrix} 0 \\ 1 \\ 0\end{pmatrix}\)
\(c = \begin{pmatrix} 10 \\ 100 \\ 1000 \\ 2000 \\ 5000 \end{pmatrix}\)
\(d = \begin{pmatrix} 4 \\ 2 \end{pmatrix}\)

vec_a  = np.array([1,2,3])
vec_b  = np.array([0,1,0])
vec_c  = np.array([10,100,1000,2000,5000])
vec_d  = np.array([4,2])

Accessing an element of an array

We can access elements of an array using square brackets []
Remember that Python is zero-indexed
Access the first and the third element of \(a\)

print(vec_a)
print(vec_a[0])
print(vec_a[2])

[1 2 3]
1
3
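Python's usual indexing rules also work on NumPy arrays, including negative indices and slices. A short sketch using the same vec_a:

```python
import numpy as np

vec_a = np.array([1, 2, 3])

# Negative indices count from the end: -1 is the last element
last = vec_a[-1]
print(last)          # 3

# A slice start:stop returns the elements from start up to (but excluding) stop
first_two = vec_a[0:2]
print(first_two)     # [1 2]
```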

Operations with a single array and a scalar

We can perform operations with a single array and a scalar
For instance, we can add a scalar to an array, or multiply an array by a scalar

Add 2 to each element of \(a\)
\(a + 2 = \begin{pmatrix} a_1 + 2 \\ a_2 + 2 \\ a_3 + 2 \end{pmatrix}\)

# Print the original array
print(vec_a)

# Adding 2 to each element of a
print(vec_a + 2)

[1 2 3]
[3 4 5]

A scalar is a single number, either an int or a float
We can do many common operations between an array and a scalar

print(vec_a * 2)
print(vec_a / 2)
print(vec_a + 2)
print(vec_a ** 2)

[2 4 6]
[0.5 1.  1.5]
[3 4 5]
[1 4 9]

Element-by-element addition between two arrays of the same size

\(a + b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} +\) \(\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =\) \(\begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you add two arrays of the same size,
# Python adds the individual elements in each position
print(vec_a + vec_b)

[1 2 3]
[0 1 0]
[1 3 3]

Element-by-element multiplication between two arrays of the same size

\(a * b = \begin{pmatrix} a_1 * b_1 \\ a_2 * b_2 \\ a_3 * b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you multiply two arrays of the same size,
# Python multiplies the individual elements in each position
print(vec_a * vec_b)

# We can do other similar element-by-element operations,
# such as subtraction and division.
# Dividing by zero returns inf (NumPy also issues a RuntimeWarning)
print(vec_a - vec_b)
print(vec_a / vec_b)

[1 2 3]
[0 1 0]
[0 2 0]
[1 1 3]
[inf  2. inf]

Summary statistics 📊

Summary statistics of an array

NumPy provides several functions to compute summary statistics of an array
For instance, we can compute the mean, median, standard deviation, variance, minimum, and maximum
We can also compute the sum, product, and cumulative sum

print(np.mean(vec_a))
print(np.std(vec_a))
print(np.min(vec_a))
print(np.median(vec_a))
print(np.max(vec_a))

2.0
0.816496580927726
1
2.0
3
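The slide also mentions the sum, product, cumulative sum, and variance. A minimal sketch with vec_a; note the ddof argument, which controls whether np.std() and np.var() divide by \(n\) (the default) or by \(n - 1\):

```python
import numpy as np

vec_a = np.array([1, 2, 3])

total = np.sum(vec_a)
product = np.prod(vec_a)
running_total = np.cumsum(vec_a)
variance = np.var(vec_a)

print(total)          # 6
print(product)        # 6
print(running_total)  # [1 3 6]
print(variance)       # 0.6666666666666666

# ddof=1 divides by n - 1, which gives the sample standard deviation
sample_sd = np.std(vec_a, ddof=1)
print(sample_sd)      # 1.0
```

This is why np.std(vec_a) above returns 0.816... rather than 1.0: by default NumPy computes the population standard deviation.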

Try it yourself! 🧠

Compute the mean of:

\(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)

Appendix 02

Common pitfall

Make sure that the arrays are of the same size!

print(vec_a)
print(vec_c)

# Print the shape of the arrays
print(vec_a.shape)
print(vec_c.shape)

[1 2 3]
[  10  100 1000 2000 5000]
(3,)
(5,)

# When you add two arrays of different sizes,
# NumPy raises a ValueError
print(vec_a + vec_c)

Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    print(vec_a + vec_c)
          ~~~~~~^~~~~~~
ValueError: operands could not be broadcast together with shapes (3,) (5,) 
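The error message mentions broadcasting, which is NumPy's set of rules for combining arrays of different shapes. One case where different sizes do work is when one array has a single element: it then behaves like a scalar. A short sketch (the full rules are in the NumPy documentation on broadcasting):

```python
import numpy as np

vec_a = np.array([1, 2, 3])

# A one-element array is "stretched" (broadcast) to match vec_a,
# just like adding the scalar 10
result = vec_a + np.array([10])
print(result)   # [11 12 13]
```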

Questions?

Random numbers with Python 🎲

Generating random numbers

Why randomness?
- Simulate different scenarios: high risk or low risk
- Study properties of a complex system and/or estimator
- In medicine, randomly assign subjects to treatment or control
- In finance, simulate stock prices
- In sports, simulate outcomes of games, etc.

This code draws a vector of random values from a normal distribution
The mean is given by loc (location) and the standard deviation by scale
The number of values drawn is given by size

# Generate 10 random variables 
# from a normal distribution
# with mean = 0 and sd = 1
randomvar_a = np.random.normal(loc=0, scale=1, size=10)
print(randomvar_a)

[ 1.63578806  0.17780578  0.70001686 -0.42550348 -2.17372052  0.12463876
  0.57423812 -0.29389136  1.30200639 -0.902976  ]
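To see what loc and scale do, we can draw many values and compare the sample mean and standard deviation with the parameters. A sketch; the values 5, 2, and 100,000 are arbitrary choices for illustration, and the printed numbers will change on every run:

```python
import numpy as np

# 100,000 draws with mean (loc) 5 and standard deviation (scale) 2
draws = np.random.normal(loc=5, scale=2, size=100_000)

# With many draws, the sample mean and sd should be close to loc and scale
print(np.mean(draws))
print(np.std(draws))
```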

Random numbers differ every time!

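Avoid this problem by fixing the starting point of NumPy's pseudo-random number generator
This starting point is called a seed, and it is set using np.random.seed()
With the same seed, the code produces the same "random" numbers every time, which makes results reproducible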

np.random.seed(151)

random_var_b = np.random.normal(loc=0, scale=1, size=10)
print(random_var_b)

[-0.63673759  0.53155853  0.99020835 -0.6241344   1.46778078  0.40501276
  1.29817371 -2.61363271  1.35643373  1.87316055]
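Resetting the seed restarts the sequence, so the same draws come back. The sketch below also shows np.random.default_rng(), the Generator interface that recent NumPy documentation recommends; it is included only as an alternative to the np.random.seed() approach used in this lecture:

```python
import numpy as np

# Same seed -> same sequence of draws
np.random.seed(151)
first = np.random.normal(loc=0, scale=1, size=5)

np.random.seed(151)
second = np.random.normal(loc=0, scale=1, size=5)

print(np.array_equal(first, second))   # True

# Newer interface: the seed is passed to default_rng.
# Its draws differ from np.random.seed(151) because it uses a different algorithm
rng = np.random.default_rng(151)
third = rng.normal(loc=0, scale=1, size=5)
print(third)
```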

Compute a histogram with the results

We can use the plt.hist() function to compute a histogram

# Create a histogram of the random variable
randomvar_x = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(x = randomvar_x)
plt.xlabel("Variable x")
plt.ylabel("Frequency")
plt.title("Histogram of random variable x")
plt.show()
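Behind the plot, a histogram is a count of observations per bin. A sketch using np.histogram(), which returns those counts and the bin edges without drawing anything (both np.histogram() and plt.hist() use 10 bins by default):

```python
import numpy as np

np.random.seed(151)
randomvar_x = np.random.normal(loc=0, scale=1, size=1000)

# counts: number of draws in each bin; edges: the boundaries of the bins
counts, edges = np.histogram(randomvar_x, bins=10)

print(counts)
print(edges)          # 11 edges delimit 10 bins
print(counts.sum())   # 1000: every draw falls in exactly one bin
```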

Try it yourself!

Try this again with \(size = 100, 1000, 10000\) and see how the histogram changes

Boolean logic 🧠

Boolean logic

A bit of history

Named after George Boole, a British mathematician and philosopher
Boole’s work on logic laid the foundation for modern computer science
Boolean logic is a branch of algebra that deals with true and false values
It is useful for computer programming because it is based on binary values
In Python, Boolean values are True and False
We use them to make decisions in our code

Testing expressions with text 🐍

True and False values

We can test whether two values are equal using the == operator (two equal signs)
For example, 5 == 5 returns True
We can also test whether two values are not equal using the != operator (an exclamation mark followed by an equal sign)
Let’s see how this works in Python
First, let’s load the matplotlib and numpy libraries

import matplotlib.pyplot as plt
import numpy as np

# You can compare two strings by using a double "equal sign"
# This can be useful if you're trying to evaluate whether data was entered correctly

"Is this the real life?" == "is this just fantasy?"

False
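The slide describes both == and !=, so here is a short sketch of each with numbers and strings:

```python
# == tests for equality, != tests for inequality; both return True or False
same = 5 == 5
different = 5 != 5
not_equal_strings = "yes" != "no"

print(same)               # True
print(different)          # False
print(not_equal_strings)  # True
```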

Note: the order of the characters matters!

"ab" == "ba"

False
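String comparisons are also case-sensitive, which matters when checking data that people typed in. A sketch using the built-in str.lower() method to normalize the text first (the variable answer is hypothetical):

```python
answer = "Yes"

# Comparisons are case-sensitive
print(answer == "yes")           # False

# .lower() converts the text to lowercase before comparing
print(answer.lower() == "yes")   # True
```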

True and False values

Equality of strings is most useful when you’re comparing an unknown variable to a benchmark
Below, try switching the value of any_questions

any_questions = "no"
print(any_questions == "no")

True

any_questions = "yes" 
print(any_questions == "no")

False

Test for the presence of keywords in a sentence (`in` operator)

We can test whether a keyword is present in a sentence or a list using the in operator
For example, "apple" in "I like apples" returns True
Let’s see how this works in Python

"apple" in "I like apples"

True

keyword = "economic"
sentence = "The Federal Reserve makes forecasts about many economic outcomes"

keyword in sentence

# Try changing the keyword!

True

Now, let’s test whether a keyword is present in a list

current_month = "September"
list_summer_months = ["June","July","August"]

print(current_month in list_summer_months)
print('June' in list_summer_months)

False
True
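Python also has not in, which returns the opposite of in. A short sketch with the same list:

```python
current_month = "September"
list_summer_months = ["June", "July", "August"]

# "not in" is True when the item is absent
print(current_month not in list_summer_months)   # True
print("June" not in list_summer_months)          # False
```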

Testing expressions with numbers 🐍

Testing expressions with numbers

Tests with numbers
- Strictly less than (<), less than or equal to (<=)
- Equal (==)
- Strictly greater than (>), greater than or equal to (>=)
Let’s see how this works in Python

x = 5

print(x < 5)
print(x <= 5)
print(x == 5)
print(x >= 5)
print(x > 5)

False
True
True
True
False
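One caution when testing numbers for equality: floats are stored in binary with limited precision, so two computations that agree mathematically may differ slightly. A sketch using np.isclose(), which compares values within a small tolerance:

```python
import numpy as np

# Decimal numbers are stored approximately, so == can surprise you
print(0.1 + 0.2 == 0.3)             # False
print(0.1 + 0.2)                    # 0.30000000000000004

# np.isclose checks equality up to a small tolerance
print(np.isclose(0.1 + 0.2, 0.3))   # True
```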

Validate a data type

We can test whether a variable is of a certain data type using the isinstance() function
For example, isinstance(5, int) returns True
Other data types include float, str, list, tuple, dict, set, bool

y = 10

print(isinstance(y,int))
print(isinstance(y,float))
print(isinstance(y,str))

True
False
False
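isinstance() also accepts a tuple of types, and it has one surprise worth knowing about Booleans. A short sketch:

```python
y = 10

# A tuple of types asks "is it any of these?"
print(isinstance(y, (int, float)))   # True

# In Python, bool is a subtype of int, so True also counts as an int
print(isinstance(True, int))         # True
```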

Equality of vectors

We can test whether two vectors are equal using the == operator
For example, [1,2,3] == [1,2,3] returns a single True for two Python lists
With NumPy arrays, however, the comparison is done element-wise and returns one True or False per position

vec_a = np.array([1,2,3])
vec_b = np.array([1,2,4])

vec_a == vec_b

array([ True,  True, False])
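To summarize an element-wise comparison into a single answer, NumPy provides .all(), .any(), and np.array_equal(). A sketch with the same two vectors:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([1, 2, 4])

matches = vec_a == vec_b              # array([ True,  True, False])

print(matches.all())                  # False: not every element matches
print(matches.any())                  # True: at least one element matches
print(matches.sum())                  # 2: True counts as 1, False as 0
print(np.array_equal(vec_a, vec_b))   # False: one answer for the whole array
```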

Try it out! 🧠

Define \(x = -1\). Check whether \(x^2 + 2x + 1 = 0\) is true
- Please remember that \(x^2\) is written as x**2 in Python
- Please note the difference between = (assignment) and == (equality test)
Appendix 03

Testing multiple conditions 🐍

Testing multiple conditions

We can test multiple conditions using the and and or operators
The and operator returns True if both conditions are True
The or operator returns True if at least one condition is True
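A short sketch of and and or with a single value; the age cutoffs 18 and 65 are arbitrary choices for illustration:

```python
age = 22

# and: True only if both conditions are True
both = (age >= 18) and (age < 65)
print(both)         # True

# or: True if at least one condition is True
either = (age < 18) or (age >= 65)
print(either)       # False
```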

`not`: the negation operator

We can negate a condition using the not operator
The not operator returns True if the condition is False and vice versa
- Yes, it’s a bit confusing at first, but it’s intuitive once you see it in action
For instance, imagine that you want to know whether someone can vote in the US
- Here let’s assume the person is a national and we only care about age
- The person can vote if they are at least 18 years old

age  = 22

# Can this person legally vote in the US?
not (age < 18)

True

Parentheses are not necessary here, because Python evaluates comparisons such as < before applying not
But parentheses can be helpful to organize your code logically

not age < 18

True
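The not operator works on a single True or False value. For a NumPy array of Booleans, a sketch using ~ or np.logical_not() instead (the array ages is hypothetical):

```python
import numpy as np

ages = np.array([15, 22, 40])

# ~ flips every element; writing not (ages < 18) would raise a ValueError
can_vote = ~(ages < 18)
print(can_vote)                    # [False  True  True]
print(np.logical_not(ages < 18))   # [False  True  True]
```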

Condition A and B need to be satisfied: `&` operator

We can test multiple conditions using the & operator
The & operator returns True if both conditions are True
For example, (5 > 3) & (5 < 10) returns True (keep the parentheses, because & is evaluated before > and <)

# We use the "&" symbol to separate "AND" conditions
age = 31

# Is this age between 20 and 30 (including these ages)?
(age >= 20) & (age <= 30)

False
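For single values, & and and give the same answer, but & also works element by element on NumPy arrays. A sketch with a hypothetical array of ages:

```python
import numpy as np

ages = np.array([18, 25, 31])

# & compares element by element and returns one True/False per element
in_range = (ages >= 20) & (ages <= 30)
print(in_range)   # [False  True False]

# Writing "and" here instead of "&" would raise a ValueError, because
# Python cannot reduce a whole array to a single True or False
```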

Condition A or B needs to be satisfied: `|` operator

To test whether at least one condition is True, we use the | operator
For example, (5 > 3) | (5 > 10) returns True (keep the parentheses around each condition)

# We use the "|" symbol to separate "OR" conditions.
age = 31

# Is this age at least 20 or at most 30?
(age >= 20) | (age <= 30)

True

# Another example
student_status = "freshman" 

# Is the student in the first two years of undergrad?
(student_status == "freshman") | (student_status == "sophomore")

True
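Why do the parentheses around each condition matter? Python evaluates & and | before comparisons such as > and <, so dropping them silently changes the meaning. A sketch:

```python
# With parentheses: each comparison is evaluated first
with_parens = (5 > 3) | (5 > 10)

# Without parentheses: read as 5 > (3 | 5) > 10, i.e. 5 > 7 > 10
without_parens = 5 > 3 | 5 > 10

print(with_parens)      # True
print(without_parens)   # False
```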

Try it out! 🚀

Now, let’s test whether you can identify the correct expression

Write code that checks the following conditions:
- Whether age is strictly less than 20, or greater than 30
- Not in the age range 25-27
Appendix 04

And that’s all for today! 🥳

Thank you very much and see you next time! 🙏

Appendix 01

Mathematical functions in NumPy

Create a new variable, \(x = 5\)
Compute \(\pi x^2\)
Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)

x = 5
print(np.pi * x ** 2)
print(1 / np.sqrt(2 * np.pi) * np.exp(-x ** 2))

78.53981633974483
5.540487995575833e-12

Back to the main text

Appendix 02

Summary statistics of an array

Compute the mean of
\(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)

vec_e = np.array([10, 8, 15, 0, 24])
print(np.mean(vec_e))

11.4

Back to the main text

Appendix 03

Whether \(x = -1\) is a solution to the equation \(x^2 + 2x + 1 = 0\)
Please remember that \(x^2\) is written as x**2 in Python

x = -1
print(x ** 2 + 2 * x + 1 == 0)

True

Back to the main text

Appendix 04

Whether age (age = 31) is strictly less than 20, or greater than 30
Not in the age range 25-27

age = 31

(age < 20) | (age > 30)

True

(age < 25) | (age > 27)

True

The second answer uses | rather than &: no age can be less than 25 and greater than 27 at the same time, so joining these two conditions with & would always return False.
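The same condition can also be written by negating the range check directly (this equivalence is one of De Morgan's laws). A sketch that compares both versions over a range of ages:

```python
# Two ways to write "not in the age range 25-27"
for age in range(20, 33):
    version_1 = (age < 25) | (age > 27)
    version_2 = not ((age >= 25) & (age <= 27))
    # Stop with an error if the two versions ever disagree
    assert version_1 == version_2

print("Both versions agree for ages 20 to 32")
```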

Back to the main text
