QTM 151 - Introduction to Statistical Computing II

Lecture 04 - Mathematical Operations, Arrays, and Random Numbers

Danilo Freire

Emory University

11 September, 2024

Welcome to another lecture! 😊

Brief recap 📚

In the last lecture, we learned how to:

  • Install packages in Python using conda install
  • Create, access, and modify variables
  • Manipulate lists
  • Use the print() function to display information

Assignments:

  • Assignment 01 is due today! 🚨
  • Assignment 02 is already online on Canvas and GitHub
  • Feel free to email your assignments to me or submit them via Canvas (hopefully it is working now!)
  • We will mark them and provide feedback on your work soon

Questions?

An announcement from our department 📢

QTM Open House! 🏠

  • The QTM Open House is an event where you can learn more about the QTM major
  • You can meet the faculty, staff, and students
  • It is this Friday, September 13th, from 1:30 to 3:30 PM at PAIS 290

Today’s agenda

Introducing NumPy and random

  • A brief overview of Matplotlib and NumPy
  • NumPy (short for “Numerical Python”) is a library that provides support for large, multi-dimensional arrays and matrices
  • An array is a collection of numbers that are arranged in a regular grid (vector, matrix, high-dimensional array - tensors)
  • In simpler terms, NumPy arrays are a “super-powered list of numbers”
  • NumPy is the backbone of many other libraries in Python, such as pandas and scikit-learn
  • We will also learn about the random module, which generates random numbers
  • NumPy already comes with Anaconda, so you don’t need to install it
  • In case you are using a different Python distribution, you can install NumPy using conda install numpy or pip install numpy
  • The alias for NumPy is np, so you can import it using import numpy as np
  • The random module is part of the Python Standard Library, so you don’t need to install it
  • You can just import it using import random
    • Interestingly, the random module does not have an alias 🤷🏻‍♂️
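
As a quick check that both libraries are available, the sketch below imports them and calls one function from each. The variable names `example_array` and `die_roll` are illustrative and do not come from the lecture.

```python
import random

import numpy as np

# np is the conventional alias for NumPy
example_array = np.array([1, 2, 3])
print(example_array.sum())  # sum of the three elements

# The random module is used under its full name
die_roll = random.randint(1, 6)  # integer from 1 to 6, both ends included
print(die_roll)
```

If either import fails, the library is probably not installed in the active environment.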

Let’s get started! 🚀

Visualising data with Matplotlib 📊

Visualising lists with histograms

  • We can use the matplotlib package to create plots
  • The hist() function creates a histogram
  • We can pass a list as an argument to the hist() function
  • We can also customise the plot by adding labels, titles, and changing the colour (more on that later)
  • You display the graph by calling the show() function


  • Try it yourself!
  • Create a list with repeated string values (maybe repeat the movies you like a few times?) and compute your own histogram (see Appendix 01)
import matplotlib.pyplot as plt

# Create a new list
list_colours_02 = ["red","yellow","yellow","green","red","red"]
print(list_colours_02)

# Create a histogram of the list of colours
plt.hist(x = list_colours_02)
plt.show()
['red', 'yellow', 'yellow', 'green', 'red', 'red']
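
The heights of the bars in this histogram are simply the number of times each colour appears. A minimal sketch of the same count without a plot, assuming the standard-library `Counter` class (not used in the lecture) is acceptable here:

```python
from collections import Counter

list_colours_02 = ["red", "yellow", "yellow", "green", "red", "red"]

# Count how many times each colour appears in the list
colour_counts = Counter(list_colours_02)

print(colour_counts["red"])     # 3
print(colour_counts["yellow"])  # 2
print(colour_counts["green"])   # 1
```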

Scatter plots

  • We can also create scatter plots using the scatter() function
  • The scatter() function takes two lists as arguments
    • The first list contains the x-coordinates
    • The second list contains the y-coordinates
  • We use them to visualise the relationship between two continuous variables
  • Here, we will use the xlabel() and ylabel() functions to label the axes
list_numbers = [1,2,3,4,5]
list_numbers_sqr = [1,4,9,16,25]

# Create a scatter plot
plt.scatter(x = list_numbers, y = list_numbers_sqr)
plt.xlabel("A meaningful name for the X-axis") 
plt.ylabel("Favourite name for Y-axis") 
plt.show()
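
In the example above the squared values were typed by hand. One possible way to compute them from the first list is a list comprehension, which this lecture does not cover, so treat it as an optional sketch:

```python
list_numbers = [1, 2, 3, 4, 5]

# Square each element of list_numbers, one at a time
list_numbers_sqr = [number ** 2 for number in list_numbers]

print(list_numbers_sqr)  # [1, 4, 9, 16, 25]
```

The resulting list can be passed to plt.scatter() exactly as before.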

  • Try it yourself!
  • Create two lists with numbers, then create your own scatter plot (see Appendix 02)

Importing NumPy and Matplotlib

  • As usual, we start by importing the libraries we will use
  • NumPy provides many mathematical functions
  • For instance, \(\ln(x), e^x, \sin(x), \cos(x), \sqrt{x}\)
  • Remember that exponentiation in Python is done using **, not ^


  • You can check a list of NumPy functions here (there are many!)
# Importing packages
import numpy as np
import matplotlib.pyplot as plt

# log(x) computes the natural logarithm, with base "e" (Euler's number)
# exp(x) computes Euler's number raised to the power of "x"
# sin(x) computes the sine of x
# cos(x) computes the cosine of x
# In this example, we're substituting x = 1
print(np.log(1))
print(np.exp(1))
print(np.sin(1))
print(np.cos(1))
print(np.sqrt(1))
0.0
2.718281828459045
0.8414709848078965
0.5403023058681398
1.0
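
A few more calls can help confirm what these functions do. The sketch below uses `np.log10`, `np.log2`, `np.pi`, and `np.e`, which exist in NumPy but are not all shown in the lecture:

```python
import numpy as np

# np.log is the natural logarithm, so it undoes np.exp
print(np.log(np.exp(2)))

# Logarithms with other bases have their own functions
print(np.log10(100))  # base 10
print(np.log2(8))     # base 2

# NumPy also stores the constants pi and e
print(np.pi)
print(np.e)
```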

Try it yourself! 🧠

  • Create a new variable, \(x = 5\)
  • Compute \(\pi x^2\)
  • Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)
    • This closely resembles the probability density function of the standard normal distribution, \(\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\) (don’t worry if you don’t know what it is yet! 🤓) (see Appendix 03)


  • Don’t forget how to exponentiate in Python 😉
x = 10
x ** 5

# Not x^5
100000
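
The reason `^` does not work for powers is that Python reserves it for the bitwise XOR operation on integers. It runs without an error, which makes the mistake easy to miss:

```python
x = 10

print(x ** 5)  # 100000, exponentiation
print(x ^ 5)   # 15, bitwise XOR of 10 (binary 1010) and 5 (binary 0101)
```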

Vector arrays with NumPy 📊

Creating arrays from lists

  • NumPy arrays are created using the np.array() function
  • We can create arrays from lists
  • We can also create arrays with a sequence of numbers using np.arange()
  • We can create arrays with zeros or ones using np.zeros() and np.ones()
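
The last two bullet points can be sketched as follows. The variable names are illustrative:

```python
import numpy as np

# np.arange(start, stop, step) excludes the stop value
vec_sequence = np.arange(0, 10, 2)
print(vec_sequence)  # [0 2 4 6 8]

# np.zeros and np.ones take the number of elements
vec_zeros = np.zeros(3)
vec_ones = np.ones(4)
print(vec_zeros)  # [0. 0. 0.]
print(vec_ones)   # [1. 1. 1. 1.]
```

Note that np.zeros() and np.ones() return floats by default, which is why the output shows a decimal point.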

  • Create an array from a list

  • \(a = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\)

  • \(b = \begin{pmatrix} 0 \\ 1 \\ 0\end{pmatrix}\)

  • \(c = \begin{pmatrix} 10 \\ 100 \\ 1000 \\ 2000 \\ 5000 \end{pmatrix}\)

  • \(d = \begin{pmatrix} 4 \\ 2 \end{pmatrix}\)

vec_a  = np.array([1,2,3])
vec_b  = np.array([0,1,0])
vec_c  = np.array([10,100,1000,2000,5000])
vec_d  = np.array([4,2])

Accessing an element of an array

  • We can access elements of an array using square brackets []

  • Remember that Python is zero-indexed

  • Access the first and the third element of \(a\)

print(vec_a)
print(vec_a[0])
print(vec_a[2])
[1 2 3]
1
3
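
Arrays support the same indexing tools as lists. A short sketch with negative indices and slices, which go slightly beyond the example above:

```python
import numpy as np

vec_a = np.array([1, 2, 3])

last_element = vec_a[-1]  # negative indices count from the end
first_two = vec_a[0:2]    # slice with positions 0 and 1 (position 2 is excluded)

print(last_element)  # 3
print(first_two)     # [1 2]
print(len(vec_a))    # 3
```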

Operations with a single array and a scalar

  • We can perform operations with a single array and a scalar
  • For instance, we can add or multiply a scalar to an array


  • Add 2 to each element of \(a\)
  • \(a + 2 = \begin{pmatrix} a_1 + 2 \\ a_2 + 2 \\ a_3 + 2 \end{pmatrix}\)
# Print the original array
print(vec_a)

# Adding 2 to each element of a
print(vec_a + 2)
[1 2 3]
[3 4 5]
  • A scalar is a single number, either an int or a float
  • We can do many common operations with an array and a scalar
print(vec_a * 2)
print(vec_a / 2)
print(vec_a + 2)
print(vec_a ** 2)
[2 4 6]
[0.5 1.  1.5]
[3 4 5]
[1 4 9]
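
This behaviour is one of the main differences between arrays and lists. The sketch below applies the same operation to both, using a hypothetical `list_a` for comparison:

```python
import numpy as np

list_a = [1, 2, 3]
vec_a = np.array(list_a)

doubled_list = list_a * 2   # repeats the list
doubled_array = vec_a * 2   # multiplies each element

print(doubled_list)   # [1, 2, 3, 1, 2, 3]
print(doubled_array)  # [2 4 6]
```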

Element-by-element addition between two arrays of the same size

\(a + b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} +\) \(\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =\) \(\begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you add two arrays of the same size,
# NumPy adds the individual elements in each position
print(vec_a + vec_b)
[1 2 3]
[0 1 0]
[1 3 3]

Element-by-element multiplication between two arrays of the same size

\(a * b = \begin{pmatrix} a_1 * b_1 \\ a_2 * b_2 \\ a_3 * b_3 \end{pmatrix}\)

print(vec_a)
print(vec_b)

# When you multiply two arrays of the same size,
# NumPy multiplies the individual elements in each position
print(vec_a * vec_b)

# We can do other similar element-by-element operations,
# such as subtraction and division.
print(vec_a - vec_b)
# Dividing by zero gives inf and a RuntimeWarning instead of an error
print(vec_a / vec_b)
[1 2 3]
[0 1 0]
[0 2 0]
[1 1 3]
[inf  2. inf]
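
The `inf` values appear because two elements of `vec_b` are zero. If that is not the result you want, one possible approach (an assumption, not the lecture's method) is to divide only where the denominator is non-zero and mark the other positions as missing with `np.nan`:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([0, 1, 0])

# Start from an array filled with NaN ("not a number")
safe_ratio = np.full(vec_a.shape, np.nan)

# Divide only where vec_b is non-zero; the other positions keep NaN
np.divide(vec_a, vec_b, out=safe_ratio, where=vec_b != 0)

print(safe_ratio)
```

Only the second position holds a number, 2.0, and no warning is raised.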

Summary statistics 📊

Summary statistics of an array

  • NumPy provides several functions to compute summary statistics of an array
  • For instance, we can compute the mean, median, standard deviation, variance, minimum, and maximum
  • We can also compute the sum, product, and cumulative sum


print(np.mean(vec_a))
print(np.std(vec_a))
print(np.min(vec_a))
print(np.median(vec_a))
print(np.max(vec_a))
2.0
0.816496580927726
1
2.0
3
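
The remaining statistics mentioned above work the same way. A brief sketch, which also shows the `ddof` argument that switches from the population to the sample standard deviation:

```python
import numpy as np

vec_a = np.array([1, 2, 3])

print(np.var(vec_a))     # variance, dividing by n
print(np.sum(vec_a))     # 6
print(np.prod(vec_a))    # 6
print(np.cumsum(vec_a))  # [1 3 6]

# By default np.std divides by n; ddof=1 divides by n - 1 instead
print(np.std(vec_a, ddof=1))
```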
  • Try it yourself! Compute the mean of

\(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)

Appendix 04

Common pitfall

Make sure that the arrays are of the same size!

print(vec_a)
print(vec_c)

# Print the shape of the arrays
print(vec_a.shape)
print(vec_c.shape)
[1 2 3]
[  10  100 1000 2000 5000]
(3,)
(5,)
# When you add two arrays of different sizes,
# NumPy raises a ValueError
print(vec_a + vec_c)
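
One way to avoid this pitfall is to compare the shapes before adding. The sketch below also catches the error with `try` and `except`, which have not been covered yet, so read that part as a preview:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_c = np.array([10, 100, 1000, 2000, 5000])

# Check the shapes first
if vec_a.shape == vec_c.shape:
    print(vec_a + vec_c)
else:
    print("Shapes differ:", vec_a.shape, "vs", vec_c.shape)

# Or attempt the addition and handle the error
error_raised = False
try:
    vec_a + vec_c
except ValueError as error:
    error_raised = True
    print("NumPy raised a ValueError:", error)
```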

Questions?

Random numbers with Python 🎲

Generating random numbers

  • Why randomness?
    • Simulate different scenarios: high risk or low risk
    • Study properties of a complex system and/or estimator
    • In medicine, randomly assign subjects to treatment or control
    • In finance, simulate stock prices
    • In sports, simulate outcomes of games, etc
  • The code below draws a vector of random values from a normal distribution
  • The distribution has mean “loc” (location) and standard deviation “scale”
  • The number of values drawn is “size”
# Generate 10 random variables from a normal distribution
# with mean 0 and standard deviation 1
randomvar_a = np.random.normal(loc=0, scale=1, size=10)
print(randomvar_a)
[-1.13072443 -0.95606555  0.4283258   0.5013118   0.35686527 -0.40947552
 -0.03762921 -0.43229851 -0.63764318 -1.35366981]
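
To see what `loc` and `scale` control, it can help to draw a large sample and compare its mean and standard deviation with the values requested. The numbers 5 and 2 below are arbitrary choices for illustration:

```python
import numpy as np

# Draw many values so the sample statistics are close to the targets
draws = np.random.normal(loc=5, scale=2, size=100_000)

sample_mean = np.mean(draws)
sample_std = np.std(draws)

print(sample_mean)  # close to 5
print(sample_std)   # close to 2
```

The results are not exactly 5 and 2 because the sample is random, but they get closer as `size` grows.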

Random numbers differ every time!

  • Avoid this problem by fixing the starting point of the random number generator
  • This starting point is called a seed, and it is set using np.random.seed()
  • This allows for reproducibility of results
np.random.seed(151)

random_var_b = np.random.normal(loc=0, scale=1, size=10)
print(random_var_b)
[-0.63673759  0.53155853  0.99020835 -0.6241344   1.46778078  0.40501276
  1.29817371 -2.61363271  1.35643373  1.87316055]
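
A small sketch of what reproducibility means in practice: setting the same seed again gives the same numbers, while continuing without resetting it gives new ones. The variable names are illustrative:

```python
import numpy as np

np.random.seed(151)
first_draw = np.random.normal(loc=0, scale=1, size=5)

# Resetting the seed restarts the sequence of random numbers
np.random.seed(151)
second_draw = np.random.normal(loc=0, scale=1, size=5)

# Without resetting the seed, the sequence simply continues
third_draw = np.random.normal(loc=0, scale=1, size=5)

print(np.array_equal(first_draw, second_draw))  # True
print(np.array_equal(second_draw, third_draw))  # False
```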

Compute a histogram with the results

  • We can use the plt.hist() function to compute a histogram
# Create a histogram of the random variable
randomvar_x = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(x = randomvar_x)
plt.xlabel("Variable a")
plt.ylabel("Frequency")
plt.title("Histogram of random variable a")
plt.show()
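
Behind the plot, a histogram is a table of counts per bin. As a rough sketch, `np.histogram()` returns those counts and the bin edges without drawing anything. Using 10 bins and a seed of 151 are assumptions made here for illustration:

```python
import numpy as np

np.random.seed(151)
randomvar_x = np.random.normal(loc=0, scale=1, size=1000)

# counts has one entry per bin; bin_edges has one more entry than counts
counts, bin_edges = np.histogram(randomvar_x, bins=10)

print(counts)
print(bin_edges)
print(counts.sum())  # every observation falls in exactly one bin
```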

Try it yourself!

  • Try this again with \(size = 100, 1000, 10000\) and see how the histogram changes (see Appendix 05)

And that’s it for today! 🎉

Summary

  • Today we learned to:
    • Use mathematical functions in NumPy
    • Create arrays from lists
    • Access elements of an array
    • Perform operations with a single array and a scalar
    • Perform element-by-element operations between two arrays
    • Compute summary statistics of an array
    • Generate random numbers
    • Create histograms of random variables
  • In our next lecture, we will:
    • Introduce boolean types
    • Test different categories of expressions with text and numbers
    • Study if/else statements

Questions?

Thank you and see you next time! 🙏🏼

Appendix 01

Create a list with repeated string values and compute your own histogram

favourite_books = ["The Odyssey", "Don Quijote", "The Iliad", "The Odyssey", "The Iliad", "The Iliad"]
plt.hist(x = favourite_books)
plt.show()

Appendix 02

Create two lists with numbers, then create your own scatter plot

list_x = [5, 10, 15, 20, 25]
list_y = [10, 20, 30, 40, 50]

plt.scatter(x = list_x, y = list_y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

Appendix 03

Mathematical functions in NumPy

  • Create a new variable, \(x = 5\)
  • Compute \(\pi x^2\)
  • Compute \(\frac{1}{\sqrt{2\pi}}e^{-x^2}\)
x = 5
print(np.pi * x ** 2)
print(1 / np.sqrt(2 * np.pi) * np.exp(-x ** 2))
78.53981633974483
5.540487995575833e-12

Appendix 04

Summary statistics of an array

  • Compute the mean of
  • \(e = \begin{pmatrix} 10 \\ 8 \\ 15 \\ 0 \\ 24 \end{pmatrix}\)
vec_e = np.array([10, 8, 15, 0, 24])
print(np.mean(vec_e))
11.4

Appendix 05

Histogram of a random variable

  • Create a histogram of the random variable
  • Try this again with \(size = 100, 1000, 10000\) and see how the histogram changes
# Create a histogram of the random variable
randomvar_x = np.random.normal(loc=0, scale=1, size=100)

plt.hist(x = randomvar_x)
plt.xlabel("Variable a")
plt.ylabel("Frequency")
plt.title("Histogram of random variable a")
plt.show()