QTM 350 - Data Science Computing

Lecture 13 - Python Data Types, Boolean Logic, and Control Structures

Danilo Freire

Emory University

16 October, 2024

Nice to see you all again!
How was the break? 🍂

Brief recap of last class 📚

What did we learn?

  • We have learned a lot of things so far! 🤓
  • Command line, Git, GitHub, Quarto, AI tools, and more! 💻
  • We are now ready to start learning a bit of Python! 🐍
  • We will have three sessions on Python
  • Many of you are already familiar with Python, so this will be a review for you
  • For those who are not, don’t worry! We will start from the basics! 😉
  • Let’s get started! 🚀

Installing Python - Anaconda

  • There are several ways to install Python
  • I recommend that you install Anaconda
  • It is free, well-maintained, and comes with many useful packages for data science
  • You can install it on Windows, Mac, and Linux
  • Follow the instructions on the website
  • Our course website has a detailed guide on installing Anaconda and integrating it with VSCode. You can find the tutorial here. Feel free to check it out! 📚
  • If you have any questions, let me know! 😉

Python basics

  • Python is a high-level, interpreted, and general-purpose programming language
  • It is widely used in data science, machine learning, web development, and more
  • We will use it in this course to analyse data, interact with SQL databases, and run parallel computations
  • Python is known for its simplicity and readability
print("Hello, world!")
Hello, world!
# Space indentation 
if 5 > 2:
    print("Five is greater than two!")
# Slicing notation
a = "Hello, World!"
print(a[2:5])
llo

Python data types

  • Python has several built-in data types, with the most common being:
    • int (integer): whole numbers (e.g., 1, 2, 3)
    • float (floating-point number): numbers with decimal points (e.g., 1.0, 2.5, 3.14)
    • str (string): text (e.g., “Hello, world!”)
    • bool (boolean): logical values (e.g., True, False)
    • list: ordered, mutable collection of items (e.g., [1, 2, 3])
    • tuple: ordered, immutable collection of items (e.g., (1, 2, 3))
    • dict (dictionary): collection of key-value pairs, which keeps insertion order since Python 3.7 (e.g., {“name”: “Alice”, “age”: 25})
    • set: unordered collection of unique items (e.g., {1, 2, 3})
  • You can check the type of a variable using the type() function
x = 5 
print(type(x)) 
<class 'int'>
  • You can convert between data types using the int(), float(), str(), bool(), list(), tuple(), dict(), and set() functions
x = 5
y = float(x)
print(y)
5.0
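  • A short sketch of a few more conversions (the variable names are only illustrative). Note that int() truncates a float towards zero, and that an empty string converts to False

```python
# Illustrative conversions between built-in types
n = int("42")          # string to integer
f = float("3.5")       # string to float
s = str(3.14)          # float to string
t = int(2.9)           # float to integer (truncates towards zero)
chars = list("abc")    # string to list of characters
flag = bool("")        # an empty string converts to False

print(n, f, s, t, chars, flag)
```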

Variables and values in Python

  • Values: pieces of data a computer programme works with
    • Examples: numbers (42), text (“Hello!”)
    • Different types: integers, strings, etc.
  • Variables: names that refer to values
    • In maths/stats: often x, y
    • In Python: more descriptive names possible: age, name, etc
  • Variable naming rules:
    • Must start with a letter or underscore
    • Can include letters, numbers, underscores
    • Cannot be Python reserved words (e.g. for, while, class)
    • Case-sensitive (myVar ≠ myvar)
  • Think of variables as boxes holding information
    • Can contain single numbers, vectors, strings, etc

  • Use assignment operator (=) to assign values to variables
    • Example: x = 42
  • Best practices for variable names:
    • Use descriptive names
    • Avoid overwriting built-in functions or keywords
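  • One possible way to check the naming rules above, using the standard keyword module and the str.isidentifier() method. The candidate names are made up for illustration

```python
import keyword

# Hypothetical candidate names to test against the rules
candidates = ["age", "_total", "2nd_place", "my-var", "class"]

valid_names = []
for name in candidates:
    # A usable name must be a valid identifier and not a reserved word
    is_valid = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {is_valid}")
    if is_valid:
        valid_names.append(name)

print(valid_names)
```

  • Here "2nd_place" fails because it starts with a digit, "my-var" because of the hyphen, and "class" because it is a reserved word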

Arithmetic operators

  • Python supports several arithmetic operators, including:
    • + (addition)
    • - (subtraction)
    • * (multiplication)
    • / (division)
    • // (floor division)
    • % (modulus)
    • ** (exponentiation)
  • Let’s have a go at applying these operators to numeric types and observe the results
1 + 2 + 3 + 4 + 5  # add
15
2 * 3.14159  # multiply
6.28318
2 ** 10  # exponent
1024
  • Division may produce a different data type than expected: the / operator always returns a float, even when both operands are int
int_2 = 2
type(int_2)
type(int_2 / int_2)
float
  • The // operator performs “integer division” (aka “floor division”) and retains the int data type when both operands are int. It always rounds down
101 // 2
50
  • The % “modulus” operator gives us the remainder after division
101 % 2
1
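  • A small sketch of how // differs from converting with int(). The two agree for positive numbers, but not for negative ones, because // rounds down while int() truncates towards zero

```python
# For positive numbers, // and int() give the same result
print(7 // 2)        # 3
print(int(7 / 2))    # 3

# For negative numbers they differ
print(-7 // 2)       # -4 (rounds down)
print(int(-7 / 2))   # -3 (truncates towards zero)

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(101, 2)
print(quotient, remainder)  # 50 1
```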

None

  • None is a special constant in Python that represents the absence of a value
  • It is often used to represent missing data or as a placeholder
x = None
print(x)
None
type(x)
NoneType
  • None can also be stored inside collections, such as lists
x = [None, 2, 3]

for i in x:
    print(i)
None
2
3
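  • When None stands for missing data, the usual way to test for it is with is None. A minimal sketch that counts the missing entries in a list (the list itself is illustrative)

```python
values = [None, 2, 3]

# Count missing entries; `is None` is the conventional test for None
n_missing = 0
for v in values:
    if v is None:
        n_missing += 1

print(n_missing)  # 1
```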

Strings

  • Text is stored as a data type called a string
  • We can think of a string as a sequence of characters
  • Strings can be enclosed in single quotes ('...') or double quotes ("...")
x = "Hello, world!"
print(x)
Hello, world!
y = 'Hello, world!'
print(y)
Hello, world!
  • There’s no difference between the two methods, but there are cases where having both is useful!
  • We also have triple double quotes, which are typically used for function documentation, e.g., """This function adds two numbers"""
  • If the string contains a quotation or apostrophe, we can use a combination of single and double quotes to define the string
z = "Alice's cat"
print(z)
Alice's cat
quote = 'Donald Knuth: "Premature optimisation is the root of all evil."'
quote
'Donald Knuth: "Premature optimisation is the root of all evil."'

Boolean logic

  • Boolean logic is a branch of algebra that deals with true and false values
  • In Python, we have two boolean values: True and False
the_truth = True
the_truth
True
type(the_truth)
bool
lies = False
lies
False
type(lies)
bool

Comparison operators

  • We can use comparison operators to compare values and return boolean values
  • The main comparison operators are:
    • == (equal)
    • != (not equal)
    • > (greater than)
    • < (less than)
    • >= (greater than or equal to)
    • <= (less than or equal to)
    • is (identity)
    • is not (negated identity)
    • in (membership)
  • Some examples
2 < 3
True
"AI" == "Solve all the world's problems"
False
2 != "2"
True
2 == 2.0
True
2 is 2
True
2 is not 3
True
"AI" in "AI will solve all the world's problems"
True
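  • The difference between == and is shows up most clearly with mutable objects such as lists. A short sketch, with illustrative variable names

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c refers to the same list object as a

print(a == b)  # True: same contents
print(a is b)  # False: two separate objects
print(a is c)  # True: the very same object
print(2 in a)  # True: membership test on a list
```

  • In short, == compares values, while is checks whether two names refer to the same object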

Boolean operators

  • We can combine boolean values using boolean operators
  • The main boolean operators are:
    • and (logical and)
    • or (logical or)
    • not (logical not)
  • The and operator returns True if both operands are True
  • The or operator returns True if at least one operand is True
  • The not operator returns True if the operand is False
  • Some examples
True and True
True
True and False
False
True or False
True
not True
False
not not True
True
("Python 2" != "Python 3") and (2 <= 3)
True

Lists and tuples

  • Lists and tuples are ordered collections of items
  • Lists are mutable, while tuples are immutable
  • We will start with lists
  • Lists are defined using square brackets ([])
my_list = [1, 2, "THREE", 4, 0.5]
my_list
[1, 2, 'THREE', 4, 0.5]
type(my_list)
list
  • Lists can hold any datatype - even other lists!
another_list = [1, "two", [3, 4, "five"], True, None, {"key": "value"}]
another_list
[1, 'two', [3, 4, 'five'], True, None, {'key': 'value'}]
  • You can get the length of a list using the len() function
len(another_list)
6
  • Tuples look similar to lists, but they are defined using parentheses (()) and are immutable
  • Tuples are often used to store related pieces of information, such as coordinates
today = (1, 2, "THREE", 4, 0.5)
today
(1, 2, 'THREE', 4, 0.5)
type(today)
tuple
len(today)
5
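  • A sketch of what immutability means in practice. The coordinates are hypothetical, and try/except (not covered in this lecture) is used only to catch the error so that the code keeps running

```python
point = (3.0, 4.0)  # hypothetical (x, y) coordinates

try:
    point[0] = 10.0  # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")

# Tuples can be unpacked into separate variables
x, y = point
print(x, y)
```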

Indexing and slicing

  • We can access individual items in a list or tuple using indexing
  • Indexing starts at 0
  • We can also access multiple items using slicing
  • Slicing uses the syntax start:stop:step
my_list = [1, 2, "THREE", 4, 0.5]
my_list[0]
1
my_list[2]
'THREE'
my_list[-1]
0.5
my_list[1:3]
[2, 'THREE']
  • Note from the above that the start of the slice is inclusive and the end is exclusive

  • So my_list[1:3] fetches the elements at indices 1 and 2, but not the one at index 3

  • Strings behave the same as lists and tuples when it comes to indexing and slicing

  • Remember, we think of them as a sequence of characters

alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet[0]
'a'
alphabet[1:3]
'bc'
alphabet[-1]
'z'
alphabet[:5]
'abcde'
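  • The examples above leave out the step part of start:stop:step. A brief sketch of how it behaves, including a negative step

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

every_second = alphabet[0:10:2]   # every second character among the first ten
last_three = alphabet[-3:]        # from the third-to-last character to the end
reversed_text = alphabet[::-1]    # the whole string, backwards

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
odd_positions = numbers[1::2]     # start at index 1, take every second item

print(every_second)   # acegi
print(last_three)     # xyz
print(odd_positions)  # [1, 3, 5, 7, 9]
```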

List methods

  • Lists have several methods that allow us to manipulate them
  • Some common methods include:
    • append(): add an item to the end of the list
    • insert(): add an item at a specific position
    • remove(): remove an item by value
    • pop(): remove an item by index and return it
    • reverse(): reverse the list
  • Let’s see some examples
my_list = [1, 2, "THREE", 4, 0.5]
my_list.append("six")
my_list
[1, 2, 'THREE', 4, 0.5, 'six']
my_list.insert(2, "two")
my_list
[1, 2, 'two', 'THREE', 4, 0.5, 'six']
my_list.remove("two")
my_list
[1, 2, 'THREE', 4, 0.5, 'six']
my_list.pop(2)
my_list
[1, 2, 4, 0.5, 'six']
my_list.reverse()
my_list
['six', 0.5, 4, 2, 1]

Sets

  • Sets are unordered collections of unique items
  • Non-empty sets are defined using curly braces, e.g., {1, 2, 3}. An empty set is created with set(), because {} is an empty dictionary
  • Sets do not allow duplicates
  • We can perform set operations, such as union, intersection, and difference
  • Sets are mutable, but their items must be immutable
  • We can convert lists to sets using the set() function
  • Some examples
my_set = {1, 2, 3, 4, 5}
my_set
{1, 2, 3, 4, 5}
my_set.add(6)
my_set
{1, 2, 3, 4, 5, 6}
my_set.add(6)
my_set
{1, 2, 3, 4, 5, 6}
my_list = [1, 2, 3, 4, 5, 5]
my_set = set(my_list)
my_set
{1, 2, 3, 4, 5}
my_set2 = {4, 5, 6, 7, 8}
my_set.union(my_set2)
{1, 2, 3, 4, 5, 6, 7, 8}
my_set.intersection(my_set2)
{4, 5}
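  • The difference operation mentioned above works in the same way. Sets also support operator shortcuts for these operations, sketched here with two illustrative sets

```python
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

print(set_a.difference(set_b))  # items in set_a but not in set_b

# Operator shortcuts
print(set_a | set_b)   # union
print(set_a & set_b)   # intersection
print(set_a - set_b)   # difference
print(set_a ^ set_b)   # symmetric difference: items in exactly one of the sets
```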

String methods

  • Strings also have several methods for manipulation
  • Some common methods include:
    • upper(): convert to uppercase
    • lower(): convert to lowercase
    • strip(): remove leading and trailing whitespace
    • replace(): replace a substring
    • split(): split the string into a list
  • Let’s see some examples
  • Note that strings are immutable, so these methods return new strings
  • The original string is not modified
x = " Hello, world! "
x.upper()
' HELLO, WORLD! '
x.lower()
' hello, world! '
x.strip()
'Hello, world!'
x.replace("world", "Python")
' Hello, Python! '
x.split(",")
[' Hello', ' world! ']
x.strip().split(" ") # chaining methods
['Hello,', 'world!']
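  • A quick sketch showing that the original string really is left untouched, plus join(), which is not listed above but is the usual counterpart of split()

```python
x = " Hello, world! "
cleaned = x.strip().lower()

print(repr(x))        # the original still has its spaces and capitals
print(repr(cleaned))  # 'hello, world!'

words = cleaned.split(" ")
rejoined = "-".join(words)   # join() glues a list of strings back together
print(rejoined)
```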

String formatting

  • Python has ways of creating strings by “filling in the blanks” and formatting them nicely
  • This is helpful when you want to print statements that include variables or expressions
  • There are a few ways of doing this, but I use and recommend f-strings, which were introduced in Python 3.6
  • All you need to do is put the letter f in front of your string, and then you can include variables with curly-bracket notation {}
name = "Alice"
age = 25
f"My name is {name} and I am {age} years old."
'My name is Alice and I am 25 years old.'
name = "Newborn Baby"
age = 4 / 12
day = 20
month = 3
year = 2020
template_new = f"Hello, my name is {name}. I am {age:.2f} years old. I was born on {day}/{month:02}/{year}."
template_new
'Hello, my name is Newborn Baby. I am 0.33 years old. I was born on 20/03/2020.'
  • The :.2f notation is a way of formatting the number to two decimal places
  • The :02 notation is a way of formatting the number to two digits, with leading zeros if necessary
  • More on string formatting here
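  • A few more format specifiers that tend to be handy in data work, as a small sketch. The values are hypothetical

```python
population = 1234567.891   # hypothetical values
share = 0.256
label = "GDP"

with_commas = f"{population:,.1f}"   # thousands separator, one decimal place
as_percent = f"{share:.1%}"          # percentage with one decimal place
aligned = f"[{label:>6}]"            # right-aligned in a field six characters wide

print(with_commas)
print(as_percent)
print(aligned)
```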

Dictionaries

  • Dictionaries are collections of key-value pairs (they keep insertion order since Python 3.7)
  • Dictionaries are defined using curly braces ({})
  • We can access values using keys
  • Dictionaries are mutable
  • We can convert lists of tuples to dictionaries using the dict() function
my_dict = {"name": "Alice",
           "age": 25, 
           "city": "Atlanta"}
my_dict
{'name': 'Alice', 'age': 25, 'city': 'Atlanta'}
my_dict["name"]
'Alice'
my_dict["age"]
25
my_dict["city"]
'Atlanta'
my_dict["city"] = "New York"
my_dict
{'name': 'Alice', 'age': 25, 'city': 'New York'}
my_dict["country"] = "USA"
my_dict
{'name': 'Alice', 'age': 25, 'city': 'New York', 'country': 'USA'}
my_dict.keys()
dict_keys(['name', 'age', 'city', 'country'])
my_dict.values()
dict_values(['Alice', 25, 'New York', 'USA'])
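  • A sketch of the dict() conversion mentioned above, together with items() for looping over key-value pairs and get() for looking up keys that may be missing. The data are illustrative

```python
pairs = [("name", "Alice"), ("age", 25), ("city", "Atlanta")]
person = dict(pairs)   # list of (key, value) tuples to dictionary

for key, value in person.items():
    print(f"{key}: {value}")

# person["country"] would raise a KeyError; get() returns a default instead
country = person.get("country", "unknown")
print(country)
print("age" in person)   # membership tests look at the keys
```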

Empties

  • Sometimes you’ll want to create empty objects that will be filled later on
  • For lists and tuples, you can create an empty object using [] and () respectively
  • You can also create them using the list() and tuple() functions
  • For sets, you must use set(), because {} creates an empty dictionary
  • For dictionaries, you can create an empty object using {} or dict()
  • For strings, you can create an empty string using "" or str()
empty_list = []
print(empty_list)
[]
empty_list = list()
print(empty_list)
[]
empty_dict = {}
print(empty_dict)
{}
empty_dict = dict()
print(empty_dict)
{}
  • You get the idea! 😉
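  • One detail worth checking for yourself is that {} creates an empty dictionary, so an empty set needs set(). A quick sketch

```python
empty_tuple = ()
empty_set = set()      # {} would create a dictionary, not a set
empty_braces = {}
empty_string = ""

print(type(empty_tuple))
print(type(empty_set))
print(type(empty_braces))
print(len(empty_string))
```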

Conditional statements 🚦

If, elif, else

  • Conditional statements allow us to execute different blocks of code based on whether a condition is True or False
  • The main conditional statements in Python are:
    • if: execute a block of code if a condition is True
    • elif: execute a block of code if the previous condition is False and the current condition is True
    • else: execute a block of code if all previous conditions are False
  • The syntax is if condition:
  • The code block is defined by indentation
  • You can have multiple elif statements
  • You can have zero or one else statement

If statements

  • Let’s see an example
name = "Danilo"

if name.lower() == "danilo":
    print("That's my name too!")
elif name.lower() == "juan":
    print("That's a nice name!")
else:
    print(f"Hello {name}! That's a cool name!")
print("Nice to meet you!")
That's my name too!
Nice to meet you!
  • Note that the elif and else blocks are optional
  • You can have as many elif blocks as you want
  • The if statement evaluates the condition name.lower() == "danilo"
  • If the condition is True, the code block under the if statement is executed
  • If the condition is False, the elif condition is checked, and its code block is executed only if that condition is True
  • If all conditions are False, the code block under the else statement is executed
  • The code block is defined by indentation (4 spaces)
  • The print("Nice to meet you!") statement is executed regardless of the condition
name = "David"

if name.lower() == "danilo":
    print("That's my name too!")
elif name.lower() == "juan":
    print("That's a nice name!")
else:
    print(f"Hello {name}! That's a cool name!")
print("Nice to meet you!")
Hello David! That's a cool name!
Nice to meet you!

Nested if statements

  • You can nest if statements inside other if statements
  • This allows you to create more complex conditions
  • The code block is defined by indentation, as usual
name = "Danilo"
age = 42

if name.lower() == "danilo":
    if age >= 18:
        print("Welcome, Danilo! You are an adult!")
    else:
        print("Welcome, Danilo! You are a minor!")
else:
    print(f"Hello {name}!")
Welcome, Danilo! You are an adult!

Inline if statements

  • You can use inline if statements to execute a single line of code based on a condition
  • The syntax is value = true_value if condition else false_value
words = ["the", "list", "of", "words"]

x = "long list" if len(words) > 10 else "short list"
x
'short list'
  • The code is equivalent to
words = ["the", "list", "of", "words"]

if len(words) > 10:
    x = "long list"
else:
    x = "short list"
x
'short list'

Truth value testing

  • In Python, any object can be tested for truth value
  • The following objects are considered False:
    • None
    • False
    • Zero of any numeric type (e.g., 0, 0.0, 0j)
    • Empty sequences (e.g., [], (), "")
    • Empty sets and dictionaries (e.g., set(), {})
  • All other objects are considered True
  • This is useful when you want to check if a variable is empty, such as when you perform some data cleaning and want to remove missing values
  • Let’s see some examples using if statements
x = 0

if x:
    print("True")
else:
    print("False")
False
x = 1

if x:
    print("True")
else:
    print("False")
True
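  • A sketch of the data-cleaning use case mentioned above, with made-up survey answers. Be aware that this approach would also drop a legitimate 0, since 0 is falsy too

```python
raw_answers = ["yes", "", None, "no", "  ", "maybe"]  # hypothetical data

cleaned_answers = []
for answer in raw_answers:
    # None and "" are falsy; strip() turns a whitespace-only string into ""
    if answer and answer.strip():
        cleaned_answers.append(answer)

print(cleaned_answers)  # ['yes', 'no', 'maybe']
```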

For loops 🔄

  • For loops allow us to iterate over a sequence of items
for n in [2, 7, -1, 5]:
    print(f"The number is {n} and its square is {n**2}")
print("I'm outside the loop!")
The number is 2 and its square is 4
The number is 7 and its square is 49
The number is -1 and its square is 1
The number is 5 and its square is 25
I'm outside the loop!
  • The syntax is
for item in sequence:
    # code block
  • The code block is defined by indentation (4 spaces)
  • The item variable takes on each value in the sequence in turn
  • The sequence can be any iterable: a list, tuple, set, dictionary, range, or string
  • Colon : ends the first line of the loop
  • The item variable can be any name you want

For loops and range()

  • A very common pattern is to use for with the range() function
  • range() gives you a sequence of integers up to some value (non-inclusive of the end-value) and is typically used for looping
range(5)
range(0, 5)
for i in range(5):
    print(i)
0
1
2
3
4
for i in range(1, 6):
    print(i)
1
2
3
4
5
  • The range() function can take up to three arguments: start, stop, and step
  • The default value for start is 0
  • The default value for step is 1
  • The stop value is non-inclusive
for i in range(1, 10, 2):
    print(i)
1
3
5
7
9
for i in range(10, 1, -2):
    print(i)
10
8
6
4
2
  • The range() function is very useful for creating loops that iterate a specific number of times

Nested for loops

  • You can nest for loops inside other for loops
  • This allows you to create more complex loops or iterate over multi-dimensional data structures
  • The code block is defined by indentation, as usual
for x in [1, 2, 3]:
    for y in ["a", "b", "c"]:
        print((x, y))
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')
  • The print((x, y)) statement is executed for each combination of x and y
  • There are other clever ways of doing this kind of thing in Python
  • When looping over objects, you may want to use zip() or enumerate()
  • More information on these functions here and here
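  • A minimal sketch of both functions, using made-up names and ages

```python
names = ["Alice", "Bob", "Carol"]   # hypothetical data
ages = [25, 31, 47]

# enumerate() yields (index, item) pairs
for i, name in enumerate(names):
    print(i, name)

# zip() pairs up items from several iterables
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

indexed = list(enumerate(names))
pairs = list(zip(names, ages))
```

  • zip() stops at the end of the shortest input, so the two lists should have the same length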

while loops

  • while loops allow us to execute a block of code as long as a condition is True
  • The syntax is while condition:
n = 10

while n > 0:
    print(n)
    n -= 1
print("Blast off!")
10
9
8
7
6
5
4
3
2
1
Blast off!
  • The n -= 1 statement decrements the value of n by 1
  • The loop continues until n is less than or equal to 0
  • Be careful with while loops, as they can run indefinitely if the condition is never False!
  • You can use break to exit a loop prematurely
n = 10

while n > 0:
    print(n)
    if n == 5:
        break
    n -= 1
print("Blast off!")
10
9
8
7
6
5
Blast off!

Comprehensions

  • Comprehensions are a concise way to create lists, sets, and dictionaries based on existing sequences
  • List comprehensions are the most common, though
  • They provide a more readable and often more efficient alternative to using loops and conditional statements
squares = [x**2 for x in range(10)]
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  • This is equivalent to
squares = []
for x in range(10):
    squares.append(x**2)
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  • The syntax is [expression for item in sequence]
  • You can also add an if statement to filter the items
  • The syntax is [expression for item in sequence if condition]
even_squares = [x**2 for x in range(10) if x % 2 == 0]
even_squares
[0, 4, 16, 36, 64]
  • Which is equivalent to
even_squares = []
for x in range(10):
    if x % 2 == 0:
        even_squares.append(x**2)
even_squares
[0, 4, 16, 36, 64]
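  • Set and dictionary comprehensions follow the same pattern, with curly braces instead of square brackets. A short sketch with an illustrative word list

```python
words = ["the", "list", "of", "words", "the"]

# Set comprehension: unique word lengths
lengths = {len(w) for w in words}

# Dictionary comprehension: map each word to its length
word_lengths = {w: len(w) for w in words}

print(lengths)
print(word_lengths)
```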

Functions 🛠️

  • Functions are the building blocks of Python programmes
  • They allow you to encapsulate code and reuse it
  • Functions are defined using the def keyword
  • The syntax is def function_name(parameters):, then the code block (indented)
def greet(name):
    return f"Hello, {name}!"

greet("Alice")
'Hello, Alice!'
  • The return statement exits the function and returns a value
  • The name variable is a parameter
  • Parameters are the inputs to the function (you can have multiple parameters)
  • The greet("Alice") statement calls the function with the argument "Alice"
def square(n):
    n_squared = n**2
    return n_squared

square(5)
25
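  • A sketch of a function with more than one parameter. The function name power is hypothetical, and it uses a default value for the second parameter, which goes slightly beyond the examples above

```python
def power(base, exponent=2):
    """Return base raised to exponent (exponent defaults to 2)."""
    return base ** exponent

print(power(5))                   # 25, uses the default exponent
print(power(2, 10))               # 1024, arguments matched by position
print(power(base=3, exponent=3))  # 27, arguments matched by name
```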

Namespaces

  • Python has several types of namespaces
  • The ones that we use more often are:
    • Local namespace: variables defined inside a function
    • Global namespace: variables defined outside a function
    • Built-in namespace: variables that are built into Python (e.g., print(), len())
  • Variables defined inside a function are local to that function and cannot be accessed outside the function
def cat_string(str1, str2):
    string = str1 + str2
    return string

cat_string('My name is ', 'Danilo')
'My name is Danilo'
# print(string) # it will raise an error
  • Global variables can be accessed inside a function
  • However, you should avoid using them if possible, as it can make your code harder to read and debug
## define global variable
my_var = 2
## define function
def add_two(x):
    ## references my_var
    return x + my_var

add_two(2)

## print my_var
print(my_var)
2
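  • A sketch of how local and global names interact. Assigning to a name inside a function creates a new local variable and leaves the global one unchanged. The function and variable names are illustrative, and try/except is used only to show the error without stopping the code

```python
my_var = 2  # global variable

def show_local():
    my_var = 100            # a new local variable; the global one is untouched
    local_only = "inside"   # exists only while the function runs
    return my_var

print(show_local())  # 100
print(my_var)        # 2

try:
    print(local_only)  # not defined outside the function
except NameError as err:
    print(f"NameError: {err}")
```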

Lambda functions

  • Our last topic is lambda functions
  • Lambda functions are small, anonymous functions, which are defined using the lambda keyword
  • They are usually short and simple (only one line of code), and are sometimes used as arguments to higher-order functions
add_two = lambda x: x + 2

add_two(2)
4
  • The syntax is lambda parameters: expression
  • You can have multiple parameters
  • The body must be a single expression, and its value is returned automatically (statements are not allowed)
add = lambda x, y: x + y
print(add(3, 5)) 
8
maximum = lambda x, y: x if x > y else y
print(maximum(10, 20))
20
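  • A sketch of lambda functions passed as arguments to the built-in higher-order functions sorted(), map(), and filter(). The data are made up

```python
people = [("Alice", 25), ("Bob", 19), ("Carol", 47)]  # hypothetical (name, age) pairs

# Sort by the second element of each tuple (age)
by_age = sorted(people, key=lambda person: person[1])

# Apply a function to every item
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

# Keep only the items for which the function returns True
over_20 = list(filter(lambda person: person[1] > 20, people))

print(by_age)
print(doubled)
print(over_20)
```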

Summary 📚

  • We covered the very basics of Python programming
  • We learned about data types, operators, and conditional statements
  • We also learned about loops, functions, and comprehensions
  • This is just the tip of the iceberg! 🐍
  • Next steps: practice, practice, practice! 🤓
  • Exercises available here, here, and here
  • Next class, we will review numpy and pandas 📊
  • If you have any questions, feel free to ask! 🙋‍♂️

And that’s it for today! 🎉

Thank you for your attention!
Have a great rest of your day! 🌞