QTM 151 - Introduction to Statistical Computing II

Lecture 09 - Global and Local Variables

Danilo Freire

Emory University

30 September, 2024

Hello again! 🥳

Recap of last class 📚

In our last class, we learned

  • How to write functions with def and return
  • What parameters, arguments, and return values are
  • How to combine functions with if statements
  • How to use lambda to create quick, throwaway functions

Today’s plan 📅

  • Today, we will learn about variable scope in Python
  • Scope is important because it determines the visibility of variables, that is, where you can access them in your code
  • We will learn about local, enclosing, global, and built-in scopes
  • We will also learn about the global keyword
  • We will see how to use the apply and map functions to apply functions to many variables at once
  • Finally, we will learn about .py files and how to import them as modules

Understanding scope in Python 🧐

What is variable scope?

  • Scope is the area of a programme where a variable is accessible
  • Think of scope as a variable’s “visibility” in different parts of your code
  • Python uses the LEGB rule to determine variable scope:
    • Local: Inside the current function
    • Enclosing: Inside enclosing/nested functions
    • Global: At the top level of the module
    • Built-in: In the built-in namespace
  • The LEGB rule defines the order Python searches for variables
  • It is easier to understand them with an example:
x = 10  # Global scope

def print_x():
    x = 20  # Local scope
    print(x)  # Prints 20 (local)

print_x()

print(x)  # Prints 10 (global)
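
The four LEGB levels can be seen side by side in one short sketch. The names outer, inner, local_value, and enclosing_value are made up for this illustration and are not part of the lecture code:

```python
x = "global x"  # Global scope

def outer():
    x = "enclosing x"  # Enclosing scope for inner()

    def inner():
        x = "local x"  # Local scope of inner()
        return x

    # Return what inner() sees and what outer() sees
    return inner(), x

local_value, enclosing_value = outer()

print(local_value)      # local x
print(enclosing_value)  # enclosing x
print(x)                # global x
print(len("abc"))       # 3 (len is found in the built-in scope)
```

Each function sees its own x first, and the global x is never changed by the assignments inside the functions.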

Global scope

Variables defined outside a function

  • Most variables we have seen so far are in the global scope
    • Example: x = 10 is a global variable
  • They are stored in the global namespace and are accessible from anywhere in the code
  • Global variables are created when you assign them values, and are destroyed when you close Python
message_hello = "hello"
number3       = 3

print(message_hello + " world")
print(number3 * 2)
hello world
6
  • Global variables can be used in your code, but you should be careful with them when writing functions
  • The reason is that functions can change the value of global variables, which can lead to unexpected results
  • It is recommended to include all variables that a function needs as parameters

Global scope

  • Let’s create a function that sums 3 numbers
  • \(f(x,y,z) = x + y + z\)
  • We will pass the numbers as arguments to the function
# Correct example:
def fn_add_recommended(x,y,z):
    return(x + y + z)

print(fn_add_recommended(x = 1, y = 2, z = 5))
print(fn_add_recommended(x = 1, y = 2, z = 10))
8
13
  • If you do not include the variables as parameters, Python will try to use global variables if they exist
# Incorrect example:
def fn_add_notrecommended(x,y):
    return(x + y + z)

z = 5
print(fn_add_notrecommended(x = 1, y = 2))
z = 10
print(fn_add_notrecommended(x = 1, y = 2)) 
8
13
del z # Remove variable z from global scope
print(fn_add_notrecommended(x = 1, y = 2)) 
NameError: name 'z' is not defined

Local scope

Variables defined inside a function

  • Variables defined inside a function are local to that function
  • They are not accessible outside the function
  • Local variables are destroyed when the function returns
  • If you try to access a local variable outside the function, you will get a NameError
  • They include parameters and variables created inside the function
  • Example:
  • In the code below, x is a local variable of the function print_x() (we assume no global x has been defined)
def print_x():
    x = 20  # Local scope
    print(x)  # Prints 20 (local)

print_x() # Prints 20

print(x)  # NameError: name 'x' is not defined
>>> print_x()
20
>>> print(x)  # NameError: name 'x' is not defined
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
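
Parameters behave the same way as variables created inside the function. Here is a minimal sketch, assuming a made-up function compute_total (not part of the lecture code), that catches the NameError instead of stopping the script:

```python
def compute_total(price, quantity):
    # price, quantity, and total are all local to compute_total()
    total = price * quantity
    return total

result = compute_total(price = 2, quantity = 3)
print(result)  # 6

caught_name_error = False
try:
    print(total)  # total only existed while the function was running
except NameError as error:
    caught_name_error = True
    print("NameError:", error)
```

Only the returned value survives the call, which is why we store it in result.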

Local variables supersede global variables

Remember the LEGB rule

# This is an example where we define a quadratic function
# (x,y) are both local variables of the function
#
# When we call the function, only the arguments matter;
# any intermediate value inside the function (such as y)
# does not affect variables outside it

def fn_square(x):
    y = x**2
    return(y)

x = 5
y = -5

print(fn_square(x = 1))

print(x)
print(y)
1
5
-5

Local variables are not stored in the working environment

# The following code assigns the global variables x and y.
# Calling fn_square() does not change them, because the
# x and y inside the function are local

x = 5
y = 4

print("Example 1:")
print(fn_square(x = 10))
print(x)
print(y)

print("Example 2:")
print(fn_square(x = 20))
print(x)
print(y)
Example 1:
100
5
4
Example 2:
400
5
4

Permanent changes to global variables

  • If you want to change a global variable inside a function, you need to use the global keyword
  • The global keyword tells Python that you want to use the global variable, not create a new local variable
def modify_x():
    global x
    x = x + 5

x = 1

modify_x()
print(x)
6
  • I don’t think I have ever used global in my code
  • It makes the code harder to read and understand
  • You should avoid it too 😉
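
A possible alternative to global is to pass the value in as an argument and return the result. The sketch below uses two made-up functions, broken_modify_x and add_five, which are not part of the lecture code. The first one shows what happens when you assign to x inside a function without declaring it global:

```python
def broken_modify_x():
    # Assigning to x makes it local, so reading it first fails
    x = x + 5

x = 1

failed_without_global = False
try:
    broken_modify_x()
except UnboundLocalError as error:
    failed_without_global = True
    print("UnboundLocalError:", error)

def add_five(value):
    # Takes the value as a parameter and returns the result
    return value + 5

x = add_five(x)  # The change to x is visible at the call site
print(x)  # 6
```

With this pattern, anyone reading the code can see where x changes without looking inside the function.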

Try it out! 🚀

def modify_x():
    global x
    x = x + 5

x = 1

modify_x()
print(x)
6
  • What happens if we run the function modify_x() again?
  • What happens if we add global y inside fn_square?
  • Appendix 01

Built-in scope

Variables defined in Python

  • We have also seen many built-in functions in Python, like print(), len(), sum(), etc
  • They are available in any part of your code, and you don’t need to define them
  • Python keeps a set of names that are always available; avoid reusing them for your own variables, because doing so hides the built-in
  • Many of them are exception (error) names
print(len("hello"))

m = min([4, 3, 1, 7])
print(m)
5
1
import builtins

# View a list of attributes of a given object with dir()
print(dir(builtins))
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BaseExceptionGroup', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EncodingWarning', 'EnvironmentError', 'Exception', 'ExceptionGroup', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__IPYTHON__', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'display', 'divmod', 'enumerate', 'eval', 'exec', 'execfile', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 
'property', 'range', 'repr', 'reversed', 'round', 'runfile', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
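
To see why reusing a built-in name is risky, here is a small sketch. The function shadowed_len is made up for this illustration. Under the LEGB rule, the local name len is found before the built-in one:

```python
def shadowed_len(text):
    len = 0  # A local variable named len hides the built-in len() here
    try:
        return len(text)  # Fails: the integer 0 is not callable
    except TypeError:
        return "TypeError: the local len is an int, not a function"

print(shadowed_len("hello"))
print(len("hello"))  # 5 (the built-in is untouched outside the function)
```

The built-in still works outside the function because the shadowing name only lives in the function's local scope.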

Enclosing scope

Variables defined in enclosing functions

  • They are variables defined in enclosing functions
  • Enclosing functions are functions that contain other functions (nested functions)
  • Enclosing scope is between local and global scopes in the LEGB rule
  • They are easier to understand once you understand local and global scopes
  • We will not use them much in this course

Enclosing scope

Variables defined in enclosing functions

# Define a function that 
# contains another function
def outer():
    x = "outer x" # Local to outer()
    
    # Define a nested function
    def inner():
        x = "inner x" # Local to inner()
        print(x) # Print local to inner()

    inner() # Run inner()
    print(x) # Print local to outer()

outer() # Run outer()
inner x
outer x
# Define a function that 
# contains another function
def outer():
    x = "outer x" # Local to outer()
    
    # Define a nested function
    def inner():
        # x = "inner x" 
        print(x) # No local x, so use enclosing x

    inner() # Run inner()
    print(x) # Print local to outer()

outer() # Run outer()
outer x
outer x
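
One place where the enclosing scope becomes useful is a function that builds other functions. This is only a sketch, and make_multiplier, multiply, double, and triple are made-up names:

```python
def make_multiplier(factor):
    # factor is local to make_multiplier()

    def multiply(number):
        # No local factor here, so Python finds it in the enclosing scope
        return number * factor

    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5))  # 10
print(triple(5))  # 15
```

Each returned function keeps access to the factor from the call that created it.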

Operations over many variables 🧮

Pandas

  • pandas is the main library for data manipulation in Python 🐼
  • We will use it a lot in this course (and in your life as a data scientist!)
  • It is built on top of numpy, works with matplotlib for plotting, and has a gazillion functions to work with data 😁
  • If you use R already, think about it as the dplyr of Python
  • We will learn more about it in the next classes

Applying functions to a dataset

  • The apply function is used to apply a function to a dataset
    • (This course is full of surprises, isn’t it? 😄)
  • It is a method of a pandas DataFrame
  • It can be used with built-in functions, custom functions, or lambda functions
    • df.apply(function)
  • You can apply functions to rows or columns
    • df.apply(function, axis=0) applies the function to each column (default)
    • df.apply(function, axis=1) applies the function to each row

Applying functions to a dataset

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print(df.apply(np.sqrt))
          A         B         C
0  1.000000  2.000000  2.645751
1  1.414214  2.236068  2.828427
2  1.732051  2.449490  3.000000
print(df.apply(np.sum, axis=1))
0    12
1    15
2    18
dtype: int64
print(df.apply(lambda x: x**2))
   A   B   C
0  1  16  49
1  4  25  64
2  9  36  81

Applying functions to a dataset

  • Let’s do a quick exercise
# Create an empty DataFrame
data = pd.DataFrame()

# Add variables
data["age"] = [18,29,15,32,6]
data["num_underage_siblings"] = [0,0,1,1,0]
data["num_adult_siblings"] = [1,0,0,1,0]

display(data)
   age  num_underage_siblings  num_adult_siblings
0   18                      0                   1
1   29                      0                   0
2   15                      1                   0
3   32                      1                   1
4    6                      0                   0
  • Now let’s define some functions
# The first two functions return True/False depending on age constraints
# The third function returns the sum of two numbers
# The fourth function returns a string with the age bracket

fn_iseligible_vote = lambda age: age >= 18

fn_istwenties = lambda age: (age >= 20) & (age < 30)

fn_sum = lambda x,y: x + y

def fn_agebracket(age):
    if (age >= 18):
        status = "Adult"
    elif (age >= 10) & (age < 18):
        status = "Adolescent"
    else:
        status = "Child"
    return(status)

Applying functions to a dataset

  • Now let’s apply the functions to the data["age"] column
data["can_vote"]    = data["age"].apply(fn_iseligible_vote)
data["in_twenties"] = data["age"].apply(fn_istwenties)
data["age_bracket"] = data["age"].apply(fn_agebracket)

display(data)
   age  num_underage_siblings  num_adult_siblings  can_vote  in_twenties  age_bracket
0   18                      0                   1      True        False        Adult
1   29                      0                   0      True         True        Adult
2   15                      1                   0     False        False   Adolescent
3   32                      1                   1      True        False        Adult
4    6                      0                   0     False        False        Child
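
The function fn_sum takes two arguments, so it needs values from two columns at once. One way to do that, sketched below, is to apply a lambda to each row with axis=1. The column name total_siblings is made up for this illustration, and the data are rebuilt so the block runs on its own:

```python
import pandas as pd

# Rebuild the small dataset from the exercise
data = pd.DataFrame({
    "age": [18, 29, 15, 32, 6],
    "num_underage_siblings": [0, 0, 1, 1, 0],
    "num_adult_siblings": [1, 0, 0, 1, 0],
})

fn_sum = lambda x, y: x + y

# axis=1 passes one row at a time; we pick the two columns by name
data["total_siblings"] = data.apply(
    lambda row: fn_sum(row["num_underage_siblings"], row["num_adult_siblings"]),
    axis = 1
)

print(data["total_siblings"].tolist())  # [1, 0, 1, 2, 0]
```

For a simple sum like this one, data["num_underage_siblings"] + data["num_adult_siblings"] gives the same result without apply.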

Creating a new variable

  • You can also create a new variable using the apply function
# Creating a new variable
data["new_var"] = data["age"].apply(lambda age: age >= 18)

display(data)
   age  num_underage_siblings  num_adult_siblings  can_vote  in_twenties  age_bracket  new_var
0   18                      0                   1      True        False        Adult     True
1   29                      0                   0      True         True        Adult     True
2   15                      1                   0     False        False   Adolescent    False
3   32                      1                   1      True        False        Adult     True
4    6                      0                   0     False        False        Child    False

Deleting a variable

  • You can also delete a variable using the drop method
data = data.drop(columns = ["new_var"])

display(data)
   age  num_underage_siblings  num_adult_siblings  can_vote  in_twenties  age_bracket
0   18                      0                   1      True        False        Adult
1   29                      0                   0      True         True        Adult
2   15                      1                   0     False        False   Adolescent
3   32                      1                   1      True        False        Adult
4    6                      0                   0     False        False        Child

Mapping functions to a list, array, or series

  • The map function is used to apply a function to a list, an array, or a series
    • A series is a single column of a pandas DataFrame
  • In pandas, map works very similarly to the apply function, and they are largely interchangeable when you pass a function to a series
  • map can be faster than apply for simple functions, but apply is more flexible as it can be used with DataFrames (many columns)
  • However, if you are using regular lists (e.g., list01 = [1,2,3]), you should use map instead of apply
    • apply is not a built-in Python function
data["age_bracket01"] = data["age"].map(fn_agebracket)

display(data[["age","age_bracket01"]])
   age  age_bracket01
0   18          Adult
1   29          Adult
2   15     Adolescent
3   32          Adult
4    6          Child
data["age_bracket02"] = data["age"].apply(fn_agebracket)

display(data[["age","age_bracket02"]])
   age  age_bracket02
0   18          Adult
1   29          Adult
2   15     Adolescent
3   32          Adult
4    6          Child

Mapping functions to a list, array, or series

  • Using map with a list and an array
# Create a list
list01 = [1,2,3,4,5]

# Map a function to the list
list02 = list(map(lambda x: x**2, list01))

print(list02)
[1, 4, 9, 16, 25]
# Create a numpy array
array01 = np.array([1,2,3,4,5])

# Map a function to the array
array02 = np.array(list(map(lambda x: x**2, array01)))

print(array02)
[ 1  4  9 16 25]
  • Trying to use apply with a list or an array will raise an error
# Create a list
list01 = [1,2,3,4,5]

# Apply a function to the list
list02 = list(apply(lambda x: x**2, list01))

print(list02)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[168], line 5
      2 list01 = [1,2,3,4,5]
      4 # Apply a function to the list
----> 5 list02 = list(apply(lambda x: x**2, list01))
      7 print(list02)

NameError: name 'apply' is not defined
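
Besides map, there are two other common ways to get the same result. Neither is covered in the slides above: a list comprehension for lists, and numpy's element-wise arithmetic for arrays. A minimal sketch:

```python
import numpy as np

list01 = [1, 2, 3, 4, 5]

# List comprehension: build a new list by squaring each element
list02 = [x**2 for x in list01]
print(list02)  # [1, 4, 9, 16, 25]

# numpy arrays support element-wise operations directly
array01 = np.array(list01)
array02 = array01**2
print(array02)  # [ 1  4  9 16 25]
```

Both versions avoid the extra list() call that map needs.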

Try it yourself! 🚀

  • Write a lambda function checking whether num_siblings \(\ge 1\)
  • Add a variable to the dataset called has_siblings
  • Assign True/False to this variable using apply()
  • Appendix 02

Try it yourself! 🚀

  • Read the car dataset data_raw/features.csv
  • Create a function that tests whether mpg \(\ge\) 29
  • Add a variable mpg_above_29 which is True/False if mpg \(\ge\) 29
  • Store the new dataset to data_clean/features.csv
  • Appendix 03

Try it yourself! 🚀

Last exercise of the day! 🏁

  • Create a lambda function with arguments {fruit,color}
  • The function returns the string "A {fruit} is {color}"
  • Create the following two lists:
    • list_fruits = ["banana","strawberry","kiwi"]
    • list_colors = ["yellow","red","green"]
  • Use the list(map()) function to output a list with the form
  • ["A banana is yellow","A strawberry is red","A kiwi is green"]
  • Appendix 04

Importing modules 📦

Importing modules

What is a module?

  • While .ipynb files are great for learning and teaching, they are not the best for sharing code
  • When you write a lot of functions, you should save them in a .py file, which is a Python script
  • A Python script, or module, is just a file containing Python code
  • This code can be functions, classes, or variables
  • A folder containing Python scripts is called a package
  • You can import modules to use their code in your own code
  • We can import functions into the working environment from a file
# Import the folder `scripts` as a package
# And the file `example_functions.py` as `ef`
import scripts.example_functions as ef

print(ef.fn_quadratic(2))
print(ef.fn_cubic(3))

ef.message_hello("Juan")
4
27
'hi Juan'
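
The contents of example_functions.py are not shown in the slides. The sketch below assumes definitions that would produce the output above, writes them to a temporary file named demo_example_functions.py (a made-up name), and imports that file as a module:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumed contents of the module; the real file may differ
module_code = '''
def fn_quadratic(x):
    return x**2

def fn_cubic(x):
    return x**3

def message_hello(name):
    return "hi " + name
'''

# Save the code as a .py file in a temporary folder
folder = Path(tempfile.mkdtemp())
(folder / "demo_example_functions.py").write_text(module_code)

# Tell Python where to look for the file, then import it
sys.path.insert(0, str(folder))
importlib.invalidate_caches()
ef = importlib.import_module("demo_example_functions")

print(ef.fn_quadratic(2))        # 4
print(ef.fn_cubic(3))            # 27
print(ef.message_hello("Juan"))  # hi Juan
```

In day-to-day work you would save the .py file next to your notebook and use a plain import statement, as in the slide.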

Importing modules

Importing variables

  • You can also import variables from a module
  • However, it is not recommended to import variables
  • It is better to import functions and use them to create variables
  • This is because imported variables can overwrite variables with the same name in your code (or be changed in the module), leading to unexpected results
  • Example:
import scripts.example_variables as ev

# When we run this code
# the value of alpha will be overwritten

alpha = 1
print(alpha)
print(ev.alpha)

from scripts.example_variables import *

print(alpha)
print(beta)
print(gamma)
print(delta)
1
5
5
10
20
100
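
The overwriting can be reproduced with a small stand-in module. The file demo_example_variables.py and its contents are assumptions made for this sketch. It imports one name explicitly; a star import does the same thing for every public name in the module:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a stand-in module with two variables (assumed values)
folder = Path(tempfile.mkdtemp())
(folder / "demo_example_variables.py").write_text("alpha = 5\nbeta = 10\n")
sys.path.insert(0, str(folder))
importlib.invalidate_caches()

alpha = 1

# Importing the module under a prefix keeps the two names apart
import demo_example_variables as ev
print(alpha)     # 1
print(ev.alpha)  # 5

# Importing the name directly replaces our own alpha
from demo_example_variables import alpha
print(alpha)     # 5
```

Keeping the prefix (ev.alpha) makes it clear which value you are using.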

And that’s it for today! 🎉

Thanks very much! 😊

Appendix 01

def modify_x():
    global x
    x = x + 5

x = 1

# Now, running the function 
# will permanently increase x by 5.

modify_x()
print(x)
modify_x()
print(x)
6
11
def fn_square(x):
    global y
    y = x**2
    return(y)

x = 5
y = -5

print("Example 1:")
print(fn_square(x = 10))
print(x)
print(y)
Example 1:
100
5
100

Back to exercise 01

Appendix 02

  • Write a lambda function checking whether num_siblings \(\ge 1\)
  • Add a variable to the dataset called has_siblings
  • Assign True/False to this variable using apply()
fn_has_siblings = lambda num_siblings: num_siblings >= 1

data["has_siblings"] = data["num_adult_siblings"].apply(fn_has_siblings)

display(data[["num_adult_siblings","has_siblings"]])
   num_adult_siblings  has_siblings
0                   1          True
1                   0         False
2                   0         False
3                   1          True
4                   0         False

Back to exercise 02

Appendix 03

  • Read the car dataset data_raw/features.csv
  • Create a function that tests whether mpg \(\ge\) 29
  • Add a variable mpg_above_29 which is True/False if mpg \(\ge\) 29
  • Store the new dataset to data_clean/features.csv
data_raw = pd.read_csv("data_raw/features.csv")

data_raw["mpg_above_29"] = data_raw["mpg"].apply(lambda mpg: mpg >= 29)

display(data_raw[["mpg","mpg_above_29"]])

data_raw.to_csv("data_clean/features.csv", index = False)
      mpg  mpg_above_29
0    18.0         False
1    15.0         False
2    18.0         False
3    16.0         False
4    17.0         False
...   ...           ...
393  27.0         False
394  44.0          True
395  32.0          True
396  28.0         False
397  31.0          True

398 rows × 2 columns

Back to exercise 03

Appendix 04

  • Create a lambda function with arguments {fruit,color}
  • The function returns the string "A {fruit} is {color}"
  • Create the following two lists:
    • list_fruits = ["banana","strawberry","kiwi"]
    • list_colors = ["yellow","red","green"]
  • Use the list(map()) function to output a list with the form
  • ["A banana is yellow","A strawberry is red","A kiwi is green"]
# Return the string (instead of printing it) so that map can collect it
fn_fruitcolor = lambda fruit, color: "A " + fruit + " is " + color

list_fruits  = ["banana","strawberry","kiwi"]
list_colors  = ["yellow","red","green"]

print(list(map(fn_fruitcolor, list_fruits, list_colors)))
['A banana is yellow', 'A strawberry is red', 'A kiwi is green']

Back to exercise 04