QTM 151 - Introduction to Statistical Computing II

Lecture 09 - Global and Local Variables

Danilo Freire

danilo.freire@emory.edu

Emory University

Hello again! 🥳

Recap of last class 📚

In our last class, we learned

How to write functions with def and return
What parameters, arguments, and return values are
How to combine functions with if statements
How to use lambda to create quick, throwaway functions

Today’s plan 📅

Today, we will learn about variable scope in Python
Scope is important because it determines the visibility of variables, that is, where you can access them in your code
We will learn about local, enclosing, global, and built-in scopes
We will also learn about the global keyword
We will see how to use the apply and map functions to apply functions to many variables at once
Finally, we will learn about .py files and how to import them as modules

Understanding scope in Python 🧐

What is variable scope?

Scope is the area of a programme where a variable is accessible
Think of scope as a variable’s “visibility” in different parts of your code
Python uses the LEGB rule to determine variable scope:
- Local: Inside the current function
- Enclosing: Inside enclosing/nested functions
- Global: At the top level of the module
- Built-in: In the built-in namespace
The LEGB rule defines the order Python searches for variables

It is easier to understand them with an example:

x = 10  # Global scope

def print_x():
    x = 20  # Local scope
    print(x)  # Prints 20 (local)

print_x()

print(x)  # Prints 10 (global)

Global scope

Variables defined outside a function

Most variables we have seen so far are in the global scope
- Example: x = 10 is a global variable
They are stored in the global namespace and are accessible from anywhere in the code
Global variables are created when you assign them values, and are destroyed when you close Python

message_hello = "hello"
number3       = 3

print(message_hello + " world")
print(number3 * 2)

hello world
6

Global variables can be used in your code, but you should be careful with them when writing functions
The reason is that functions can change the value of global variables, which can lead to unexpected results
It is recommended to include all variables that a function needs as parameters

Global scope

Recommended and not recommended practices

Let’s create a function that sums 3 numbers
\(f(x,y,z) = x + y + z\)
We will pass the numbers as arguments to the function

# Correct example:
def fn_add_recommended(x,y,z):
    return(x + y + z)

print(fn_add_recommended(x = 1, y = 2, z = 5))
print(fn_add_recommended(x = 1, y = 2, z = 10))

8
13

If you do not include the variables as parameters, Python will try to use global variables if they exist

# Incorrect example:
def fn_add_notrecommended(x,y):
    return(x + y + z)

z = 5
print(fn_add_notrecommended(x = 1, y = 2))
z = 10
print(fn_add_notrecommended(x = 1, y = 2))

8
13

del z # Remove variable z from global scope
print(fn_add_notrecommended(x = 1, y = 2))

NameError: name 'z' is not defined

Local scope

Variables defined inside a function

Variables defined inside a function are local to that function
They are not accessible outside the function
Local variables are destroyed when the function returns
If you try to access a local variable outside the function, you will get a NameError
They include parameters and variables created inside the function

Example:
In the code below, x is local to the function print_x() (we assume no global x has been defined in the session)

def print_x():
    x = 20  # Local scope
    print(x)  # Prints 20 (local)

print_x() # Prints 20

print(x)  # NameError: name 'x' is not defined

>>> print_x()
20
>>> print(x)  # NameError: name 'x' is not defined
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Local variables supersede global variables

Remember the LEGB rule

# This is an example where we define a quadratic function
# (x,y) are both local variables of the function
# 
# When we call the function, only the arguments matter.
# Any intermediate value inside the function (such as y) stays local
# and does not change the global variables with the same name

def fn_square(x):
    y = x**2
    return(y)

x = 5
y = -5

print(fn_square(x = 1))

print(x)
print(y)

1
5
-5

Local variables are not stored in the working environment

# The following code assigns the global variables x and y
# outside the function. Calling fn_square() does not change them

x = 5
y = 4

print("Example 1:")
print(fn_square(x = 10))
print(x)
print(y)

print("Example 2:")
print(fn_square(x = 20))
print(x)
print(y)

Example 1:
100
5
4
Example 2:
400
5
4

Permanent changes to global variables

If you want to change a global variable inside a function, you need to use the global keyword
The global keyword tells Python that you want to use the global variable, not create a new local variable

def modify_x():
    global x
    x = x + 5

x = 1

modify_x()
print(x)
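For contrast, here is a minimal sketch of what happens without the global line. The function name modify_x_without_global is made up for this illustration. Because the function assigns to x, Python treats x as local to the function, so reading it before it has a local value raises an UnboundLocalError (a subclass of NameError).

```python
x = 1

def modify_x_without_global():
    # No `global x` here: the assignment below makes x local to the function,
    # so Python cannot read a value for x on the right-hand side
    x = x + 5

try:
    modify_x_without_global()
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)  # UnboundLocalError
print(x)           # 1 (the x defined outside the function is unchanged)
```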

I don’t think I have ever used global in my code
It makes the code harder to read and understand
You should avoid it too 😉
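One common way to avoid global is to pass the value in as a parameter and return the result. The sketch below assumes a hypothetical function called add_five; it is one possible alternative, not the only one.

```python
def add_five(x):
    # x is a parameter, so it is local to the function
    return x + 5

x = 1

# The change to x is now visible where the function is called
x = add_five(x)
print(x)  # 6

x = add_five(x)
print(x)  # 11
```

The reassignment happens in plain sight at the call site, which makes the code easier to follow than a hidden change inside the function.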

Try it out! 🚀

def modify_x():
    global x
    x = x + 5

x = 1

modify_x()
print(x)

What happens if we run the function modify_x() again?
What happens if we add global y inside fn_square?
Appendix 01

Built-in scope

Variables defined in Python

We have also seen many built-in functions in Python, like print(), len(), sum(), etc
They are available in any part of your code, and you don’t need to define them
Python has a list of names that are always available; avoid reusing them for your own variables, because doing so hides (shadows) the built-in
Many of them are exception (error) names

print(len("hello"))

m = min([4, 3, 1, 7])
print(m)

5
1
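Built-in names are not protected: if you assign to one of them, your variable hides the built-in in that scope. A minimal sketch, with function names made up for this illustration:

```python
def total_with_shadow(values):
    sum = 0  # a local variable named sum hides the built-in sum() in this function
    try:
        return sum(values)  # fails: the local sum is an int, not a function
    except TypeError as err:
        return "TypeError: " + str(err)

def total_without_shadow(values):
    return sum(values)  # no local sum, so Python finds the built-in

print(total_with_shadow([4, 3, 1, 7]))
print(total_without_shadow([4, 3, 1, 7]))  # 15
```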

import builtins

# View a list of attributes of a given object with dir()
print(dir(builtins))

['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BaseExceptionGroup', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EncodingWarning', 'EnvironmentError', 'Exception', 'ExceptionGroup', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '__IPYTHON__', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'display', 'divmod', 'enumerate', 'eval', 'exec', 'execfile', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 
'property', 'range', 'repr', 'reversed', 'round', 'runfile', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

Enclosing scope

Variables defined in enclosing functions

They are variables defined in enclosing functions
Enclosing functions are functions that contain other functions (nested functions)
Enclosing scope is between local and global scopes in the LEGB rule
They are easier to understand once you understand local and global scopes
We will not use them much in this course

Enclosing scope

Variables defined in enclosing functions

# Define a function that 
# contains another function
def outer():
    x = "outer x" # Local to outer()
    
    # Define a nested function
    def inner():
        x = "inner x" # Local to inner()
        print(x) # Print local to inner()

    inner() # Run inner()
    print(x) # Print local to outer()

outer() # Run outer()

inner x
outer x

# Define a function that 
# contains another function
def outer():
    x = "outer x" # Local to outer()
    
    # Define a nested function
    def inner():
        # x = "inner x" 
        print(x) # No local x, so use enclosing x

    inner() # Run inner()
    print(x) # Print local to outer()

outer() # Run outer()

outer x
outer x
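To see the whole LEGB search order at once, here is a small sketch in which each name is found at a different level. The variable names a, b, and c are arbitrary choices for this illustration.

```python
a = "global a"
b = "global b"

def outer():
    b = "enclosing b"  # hides the global b inside outer() and inner()
    c = "enclosing c"

    def inner():
        c = "local c"  # hides the enclosing c inside inner()
        # c is found locally, b in the enclosing function,
        # a at the top level, and len in the built-in scope
        return [c, b, a, len(a)]

    return inner()

result = outer()
print(result)  # ['local c', 'enclosing b', 'global a', 8]
```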

Operations over many variables 🧮

Pandas

pandas is the main library for data manipulation in Python 🐼
We will use it a lot in this course (and in your life as a data scientist!)
It is built on top of numpy, works with matplotlib for plotting, and has a gazillion functions to work with data 😁
If you use R already, think about it as the dplyr of Python
- A list of equivalences between dplyr and pandas
We will learn more about it in the next classes

Applying functions to a dataset

The apply function is used to apply a function to a dataset
- (This course is full of surprises, isn’t it? 😄)
It is a method of pandas DataFrames and Series
It can be used with built-in functions, custom functions, or lambda functions
- df.apply(function)
You can apply functions to rows or columns
- df.apply(function, axis=0) applies the function to each column (default)
- df.apply(function, axis=1) applies the function to each row

Applying functions to a dataset

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print(df.apply(np.sqrt))

          A         B         C
0  1.000000  2.000000  2.645751
1  1.414214  2.236068  2.828427
2  1.732051  2.449490  3.000000

print(df.apply(np.sum, axis=1))

0    12
1    15
2    18
dtype: int64

print(df.apply(lambda x: x**2))

   A   B   C
0  1  16  49
1  4  25  64
2  9  36  81

Applying functions to a dataset

Let’s do a quick exercise

# Create an empty DataFrame
data = pd.DataFrame()

# Add variables
data["age"] = [18,29,15,32,6]
data["num_underage_siblings"] = [0,0,1,1,0]
data["num_adult_siblings"] = [1,0,0,1,0]

display(data)

	age	num_underage_siblings	num_adult_siblings
0	18	0	1
1	29	0	0
2	15	1	0
3	32	1	1
4	6	0	0

Now let’s define some functions

# The first two functions return True/False depending on age constraints
# The third function returns the sum of two numbers
# The fourth function returns a string with the age bracket

fn_iseligible_vote = lambda age: age >= 18

fn_istwenties = lambda age: (age >= 20) & (age < 30)

fn_sum = lambda x,y: x + y

def fn_agebracket(age):
    if (age >= 18):
        status = "Adult"
    elif (age >= 10) & (age < 18):
        status = "Adolescent"
    else:
        status = "Child"
    return(status)

Applying functions to a dataset

Now let’s apply the functions to the data["age"] column

data["can_vote"]    = data["age"].apply(fn_iseligible_vote)
data["in_twenties"] = data["age"].apply(fn_istwenties)
data["age_bracket"] = data["age"].apply(fn_agebracket)

display(data)

	age	num_underage_siblings	num_adult_siblings	can_vote	in_twenties	age_bracket
0	18	0	1	True	False	Adult
1	29	0	0	True	True	Adult
2	15	1	0	False	False	Adolescent
3	32	1	1	True	False	Adult
4	6	0	0	False	False	Child
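The fn_sum function takes two inputs, so it needs values from two columns. One possible way to use it is to apply a lambda to the whole DataFrame with axis=1, which passes one row at a time. The column name num_siblings is an assumption made for this sketch.

```python
import pandas as pd

# Rebuild the example dataset so the block runs on its own
data = pd.DataFrame({
    "age": [18, 29, 15, 32, 6],
    "num_underage_siblings": [0, 0, 1, 1, 0],
    "num_adult_siblings": [1, 0, 0, 1, 0],
})

fn_sum = lambda x, y: x + y

# axis=1 passes each row to the lambda; we pick the two columns we need
data["num_siblings"] = data.apply(
    lambda row: fn_sum(row["num_underage_siblings"], row["num_adult_siblings"]),
    axis=1,
)

print(data["num_siblings"].tolist())  # [1, 0, 1, 2, 0]
```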

Creating a new variable

You can also create a new variable using the apply function

# Creating a new variable
data["new_var"] = data["age"].apply(lambda age: age >= 18)

display(data)

	age	num_underage_siblings	num_adult_siblings	can_vote	in_twenties	age_bracket	new_var
0	18	0	1	True	False	Adult	True
1	29	0	0	True	True	Adult	True
2	15	1	0	False	False	Adolescent	False
3	32	1	1	True	False	Adult	True
4	6	0	0	False	False	Child	False

Deleting a variable

You can also delete a variable using the drop function

data = data.drop(columns = ["new_var"])

display(data)

	age	num_underage_siblings	num_adult_siblings	can_vote	in_twenties	age_bracket
0	18	0	1	True	False	Adult
1	29	0	0	True	True	Adult
2	15	1	0	False	False	Adolescent
3	32	1	1	True	False	Adult
4	6	0	0	False	False	Child

Mapping functions to a list, array, or series

The map function is used to apply a function to a list, an array, or a series
- A series is a single column of a pandas DataFrame
In pandas, map works very similarly to apply, and the two are interchangeable when you pass a function to a series
map can be faster than apply for simple functions, but apply is more flexible as it can be used with DataFrames (many columns)
However, if you are using regular lists (e.g., list01 = [1,2,3]), you should use map instead of apply
- apply is not a built-in Python function

data["age_bracket01"] = data["age"].map(fn_agebracket)

display(data[["age","age_bracket01"]])

	age	age_bracket01
0	18	Adult
1	29	Adult
2	15	Adolescent
3	32	Adult
4	6	Child

data["age_bracket02"] = data["age"].apply(fn_agebracket)

display(data[["age","age_bracket02"]])

	age	age_bracket02
0	18	Adult
1	29	Adult
2	15	Adolescent
3	32	Adult
4	6	Child
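One difference worth knowing: the map method of a series also accepts a dictionary that translates old values into new ones. The sketch below uses a made-up coding of the age brackets as numbers.

```python
import pandas as pd

age_bracket = pd.Series(["Adult", "Adult", "Adolescent", "Adult", "Child"])

# Hypothetical numeric codes for each bracket
bracket_codes = {"Child": 0, "Adolescent": 1, "Adult": 2}

# Each value in the series is looked up in the dictionary
coded = age_bracket.map(bracket_codes)

print(coded.tolist())  # [2, 2, 1, 2, 0]
```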

Mapping functions to a list, array, or series

Using map with a list and an array

# Create a list
list01 = [1,2,3,4,5]

# Map a function to the list
list02 = list(map(lambda x: x**2, list01))

print(list02)

[1, 4, 9, 16, 25]

# Create a numpy array
array01 = np.array([1,2,3,4,5])

# Map a function to the array
array02 = np.array(list(map(lambda x: x**2, array01)))

print(array02)

[ 1  4  9 16 25]

Trying to use apply with a list or an array will raise an error

# Create a list
list01 = [1,2,3,4,5]

# Apply a function to the list
list02 = list(apply(lambda x: x**2, list01))

print(list02)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[168], line 5
      2 list01 = [1,2,3,4,5]
      4 # Apply a function to the list
----> 5 list02 = list(apply(lambda x: x**2, list01))
      7 print(list02)

NameError: name 'apply' is not defined
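If you have a plain list and map does not suit you, there are a few other options. The sketch below shows three of them; the variable names are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

list01 = [1, 2, 3, 4, 5]

# Option 1: a list comprehension
squares_comprehension = [x**2 for x in list01]

# Option 2: convert the list to a pandas Series, then use apply
squares_series = pd.Series(list01).apply(lambda x: x**2).tolist()

# Option 3: numpy arrays support element-wise arithmetic directly
squares_array = np.array(list01) ** 2

print(squares_comprehension)   # [1, 4, 9, 16, 25]
print(squares_series)          # [1, 4, 9, 16, 25]
print(squares_array.tolist())  # [1, 4, 9, 16, 25]
```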

Try it yourself! 🚀

Write a lambda function checking whether num_siblings \(\ge 1\)
Add a variable to the dataset called has_siblings
Assign True/False to this variable using apply()
Appendix 02

Try it yourself! 🚀

Read the car dataset data_raw/features.csv
Create a function that tests whether mpg \(\ge\) 29
Add a variable mpg_above_29 which is True/False if mpg \(\ge\) 29
Store the new dataset to data_clean/features.csv
Appendix 03

Try it yourself! 🚀

Last exercise of the day! 🏁

Create a lambda function with arguments {fruit,color}
The function returns the string "A {fruit} is {color}"
Create the following two lists:
- list_fruits = ["banana","strawberry","kiwi"]
- list_colors = ["yellow","red","green"]
Use the list(map()) function to output a list with the form
["A banana is yellow","A strawberry is red","A kiwi is green"]
Appendix 04

Importing modules 📦

Importing modules

What is a module?

While .ipynb files are great for learning and teaching, they are not the best for sharing code
When you write a lot of functions, you should save them in a .py file, which is a Python script
A Python script, or module, is just a file containing Python code
This code can be functions, classes, or variables
A folder containing Python scripts is called a package
You can import modules to use their code in your own code

We can import functions into the working environment from a file

# Import the folder `scripts` as a package
# And the file `example_functions.py` as `ef`
import scripts.example_functions as ef

print(ef.fn_quadratic(2))
print(ef.fn_cubic(3))

ef.message_hello("Juan")

4
27

'hi Juan'
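The contents of scripts/example_functions.py are not shown here. As a rough guess, a file that reproduces the output above could look like the sketch below; the function bodies are assumptions, and only the function names come from the example.

```python
# Hypothetical contents of scripts/example_functions.py
# (the real file is not shown; these bodies only reproduce the output above)

def fn_quadratic(x):
    # Return x squared
    return x**2

def fn_cubic(x):
    # Return x cubed
    return x**3

def message_hello(name):
    # Return a short greeting as a string
    return "hi " + name
```

Note that the file contains only definitions, so importing it does not print anything by itself.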

Importing modules

Importing variables

You can also import variables from a module
However, it is not recommended to import variables
It is better to import functions and use them to create variables
This is because variables can be changed in the module, leading to unexpected results

Example:

import scripts.example_variables as ev

# alpha below is our own global variable, separate from ev.alpha.
# The later `from scripts.example_variables import *` will overwrite it

alpha = 1
print(alpha)
print(ev.alpha)

from scripts.example_variables import *

print(alpha)
print(beta)
print(gamma)
print(delta)

And that’s it for today! 🎉

Thanks very much! 😊

Appendix 01

def modify_x():
    global x
    x = x + 5

x = 1

# Now, running the function 
# will permanently increase x by 5.

modify_x()
print(x)
modify_x()
print(x)

6
11

def fn_square(x):
    global y
    y = x**2
    return(y)

x = 5
y = -5

print("Example 1:")
print(fn_square(x = 10))
print(x)
print(y)

Example 1:
100
5
100

Back to exercise 01

Appendix 02

Write a lambda function checking whether num_siblings \(\ge 1\)
Add a variable to the dataset called has_siblings
Assign True/False to this variable using apply()

fn_has_siblings = lambda num_siblings: num_siblings >= 1

data["has_siblings"] = data["num_adult_siblings"].apply(fn_has_siblings)

display(data[["num_adult_siblings","has_siblings"]])

	num_adult_siblings	has_siblings
0	1	True
1	0	False
2	0	False
3	1	True
4	0	False

Back to exercise 02

Appendix 03

Read the car dataset data_raw/features.csv
Create a function that tests whether mpg \(\ge\) 29
Add a variable mpg_above_29 which is True/False if mpg \(\ge\) 29
Store the new dataset to data_clean/features.csv

data_raw = pd.read_csv("data_raw/features.csv")

data_raw["mpg_above_29"] = data_raw["mpg"].apply(lambda mpg: mpg >= 29)

display(data_raw[["mpg","mpg_above_29"]])

data_raw.to_csv("data_clean/features.csv", index = False)

	mpg	mpg_above_29
0	18.0	False
1	15.0	False
2	18.0	False
3	16.0	False
4	17.0	False
...	...	...
393	27.0	False
394	44.0	True
395	32.0	True
396	28.0	False
397	31.0	True

398 rows × 2 columns

Back to exercise 03

Appendix 04

Create a lambda function with arguments {fruit,color}
The function returns the string "A {fruit} is {color}"
Create the following two lists:
- list_fruits = ["banana","strawberry","kiwi"]
- list_colors = ["yellow","red","green"]
Use the list(map()) function to output a list with the form
["A banana is yellow","A strawberry is red","A kiwi is green"]

fn_fruitcolor = lambda fruit, color: "A " + fruit + " is " + color

list_fruits  = ["banana","strawberry","kiwi"]
list_colors  = ["yellow","red","green"]

list(map(fn_fruitcolor, list_fruits, list_colors))

['A banana is yellow', 'A strawberry is red', 'A kiwi is green']

Back to exercise 04