QTM 350 - Data Science Computing

Lecture 02: Computational Literacy

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

22 January, 2025

Recap and lecture overview 📚

Course information

  • Instructor: Danilo Freire
  • Lectures: Mondays and Wednesdays, 2:30-3:45pm
  • Office hours: At any time, just send me an email in advance
  • Please remember to check the course repository regularly for updates and announcements 😉

Course information

  • The course focuses on three key areas of data science: reliability, reproducibility, and robustness
  • Main topics: command line and shell scripting (terminal), version control (git and GitHub), reproducible reports (Quarto and Jupyter Notebooks), data wrangling and storage (Python and SQL), data visualisation (Python), AI-paired programming (Copilot), introduction to containers (Docker), and parallel computing (Python)
  • Grading:
    • 50% assignments (10x)
    • 30% in-class quizzes (5x)
    • 20% final project
  • You can discuss assignments with your classmates, but please submit your own work
  • AI is allowed in all assignments and quizzes
  • Late submissions will be penalised by 10% per day
  • To accommodate any challenges, I will drop the lowest assignment and quiz grades
  • Additional information will be made available on the course repository

Questions about the course organisation?

Software installation

  • As we discussed in the first lecture, you will need to install the following software:

  • A terminal

    • Windows users can install WSL or use VS Code’s built-in terminal
    • Mac users already have a terminal. I suggest you install iTerm2, Homebrew and Oh My Zsh for a better experience
    • Linux users are good to go


Learning objectives

By the end of this lecture, you will:

  1. Learn how computers work from the ground up, starting with binary code
  2. Get familiar with other key computer encodings like hexadecimal, ASCII, and Unicode
  3. Learn about the pioneers of computing and the development of Assembly language
  4. Understand the difference between low-level and high-level programming languages and when to use each

Let’s get started! 🚀 💻

Brief history of computing

The first computers

  • Historically, a computer was a person who made calculations, especially with the aid of a calculating machine
  • To do calculations we use numbers, but how do we represent them?

Four-species mechanical calculators

Silicon-based computers

The 1970s marked the transition from mechanical to electronic computing.

Von Neumann Architecture

  • Advantages:
    • Efficient memory use, with less need for separate areas
    • Flexibility in data storage and manipulation
    • Simplicity in design and operation
  • Disadvantages:
    • Von Neumann bottleneck: Limits computing performance due to sequential processing of instructions and data through a single bus
    • The CPU often waits for data due to its faster processing speed compared to memory transfer rates
    • Harvard architecture is an alternative that separates data and instruction memory

Data representation

Computers run on 0s and 1s

  • Computers represent everything by using 0s and 1s
  • Transistors act as switches, with 1 for high voltage level and 0 for low voltage level
  • Computers use binary because transistors are easy to fabricate in silicon and can be densely packed on a chip
  • But how does this work?
  • How can we represent text, images, and videos using only 0s and 1s?
  • This leads us to abstraction: representing ideas at different levels of detail by identifying what is essential
  • We will use abstraction to translate 0s and 1s to decimal numbers, then translate those numbers to other types

Converting coins to dollars

  • We can convert between number systems by translating a value from one system to the other
  • For example, a handful of coins can represent the same value as $0.87
  • Using pictures is clunky. Let’s make a new representation system for coins

Converting coins to dollars

  • To represent coins, we will make a number with four digits
  • The first represents quarters, the second dimes, the third nickels, and the fourth pennies
    • c3102 =
    • 3 x $0.25 + 1 x $0.10 + 0 x $0.05 + 2 x $0.01 =
    • $0.87
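A minimal Python sketch of this calculation (the function name coins_to_cents is just illustrative; working in cents avoids floating-point rounding):

    # Value of each position in the four-digit coin code, in cents:
    # quarters, dimes, nickels, pennies
    COIN_VALUES = [25, 10, 5, 1]

    def coins_to_cents(code):
        """Convert a coin code such as '3102' to a value in cents."""
        return sum(int(digit) * value for digit, value in zip(code, COIN_VALUES))

    print(coins_to_cents("3102") / 100)  # 0.87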

Converting dollars to coins

  • How do we convert money from dollars to coins? Assume we want to minimise the number of coins used

  • For example, what is $0.59 in coin representation? Use the same four-digit system: quarters, dimes, nickels, and pennies

  • $0.59 = 2 x $0.25 + 0 x $0.10 + 1 x $0.05 + 4 x $0.01 = c2014
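To minimise the number of coins, a greedy approach works: take as many quarters as possible, then dimes, then nickels, then pennies. A rough Python sketch (the function name dollars_to_coins is illustrative; it assumes each coin count fits in a single digit):

    def dollars_to_coins(amount):
        """Greedy conversion from dollars to the four-digit coin code."""
        cents = round(amount * 100)        # work in cents to avoid floating-point issues
        code = ""
        for value in [25, 10, 5, 1]:       # quarters, dimes, nickels, pennies
            code += str(cents // value)    # how many of this coin fit
            cents %= value                 # remainder still to represent
        return "c" + code

    print(dollars_to_coins(0.59))  # c2014
    print(dollars_to_coins(0.87))  # c3102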

Quick questions!

  • Think-Pair-Share: do the following conversions

  • What is c1112 in dollars?

  • What is $0.61 in coin representation?

Solutions:

  • c1112 = $0.42 = 1 x $0.25 + 1 x $0.10 + 1 x $0.05 + 2 x $0.01

  • $0.61 = c2101 = 2 x $0.25 + 1 x $0.10 + 0 x $0.05 + 1 x $0.01

Number systems – binary

  • Now let us go back to computers! 💻

  • We can represent numbers using only 0s and 1s with the binary number system

  • Instead of counting how many 1¢, 5¢, 10¢, and 25¢ coins you need, count the number of 1s, 2s, 4s, 8s, etc

  • Why these numbers? They are powers of 2. This is a number in base 2

  • A single binary digit is a bit, e.g., 101 has three bits

  • An 8-bit group is called a byte, e.g., 10101010

  • Binary numbers grow as follows:

    • 0 represents zero
    • 1 represents one
    • 10 represents two
    • 11 represents three
    • 100 represents four
    • 1000 represents eight, and so on…
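A quick way to explore this pattern in Python, using the built-in bin() and int() functions:

    # bin() shows the binary representation of a decimal number
    for n in range(9):
        print(n, bin(n))        # e.g. 2 -> '0b10', 8 -> '0b1000'

    # int() with base 2 converts a binary string back to decimal
    print(int("1000", 2))       # 8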

Quick question!

  • Think-Pair-Share: what is the binary representation of the decimal number 3?
  1. 101
  2. 11
  3. 111
  4. 010

Your turn!

Practice Exercise 01:

  1. What binary number represents 5?

  2. What binary number represents 7?

  3. What binary number represents 9?

  4. What binary number represents 11?

Convert binary to decimal

To convert a binary number to decimal, just add each power of 2 that is represented by a 1.

  • For example, 00011000 = 16 + 8 = 24
Place value:  128   64   32   16    8    4    2    1
Bit:            0    0    0    1    1    0    0    0


  • Another example: 10010001 = 128 + 16 + 1 = 145
Place value:  128   64   32   16    8    4    2    1
Bit:            1    0    0    1    0    0    0    1
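Both examples can be checked in Python: int() with base 2 does the conversion directly, and the hand-rolled sum just mirrors the place-value reasoning above:

    # Convert a binary string to decimal with int() and base 2
    print(int("00011000", 2))   # 24
    print(int("10010001", 2))   # 145

    # Or add up the powers of 2 whose bit is 1
    bits = "10010001"
    print(sum(2 ** i for i, bit in enumerate(reversed(bits)) if bit == "1"))  # 145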

So far, so good? 😃

Binary and abstraction

  • Now that we can represent numbers using binary, we can represent everything computers store using binary
  • We just need to use abstraction to interpret bits or numbers in particular ways
  • Let us consider colours, images, and text

Images as collections of colours

  • What if we want to represent an image? How can we convert that to numbers?
  • First, break the image down into a grid of colours, where each dot of colour has a distinct hue
  • A dot of colour in this context is called a pixel
  • Now we just need to represent a single colour (a pixel) as a number!


RGB colour model

  • The RGB colour model is widely used in digital displays
  • Each pixel is represented by three numbers, each ranging from 0 to 255
  • The first number represents the amount of red, the second the amount of green, and the third the amount of blue
  • 00000000 (0) means none of that colour, and 11111111 (255) means as much of it as possible!
  • You can try different colours here
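A small Python illustration of a single pixel (the colour values are arbitrary examples):

    # A pixel in the RGB model: three intensities between 0 and 255
    pixel = (255, 165, 0)            # a shade of orange (illustrative values)

    # Each channel fits in one byte, i.e. eight bits
    for name, value in zip(["red", "green", "blue"], pixel):
        print(name, value, format(value, "08b"))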

Number systems – Hexadecimal

What is hexadecimal?

  • When we represent values with multiple bytes, it can be hard to distinguish where numbers begin and end
  • Hexadecimal is a number system with 16 digits: 0123456789ABCDEF
  • It is used to represent binary numbers in a more compact way
  • Each hex digit corresponds to 4 binary bits, making it a shorthand for binary:
    • 0000 = 0
    • 0001 = 1
    • 0010 = 2
    • 1110 = E
    • 1111 = F

Binary to hex conversion

  • Convert binary to hex by grouping into blocks of four bits.
  • Example: Binary 1001 1110 0000 1010 converts to Hex 9E0A.
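In Python, the same grouping can be checked with format() and int() (using the value from the example above):

    # Binary to hex: 1001 1110 0000 1010 -> 9E0A
    bits = "1001111000001010"
    print(format(int(bits, 2), "X"))        # 9E0A

    # Hex back to binary, padded to 16 bits
    print(format(int("9E0A", 16), "016b"))  # 1001111000001010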


Practice Exercise 02:

  1. Convert the decimal number 13 to binary.

  2. Convert the decimal number 13 to hexadecimal.

Hexadecimal and HTML

Hex and RGB

  • HTML uses hexadecimal to represent colours

  • Six-digit hex numbers specify colours:

    • FFFFFF = White
    • 000000 = Black
  • Each pair of digits represents one colour channel (red, green, or blue).

  • Each channel ranges from 0 to 255 (8 bits per channel), which gives 256 intensity levels for each primary colour.

  • Combining the three channels gives a palette of \(256^3\) colours, or about 16.7 million, as the short example below shows
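A short Python sketch of how a six-digit hex colour splits into its three channels (the colour FF6600 is just an illustrative choice):

    colour = "FF6600"                  # an arbitrary orange
    red   = int(colour[0:2], 16)       # 255
    green = int(colour[2:4], 16)       # 102
    blue  = int(colour[4:6], 16)       # 0
    print(red, green, blue)            # 255 102 0

    # 256 levels per channel gives 256**3 possible colours
    print(256 ** 3)                    # 16777216, about 16.7 million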

Represent text as individual characters

Characters and glyphs

  • Next, how do we represent text?
  • First, we break it down into smaller parts, like with images. In this case, we can break text down into individual characters
  • A character is the smallest component of text, like A, B, or /.
  • A glyph is the graphical representation of a character.
  • In programming, the display of glyphs is typically handled by GUI (Graphical User Interface) toolkits or font renderers

Represent text as individual characters

Lookup tables

  • For example, the text “Hello World” becomes H, e, l, l, o, space, W, o, r, l, d
  • Unlike colours, characters do not have a logical connection to numbers
  • To represent characters as numbers, we use a lookup table called ASCII
  • ASCII stands for American Standard Code for Information Interchange
  • As long as every computer uses the same lookup table, computers can always translate a set of numbers into the same set of characters

ASCII is nothing but a simple lookup table

Yes, really!

For basic characters, we can use the encoding system called ASCII. Standard ASCII maps the numbers 0 to 127 to characters, and extended versions use 0 to 255. Either way, one character is represented by one byte

Check it out here: ASCII Table

ASCII is nothing but a simple lookup table

Translation

“Hello World” =

01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100
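The same translation can be reproduced in Python with ord() and chr(), which look up a character’s code and a code’s character, respectively:

    # Encode text as binary using each character's code
    text = "Hello World"
    binary = " ".join(format(ord(ch), "08b") for ch in text)
    print(binary)    # 01001000 01100101 01101100 ...

    # And decode it back
    print("".join(chr(int(b, 2)) for b in binary.split()))   # Hello World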

Your turn!

Practice Exercise 03

  • Translate the following binary into ASCII text:

01011001 01100001 01111001

ASCII Limitations

  • ASCII only includes unaccented characters.
  • Languages requiring accented characters cannot be represented.
  • Even English needs characters like ‘é’ for words such as ‘café’.
  • To address this, Unicode was developed

  • Unicode is a superset of ASCII that includes characters from all languages, as well as symbols and emojis

  • The Unicode standard assigns a number (a code point) to every character that can be typed into a computer, with room for over a million code points

  • UTF-8 stands for Unicode Transformation Format, 8-bit. It encodes each code point using one to four bytes and is backwards compatible with ASCII

  • Find all the Unicode characters here: https://symbl.cc/en/unicode-table/

    • “Danilo” in Unicode: \u0044\u0061\u006e\u0069\u006c\u006f
    • “QTM 350” in Unicode: \u0051\u0054\u004d\u0020\u0033\u0035\u0030
  • Decoder: https://symbl.cc/en/tools/decoder/
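A brief Python illustration of Unicode escapes and UTF-8 encoding (the strings echo the examples above):

    # Unicode escapes spell out each code point
    name = "\u0044\u0061\u006e\u0069\u006c\u006f"
    print(name)                       # Danilo

    # encode() turns characters into bytes; plain ASCII characters take one byte,
    # while other characters take two to four bytes in UTF-8
    print("café".encode("utf-8"))     # b'caf\xc3\xa9'  ('é' uses two bytes)
    print("🚀".encode("utf-8"))       # b'\xf0\x9f\x9a\x80'  (four bytes)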

Questions? 🤔

Programming languages and the genesis of programming 🌟 🔡 🐍

The genesis of programming

Zuse’s computers

  • Konrad Zuse was a German engineer and computer pioneer
  • He created the first programmable computer, the Z3, in 1941
  • The Z3 was the first computer to use binary arithmetic and read binary instructions from punch tape
  • Example: Z4 had 512 bytes of memory
  • Zuse also created the first high-level programming language, Plankalkül

What is Assembly language?

  • Assembly language is a low-level programming language that allows writing machine code in human-readable text
  • Each instruction corresponds to a single machine code instruction
  • The first assemblers were human!
  • Programmers wrote assembly code, which secretaries transcribed to binary for machine processing

Some curious facts about Assembly!

Margaret Hamilton and the Apollo 11 code
  • The Apollo 11 mission to the moon was programmed in assembly language

  • The code is available here: https://github.com/chrislgarry/Apollo-11 (good luck reading it! 😅)

  • One of the files is the BURN_BABY_BURN--MASTER_IGNITION_ROUTINE.agc 🔥 🚀

  • But if Assembly is so fast and efficient, why don’t we use it all the time?

Low-level vs high-level languages

  • Compiled Languages: Convert code to binary instructions before execution (e.g., C++, Fortran, Go).
  • Interpreted Languages: Run inside a program that interprets and executes commands immediately (e.g., R, Python).

Low-level vs high-level languages

Code that is worth a thousand words

  • “Hello, World!” in machine code (the raw bytes of the string, shown in hexadecimal):
    48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21
  • “Hello, World!” in Assembly (x86 Assembly for Linux):
    section .data
        message db 'Hello, World!', 10    ; 10 is the ASCII code for newline

    section .text
        global _start

    _start:
        mov eax, 4          ; system call number for write
        mov ebx, 1          ; file descriptor 1 is stdout
        mov ecx, message    ; address of string to output
        mov edx, 14         ; number of bytes
        int 0x80            ; call kernel

        mov eax, 1          ; system call number for exit
        xor ebx, ebx        ; exit status 0
        int 0x80            ; call kernel
  • “Hello, World!” in Python:
    print("Hello, World!")

Question: Is Natural Language Programming the Future of High-Level Languages? 🤖

Summary 💡

Summary

  • Computational Literacy: Binary and hexadecimal numbers, characters (ASCII, Unicode), and distinction between high vs low-level programming languages
  • Early Computing: Konrad Zuse’s pioneering work with programmable digital computers and the use of binary arithmetic
  • Assembly Language: The initial approach to programming using human-readable instructions for machine code
  • Calculators: The evolution from Leibniz’s four-species calculating machine to modern electronic computing
  • Silicon Microchip Computers: The 1970s revolution with transistors, integrated circuits, and the emergence of Von Neumann architecture
  • Modern Programming Languages: From low-level assembly languages to high-level languages like Python; distinction between compiled and interpreted languages

Next class

  • We will learn about the command line and shell scripting in the terminal
  • Please have your WSL or iTerm2 installed, and we will start coding!
  • If you have VS Code, that’s even better! 😉
  • Please check the installation tutorials for more information, and let me know if you have any questions 😃
  • Assignment 01 is already online. Please check it out! Due date: 11 September 2024.

Thank you very much and see you next class! 😊 🙏

Solution - Practice Exercise 01

  1. What binary number represents 5?
  • In binary, the number 5 is represented as 101, which equates to \((1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0)\).
  2. What binary number represents 7?
  • In binary, the number 7 is represented as 111, which equates to \((1 \times 2^2) + (1 \times 2^1) + (1 \times 2^0)\).
  3. What binary number represents 9?
  • In binary, the number 9 is represented as 1001, which equates to \((1 \times 2^3) + (0 \times 2^2) + (0 \times 2^1) + (1 \times 2^0)\).
  4. What binary number represents 11?
  • In binary, the number 11 is represented as 1011, which equates to \((1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0)\).
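These answers can be double-checked with Python’s built-in bin():

    print([bin(n) for n in (5, 7, 9, 11)])
    # ['0b101', '0b111', '0b1001', '0b1011']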

Solution - Practice Exercise 02

  1. Decimal 13 is 1101 in binary.
  • Break it down: \(13 = (1 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0)\).
  2. Binary 1101 is D in hexadecimal.
  • Group the binary into blocks of four: 1101.
  • Convert each block to hex: 1101 (binary) = D (hex).
  • Let’s take a closer look at how to convert the binary number 1101 to hexadecimal:
  • Start with the binary number: 1101
  • Convert it to decimal by summing the powers of 2:
    • \(1 \times 2^3\) = 8
    • \(1 \times 2^2\) = 4
    • \(0 \times 2^1\) = 0
    • \(1 \times 2^0\) = 1
  • Add the decimal values: \(8 + 4 + 0 + 1 = 13\)
  • The decimal number 13 corresponds to the hexadecimal number D.
  • Therefore, binary 1101 is D in hexadecimal.
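A quick check with Python’s built-ins:

    print(bin(13))   # 0b1101
    print(hex(13))   # 0xd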

Solution - Practice Exercise 03

  • Step 1: Identify the binary strings: 01011001 01100001 01111001

  • Step 2: Convert each binary string to its decimal equivalent

    • 01011001 = 89
    • 01100001 = 97
    • 01111001 = 121
  • Step 3: Map each decimal value to its corresponding ASCII character

    • 89 = Y
    • 97 = a
    • 121 = y
  • Step 4: Combine the ASCII characters to form the final text

    • Result: Yay
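As a quick check, the same decoding in Python:

    binary = "01011001 01100001 01111001"
    print("".join(chr(int(b, 2)) for b in binary.split()))   # Yay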