QTM 350 - Data Science Computing

Lecture 03: Command Line Interface

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

Recap and lecture overview 📚

Brief recap of last class

Early computing and data representation

  • Computers evolved from people to mechanical calculators to silicon-based machines
  • Modern computers use the Von Neumann architecture, storing both instructions and data in memory
  • Computers represent data using binary (base 2) numbers made up of 0s and 1s
  • A bit is a single binary digit; 8 bits make a byte
  • Hexadecimal (base 16) is a compact way to represent binary, with each hex digit corresponding to 4 bits
  • Abstraction allows representing complex data like images and text using numbers
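
The hexadecimal facts above can be checked directly in a shell. This is a minimal sketch that assumes a POSIX-style shell (Bash, Zsh, or sh); the variable names are made up for illustration.

```shell
# Hexadecimal to decimal: two hex digits describe one byte (8 bits).
hex_to_dec=$(printf '%d' 0xFF)
echo "0xFF = $hex_to_dec"

# Decimal back to hexadecimal.
dec_to_hex=$(printf '%X' 255)
echo "255 = 0x$dec_to_hex"

# One hex digit spans 4 bits, so it can take 2^4 = 16 distinct values.
values_per_hex_digit=$((1 << 4))
echo "values per hex digit: $values_per_hex_digit"
```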

Brief recap of last class

Representing images, colours and text

  • Images can be broken down into a grid of coloured pixels
  • Colours are represented using the RGB model, with each colour channel (red, green, blue) ranging from 0-255
  • With 8 bits per channel, each channel has 256 levels, allowing for over 16 million possible colours (256 × 256 × 256 = 16,777,216)
  • Text is broken into individual characters, with each character mapped to a number using an encoding like ASCII
  • ASCII is a simple lookup table mapping the numbers 0-127 to characters (8-bit “extended ASCII” variants use 0-255)
  • Unicode extends ASCII to support accented characters and symbols from all languages
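
The character-to-number mapping can be looked up from the shell. A small sketch, assuming a POSIX-style printf; the variable names are illustrative only.

```shell
# Look up the numeric code of a character: the leading quote tells printf
# to print the character's code instead of parsing a number.
code_of_A=$(printf '%d' "'A")
echo "A -> $code_of_A"

# Go the other way: octal 101 is decimal 65, which maps back to "A".
char_of_65=$(printf '\101')
echo "65 -> $char_of_65"
```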

Brief recap of last class

Programming languages

  • Konrad Zuse created the first programmable computers and high-level programming language in the 1940s
  • Assembly allows writing human-readable instructions that map closely to machine code
  • High-level languages like Python abstract away hardware details and are more portable across systems
  • Low-level languages are harder to read and write but very fast and efficient
  • Compiled languages are converted to machine code before execution; interpreted languages are executed on the fly

Today’s lecture

Command line: the old school way of interacting with computers

  • Today, we will learn about the command line, a text-based interface to interact with computers
  • We will learn some basic commands to navigate the file system, create and delete files, and run programs
  • We will also learn about shell scripting, a way to automate tasks using the command line
  • The command line is still widely used in data science and programming, especially for remote servers, cloud computing, and automation

Questions? 🤓

What is the command line? 💻

A computer in a nutshell

Operating system

Credit Dave Kerr

  • The operating system (OS) is system software that interfaces with (and manages access to) a computer’s hardware. It also provides software resources
  • The OS is divided into the kernel and user space
  • The kernel is the core of the OS. It’s responsible for interfacing with hardware (drivers), managing resources etc. Running software in the kernel is extremely sensitive! That’s why users are kept away from it!
  • Curiosity: You can see the Linux kernel source code on GitHub
  • The user space provides an interface for users, who can run programs/applications on the machine. Programs’ access to hardware (e.g., memory usage) is managed by the kernel. Programs in user space essentially run in sandboxes, which limits how much damage they can do.

A computer in a nutshell

Kernels and shells

  • The shell is just a general name for any user space program that allows access to resources in the system, via some kind of interface
  • Shells come in many different flavours but are generally provided to aid a human operator in accessing the system. This could be interactively, by typing at a terminal, or via scripts, which are files that contain a sequence of commands
  • Modern computers use graphical user interfaces (GUIs) as the standard tool for human-computer interaction
  • Why “kernel” and “shell”? The kernel is the soft, edible part of a nut or seed, which is surrounded by a shell to protect it. Useful metaphor, innit?

Interacting with the shell

Terminals

Credit Dave Kerr

  • Things are still a bit more complicated
  • We’re not directly interacting with the “shell” but using a terminal
  • A terminal is just a program that reads input from the keyboard, passes that input to another programme, and displays the results on the screen
  • A shell program on its own does not do this - it requires a terminal as an interface
  • Why “terminal”? Back in the old days (before computer screens existed), terminal machines (hardware!) were used to let humans interface with large machines (“mainframes”). Often many terminals were connected to a single machine
  • When you want to work with a computer in a data center (or remotely in cloud computing), you’ll still do pretty much the same

Interacting with the shell

Command line

Credit Dave Kerr

  • Terminals are really quite simple - they’re just interfaces

  • The first thing that a terminal will do is run a shell - a programme we can use to operate the computer

  • Back to the shell: the shell usually takes input

    • Interactively, from the user via the terminal’s command line
    • From scripts, which it executes without a command line
  • In interactive mode the shell then returns output

    • To the terminal, where it is printed/shown
    • To files or other locations
  • The command line represents what is shown and entered in the terminal. It can be customised (e.g., with colour highlighting) to make interaction more convenient
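
The two input routes and two output routes can be sketched as follows. The file names hello.sh and output.txt are hypothetical, and everything happens in a temporary directory, so this is an illustration rather than a recommended workflow.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Write a two-line script: a file containing a sequence of commands.
printf '%s\n' 'echo "hello from a script"' 'echo "second line"' > hello.sh

# Run the script: the shell reads its commands from the file instead of
# the command line, and the output goes to the terminal.
sh hello.sh

# Redirect the same output to a file instead of the terminal.
sh hello.sh > output.txt
line_count=$(wc -l < output.txt | tr -d ' ')
echo "output.txt has $line_count lines"
```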

Shell variants

Bash, Zsh, and others

  • It is important to note that there are many different shell programmes, and they differ in terms of functionality
  • On most Unix-like systems, the default shell is a program called bash, which stands for “Bourne Again Shell”
  • Other examples are the Z Shell (or zsh; default on macOS), Windows Command Prompt (cmd.exe, the default CLI on MS Windows), Windows PowerShell, C Shell, and many more
  • When a terminal opens, it will immediately start the user’s preferred shell programme. (This can be changed.)

Why bother with the shell? 🤷🏻‍♂️

Why bother with the shell?

Why should you use this…

… instead of this?

Why bother with the shell?

The programmer’s best friend

  1. Speed. Typing is fast: A skilled shell user can manipulate a system at dazzling speeds just using a keyboard. Typing commands is generally much faster than exploring through user interfaces with a mouse.

  2. Power. Both for executing commands and for fixing problems. There are some things you just can’t do in an IDE or GUI. It also avoids memory complications associated with certain applications and/or IDEs.

  3. Reproducibility. Scripting is reproducible, while clicking is not.

  4. Portability. A shell can be used to interface with almost any type of computer, from a mainframe to a Raspberry Pi, in a very similar way. The shell is often the only game in town for high performance computing (interacting with servers and supercomputers).

  5. Automation. Shells are programmable: Working in the shell allows you to program workflows, that is, to create scripts that automate time-consuming or repetitive processes.

  6. Become a marketable data scientist. Modern programming is often polyglot. The shell provides a common interface for tooling. Modern solutions are often built to run in containers on Linux. In this environment shell knowledge has become very valuable. In short, the shell is having a renaissance in the age of data science.
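
As a taste of the automation point, here is a minimal sketch that renames a batch of files with one loop. The report-*.txt files are placeholders created only for this example, and the loop syntax is a preview rather than something you need to know yet.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create three empty placeholder files.
touch report-1.txt report-2.txt report-3.txt

# Rename every .txt file to .md in one go.
# ${file%.txt} strips the .txt ending from the name.
for file in *.txt; do
  mv "$file" "${file%.txt}.md"
done

ls
renamed_count=$(ls *.md | wc -l | tr -d ' ')
echo "$renamed_count files renamed"
```

Doing the same thing by clicking would take three rename operations here, and three hundred for three hundred files; the loop stays the same length.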

The Unix philosophy

The Unix philosophy

The shell tools that we’re going to be using have their roots in the Unix family of operating systems originally developed at Bell Labs in the 1970s.

Besides paying homage, acknowledging the Unix lineage is important because these tools still embody the “Unix philosophy”:

Do One Thing And Do It Well

By pairing and chaining well-designed individual components, we can build powerful and much more complex larger systems.

You can see why the Unix philosophy is also referred to as “minimalist and modular”.

Again, this philosophy is very clearly expressed in the design and functionality of the Unix shell.
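
A small sketch of this chaining idea: five single-purpose tools are connected with pipes (the | character) to find the most frequent word in a list. The word list is made up for the example.

```shell
# Count how often each word occurs and keep the most frequent one.
# printf writes one word per line; sort groups equal lines together;
# uniq -c collapses each group into a count; sort -rn orders by count,
# largest first; head -n 1 keeps the top line; awk prints its second field.
top_word=$(printf '%s\n' pear apple pear fig apple pear \
  | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most frequent word: $top_word"
```

None of these tools knows anything about word counting; the result comes from how they are combined.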

Things to use the shell for

  • Navigating the file system
  • Version control with Git
  • Renaming and moving files
  • Finding things on your computer
  • Writing and running code
  • Installing and updating software
  • Monitoring system resources
  • Connecting to cloud environments
  • Running analyses (“jobs”) on supercomputers
  • … and much more!

Shell basics 🐚 🤓

Shell: First look

Let’s open up our shell!

A convenient way to do this is through VS Code’s built-in Terminal.

Click on the View menu, then Terminal. You can also use the shortcut Ctrl + ` (backtick).

Your system default shell is loaded. To find out what that is, type echo $SHELL in the terminal.

$ echo $SHELL
/bin/zsh

It is Z shell in my case

… what about you? It is your turn to find out!

Your turn!

Of course, it’s always possible to open up the shell directly if you prefer. It’s your turn!

Feel free to check our class tutorial on how to set up your shell in VS Code.

Open your terminal and type the following commands (without the $):

$ echo $SHELL 
$ whoami 
$ pwd 
$ mkdir new-folder
$ cd ..
$ ls
$ man ls # type 'j' to scroll down, 'k' to scroll up, 'q' to quit

Share your results with a colleague (or the class)!

Shell: First look

You should see something like:

username@hostname:~$

This is shell-speak for: “Who am I and where am I?”

  • username denotes a specific user (one of potentially many on this computer).

  • @hostname denotes the name of the computer or server.

  • :~ denotes the directory path (where ~ signifies the user’s home directory).

  • $ (or maybe %) denotes the start of the command prompt.

    • (For a special “superuser” called root, the dollar sign will change to a #).

$ whoami
dafreir
$ pwd
/Users/dafreir/Documents/github/qtm350/lectures/lecture-03

Syntax

Syntax

All Bash commands have the same basic syntax:

command option(s) argument(s)

Examples:

# list files in the Documents directory
# in long format with human-readable sizes
$ ls -lh ~/Documents

# sort the file and display only unique lines
$ sort -u file.txt

Commands

  • You don’t always need options or arguments

  • For example:

    • ls, ls ~/Documents/ and ls -lh ~/Documents are all valid commands that will yield (different) output
  • However, you always need a command.

Options (also called Flags)

  • Start with a dash. Usually one letter.

  • Multiple one-letter options can be chained under a single dash.

$ ls -l -a -h /var/log # This works
$ ls -lah /var/log # So does this
  • The exception is (rarer) long options, which start with two dashes and cannot be chained. The two below are GNU ls options, so they work on Linux but not with the ls that ships with macOS.
$ ls --group-directories-first --human-readable /var/log
  • The letters used in the examples:

    • -l (ls): Use a long listing format, which shows detailed information about the files and directories

    • -h (ls): With -l, print sizes in human-readable format (e.g., KB, MB)

    • -u (sort): Unique; filters out duplicate lines in the output

  • Think it’s difficult to memorize what the individual letters stand for? You’re totally right!
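
To convince yourself that separate and chained options behave the same, you can compare their output. This is a sketch in a temporary directory; the file names are invented for the example.

```shell
# Work in a throwaway directory with one visible and one hidden file.
workdir=$(mktemp -d)
cd "$workdir"
touch visible.txt .hidden.txt

# Separate flags and chained flags are parsed the same way.
# -a includes hidden entries, -1 prints one entry per line.
separate=$(ls -a -1)
chained=$(ls -a1)
[ "$separate" = "$chained" ] && echo "same output"

# sort -u keeps one copy of each distinct line.
unique_lines=$(printf '%s\n' b a b a | sort -u)
echo "$unique_lines"
```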

Arguments

  • Tell the command what to operate on.

  • Which inputs are valid depends entirely on the command.

  • Can be a file, path, a set of files and folders, a string, and more

  • Sometimes more than just one argument is needed:

$ mv figs/cat.png best-figs/cat02.png
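
The mv example can be tried safely in a temporary directory. This sketch first creates the folders and an empty stand-in for cat.png, since those files are assumed rather than provided.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Set up the two folders and an empty placeholder image.
mkdir figs best-figs
touch figs/cat.png

# mv takes two arguments here: the source path, then the destination path.
# Because the destination has a different name, the file is moved and renamed.
mv figs/cat.png best-figs/cat02.png

ls best-figs
```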

Help! 🆘 😟

Multiple ways to get help

  • The man tool can be used to look at the manual page for a topic.

  • The man pages are grouped into sections; running man man lists them.

  • The cht.sh website can be used directly from the shell to get help on tools. Run it like this: curl cht.sh/command (how to install curl).

  • Or you can use an LLM (or Google) to search for help on a command.
  • There’s a great chance that someone else has already asked the same question you have 😉

Multiple ways to get help

  • You can also install the tldr tool, which provides simplified help pages for common commands. Run it like this: tldr command
$ tldr ls

ls

List directory contents.
More information: <https://www.gnu.org/software/coreutils/manual/html_node/ls-invocation.html>.

- List files one per line:
    ls -1

- List [a]ll files, including hidden files:
    ls -a

- List files with a trailing symbol to indicate file type (directory/, symbolic_link@, executable*, ...):
    ls -F

- List [a]ll files in [l]ong format (permissions, ownership, size, and modification date):
    ls -la

- List files in [l]ong format with size displayed using [h]uman-readable units (KiB, MiB, GiB):
    ls -lh

- List files in [l]ong format, sorted by [S]ize (descending) [R]ecursively:
    ls -lSR

- List files in [l]ong format, sorted by [t]ime the file was modified and in [r]everse order (oldest first):
    ls -ltr

- Only list [d]irectories:
    ls -d */

  • For more info on how to get help, see here.

Getting help with man

The man command (“manual pages”) is your friend if you need help.

$ man ls
LS(1)                       General Commands Manual                      LS(1)

NAME
     ls – list directory contents

SYNOPSIS
     ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when]
        [-D format] [file ...]

DESCRIPTION
     For each operand that names a file of a type other than directory, ls
     displays its name as well as any requested, associated information.  For
     each operand that names a file of type directory, ls displays the names
     of files contained within that directory, as well as any requested,
     associated information.

     If no operands are given, the contents of the current directory are
     displayed.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted
     separately and in lexicographical order.

     The following options are available:

     -@      Display extended attribute keys and sizes in long (-l) output.

     -A      Include directory entries whose names begin with a dot (‘.’)
             except for . and ...  Automatically set for the super-user unless
             -I is specified.

     -B      Force printing of non-printable characters (as defined by
             ctype(3) and current locale settings) in file names as \xxx,
             where xxx is the numeric value of the character in octal.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -C      Force multi-column output; this is the default when output is to
             a terminal.

     -D format
             When printing in the long (-l) format, use format to format the
             date and time output.  The argument format is a string used by
             strftime(3).  Depending on the choice of format string, this may
             result in a different number of columns in the output.  This
             option overrides the -T option.  This option is not defined in
             IEEE Std 1003.1-2008 (“POSIX.1”).

     -F      Display a slash (‘/’) immediately after each pathname that is a
             directory, an asterisk (‘*’) after each that is executable, an at
             sign (‘@’) after each symbolic link, an equals sign (‘=’) after
             each socket, a percent sign (‘%’) after each whiteout, and a
             vertical bar (‘|’) after each that is a FIFO.

     -G      Enable colorized output.  This option is equivalent to defining
             CLICOLOR or COLORTERM in the environment and setting
             --color=auto.  (See below.)  This functionality can be compiled
             out by removing the definition of COLORLS.  This option is not
             defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -H      Symbolic links on the command line are followed.  This option is
             assumed if none of the -F, -d, or -l options are specified.

     -I      Prevent -A from being automatically set for the super-user.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -L      Follow all symbolic links to final target and list the file or
             directory the link references rather than the link itself.  This
             option cancels the -P option.

     -O      Include the file flags in a long (-l) output.  This option is
             incompatible with IEEE Std 1003.1-2008 (“POSIX.1”).  See
             chflags(1) for a list of file flags and their meanings.

     -P      If argument is a symbolic link, list the link itself rather than
             the object the link references.  This option cancels the -H and
             -L options.

     -R      Recursively list subdirectories encountered.

     -S      Sort by size (largest file first) before sorting the operands in
             lexicographical order.

     -T      When printing in the long (-l) format, display complete time
             information for the file, including month, day, hour, minute,
             second, and year.  The -D option gives even more control over the
             output format.  This option is not defined in IEEE Std
             1003.1-2008 (“POSIX.1”).

     -U      Use time when file was created for sorting or printing.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -W      Display whiteouts when scanning directories.  This option is not
             defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -X      When listing recursively, do not descend into directories that
             would cross file system boundaries.  More specifically, this
             option will prevent descending into directories that have a
             different device number.

     -a      Include directory entries whose names begin with a dot (‘.’).

     -b      As -B, but use C escape codes whenever possible.  This option is
             not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -c      Use time when file status was last changed for sorting or
             printing.

     --color=when
             Output colored escape sequences based on when, which may be set
             to either always, auto, or never.

             always will make ls always output color.  If TERM is unset or set
             to an invalid terminal, then ls will fall back to explicit ANSI
             escape sequences without the help of termcap(5).  always is the
             default if --color is specified without an argument.

             auto will make ls output escape sequences based on termcap(5),
             but only if stdout is a tty and either the -G flag is specified
             or the COLORTERM environment variable is set and not empty.

             never will disable color regardless of environment variables.
             never is the default when neither --color nor -G is specified.

             For compatibility with GNU coreutils, ls supports yes or force as
             equivalent to always, no or none as equivalent to never, and tty
             or if-tty as equivalent to auto.

     -d      Directories are listed as plain files (not searched recursively).

     -e      Print the Access Control List (ACL) associated with the file, if
             present, in long (-l) output.

     -f      Output is not sorted.  This option turns on -a.  It also negates
             the effect of the -r, -S and -t options.  As allowed by IEEE Std
             1003.1-2008 (“POSIX.1”), this option has no effect on the -d, -l,
             -R and -s options.

     -g      This option has no effect.  It is only available for
             compatibility with 4.3BSD, where it was used to display the group
             name in the long (-l) format output.  This option is incompatible
             with IEEE Std 1003.1-2008 (“POSIX.1”).

     -h      When used with the -l option, use unit suffixes: Byte, Kilobyte,
             Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
             number of digits to four or fewer using base 2 for sizes.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -i      For each file, print the file's file serial number (inode
             number).

     -k      This has the same effect as setting environment variable
             BLOCKSIZE to 1024, except that it also nullifies any -h options
             to its left.

     -l      (The lowercase letter “ell”.) List files in the long format, as
             described in the The Long Format subsection below.

     -m      Stream output format; list files across the page, separated by
             commas.

     -n      Display user and group IDs numerically rather than converting to
             a user or group name in a long (-l) output.  This option turns on
             the -l option.

     -o      List in long format, but omit the group id.

     -p      Write a slash (‘/’) after each filename if that file is a
             directory.

     -q      Force printing of non-graphic characters in file names as the
             character ‘?’; this is the default when output is to a terminal.

     -r      Reverse the order of the sort.

     -s      Display the number of blocks used in the file system by each
             file.  Block sizes and directory totals are handled as described
             in The Long Format subsection below, except (if the long format
             is not also requested) the directory totals are not output when
             the output is in a single column, even if multi-column output is
             requested.  (-l) format, display complete time information for
             the file, including month, day, hour, minute, second, and year.
             The -D option gives even more control over the output format.
             This option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -t      Sort by descending time modified (most recently modified first).
             If two files have the same modification timestamp, sort their
             names in ascending lexicographical order.  The -r option reverses
             both of these sort orders.

             Note that these sort orders are contradictory: the time sequence
             is in descending order, the lexicographical sort is in ascending
             order.  This behavior is mandated by IEEE Std 1003.2 (“POSIX.2”).
             This feature can cause problems listing files stored with
             sequential names on FAT file systems, such as from digital
             cameras, where it is possible to have more than one image with
             the same timestamp.  In such a case, the photos cannot be listed
             in the sequence in which they were taken.  To ensure the same
             sort order for time and for lexicographical sorting, set the
             environment variable LS_SAMESORT or use the -y option.  This
             causes ls to reverse the lexicographical sort order when sorting
             files with the same modification timestamp.

     -u      Use time of last access, instead of time of last modification of
             the file for sorting (-t) or long printing (-l).

     -v      Force unedited printing of non-graphic characters; this is the
             default when output is not to a terminal.

     -w      Force raw printing of non-printable characters.  This is the
             default when output is not to a terminal.  This option is not
             defined in IEEE Std 1003.1-2001 (“POSIX.1”).

     -x      The same as -C, except that the multi-column output is produced
             with entries sorted across, rather than down, the columns.

     -y      When the -t option is set, sort the alphabetical output in the
             same order as the time output.  This has the same effect as
             setting LS_SAMESORT.  See the description of the -t option for
             more details.  This option is not defined in IEEE Std 1003.1-2001
             (“POSIX.1”).

     -%      Distinguish dataless files and directories with a '%' character
             in long (-l) output, and don't materialize dataless directories
             when listing them.

     -1      (The numeric digit “one”.) Force output to be one entry per line.
             This is the default when output is not to a terminal.

     -,      (Comma) When the -l option is set, print file sizes grouped and
             separated by thousands using the non-monetary separator returned
             by localeconv(3), typically a comma or period.  If no locale is
             set, or the locale does not have a non-monetary separator, this
             option has no effect.  This option is not defined in IEEE Std
             1003.1-2001 (“POSIX.1”).

     The -1, -C, -x, and -l options all override each other; the last one
     specified determines the format used.

     The -c, -u, and -U options all override each other; the last one
     specified determines the file time used.

     The -S and -t options override each other; the last one specified
     determines the sort order used.

     The -B, -b, -w, and -q options all override each other; the last one
     specified determines the format used for non-printable characters.

     The -H, -L and -P options all override each other (either partially or
     fully); they are applied in the order specified.

     By default, ls lists one entry per line to standard output; the
     exceptions are to terminals or when the -C or -x options are specified.

     File information is displayed with one or more ⟨blank⟩s separating the
     information associated with the -i, -s, and -l options.

   The Long Format
     If the -l option is given, the following information is displayed for
     each file: file mode, number of links, owner name, group name, number of
     bytes in the file, abbreviated month, day-of-month file was last
     modified, hour file last modified, minute file last modified, and the
     pathname.  If the file or directory has extended attributes, the
     permissions field printed by the -l option is followed by a '@'
     character.  Otherwise, if the file or directory has extended security
     information (such as an access control list), the permissions field
     printed by the -l option is followed by a '+' character.  If the -%
     option is given, a '%' character follows the permissions field for
     dataless files and directories, possibly replacing the '@' or '+'
     character.

     If the modification time of the file is more than 6 months in the past or
     future, and the -D or -T are not specified, then the year of the last
     modification is displayed in place of the hour and minute fields.

     If the owner or group names are not a known user or group name, or the -n
     option is given, the numeric ID's are displayed.

     If the file is a character special or block special file, the device
     number for the file is displayed in the size field.  If the file is a
     symbolic link the pathname of the linked-to file is preceded by “->”.

     The listing of a directory's contents is preceded by a labeled total
     number of blocks used in the file system by the files which are listed as
     the directory's contents (which may or may not include . and .. and other
     files which start with a dot, depending on other options).

     The default block size is 512 bytes.  The block size may be set with
     option -k or environment variable BLOCKSIZE.  Numbers of blocks in the
     output will have been rounded up so the numbers of bytes is at least as
     many as used by the corresponding file system blocks (which might have a
     different size).

     The file mode printed under the -l option consists of the entry type and
     the permissions.  The entry type character describes the type of file, as
     follows:

           -     Regular file.
           b     Block special file.
           c     Character special file.
           d     Directory.
           l     Symbolic link.
           p     FIFO.
           s     Socket.
           w     Whiteout.

     The next three fields are three characters each: owner permissions, group
     permissions, and other permissions.  Each field has three character
     positions:

           1.   If r, the file is readable; if -, it is not readable.

           2.   If w, the file is writable; if -, it is not writable.

           3.   The first of the following that applies:

                      S     If in the owner permissions, the file is not
                            executable and set-user-ID mode is set.  If in the
                            group permissions, the file is not executable and
                            set-group-ID mode is set.

                      s     If in the owner permissions, the file is
                            executable and set-user-ID mode is set.  If in the
                            group permissions, the file is executable and
                            setgroup-ID mode is set.

                      x     The file is executable or the directory is
                            searchable.

                      -     The file is neither readable, writable,
                            executable, nor set-user-ID nor set-group-ID mode,
                            nor sticky.  (See below.)

                These next two apply only to the third character in the last
                group (other permissions).

                      T     The sticky bit is set (mode 1000), but not execute
                            or search permission.  (See chmod(1) or
                            sticky(7).)

                      t     The sticky bit is set (mode 1000), and is
                            searchable or executable.  (See chmod(1) or
                            sticky(7).)

     The next field contains a plus (‘+’) character if the file has an ACL, or
     a space (‘ ’) if it does not.  The ls utility does not show the actual
     ACL unless the -e option is used in conjunction with the -l option.

EENNVVIIRROONNMMEENNTT
     The following environment variables affect the execution of llss:

     BLOCKSIZE           If this is set, its value, rounded up to 512 or down
                         to a multiple of 512, will be used as the block size
                         in bytes by the --ll and --ss options.  See _T_h_e _L_o_n_g
                         _F_o_r_m_a_t subsection for more information.

     CLICOLOR            Use ANSI color sequences to distinguish file types.
                         See LSCOLORS below.  In addition to the file types
                         mentioned in the --FF option some extra attributes
                         (setuid bit set, etc.) are also displayed.  The
                         colorization is dependent on a terminal type with the
                         proper termcap(5) capabilities.  The default “cons25”
                         console has the proper capabilities, but to display
                         the colors in an xterm(1), for example, the TERM
                         variable must be set to “xterm-color”.  Other
                         terminal types may require similar adjustments.
                         Colorization is silently disabled if the output is
                         not directed to a terminal unless the CLICOLOR_FORCE
                         variable is defined or --color is set to “always”.

     CLICOLOR_FORCE      Color sequences are normally disabled if the output
                         is not directed to a terminal.  This can be
                         overridden by setting this variable.  The TERM
                         variable still needs to reference a color capable
                         terminal however otherwise it is not possible to
                         determine which color sequences to use.

     COLORTERM           See description for CLICOLOR above.

     COLUMNS             If this variable contains a string representing a
                         decimal integer, it is used as the column position
                         width for displaying multiple-text-column output.
                         The ls utility calculates how many pathname text
                         columns to display based on the width provided.  (See
                         -C and -x.)

     LANG                The locale to use when determining the order of day
                         and month in the long -l format output.  See
                         environ(7) for more information.

     LSCOLORS            The value of this variable describes what color to
                         use for which attribute when colors are enabled with
                         CLICOLOR or COLORTERM.  This string is a
                         concatenation of pairs of the format fb, where f is
                         the foreground color and b is the background color.

                         The color designators are as follows:

                               a     black
                               b     red
                               c     green
                               d     brown
                               e     blue
                               f     magenta
                               g     cyan
                               h     light grey
                               A     bold black, usually shows up as dark grey
                               B     bold red
                               C     bold green
                               D     bold brown, usually shows up as yellow
                               E     bold blue
                               F     bold magenta
                               G     bold cyan
                               H     bold light grey; looks like bright white
                               x     default foreground or background

                         Note that the above are standard ANSI colors.  The
                         actual display may differ depending on the color
                         capabilities of the terminal in use.

                         The order of the attributes are as follows:

                               1.   directory
                               2.   symbolic link
                               3.   socket
                               4.   pipe
                               5.   executable
                               6.   block special
                               7.   character special
                               8.   executable with setuid bit set
                               9.   executable with setgid bit set
                               10.  directory writable to others, with sticky
                                    bit
                               11.  directory writable to others, without
                                    sticky bit
                               12.  dataless file

                         The default is "exfxcxdxbxegedabagacadah", i.e., blue
                         foreground and default background for regular
                         directories, black foreground and red background for
                         setuid executables, etc.

     LS_COLWIDTHS        If this variable is set, it is considered to be a
                         colon-delimited list of minimum column widths.
                         Unreasonable and insufficient widths are ignored
                         (thus zero signifies a dynamically sized column).
                         Not all columns have changeable widths.  The fields
                         are, in order: inode, block count, number of links,
                         user name, group name, flags, file size, file name.

     LS_SAMESORT         If this variable is set, the -t option sorts the
                         names of files with the same modification timestamp
                         in the same sense as the time sort.  See the
                         description of the -t option for more details.

     TERM                The CLICOLOR and COLORTERM functionality depends on a
                         terminal type with color capabilities.

     TZ                  The timezone to use when displaying dates.  See
                         environ(7) for more information.

EXIT STATUS
     The ls utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     List the contents of the current working directory in long format:

           $ ls -l

     In addition to listing the contents of the current working directory in
     long format, show inode numbers, file flags (see chflags(1)), and suffix
     each filename with a symbol representing its file type:

           $ ls -lioF

     List the files in /var/log, sorting the output such that the most
     recently modified entries are printed first:

           $ ls -lt /var/log

COMPATIBILITY
     The group field is now automatically included in the long listing for
     files in order to be compatible with the IEEE Std 1003.2 (“POSIX.2”)
     specification.

LEGACY DESCRIPTION
     In legacy mode, the -f option does not turn on the -a option and the -g,
     -n, and -o options do not turn on the -l option.

     Also, the -o option causes the file flags to be included in a long (-l)
     output; there is no -O option.

     When -H is specified (and not overridden by -L or -P) and a file argument
     is a symlink that resolves to a non-directory file, the output will
     reflect the nature of the link, rather than that of the file.  In legacy
     operation, the output will describe the file.

     For more information about legacy mode, see compat(5).

SEE ALSO
     chflags(1), chmod(1), sort(1), xterm(1), localeconv(3), strftime(3),
     strmode(3), compat(5), termcap(5), sticky(7), symlink(7)

STANDARDS
     With the exception of options -g, -n and -o, the ls utility conforms to
     IEEE Std 1003.1-2001 (“POSIX.1”) and IEEE Std 1003.1-2008 (“POSIX.1”).
     The options -B, -D, -G, -I, -T, -U, -W, -Z, -b, -h, -w, -y and -, are
     non-standard extensions.

     The ACL support is compatible with IEEE Std 1003.2c (“POSIX.2c”) Draft 17
     (withdrawn).

HISTORY
     An ls command appeared in Version 1 AT&T UNIX.

BUGS
     To maintain backward compatibility, the relationships between the many
     options are quite complex.

     The exception mentioned in the -s option description might be a feature
     that was based on the fact that single-column output usually goes to
     something other than a terminal.  It is debatable whether this is a
     design bug.

     IEEE Std 1003.2 (“POSIX.2”) mandates opposite sort orders for files with
     the same timestamp when sorting with the -t option.

macOS 15.2                      August 31, 2020                     macOS 15.2
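
To make the LSCOLORS format described above concrete, here is a minimal sketch that pulls individual foreground/background pairs out of the default string quoted in the man page. The helper name lscolors_pair is made up for this illustration and is not part of ls.

```shell
# Default value quoted in the ls man page
LSCOLORS_DEFAULT="exfxcxdxbxegedabagacadah"

# Hypothetical helper: print the two-letter pair for attribute number $1,
# counting from 1 in the order listed in the man page (1 = directory)
lscolors_pair() {
  start=$(( ($1 - 1) * 2 + 1 ))
  end=$(( start + 1 ))
  printf '%s\n' "$LSCOLORS_DEFAULT" | cut -c"${start}-${end}"
}

lscolors_pair 1   # directory
lscolors_pair 8   # executable with setuid bit set
```

The first call prints ex (blue on the default background, for directories) and the second prints ab (black on red, for setuid executables), which matches the description of the default given in the man page.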

Getting help with man

Manual pages open in a pager inside the shell. Here are the essential keys for navigating it:

  • d - Scroll down half a page
  • u - Scroll up half a page
  • j / k - Scroll down or up a line. You can also use the arrow keys for this
  • q - Quit
  • /pattern - Search for text provided as “pattern”
  • n - When searching, find the next occurrence
  • N - When searching, find the previous occurrence
  • These and other man tricks are detailed in the help pages (hit “h” when you’re in the pager for an overview).

RTFM!

Always check the documentation!

man page explorer challenge

Partner up and choose a command from the list below. Use man to complete these tasks:

$ # Choose one (or all) of the following commands:
$ ls, cd, cp, mv, rm, mkdir, rmdir, touch, cat, find
  1. Summarise the command’s purpose in one sentence.
  2. Find an interesting option and explain what it does.
  3. Create an example using your command with at least two options.
  4. Bonus: Combine your command with your partner’s in a single line.

You have about 5 minutes. Be ready to share your findings!

Reflection: How was using man compared to online searches? How might you use it in future projects?

More navigation commands: A cheat sheet

  • ls (list): Show files and directories in the current directory
  • ls -l: Long listing format with detailed information
  • ls -a: Show hidden files (those starting with a dot)
  • ls -lh: Long listing format with human-readable sizes
  • ls -R: List subdirectories recursively
  • pwd (print working directory): Show the current directory path
  • cd (change directory): Change the current working directory
  • cd -: Go back to the previous directory
  • .: Refers to the current directory
  • ..: Refers to the parent directory
  • ~: Refers to the home directory
  • mkdir: Create a new directory
  • touch: Create a new empty file or update timestamps
  • cp: Copy files or directories
  • mv: Move or rename files or directories
  • rm: Remove files (use with caution!)
  • rmdir: Remove empty directories
  • cat: Display file contents
  • find: Search for files and directories
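
As a rough illustration, the sketch below strings several of the commands above together. It runs inside a throwaway directory created with mktemp (an assumption of this example, not something covered in the cheat sheet), and the file and directory names are made up.

```shell
# Work in a throwaway directory so none of your real files are touched
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir data                   # create a directory
touch notes.txt .hidden      # create one visible and one hidden file
cp notes.txt data/copy.txt   # copy a file into data/
mv notes.txt readme.txt      # rename a file

ls                           # shows data and readme.txt; .hidden is not listed
ls -a                        # also shows ., .. and .hidden
find . -name "*.txt"         # finds ./readme.txt and ./data/copy.txt

cd data                      # move into the subdirectory
cd -                         # jump back to the previous directory
pwd                          # confirm where we are

cd / && rm -r "$sandbox"     # leave the sandbox, then delete it
```

Note that rm -r deletes without asking and there is no undo, so double-check the path before running it on anything you care about.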


Shell navigation exercise

Follow these steps to practice using basic shell commands. Type each command and observe the results.

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “practice” and change into it: mkdir practice && cd practice
  3. Create two empty files called “file1.txt” and “file2.txt”: touch file1.txt file2.txt
  4. List the contents of the directory: ls
  5. Move file2.txt to a new name (rename), file3.txt: mv file2.txt file3.txt
  6. List the contents again to verify the change, then return to the home directory: ls && cd ~
  7. Remove the “practice” directory and its contents: rm -r practice
  8. Verify that the directory has been removed: ls
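
Steps 2 and 6 chain two commands with &&. The shell runs the command on the right only if the one on the left succeeded. Here is a small sketch of that behaviour, run in a temporary directory created with mktemp (an assumption of this example, not part of the exercise):

```shell
sandbox=$(mktemp -d)   # throwaway directory for the demonstration
cd "$sandbox"

# mkdir succeeds, so cd runs as well
mkdir practice && cd practice
basename "$(pwd)"      # prints: practice

cd ..

# "practice" already exists, so mkdir fails and echo never runs
mkdir practice 2>/dev/null && echo "created again"
result=$?              # non-zero, because mkdir failed

cd / && rm -r "$sandbox"   # clean up
```

This is why mkdir practice && cd practice is safer than typing the two commands blindly one after the other: if the directory cannot be created, the shell does not try to change into it.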

Shell navigation exercise to try at home

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “shell-practice”: mkdir shell-practice
  3. Change into the new directory: cd shell-practice
  4. Create three empty files called file1.txt, file2.txt, and file3.txt: touch file1.txt file2.txt file3.txt
  5. List the contents of the directory: ls
  6. Create a subdirectory called “subdir”: mkdir subdir
  7. Move file2.txt into the subdirectory: mv file2.txt subdir/
  8. Copy file1.txt to a new file called file4.txt: cp file1.txt file4.txt
  9. List the contents of the current directory and the subdirectory: ls -R
  10. Change to the parent directory: cd ..
  11. Remove the entire shell-practice directory and its contents: rm -r shell-practice
  12. Verify that the directory has been removed: ls

Bonus: Try using the man command to learn more about any of the commands you’ve used.

  • Were there any commands that surprised you?
  • Which commands did you find most useful?

Summary

Today we…

  • Explored the command line’s role in data science and programming
  • Discussed the Unix philosophy and the significance of the shell
  • Covered basic shell commands like pwd, ls, and cd for file system navigation
  • Introduced special symbols such as ~, ., and .. for directory navigation
  • Practiced executing these commands in the shell environment

Next class

  • We will learn a bit more about the command line 🤓
  • More specifically, we will learn about shell scripting to automate tasks and how to deal with text files
  • And after that, we will see how to use Git and GitHub for version control and collaboration

Questions? 😉

Thank you very much and see you next class! 😊 🙏🏼