QTM 350 - Data Science Computing

Lecture 03: Command Line Interface

Danilo Freire

Department of Data and Decision Sciences
Emory University

Recap and lecture overview 📚

Brief recap of last class

Early computing and data representation

  • Computers evolved from people to mechanical calculators to silicon-based machines
  • Modern computers use the Von Neumann architecture, storing both instructions and data in memory
  • Computers represent data using binary (base 2) numbers made up of 0s and 1s
  • A bit is a single binary digit; 8 bits make a byte
  • Hexadecimal (base 16) is a compact way to represent binary, with each hex digit corresponding to 4 bits
  • Abstraction allows representing complex data like images and text using numbers
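
The link between decimal, hexadecimal and binary can be checked directly in a terminal. The sketch below assumes a POSIX-style shell such as Bash or Zsh; to_binary is a helper written for this example, not a standard command.

```sh
# Convert a non-negative decimal integer to binary by repeated division by 2
# (to_binary is an illustrative helper, not a built-in)
to_binary() {
    n=$1
    bits=""
    if [ "$n" -eq 0 ]; then bits=0; fi
    while [ "$n" -gt 0 ]; do
        bits="$((n % 2))$bits"   # prepend the remainder (0 or 1)
        n=$((n / 2))
    done
    echo "$bits"
}

printf '%d\n' 0xFF   # hexadecimal FF written as a decimal number
printf '%X\n' 255    # decimal 255 written in hexadecimal
to_binary 255        # decimal 255 written in binary
to_binary 10
```

Two hex digits (FF) correspond to eight bits (11111111), which is exactly one byte.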

Brief recap of last class

Representing images, colours and text

  • Images can be broken down into a grid of coloured pixels
  • Colours are represented using the RGB model, with each colour channel (red, green, blue) ranging from 0-255
  • With 8 bits per channel, each channel has 256 levels, so the three channels together (24 bits) allow for over 16 million possible colours
  • Text is broken into individual characters, with each character mapped to a number using an encoding like ASCII
  • ASCII is a simple lookup table mapping the numbers 0-127 to characters; later 8-bit extensions use the range 0-255
  • Unicode extends ASCII to support accented characters and symbols from all languages
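
Both ideas from this recap (colours as three numbers, characters as numbers) can be tried out with printf. This is a minimal sketch assuming a POSIX-style shell; rgb_to_hex is a helper name made up for the example.

```sh
# Pack three 0-255 channel values into a hex colour code
# (rgb_to_hex is an illustrative helper, not a standard command)
rgb_to_hex() {
    printf '#%02X%02X%02X\n' "$1" "$2" "$3"
}

rgb_to_hex 255 165 0         # an orange tone
echo $((256 * 256 * 256))    # number of colours with 256 levels per channel

# A leading quote tells printf to print the numeric code of the next character
printf '%d\n' "'A"
```

Each channel takes two hex digits, which is why colour codes such as #FFA500 are six digits long.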

Brief recap of last class

Programming languages

  • Konrad Zuse created the first programmable computers and high-level programming language in the 1940s
  • Assembly allows writing human-readable instructions that map closely to machine code
  • High-level languages like Python abstract away hardware details and are more portable across systems
  • Low-level languages are harder to read and write but very fast and efficient
  • Compiled languages are converted to machine code before execution; interpreted languages are executed on the fly

Today’s lecture

Command line: the old school way of interacting with computers

  • Today, we will learn about the command line, a text-based interface to interact with computers
  • We will learn some basic commands to navigate the file system, create and delete files, and run programs
  • We will also learn about shell scripting, a way to automate tasks using the command line
  • The command line is still widely used in data science and programming, especially for remote servers, cloud computing, and automation

Questions? 🤓

What is the command line? 💻

A computer in a nutshell

Operating system

Credit Dave Kerr

  • The operating system (OS) is system software that interfaces with (and manages access to) a computer’s hardware. It also provides software resources
  • The OS is divided into the kernel and user space
  • The kernel is the core of the OS. It’s responsible for interfacing with hardware (drivers), managing resources etc. Running software in the kernel is extremely sensitive! That’s why users are kept away from it!
  • Curiosity: You can see the Linux kernel source code on GitHub
  • The user space provides an interface for users, who can run programs/applications on the machine. Programs' access to hardware (e.g., memory usage) is managed by the kernel. Programs in user space are essentially sandboxed, which limits how much damage they can do.

A computer in a nutshell

Kernels and shells

  • The shell is just a general name for any user space program that allows access to resources in the system, via some kind of interface
  • Shells come in many different flavours but are generally provided to aid a human operator in accessing the system. This could be interactively, by typing at a terminal, or via scripts, which are files that contain a sequence of commands
  • Modern computers use graphical user interfaces (GUIs) as the standard tool for human-computer interaction
  • Why “kernel” and “shell”? The kernel is the soft, edible part of a nut or seed, which is surrounded by a shell to protect it. Useful metaphor, innit?

Interacting with the shell

Terminals

Credit Dave Kerr

  • Things are still a bit more complicated
  • We’re not directly interacting with the “shell” but using a terminal
  • A terminal is just a program that reads input from the keyboard, passes that input to another programme, and displays the results on the screen
  • A shell program on its own does not do this - it requires a terminal as an interface
  • Why “terminal”? Back in the old days (before computer screens existed), terminal machines (hardware!) were used to let humans interface with large machines (“mainframes”). Often many terminals were connected to a single machine
  • When you want to work with a computer in a data center (or remotely in cloud computing), you’ll still do pretty much the same

Interacting with the shell

Command line

Credit Dave Kerr

  • Terminals are really quite simple - they’re just text-based interfaces

  • The first thing that a terminal will do is run a shell – a program we can use to operate the computer

  • Back to the shell: the shell usually takes input

    • Interactively from the user via the terminal’s command line
    • From scripts, which it executes without a command line
  • In interactive mode the shell then returns output

    • To the terminal where it is printed/shown
    • To files or other locations
  • The command line represents what is shown and entered in the terminal. It can be customised (e.g., with colour highlighting) to make interaction more convenient
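
The two input modes and two output destinations described above can be sketched in a few lines. This example assumes mktemp is available (it is on macOS and most Linux systems); the file names hello.sh and output.txt are made up for illustration.

```sh
workdir=$(mktemp -d)    # temporary directory, so nothing else is touched

# A script is just a file containing a sequence of commands
cat > "$workdir/hello.sh" <<'EOF'
echo "Hello from a script"
echo "Arguments received: $#"
EOF

# Non-interactive input: the shell reads its commands from the file.
# Output goes to the terminal by default
sh "$workdir/hello.sh" one two

# Same script, but the output is redirected to a file instead
sh "$workdir/hello.sh" one two > "$workdir/output.txt"
saved=$(cat "$workdir/output.txt")

rm -r "$workdir"        # clean up
```

The second run prints nothing to the terminal because the > operator sends the output to output.txt.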

Shell variants

Bash, Zsh, and others

  • It is important to note that there are many different shell programs, and they differ in terms of functionality
  • On most Linux systems, the default shell is a program called bash, which stands for “Bourne-Again Shell”
  • Other examples are the Z Shell (or zsh; default on macOS), Windows Command Prompt (cmd.exe, the default CLI on MS Windows), Windows PowerShell, C Shell, and many more
  • When a terminal opens, it will immediately start the user’s preferred shell program. (This can be changed.)

Why bother with the shell? 🤷🏻‍♂️

Why bother with the shell?

Why should you use this…

… instead of this?

Why bother with the shell?

The programmer’s best friend

  1. Speed. Typing is fast: A skilled shell user can manipulate a system at dazzling speeds just using a keyboard. Typing commands is generally much faster than exploring through user interfaces with a mouse.

  2. Power. Both for executing commands and for fixing problems. There are some things you just can’t do in an IDE or GUI. It also avoids memory complications associated with certain applications and/or IDEs.

  3. Reproducibility. Scripting is reproducible, while clicking is not.

  4. Portability. A shell can be used to interface with almost any type of computer, from a mainframe to a Raspberry Pi, in a very similar way. The shell is often the only game in town for high performance computing (interacting with servers and supercomputers).

  5. Automation. Shells are programmable: working in the shell allows you to program workflows, that is, to create scripts that automate time-consuming or repetitive processes.

  6. Become a marketable data scientist. Modern programming is often polyglot. The shell provides a common interface for tooling. Modern solutions are often built to run in containers on Linux. In this environment shell knowledge has become very valuable. In short, the shell is having a renaissance in the age of data science.
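
To make the automation and reproducibility points concrete, here is a small sketch of a task that is tedious with a mouse: replacing spaces with underscores in many file names. It assumes a POSIX-style shell with mktemp; the survey file names are invented for the example.

```sh
project=$(mktemp -d)    # throwaway directory with some example files
touch "$project/survey 1.csv" "$project/survey 2.csv" "$project/survey 3.csv"

# Replace spaces with underscores in every CSV file name
for path in "$project"/*.csv; do
    name=$(basename "$path")
    new_name=$(printf '%s' "$name" | tr ' ' '_')
    mv "$path" "$project/$new_name"
done

renamed=$(ls "$project")
echo "$renamed"

rm -r "$project"        # clean up
```

The same loop works unchanged for three files or three thousand, and rerunning the script repeats exactly the same steps.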

The Unix philosophy

The Unix philosophy

The shell tools that we’re going to be using have their roots in the Unix family of operating systems originally developed at Bell Labs in the 1970s.

Besides paying homage, acknowledging the Unix lineage is important because these tools still embody the “Unix philosophy”:

Do One Thing And Do It Well

By pairing and chaining well-designed individual components, we can build powerful and much more complex larger systems.

You can see why the Unix philosophy is also referred to as “minimalist and modular”.

Again, this philosophy is very clearly expressed in the design and functionality of the Unix shell.
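
As an illustration of chaining small tools, the sketch below finds the most frequent word in a line of text. Every program in the pipeline does one job and passes its output to the next through the | operator. The input text is made up for the example.

```sh
# printf produces the text, tr puts each word on its own line,
# sort groups identical words, uniq -c counts each group,
# sort -rn ranks the counts (largest first), head keeps the top line,
# and awk prints only the word itself
top_word=$(printf 'data shell data pipe shell data\n' \
    | tr ' ' '\n' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n 1 \
    | awk '{print $2}')

echo "$top_word"
```

None of these tools knows anything about word frequencies; the behaviour comes entirely from how they are combined.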

Things to use the shell for

  • Navigating the file system
  • Version control with Git
  • Renaming and moving files
  • Finding things on your computer
  • Writing and running code
  • Installing and updating software
  • Monitoring system resources
  • Connecting to cloud environments
  • Running analyses (“jobs”) on supercomputers
  • … and much more!

Shell basics 🐚 🤓

Shell: First look

Let’s open up our shell!

A convenient way to do this is through VS Code’s built-in Terminal.

Click on the View menu, then Terminal. You can also use the shortcut Ctrl + ` (backtick).

Your system default shell is loaded. To find out what that is, type echo $SHELL in the terminal.

$ echo $SHELL
/bin/zsh

It is Z shell in my case

… what about you? It is your turn to find out! 🤓

Your turn!

Of course, it’s always possible to open up the shell directly if you prefer. It’s your turn!

Feel free to check our class tutorial on how to set up your shell in VS Code.

Open your terminal and type the following commands (without the $):

$ echo $SHELL 
$ whoami 
$ pwd 
$ mkdir new-folder
$ cd ..
$ ls
$ man ls # type 'j' to scroll down, 'k' to scroll up, 'q' to quit

Share your results with a colleague (or the class)!
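
If you would rather experiment without leaving stray folders behind, the navigation commands can be tried inside a throwaway directory. This is a sketch assuming mktemp is available; the variable names are chosen for the example.

```sh
start_dir=$(pwd)               # remember where we started
sandbox=$(mktemp -d)           # throwaway directory

cd "$sandbox"
mkdir new-folder               # create a directory
cd new-folder                  # move into it
inside=$(basename "$(pwd)")    # name of the directory we are now in
cd ..                          # go back up one level
listing=$(ls)                  # new-folder is the only entry here

echo "$inside"
echo "$listing"

cd "$start_dir"                # return to the starting point
rm -r "$sandbox"               # clean up
```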

Shell: First look

You should see something like:

username@hostname:~$

This is shell-speak for: “Who am I and where am I?”

  • username denotes a specific user (one of potentially many on this computer).

  • @hostname denotes the name of the computer or server.

  • :~ denotes the directory path (where ~ signifies the user’s home directory).

  • $ (or maybe %) denotes the start of the command prompt.

    • (For a special “superuser” called root, the dollar sign will change to a #).

$ whoami
dafreir
$ pwd
/Users/dafreir/Documents/github/qtm350/lectures/lecture-03
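
The pieces of the prompt can be reassembled by hand, which shows where each part comes from. This is only a sketch: real prompts are configured through shell settings rather than built like this, and the variable names are chosen for the example.

```sh
user=$(id -un)      # user name; the same answer as whoami on most systems
host=$(uname -n)    # network name of the machine
dir=$(pwd)          # current working directory

# Abbreviate the home directory to ~, as most prompts do
if [ -n "$HOME" ]; then
    case "$dir" in
        "$HOME")   dir='~' ;;
        "$HOME"/*) rest=${dir#"$HOME"}; dir="~$rest" ;;
    esac
fi

prompt="${user}@${host}:${dir}\$"
echo "$prompt"
```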

Syntax

Syntax

Most Bash commands follow the same basic syntax:

command option(s) argument(s)

Examples:

$ # list files in the Documents directory
$ # in long format and human-readable sizes
$ ls -lh ~/Documents

$ # sort the file and display only unique lines
$ sort -u file.txt

Commands

  • You don’t always need options or arguments

  • For example:

    • ls ~/Documents/ and ls -lh ~/Documents are both valid commands that will yield (different) output
  • However, you always need a command.
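
The three forms (command only, command with an argument, command with an option and an argument) can be compared side by side. The sketch below assumes mktemp is available; the file names are invented for the example.

```sh
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.hidden"

# Command only: ls lists the current directory
plain=$(cd "$demo" && ls)

# Command + argument: ls lists the directory it is given
with_argument=$(ls "$demo")

# Command + option + argument: -a also shows names starting with a dot
with_option=$(ls -a "$demo")

echo "$plain"
echo "$with_option"

rm -r "$demo"    # clean up
```

The first two calls give the same listing; only the third one reveals .hidden.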

Options (also called Flags)

  • Start with a dash. Usually one letter.

  • Multiple options can be chained under a single dash.

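$ ls -l -a -h /var/log # This works
$ ls -lah /var/log # So does this

  • An exception: long options, which are spelled out as words, start with two dashes and cannot be chained under a single dash. The example below assumes GNU ls (the default on Linux); the ls that ships with macOS does not accept these two options.

$ ls --group-directories-first --human-readable /var/log

  • l: Use a long listing format. This option shows detailed information about the files and directories

  • a: Show all files, including hidden files (those starting with a dot)

  • h: With -l, print sizes in human-readable format (e.g., KB, MB)

  • u: With sort, keep only unique lines by filtering out duplicate entries. (With ls, -u means something different: it uses the time of last access.)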

  • Think it’s difficult to memorise what the individual letters stand for? You’re totally right!
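
A quick way to convince yourself that chained and separate flags are equivalent is to compare their output. The sketch below assumes mktemp is available; letters.txt is a made-up file.

```sh
demo=$(mktemp -d)
printf 'b\na\nb\n' > "$demo/letters.txt"

# Separate flags and chained flags are parsed the same way
separate=$(ls -l -a -h "$demo")
chained=$(ls -lah "$demo")
[ "$separate" = "$chained" ] && echo "same output"

# The same letter can mean different things in different commands:
# sort -u keeps one copy of each line
unique_lines=$(sort -u "$demo/letters.txt")
echo "$unique_lines"

rm -r "$demo"    # clean up
```

Because option letters are defined by each command separately, checking the manual for the command at hand is safer than relying on memory.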

Arguments

  • Tell the command what to operate on.

  • Which inputs are valid depends entirely on the command.

  • Can be a file, path, a set of files and folders, a string, and more

  • Sometimes more than just one argument is needed:

$ mv figs/cat.png best-figs/cat02.png
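
The mv example can be reproduced safely inside a temporary directory. This sketch assumes mktemp is available and creates an empty placeholder file in place of a real image.

```sh
demo=$(mktemp -d)
mkdir "$demo/figs" "$demo/best-figs"
touch "$demo/figs/cat.png"    # empty placeholder standing in for an image

# First argument: the source; second argument: the destination
mv "$demo/figs/cat.png" "$demo/best-figs/cat02.png"

remaining=$(ls "$demo/figs")     # nothing is left in figs
moved=$(ls "$demo/best-figs")    # the file now lives here under its new name
echo "$moved"

rm -r "$demo"    # clean up
```

Note that a single mv call both moved the file to another directory and renamed it.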

Help! 🆘 😟

Multiple ways to get help

  • The man tool can be used to look at the manual page for a topic.

  • The man pages are grouped into sections; we can see them with man man.

  • The cht.sh website can be used directly from the shell to get help on tools. Run it like this: curl cht.sh/command (this requires curl to be installed).

  • Or you can use an LLM (or Google) to search for help on a command.
  • There’s a great chance that someone else has already asked the same question you have 😉
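
Not every system has manual pages installed (minimal servers and containers often do not), so it can help to have a fallback. The sketch below tries man first and then falls back to the --help option, which many tools support by convention but not all. help_for is a helper name invented for this example.

```sh
# Show help for a command: use its manual page if there is one,
# otherwise ask the command itself via --help
# (help_for is an illustrative helper, not a standard command)
help_for() {
    man "$1" 2>/dev/null || "$1" --help 2>&1
}

# Print only the first few lines to keep the output short
help_for ls | head -n 5
```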

Multiple ways to get help

  • You can also install the tldr tool, which provides simplified help pages for common commands. Run it like this: tldr command

$ tldr ls
Local database is older than two weeks, attempting to update it...
To prevent automatic updates, set the environment variable TLDR_AUTO_UPDATE_DISABLED.
Successfully updated local database


ls

List directory contents.
More information: <https://www.gnu.org/software/coreutils/manual/html_node/ls-invocation.html>.

- List files one per line:
    ls -1

- List all files, including hidden files:
    ls [-a|--all]

- List files with a trailing symbol to indicate file type (directory/, symbolic_link@, executable*, ...):
    ls [-F|--classify]

- List all files in [l]ong format (permissions, ownership, size, and modification date):
    ls [-la|-l --all]

- List files in [l]ong format with size displayed using human-readable units (KiB, MiB, GiB):
    ls [-lh|-l --human-readable]

- List files in [l]ong format, sorted by [S]ize (descending) recursively:
    ls [-lSR|-lS --recursive]

- List files in [l]ong format, sorted by [t]ime the file was modified and in reverse order (oldest first):
    ls [-ltr|-lt --reverse]

- Only list directories:
    ls [-d|--directory] */

  • For more info on how to get help, see here.

Getting help with man

The man command (“manual pages”) is your friend if you need help.

$ man ls
LS(1)               General Commands Manual          LS(1)

NAME
     ls – list directory contents

SYNOPSIS
     ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when]
    [-D format] [file ...]

DESCRIPTION
     For each operand that names a file of a type other than directory, ls
     displays its name as well as any requested, associated information.  For
     each operand that names a file of type directory, ls displays the names
     of files contained within that directory, as well as any requested,
     associated information.

     If no operands are given, the contents of the current directory are
     displayed.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted
     separately and in lexicographical order.

     The following options are available:

     -@      Display extended attribute keys and sizes in long (-l) output.

     -A      Include directory entries whose names begin with a dot (‘.’)
         except for . and ...  Automatically set for the super-user unless
         -I is specified.

     -B      Force printing of non-printable characters (as defined by
         ctype(3) and current locale settings) in file names as \xxx,
         where xxx is the numeric value of the character in octal.  This
         option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -C      Force multi-column output; this is the default when output is to
         a terminal.

     -D format
         When printing in the long (-l) format, use format to format the
         date and time output.  The argument format is a string used by
         strftime(3).  Depending on the choice of format string, this may
         result in a different number of columns in the output.  This
         option overrides the -T option.  This option is not defined in
         IEEE Std 1003.1-2008 (“POSIX.1”).

     -F      Display a slash (‘/’) immediately after each pathname that is a
         directory, an asterisk (‘*’) after each that is executable, an at
         sign (‘@’) after each symbolic link, an equals sign (‘=’) after
         each socket, a percent sign (‘%’) after each whiteout, and a
         vertical bar (‘|’) after each that is a FIFO.

     -G      Enable colorized output.  This option is equivalent to defining
         CLICOLOR or COLORTERM in the environment and setting
         --color=auto.  (See below.)  This functionality can be compiled
         out by removing the definition of COLORLS.  This option is not
         defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -H      Symbolic links on the command line are followed.  This option is
         assumed if none of the -F, -d, or -l options are specified.

     -I      Prevent -A from being automatically set for the super-user.  This
         option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -L      Follow all symbolic links to final target and list the file or
         directory the link references rather than the link itself.  This
         option cancels the -P option.

     -O      Include the file flags in a long (-l) output.  This option is
         incompatible with IEEE Std 1003.1-2008 (“POSIX.1”).  See
         chflags(1) for a list of file flags and their meanings.

     -P      If argument is a symbolic link, list the link itself rather than
         the object the link references.  This option cancels the -H and
         -L options.

     -R      Recursively list subdirectories encountered.

     -S      Sort by size (largest file first) before sorting the operands in
         lexicographical order.

     -T      When printing in the long (-l) format, display complete time
         information for the file, including month, day, hour, minute,
         second, and year.  The -D option gives even more control over the
         output format.  This option is not defined in IEEE Std
         1003.1-2008 (“POSIX.1”).

     -U      Use time when file was created for sorting or printing.  This
         option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -W      Display whiteouts when scanning directories.  This option is not
         defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -X      When listing recursively, do not descend into directories that
         would cross file system boundaries.  More specifically, this
         option will prevent descending into directories that have a
         different device number.

     -a      Include directory entries whose names begin with a dot (‘.’).

     -b      As -B, but use C escape codes whenever possible.  This option is
         not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -c      Use time when file status was last changed for sorting or
         printing.

     --color=when
         Output colored escape sequences based on when, which may be set
         to either always, auto, or never.

         always will make ls always output color.  If TERM is unset or set
         to an invalid terminal, then ls will fall back to explicit ANSI
         escape sequences without the help of termcap(5).  always is the
         default if --color is specified without an argument.

         auto will make ls output escape sequences based on termcap(5),
         but only if stdout is a tty and either the -G flag is specified
         or the COLORTERM environment variable is set and not empty.

         never will disable color regardless of environment variables.
         never is the default when neither --color nor -G is specified.

         For compatibility with GNU coreutils, ls supports yes or force as
         equivalent to always, no or none as equivalent to never, and tty
         or if-tty as equivalent to auto.

     -d      Directories are listed as plain files (not searched recursively).

     -e      Print the Access Control List (ACL) associated with the file, if
         present, in long (-l) output.

     -f      Output is not sorted.  This option turns on -a.  It also negates
         the effect of the -r, -S and -t options.  As allowed by IEEE Std
         1003.1-2008 (“POSIX.1”), this option has no effect on the -d, -l,
         -R and -s options.

     -g      This option has no effect.  It is only available for
         compatibility with 4.3BSD, where it was used to display the group
         name in the long (-l) format output.  This option is incompatible
         with IEEE Std 1003.1-2008 (“POSIX.1”).

     -h      When used with the -l option, use unit suffixes: Byte, Kilobyte,
         Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
         number of digits to four or fewer using base 2 for sizes.  This
         option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -i      For each file, print the file's file serial number (inode
         number).

     -k      This has the same effect as setting environment variable
         BLOCKSIZE to 1024, except that it also nullifies any -h options
         to its left.

     -l      (The lowercase letter “ell”.) List files in the long format, as
         described in the The Long Format subsection below.

     -m      Stream output format; list files across the page, separated by
         commas.

     -n      Display user and group IDs numerically rather than converting to
         a user or group name in a long (-l) output.  This option turns on
         the -l option.

     -o      List in long format, but omit the group id.

     -p      Write a slash (‘/’) after each filename if that file is a
         directory.

     -q      Force printing of non-graphic characters in file names as the
         character ‘?’; this is the default when output is to a terminal.

     -r      Reverse the order of the sort.

     -s      Display the number of blocks used in the file system by each
         file.  Block sizes and directory totals are handled as described
         in The Long Format subsection below, except (if the long format
         is not also requested) the directory totals are not output when
         the output is in a single column, even if multi-column output is
         requested.  (-l) format, display complete time information for
         the file, including month, day, hour, minute, second, and year.
         The -D option gives even more control over the output format.
         This option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     -t      Sort by descending time modified (most recently modified first).
         If two files have the same modification timestamp, sort their
         names in ascending lexicographical order.  The -r option reverses
         both of these sort orders.

         Note that these sort orders are contradictory: the time sequence
         is in descending order, the lexicographical sort is in ascending
         order.  This behavior is mandated by IEEE Std 1003.2 (“POSIX.2”).
         This feature can cause problems listing files stored with
         sequential names on FAT file systems, such as from digital
         cameras, where it is possible to have more than one image with
         the same timestamp.  In such a case, the photos cannot be listed
         in the sequence in which they were taken.  To ensure the same
         sort order for time and for lexicographical sorting, set the
         environment variable LS_SAMESORT or use the -y option.  This
         causes ls to reverse the lexicographical sort order when sorting
         files with the same modification timestamp.

     -u      Use time of last access, instead of time of last modification of
         the file for sorting (-t) or long printing (-l).

     -v      Force unedited printing of non-graphic characters; this is the
         default when output is not to a terminal.

     -w      Force raw printing of non-printable characters.  This is the
         default when output is not to a terminal.  This option is not
         defined in IEEE Std 1003.1-2001 (“POSIX.1”).

     -x      The same as -C, except that the multi-column output is produced
         with entries sorted across, rather than down, the columns.

     -y      When the -t option is set, sort the alphabetical output in the
         same order as the time output.  This has the same effect as
         setting LS_SAMESORT.  See the description of the -t option for
         more details.  This option is not defined in IEEE Std 1003.1-2001
         (“POSIX.1”).

     -%      Distinguish dataless files and directories with a '%' character
         in long (-l) output, and don't materialize dataless directories
         when listing them.

     -1      (The numeric digit “one”.) Force output to be one entry per line.
         This is the default when output is not to a terminal.

     -,      (Comma) When the -l option is set, print file sizes grouped and
         separated by thousands using the non-monetary separator returned
         by localeconv(3), typically a comma or period.  If no locale is
         set, or the locale does not have a non-monetary separator, this
         option has no effect.  This option is not defined in IEEE Std
         1003.1-2001 (“POSIX.1”).

     The -1, -C, -x, and -l options all override each other; the last one
     specified determines the format used.

     The -c, -u, and -U options all override each other; the last one
     specified determines the file time used.

     The -S and -t options override each other; the last one specified
     determines the sort order used.

     The -B, -b, -w, and -q options all override each other; the last one
     specified determines the format used for non-printable characters.

     The -H, -L and -P options all override each other (either partially or
     fully); they are applied in the order specified.

     By default, ls lists one entry per line to standard output; the
     exceptions are to terminals or when the -C or -x options are specified.

     File information is displayed with one or more ⟨blank⟩s separating the
     information associated with the -i, -s, and -l options.

   The Long Format
     If the -l option is given, the following information is displayed for
     each file: file mode, number of links, owner name, group name, number of
     bytes in the file, abbreviated month, day-of-month file was last
     modified, hour file last modified, minute file last modified, and the
     pathname.  If the file or directory has extended attributes, the
     permissions field printed by the -l option is followed by a '@'
     character.  Otherwise, if the file or directory has extended security
     information (such as an access control list), the permissions field
     printed by the -l option is followed by a '+' character.  If the -%
     option is given, a '%' character follows the permissions field for
     dataless files and directories, possibly replacing the '@' or '+'
     character.

     If the modification time of the file is more than 6 months in the past or
     future, and the -D or -T are not specified, then the year of the last
     modification is displayed in place of the hour and minute fields.

     If the owner or group names are not a known user or group name, or the -n
     option is given, the numeric ID's are displayed.

     If the file is a character special or block special file, the device
     number for the file is displayed in the size field.  If the file is a
     symbolic link the pathname of the linked-to file is preceded by “->”.

     The listing of a directory's contents is preceded by a labeled total
     number of blocks used in the file system by the files which are listed as
     the directory's contents (which may or may not include . and .. and other
     files which start with a dot, depending on other options).

     The default block size is 512 bytes.  The block size may be set with
     option -k or environment variable BLOCKSIZE.  Numbers of blocks in the
     output will have been rounded up so the numbers of bytes is at least as
     many as used by the corresponding file system blocks (which might have a
     different size).

     The file mode printed under the -l option consists of the entry type and
     the permissions.  The entry type character describes the type of file, as
     follows:

       -     Regular file.
       b     Block special file.
       c     Character special file.
       d     Directory.
       l     Symbolic link.
       p     FIFO.
       s     Socket.
       w     Whiteout.

     The next three fields are three characters each: owner permissions, group
     permissions, and other permissions.  Each field has three character
     positions:

       1.   If r, the file is readable; if -, it is not readable.

       2.   If w, the file is writable; if -, it is not writable.

       3.   The first of the following that applies:

              S     If in the owner permissions, the file is not
                executable and set-user-ID mode is set.  If in the
                group permissions, the file is not executable and
                set-group-ID mode is set.

              s     If in the owner permissions, the file is
                executable and set-user-ID mode is set.  If in the
                group permissions, the file is executable and
                setgroup-ID mode is set.

              x     The file is executable or the directory is
                searchable.

              -     The file is neither readable, writable,
                executable, nor set-user-ID nor set-group-ID mode,
                nor sticky.  (See below.)

        These next two apply only to the third character in the last
        group (other permissions).

              T     The sticky bit is set (mode 1000), but not execute
                or search permission.  (See chmod(1) or
                sticky(7).)

              t     The sticky bit is set (mode 1000), and is
                searchable or executable.  (See chmod(1) or
                sticky(7).)

     The next field contains a plus (‘+’) character if the file has an ACL, or
     a space (‘ ’) if it does not.  The ls utility does not show the actual
     ACL unless the -e option is used in conjunction with the -l option.

ENVIRONMENT
     The following environment variables affect the execution of ls:

     BLOCKSIZE       If this is set, its value, rounded up to 512 or down
             to a multiple of 512, will be used as the block size
             in bytes by the -l and -s options.  See The Long
             Format subsection for more information.

     CLICOLOR        Use ANSI color sequences to distinguish file types.
             See LSCOLORS below.  In addition to the file types
             mentioned in the -F option some extra attributes
             (setuid bit set, etc.) are also displayed.  The
             colorization is dependent on a terminal type with the
             proper termcap(5) capabilities.  The default “cons25”
             console has the proper capabilities, but to display
             the colors in an xterm(1), for example, the TERM
             variable must be set to “xterm-color”.  Other
             terminal types may require similar adjustments.
             Colorization is silently disabled if the output is
             not directed to a terminal unless the CLICOLOR_FORCE
             variable is defined or --color is set to “always”.

     CLICOLOR_FORCE  Color sequences are normally disabled if the output
             is not directed to a terminal.  This can be
             overridden by setting this variable.  The TERM
             variable still needs to reference a color capable
             terminal however otherwise it is not possible to
             determine which color sequences to use.

     COLORTERM       See description for CLICOLOR above.

     COLUMNS         If this variable contains a string representing a
             decimal integer, it is used as the column position
             width for displaying multiple-text-column output.
             The ls utility calculates how many pathname text
             columns to display based on the width provided.  (See
             -C and -x.)

     LANG        The locale to use when determining the order of day
             and month in the long -l format output.  See
             environ(7) for more information.

     LSCOLORS        The value of this variable describes what color to
             use for which attribute when colors are enabled with
             CLICOLOR or COLORTERM.  This string is a
             concatenation of pairs of the format fb, where f is
             the foreground color and b is the background color.

             The color designators are as follows:

                   a     black
                   b     red
                   c     green
                   d     brown
                   e     blue
                   f     magenta
                   g     cyan
                   h     light grey
                   A     bold black, usually shows up as dark grey
                   B     bold red
                   C     bold green
                   D     bold brown, usually shows up as yellow
                   E     bold blue
                   F     bold magenta
                   G     bold cyan
                   H     bold light grey; looks like bright white
                   x     default foreground or background

             Note that the above are standard ANSI colors.  The
             actual display may differ depending on the color
             capabilities of the terminal in use.

             The order of the attributes are as follows:

                   1.   directory
                   2.   symbolic link
                   3.   socket
                   4.   pipe
                   5.   executable
                   6.   block special
                   7.   character special
                   8.   executable with setuid bit set
                   9.   executable with setgid bit set
                   10.  directory writable to others, with sticky
                    bit
                   11.  directory writable to others, without
                    sticky bit
                   12.  dataless file

             The default is "exfxcxdxbxegedabagacadah", i.e., blue
             foreground and default background for regular
             directories, black foreground and red background for
             setuid executables, etc.

     LS_COLWIDTHS    If this variable is set, it is considered to be a
             colon-delimited list of minimum column widths.
             Unreasonable and insufficient widths are ignored
             (thus zero signifies a dynamically sized column).
             Not all columns have changeable widths.  The fields
             are, in order: inode, block count, number of links,
             user name, group name, flags, file size, file name.

     LS_SAMESORT     If this variable is set, the -t option sorts the
             names of files with the same modification timestamp
             in the same sense as the time sort.  See the
             description of the -t option for more details.

     TERM        The CLICOLOR and COLORTERM functionality depends on a
             terminal type with color capabilities.

     TZ          The timezone to use when displaying dates.  See
             environ(7) for more information.

EXIT STATUS
     The ls utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     List the contents of the current working directory in long format:

       $ ls -l

     In addition to listing the contents of the current working directory in
     long format, show inode numbers, file flags (see chflags(1)), and suffix
     each filename with a symbol representing its file type:

       $ ls -lioF

     List the files in /var/log, sorting the output such that the most
     recently modified entries are printed first:

       $ ls -lt /var/log

COMPATIBILITY
     The group field is now automatically included in the long listing for
     files in order to be compatible with the IEEE Std 1003.2 (“POSIX.2”)
     specification.

LEGACY DESCRIPTION
     In legacy mode, the -f option does not turn on the -a option and the -g,
     -n, and -o options do not turn on the -l option.

     Also, the -o option causes the file flags to be included in a long (-l)
     output; there is no -O option.

     When -H is specified (and not overridden by -L or -P) and a file argument
     is a symlink that resolves to a non-directory file, the output will
     reflect the nature of the link, rather than that of the file.  In legacy
     operation, the output will describe the file.

     For more information about legacy mode, see compat(5).

SEE ALSO
     chflags(1), chmod(1), sort(1), xterm(1), localeconv(3), strftime(3),
     strmode(3), compat(5), termcap(5), sticky(7), symlink(7)

STANDARDS
     With the exception of options -g, -n and -o, the ls utility conforms to
     IEEE Std 1003.1-2001 (“POSIX.1”) and IEEE Std 1003.1-2008 (“POSIX.1”).
     The options -B, -D, -G, -I, -T, -U, -W, -Z, -b, -h, -w, -y and -, are
     non-standard extensions.

     The ACL support is compatible with IEEE Std 1003.2c (“POSIX.2c”) Draft 17
     (withdrawn).

HISTORY
     An ls command appeared in Version 1 AT&T UNIX.

BUGS
     To maintain backward compatibility, the relationships between the many
     options are quite complex.

     The exception mentioned in the -s option description might be a feature
     that was based on the fact that single-column output usually goes to
     something other than a terminal.  It is debatable whether this is a
     design bug.

     IEEE Std 1003.2 (“POSIX.2”) mandates opposite sort orders for files with
     the same timestamp when sorting with the -t option.

macOS 15.6          August 31, 2020             macOS 15.6
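
The EXIT STATUS and ENVIRONMENT sections of the manual page above can be tried directly in the shell. This is a minimal sketch, assuming a POSIX-like shell with mktemp available; the scratch directory and the file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your own folders is touched
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt" "$scratch/c.txt"

# Success: ls exits with status 0
ls "$scratch" > /dev/null
ok_status=$?

# Failure: listing a path that does not exist gives a status greater than 0
bad_status=0
ls "$scratch/no-such-file" 2> /dev/null || bad_status=$?

echo "existing directory: $ok_status, missing file: $bad_status"

# COLUMNS tells ls how wide the multi-column (-C) output may be
COLUMNS=20 ls -C "$scratch"

# TZ changes the time zone used for the dates in long (-l) output
TZ=UTC ls -l "$scratch"

# Clean up
rm -r "$scratch"
```

The special variable $? holds the exit status of the most recent command, which is how scripts can tell whether ls succeeded.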

Getting help with man

Manual pages open in a pager inside the shell. These keys cover the essentials for moving through a page:

  • d - Scroll down half a page
  • u - Scroll up half a page
  • j / k - Scroll down or up a line. You can also use the arrow keys for this
  • q - Quit
  • /pattern - Search for text provided as “pattern”
  • n - When searching, find the next occurrence
  • N - When searching, find the previous occurrence
  • h - Show the pager's help screen, which lists these and other shortcuts
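
The /pattern search above also has a non-interactive counterpart: you can send a manual page through a pipe and search it with grep. This is a rough sketch, assuming man, col, and grep are installed; the search word "sort" is only an example.

```shell
# Run the search only if man is installed on this system
if command -v man > /dev/null 2>&1; then
    man_found=yes
    # Sending man's output into a pipe skips the pager.
    # col -b strips the backspace sequences used for bold and underline,
    # grep -i -n prints matching lines (ignoring case) with line numbers,
    # and head keeps the first five matches.
    man ls 2> /dev/null | col -b | grep -i -n "sort" | head -n 5
else
    man_found=no
    echo "man is not installed here"
fi
```

This is handy when you want to copy a few lines out of a manual page rather than read it in full.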

RTFM!

Always check the documentation!

man page explorer challenge

Partner up and choose a command from the list below. Use man to complete these tasks:

Choose one (or all) of the following commands:

ls, cd, cp, mv, rm, mkdir, rmdir, touch, cat, find

  1. Summarise the command’s purpose in one sentence.
  2. Find an interesting option and explain what it does.
  3. Create an example using your command with at least two options.
  4. Bonus: Combine your command with your partner’s in a single line.

You have about 5-10 minutes. Be ready to share your findings!

Reflection: How did using man compare with searching online? How might you use it in future projects?

More navigation commands: A cheat sheet

  • ls (list): Show files and directories in the current directory
  • ls -l: Long listing format with detailed information
  • ls -a: Show hidden files (those starting with a dot)
  • ls -lh: Long listing format with human-readable sizes
  • ls -R: List subdirectories recursively
  • pwd (print working directory): Show the current directory path
  • cd (change directory): Change the current working directory
  • cd -: Go back to the previous directory
  • .: Refers to the current directory
  • ..: Refers to the parent directory
  • ~: Refers to the home directory
  • mkdir: Create a new directory
  • touch: Create a new empty file or update timestamps
  • cp: Copy files or directories
  • mv: Move or rename files or directories
  • rm: Remove files (use with caution!)
  • rmdir: Remove empty directories
  • cat: Display file contents
  • find: Search for files and directories
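
Several entries from the cheat sheet can be seen working together in a short session. This is a sketch under assumptions: it runs in a temporary directory created with mktemp, and the names data, notes.txt, copy.txt, and moved.txt are invented for the example.

```shell
# Remember where we started, then build a small sandbox
start=$PWD
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir data
touch data/notes.txt data/.hidden

# ls skips names that start with a dot unless -a is given
ls data        # lists notes.txt
ls -a data     # lists ., .., .hidden and notes.txt

# . is the current directory and .. is its parent
cd data
cp notes.txt ./copy.txt      # copy inside the current directory
mv copy.txt ../moved.txt     # move the copy one level up

# cd - switches back to the previous directory and prints its path
cd ..
cd -
current=$(pwd)               # path ends in /data

# ~ expands to the home directory
echo ~

# find searches below a starting directory, here the sandbox
found=$(find "$sandbox" -name "*.txt")
echo "$found"

# Clean up
cd "$start"
rm -r "$sandbox"
```

Working inside a scratch directory like this is a safe way to experiment with cp, mv, and rm before using them on real files.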


Shell navigation exercise

Follow these steps to practice using basic shell commands. Type each command and observe the results.

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “practice” and change into it: mkdir practice && cd practice
  3. Create two empty files called “file1.txt” and “file2.txt”: touch file1.txt file2.txt
  4. List the contents of the directory: ls
  5. Move file2.txt to a new name (rename), file3.txt: mv file2.txt file3.txt
  6. List the contents again to verify the change, then return to the home directory: ls && cd ~
  7. Remove the “practice” directory and its contents: rm -rf practice
  8. Verify that the directory has been removed: ls
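
Steps 2 and 6 above join two commands with &&, which runs the second command only if the first one succeeds. The sketch below illustrates that behaviour in a temporary directory; the variable names (scratch, inside, skipped) are made up for the example.

```shell
start=$PWD
scratch=$(mktemp -d)
cd "$scratch"

# mkdir succeeds here, so the command after && runs too
mkdir practice && cd practice
inside=$(pwd)                # path ends in /practice

# Back in the parent, mkdir fails because practice already exists,
# so the command after && is skipped
cd ..
skipped=yes
mkdir practice 2> /dev/null && skipped=no
echo "second command skipped: $skipped"

# Clean up
cd "$start"
rm -r "$scratch"
```

This is why mkdir practice && cd practice is safer than typing the two commands on separate lines: if the directory cannot be created, the shell does not try to change into it.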

Shell navigation exercise to try at home

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “shell-practice”: mkdir shell-practice
  3. Change into the new directory: cd shell-practice
  4. Create three empty files called file1.txt, file2.txt, and file3.txt: touch file1.txt file2.txt file3.txt
  5. List the contents of the directory: ls
  6. Create a subdirectory called “subdir”: mkdir subdir
  7. Move file2.txt into the subdirectory: mv file2.txt subdir/
  8. Copy file1.txt to a new file called file4.txt: cp file1.txt file4.txt
  9. List the contents of the current directory and the subdirectory: ls -R
  10. Change to the parent directory: cd ..
  11. Remove the entire shell-practice directory and its contents: rm -r shell-practice
  12. Verify that the directory has been removed: ls

Bonus: Try using the man command to learn more about any of the commands you’ve used.

  • Were there any commands that surprised you?
  • Which commands did you find most useful?
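
The exercise removes its directory with rm -r rather than rmdir because rmdir only works on empty directories. The sketch below shows the difference in a temporary directory, so it does not touch your home folder; the variable names are made up for the example.

```shell
start=$PWD
scratch=$(mktemp -d)
cd "$scratch"

# A directory with one file and one non-empty subdirectory
mkdir shell-practice shell-practice/subdir
touch shell-practice/file1.txt shell-practice/subdir/file2.txt

# ls -R lists the directory first, then each subdirectory in turn
ls -R shell-practice

# rmdir only removes empty directories, so it fails here
rmdir_status=0
rmdir shell-practice 2> /dev/null || rmdir_status=$?
echo "rmdir exit status: $rmdir_status"

# rm -r removes the directory together with everything inside it
rm -r shell-practice

# Confirm that it is gone
if [ ! -e shell-practice ]; then
    removed=yes
else
    removed=no
fi
echo "removed: $removed"

cd "$start"
rm -r "$scratch"
```

Because rm -r deletes without moving anything to a recycle bin, it is worth running ls on the target first to check what you are about to remove.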

Summary

Today we…

  • Explored the command line’s role in data science and programming
  • Discussed the Unix philosophy and the significance of the shell
  • Covered basic shell commands like pwd, ls, and cd for file system navigation
  • Introduced special symbols such as ~, ., and .. for directory navigation
  • Practiced executing these commands in the shell environment

Next class

  • We will learn a bit more about the command line 🤓
  • More specifically, we will learn about shell scripting to automate tasks and how to deal with text files
  • And after that, we will see how to use Git and GitHub for version control and collaboration

Questions? 😉

Thank you very much and see you next class! 😊 🙏🏼