QTM 350 - Data Science Computing

Lecture 03: Command Line Interface

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

09 September, 2024

Recap and lecture overview 📚

Brief recap of last class

Early computing and data representation

  • Computers evolved from people to mechanical calculators to silicon-based machines
  • Modern computers use the Von Neumann architecture, storing both instructions and data in memory
  • Computers represent data using binary (base 2) numbers made up of 0s and 1s
  • A bit is a single binary digit; 8 bits make a byte
  • Hexadecimal (base 16) is a compact way to represent binary, with each hex digit corresponding to 4 bits
  • Abstraction allows representing complex data like images and text using numbers

Brief recap of last class

Representing images, colours and text

  • Images can be broken down into a grid of coloured pixels
  • Colours are represented using the RGB model, with each colour channel (red, green, blue) ranging from 0-255
  • With 8 bits per channel, each channel has 256 levels, allowing for over 16 million possible colours
  • Text is broken into individual characters, with each character mapped to a number using an encoding like ASCII
  • ASCII is a simple lookup table mapping the numbers 0-127 to characters (8-bit extensions use 0-255)
  • Unicode extends ASCII to support accented characters and symbols from all languages

Brief recap of last class

Programming languages

  • Konrad Zuse created the first programmable computers and high-level programming language in the 1940s
  • Assembly allows writing human-readable instructions that map closely to machine code
  • High-level languages like Python abstract away hardware details and are more portable across systems
  • Low-level languages are harder to read and write but very fast and efficient
  • Compiled languages are converted to machine code before execution; interpreted languages are executed on the fly

Today’s lecture

Command line: the old school way of interacting with computers

  • Today, we will learn about the command line, a text-based interface to interact with computers
  • We will learn some basic commands to navigate the file system, create and delete files, and run programs
  • We will also learn about shell scripting, a way to automate tasks using the command line
  • The command line is still widely used in data science and programming, especially for remote servers, cloud computing, and automation

Questions? 🤓

What is the command line? 💻

A computer in a nutshell

Operating system

Credit Dave Kerr

  • The operating system (OS) is system software that interfaces with (and manages access to) a computer’s hardware. It also provides software resources
  • The OS is divided into the kernel and user space
  • The kernel is the core of the OS. It’s responsible for interfacing with hardware (drivers), managing resources etc. Running software in the kernel is extremely sensitive! That’s why users are kept away from it!
  • The user space provides an interface for users, who can run programmes/applications on the machine. Programmes' access to hardware (e.g., memory usage) is managed by the kernel. Programmes in user space are essentially in sandboxes, which limits how much damage they can do.

A computer in a nutshell

Kernels and shells

  • The shell is just a general name for any user space program that allows access to resources in the system, via some kind of interface
  • Shells come in many different flavours but are generally provided to aid a human operator in accessing the system. This could be interactively, by typing at a terminal, or via scripts, which are files that contain a sequence of commands
  • Modern computers use graphical user interfaces (GUIs) as the standard tool for human-computer interaction
  • Why “kernel” and “shell”? The kernel is the soft, edible part of a nut or seed, which is surrounded by a shell to protect it. Useful metaphor, innit?

Interacting with the shell

Terminals

Credit Dave Kerr

  • Things are still a bit more complicated
  • We’re not directly interacting with the “shell” but using a terminal
  • A terminal is just a program that reads input from the keyboard, passes that input to another programme, and displays the results on the screen
  • A shell program on its own does not do this - it requires a terminal as an interface
  • Why “terminal”? Back in the old days (before computer screens existed), terminal machines (hardware!) were used to let humans interface with large machines (“mainframes”). Often many terminals were connected to a single machine
  • When you want to work with a computer in a data center (or remotely in cloud computing), you’ll still do pretty much the same

Interacting with the shell

Command line

Credit Dave Kerr

  • Terminals are really quite simple - they’re just interfaces

  • The first thing that a terminal will do is run a shell - a programme we can use to operate the computer

  • Back to the shell: the shell usually takes input

    • Interactively, from the user via the terminal’s command line
    • From scripts, which it executes without a command line
  • In interactive mode the shell then returns output

    • To the terminal where it is printed/shown
    • To files or other locations
  • The command line represents what is shown and entered in the terminal. It can be customised (e.g., with colour highlighting) to make interaction more convenient
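
The two input modes and two output destinations described above can be sketched as follows. This is only an illustration, assuming a POSIX-style shell; the names hello.sh and out.txt are hypothetical, and everything happens in a throwaway temporary directory.

```shell
# Work in a throwaway directory so nothing on your machine is touched
workdir=$(mktemp -d)
cd "$workdir"

# 1. Interactive-style input: a command typed at the prompt,
#    with its output shown in the terminal
echo "hello from the command line"

# 2. Script input: the same kind of command stored in a file
#    (hello.sh is a hypothetical name)
printf '%s\n' 'echo "hello from a script"' > hello.sh
sh hello.sh

# 3. Output sent to a file instead of the terminal
sh hello.sh > out.txt
cat out.txt
```

The `>` symbol redirects output to a file; without it, the output goes to the terminal.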

Shell variants

Bash, Zsh, and others

  • It is important to note that there are many different shell programmes, and they differ in terms of functionality
  • On many Unix-like systems (including most Linux distributions), the default shell is a program called bash, which stands for “Bourne Again Shell”
  • Other examples are the Z Shell (or zsh; default on macOS), Windows Command Prompt (cmd.exe, the default CLI on MS Windows), Windows PowerShell, C Shell, and many more
  • When a terminal opens, it will immediately start the user’s preferred shell programme. (This can be changed.)

Why bother with the shell? 🤷🏻‍♂️

Why bother with the shell?

Why should you use this…

… instead of this?

Why bother with the shell?

The programmer’s best friend

  1. Speed. Typing is fast: A skilled shell user can manipulate a system at dazzling speeds just using a keyboard. Typing commands is generally much faster than exploring through user interfaces with a mouse.

  2. Power. Both for executing commands and for fixing problems. There are some things you just can’t do in an IDE or GUI. Working in the shell also avoids the memory overhead of some applications and IDEs.

  3. Reproducibility. Scripting is reproducible, while clicking is not.

  4. Portability. A shell can be used to interface to almost any type of computer, from a mainframe to a Raspberry Pi, in a very similar way. The shell is often the only game in town for high performance computing (interacting with servers and supercomputers).

  5. Automation. Shells are programmable: Working in the shell allows you to program workflows, that is create scripts to automate time-consuming or repetitive processes.

  6. Become a marketable data scientist. Modern programming is often polyglot. The shell provides a common interface for tooling. Modern solutions are often built to run in containers on Linux. In this environment shell knowledge has become very valuable. In short, the shell is having a renaissance in the age of data science.
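
As a small illustration of the automation point, the sketch below renames a batch of files in one loop instead of one click at a time. The file names are made up for the example, and the work happens in a temporary directory.

```shell
# Create a scratch directory with three hypothetical text files
scratch=$(mktemp -d)
touch "$scratch/notes1.txt" "$scratch/notes2.txt" "$scratch/notes3.txt"

# Rename every .txt file to .md in one go
for f in "$scratch"/*.txt; do
  mv "$f" "${f%.txt}.md"     # ${f%.txt} strips the .txt ending
done

ls "$scratch"
```

Saved in a file, the same loop can be rerun later or shared with a colleague, which is where the reproducibility benefit comes from.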

The Unix philosophy

The Unix philosophy

The shell tools that we’re going to be using have their roots in the Unix family of operating systems originally developed at Bell Labs in the 1970s.

Besides paying homage, acknowledging the Unix lineage is important because these tools still embody the “Unix philosophy”:

Do One Thing And Do It Well

By pairing and chaining well-designed individual components, we can build powerful and much more complex larger systems.

You can see why the Unix philosophy is also referred to as “minimalist and modular”.

Again, this philosophy is very clearly expressed in the design and functionality of the Unix shell.
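
A minimal sketch of this chaining idea, assuming a POSIX-style shell: each tool below does one small job, and the pipe (|) passes its output to the next tool. The word list is made-up sample data standing in for a real file.

```shell
# printf    : emit one word per line (stand-in for a real data file)
# sort      : put identical lines next to each other
# uniq -c   : collapse adjacent duplicates and count them
# sort -rn  : order by count, largest first
# head -n 1 : keep only the top line
top_word=$(printf '%s\n' apple banana apple cherry apple banana |
  sort | uniq -c | sort -rn | head -n 1)

echo "$top_word"
```

None of these tools knows anything about "finding the most frequent word"; the behaviour comes entirely from how they are combined.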

Things to use the shell for

  • Navigating the file system
  • Version control with Git
  • Renaming and moving files
  • Finding things on your computer
  • Writing and running code
  • Installing and updating software
  • Monitoring system resources
  • Connecting to cloud environments
  • Running analyses (“jobs”) on supercomputers
  • … and much more!

Shell basics 🐚 🤓

Shell: First look

Let’s open up our shell!

A convenient way to do this is through VS Code’s built-in Terminal.

Click on the View menu, then Terminal. You can also use the shortcut Ctrl + ` (backtick).

Your system default shell is loaded. To find out what that is, type echo $SHELL in the terminal.

$ echo $SHELL
/bin/zsh

It is Z shell in my case

… what about you? It is your turn to find out!

Your turn!

Of course, it’s always possible to open up the shell directly if you prefer. It’s your turn!

Feel free to check our class tutorial on how to set up your shell in VS Code.

Open your terminal and type the following commands (without the $):

$ echo $SHELL 
$ whoami 
$ pwd 
$ mkdir new-folder
$ cd ..
$ ls
$ man ls # type 'j' to scroll down, 'k' to scroll up, 'q' to quit

Share your results with a colleague (or the class)!
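
If you would rather not create folders in your working directory, here is a variant of the exercise that runs in a temporary scratch directory. It is only a sketch: the variable names start_dir, scratch, and listing are chosen for illustration.

```shell
# Remember where we started and create a scratch directory
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

mkdir new-folder      # make a directory inside the scratch area
cd new-folder         # move into it
pwd                   # prints a path ending in /new-folder
cd ..                 # go back up one level
ls                    # lists the contents of the scratch directory

listing=$(ls)         # keep the listing in a variable
cd "$start_dir"       # return to where we started
```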

Shell: First look

You should see something like:

username@hostname:~$

This is shell-speak for: “Who am I and where am I?”

  • username denotes a specific user (one of potentially many on this computer).

  • @hostname denotes the name of the computer or server.

  • :~ denotes the directory path (where ~ signifies the user’s home directory).

  • $ (or maybe %) denotes the start of the command prompt.

    • (For a special “superuser” called root, the dollar sign will change to a #).

$ whoami
politicaltheory
$ pwd
/Users/politicaltheory/Documents/github/qtm350/lectures/lecture-03

Syntax

Syntax

All Bash commands have the same basic syntax:

command option(s) argument(s)

Examples:

$ # list files in the Documents directory
$ # with human-readable sizes
$ ls -lh ~/Documents

$ # sort the file and remove duplicates
$ sort -u file.txt

Commands

  • You don’t always need options or arguments

  • For example:

    • ls ~/Documents/ and ls -lh ~/Documents are both valid commands that will yield (different) output
  • However, you always need a command.
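
The three forms can be tried side by side. The sketch below assumes a POSIX-style shell and uses a temporary directory with two made-up files.

```shell
# Scratch directory with two hypothetical files
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt"

(cd "$scratch" && ls)   # command only: lists the current directory
ls "$scratch"           # command + argument: lists the named directory
ls -l "$scratch"        # command + option + argument: long listing

plain=$(ls "$scratch")
long=$(ls -l "$scratch")
```

The plain and long listings name the same files, but the long one adds details such as permissions and sizes.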

Options (also called Flags)

  • Start with a dash. Usually one letter.

  • Multiple short options can be chained under a single dash.

$ ls -l -a -h /var/log # This works
$ ls -lah /var/log # So does this

  • Long options start with two dashes and are written out in full, so they cannot be chained. (The two below are GNU ls options and may not be available on macOS.)

$ ls --group-directories-first --human-readable /var/log

  • -l (ls): Use a long listing format. This option shows detailed information about the files and directories

  • -h (ls): With -l, print sizes in human-readable format (e.g., KB, MB)

  • -u (sort): Unique; filters duplicate entries out of the output

  • Think it’s difficult to memorize what the individual letters stand for? You’re totally right!
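
To check that chained and separate short options behave the same, and to see sort -u in action, you can run a sketch like the following. It assumes a POSIX-style shell; the file file.txt and its contents are made up for the example.

```shell
# Scratch directory with a small hypothetical file containing duplicates
scratch=$(mktemp -d)
printf '%s\n' pear apple pear fig apple > "$scratch/file.txt"

# Separate short options and chained short options give the same listing
separate=$(ls -l -a -h "$scratch")
chained=$(ls -lah "$scratch")
[ "$separate" = "$chained" ] && echo "same listing"

# -u makes sort drop duplicate lines
unique=$(sort -u "$scratch/file.txt")
echo "$unique"
```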

Arguments

  • Tell the command what to operate on.

  • Which inputs are valid depends entirely on the command.

  • Can be a file, path, a set of files and folders, a string, and more

  • Sometimes more than just one argument is needed:

$ mv figs/cat.png best-figs/cat02.png
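
The mv example can be tried safely in a scratch directory. In this sketch the figs and best-figs folders and the empty cat.png file are stand-ins created only for the demonstration.

```shell
# Scratch directory with the two folders and an empty stand-in image
scratch=$(mktemp -d)
mkdir "$scratch/figs" "$scratch/best-figs"
touch "$scratch/figs/cat.png"

# First argument: what to move; second argument: where it should end up
mv "$scratch/figs/cat.png" "$scratch/best-figs/cat02.png"

ls "$scratch/figs"        # now empty
ls "$scratch/best-figs"   # shows cat02.png
```

Because the second argument names a file rather than a folder, mv moves and renames the file in a single step.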

Help! 🆘 😟

Multiple ways to get help

  • The man tool can be used to look at the manual page for a topic.

  • The man pages are grouped into sections; we can see them with man man.

  • The cht.sh website can be used directly from the shell to get help on tools. Run it like this: curl cht.sh/command

Multiple ways to get help

  • You can also install the tldr tool, which provides simplified help pages for common commands. Run it like this: tldr command

$ tldr ls

  ls

  List directory contents.
  More information: https://www.gnu.org/software/coreutils/ls.

  List files one per line:

    ls -1

  List all files, including hidden files:

    ls -a

  List all files, with trailing / added to directory names:

    ls -F

  Long format list (permissions, ownership, size, and modification date) of all files:

    ls -la

  Long format list with size displayed using human-readable units (KiB, MiB, GiB):

    ls -lh

  Long format list sorted by size (descending) recursively:

    ls -lSR

  Long format list of all files, sorted by modification date (oldest first):

    ls -ltr

  Only list directories:

    ls -d */

  • For more info on how to get help, see here.

Getting help with man

The man command (“manual pages”) is your friend if you need help.

$ man ls
LS(1)                        General Commands Manual                       LS(1)

NNAAMMEE
     llss – list directory contents

SSYYNNOOPPSSIISS
     llss [--@@AABBCCFFGGHHIILLOOPPRRSSTTUUWWaabbccddeeffgghhiikkllmmnnooppqqrrssttuuvvwwxxyy11%%,,] [----ccoolloorr=_w_h_e_n]
        [--DD _f_o_r_m_a_t] [_f_i_l_e _._._.]

DDEESSCCRRIIPPTTIIOONN
     For each operand that names a _f_i_l_e of a type other than directory, llss
     displays its name as well as any requested, associated information.  For
     each operand that names a _f_i_l_e of type directory, llss displays the names of
     files contained within that directory, as well as any requested, associated
     information.

     If no operands are given, the contents of the current directory are
     displayed.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted separately
     and in lexicographical order.

     The following options are available:

     --@@      Display extended attribute keys and sizes in long (--ll) output.

     --AA      Include directory entries whose names begin with a dot (‘_.’) except
             for _. and _._..  Automatically set for the super-user unless --II is
             specified.

     --BB      Force printing of non-printable characters (as defined by ctype(3)
             and current locale settings) in file names as \_x_x_x, where _x_x_x is
             the numeric value of the character in octal.  This option is not
             defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --CC      Force multi-column output; this is the default when output is to a
             terminal.

     --DD _f_o_r_m_a_t
             When printing in the long (--ll) format, use _f_o_r_m_a_t to format the
             date and time output.  The argument _f_o_r_m_a_t is a string used by
             strftime(3).  Depending on the choice of format string, this may
             result in a different number of columns in the output.  This option
             overrides the --TT option.  This option is not defined in IEEE Std
             1003.1-2008 (“POSIX.1”).

     --FF      Display a slash (‘/’) immediately after each pathname that is a
             directory, an asterisk (‘*’) after each that is executable, an at
             sign (‘@’) after each symbolic link, an equals sign (‘=’) after
             each socket, a percent sign (‘%’) after each whiteout, and a
             vertical bar (‘|’) after each that is a FIFO.

     --GG      Enable colorized output.  This option is equivalent to defining
             CLICOLOR or COLORTERM in the environment and setting ----ccoolloorr=_a_u_t_o.
             (See below.)  This functionality can be compiled out by removing
             the definition of COLORLS.  This option is not defined in IEEE Std
             1003.1-2008 (“POSIX.1”).

     --HH      Symbolic links on the command line are followed.  This option is
             assumed if none of the --FF, --dd, or --ll options are specified.

     --II      Prevent --AA from being automatically set for the super-user.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --LL      Follow all symbolic links to final target and list the file or
             directory the link references rather than the link itself.  This
             option cancels the --PP option.

     --OO      Include the file flags in a long (--ll) output.  This option is
             incompatible with IEEE Std 1003.1-2008 (“POSIX.1”).  See chflags(1)
             for a list of file flags and their meanings.

     --PP      If argument is a symbolic link, list the link itself rather than
             the object the link references.  This option cancels the --HH and --LL
             options.

     --RR      Recursively list subdirectories encountered.

     --SS      Sort by size (largest file first) before sorting the operands in
             lexicographical order.

     --TT      When printing in the long (--ll) format, display complete time
             information for the file, including month, day, hour, minute,
             second, and year.  The --DD option gives even more control over the
             output format.  This option is not defined in IEEE Std 1003.1-2008
             (“POSIX.1”).

     --UU      Use time when file was created for sorting or printing.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --WW      Display whiteouts when scanning directories.  This option is not
             defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --aa      Include directory entries whose names begin with a dot (‘_.’).

     --bb      As --BB, but use C escape codes whenever possible.  This option is
             not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --cc      Use time when file status was last changed for sorting or printing.

     ----ccoolloorr=_w_h_e_n
             Output colored escape sequences based on _w_h_e_n, which may be set to
             either aallwwaayyss, aauuttoo, or nneevveerr.

             aallwwaayyss will make llss always output color.  If TERM is unset or set
             to an invalid terminal, then llss will fall back to explicit ANSI
             escape sequences without the help of termcap(5).  aallwwaayyss is the
             default if ----ccoolloorr is specified without an argument.

             aauuttoo will make llss output escape sequences based on termcap(5), but
             only if stdout is a tty and either the --GG flag is specified or the
             COLORTERM environment variable is set and not empty.

             nneevveerr will disable color regardless of environment variables.
             nneevveerr is the default when neither ----ccoolloorr nor --GG is specified.

             For compatibility with GNU coreutils, llss supports yyeess or ffoorrccee as
             equivalent to aallwwaayyss, nnoo or nnoonnee as equivalent to nneevveerr, and ttttyy or
             iiff--ttttyy as equivalent to aauuttoo.

     --dd      Directories are listed as plain files (not searched recursively).

     --ee      Print the Access Control List (ACL) associated with the file, if
             present, in long (--ll) output.

     --ff      Output is not sorted.  This option turns on --aa.  It also negates
             the effect of the --rr, --SS and --tt options.  As allowed by IEEE Std
             1003.1-2008 (“POSIX.1”), this option has no effect on the --dd, --ll,
             --RR and --ss options.

     --gg      This option has no effect.  It is only available for compatibility
             with 4.3BSD, where it was used to display the group name in the
             long (--ll) format output.  This option is incompatible with IEEE Std
             1003.1-2008 (“POSIX.1”).

     --hh      When used with the --ll option, use unit suffixes: Byte, Kilobyte,
             Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
             number of digits to four or fewer using base 2 for sizes.  This
             option is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --ii      For each file, print the file's file serial number (inode number).

     --kk      This has the same effect as setting environment variable BLOCKSIZE
             to 1024, except that it also nullifies any --hh options to its left.

     --ll      (The lowercase letter “ell”.) List files in the long format, as
             described in the _T_h_e _L_o_n_g _F_o_r_m_a_t subsection below.

     --mm      Stream output format; list files across the page, separated by
             commas.

     --nn      Display user and group IDs numerically rather than converting to a
             user or group name in a long (--ll) output.  This option turns on the
             --ll option.

     --oo      List in long format, but omit the group id.

     --pp      Write a slash (‘/’) after each filename if that file is a
             directory.

     --qq      Force printing of non-graphic characters in file names as the
             character ‘?’; this is the default when output is to a terminal.

     --rr      Reverse the order of the sort.

     --ss      Display the number of blocks used in the file system by each file.
             Block sizes and directory totals are handled as described in _T_h_e
             _L_o_n_g _F_o_r_m_a_t subsection below, except (if the long format is not
             also requested) the directory totals are not output when the output
             is in a single column, even if multi-column output is requested.
             (--ll) format, display complete time information for the file,
             including month, day, hour, minute, second, and year.  The --DD
             option gives even more control over the output format.  This option
             is not defined in IEEE Std 1003.1-2008 (“POSIX.1”).

     --tt      Sort by descending time modified (most recently modified first).
             If two files have the same modification timestamp, sort their names
             in ascending lexicographical order.  The --rr option reverses both of
             these sort orders.

             Note that these sort orders are contradictory: the time sequence is
             in descending order, the lexicographical sort is in ascending
             order.  This behavior is mandated by IEEE Std 1003.2 (“POSIX.2”).
             This feature can cause problems listing files stored with
             sequential names on FAT file systems, such as from digital cameras,
             where it is possible to have more than one image with the same
             timestamp.  In such a case, the photos cannot be listed in the
             sequence in which they were taken.  To ensure the same sort order
             for time and for lexicographical sorting, set the environment
             variable LS_SAMESORT or use the --yy option.  This causes llss to
             reverse the lexicographical sort order when sorting files with the
             same modification timestamp.

     --uu      Use time of last access, instead of time of last modification of
             the file for sorting (--tt) or long printing (--ll).

     --vv      Force unedited printing of non-graphic characters; this is the
             default when output is not to a terminal.

     --ww      Force raw printing of non-printable characters.  This is the
             default when output is not to a terminal.  This option is not
             defined in IEEE Std 1003.1-2001 (“POSIX.1”).

     --xx      The same as --CC, except that the multi-column output is produced
             with entries sorted across, rather than down, the columns.

     --yy      When the --tt option is set, sort the alphabetical output in the same
             order as the time output.  This has the same effect as setting
             LS_SAMESORT.  See the description of the --tt option for more
             details.  This option is not defined in IEEE Std 1003.1-2001
             (“POSIX.1”).

     --%%      Distinguish dataless files and directories with a '%' character in
             long

     --11      (The numeric digit “one”.) Force output to be one entry per line.
             This is the default when output is not to a terminal.  (--ll) output,
             and don't materialize dataless directories when listing them.

     --,      (Comma) When the --ll option is set, print file sizes grouped and
             separated by thousands using the non-monetary separator returned by
             localeconv(3), typically a comma or period.  If no locale is set,
             or the locale does not have a non-monetary separator, this option
             has no effect.  This option is not defined in IEEE Std 1003.1-2001
             (“POSIX.1”).

     The --11, --CC, --xx, and --ll options all override each other; the last one
     specified determines the format used.

     The --cc, --uu, and --UU options all override each other; the last one specified
     determines the file time used.

     The --SS and --tt options override each other; the last one specified
     determines the sort order used.

     The --BB, --bb, --ww, and --qq options all override each other; the last one
     specified determines the format used for non-printable characters.

     The --HH, --LL and --PP options all override each other (either partially or
     fully); they are applied in the order specified.

     By default, llss lists one entry per line to standard output; the exceptions
     are to terminals or when the --CC or --xx options are specified.

     File information is displayed with one or more ⟨blank⟩s separating the
     information associated with the --ii, --ss, and --ll options.

   TThhee LLoonngg FFoorrmmaatt
     If the --ll option is given, the following information is displayed for each
     file: file mode, number of links, owner name, group name, number of bytes
     in the file, abbreviated month, day-of-month file was last modified, hour
     file last modified, minute file last modified, and the pathname.  If the
     file or directory has extended attributes, the permissions field printed by
     the --ll option is followed by a '@' character.  Otherwise, if the file or
     directory has extended security information (such as an access control
     list), the permissions field printed by the --ll option is followed by a '+'
     character.  If the --%% option is given, a '%' character follows the
     permissions field for dataless files and directories, possibly replacing
     the '@' or '+' character.

     If the modification time of the file is more than 6 months in the past or
     future, and the --DD or --TT are not specified, then the year of the last
     modification is displayed in place of the hour and minute fields.

     If the owner or group names are not a known user or group name, or the --nn
     option is given, the numeric ID's are displayed.

     If the file is a character special or block special file, the device number
     for the file is displayed in the size field.  If the file is a symbolic
     link the pathname of the linked-to file is preceded by “->”.

     The listing of a directory's contents is preceded by a labeled total number
     of blocks used in the file system by the files which are listed as the
     directory's contents (which may or may not include _. and _._. and other files
     which start with a dot, depending on other options).

     The default block size is 512 bytes.  The block size may be set with option
     --kk or environment variable BLOCKSIZE.  Numbers of blocks in the output will
     have been rounded up so the numbers of bytes is at least as many as used by
     the corresponding file system blocks (which might have a different size).

     The file mode printed under the --ll option consists of the entry type and
     the permissions.  The entry type character describes the type of file, as
     follows:

           --     Regular file.
           bb     Block special file.
           cc     Character special file.
           dd     Directory.
           ll     Symbolic link.
           pp     FIFO.
           ss     Socket.
           ww     Whiteout.

     The next three fields are three characters each: owner permissions, group
     permissions, and other permissions.  Each field has three character
     positions:

           1.   If rr, the file is readable; if --, it is not readable.

           2.   If ww, the file is writable; if --, it is not writable.

           3.   The first of the following that applies:

                      SS     If in the owner permissions, the file is not
                            executable and set-user-ID mode is set.  If in the
                            group permissions, the file is not executable and
                            set-group-ID mode is set.

                      ss     If in the owner permissions, the file is executable
                            and set-user-ID mode is set.  If in the group
                            permissions, the file is executable and setgroup-ID
                            mode is set.

                      xx     The file is executable or the directory is
                            searchable.

                      --     The file is neither readable, writable, executable,
                            nor set-user-ID nor set-group-ID mode, nor sticky.
                            (See below.)

                These next two apply only to the third character in the last
                group (other permissions).

                      TT     The sticky bit is set (mode 1000), but not execute
                            or search permission.  (See chmod(1) or sticky(7).)

                      tt     The sticky bit is set (mode 1000), and is searchable
                            or executable.  (See chmod(1) or sticky(7).)

     The next field contains a plus (‘+’) character if the file has an ACL, or a
     space (‘ ’) if it does not.  The llss utility does not show the actual ACL;
     use getfacl(1) to do this.

EENNVVIIRROONNMMEENNTT
     The following environment variables affect the execution of llss:

     BLOCKSIZE           If this is set, its value, rounded up to 512 or down to
                         a multiple of 512, will be used as the block size in
                         bytes by the --ll and --ss options.  See _T_h_e _L_o_n_g _F_o_r_m_a_t
                         subsection for more information.

     CLICOLOR            Use ANSI color sequences to distinguish file types.
                         See LSCOLORS below.  In addition to the file types
                         mentioned in the --FF option some extra attributes
                         (setuid bit set, etc.) are also displayed.  The
                         colorization is dependent on a terminal type with the
                         proper termcap(5) capabilities.  The default “cons25”
                         console has the proper capabilities, but to display the
                         colors in an xterm(1), for example, the TERM variable
                         must be set to “xterm-color”.  Other terminal types may
                         require similar adjustments.  Colorization is silently
                         disabled if the output is not directed to a terminal
                         unless the CLICOLOR_FORCE variable is defined or
                         ----ccoolloorr is set to “always”.

     CLICOLOR_FORCE      Color sequences are normally disabled if the output is
                         not directed to a terminal.  This can be overridden by
                         setting this variable.  The TERM variable still needs
                         to reference a color capable terminal however otherwise
                         it is not possible to determine which color sequences
                         to use.

     COLORTERM           See description for CLICOLOR above.

     COLUMNS             If this variable contains a string representing a
                         decimal integer, it is used as the column position
                         width for displaying multiple-text-column output.  The
                         ls utility calculates how many pathname text columns to
                         display based on the width provided.  (See -C and -x.)

     LANG                The locale to use when determining the order of day and
                         month in the long -l format output.  See environ(7) for
                         more information.

     LSCOLORS            The value of this variable describes what color to use
                         for which attribute when colors are enabled with
                         CLICOLOR or COLORTERM.  This string is a concatenation
                         of pairs of the format fb, where f is the foreground
                         color and b is the background color.

                         The color designators are as follows:

                               a     black
                               b     red
                               c     green
                               d     brown
                               e     blue
                               f     magenta
                               g     cyan
                               h     light grey
                               A     bold black, usually shows up as dark grey
                               B     bold red
                               C     bold green
                               D     bold brown, usually shows up as yellow
                               E     bold blue
                               F     bold magenta
                               G     bold cyan
                               H     bold light grey; looks like bright white
                               x     default foreground or background

                         Note that the above are standard ANSI colors.  The
                         actual display may differ depending on the color
                         capabilities of the terminal in use.

                         The order of the attributes are as follows:

                               1.   directory
                               2.   symbolic link
                               3.   socket
                               4.   pipe
                               5.   executable
                               6.   block special
                               7.   character special
                               8.   executable with setuid bit set
                               9.   executable with setgid bit set
                               10.  directory writable to others, with sticky
                                    bit
                               11.  directory writable to others, without sticky
                                    bit

                         The default is "exfxcxdxbxegedabagacad", i.e., blue
                         foreground and default background for regular
                         directories, black foreground and red background for
                         setuid executables, etc.

     LS_COLWIDTHS        If this variable is set, it is considered to be a
                         colon-delimited list of minimum column widths.
                         Unreasonable and insufficient widths are ignored (thus
                         zero signifies a dynamically sized column).  Not all
                         columns have changeable widths.  The fields are, in
                         order: inode, block count, number of links, user name,
                         group name, flags, file size, file name.

     LS_SAMESORT         If this variable is set, the -t option sorts the names
                         of files with the same modification timestamp in the
                         same sense as the time sort.  See the description of
                         the -t option for more details.

     TERM                The CLICOLOR and COLORTERM functionality depends on a
                         terminal type with color capabilities.

     TZ                  The timezone to use when displaying dates.  See
                         environ(7) for more information.

EXIT STATUS
     The ls utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     List the contents of the current working directory in long format:

           $ ls -l

     In addition to listing the contents of the current working directory in
     long format, show inode numbers, file flags (see chflags(1)), and suffix
     each filename with a symbol representing its file type:

           $ ls -lioF

     List the files in /var/log, sorting the output such that the most recently
     modified entries are printed first:

           $ ls -lt /var/log

COMPATIBILITY
     The group field is now automatically included in the long listing for files
     in order to be compatible with the IEEE Std 1003.2 (“POSIX.2”)
     specification.

LEGACY DESCRIPTION
     In legacy mode, the -f option does not turn on the -a option and the -g,
     -n, and -o options do not turn on the -l option.

     Also, the -o option causes the file flags to be included in a long (-l)
     output; there is no -O option.

     When -H is specified (and not overridden by -L or -P) and a file argument
     is a symlink that resolves to a non-directory file, the output will reflect
     the nature of the link, rather than that of the file.  In legacy operation,
     the output will describe the file.

     For more information about legacy mode, see compat(5).

SEE ALSO
     chflags(1), chmod(1), getfacl(1), sort(1), xterm(1), localeconv(3),
     strftime(3), strmode(3), compat(5), termcap(5), sticky(7), symlink(7)

STANDARDS
     With the exception of options -g, -n and -o, the ls utility conforms to
     IEEE Std 1003.1-2001 (“POSIX.1”) and IEEE Std 1003.1-2008 (“POSIX.1”).  The
     options -B, -D, -G, -I, -T, -U, -W, -Z, -b, -h, -w, -y and -, are
     non-standard extensions.

     The ACL support is compatible with IEEE Std 1003.2c (“POSIX.2c”) Draft 17
     (withdrawn).

HISTORY
     An ls command appeared in Version 1 AT&T UNIX.

BUGS
     To maintain backward compatibility, the relationships between the many
     options are quite complex.

     The exception mentioned in the -s option description might be a feature
     that was based on the fact that single-column output usually goes to
     something other than a terminal.  It is debatable whether this is a design
     bug.

     IEEE Std 1003.2 (“POSIX.2”) mandates opposite sort orders for files with
     the same timestamp when sorting with the -t option.

macOS 12.7                       August 31, 2020                      macOS 12.7
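
The LSCOLORS format described in the manual page above can be hard to read at a glance. The following is a minimal sketch, not part of ls itself, that decodes the default string into one line per attribute. The function names color_name and decode_lscolors are hypothetical helpers made up for this illustration; the colour letters and attribute order are taken from the manual page.

```shell
# Sketch: decode an LSCOLORS string into "attribute: foreground on background".
# color_name and decode_lscolors are hypothetical helper names, not real utilities.

# Map one colour designator letter to its name (table from the ls manual page).
color_name() {
  case "$1" in
    a) echo "black" ;;
    b) echo "red" ;;
    c) echo "green" ;;
    d) echo "brown" ;;
    e) echo "blue" ;;
    f) echo "magenta" ;;
    g) echo "cyan" ;;
    h) echo "light grey" ;;
    A) echo "bold black" ;;
    B) echo "bold red" ;;
    C) echo "bold green" ;;
    D) echo "bold brown" ;;
    E) echo "bold blue" ;;
    F) echo "bold magenta" ;;
    G) echo "bold cyan" ;;
    H) echo "bold light grey" ;;
    x) echo "default" ;;
    *) echo "unknown" ;;
  esac
}

# Walk the string two characters at a time: first the foreground, then the background.
decode_lscolors() {
  colors=$1
  pos=1
  while IFS= read -r attribute; do
    fg=$(printf '%s' "$colors" | cut -c "$pos")
    bg=$(printf '%s' "$colors" | cut -c "$((pos + 1))")
    printf '%s: %s on %s\n' "$attribute" "$(color_name "$fg")" "$(color_name "$bg")"
    pos=$((pos + 2))
  done <<'EOF'
directory
symbolic link
socket
pipe
executable
block special
character special
executable with setuid bit set
executable with setgid bit set
directory writable to others, with sticky bit
directory writable to others, without sticky bit
EOF
}

# The default value quoted in the manual page.
decode_lscolors "exfxcxdxbxegedabagacad"
```

The first line of output corresponds to the manual page's own reading of the default: blue foreground and default background for directories.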

Getting help with man

Manual pages open in a pager inside the shell. These keys cover the essentials for navigating it:

  • d - Scroll down half a page
  • u - Scroll up half a page
  • j / k - Scroll down or up a line. You can also use the arrow keys for this
  • q - Quit
  • /pattern - Search for text provided as “pattern”
  • n - When searching, find the next occurrence
  • N - When searching, find the previous occurrence
  • These and other man tricks are detailed in the help pages (hit “h” when you’re in the pager for an overview).
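
The pager is not the only way to search a manual page. The sketch below pipes the page through grep instead. It assumes that man is installed and that, on some systems (macOS among them), piped output still contains backspace sequences used for bold and underline, which show up as doubled letters such as "llss". The helper names strip_overstrike and man_grep are hypothetical, chosen for this example only.

```shell
# Sketch: search a manual page without opening the pager.
# strip_overstrike and man_grep are hypothetical helper names.

# Remove "character + backspace" pairs, which some man implementations
# emit for bold and underlined text when the output is piped.
strip_overstrike() {
  bs=$(printf '\b')
  sed "s/.$bs//g"
}

# Print the numbered lines of a command's manual page that mention a keyword.
man_grep() {
  man "$1" 2>/dev/null | strip_overstrike | grep -n -i -- "$2"
}

# Only run the demonstration if man is available on this machine.
if command -v man >/dev/null 2>&1; then
  man_grep ls "sticky" || echo "no match found"
fi
```

For example, man_grep cp "recursive" would list the lines of the cp manual that mention recursion, assuming the page is installed.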

RTFM!

Always check the documentation!

man page explorer challenge

Partner up and choose a command from the list below. Use man to complete these tasks:

$ # Choose one:
$ ls, cd, cp, mv, rm, mkdir, rmdir, touch, cat, find
  1. Summarise the command’s purpose in one sentence.
  2. Find an interesting option and explain what it does.
  3. Create an example using your command with at least two options.
  4. Bonus: Combine your command with your partner’s in a single line.

You have about 5 minutes. Be ready to share your findings!

Reflection: How was using man compared to online searches? How might you use it in future projects?

More navigation commands: A cheat sheet

  • ls (list): Show files and directories in the current directory
  • ls -l: Long listing format with detailed information
  • ls -a: Show hidden files (those starting with a dot)
  • ls -lh: Long listing format with human-readable sizes
  • ls -R: List subdirectories recursively
  • pwd (print working directory): Show the current directory path
  • cd (change directory): Change the current working directory
  • cd -: Go back to the previous directory
  • .: Refers to the current directory
  • ..: Refers to the parent directory
  • ~: Refers to the home directory
  • mkdir: Create a new directory
  • touch: Create a new empty file or update timestamps
  • cp: Copy files or directories
  • mv: Move or rename files or directories
  • rm: Remove files (use with caution!)
  • rmdir: Remove empty directories
  • cat: Display file contents
  • find: Search for files and directories
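
To see how ., .., ~ and cd - behave together, here is a short sketch that works inside a throwaway directory so nothing in your own files is touched. The directory names project and data and the variable names are made up for illustration, and mktemp is assumed to be available, as it is on macOS and most Linux systems.

```shell
# Sketch: the special directory names from the cheat sheet in action.
start=$PWD                      # remember where we began
sandbox=$(mktemp -d)            # throwaway directory for the demonstration
cd "$sandbox" || exit 1

mkdir project
mkdir project/data
cd project/data

cd ..                           # .. is the parent directory: we are now in project
parent=$(basename "$PWD")

cd - >/dev/null                 # cd - returns to the previous directory: project/data
previous=$(basename "$PWD")

ls -a .                         # . is the current directory; -a also lists . and ..
echo "Home directory:" ~        # ~ expands to the path of your home directory

cd "$start" || exit 1           # go back and clean up
rm -r "$sandbox"
```

After the script finishes, parent holds project and previous holds data, matching the two moves described in the comments.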


Shell navigation exercise

Follow these steps to practice using basic shell commands. Type each command and observe the results.

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “practice” and change into it: mkdir practice && cd practice
  3. Create two empty files called “file1.txt” and “file2.txt”: touch file1.txt file2.txt
  4. List the contents of the directory: ls
  5. Rename file2.txt to file3.txt by moving it to the new name: mv file2.txt file3.txt
  6. List the contents again to verify the change, then return to the home directory: ls && cd ~
  7. Remove the “practice” directory and its contents: rm -r practice
  8. Verify that the directory has been removed: ls
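
Steps 2 and 6 join two commands with &&. The shell runs the second command only if the first one succeeds, that is, exits with status 0. The sketch below illustrates this in a throwaway directory; the variable names are made up for this example.

```shell
# Sketch: && runs the right-hand command only when the left-hand one succeeds.
start=$PWD
sandbox=$(mktemp -d)            # throwaway directory for the demonstration
cd "$sandbox" || exit 1

# First attempt: mkdir succeeds, so cd runs as well.
mkdir practice && cd practice
after_first=$(basename "$PWD")

cd ..

# Second attempt: practice already exists, so mkdir fails and cd is skipped.
if mkdir practice 2>/dev/null && cd practice; then
  outcome="entered practice"
else
  outcome="stayed in place"
fi
after_second=$PWD

cd "$start" || exit 1           # go back and clean up
rm -r "$sandbox"
```

This is why mkdir practice && cd practice is safer than writing the two commands on separate lines in a script: if the directory cannot be created, the shell does not go on to change directory.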

Shell navigation exercise to try at home

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “shell-practice”: mkdir shell-practice
  3. Change into the new directory: cd shell-practice
  4. Create three empty files called file1.txt, file2.txt, and file3.txt: touch file1.txt file2.txt file3.txt
  5. List the contents of the directory: ls
  6. Create a subdirectory called “subdir”: mkdir subdir
  7. Move file2.txt into the subdirectory: mv file2.txt subdir/
  8. Copy file1.txt to a new file called file4.txt: cp file1.txt file4.txt
  9. List the contents of the current directory and the subdirectory: ls -R
  10. Change to the parent directory: cd ..
  11. Remove the entire shell-practice directory and its contents: rm -r shell-practice
  12. Verify that the directory has been removed: ls
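
If you want to check your work, the steps above can be run as one script. This is only a sketch: it uses a temporary directory created with mktemp in place of your home directory, so that rm -r cannot touch anything important, and the variable names files and removed are made up for illustration.

```shell
# Sketch: the at-home exercise as a script, run in a throwaway directory.
start=$PWD
sandbox=$(mktemp -d)            # stands in for the home directory
cd "$sandbox" || exit 1

mkdir shell-practice
cd shell-practice
touch file1.txt file2.txt file3.txt
ls
mkdir subdir
mv file2.txt subdir/            # file2.txt now lives in subdir
cp file1.txt file4.txt          # file4.txt is a copy of file1.txt
ls -R

# Record every file path so the final state can be checked afterwards.
files=$(find . -type f | LC_ALL=C sort)

cd ..
rm -r shell-practice

# Confirm that the directory is really gone.
if [ -d shell-practice ]; then removed="no"; else removed="yes"; fi

cd "$start" || exit 1           # go back and clean up
rmdir "$sandbox"                # the sandbox is empty again, so rmdir is enough
```

Just before the removal step, the directory should contain file1.txt, file3.txt, file4.txt and subdir/file2.txt.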

Bonus: Try using the man command to learn more about any of the commands you’ve used.

  • Were there any commands that surprised you?
  • Which commands did you find most useful?

Summary

Today we…

  • Explored the command line’s role in data science and programming
  • Discussed the Unix philosophy and the significance of the shell
  • Covered basic shell commands like pwd, ls, and cd for file system navigation
  • Introduced special symbols such as ~, ., and .. for directory navigation
  • Practiced executing these commands in the shell environment

Next class

  • We will learn a bit more about the command line, especially about text processing and scripting
  • We will also learn how to use vim or neovim as text editors
    • Vim is a powerful text editor that is highly configurable and can be used for many different programming languages. And it is my editor of choice! 🤓
  • After that, we will introduce Git and GitHub for version control and collaboration

Questions? 😉

Thank you very much and see you next class! 😊 🙏🏼