Richard Maw Scripting

Scripts are programs that do not need to be compiled into machine code to be executed. Scripts are instead executed via another program, called the interpreter.

This is usually slower than a compiled executable, but easier to tweak, allowing more rapid development.

Scripting languages tend to also specialize towards certain goals. Shells make it easy to run programs and perform text manipulation, Ruby and PHP are used for server-side web applications, Python and Perl are general purpose, Lua is small, fast and easy to embed into other programs.

The lines between scripted and compiled languages are blurred by the presence of bytecode VMs, where the language is compiled into an intermediate form, and by JIT compilation, which turns an interpreted language into machine code at run time.

Common scripting languages

Shell

Shells are the most common scripting language, installed on the vast majority of all Linux systems.

Shells broadly fall into two categories:

  1. Bourne Shells
  2. C-Shells

Bourne Shells

Bourne shells are named for the original Bourne shell (sh), from which they are derived.

Bash is the GNU shell; the name means Bourne Again SHell. It is much more advanced than the original, providing support for arrays and autocompletion.

Zsh is more capable, with many more features and nicer array syntax.

Busybox ash and dash are less featureful shells than Bash and Zsh, with goals of being small, fast and POSIX compatible.

Shell scripts are fundamental to traditional Linux systems: the traditional SysV init has a shell script for every system service. These can be found in /etc/init.d.

C Shells

C-Shells ape C syntax in an effort to be more familiar, but can be less capable than their Bourne cousins.

I am biased in this regard, but please read Csh programming considered harmful.

tcsh is the most commonly used C-shell.

Perl

Perl is inspired by shells and sed. It provides a fully capable programming language with built-in regular expressions.

It is a mature language with many modules available in CPAN so a variety of well-used libraries are available to ease development.

It has a reputation for being a write-only programming language. The motto of "There's more than one way to do it" does not help in this regard: it is a large language, so it is easy to encounter unfamiliar constructs.

apt-file, part of Debian's package manager, is written in Perl.

Python

Python is another general-purpose scripting language.

It has a reputation for having "batteries included", since the default builds include a large standard library.

This makes it easy to put together a program, for the same reasons as Perl.

It is somewhat controversial for its enforced coding style: indentation is mandatory and the community encourages common idioms. This makes Python codebases easier to read, since there is less of a coding style hurdle in the way of understanding.

The lsb-release script in Debian is written in Python.

Lua

Lua is a fast, small, portable scripting language.

It does not include a very large standard library, providing approximately the same features as the standard C library, but it is fast and easy to embed in other programs.

LuaJIT is an alternative interpreter which offers speeds comparable to writing native C.

Lua is embedded in a great many computer games; World of Warcraft is a notable example.

PHP

PHP is a scripting language used primarily for web development. It offers powerful templating functionality, though this tends not to be used so much recently, in favour of complicated frameworks.

It has a reputation for being insecure, complicated and inconsistent.

Ruby and Python are more popular alternatives.

Ruby

Ruby is another web scripting language. It is what all the cool kids are using these days.

It has a reputation of being susceptible to dependency hell, though tools like rvm have been developed to deal with this.

Puppet is a popular cluster administration tool which is written in Ruby.

Writing and executing scripts

Interpreters allow scripts to be run by passing the file name of the script when executing the interpreter. For example, a shell script can be executed by running

$ cat >hello.sh <<EOF
echo Hello World
EOF
$ sh hello.sh
Hello World

Alternatively, scripts can be made to behave like compiled executables with a she-bang.

$ cat >hello.sh <<EOF
#!/bin/sh
echo Hello World
EOF
$ chmod +x hello.sh
$ ./hello.sh
Hello World

The line #!/bin/sh says to use /bin/sh to run the script.

These examples were written in shell, but any of the languages listed above may be used instead.

Posted Thu Oct 3 09:00:09 2013

In a world filled with strings (filenames and the like) it is often very useful to be able to talk about many items at once. This is typically done with some form of pattern to express the class or group of items you are interested in. This might be all the filenames which end .txt or all lines in a log file with the word ERROR in them.

There are two common families of pattern languages in use on Unix-like machines. These are regular expressions and shell globs. The latter is a smaller language than the former, and there are many variants of each of them. We will tackle shell globs first.

Shell Globs

When using the shell, you will sometimes want to express a pattern for multiple files at once. For example, you might want to delete all the backup files in a directory, which would be a command along the lines of rm *~. This command comprises two parts: the program to run (rm) and the argument(s) given to it (*~). In this example the argument to the command is a shell glob and will be expanded by the shell into a list of all the files whose name ends with a tilde. The expanded list will be passed to the rm program.

Most shells have similar glob rules, and they usually consist of:

  • A marker for zero-or-more characters: *
  • A marker for exactly one character: ?
  • A way to express one of a certain set of characters: [...]
  • A way to express a choice of one or more strings: {...,...}
  • A way to escape any of the above special characters: \

Some shells offer more globbing patterns, but the above are the most common. In all cases, globs are matched against files in the filesystem. As such there is actually a C library function [fnmatch(3)] which does this kind of matching; the name is simply a contraction of file name match.

In our example, the glob *~ simply means zero or more characters followed by a tilde and will match filenames which end in a tilde which is the common way to indicate a 'backup' file in Unix.
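The same pattern language is available inside a script through the case statement, which makes it easy to try patterns out without touching any files. The following is only a minimal sketch: the helper name matches is made up for this illustration, and brace alternatives ({...,...}) are left out because the shell expands them before any matching takes place.

```shell
# matches PATTERN NAME: succeed if NAME matches the glob PATTERN.
# (matches is a hypothetical helper, not a standard command.)
matches() {
  # The unquoted $1 is treated as a pattern by case.
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

for name in 'notes.txt~' 'wibble~foobar' 'a.txt' 'ab.txt'; do
  matches '*~'    "$name" && echo "$name matches *~"
  matches '?.txt' "$name" && echo "$name matches ?.txt"
  matches '[an]*' "$name" && echo "$name matches [an]*"
done
```

Note that wibble~foobar matches none of these patterns, because a glob has to match the whole name rather than just part of it.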

If you want to know more about shell globs, see:

  • fnmatch(3)
  • The 'Pattern Matching' section of bash(1)
  • The equivalent section of your favoured shell manual page.

Regular Expressions

Regular expressions (often shortened to regexps) feature in many programs although they are most commonly encountered as part of using the shell-related tools grep, awk and sed or as part of other scripting languages such as perl or python. Some scripting languages such as lua have other pattern languages which are similar, although not identical, to regexps.

As with shell globs, regexps have a common core 'language' and then there are a multitude of variants. The common core properties of regexps are:

  • A marker for exactly one character: .
  • A way to group atoms (e.g. characters, classes or groups): (...)
  • A way to indicate a single character from a class: [...]
  • A way to indicate zero-or-one repetitions of the previous atom: ?
  • A way to indicate zero-or-more repetitions of the previous atom: *
  • A way to indicate one-or-more repetitions of the previous atom: +
  • A way to escape any of the above special characters: \
  • A way to anchor the start of the matched string: ^
  • A way to anchor the end of the matched string: $

In regular expressions, the shell glob example we used above would be ^.*~$ and would be read as "starting at the start of the input string, any character zero or more times, then a tilde, and then the end of the input string". As you can see, regexps are not intrinsically anchored to the start and end of the input, unlike shell globs. This is both very powerful and potentially confusing as if you omitted the ^ and $ then the regexp .*~ would match the file name wibble~foobar which is clearly not the intention of the glob.
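The effect of the anchors is easy to see with grep. This is a small sketch using three made-up names fed in on standard input; grep -c counts matching lines instead of printing them.

```shell
names='wibble~
wibble~foobar
notes.txt'

# Anchored: only names that end in a tilde.
anchored=$(printf '%s\n' "$names" | grep -c '^.*~$')

# Unanchored: any name containing a tilde somewhere.
unanchored=$(printf '%s\n' "$names" | grep -c '.*~')

echo "anchored: $anchored, unanchored: $unanchored"
```

The anchored expression counts one name and the unanchored one counts two, since wibble~foobar contains a tilde but does not end with one.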

If you wish to know more about regular expressions, then see:

For one different variant of regular expressions see: PCRE's documentation.

Wikipedia's article on regexps is pretty good and goes into more of the formal theory of regular languages.

Hopefully now, regular expressions won't scare you when you see them.

Posted Wed Oct 9 09:00:06 2013

Input and output redirection is about taking the useful information from files and programs and doing what you want with them.

It will also involve some of the basics of file descriptors (FDs).

I want to save the output of a program

Suppose you and your friend have large music collections and want to share. You want to be able to tell your friend what you have, but your music folder isn't sorted by genre. Instead you just have a huge pile of albums. It would take ages to read them all and see which ones he wants.

Instead, you can make a list of all your albums and send him that.

ls Music/ > my-music-list

The above command runs ls Music/ and writes the output (a list of all your files) to the file my-music-list.

The > character means 'take the standard output of the program to my left and write it to the file to the right'

But I have more music in a different folder!

You have two options here, the boring way and the fun way.

The boring way!

Okay, ls is able to show the contents of multiple directories at once

ls Music/ other-music/ > all-my-music

The fun way!

Note: levels of fun may vary.

You can append to a file using '>>'.

ls Music/ > my-music-list
ls other-music/ >> my-music-list
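One thing worth knowing before relying on this: > replaces whatever was already in the file, while >> keeps it. The sketch below shows the difference in a scratch directory; the directory and album names are invented for the example, and it assumes mktemp is available.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)
mkdir "$tmp/Music" "$tmp/other-music"
touch "$tmp/Music/album-one" "$tmp/other-music/album-two"

ls "$tmp/Music" > "$tmp/my-music-list"         # creates or overwrites the list
ls "$tmp/other-music" >> "$tmp/my-music-list"  # adds to the end of the list
appended=$(cat "$tmp/my-music-list")

ls "$tmp/other-music" > "$tmp/my-music-list"   # > again: earlier lines are lost
overwritten=$(cat "$tmp/my-music-list")

printf 'after >>:\n%s\nafter >:\n%s\n' "$appended" "$overwritten"
rm -r "$tmp"
```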

I want all my music from the fake band Fiddlesticks

There's a handy-dandy program called grep, which shows every line that matches a certain pattern. There's a boring and an interesting way to use grep, as well.

I am a boring person

Okay, grep lets you specify a file to search in.

grep "Fiddlesticks" my-music-list

This will return every line in the 'my-music-list' file that contains the text 'Fiddlesticks'.

I am an interesting person

grep also reads from standard input, so this works just as well

grep "Fiddlesticks" < my-music-list

Here, the '<' character means 'Take everything from the file to my right and put it in the program to my left'

What's this 'standard input' and 'standard output'?

Standard input and standard output are simply streams that every running program has.

If you don't do any special input/output redirection, standard input is everything you type into the terminal with your keyboard (e.g. 'Y' on a yes/no prompt), and standard output is everything (except errors) that comes out of a program, which is typically what it displays in your terminal.

There is one other special stream, called standard error. This is where error messages from a program get written, separate from any ordinary output.

To see the difference, try running:

ls file-that-doesnt-exist > output

The terminal replies with ls: cannot access file-that-doesnt-exist: No such file or directory, but the file 'output' is empty.
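Standard error can be redirected on its own by putting its file descriptor number, 2, in front of the > character. A minimal sketch, again in a scratch directory with made-up file names:

```shell
tmp=$(mktemp -d)

# 2> redirects file descriptor 2 (standard error) on its own.
# "|| true" only stops the expected failure from aborting a strict script.
ls "$tmp/file-that-doesnt-exist" > "$tmp/output" 2> "$tmp/errors" || true

out_size=$(wc -c < "$tmp/output")
err_size=$(wc -c < "$tmp/errors")
echo "output: $out_size bytes, errors: $err_size bytes"
rm -r "$tmp"
```

This time nothing appears on the terminal: the output file is still empty, and the complaint from ls ends up in the errors file.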

I have too many songs by Fiddlesticks! I want something specific

The solution to your problem is pipelines. Pipelines connect the standard output of one program to the standard input of another.

We can use this to find the album 'THINGS' by Fiddlesticks:

grep "Fiddlesticks" my-music-list | grep "THINGS"

This will show every line that contains the text 'Fiddlesticks' and 'THINGS'
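Pipelines are not limited to two programs. As a small sketch, the following counts the matching albums by adding wc -l to the end; the list contents are invented and stand in for my-music-list.

```shell
# A stand-in for my-music-list; the album names are invented.
list='Fiddlesticks - THINGS
Fiddlesticks - STUFF
Someone Else - THINGS'

# Each | feeds the previous command's standard output to the next command.
count=$(printf '%s\n' "$list" | grep "Fiddlesticks" | grep "THINGS" | wc -l)
echo "matching albums: $count"
```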

I want to run this long, boring program overnight

Most virtual terminals only scroll back so far. If you come back to find that it spewed a huge log detailing a failure, it can be quite annoying for the exact information on how it failed to have been 1000 lines before the end of the output, where it was thrown away.

You want every line of output to be written to a file, so you can check it later. This can be done by:

run-boring-program &> log-file.txt

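In this command, &> means 'write standard output and standard error of the program on the left to the file on the right'. Note that &> is a Bash extension; in a plain POSIX shell the same effect is written as > log-file.txt 2>&1.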

If you want to see what the output is at the same time, you can use tee. tee is a handy little program that splits output (e.g. to standard output and to a file), resembling the letter 'T' and referencing the concept of a T-piece in a pipeline.

To write all output to a file, and also display it on the console, you would run:

run-boring-program 2>&1 | tee log-file.txt

The magic rune introduced here is 2>&1, which involves the operator >&, which means 'take the file descriptor to the left, and write it to the file descriptor to the right'.

A 'file descriptor' (FD) is a simple representation for a file (or stream). The three streams every running program has (standard input, standard output and standard error) take the file descriptors 0, 1 and 2. If the program opens any more files, they would be assigned other numbers. Since the input, output and error streams are defined to be FDs 0, 1 and 2 in POSIX, we can rely on them having these numbers no matter the platform.
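Redirections are processed from left to right, so the position of 2>&1 matters. The sketch below illustrates this with a hypothetical function, emit, standing in for run-boring-program; the log file names are also made up.

```shell
tmp=$(mktemp -d)

# emit is a hypothetical stand-in for run-boring-program:
# one line on standard output (FD 1), one on standard error (FD 2).
emit() {
  echo "normal output"
  echo "error output" >&2
}

# FD 1 goes to the file first, then FD 2 is pointed at wherever FD 1 goes.
emit > "$tmp/both.log" 2>&1

# Reversed: FD 2 is pointed at the old FD 1 (discarded here by the outer
# redirect), and only afterwards is FD 1 sent to the file.
{ emit 2>&1 > "$tmp/stdout-only.log"; } > /dev/null

both=$(wc -l < "$tmp/both.log")
only=$(wc -l < "$tmp/stdout-only.log")
echo "both.log: $both lines, stdout-only.log: $only lines"
rm -r "$tmp"
```

The first form captures both lines; the second captures only the ordinary output, which is rarely what you want in a log file.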

Where can I learn more?

This information came partly from fiddling and experience, and partly from BASH Programming at tldp.org

Posted Wed Oct 16 09:00:06 2013

You mustn't put personal ssh private keys on shared systems or servers. It is a pretty serious security risk: if anyone else has access to the system, they have access to your ssh private key, and may be able to use that to impersonate you.

You can mitigate that by setting a passphrase on your ssh private key. However, passwords can be guessed, or broken by various methods. It is very hard to remember, or type, a passphrase that can't be broken by a determined adversary.

Worse, you'll need to type the passphrase every time you use ssh, and that gets quite annoying after a while. This, of course, encourages using short, weak passphrases.

The solution to having to type a passphrase often is to use an ssh agent. The agent effectively remembers the passphrase for you, saving a lot of repeated typing.

You should run the agent on your own system: your laptop or desktop machine. Your ssh on the server or shared system can then connect to that agent to authenticate further.

There are still some security risks with this; see below.

How to use ssh agent forwarding

Instead of putting an ssh key on a remote computer, log into the computer with ssh -A. This forwards the connection to your ssh agent to the remote computer. When you run ssh on the remote computer to log into another server, the login can happen using the ssh agent on your local computer (laptop) using the key on your local computer. All the login related computation with the ssh private key happens on your local system.

  • You run ssh-agent on your laptop.
  • You log into your server, mine.example.com, with ssh -A.
  • You log from mine.example.com to another server, git.example.com, also using ssh.
  • The ssh client running on mine.example.com connects to the ssh agent running on your laptop, to authenticate to git.example.com.

This way, your ssh private key only ever exists on your laptop. It does not ever leave your laptop. Only your laptop can actually authenticate you to another system.

If you're using any of the common Linux desktop environments, you're already using an ssh agent locally. It is set up for you and used automatically.
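ssh finds the agent through the SSH_AUTH_SOCK environment variable, so a quick way to check whether an agent (local or forwarded) is visible is to look at that variable. A minimal sketch; agent_status is a name made up for this example.

```shell
# agent_status is a hypothetical helper: it reports whether this
# environment has been told where to find an ssh agent.
agent_status() {
  if [ -n "${SSH_AUTH_SOCK-}" ]; then
    echo "agent socket: $SSH_AUTH_SOCK"
  else
    echo "no agent socket in this environment"
  fi
}

agent_status
# When a socket is present, "ssh-add -l" lists the keys the agent holds.
```

Running this on the remote machine after logging in with ssh -A should report a socket there too, which is the forwarded connection back to your laptop.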

On security

Forwarding an ssh agent carries its own security risk. If someone on the remote machine can gain access to your forwarded ssh agent connection, they can still make use of your keys. However, this is better than storing keys on remote machines: the attacker can only use the ssh agent connection, not the key itself. Thus, only while you're logged into the remote machine can they do anything. If you store the key on the remote machine, they can make a copy of it and use it whenever they want.

You can protect yourself against this too, by using ssh-add -c. See the manual page for details.

You need to be careful and not use ssh agent forwarding except when you need to log in via ssh from the remote machine, and the remote machine is reasonably trusted.

Whenever you can, don't log in from one machine to another and from there to a third one. Always log in directly from your machine to the other.

Posted Wed Oct 23 09:00:07 2013
Richard Maw Shell Variables

Basic variable definition

Traditionally, shell variables are declared, defined and initialized at the same time. This is done similarly to other programming languages, with the variable name on the left and the value on the right.

FOO=bar

Note that no spaces are allowed between the variable, the = and the value. The following do not parse as you would expect.

FOO= bar  # executes bar with FOO set to the empty string
FOO =bar  # executes FOO with =bar as the first argument
FOO = bar # executes FOO with = and bar as the first and second arguments

If the value you want to set the variable to contains spaces, you must enclose it in quotes like this.

FOO="bar baz"

Basic variable interpolation

The most common use for setting a variable is interpolation into another command. The following command prints the contents of the variable FOO to your terminal, which would be bar baz if it was set like the previous section.

$ echo $FOO
bar baz

It is also possible to interpolate in a variable assignment.

$ FOO="$FOO qux "
$ echo $FOO
bar baz qux 

If you want to interpolate a variable with a string which has alphanumeric characters after the interpolation, you need to use an alternative syntax.

$ echo $FOO7

$ echo ${FOO}7
bar baz qux 7

Special variables

There are a large number of special variables, which will have been pre-defined by your shell. Some of these may be modified to alter your shell's behaviour.

I'm not going to list them all, see your shell's documentation for that, but I'm going to describe the 3 most commonly used from the command line.

$?

$? is the return code of the last program that was executed. This is useful if a command produces no output, and you don't know if that is a good thing or a bad thing.

It will be 0 if a command succeeded.

$ true # true always succeeds
$ echo $?
0

It will not be 0 if the command failed.

$ false # false always fails
$ echo $?
1    
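Since every command overwrites $?, a script that needs the value later should copy it into another variable straight away. A small sketch using grep -q, which prints nothing and reports its result only through the return code:

```shell
# grep -q prints nothing; the exit status is the only result.
printf 'alpha\nbeta\n' | grep -q 'beta'
found=$?      # copy it at once, before another command replaces it

# "|| missing=$?" records the status of a command that is expected to fail.
printf 'alpha\nbeta\n' | grep -q 'gamma' || missing=$?

echo "beta: $found, gamma: $missing"
```

This prints "beta: 0, gamma: 1", since grep returns 0 when it finds a match and 1 when it does not.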

PS1

PS1 is the specification of how your shell will display its prompt.

$ PS1="% "
% echo My prompt is $PS1
My prompt is % 

This does not need to be a fixed string; your shell will provide a formatting language, and support embedding of commands. As usual, see the documentation for more details.

PATH

PATH is not specific to your shell, but the most common way of interacting with it is through a shell.

It is a colon separated list of paths to directories that contain executables.

% echo $PATH
/usr/local/bin:/usr/bin:/bin

When you run a command without specifying a full path to the executable it will use PATH to find it, using directories specified from left to right.

You can alter it like other variables to allow other executables to be found.

% cat ~/bin/my-command
#!/bin/sh
echo "Hello World"
% PATH="$HOME/bin:$PATH"
% my-command
Hello World
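The left-to-right search order means that the first directory containing a matching executable wins. The following sketch shows this with two scratch directories that each contain a script called greet; the directory and script names are invented, and mktemp is assumed to be available.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/first" "$tmp/second"

# Put a different "greet" script in each directory.
for d in first second; do
  printf '#!/bin/sh\necho "from %s"\n' "$d" > "$tmp/$d/greet"
  chmod +x "$tmp/$d/greet"
done

# Change PATH inside a subshell only, so the rest of the session is unaffected.
result=$(PATH="$tmp/first:$tmp/second:$PATH"; greet)
echo "$result"
rm -r "$tmp"
```

This prints "from first"; swapping the two directories in PATH would run the other script instead.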

Scopes

There are 3 classes of scopes in most shells.

  • Global variables

    FOO is a global variable, this is the default

  • Exported (environment) variables (if you alter PATH or export FOO)

    These will be inherited by processes you execute. Any variables that were exported to you will be inherited, so you may alter the PATH for a subshell.

    You can also export FOO to a subshell with the export command.

    % sh
    $ echo FOO is $FOO
    FOO is 
    $ exit
    % export FOO
    % sh
    $ echo FOO is $FOO
    FOO is bar baz qux 
    

    In a plain POSIX shell, you can only unexport a variable by unsetting it, then re-setting it. (Bash also offers export -n for this.)

    % unset FOO
    % FOO="bar baz qux "
    % echo $FOO
    bar baz qux 
    % sh
    $ echo FOO is $FOO
    FOO is 
    
  • Local variables (not basic shell)

    Local variables are only valid inside functions. These allow a variable name to be re-used across functions, and are one requirement for being able to write recursive functions.

    % func () { local FOO=zxc; }
    % func
    % echo $FOO
    bar baz qux 
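The difference between a plain and an exported variable can also be checked from a script rather than an interactive session. A minimal sketch, using sh -c as the child process and a made-up variable name, DEMO_VAR:

```shell
unset DEMO_VAR              # DEMO_VAR is a made-up name for this sketch
DEMO_VAR="bar baz"

# A child process only sees exported variables.
before=$(sh -c 'echo "${DEMO_VAR-unset}"')

export DEMO_VAR
after=$(sh -c 'echo "${DEMO_VAR-unset}"')

echo "before export: $before"
echo "after export: $after"
```

Before the export the child reports "unset"; afterwards it sees "bar baz".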
    

Operations on variables

As shown previously, it is possible to append to a variable by interpolating, however this requires typing out the variable twice. Some shells, such as Bash and Zsh, also support an append operator in the form of +=:

% FOO+=asdf
% echo $FOO
bar baz qux asdf

There are also more advanced interpolation operations, which only work with the braced form of interpolation (${FOO}). For a list of all of them, see the Parameter Expansion section of the documentation.

Default value

If you want to interpolate a variable which may not have been set, you can supply a sensible default to use in its place.

% unset FOO
% echo $FOO

% echo ${FOO-bar}
bar

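This is terser, but less readable, than doing the following. Strictly speaking the two differ slightly: the test below also falls back to the default when FOO is set but empty, which is what ${FOO:-bar} does.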

if [ -z "$FOO" ]; then
  echo bar
else
  echo "$FOO"
fi
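The two spellings of the default operator, with and without the colon, only behave differently when the variable is set to an empty string. A short sketch comparing them:

```shell
unset FOO
unset_plain=${FOO-bar}     # FOO unset: default used
unset_colon=${FOO:-bar}    # FOO unset: default used

FOO=""
empty_plain=${FOO-bar}     # FOO set (to nothing): default NOT used
empty_colon=${FOO:-bar}    # FOO empty: default used

echo "unset: [$unset_plain] [$unset_colon]"
echo "empty: [$empty_plain] [$empty_colon]"
```

The first line prints [bar] twice; the second prints [] followed by [bar].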

Suffix and prefix stripping

It can be convenient to remove a portion of a string, such as removing the extension from a file name.

% FOO=bar
% echo ${FOO%r}
ba

There is also a prefix strip using the # character instead of the %.

% echo ${FOO#b}
ar

It may help to think that the % looks like an s, hence suffix.
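The part to strip is itself a glob pattern, and doubling the operator (%% or ##) removes the longest match instead of the shortest. A small sketch on a made-up path:

```shell
path="backups/archive.tar.gz"

no_gz=${path%.gz}     # shortest matching suffix removed
no_ext=${path%%.*}    # longest matching suffix removed
file=${path##*/}      # longest matching prefix removed
dir=${path%/*}        # shortest suffix starting at a / removed

echo "$no_gz"
echo "$no_ext"
echo "$file"
echo "$dir"
```

This prints backups/archive.tar, backups/archive, archive.tar.gz and backups, in that order.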

Decompress directory example

As an example of combining some things that have been learned, here is a shell function that decompresses all gzipped files in a directory.

If a directory is not specified, it uses the current directory.

decompress() {
  for gzipped in "${1-.}"/*.gz # Use current dir if not specified
  do
    [ -e "$gzipped" ] || continue # Skip the literal pattern if nothing matched
    gunzip -c "$gzipped" >"${gzipped%.gz}" # Strip .gz from path
  done
}
Posted Wed Oct 30 10:00:08 2013