A state machine is a way of structuring a program to structure it around a set of states and the events that can cause it to change state.
This can be useful in sufficiently small programs as it allows you to reason about all your inputs and all your states such that you can verify that you have accounted for every eventuality.
This can be handy when writing C programs that aren't just a single thread calling blocking system calls as the gods of UNIX intended.
This article will discuss a couple of ways of writing state machines using a necessarily trivial example.
A good start is to enumerate the set of states your program may be in, and the set of events that your program accepts.
enum states {
START,
LOOP,
END,
} state;
enum events {
START_LOOPING,
PRINT_HELLO,
STOP_LOOPING,
};
This program has the states START
, LOOP
and END
,
and accepts the events START_LOOPING
, PRINT_HELLO
and STOP_LOOPING
.
Our program will enter the LOOP
state
after the START_LOOPING
event while in START
state,
print hello when it gets the PRINT_HELLO
event in the LOOP
state,
and enter the END
state after getting a STOP_LOOPING
event.
We are going to drive this state machine by repeatedly passing it our events, though for a real program you are more likely to have an event loop.
int main(void) {
step_state(START_LOOPING);
step_state(PRINT_HELLO);
step_state(PRINT_HELLO);
step_state(STOP_LOOPING);
return 0;
}
Now there are a couple of ways of doing the state machine handling.
Handling states with case statements
If you're familiar with C, you may find it obvious to use nested switch statements.
That would look something like this:
void step_state(enum events event) {
switch(state) {
case START:
switch(event) {
case START_LOOPING:
state = LOOP;
break;
default:
exit(1);
break;
}
break;
case LOOP:
switch(event) {
case PRINT_HELLO:
printf("Hello World!\n");
break;
case STOP_LOOPING:
state = END;
break;
default:
exit(1);
break;
}
break;
case END:
exit(1);
break;
}
}
This defines case
s to jump into the code
that handles each state and event.
This can be a bit verbose, and produce long functions, especially if you have a lot of events to handle.
Conveniently if you are using gcc,
you can use the -Werror=switch
compiler flag
to make it refuse to compile the code
if you forgot to handle every state and event.
Example code of doing it this way can be found at switch.c.
Handling states with a table
This approach has a table of event handlers,
that return the subsequent state,
so the step_state()
function can look up the handler
based on the current state and the event.
This requires defining a function for each event handler and then creating the table of function pointers, which can seem a little verbose, but is good for state machines where there is a small fraction of valid state and event combinations, and can be considered to be good style anyway.
It looks something like this:
typedef enum states (*event_handler)(enum states, enum events);
enum states start_looping(enum states state, enum events event) {
return LOOP;
}
enum states print_hello(enum states state, enum events event) {
printf("Hello World!\n");
return LOOP;
}
enum states stop_looping(enum states state, enum events event) {
return END;
}
event_handler transitions[STOP_LOOPING+1][END+1] = {
[START] = { [START_LOOPING] = start_looping, },
[LOOP] = { [PRINT_HELLO] = print_hello,
[STOP_LOOPING] = stop_looping, },
};
void step_state(enum events event) {
event_handler handler = transitions[event][state];
if (!handler)
exit(1);
state = handler(state, event);
}
This typedefs the function pointer type for convenience, since the function pointer syntax is arcane and gnarly, and not something you want to combine with the array syntax.
This example uses the GCC designated initializer syntax to make the table more self-documenting, and uses arithmetic on an enum value to determine the size of the table.
This is a 3 by 3 matrix of function pointers,
and using the fact that any not explicitly initialised entry
is initialised with the NULL
pointer
we can use that as the marker for invalid combination.
Example code of doing it this way can be found at table.c.
A topic which often comes up as one gets deeper into computer science is that of finite-state automata. Finite automata are sometimes known as state machines and Richard has talked about concrete implementation considerations already. However an FSM can be used to describe significantly more than you might already think. Learning how to manipulate FSMs can lead to a deeper understanding of several parts of computer science such as lexical analysis, parsing, regular expressions.
In brief, as a data structure, an FSM is a collection of states, a collection of input atoms (also known as the alphabet of the FSM), and a collection of triplets consisting of input state, input, and output state. The vast majority of FSMs you encounter in your day-to-day programming life shall be of the kind in Richard's article, namely deterministic finite state automata. However when one delves deeper into the wonderful world of FSMs, one discovers that more flexible and more easily composed cousin, the non-deterministic finite state automaton or NFA.
The non-deterministic automaton has one specific advantage over its deterministic cousin -- the ability to express indecision through the possibility that a given input atom might transition the NFA to zero, one or more possible states, rather than always to exactly one state.
One further extension of this is needed to give us the most flexible variant of FSMs. This is often called the ε-move and it allows an NFA-ε to transition from one state to another without consuming an input atom at all.
While it's very easy to write code which represents a DFA, and pretty simple to reason about it; reasonably easy to write code which represents an NFA, and not too hard to reason about it; it's pretty hard to write code which usefully represents an NFA-ε and it can be very hard to reason about them.
Fortunately for us, NFA-εs are both helpful, and relatively easy to transform into DFAs later (though that act often expands the storage space required for the FSM quite substantially).
One directly applicable use of building NFA-εs, composing them together, and then converting them to DFAs, is in the construction of regular expression engines. Your homework is to have a look at one of the many descriptions of regular expressions, liken it to an understanding of FSMs from the linked articles on this page, and then have a go at writing a very simple regexp engine for yourself in whatever language you favour. Though I'd recommend using a scripting language if you can, for rapid development and debugging. If you get very stuck you could look to this article for further discussion of FSMs and how to do this.
The utility Sed, stream editor, can be found on almost all unix-based systems. Sed takes a steam of text and, as the name suggests, edits it according to some instructions you have given it. It is a very flexible tool and can be very useful when using the commandline or in a shell script.
As a classic Hello World example do
sed 's/.*/Hello World/g'
Then press enter a few times and see sed replace the lines with Hello World
.
This article contains some (fairly artificial) examples in order to demonstrate the features of sed. There are far more features than are in the scope of this article; I suggest checking out the man page if you are curious for more.
Uses
There is nothing that sed can do that is unique. Other tools such as grep awk and perl can be used for similar things to name just a few. Sed provides a style of working that some people like the most.
A simple use for sed is the situation when you are running some software that
produces a lot of output but you only care about certain messages and do not
want to miss them amidst the output you don't care about, maybe these messages
appear in blocks starting with a line WARNING:
and ending with an empty line.
In this case you can do
./some_software | sed -n '/WARNING:/, /^$/p'
The -n
option tells sed not to print the output of ./some_software
unless
explicitly told to do so by the print instruction, written p
, this comes at
the end of our sed command. Before it there are two regular expressions in
forward slashes separated by a comma. This tells sed that you want to print in
the range of lines between any that match those two regular expressions.
Another useful command that can go in place of the print command is the delete
line command which is written d
. Now when your boss is watching you can show
him that it runs without any warnings by removing all the lines containing the
string WARNING
with
./some_software | sed '/WARNING/d'
Notice that the -n
option is not there because we want sed to print all the
text that is being output by ./some_software
unless it matches WARNING
,
in which case delete that line from the stream.
Yet another usecase is taking the output of one command and changing parts of
it; This is called a substitution and is done with the s/
. For example, maybe
you are having problems with a piece of software. You want to ask for help on
an online forum but need to be careful not to post personal information which
is included in the command output. Maybe it gives away that your name is Joe
Bloggs so you do
./some_software | sed 's/Joe Blogs/<Name Redacted>/g'
For a more involved example, imagine you want to get the IP address of the wlan0 interface on your machine and put it in a variable in a shell script. ifconfig might ouput something that looks like:
wlan0 Link encap:Ethernet HWaddr 08:11:96:05:6b:6c inet addr:192.168.1.122 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::a11:96ff:fe05:6b6c/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:67062 errors:0 dropped:0 overruns:0 frame:0 TX packets:18075 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:21572026 (20.5 MiB) TX bytes:3448315 (3.2 MiB)
We want to extract the address 192.168.1.122
. By eyeballing it we can see
that we want the line after it says wlan0
, then we want to match the regex
inet addr:[^ ]*
and we want to print that. You could use the following
line
wlan0ip=$(ifconfig | sed -n '/wlan0/{n; s/.*inet addr:\([^ ]*\).*/\1/p; q}')
Notice that in this command after the /wlan0/
to match there is a curley
brace. This is a block like in other programming languages because all the
commands between the braces will be executed when there is a matching line.
Commands are separated by semicolons. The n
command simply tells sed to look
at the next line. The next bit is a little fiddly. We only want to output the
IP address, not the whole line so instead of doing a normal print p
I used
the s/.*(<some regex>).*/\1/p
trick. This is a fairy standard trick which
replaces the entire line with just the match before printing it in order to
only print what you want. Finally the q
tells sed to quit because you have
found what you want and do not need to process the remainder of the output.
Limitations
Sed is bad when you need a line that comes before a matching line. Sed never goes backwards so by the time it finds the matching line it has already thrown away.
I've talked before about the sorts of things you should know before you get started writing some code. I've even touched on typical filesystem layouts for projects. And we've talked a little about free software licences and even a bit about being a mensch.
This week I'd like to talk to you a little bit about getting started with a project from the point of view of the more social side of things. It's not often that we start projects with the explicit intention of growing a community immediately, and oftentimes we end up with a community without even really planning it. Thanks to the wonders of the general F/LOSS community, these can sometimes go entirely without hitch; yet sometimes really awful things happen (and no, I'm not going to link to them) and people can end up feeling marginalised, unwanted, or even downright attacked.
I am grateful, on a more than daily basis, that despite being fat, ginger, and gay, I am an able-bodied cis white male and as such I get so much privilege in our communities that I am rarely directly attacked. I try, again on a more than daily basis, to check my privilege and behave in a courteous way. I often fail, and I hope that by acknowledging that I am at least a little better than those who fail and ignore it. I run, or participate in, a significant number of projects. Some have social contracts and codes of conduct such as Debian, and others simply rely on the community members being "nice" to one another.
I have, recently, been considering a code of conduct, or social contract, for my Gitano group of projects; and also I have been considering it in a wider context of my membership in the NetSurf project and in my barely new-born participation in the KiCad project. In my bimbling around, I have encountered numerous options for codes of conduct or social contracts and have whittled my selection set down to The citizen code of conduct, The contributor covenant, and The open code of conduct.
I'm not yet sure which one I'll pick, or if I feel the need to produce my own
variant of an existing one. (A good friend reminded me that codes of conduct,
or social contracts / covenants are kinda like cryptography -- necessary, but
don't go rolling your own unless you're especially skilled in the domain. So
if you feel the need to have a different CoC, amend an existing one and
remember to feed back to the original authors, just like you would with
code). However, one thing I'm sure of is that the next time I start a new
project, right alongside the README
and the LICENCE
/COPYING
file, there's
going to be a CONDUCT
file (or something of similar name) with the relevant
covenant or code of conduct in it.
Perhaps next time you start a project, just after you've run git init
and
just before you git commit
your chosen F/LOSS licence, perhaps you too will
add a contributor covenant or code of conduct to shape your future community
just a little better than it might self-organise otherwise.