<p>MSG_TRUNC is poorly supported, by Richard Maw, 2018-01-17 (http://yakking.branchable.com/posts/msg-trunc/)</p>
<p><code>ssize = recv(fd, NULL, 0, MSG_PEEK|MSG_TRUNC)</code>
lets you see how big the next message in a UDP or UNIX socket's buffer is.</p>
<p>This can be important
if your application-level communications can support variable message sizes,
since you need to be able to provide a buffer large enough for the data,
but preferably not so much larger that it wastes memory.</p>
<p>Unfortunately, a lot of programming languages' bindings don't make this easy.</p>
<hr />
<p>Python's approach is to allocate a buffer of the provided size,
and then reallocate to the returned size afterwards,
and return the buffer.</p>
<p>This behaviour is intended to permit a larger allocation to begin with,
but as a side-effect it also permits a smaller one to not break horribly.</p>
<p>Well, mostly. In Python 2 it breaks horribly.</p>
<div class="highlight-sh"><pre class="hl">$ python <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">s.recv(0, socket.MSG_PEEK|socket.MSG_TRUNC)'</span>
Traceback <span class="hl opt">(</span>most recent call last<span class="hl opt">):</span>
File <span class="hl str">"<string>"</span><span class="hl opt">,</span> line <span class="hl num">4</span><span class="hl opt">,</span> <span class="hl kwa">in</span> <span class="hl opt"><</span>module<span class="hl opt">></span>
SystemError<span class="hl opt">:</span> ..<span class="hl opt">/</span>Objects<span class="hl opt">/</span>stringobject.c<span class="hl opt">:</span><span class="hl num">3909</span><span class="hl opt">:</span> bad argument to internal <span class="hl kwa">function</span>
</pre></div>
<p>Python 3 instead returns an empty buffer immediately, without reading the socket.</p>
<div class="highlight-sh"><pre class="hl">$ python3 <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">m = s.recv(0, socket.MSG_PEEK|socket.MSG_TRUNC)</span>
<span class="hl str">print(len(m), m)'</span>
<span class="hl num">0</span> b<span class="hl str">''</span>
</pre></div>
<p>You can work around this by receiving a minimum length of 1.</p>
<div class="highlight-sh"><pre class="hl">$ python <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">m = s.recv(1, socket.MSG_PEEK|socket.MSG_TRUNC)</span>
<span class="hl str">print(len(m), m)'</span>
<span class="hl opt">(</span><span class="hl num">4</span><span class="hl opt">,</span> <span class="hl str">'a</span><span class="hl esc">\x00</span><span class="hl str">n</span><span class="hl esc">\x00</span><span class="hl str">'</span><span class="hl opt">)</span>
</pre></div>
<p>The returned buffer's length is that of the message,
though most of the buffer's contents are junk.</p>
<p>The reason these interfaces aren't great
is that they return an object rather than using a provided one,
and it would be unpleasant for it to return a different type based on its flags.</p>
<p>Python has an alternative interface in the form of <a href="https://docs.python.org/3/library/socket.html#socket.socket.recv_into">socket.recv_into</a>,
which should fare better: since it returns the size separately,
it should be able to translate a <code>None</code> buffer into a <code>NULL</code> pointer.</p>
<div class="highlight-sh"><pre class="hl">$ python <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">m = s.recv_into(None, 0, socket.MSG_PEEK|socket.MSG_TRUNC)</span>
<span class="hl str">print(m)'</span>
Traceback <span class="hl opt">(</span>most recent call last<span class="hl opt">):</span>
File <span class="hl str">"<string>"</span><span class="hl opt">,</span> line <span class="hl num">4</span><span class="hl opt">,</span> <span class="hl kwa">in</span> <span class="hl opt"><</span>module<span class="hl opt">></span>
TypeError<span class="hl opt">:</span> recv_into<span class="hl opt">()</span> argument <span class="hl num">1</span> must be read-write buffer<span class="hl opt">,</span> not None
</pre></div>
<p>Unfortunately, this proves not to be the case.</p>
<p>In Python 2 we can re-use a "null byte array" for this purpose.</p>
<div class="highlight-sh"><pre class="hl">$ python <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">nullbytearray = bytearray()</span>
<span class="hl str">m = s.recv_into(nullbytearray, 0, socket.MSG_PEEK|socket.MSG_TRUNC)</span>
<span class="hl str">print(m)'</span>
<span class="hl num">4</span>
</pre></div>
<p>Unfortunately, Python 3 decided to be clever.</p>
<div class="highlight-sh"><pre class="hl">$ python3 <span class="hl kwb">-c</span> <span class="hl str">'import socket</span>
<span class="hl str">s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)</span>
<span class="hl str">s.bind("/tmp/testsock")</span>
<span class="hl str">nullbytearray = bytearray()</span>
<span class="hl str">m = s.recv_into(nullbytearray, 0, socket.MSG_PEEK|socket.MSG_TRUNC)</span>
<span class="hl str">print(m)'</span>
<span class="hl num">0</span>
</pre></div>
<p>As with plain <code>recv</code>, it returns early without waiting for the message.</p>
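<p>Combining the observations above, a non-empty scratch buffer seems to be enough to make Python 3 perform the system call.
The following is a minimal sketch under that assumption, for Linux only;
the helper names <code>next_datagram_size</code> and <code>recv_datagram</code> are made up for illustration and are not part of the socket module.</p>

```python
import socket

def next_datagram_size(sock):
    """Return the length of the next queued datagram without consuming it."""
    # A one-byte scratch buffer makes Python call recv(2) at all;
    # with MSG_TRUNC the kernel reports the full datagram length
    # even though at most one byte is copied.
    scratch = bytearray(1)
    return sock.recv_into(scratch, 1, socket.MSG_PEEK | socket.MSG_TRUNC)

def recv_datagram(sock):
    """Receive the next datagram using a buffer sized to fit it."""
    size = next_datagram_size(sock)
    # Request at least one byte, otherwise Python returns early
    # and a zero-length datagram would never be consumed.
    return sock.recv(max(size, 1))
```

<p>The cost is one extra system call per message and a byte of scratch space,
in exchange for never allocating more than the message needs.</p>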
<p>I had hoped to provide a counter-example of a programming language
that provided a way to expose this as part of its standard library.</p>
<p>Distressingly, the best behaved standard libraries
were the ones that exposed the system calls directly
with all the lack of safety guarantees that implies.</p>
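<p>Python can be pushed into that category through <code>ctypes</code>.
The sketch below calls the C library's <code>recv</code> directly with a <code>NULL</code> buffer;
it assumes Linux and a C library reachable via <code>ctypes.CDLL(None)</code>,
and <code>peek_size</code> is a hypothetical helper name.</p>

```python
import ctypes
import os
import socket

# Symbols of the C library already loaded into the interpreter.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.recv.argtypes = [ctypes.c_int, ctypes.c_void_p,
                       ctypes.c_size_t, ctypes.c_int]
_libc.recv.restype = ctypes.c_ssize_t

def peek_size(sock):
    """Call recv(fd, NULL, 0, MSG_PEEK|MSG_TRUNC) without Python's wrapper."""
    flags = socket.MSG_PEEK | socket.MSG_TRUNC
    size = _libc.recv(sock.fileno(), None, 0, flags)
    if size < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return size
```

<p>Nothing here checks that the arguments make sense, which is exactly the lack of safety mentioned above.</p>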
<hr />
<p>In conclusion, <code>MSG_TRUNC</code> is a thing you can do on Linux.</p>
<p>If you want to use it from a higher-level language than C
you won't find a good interface for it in the standard library.</p>
<p>If you find yourself in a position to design a language,
please bear in mind people may want to do this on Linux,
so at least provide access to the un-mangled system call interface.</p>
<p>Nerds of a feather flock(1) together, by Richard Maw, 2015-10-07 (http://yakking.branchable.com/posts/flocking/)</p>
<p><a href="https://en.wikipedia.org/wiki/Locking">Locking</a> is a concurrency primitive, usually allowing shared/read locking
and exclusive/write locking.</p>
<p>Linux allows you to take locks on files.</p>
<p>There's the POSIX <a href="http://man7.org/linux/man-pages/man3/lockf.3.html">lockf(3)</a> file locks,
which let you lock a specific range of a file,
but have <a href="http://0pointer.de/blog/projects/locking.html">various pitfalls</a>.</p>
<p>There's BSD <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> file locks,
which have per-file-descriptor rather than per-process locks,
but don't allow locking a file range.</p>
<p>There's also the new (since Linux 3.15) Linux-specific
<code>F_OFD_{SETLK,SETLKW,GETLK}</code> <a href="http://man7.org/linux/man-pages/man2/fcntl.2.html">fcntl(2)</a> commands,
which are file-descriptor bound and offer file ranges.</p>
<p>I'm only interested in the file-descriptor bound locks,
and to keep that simple I'm not going to use file ranges,
so we're going to discuss <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> and <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a>.</p>
<p>For the sake of conciseness, I will be using Python and shell for examples,
rather than using the C API directly.</p>
<h1>Taking locks</h1>
<p>The basic principle is that,
since it's a file-descriptor bound lock,
you need to open the file,
then use the locking call on the file descriptor.</p>
<p>The following programs can be used to atomically generate unique IDs.</p>
<p>They will wait for any other programs that may be using the file to finish
before reading, updating and writing to it.</p>
<pre><code>#!/bin/bash
LOCKFILE="$1"
shift
exec 100<>"$LOCKFILE" # open file descriptor for read-write
flock 100
read -u 100 num
printf '%d\n' "$num"
num="$(expr "$num" + 1)"
# Need to re-open to be able to write to the beginning
printf '%d\n' "$num" >/proc/self/fd/100
</code></pre>
<p>The logic is similar in python, but without having to shell out to <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a>.</p>
<pre><code>#!/usr/bin/python
import sys
from fcntl import flock, LOCK_EX, LOCK_SH, LOCK_UN
lockfile = sys.argv[1]
# 'r+' opens for reading and writing without truncating
with open(lockfile, 'r+') as f:
    flock(f.fileno(), LOCK_EX)
    num = int(f.read())
    print(num)
    num += 1
    f.seek(0)
    f.write('%d\n' % num)
</code></pre>
<h1>Releasing locks</h1>
<p>File descriptor locks are released when every reference to the file descriptor
is closed, or explicitly with <code>flock(fd, LOCK_UN)</code>.</p>
<p>Since file descriptors are closed on process termination,
the shell program will release the lock when its process terminates.</p>
<p>The python program uses a context manager with the file,
which means it will close the file at the end of the scope,
so the file will be closed before termination.</p>
<p>If this were made more explicit, the shell program would end with:</p>
<pre><code>flock -u 100
exec 100>&-
</code></pre>
<p>The python program would end with:</p>
<pre><code> flock(f.fileno(), LOCK_UN)
</code></pre>
<p>Using <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a> with a continuation command
hence releases the lock after the command has run.</p>
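<p>The release rules in this section can be checked from a single process,
because a second <code>open()</code> of the same file produces an independent descriptor.
This is a small sketch for Python 3 on Linux;
<code>is_locked</code> and <code>release_demo</code> are illustrative names,
and the probe relies on the non-blocking flag covered later in this post.</p>

```python
import os
import tempfile
from fcntl import flock, LOCK_EX, LOCK_NB, LOCK_UN

def is_locked(path):
    """Probe from a fresh open: True if an exclusive lock cannot be taken."""
    probe = os.open(path, os.O_RDONLY)
    try:
        flock(probe, LOCK_EX | LOCK_NB)
    except BlockingIOError:
        return True
    else:
        return False
    finally:
        os.close(probe)  # also drops the probe's own lock, if it got one

def release_demo(path):
    """Return the lock state after each step of dup, close and unlock."""
    states = []
    fd = os.open(path, os.O_RDONLY)
    flock(fd, LOCK_EX)
    dup = os.dup(fd)                # second reference to the same open file
    os.close(fd)
    states.append(is_locked(path))  # the duplicate keeps the lock alive
    os.close(dup)
    states.append(is_locked(path))  # last reference closed: released
    fd = os.open(path, os.O_RDONLY)
    flock(fd, LOCK_EX)
    flock(fd, LOCK_UN)              # explicit unlock, descriptor stays open
    states.append(is_locked(path))
    os.close(fd)
    return states

if __name__ == '__main__':
    with tempfile.NamedTemporaryFile() as tmp:
        print(release_demo(tmp.name))
```

<p>The first state is the interesting one:
closing the descriptor the lock was taken on is not enough while a duplicate is still open.</p>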
<h1>Lock contention</h1>
<p>The purpose of locking is to ensure that resources are protected while the lock
is held.</p>
<p>If a lock is not held by any open file descriptors, you can always take it.</p>
<p>There are two ways to hold a lock, exclusively and shared.</p>
<p>The former is the default for the <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a> command,
though it can be explicitly chosen with the <code>--exclusive</code> option,
or with the <code>LOCK_EX</code> flag as shown with the python program
using the <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> syscall.</p>
<p>The latter can be requested with the <code>--shared</code> option, or the <code>LOCK_SH</code> flag.</p>
<p>If the file is currently held with an exclusive lock,
or you want to take an exclusive lock and it is already locked in either mode,
you can't take the lock.</p>
<p>If the held lock is a shared lock though,
you can take a shared lock on the file.</p>
<h1>Blocking vs non-blocking</h1>
<p>When the lock you want is contended,
you can either wait for the lock to be released,
or fail immediately so you can try something else.</p>
<p>When attempting to take a contended lock,
by default you wait for it to be released.
When using <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a>, however, instead of passing <code>LOCK_EX</code> or <code>LOCK_SH</code>
you can pass <code>LOCK_EX|LOCK_NB</code> or <code>LOCK_SH|LOCK_NB</code> to make this a non-blocking lock,
which returns immediately with an error if the lock is contended.</p>
<p>When using <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a> you would pass <code>--nonblock</code> to do this.
Blocking is the default, so no option is needed to wait indefinitely;
in util-linux, <code>--wait</code> is a synonym for <code>--timeout</code> and takes a number of seconds.</p>
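<p>The contention rules and the non-blocking flag can be exercised together.
The sketch below, for Python 3 on Linux, opens the same file twice
and records which combinations of held and requested mode are granted;
<code>try_flock</code> and <code>compatibility</code> are illustrative names.</p>

```python
import os
import tempfile
from fcntl import flock, LOCK_EX, LOCK_SH, LOCK_NB

def try_flock(fd, mode):
    """Attempt a non-blocking lock; report whether it was granted."""
    try:
        flock(fd, mode | LOCK_NB)
    except BlockingIOError:  # EWOULDBLOCK: the lock is contended
        return False
    return True

def compatibility(path):
    """Map (held mode, wanted mode) to whether the second lock is granted."""
    modes = (('shared', LOCK_SH), ('exclusive', LOCK_EX))
    granted = {}
    for held_name, held in modes:
        for want_name, want in modes:
            # Two separate opens, so the locks belong to different descriptors.
            holder = os.open(path, os.O_RDONLY)
            other = os.open(path, os.O_RDONLY)
            try:
                flock(holder, held)
                granted[(held_name, want_name)] = try_flock(other, want)
            finally:
                os.close(other)
                os.close(holder)
    return granted

if __name__ == '__main__':
    with tempfile.NamedTemporaryFile() as tmp:
        for pair, ok in sorted(compatibility(tmp.name).items()):
            print(pair, ok)
```

<p>Only the shared-then-shared combination is granted; the other three are refused.</p>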
<p>Blocking locks have the advantage
that your process will be suspended
until you can take the lock,
you are woken up as soon as you can take the lock,
and if there is a queue of processes wanting to take the lock,
then processes that are waiting get the lock before those that weren't.</p>
<p>However, the danger of blocking locks
is that if the other lock doesn't get released,
you will not be woken up.</p>
<p>This is a problem when your process needs to be responsive to input.</p>
<p>This can be worked around by having a separate thread to handle user responses,
but at some point you've got to draw the line,
and say that not being able to take the lock in time is an error.</p>
<p>The neatest way to do this is to use <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a>'s <code>--timeout</code> option,
which you would use from python as:</p>
<pre><code>from subprocess import check_call, CalledProcessError
from errno import EAGAIN
from os import strerror
def take_lock(fd, timeout=None, shared=False):
    cmd = ['flock',
           '--shared' if shared else '--exclusive',
           '--conflict-exit-code=75'] #EX_TEMPFAIL
    if timeout is not None:
        # Without a timeout option flock(1) waits indefinitely
        cmd.append('--timeout=%d' % timeout)
    cmd.append(str(fd))
    try:
        # pass_fds (Python 3) keeps fd open in the flock(1) child
        check_call(cmd, pass_fds=(fd,))
    except CalledProcessError as e:
        if e.returncode == 75:
            raise IOError(EAGAIN, strerror(EAGAIN))
        raise
with open(lockfile, 'r') as f:
    take_lock(f.fileno(), timeout=30, shared=True)
</code></pre>
<p>Note: Old versions of <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a> may not support <code>--conflict-exit-code</code>.</p>
<p>It is possible to do locking with a timeout in native Python code,
by using <a href="http://man7.org/linux/man-pages/man2/setitimer.2.html">setitimer(2)</a>:</p>
<pre><code>#!/usr/bin/python3
from errno import EAGAIN
from fcntl import flock, LOCK_SH, LOCK_EX, LOCK_NB
from os import strerror
from signal import signal, SIGALRM, setitimer, ITIMER_REAL
from sys import exit
def _on_alarm(*_):
    # Python 3.5+ retries an interrupted flock() unless the handler raises
    raise IOError(EAGAIN, strerror(EAGAIN))
def take_lock(fd, timeout=None, shared=False):
    if timeout is None or timeout == 0:
        flock(fd, (LOCK_SH if shared else LOCK_EX)|(LOCK_NB if timeout == 0 else 0))
        return
    signal(SIGALRM, _on_alarm)
    setitimer(ITIMER_REAL, timeout)
    # Racy: alarm could be delivered before we try to lock
    try:
        flock(fd, LOCK_SH if shared else LOCK_EX)
    finally:
        setitimer(ITIMER_REAL, 0) # cancel the timer
if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('--shared', action='store_true', default=False)
    parser.add_argument('--exclusive', dest='shared', action='store_false')
    parser.add_argument('--timeout', default=None, type=int)
    parser.add_argument('--wait', dest='timeout', action='store_const', const=None)
    parser.add_argument('--nonblock', dest='timeout', action='store_const', const=0)
    parser.add_argument('file')
    parser.add_argument('argv', nargs='*')
    opts = parser.parse_args()
    if len(opts.argv) == 0:
        # No command given: lock an inherited file descriptor number
        fd = int(opts.file)
        take_lock(fd, opts.timeout, opts.shared)
    else:
        from subprocess import call
        with open(opts.file, 'r') as f:
            take_lock(f.fileno(), opts.timeout, opts.shared)
            # Run the command while the lock is held
            exit(call(opts.argv))
</code></pre>
<p>However, since signals are process-global state,
doing it that way can result in a process that has interesting side-effects,
especially in a threaded environment,
which makes it harder to reason about the behaviour of the program.</p>
<p>It may be nicer to run the blocking <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> in a subprocess,
just to avoid having to make your main program deal with signals.</p>
<p>The following version works as before,
but uses multiprocessing to run the <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> code in a subprocess,
and pass any exceptions back to the main process.</p>
<pre><code>#!/usr/bin/python3
from errno import EINTR, EAGAIN
from fcntl import flock, LOCK_SH, LOCK_EX, LOCK_NB
from multiprocessing import get_context
from os import strerror
from signal import signal, SIGALRM, setitimer, ITIMER_REAL
from sys import exit
def _on_alarm(*_):
    # Python 3.5+ retries an interrupted flock() unless the handler raises
    raise IOError(EINTR, strerror(EINTR))
def _set_alarm_and_lock(fd, pipew, timeout, shared):
    try:
        signal(SIGALRM, _on_alarm)
        setitimer(ITIMER_REAL, timeout)
        # Racy: alarm could be delivered before we try to lock
        flock(fd, LOCK_SH if shared else LOCK_EX)
        setitimer(ITIMER_REAL, 0) # cancel the timer
    except BaseException as e:
        # This loses the traceback, but it's not pickleable anyway
        pipew.send(e)
        exit(1)
    else:
        pipew.send(None)
        exit(0)
def take_lock(fd, timeout=None, shared=False):
    if timeout is None or timeout == 0:
        flock(fd, (LOCK_SH if shared else LOCK_EX)|(LOCK_NB if timeout == 0 else 0))
        return
    # The child must inherit fd, so insist on the fork start method
    mp = get_context('fork')
    piper, pipew = mp.Pipe(duplex=False)
    p = mp.Process(target=_set_alarm_and_lock,
                   args=(fd, pipew, timeout, shared))
    p.start()
    err = piper.recv()
    p.join()
    if err:
        if isinstance(err, IOError) and err.errno == EINTR:
            raise IOError(EAGAIN, strerror(EAGAIN))
        raise err
if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('--shared', action='store_true', default=False)
    parser.add_argument('--exclusive', dest='shared', action='store_false')
    parser.add_argument('--timeout', default=None, type=int)
    parser.add_argument('--wait', dest='timeout', action='store_const', const=None)
    parser.add_argument('--nonblock', dest='timeout', action='store_const', const=0)
    parser.add_argument('file')
    parser.add_argument('argv', nargs='*')
    opts = parser.parse_args()
    if len(opts.argv) == 0:
        # No command given: lock an inherited file descriptor number
        fd = int(opts.file)
        take_lock(fd, opts.timeout, opts.shared)
    else:
        from subprocess import call
        with open(opts.file, 'r') as f:
            take_lock(f.fileno(), opts.timeout, opts.shared)
            # Run the command while the lock is held
            exit(call(opts.argv))
</code></pre>
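<p>A different technique, not one of the approaches above, avoids both signals and subprocesses:
retry a non-blocking lock until a deadline passes.
This is a sketch for Python 3;
<code>take_lock_polling</code> and its <code>interval</code> parameter are assumptions made for illustration.</p>

```python
import time
from fcntl import flock, LOCK_EX, LOCK_SH, LOCK_NB

def take_lock_polling(fd, timeout, shared=False, interval=0.05):
    """Retry a non-blocking flock() until it succeeds or timeout expires."""
    mode = (LOCK_SH if shared else LOCK_EX) | LOCK_NB
    deadline = time.monotonic() + timeout
    while True:
        try:
            flock(fd, mode)
            return
        except BlockingIOError:
            # Still contended: give up with the EAGAIN error once the
            # deadline has passed, otherwise sleep and try again.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

<p>The trade-off is that polling gives up the advantages of blocking locks described earlier:
you are not woken as soon as the lock is free,
and you have no place in the queue of waiters.</p>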
<h1>Converting locks</h1>
<p>Calling <a href="http://man7.org/linux/man-pages/man2/flock.2.html">flock(2)</a> with <code>LOCK_SH</code> when you hold a <code>LOCK_EX</code> lock,
or passing <code>--shared</code> to <a href="http://man7.org/linux/man-pages/man1/flock.1.html">flock(1)</a> on a descriptor locked with <code>--exclusive</code>,
turns it from an exclusive lock into a shared lock.</p>
<p>Similarly, you can go the other way,
converting a shared lock into an exclusive one,
though this counts as a contended lock
if there are any other holders of shared locks.</p>
<p>You may want to do this when the lock manages the lifetime of a resource.</p>
<p>You would want to hold an exclusive lock on it when making the resource,
so that any concurrent users can know that it's being set up,
so they can take a blocking lock and wait for it to be ready.</p>
<p>After the resource has been set up,
you would convert it to a shared lock,
so you can use it yourself,
and any other processes wanting to take a shared lock to use it
can be woken up and start using it.</p>
<p>When you are finished using the resource
you can convert it to an exclusive lock.</p>
<p>Once you have taken the exclusive lock,
you know that there can be no other users of the resource,
so it is safe to remove it.</p>
<p>Lock conversion is important,
as you don't want to unlock then re-take the lock,
as there is a period where it is unlocked,
which other concurrent users might decide means it can be cleaned up.</p>
<p>You'd end up cleaning it up just after setting it up,
before you had a chance to use it.</p>
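<p>The lifecycle described in this section could look something like the following in Python 3.
It is only a sketch: the resource is assumed to be a directory,
<code>use_resource</code> is a hypothetical name,
and the final conversion is made non-blocking so that the resource is simply left alone if someone else still holds a shared lock.</p>

```python
import os
import shutil
import tempfile
from fcntl import flock, LOCK_EX, LOCK_SH, LOCK_NB

def use_resource(lock_path, resource_dir):
    """Set up, use and, if nobody else needs it, remove resource_dir."""
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o600)
    try:
        flock(fd, LOCK_EX)                    # exclusive while setting up
        os.makedirs(resource_dir, exist_ok=True)
        flock(fd, LOCK_SH)                    # convert: others may share it
        contents = os.listdir(resource_dir)   # stand-in for real use
        try:
            flock(fd, LOCK_EX | LOCK_NB)      # convert back to exclusive
        except BlockingIOError:
            return contents, False            # still in use elsewhere
        shutil.rmtree(resource_dir)           # no other users: clean up
        return contents, True
    finally:
        os.close(fd)                          # releases whatever we hold

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        print(use_resource(os.path.join(top, 'lock'),
                           os.path.join(top, 'resource')))
```

<p>A blocking <code>LOCK_EX</code> at the end would instead wait for the other users to finish before cleaning up.</p>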