MSG_TRUNC is poorly supported


On Linux, ssize = recv(fd, NULL, 0, MSG_PEEK|MSG_TRUNC) lets you see how big the next message in a UDP or UNIX datagram socket's buffer is.

This can be important if your application-level protocol supports variable message sizes, since you need to provide a buffer large enough for the data, but preferably not so much larger that it wastes memory.

Unfortunately, a lot of programming languages' bindings don't make this easy.

Python's approach is to allocate a buffer of the provided size, and then reallocate to the returned size afterwards, and return the buffer.

This behaviour is intended to permit a larger allocation to begin with, but as a side-effect it also permits a smaller one to not break horribly.

Well, mostly. In Python 2 it breaks horribly.

$ python -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
s.recv(0, socket.MSG_PEEK|socket.MSG_TRUNC)'
Traceback (most recent call last):
  File "<string>", line 4, in <module>
SystemError: ../Objects/stringobject.c:3909: bad argument to internal function

Python 3 instead returns an empty buffer immediately, without reading from the socket.

$ python3 -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
m = s.recv(0, socket.MSG_PEEK|socket.MSG_TRUNC)
print(len(m), m)'
0 b''

You can work around this by receiving a minimum length of 1.

$ python -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
m = s.recv(1, socket.MSG_PEEK|socket.MSG_TRUNC)
print(len(m), m)'
(4, 'a\x00n\x00')

The returned buffer's length is that of the message, though only the first byte was actually read from it; the rest of the buffer's contents is junk.

The reason these interfaces aren't great is that they return an object rather than using a provided one, and it would be unpleasant for them to return a different type based on the flags.

Python has an alternative interface in the form of socket.recv_into, which should fare better: since it returns the size separately, it should be able to translate a None buffer into a NULL pointer.

$ python -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
m = s.recv_into(None, 0, socket.MSG_PEEK|socket.MSG_TRUNC)
print(m)'
Traceback (most recent call last):
  File "<string>", line 4, in <module>
TypeError: recv_into() argument 1 must be read-write buffer, not None

Unfortunately, this proves not to be the case.

In Python 2 we can re-use a "null byte array" for this purpose.

$ python -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
nullbytearray = bytearray()
m = s.recv_into(nullbytearray, 0, socket.MSG_PEEK|socket.MSG_TRUNC)
print(m)'
4

Unfortunately, Python 3 decided to be clever.

$ python3 -c 'import socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
s.bind("/tmp/testsock")
nullbytearray = bytearray()
m = s.recv_into(nullbytearray, 0, socket.MSG_PEEK|socket.MSG_TRUNC)
print(m)'
0

As with plain recv, it returns early without waiting for the message.
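The minimum-length-of-1 workaround from earlier can be combined with recv_into, which avoids getting back a buffer of junk. What follows is a minimal sketch rather than a blessed recipe: it assumes Linux semantics for MSG_TRUNC on datagram sockets, and the helper names peek_datagram_size and recv_exact_datagram are made up for illustration.

```python
import socket


def peek_datagram_size(sock):
    """Return the size of the next queued datagram without consuming it.

    Assumes Linux, where MSG_TRUNC makes recv report the real datagram
    length even when the supplied buffer is shorter. Blocks until a
    datagram arrives if the socket is in blocking mode.
    """
    # One byte rather than zero, so Python 3 does not return early.
    scratch = bytearray(1)
    flags = socket.MSG_PEEK | socket.MSG_TRUNC
    return sock.recv_into(scratch, 1, flags)


def recv_exact_datagram(sock):
    """Receive the next datagram using a buffer sized to fit it."""
    size = peek_datagram_size(sock)
    # Ask for at least one byte, otherwise a zero-length datagram
    # would never be taken off the queue.
    return sock.recv(max(size, 1))


# Small demonstration over a local datagram socket pair.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
sender.send(b"ping")
print(peek_datagram_size(receiver))
print(recv_exact_datagram(receiver))
sender.close()
receiver.close()
```

This still costs an extra system call and a throwaway one-byte buffer per message, which is the kind of thing a NULL pointer was supposed to spare us.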

I had hoped to provide a counter-example: a programming language that exposes this properly as part of its standard library.

Distressingly, the best-behaved standard libraries were the ones that exposed the system calls directly, with all the lack of safety guarantees that implies.
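To illustrate what "exposing the system call directly" can look like, Python's ctypes module can call the C library's recv with a genuine NULL buffer. This is only a sketch under assumptions: it expects a Linux system where ctypes.CDLL(None) resolves recv from the libraries already loaded into the process, the socket must be in blocking mode, and next_datagram_size is a hypothetical helper name.

```python
import ctypes
import os
import socket

# Assumption: on Linux, CDLL(None) gives access to the C library
# symbols already loaded into the Python process, including recv().
_libc = ctypes.CDLL(None, use_errno=True)
_libc.recv.argtypes = [
    ctypes.c_int,     # socket file descriptor
    ctypes.c_void_p,  # buffer pointer (None becomes NULL)
    ctypes.c_size_t,  # buffer length
    ctypes.c_int,     # flags
]
_libc.recv.restype = ctypes.c_ssize_t


def next_datagram_size(sock):
    """Peek the size of the next datagram using a NULL, zero-length buffer."""
    flags = socket.MSG_PEEK | socket.MSG_TRUNC
    size = _libc.recv(sock.fileno(), None, 0, flags)
    if size < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return size


# Small demonstration over a local datagram socket pair.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
sender.send(b"ping")
print(next_datagram_size(receiver))
sender.close()
receiver.close()
```

Going behind the socket object's back like this bypasses Python's own timeout handling and retry-on-interrupt logic, so it is very much the un-mangled interface, sharp edges included.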

In conclusion, MSG_TRUNC is a thing you can do on Linux.

If you want to use it from a higher-level language than C you won't find a good interface for it in the standard library.

If you find yourself in a position to design a language, please bear in mind people may want to do this on Linux, so at least provide access to the un-mangled system call interface.