<h1>Open your minds, and your data (formats)</h1>
<p>Daniel Silverstone. Published 2017-07-05T12:00:06Z, updated 2017-07-30T17:11:17Z. <a href="http://yakking.branchable.com/posts/open-data-formats/">http://yakking.branchable.com/posts/open-data-formats/</a></p>
<p>Whether I am writing my own program, or choosing between existing solutions,
one aspect of the decision making process which always weighs heavily on my
mind is that of the input and output data formats.</p>
<p>I have been spending a lot of my work days recently working on converting data
from a proprietary tool's export format into another tool's input format. This
has involved a lot of <a href="https://en.wikipedia.org/wiki/XML">XML</a> diving, a lot more swearing, and a non-trivial
amount of pain. This drove home to me once more that the format of input and
output of data is such a critical part of software tooling that it must weigh
as heavily as, or perhaps even more heavily than, the software's functionality.</p>
<p>As <a href="http://www.goodreads.com/quotes/589703-the-good-thing-about-standards-is-that-there-are-so">Tanenbaum</a> tells us, the great thing about standards is that there are so
many of them to choose from. <a href="https://xkcd.com/">XKCD</a> tells us <a href="https://xkcd.com/927/">how that comes about</a>.
Data formats are many and varied, and their specifications range from things
as vague as "plain text" to things as complex as the structure of data stored
in custom database formats.</p>
<p>If you find yourself writing software which requires a brand new data format
then, while I might caution you to examine carefully if it really <em>does</em> need a
new format, you should ensure that you document the format carefully and
precisely. Ideally give your format specification to a third party and get
them to implement a reader and writer for your format, so that they can check
that you've not missed anything. Tests and normative implementations can help
prop up such an endeavour admirably.</p>
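<p>As a minimal sketch of the kind of test that can prop up a specification,
the following checks that a writer and a reader agree with each other. The
"key=value" line format and the names <code>write_records</code>,
<code>read_records</code> and <code>round_trips</code> are invented purely for
illustration; they are not taken from any real tool.</p>

```python
# Hypothetical "key=value" line format, invented purely for illustration.
# Rules assumed here: one record per line, every line ends with "\n",
# keys contain no "=" and neither keys nor values contain "\n".


def write_records(records):
    """Serialise a dict of strings as sorted 'key=value' lines."""
    lines = []
    for key in sorted(records):
        value = records[key]
        if "=" in key or "\n" in key or "\n" in value:
            raise ValueError("key or value not representable: %r" % key)
        lines.append("%s=%s\n" % (key, value))
    return "".join(lines)


def read_records(text):
    """Parse text in the format above back into a dict of strings."""
    lines = text.split("\n")
    if lines[-1] != "":
        raise ValueError("input does not end with a newline")
    records = {}
    for number, line in enumerate(lines[:-1], start=1):
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("line %d has no '=' separator" % number)
        if key in records:
            raise ValueError("line %d repeats key %r" % (number, key))
        records[key] = value
    return records


def round_trips(records):
    """True if writing then reading gives back exactly the same data."""
    return read_records(write_records(records)) == records
```

<p>A third party working only from the written rules should be able to produce
a reader that passes the same check against your writer; where they cannot,
the document has probably missed something.</p>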
<p>Be sceptical of data formats which have "implementation specific" areas, or
"vendor specific extension" space because this is where everyone will put the
most important and useful data. Do not put such beasts into your format
design. If you worry that you've made your design too limiting, deal with that
<em>after</em> you have implemented your use-cases for the data format. Don't be
afraid to version the format and extend it later, but always ensure that a given
version of the data format is well understood, and document what it means to be
presented with data in a format version you do not normally process.</p>
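<p>One way to make the behaviour on unfamiliar versions explicit is sketched
below. It assumes a hypothetical JSON-based format carrying an integer
<code>version</code> field, where version 1 has only a <code>name</code> and
version 2 adds <code>tags</code>; the field names, the version numbers and the
<code>UnsupportedVersion</code> exception are all assumptions made for the
example.</p>

```python
import json

# Hypothetical: the format versions this reader has been written to handle.
SUPPORTED_VERSIONS = {1, 2}


class UnsupportedVersion(Exception):
    """Raised for a well-formed document in a version we do not process."""


def load_document(text):
    """Read a document, refusing versions this reader does not understand."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    version = data.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("missing or non-integer 'version' field")
    if version not in SUPPORTED_VERSIONS:
        # Fail loudly rather than guessing at the meaning of unknown data.
        raise UnsupportedVersion("cannot read format version %d" % version)
    if version == 1:
        # Assumption: version 1 documents have a name and no tags.
        return {"name": data["name"], "tags": []}
    # Assumption: version 2 added a list of tags.
    return {"name": data["name"], "tags": list(data["tags"])}
```

<p>Refusing outright is only one possible policy; whichever one a format
chooses, the point is that it is written down rather than left to each
implementation.</p>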
<p>Phew.</p>
<hr />
<p>Given all that, I exhort you to consider carefully how your projects manage
their input and output data, and for these things to be uppermost when you are
choosing between different solutions to a problem at hand. Your homework is,
as you may have grown to anticipate by now, to look at your existing
projects and check that their input and output data formats are well
documented, where appropriate.</p>