Config Files vs. Code

For software developers the world over, it has almost become idiomatic to separate data from code by putting the data in a config file. I question that wisdom. I think there are cases for shunting data into config files, but also good reasons to leave data in code.

In a recent codebase I worked in, we had config files to map territories to countries. For example, the config file had a JSON structure that basically said Puerto Rico, Guam and other territories are part of the United States; Bouvet Island, Jan Mayen and Svalbard are part of Norway; and so on. Another config file captured the fact that France, Germany, Sweden, etc. are part of the European Economic Area (EEA). Even more configs: the languages primarily spoken in each country, the currency in each country, and so on. You can imagine parallels like these in other projects: colour names for RGB values, car models for each brand, vegetables classified as tubers, macOS version names, and so on.

These config files capture something about the outside world, not operating characteristics of the platform such as hostname, port and database connections. These facts change infrequently, as opposed to, say, user interface messages. Moreover, these facts are not subject to “opinions” about how your platform operates, unlike, say, thresholds at which alerts fire. Putting these facts in config files that are deployed along with your code, as conventional wisdom dictates, actually complicates your application and makes it more, not less, brittle.

Consider a simple application. Say I want to retrieve the currency for a given country. Let’s get the basics out of the way. We will assume the country will be specified by an ISO 3166-1 alpha-2 code, e.g., US for the United States, GB for the United Kingdom, IN for India, and so on. Likewise, we will assume the currency will be specified by an ISO 4217 code, e.g., USD for the US dollar, EUR for the euro, INR for the Indian rupee, and so on.

The naïve me from a quarter century ago would have created a map data structure in code, mapping a String (for the country code) to a String (for the currency code). Add in some boilerplate code for getters, prints, debug, etc. and you’re done. Writing the code would take me about half an hour. Populating the data structure in the code with the first dozen or so entries would take a few minutes. Compiling the code would take seconds. Maybe write a few tests, another hour. And… done!
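A minimal sketch of that data-in-code approach, assuming Python as the language. The names `CURRENCY_BY_COUNTRY` and `currency_for` are hypothetical, chosen only for illustration, and the handful of entries stands in for the full table:

```python
# Hypothetical illustration: the country-to-currency facts live directly in code.
# Keys are ISO 3166-1 alpha-2 country codes; values are ISO 4217 currency codes.
CURRENCY_BY_COUNTRY = {
    "US": "USD",
    "GB": "GBP",
    "IN": "INR",
    "FR": "EUR",
    "DE": "EUR",
}


def currency_for(country_code: str) -> str:
    """Return the currency code for a country code.

    Raises KeyError if the country is not in the table.
    """
    return CURRENCY_BY_COUNTRY[country_code.upper()]
```

Calling `currency_for("in")` returns `"INR"`; an unknown code raises `KeyError`, which is the only failure mode this version has.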

Done? Not quite. What about maintenance? Ah, fair. I do need to add more countries. And countries do change their currencies. Sometimes countries may drop out of existence as well. Each one of those changes would require me to go back to the code and change it. Which would require recompiling the project containing this code. That’s a pain.

A more modern approach is to extract all of that data out of the code and put it in a config file. In times past, this file may have been a tab-delimited file, then perhaps XML, but today, it’s likely JSON. Or, we could put all of this information in a database. But writing database code seems overkill for this application. Besides, I want to talk about config files for now, so JSON it is.

My code is much simpler now. I read the file, parse it, stuff the contents into a map data structure, again mapping a String to a String. Add in the same boilerplate as before, and you’re done. Writing this code would take a bit longer, maybe an hour or slightly more, so that I can deal with file exceptions, parsing, etc. Compiling and testing take roughly the same amount of time as before.
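For comparison, here is a sketch of the config-file version, again assuming Python. The function name `load_currencies`, the `ConfigError` exception and the flat JSON layout (`{"US": "USD", ...}`) are assumptions made for this example, not details of any real project:

```python
import json
from pathlib import Path


class ConfigError(RuntimeError):
    """Raised when the (hypothetical) currency config cannot be used."""


def load_currencies(path: Path) -> dict:
    """Load a JSON object mapping country codes to currency codes."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:  # missing file, bad permissions, ...
        raise ConfigError(f"cannot read config file {path}") from exc
    try:
        raw = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        raise ConfigError(f"config file {path} is not valid JSON") from exc
    if not isinstance(raw, dict):
        raise ConfigError("expected a JSON object at the top level")
    for country, currency in raw.items():
        if not isinstance(currency, str):
            raise ConfigError(f"currency for {country!r} must be a string")
    return raw
```

The lookup itself is still a one-liner; nearly everything else in the function exists only to handle ways in which the file can be wrong.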

Ostensibly, maintenance now is simpler, because I can – or I can ask my users to – edit the JSON files and add, update or remove entries as they see fit. No recompiling necessary. No need to touch code, therefore fewer chances of introducing errors. So… we’re better off, right?

I don’t think so, conventional wisdom notwithstanding.

Let’s start with the maintenance bit. Editing JSON vs. editing a map data structure in code is about evenly easy or difficult. If you or your users can edit one, they can edit the other. If your users can’t be trusted with editing a map, then I argue JSON is even more difficult to edit, and now you need to build a user interface for them, which is far more code. JSON is more difficult to edit because you do not get instant feedback on your edits if you use a typical editor. If you edit code in an IDE, any syntax errors are caught instantaneously. It’s the ultimate “upstreaming” or “shift left” idea – your errors are caught not at run-time, not during testing, not during compiling, but during editing!

Sure, you could use a JSON editor, but the best you will get out of it is fixing the silliest syntax errors, like the presence/absence of colons, commas, quotes, braces and brackets. Contrast that with a halfway decent IDE, which can not only catch silly syntax errors, but also type and structure errors. Your IDE can tell you about spurious and missing elements in your data structure even if your syntax is correct. If you’re willing to put more effort into your data structures, you could change the map to be from an enumeration (for country code) to another enumeration (for currency code), and instantly start getting a modicum of semantic checking under the guise of type checking. In other words, with that extra effort, you can ensure that your map always has a legitimate country mapping to a legitimate currency, and that a country is not repeated, all at edit time. Heck, if you forget to add a country to the map, your IDE can probably also warn you about it.
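A sketch of the enumeration-to-enumeration idea, assuming Python and its standard `enum` module. The `Country` and `Currency` enums, their three members each, and the `unmapped_countries` helper are all hypothetical. In Python, a misspelled member such as `Country.UX` is flagged by a type checker or IDE and fails at import time; the "forgot a country" check is done here by a small helper that a unit test can call, whereas some statically typed languages can enforce it at compile time:

```python
from enum import Enum


class Country(Enum):
    US = "US"
    GB = "GB"
    IN = "IN"


class Currency(Enum):
    USD = "USD"
    GBP = "GBP"
    INR = "INR"


# Both keys and values must be real enum members, so a typo in either
# is an error long before any lookup happens.
CURRENCY_BY_COUNTRY = {
    Country.US: Currency.USD,
    Country.GB: Currency.GBP,
    Country.IN: Currency.INR,
}


def unmapped_countries() -> set:
    """Return every Country member that has no entry in the map."""
    return set(Country) - set(CURRENCY_BY_COUNTRY)
```

A one-line test asserting that `unmapped_countries()` is empty then fails the build whenever someone adds a country to the enumeration but forgets the map.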

What about the recompiling bit? I will postulate that the project that contains this currency retrieval code is either worked upon often, or it is not. If it is not, the entire project is like a library that can be compiled infrequently and linked into the rest of your codebase. Yes, this library will change when you add, update or remove countries, but we’re in the not-worked-upon-often clause of my argument. Contrariwise, if this project is worked upon often, other unrelated changes will initiate recompilations anyway. The few extra milliseconds to compile this code don’t move the needle either way.

Despite these arguments, if we generously concede that the code approach is about the same effort as the config file approach, let’s take a second look at that config file approach, and consider the hidden complexity.

If we decouple the data from the code in this example, we now have to carry around two files – one for the code, one for the config – and make sure both are “linked” together meaningfully. “Linked” how? For starters, both have to be present in your deployment for this tiny application to work. Sure, you can tighten your deployments, think of config as code, and then deploy. Well, now, every time your config changes, you have to deploy again, exactly as would have been the case had you embedded your data in code. If you deploy your config separately from your code, you have to worry about previously-unforeseen error cases. What happens if the file is absent? What happens if the file doesn’t have the right permissions? What happens if the file gets corrupted? What happens if there are race conditions between deployments of new versions of the code and the file?

It gets worse. If you decouple the people – human beings – who edit the code and who edit the config file (a favourite trope of software developers), how do you ensure that these two sets of people have the same mental model of what goes in the config file? How do these people agree about what to do about missing countries? Or about duplicate entries? Or countries accidentally mapped to more than one currency? Or entries having extra JSON snippets? Ah, documentation to the rescue. So now, we create a third file that explains – in human-readable language – what mental models we need to adopt to understand how the code and data interplay. And… who will keep that documentation up-to-date? Who volunteers to read it every single time either code or data changes?

What if we wanted to make legitimate changes to this application? Say, we wanted to add the year the country adopted this currency. I won’t belabour the benefits of type-checking the year here – make it an integer and reap the benefits in the code-based approach. I will say though, that a code-based approach makes it trivial to write a unit test to perform a bounds check on that year – is it a legitimate year that has occurred in human history? A config-based approach postpones such a check to run-time, and even that check will be sacrificed at the altar of “fast startup time”.
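As a sketch of that bounds check in the code-based approach, assuming Python: the `CurrencyInfo` record, the helper name and the test are hypothetical, and the three adoption years are illustrative values rather than an authoritative reference.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CurrencyInfo:
    code: str          # ISO 4217 currency code
    adopted_year: int  # year the country adopted this currency


# Illustrative entries only; the years are not meant as a reference table.
CURRENCY_BY_COUNTRY = {
    "US": CurrencyInfo("USD", 1792),
    "FR": CurrencyInfo("EUR", 1999),
    "DE": CurrencyInfo("EUR", 1999),
}


def countries_with_implausible_year(latest_year: int) -> list:
    """Return country codes whose adoption year is outside 1..latest_year."""
    return sorted(
        country
        for country, info in CURRENCY_BY_COUNTRY.items()
        if not 1 <= info.adopted_year <= latest_year
    )


def test_adoption_years_are_plausible() -> None:
    # Runs with the rest of the unit tests, long before deployment.
    assert countries_with_implausible_year(date.today().year) == []
```

Because the data is part of the code, a typo such as `19999` fails this test in the same pull request that introduces it.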

How will the changes be rolled out? In a code-based approach, change the data and code in one shot, submit one pull request, and you’re done. But in a config-based approach, either you synchronise the code and config changes in one deployment (in which case, you didn’t gain anything by separating the two), or you tolerate asynchronous code and config deployments. That means, you now have to deal with backward compatibility. Your config file cannot worry about compatibility because, by definition, it is data alone and cannot worry about anything. But your code has to worry about parsing the new format and the old format of the config file. That leads to code bloat, and the inevitable hand-wringing about technical debt. Erasing that technical debt means deprecating the old config format so we can trim the code to deal with just the new format. Deprecate too soon, and you now have a problem if your config, for whatever reason, rolls back. Now, your code expects the new config format, but you have an old config, and… boom. Treat the new elements as optional? Sure… back to code bloat, because now you need test cases for when the year is present and when it’s not present, you need sensible defaults, and your entire codebase has to understand about optionality.
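To make that bloat concrete, here is a sketch, assuming Python, of a parser that must accept both the old config format (`"US": "USD"`) and a new one with the year (`"US": {"currency": "USD", "year": 1792}`). The field names `currency` and `year`, and all function and class names, are hypothetical:

```python
import json
from typing import NamedTuple, Optional


class CurrencyInfo(NamedTuple):
    code: str
    adopted_year: Optional[int]  # None when the config predates the new field


def parse_entry(value: object) -> CurrencyInfo:
    """Parse one config entry in either the old or the new format."""
    # Old format: the value is just the currency code.
    if isinstance(value, str):
        return CurrencyInfo(value, None)
    # New format: an object with a currency and an optional year.
    if isinstance(value, dict) and isinstance(value.get("currency"), str):
        year = value.get("year")
        if year is not None and not isinstance(year, int):
            raise ValueError(f"year must be an integer, got {year!r}")
        return CurrencyInfo(value["currency"], year)
    raise ValueError(f"unrecognised config entry: {value!r}")


def parse_config(text: str) -> dict:
    """Parse a whole config document into country -> CurrencyInfo."""
    return {country: parse_entry(v) for country, v in json.loads(text).items()}
```

Every caller of `parse_config` now has to cope with `adopted_year` being `None`, which is exactly the optionality that spreads through the rest of the codebase.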

So, should we do away with config files altogether? Embrace the atavism of data embedded in code? I don’t think so. There are plenty of cases where you need config files. For example, if you have different service configurations for “production” and for “testing”, it makes sense to group those in different config files. My cautionary note is about cases where the config files are capturing some truths about our world, truths that change relatively infrequently, and truths that are, um, true, regardless of your service configurations. Truths about countries, currencies, continents, months and days of the week, browsers, operating systems, automobile models, colours, genres, and many, many other things are truths like that – they are about our world, they change slowly, and they are true no matter in what mode your application is running. We should seriously consider moving these configurations back into code.


Pandemic 2020, +/- 25

As if we needed another reminder, we’re in a pandemic, right now in the year 2020. The pandemic is an unqualified tragedy, a global disaster. We grieve for the people we have lost and rightly lament the suffering of millions.

But we also cope. Many of us have been fortunate to continue our lives and livelihoods, thanks to various technologies and improvements we have inherited over the years.

Pandemic 1995

What would it have looked like had this pandemic hit us 25 years ago, in 1995?

Cue the Spice Girls, NSync, Backstreet Boys music. Or Nirvana. Or Destiny’s Child.

The first and obvious thing we would notice is no video conferencing. No virtual backgrounds, no thumbnail-sized bag of pixels representing another human being. No “can you hear me now?”, though in 6 years’ time we would expect Verizon’s famous ad to show up. Video conferencing in 1995 was like flying cars in every year – something that was always around the corner, but never showed up. Imagine today’s workday with no video conferencing… we would have had to use actual phones to make actual calls! To other people! Not some 800 number.

Which would have been another problem. We didn’t have smartphones back then. I don’t know about you, but most of us didn’t even have cell phones, smart or dumb. Those video, no, phone conferences, would have to have happened over a landline. Shared by Mom, Dad, kids, everyone stuck at home in quarantine. Brings a whole new round of grief to scheduling meetings when everyone has to book time on the landline.

And that same landline was also our lifeline to the internet. Remember the squeaks, groans and twangs of dial-ups? The groans were mostly from us when someone picked up the phone and disconnected our download of a picture of Sporty Spice. Drawn in ASCII characters… because no Unicode. No reading Chinese or Hindi or Arabic letters. Or emojis. How did we ever manage without the laughing-with-tears-in-eyes emoji? We probably had to type it out, like some farmer from the 1950s. No heart emoji, no head exploding emoji, no beer emoji, no T(om) Hanks emoji to express gratitude. Makes you want to slam your head into your desk. No emoji for that either. It’s enough to make you cry, if you weren’t laughing out loud. Not LOL, because FYI IIRC, ROFL NSFW BRB was gibberish then. Just as now.

The slow internet speeds back then meant no video-conference of course, but also no streaming video. No downloading movies either because that would take like a day and a half, and we would have had all of those phone conferences to interrupt the download. Heck, no downloading music either. We would have had to rummage through our extensive CD collection to stick one into a physical Discman and walk around tethered to the 2-pound thing with wired headphones. But hey, it beat hauling a boombox over one shoulder. Life was great in the 90s! We would be hoping someone invents a tiny device, call it myPod or something, to hold our playlists.

The web was just beginning back then, which would have put a serious crimp in our quarantine-relieving online shopping. In the mid 1990s, we had heard a rumour that Pizza Hut was assembling a web page where you could select a pizza, press a button, and that would place your order. (Offer available in California only.) Ridiculous! Who would order pizza from a computer?! During a pandemic, the right way to order pizza would have been to swamp those landlines with calls for pizza. And stay on hold while “someone will be with you shortly”. And that someone couldn’t even be someone from Bangalore because outsourcing customer service wasn’t a thing yet. Heck, half the time, customer service meant going to a brick-and-mortar building and arguing about an extra charge of 25 cents on last month’s cable bill. We hadn’t begun worrying about Y2K because that was so far away, but maybe with a pandemic, we would have.

Back to the web. No pizza, no ordering any food. Or clothes, not even face masks. Or toilet paper. Or exercise bikes. Or bread machines. Or… anything, really. It would have been pretty miserable being cooped up at home, making do with the stuff we already had. There was this website coming up, called Amazon, where you could buy books – and books only! – and they would ship it to you in like 4 weeks. Which was pretty quick, when you thought about it. And the book would be a real, combustible book because… no Kindle. Or iPhone. Or Android. The Barnes & Noble Nook was yet to come, much less leave. But the nice thing was that because a little-known company called Google hadn’t yet disrupted search engines, searching for anything on the web was a pain, and with a sigh of relief, we’d thumb through the yellow pages. Which were actually yellow even when they hadn’t yellowed with age.

We would have filled our time writing letters to friends and families. On paper, because no Facebook, no Instagram, no SnapChat. But we had email, so we could indeed mass mail all our friends and family those gorgeous snaps we took of the last vacation. No chat or texts though, and IRC doesn’t count because that was for nerds only. We would have to pass the time looking for clouds in photos, not photos in clouds. And the only tweets we could enjoy would be from jaybirds, not jackasses. Even all of our stress cooking would have followed recipes on paper. Spilling Sriracha on a piece of paper, instead of kombucha on a tablet.

But hey, at least there was cable TV. Though most of it was unwatchable. Who needed 200 channels when the best thing on TV was Seinfeld on Thursdays at 9PM? With no such thing as binge-watching known to humankind, we would have sat on a couch with half of America to watch it for the first time ever. But, with a pandemic raging, we couldn’t rush to work the next day to discuss the soup Nazi, the re-gifting or yada yada yada. As for movies, with no Netflix, we could always walk over to the nearby Blockbuster and get a new DVD… oh wait, that would be closed during the pandemic. Maybe we could turn on the news and watch a president make a fool of himself. Some things don’t change.

Pandemic 2045

What would it look like if this pandemic were to hit us in 25 years, in 2045?

Cue whatever passes for music in the future. Animal growls? Atonal postmodern compositions? Music from our alien overlords?

Obviously, if this pandemic were to hit us in 2045, we would have a cure. And a vaccine that probably tasted like apple pie or double caramel extra shot pumpkin spice latte or whatever it takes to make even anti-vaxxers take a vaccine. Not that we’d need a vaccine, because we’d all be enclosed in our personal force-field bubbles that ensured we never encountered another living being’s bodily fluids ever. But still, we’d take the vaccine, because pumpkin spice latte is forever.

But, let’s assume there’s no vaccine, and for some bizarre reason, the one bit of technology we won’t have invented by then is an Insta-Vaccine Fabrication Machine where you just throw in some store-bought ingredients, press a blend button, and hey presto! instant vaccine. Because that would be too easy.

Obviously, quarantining at home would be bearable because we would have holograms. Holograms of teachers on holographic whiteboards. Hologram buddies playing video games with us. Hologram happy hours. Hologram office meetings. Hologram parent and in-law visits. Or not, if the hologram machine is mysteriously kaput just on those days. Hologram hangouts with Paul Rudd, who probably has aged to a gracious 37 by then.

Pretty sure that by then the Artificial Intelligence stuff will have reached and surpassed the singularity, and the machines will be running all ten billion of us. At the very least, the AI would be able to re-create a hologram of Einstein, or Beyonce, so we can chat them up. And even though we can’t step out, we can immerse ourselves – literally – in hologram versions of the old classics, like Star Wars, episode 26. Who could have known that holograms would be filling such a big void in our lives in a future pandemic?

Shopping for food would probably be easy because the three grain crops we would be left with and the four vegetables can be combined in only about a dozen ways. If you have tried a soy-spinach jello once, you’ve tried them all. Our expanded circle of compassion would have made vegetarians of all of us by then so meat would not be an option, although that cricket flour would be beginning to sound respectable.

Staying at home would be a hassle without pets though. That same expanding circle of compassion would have moved us to liberate all pets. The cats went feral, the dogs died immediately, half of them from overeating garbage. But, home would be a refuge from the overwhelming stench of overflowing landfills, the perennial forest/scrub fires and the random and unpredictable weather patterns, so there’s something to be said for that.

It’s hard to imagine what entertainment would look like in the pandemic of the future. After all, we’re a species that is entertained by watching other people play video games. Or watching people fail at microwaving a cup of water. For a species whose pleasure ranges from the intellectual depth of a Bach fugue to the interplay of emotions in a cat video, entertainment offers many choices. Maybe, in the future, news readers will be Shakespearean, all movies will be drawn by computer, and all presidential debates will include jousting. Maybe we will watch a president make a fool of herself. Because some things don’t change.

Homonymphora

This edition continues an earlier trend in wordplay, homonymphora. Once again, I take my fascination with repeated words to a ridiculous extreme. This time around, the sentence is:

John, where Jack had had "had had", had had "had";
"had had" had had a better effect for the teacher.

Let’s dissect that sentence a bit. Presumably, Jack and John are grammar students trying to impress a teacher with their mastery of tenses. The sentence above recounts that Jack’s construction, in all of its past perfect glory, impressed the teacher more than John’s measly simple past construction. Let’s look at each occurrence of “had”, sometimes in pairs.

  • John, where Jack – as-is.
  • had had – past perfect of “have”; can be replaced here by “had written”.
  • “had had” – still the past perfect of “have”, but this time it refers to Jack’s writing assignment; can be replaced by “_some past perfect_”.
  • had had – past perfect of “have” again, keeping with the overall tense of the sentence; can be replaced by “had written”.
  • “had” – simple past of “have”, referring here to John’s writing assignment; can be replaced by “_some simple past_”.
  • “had had” – still refers to Jack’s writing assignment; can be replaced by “_some past perfect_”.
  • had had – past perfect of “have” again, keeping with the overall tense of the sentence; can be replaced by “had caused”.
  • a better effect for the teacher – as-is.

Rewriting the sentence with appropriate substitutions:

John, where Jack had written _some past perfect_,
had written _some simple past_; _some past perfect_
had caused a better effect for the teacher.

Eleven “had”s in a row – my new record.

Solving Voting

Voter involvement is a challenge facing democracies, in particular the US. Voting for political positions is too hard, and process impediments make it harder. Some of these impediments are archaic, such as the requirement that national elections be held on the first Tuesday after the first Monday in November. Some of these impediments are devious, such as the attempt to suppress low-income or minority turnout by reducing the time available to register to vote. Some of the impediments are procedural, such as the requirement that all eligible people vote on the same day, within a fixed set of hours. Along with these impediments are various confounding items, such as the bogeyman of voter fraud, the prospect of hand-counting inaccurate ballots especially during recounts, the inconsistencies around counting of absentee ballots, etc.

I believe technology can solve all of these problems. For the rest of this discussion, let us put aside the question of political will. Can the rest of the problems be solved with the best technology we have or can invent? Will technology introduce new problems? Can we surmount those? If we can, I believe we will bring voting technology into the modern age, increase voter participation, and ensure fairer elections.

I believe we have the technology available to solve the problem of secure voting. Daily, we transact billions of dollars on the internet, using banks, credit cards, shopping sites. For the most part, these transactions occur flawlessly, with no loss, no faking of transactions, no leakage of private information. In addition, annually, an increasing fraction of the population submits its tax returns electronically, again flawlessly, safely and privately. Technologies such as cryptography and blockchain hold the promise of enabling secure transactions, with vanishingly low probabilities of insecurity.

There are three classes of problems that must be solved with technology. First, the basic voting problem, with its attendant sub-problems of authentication, authorisation, privacy, counting, etc. Second, the problems introduced by technology, namely the perception that the technology can be hacked, that the programmers introduce bugs or bias, the dangers of spoofing identity, the need for a “paper trail”, etc. Third, for the foreseeable future, sections of the populace may not have access to the technology, therefore the technological solution must co-exist with a version of the “old-school” process.

For now, I do not have solutions to all of these problems. I don’t believe I will be able to invent the best solutions for all of the problems, or any solution to any of the problems. My goal in this post is to merely lay out the problems we have to solve, to spur our thinking.

Repeated Words

For inexplicable reasons, I am fascinated with a peculiar form of wordplay that I dub “homonymphora”. The idea here is to create a meaningful sentence in which a particular word is repeated consecutively. No cheating: the repeated word cannot be a proper noun, it cannot be some form of callout. It has to be a legitimate word, used legitimately, although each occurrence of the word can have a different meaning.

An early example I came up with is:

He said that that that that that man used was incorrect.

Translating into something more meaningful:

  • The first “that” is a conjunction introducing a subordinate clause.
  • The second “that” is a determiner, used to identify a particular thing. At the risk of being temporarily verbose, we can replace it with “the specific”.
  • The third “that” is the specific thing in question, which happens to be the word “that”. It could have been any word, so let’s replace it with “_word_”.
  • The fourth “that” is a conjunction again, introducing a dependent subordinate clause. At the risk of being temporarily grammatically incorrect, we can replace it with the conjunction for an independent subordinate clause, namely “which”.
  • The fifth and final “that” is a determiner again, used to identify a particular person. At the risk of being temporarily verbose, we can replace it with “the specific”.

Reworded, the sentence becomes:

He said that the specific _word_
which the specific man used was incorrect.

And there you have it, an example of homonymphora. I don’t know why it has fascinated me for years, but it has.