An idea why “Business Software” often has lower-quality code than OSS

Abstract
A showerthought about how differences in test surface might lead to OSS having higher-quality code than closed source.

Code quality in closed source vs open source

When I listened to people working on closed source “Business Software”, I often heard how legendarily bad the code quality is. Their stories sounded far worse than the stories of people working on open source software. While I am sure that there exists some Business Software with great code quality, and some open source software with terrible code quality, the theme of “Business Software has worse code quality than OSS” stuck in my head. I heard it too often to dismiss these reports as statistical noise. When I got my first job working on closed source software - the backend of a web application of some large company - I confirmed this myself. Of course I had seen bad open source code before, but only the worst open source projects could rival the quality of this code. It was in a completely different league than meson, the open source project I was working on at the time (and potentially still am, depending on when you read this). So, I consider the theory that closed source software has worse code quality than open source software to be empirically confirmed.

Explanations I have heard

But something about that is nagging me. I believe that this code quality discrepancy exists, and I even believe it to be quite large, but I do not know why this effect exists. It bothers me that we have observed it but have no real idea why it is that way. The empirical evidence is nice, but we do not have a theory for it. Of course I talked to other people about this, and they also largely believe the theory to be true, but they offered very little explanation. Some things I heard:

“Open Source is done by people working in their free time. They can take as long as they want and they do not have a boss telling them to work faster. If you work without deadlines, you have time to polish everything.”

Two problems with this theory: 1. Large parts of open source are done during paid work hours. 2. If you do something in your free time, you also want to be done as fast as possible. Time is not unlimited, even on weekends. It is probably the other way around: time is especially limited on weekends.

“Open Source is done by people working in their free time. Who codes without getting paid? People who have a passion for software. People who have a passion for software consider programming an art and therefore want the result to look beautiful instead of just wanting to get the job done.”

Probably part of the reason why we observe open source software to be more nicely written.

“Open Source is done by people working in their free time. Who codes without getting paid? People who have a passion for software. What happens if you have a passion for software? You get better at it. Therefore open source developers are higher skilled than closed source developers and therefore write better code.”

Also probably part of the reason why we observe open source software to be more nicely written. I have observed that people who only code during paid hours are less skilled than people who go to hackerspaces or similar, but my sample size is really small here.

“If the code you write can be read by everyone, most people will try harder to write good code in order not to embarrass themselves.”

I do not buy that.

A new idea

A few days ago, a new explanation for this code quality discrepancy came to my mind. I was fixing mypy warnings in meson. Most of the warnings could be removed by adding or changing type hints, but some could not. As a temporary fix, I silenced those by adding assert isinstance(somevar, sometype) statements. Without these statements, mypy is not able to prove that somevar is always of type sometype and therefore warns. With them, mypy can (quite easily) prove that somevar is of type sometype (at least in the lines following the assertion) and therefore does not warn. Later I took a closer look at those assertions. For each one of them there are two possibilities: either the assertion can never actually fail and mypy was simply unable to prove that, or there is some input that makes it fail, which means the warning pointed at a real bug.
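
To make this concrete, here is a minimal sketch of the pattern (with hypothetical names, not actual meson code):

    from typing import Union

    def render(value: Union[str, int]) -> str:
        # Without the next line, mypy warns that `value` might be an int,
        # which has no .upper() method.
        assert isinstance(value, str)
        # After the assert, mypy narrows `value` to str and stops warning.
        return value.upper()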

With a combination of reading the source, print-debugging, and using the unit tests, I managed to find a way to trigger a (to me) surprisingly high number of these assertions (3 out of 4, if I recall correctly). In other words, 3 out of those 4 mypy warnings (not counting the ones that were fixed by adding or changing type hints) were real bugs that a user could accidentally trigger. This made me think of the times at work when I saw code that very much looked like a bug. Since I do not like working with bad code, I wanted those bugs to be fixed. I knew that if I said to the lead developer “hey, this code here looks bad, can you please tell me to fix it”, I would be told to fuck off. But if I could find a way a user could trigger the bug, then the chance of our lead deciding to tell me to fix it would improve. Unfortunately, I was never successful in triggering those bugs, at least not that I recall.
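
Continuing the hypothetical sketch from above: if you can find a code path through which an int actually reaches render(), the silenced warning turns out to have been a real, user-triggerable bug:

    render(42)  # AssertionError at runtime - the warning was a real bug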

This left me thinking: why was I so successful in triggering the meson bugs, but so unsuccessful in triggering the webapp-backend bugs?

This made me think about another question that devs often ask themselves: Is this exploitable? E.g., is this memory safety issue exploitable? The security guys will tell you that the probability of a bug being exploitable rises with the amount of control the attacker has. By “control” I mean control over what your program’s memory looks like and which code paths are taken. For example, a memory safety issue in a JavaScript engine is much more likely to be exploitable than a memory safety issue in a program with a less direct connection to the internet.

Similar to the phrase “attack surface”, we could define a “production test surface”. Something like meson has a large production test surface, whereas our webapp backend has a small one. How does the size of the production test surface affect the development of a project? If you have a large production test surface, you will not get away with writing half-broken code; your only chance is to write good code and actually understand it. With a small production test surface, you can get away with some hacky stuff, some amount of “I don’t know why, but it works, so don’t touch it” and “just try random things until it works”. You would not develop software with a large production test surface the way you would develop software with a small one, because you would drown in bug reports that are hard to fix since the code is hard to understand. And you would not develop software with a small production test surface the way you would develop software with a large one, since that would take more time. In other words, software with a large production test surface needs good code quality; software with a small production test surface does not.

Now, let’s compare the market share of open source software vs closed source software across different domains: for example, compilers and interpreters are far more often open source than, e.g., video games. The market share of open source software in a given domain correlates with the size of the production test surface in that domain (a compiler is fed arbitrary programs written by its users, while a video game is driven through a comparatively narrow interface). Since we saw in the last paragraph that code quality correlates with production test surface size, code quality correlates with a project being open source.

I would love to hear your thoughts.