As it turns out, the most copied piece of Java code on Stack Overflow contains a bug that went unnoticed for nine years. Now the snippet's author, Andreas Lundblad, a Java developer at Palantir and one of the most influential members of the Stack Overflow community, has found the bug himself. A 2018 scientific article identified Lundblad's snippet as the most copied Java code on Stack Overflow, and it has been used in many open source projects.
“This code has been copied and implemented in more than 6,000 Java projects on GitHub,” the analysts calculated.
This snippet was originally published as an answer to a question in September 2010. The idea was to convert 123,456,789 bytes to a human-readable format, for example, 123.5 MB.
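To make the idea concrete, here is a minimal sketch of such a conversion for SI units (powers of 1000). It is not the snippet from the Stack Overflow answer: the class name `HumanReadable`, the method name `humanReadableSI`, and the loop-based approach are assumptions chosen for illustration only.

```java
import java.util.Locale;

class HumanReadable {
    // Hypothetical helper for illustration; not the original Stack Overflow snippet.
    static String humanReadableSI(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (bytes < 1000) {
            return bytes + " B";
        }
        String prefixes = "kMGTPE";
        double value = bytes;
        int index = -1;
        // Move to the next unit while the value would still print as
        // "1000.0" or more after rounding to one decimal place.
        while (value >= 999.95 && index < prefixes.length() - 1) {
            value /= 1000.0;
            index++;
        }
        return String.format(Locale.ROOT, "%.1f %sB", value, prefixes.charAt(index));
    }

    public static void main(String[] args) {
        System.out.println(humanReadableSI(123_456_789L)); // prints "123.5 MB"
    }
}
```

The unit is chosen with the rounding threshold in mind, so a value just below a unit boundary moves up to the next unit instead of printing as "1000.0".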
Last week, Lundblad reported in a blog post that he had found an error in the code: after the publication of the scientific article mentioned above, he noticed that the snippet converted the byte count incorrectly. He has now prepared a corrected version of the snippet.
“In a recent study titled Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects, an answer I wrote almost a decade ago was found to be the most copied snippet on Stack Overflow. Ironically it happens to be buggy. Back in 2010 I was sitting in my office and doing what I wasn’t supposed to be doing: code golfing and chasing reputation on Stack Overflow. Personally I would not copy this snippet into production code,” said Andreas Lundblad.
Fortunately, the bug turned out to be quite trivial and could only lead to minor inaccuracies in file size estimates. It could have ended much worse if the error had caused security problems. In that case, it could have taken years to fix all the projects made vulnerable by the bug, since many developers give no thought to the possible consequences when copying someone else's code from Stack Overflow.
In addition, many people tend to copy code without attribution, effectively hiding from everyone the fact that they introduced unverified code into the project.
For example, in the fall of this year, information security researchers calculated that 2,859 projects on GitHub used borrowed and dangerously vulnerable fragments of C++ code from Stack Overflow. The experts identified and searched for only 69 such problematic pieces of code from the past 10 years, so in reality there may be many more such errors.
Advice from a repentant programmer:
- Stack Overflow snippets can be buggy, even if they have thousands of upvotes.
- Test all edge cases, especially for code copied from Stack Overflow.
- Floating-point arithmetic is hard.
- Do include proper attribution when copying code. Someone might just call you out on it.
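
The advice about edge cases and floating-point arithmetic can be illustrated with a small sketch. The formatter below is hypothetical (`BoundaryCheck` and `naiveKiloOrMega` are names invented here, not taken from any real project): it picks the unit before rounding, so a value just below a unit boundary is rounded up into an awkward result.

```java
import java.util.Locale;

class BoundaryCheck {
    // Hypothetical naive formatter: chooses the unit first, rounds afterwards.
    static String naiveKiloOrMega(long bytes) {
        if (bytes < 1_000_000) {
            return String.format(Locale.ROOT, "%.1f kB", bytes / 1e3);
        }
        return String.format(Locale.ROOT, "%.1f MB", bytes / 1e6);
    }

    public static void main(String[] args) {
        // Values around the kB/MB boundary are where rounding surprises appear.
        long[] edgeCases = {999_949L, 999_999L, 1_000_000L};
        for (long b : edgeCases) {
            System.out.println(b + " -> " + naiveKiloOrMega(b));
        }
        // 999_999 is formatted as "1000.0 kB" rather than "1.0 MB".
    }
}
```

A test that only checks round numbers such as 1,000 or 1,000,000 bytes would pass here; only inputs near the boundary reveal the problem.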