Open Source—the future of science?

Sebastian Meznaric

The recent crisis surrounding the leak of climate research data has greatly damaged the credibility and respect usually accorded to the scientific community. The collected global temperature data had been subjected to a statistical analysis in which the scientists in question used a “trick” to conceal certain decreases in the temperature data. The “trick” apparently went unnoticed during peer review, and the processed data ended up being published in a high-profile scientific journal. Sceptics who wished to analyse the data for themselves, having failed to obtain it by other means, were (as they saw it) forced to resort to the Freedom of Information Act.

Their efforts went unrewarded, as the researchers routinely rejected or evaded the requests. The situation came to a dramatic head in November with the startling news that the data had been leaked to the public with the help of hackers based overseas. The damage to the image and reputation of the scientific community was grave, and it should lead us to ask: could there be an overarching solution to such problems in the future?

Open source, whether in computing or in broader terms, is a principle advocating free access to an end product’s “source materials”. These may be the source code of a piece of software, the design specifications of a product, or the data used for a statistical analysis in a scientific project. The guiding principle behind it is collaborative peer production, and the end product is made available to the general public at no cost.

The creative practice of sharing the source of one’s work is nowhere more appropriate than in science. Scientific work is collaborative by nature and very often publicly funded; as such, it should be freely available for public examination.

Beyond raw data, there are numerous scientific projects in which a computer programme is the key component. Because the scientific community needs to be able to verify the results, the code should be made available for inspection and modification. Indeed, had the raw data in the climate scandal noted above been made available from the beginning, the errors in the analysis could have been noticed and corrected early, benefiting both the integrity of the scientific process and the search for truth. In today’s competitive research environment, however, the data and source code behind scientific results are not always freely available.

Competition among research groups makes the idea of withholding one’s software code and/or data (we will henceforth simply use “source” for both) extremely attractive to most scientists. The concern is that sharing the source would make it easier for other groups to reap the benefits of one’s hard work. However, it would be very easy (and indeed necessary) to give credit to the principal author of the source by making them a co-author of any resulting publication.

In fact, the practice of making the people who collected the data and/or wrote the code co-authors of journal articles is already well established. For large projects where co-authorship is impractical, perhaps such as CERN-related findings, the name of the open source project can simply be cited in the acknowledgements. Such practices would avoid an unwieldy number of authors while still giving credit where credit is due.

Another common argument is that competition drives scientific research better than openness does. Competing research groups in the same field might therefore each write their own software to accomplish very similar tasks. These groups often race one another to add new features and improve the performance of their code so that they can publish new results before their rivals. However, as we see with Wikipedia, Linux and other hugely successful open source projects, more “eyes” see better and, more importantly, think better. Collaboration among peers very often produces ideas that would not occur to a smaller group or to an individual working alone. Indeed, dramatically enlarging the group of people working on a scientific software project often leads quickly to a skyrocketing improvement in performance and applicability. Perhaps even more crucially, researchers would have more time to focus on science rather than on writing code or collecting data.

The open source concept has been used successfully in the commercial world, notably in the automobile industry, where the patent sharing started by Ford allowed design innovations to spread faster than ever, to the great benefit of the general public. The sharing of technology reduced neither the competition among the companies nor their innovative drive.

Adopting open source models in science would not only foster greater creativity but also draw programmers and other interested parties into science, further increasing our global productivity and efficiency.

Some fields are already moving in this direction: biotechnology, for instance, is fast adapting to the drive for greater openness in the scientific process. Other disciplines will hopefully follow suit to harness the greater efficiency and openness offered by the open source development model. Whether the scientific community at large adopts the open source paradigm remains to be seen but, in light of the climate data leak fiasco, the potential benefits are surely beyond dispute.


Sebastian Meznaric is a theoretical physicist and doctoral researcher at the University of Oxford. His research interests include the study of information theory in quantum mechanics. He is also a keen observer of politics and current affairs.