How you write the code matters, by John T. Murphy

john.t.murphy · December 23, 2016, 8:51pm

The challenge in writing an essay entitled, “How you write the code matters,” is, of course, that there is a strong sense in which it, in fact, does not.

Code is functional, and its function is to generate results that are useful to us as scientists. “Good code” gives correct (i.e. useful) results, and code that gives incorrect results is either useless or (worse) misleading. How you write the code is, in this sense, superficial: it doesn’t matter.

The first premise of OpenABM is that making code available is necessary to ensuring that the science based on this code is reproducible. In principle, this is also sufficient: a record of the code can almost always be used to reconstruct the model that was implemented in it. While it is common for languages and hardware to change in subtle ways that sometimes make perfect replication impossible even after only a short time, few have changed so much that it is entirely impossible to discern how the original code would have run.

But this is only the first premise of OpenABM; the second is revealed in the process of model certification. To become a certified model, the code must be documented and shown to be usable: not just that it can be run, but that it can be used.

To make the difference clear, we can look at an extreme. Code could be written that would be virtually impenetrable to a human being. In fact, code ‘obfuscators’ deliberately mangle code through tactics such as replacing meaningful variable names with nonsense, removing all comments, distorting whitespace and line breaks, splitting code across files, or breaking units of code into separate fragments. All of this can be done without changing the output of the running code. These tactics protect code against infringement, but they are, of course, the opposite of the goal of OpenABM. This goal is not just reproducibility, but clarity.
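To make the contrast concrete, here is a minimal sketch in Python. It is not drawn from any real model or obfuscation tool; the function names, parameters, and numbers are hypothetical and chosen only for illustration. Both functions compute the same thing, but only the first tells a reader what the model is doing.

```python
# Readable version: names, a docstring, and layout document the intent.
# (Hypothetical example; not taken from any actual model.)
def count_wealthy_agents(wealths, growth_rate, threshold):
    """Grow each agent's wealth by one step, then count agents above the threshold."""
    grown = [w * (1 + growth_rate) for w in wealths]
    return sum(1 for w in grown if w > threshold)


# Hand-obfuscated version of the same computation: meaningless names,
# no documentation, and everything packed onto a single line.
def f(a,b,c):return sum(1 for x in [y*(1+b) for y in a] if x>c)


wealths = [10.0, 20.0, 30.0, 40.0]

# The two versions are interchangeable as far as the results are concerned.
assert count_wealthy_agents(wealths, 0.1, 25.0) == f(wealths, 0.1, 25.0)
print(count_wealthy_agents(wealths, 0.1, 25.0))  # prints 2
```

A test that only checks output cannot tell these two apart; the difference shows up only when someone other than the author has to read, verify, or extend the code.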

This suggests that code has an additional function: it should be usable by researchers as a document of the scientific process that it represents. This additional demand goes beyond merely running the code: here, how you write the code definitely matters. OpenABM’s certification process focuses on this second function, and it requires that the code be documented and runnable by someone other than the original author.

In practice this straw man, that code is only functional, is wide of the mark; it’s easy to specify multiple functions that code serves:

1. The results that it produces
2. As a document and record of the scientific processes used
3. As a representation of, or a reflection of, a body of theory (model)
4. As a framework for performing multiple different kinds of tests within a single domain
5. As a module or component for other simulations

It is possible to evaluate code on the basis of how well it fulfills these functions. (This is in addition to other ways to evaluate the code, such as the computational efficiency with which it achieves its tasks.) It’s also notable that the ability of the code to perform some of these functions is not a fixed property of the code itself, but can be subjective and dependent on the training, ability, and familiarity of the people working with the code.

But the real difficulty is that all of this must be couched in terms of costs. There is an old saying in software development that development can be done well, quickly, or cheaply: pick any two. Software development must always balance these, but this balance is more complicated when ‘well’ can have multiple meanings. One of the main motivations of the CoMSES network is the establishment of a community of modeling professionals who understand all of these different aspects, and can realistically assess whether effort spent on one or more of them is appropriate.

A brief commentary is not intended to provide a full resolution to these issues; instead, I will close with three short observations.

First, the different functions and the balance among them can be conceived within the framework of a lifecycle; at different points in the development cycle, different aspects carry different weights. Understanding this shift and moving the code through this process is a key part of scientific computational modeling.

Second, the design, budget, and standards of practice should allow for this. There is an increasing recognition that any research project needs a data management plan; the creation and distribution of code generally fall under this, but because code serves a range of different functions, the lifecycle of code may be more complicated than has generally been recognized.

Third, there is an additional aspect pertaining to team structure. Models typically occupy a central position in a research project, and the computational model serving function #3 above can become a place where domain specialists, coders, and individuals who bridge those two roles intersect. But this also means that the subjective element becomes highly salient: the domain specialist may know little about ‘good’ code design, and may not value time spent on this aspect; a person trained strictly in software engineering may assume that good code design rests exclusively on engineering metrics, which can make the code less, rather than more, accessible and less reflective of the underlying science. Individuals on the team may have different levels of experience and training in these fields, and may play different roles as the project progresses. The challenge in this situation is this: when a balance must be struck among the different components, and the costs of each must be assessed, who decides?

While the CoMSES network has many purposes and many aspects that bring us together as researchers, one is the navigation of this work: finding the balance that allows computational models to achieve as many of the goals we have for them as possible. The next few years will see more steps forward in solidifying the ways that computational modeling can be conducted as a profession; I hope that CoMSES and OpenABM can continue to play a leading role in this important work.