Do we need licenses for our simulation models? by Mariam Kiran

john.t.murphy · October 21, 2016, 7:51pm

In the many years since its original description, the well-known Sugarscape model has been reproduced and re-released in multiple versions by multiple developers. As Sugarscape evolved, it came to be used in a range of diverse fields, including the social sciences, economics and biology, to study behaviours such as hierarchy, society and organisation. Sometimes the authors of these new versions gave credit to the original Sugarscape developers; sometimes, however, they did not and, furthermore, did not document what had been changed or what the reproduction was designed to do.

In a case like the Sugarscape model, this can lead to the contributions of the original authors being blotted out or omitted entirely, costing them the academic and intellectual credit they deserve. Similar trajectories apply to other models, such as models of the stock market that are now being used to predict market changes well enough to start acquiring monetary value. These scenarios raise a question for the M&S community: is there a need to recognize original authors and contributors for models and research as they become increasingly valuable?

The answer lies in whether there is a need for model release standards. Modelers often maintain their own model libraries, where code is uploaded with version numbers, how-to-run help files, simulation details and accompanying dependencies, all to allow results to be reproduced easily. However, the modeling and simulation (M&S) community lacks ‘model release’ standards that set out how models are documented, used and modified as research progresses. Most models are released into the public domain as ‘free to use’, extend and publish results from. However, recent problems with reproduction and publication have raised concerns about reliability and about confidential research tools being used in policy making.

The software writing community, particularly the open-source community, has extensively investigated what counts as open source and how developers reuse software for commercial purposes. Stallman (2002) argued the case for code freedom and the GNU General Public License (GPL), attaching readme text files to every code release. The GPL itself has multiple versions, which give users different rights to access, reuse or sell the code. Adobe, YouTube and others have released their own licenses with their code, carefully crafted in legal terms. Other licenses, such as Creative Commons, MIT or BSD, grant users further kinds of access. The result is so many licenses that finding the right one has become complicated for developers. Here, a distinction needs to be drawn between commercial and academic research software licenses. While the commercial world has a great number of legal guidelines, academic research mostly favours open access and model reproducibility because of the nature of publication, transparency and credibility. Reuse of models is often preferred, but there are no methods that allow original authors to say how they want their models to be reused.

Simulation code and its rights

The plethora of legal terms for software releases is becoming increasingly complicated for developers. Releasing code as open source by default may not be a good idea if someone else downloads it and starts making money from it. Various terms can thus be combined to create tailored licenses for specific developer releases (see Polhill and Edmonds 2007 for more details):

Copyleft allows original authors to let the software be reused, modified, enhanced and redistributed by future developers, on condition that redistributed versions carry the same terms.
The right to execute the software allows users, or a certain number of permitted users, to run the code.
The right to inspect the code allows users to read and critique it.
The right to re-implement the code.
The right to modify the code, which differs from re-implementation: it implies using most of the original code and adding or deleting parts of it to obtain the desired model output.
The right to redistribute the code in its modified version. Here the GPL v3 license, in particular, states that any redistributed modification must retain the original authors' names and copyright notices, as a means of protecting the original copyright.

Agent-based modeling frameworks and their licenses

Agent-based modeling toolkits allow modelers to write and simulate their models easily. These toolkits have their own licenses governing their use. However, they do not document what happens to the models written using them. One could argue that a toolkit released under the GPL, which basically requires all modified or reused files to be under the GPL as well, would automatically make the models written with it GPL too. But that argument does not hold for toolkits with licenses other than the GPL. This raises a number of questions:

Should models then have their own global licenses independent of the toolkits used?
Should model licenses complement toolkit licenses, since models and toolkits depend on each other to simulate and execute?
Do models written using toolkits naturally inherit the toolkit license?
If models are released to the public domain as free to use, but the toolkit is not, how does this relationship play out when it comes to seeking permission to simulate the models?
Is there a need for toolkit developers to engage with the developer community and track the models being written, in case the models become complex and valuable?

All of the above questions point to the need for a wider community discussion in which toolkit and model developers come together and agree on release standards. Efforts by CoMSES, Repast, JADE and research projects have attempted to create model catalogues, which can shine a light on the general direction being taken. But there is currently a need for a single body to discuss these problems. Perhaps the leading conferences in Complex Systems can be excellent venues to advance this agenda.

Commonly used ABM frameworks and their licenses include Repast: BSD; JADE: LGPL v2; SWARM: GPL; SOAR: BSD; FLAME: originally GPL v3 (now being commercialized, new license unknown); StarLogo: free; NetLogo: GPL; and MASON: Academic Free License (more info: https://en.wikipedia.org/wiki/Comparison_of_agent-based_modeling_software).

Where do we go from here?

Simulation code is intellectual property, like any other source code. The complication of using toolkits to run these simulations leads to contradictions between licenses. But what kind of code requires a license? Simulations written in MATLAB have an inherent dependency on MATLAB, which is commercial software. Even if the simulation code is released for free, it cannot be reused unless MATLAB is installed on the lab computers. This leads to an interesting chicken-and-egg conundrum: can MATLAB simulations be released by researchers, or does MATLAB own them? In keeping with academic purposes, there is also a discussion to be had about algorithms. Simulation code can be rewritten as a set of algorithmic steps, which inherently removes any toolkit dependency. This also allows models to be written for multiple toolkits regardless of computational dependencies.

Treating models as algorithms seems to reduce some of the complexity in the license discussion. In order for any code to carry a license, it should be independent. So would it be easier to call our models algorithms rather than code? The recent success of tools such as IPython, Jupyter and Docker could also help free simulation code from platform dependencies.

Whether you release all software as free by default on GitHub or only release results, reuse, reproducibility and transparency are of utmost importance in the academic world. A few points to recommend here:

Every model can attach an authors text file documenting the names of the original developers. This gives credit to the PhD students and post-docs who commonly do most of the development.
Every model should have a how-to-run file that shows where the code is available and lists its dependencies, such as the toolkits needed to run it.
The simulation code itself should be documented, describing what each function does, to allow easy understanding of model behaviour. This is similar to having an architectural or algorithmic description of the model to allow reuse and adaptation.
The assumptions made in the model should be listed, to identify the shortcomings of the model.
Results of the simulation should be released as free to use to the wider community. This allows wider criticism of ideas, encouraging academic collaboration and research improvement across the discipline.
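
The file-based recommendations above could be checked mechanically before a model is released. What follows is only a minimal sketch: the file names (AUTHORS.txt, HOWTO-RUN.txt, ASSUMPTIONS.txt) and both function names are hypothetical, since no community naming standard exists, and the points about in-code documentation and released results are not covered.

```python
from pathlib import Path

# Hypothetical file names; no community standard prescribes these.
RECOMMENDED_FILES = {
    "AUTHORS.txt": "the names of the original developers",
    "HOWTO-RUN.txt": "how to run the code and which toolkits it depends on",
    "ASSUMPTIONS.txt": "the assumptions made in the model",
}


def missing_release_files(model_dir):
    """Return, in alphabetical order, the recommended files absent from model_dir."""
    root = Path(model_dir)
    return sorted(
        name for name in RECOMMENDED_FILES if not (root / name).is_file()
    )


def release_report(model_dir):
    """Build one human-readable line per recommended file that is missing."""
    return [
        f"missing {name}: should document {RECOMMENDED_FILES[name]}"
        for name in missing_release_files(model_dir)
    ]
```

Calling release_report on a model directory returns an empty list when all three files are present, so it could serve as a simple pre-release check in a model library.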

Some licenses offer freedoms for developers and modelers. The GPL restricts some code freedom by requiring all files produced to be GPL as well, but it does give original authors credit. BSD implies freedom, but the modifier becomes the beneficiary. The source code community itself is actually extremely honest when it comes to acknowledging and sharing code, but it is still difficult to catch modifications if software is used offline. There is a strong case for models to have a license attached, either a tailored one or one that lets toolkits relinquish their hold on the models that depend on them. However, plain guidelines that increase model reusability and transparency, and so help improve domain knowledge, are possibly an easier solution than licenses.

Extra Reading

Copyleft license: Use, modify and distribute, but share the source code.
Berkeley Software Distribution license (BSD): Can combine the software with proprietary software and release it under a proprietary license, but retain the BSD license text and notices. This license may also include author’s name or advertising details.
GNU General Public License (GPL): Can use, modify and distribute the software for free or for a fee, but distribute the source code with it. If the software is combined with other software, everything should be released under the GPL, unless it is under the LGPL.
GNU Lesser General Public License (LGPL): Same as the GPL, but software combined with it may be released under your own terms under certain conditions.
MIT license: Can use, modify and distribute the software, including in proprietary software, provided the copyright and license notice is retained.
Apache License: Can use, modify and distribute copies of the software, add own copyright statement to the changes with additional terms for use of the modified version.
Mozilla Public License (MPL): Can use, modify, distribute and sell the software, with the source code. Can also sub-license the modified work, but do not restrict access to the source code.

Various license permissions include:

Download and run: permitted by LGPL v2.1, Apache 2.0, Mozilla, GPL v3, GPL v2 and Creative Commons.
Redistribution: permitted by LGPL v2.1, Apache 2.0, GPL v3, GPL v2 and Creative Commons (with parameters nc = non-commercial, nd = no derivatives of the work, sa = share with same license, which can be combined to create your own license).
Modify: permitted by LGPL v2.1, Apache 2.0, GPL v3 and GPL v2.
Can be used with commercial software: LGPL v2.1, Apache 2.0, GPL v3 and GPL v2.
Must include original authors: Apache 2.0, GPL v3 and GPL v2.
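
The list above can also be held as a small lookup table, which makes it easy to ask which licenses satisfy several terms at once. This is a sketch that simply transcribes the summary given here; it is not legal advice, and the key names and the licenses_with function are illustrative choices rather than any standard vocabulary.

```python
# Terms transcribed from the list above (this post's summary, not legal advice).
# The generic "GPL" entry is folded into its v2 and v3 versions.
LICENSE_TERMS = {
    "download_and_run": {
        "LGPL v2.1", "Apache 2.0", "Mozilla", "GPL v3", "GPL v2",
        "Creative Commons",
    },
    "redistribute": {
        "LGPL v2.1", "Apache 2.0", "GPL v3", "GPL v2", "Creative Commons",
    },
    "modify": {"LGPL v2.1", "Apache 2.0", "GPL v3", "GPL v2"},
    "use_with_commercial_software": {
        "LGPL v2.1", "Apache 2.0", "GPL v3", "GPL v2",
    },
    "must_include_original_authors": {"Apache 2.0", "GPL v3", "GPL v2"},
}


def licenses_with(*terms):
    """Return, sorted by name, the licenses listed under every given term."""
    if not terms:
        return []
    common = set.intersection(*(LICENSE_TERMS[term] for term in terms))
    return sorted(common)


# Licenses that allow modification and also require crediting original authors.
print(licenses_with("modify", "must_include_original_authors"))
# prints ['Apache 2.0', 'GPL v2', 'GPL v3']
```

A modeler who wants a model to remain modifiable while keeping author credit could start from a query like the one shown and then read the full text of each candidate license.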

References

G. Polhill, B. Edmonds (2007). Open Access for Social Simulation. Journal of Artificial Societies and Social Simulation, vol. 10, no. 3, 10.

R. Stallman (2002). Why “free software” is better than “open source”. In J. Gay (Ed.), Free Software, Free Society: Selected Essays of Richard M. Stallman. Boston, USA: GNU Press, pp. 55-60.