Issues with regard to Call for transparency of COVID-19 models

jbarhak · July 7, 2020, 3:06am

This thread is aimed at discussing the call for model transparency published by a group of scientists on:

I contacted the corresponding author Michael Barton to address a few issues related to the paper and asked for a public discussion on some of the points made. We had some private email exchanges and the corresponding author suggested I open a discussion in this forum.

I will try to summarize my arguments as follows: The authors wish for an ideal situation where data is shared. However, at the same time the authors do not provide a tangible path towards making this happen.

Some issues that need addressing are:

How will authors suggest dealing with restrictions that may be imposed by different governing bodies? This may include patents, export control, some restrictions by some universities, potential other legal issues.
How does the group suggest dealing with misuse?
What incentive does the group give to those being transparent?

I myself am an independent developer and have been releasing open source code for years and have been calling publicly to publish code.

However, I am also aware of limitations and have had experiences that limit the level of sharing I can allow.

I wish to open the discussion here so we can dive into the issues I listed, and potentially others, and explore them.

–
Jacob Barhak Ph.D.

bruceedmonds · July 7, 2020, 7:12am

Jacob makes some good points, but not, I feel, fatal ones for the transparency of models.

How will authors suggest dealing with restrictions that may be imposed by different governing bodies? This may include patents, export control, some restrictions by some universities, potential other legal issues.

When starting a project, the open license that the project will operate under should be made clear. If public funds are being used, there is no contest - it should be openly transparent. If it is commercial, then others should not trust the code - code that you cannot inspect, run, check, change the assumptions of and re-run is unreliable code.

How does the group suggest dealing with misuse?

Not quite sure what kinds of misuse you are envisaging here. There are loads of COVID-19 models out there; if you want to pretend you have a model whose outcomes you fix, they are available. This leaves the case where someone takes some respected code and maliciously changes it to give different results. Code should be published with the main body being checkable with a checksum and the parameters dealt with in a different, simple bit. If the main body has been changed, then trust it only as far as you trust the current authors and are able to inspect, compare and run the code yourself. Use diff or similar to find the differences between the original and current code.
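The checksum-and-diff check described above could look roughly like the following Python sketch. It assumes the original authors publish a SHA-256 digest of the main model body, with parameters kept in a separate file that the digest does not cover. The model text, the function names and the choice of SHA-256 are illustrative assumptions, not something specified in this thread.

```python
import difflib
import hashlib

# Hypothetical model source, standing in for the published "main body".
ORIGINAL_MODEL = "def infections(s, i, beta):\n    return beta * s * i\n"
# Hypothetical copy that someone has quietly altered.
TAMPERED_MODEL = "def infections(s, i, beta):\n    return 0.5 * beta * s * i\n"


def sha256_of(source: str) -> str:
    """Return the SHA-256 hex digest of a source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def matches_published(source: str, published_digest: str) -> bool:
    """True if the source is byte-for-byte what the authors published."""
    return sha256_of(source) == published_digest


def changed_lines(original: str, current: str) -> list[str]:
    """List removed ('-') and added ('+') lines between two versions."""
    diff = list(
        difflib.unified_diff(
            original.splitlines(),
            current.splitlines(),
            fromfile="original",
            tofile="current",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # Skip the two file-header lines, then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# The digest the original authors would publish alongside the code.
PUBLISHED_DIGEST = sha256_of(ORIGINAL_MODEL)

if __name__ == "__main__":
    print("original verified:", matches_published(ORIGINAL_MODEL, PUBLISHED_DIGEST))
    print("tampered verified:", matches_published(TAMPERED_MODEL, PUBLISHED_DIGEST))
    for line in changed_lines(ORIGINAL_MODEL, TAMPERED_MODEL):
        print(line)
```

A failed digest check only says that something changed; the diff is what lets a reader judge whether the change is a legitimate improvement or a manipulation.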

What incentive does the group give to those being transparent?

Like all good practice, putting in the extra effort to make things transparent is a pain. The only way to ensure this is to make it the norm for any serious publishing, presentation or giving advice. There may be exceptions, but these would have to be heavily justified by the author. This is no different from any other aspect of good practice - documenting, keeping records etc.

Yes, this all requires a change in culture and ways of doing things, but if you want to be scientific there is no choice. If Neil Ferguson had been forced to make his code transparent when publishing his Nature paper on flu epidemics years ago, there would have been several teams critiquing the code, suggesting improvements, finding limitations etc. and we would be in a much better place to use it in the current crisis - it would then be more mature science and not emergency modelling.

jbarhak · July 8, 2020, 3:00am

Thanks Bruce,

Your response is a start for the discussion. Allow me to expand a little on the points you answered.
Let's start with the first point, on restrictions, which you answered by saying that code should be open if government funded and should not be trusted if it is not open.

You claim that a government funded project should be made open. How would you define open?

Also, if it is a government-funded project, why not apply the restrictions that this government supports as part of the license you are releasing under? For example, will export control measures be applied to the code funded by the government?

You see, some open source licenses become restrictive after a while by their own nature. Why would a government fund a project that can later become unusable by others for various reasons other than reasons the government wishes to impose? Why not release code directly to public domain so it can be reused equally by others while allowing government mandated restrictions such as patents to be applied on the code after funding is over?

Finally, what happens after the government funding for the project is over? Who maintains rights to the code? Who maintains the code? Do funding body restrictions still apply?

Those questions are relevant and hopefully further discussion will reveal more issues.

bruceedmonds · July 8, 2020, 8:25am

Ultimately if a government funds a project, then it should be up to the democratic process of that government to determine the conditions under which it supports code development. It may well say certain projects (e.g. security or defense related) should be closed or restricted. However, there are strong arguments for making other kinds of project as open as possible. Namely:

1. If it is public money, then the widest set of the public should benefit. In a real sense the public own the IP here - they have a right to use it and even make a profit from it if this is possible.
2. Knowledge becomes so much more useful if openly shared - more benefit accrues (in total) if it is open. Science is international so international sharing benefits all (overall).
3. You cannot trust code that is not inspected, critiqued, played around with, replicated, re-run, etc. However careful you are, individuals writing code will make mistakes. Others are simply in a better position to see mistakes. This is true for even quite simple simulations published in high status journals (as we showed in Bruce Edmonds and David Hales: Replication, Replication and Replication).

How would you define open

Openness is not simple; it is many dimensional and each of these dimensions comes in different degrees. Ultimately (for me) it is that which allows what I describe in 1, 2, and 3 above. This can be formulated in different ways, but Gary Polhill and I make an attempt at this in http://jasss.soc.surrey.ac.uk/10/3/10.html. It took us a whole paper, so I am not going to summarize it here.

Why would a government fund a project that can later become unusable by others for various reasons other than reasons the government wishes to impose?

Good standards of openness imply good documentation, which makes the possibility of the code becoming unusable remote (e.g. see what CoMSeS.net recommends, or again http://jasss.soc.surrey.ac.uk/10/3/10.html). If code is so important that it should be maintained for ever, then yes, someone has to pay for it - namely the people it is important for.

Why not release code directly to public domain so it can be reused equally by others while allowing government mandated restrictions such as patents to be applied on the code after funding is over?

Public domain release of code, with good documentation, is what I would recommend. This could be with a non-commercial license (e.g. CC BY NC, so money could still be made on it afterwards). Thus others can check it, re-run it and critique it at will, while the monopoly on commercial use is retained. This fulfills 2&3 above but not 1.

Finally, what happens after the government funding for the project is over?

If there is a need for longer-term code maintenance then this should be paid for by subscription by those who need this code maintained. The important thing here is to separate the ability to do code inspection, changing it, re-running it etc. from paying for maintenance. Paying for maintenance should be done in other ways than closing code (if it is critical) - e.g. via a subscription service.

All the above depends on the kind of project it is. A completely non-critical computer game can be totally closed. A science project totally open. The bottom line is this… if code is in any way critical – it affects people’s lives – then it should be open to critique and checking. People should have a democratic right that such code can be openly checked.

bruceedmonds · July 8, 2020, 8:36am

Ultimately the mistake is conceptual: to see code as a thing rather than as something like knowledge or a service. One would never give IP to a bit of news or some knowledge – these are so useful precisely because they are replicable and transmissible.

For purely historical reasons the way we often pay for code is via IP law which is based on the production of things. This is not the right way to pay for software. News we pay for via subscriptions (you pay to get it first), basic scientific knowledge is paid for via public grants for everyone’s benefit. Yes we need new ways to pay for code maintenance, but this should be via other mechanisms than closing of code sharing.

jbarhak · July 8, 2020, 7:22pm

Thanks Bruce,

This is a good discussion, yet your references see only the point of view of the funded scientist. Where you see opportunity, I see obstacles. And do note that I do publish code under open source licenses. However, there are limits.

When there is funding available things go well, and software can be supported. When funding stops things become different. The ideal you describe, where there are ways to maintain open source software, does not hold in many real cases. Many useful open source software projects suffer from lack of funding.

Governments do not necessarily fund research in order to support international research. Governments fund research since it opens opportunities for businesses in the long run. In fact, here is a U.S. bill that supports increased access to government funded research. There are more examples of initiatives in these directions, which you can find in an NIH plan for data science that repeats the word “open” over a dozen times, and in a previous US administration policy. However, when you read the U.S. bill you will find that there are points that also protect national security and patent rights. So the legislator understands that there are limits to openness.

This is what I do not see in the original call for transparency. This is why I started this discussion. The idealism in the call for transparency needs further exploration to get in touch with reality.

You quoted an article on open source you co-authored. This article is indeed a nice review, yet it takes a one-sided approach that denies incentives of innovation unless one has funding and works in a specific research environment. For example, the article seems to promote copy-left licenses as protection against future innovation. Actually, copy-left licenses limit the use of code in the long run. In fact, there is an actual decline in the use of GPL, as seen here, and with good reason: it becomes restrictive and, unless constantly funded, one will probably choose to change the license to a more permissive one. By the way, the argument in section 2.7 of the article you co-authored, that “copyleft protects the author from criticisms of closed modifications to their work”, is better served by other licenses such as BSD 3 Clause. Yet even better, when research is government funded, why not use CC0, which does not pose any limits other than those imposed by law?

Your argument that public domain “fulfills 2&3 above but not 1.” needs explanation. Public domain code fulfills all your points and is superior to other open source licenses that become restrictive after a while. In contrast to other licenses, it allows innovation by others after code has been released and even abandoned.

Also, the use of the term “standard” is not correct with regard to open source. There are so many open source licenses a person can choose from or make up, that there is nothing standard about open source. Also, there is no Standards Development Organization (SDO) that handles open source as far as I know - please correct me if I am wrong, yet OSI and FSF are not SDOs - they are nice non profit organizations promoting open source.

However, to move the discussion forward, allow me to suggest a resolution. Would it be prudent for government funding bodies to nominate a committee that, at the end of a funding cycle, will decide:

If the software has restrictions on becoming open source, in which case decide on the restrictions and the license.
If no restrictions are found, release it to the public domain on a government web site, and this will count as a publication for the authors - this will mean that the government wants to give this software as a gift to the world.

Such solutions may require legislation, since a group of scientists publishing a paper probably cannot create incentives or protect against misuse.

Since my idea may be an unattainable ideal, I will be happy to learn of other solutions the authors of the paper can offer.

Jacob

jbarhak · August 4, 2020, 10:19pm

Since there has been no response from any of the article authors, and no agreement or disagreement with the proposal I made regarding government funded research transparency, should I conclude that there is:

Silent agreement by the modeling community?
Silent disagreement by the modeling community?
Lack of interest in the topic by the modeling community and the authors of the article?

For proper disclosure, I declare interest in the topic since I privately own modeling technology that can accumulate knowledge from multiple models and data sources.

I recently published a COVID-19 model:

Barhak J, The Reference Model Initial Use Case for COVID-19. Cureus. DOI:10.7759/cureus.9455, Online. Interactive results are available in this link

The technology used in this model allows absorbing other models, so proper publication of models is of personal interest to me. However, legal and licensing issues have to be properly dealt with to allow easier accumulation of knowledge in the future. The published call for transparency being discussed above does not provide proper solutions that will give incentives for collaboration towards improving our capabilities. I strongly encourage further exploration and discussion of the topic and ask the authors to join the conversation and attempt to address the difficult issues raised.

Hopefully this call will lead to better long term solutions.