ISCB BoF on open source and open data

More ISMB recap.

There was also a bit of a stir at the Open Data and Software BoF, which centered on the ISCB’s statement about guidelines for open source software (you should provide feedback if you feel strongly about this and are an ISCB member: policy<-at->iscb.org).

The discussion was prompted in part because Mike Eisen and Sean Eddy both turned down the complimentary ISCB memberships that came with their recent publications in PLoS CompBio (oh, the benefits of publishing in PLoS CompBio) because of their disagreement with the policy. The Board took enough notice to organize a BoF at the conference. Unfortunately it was during lunch, so you had to choose between food and the session, or wolf down food very fast and run over to the room.

Some usual suspects were there with a variety of opinions on open source – I don’t have the complete list written down though. There was an open-mic session after a panel of ISCB directors presented their opinions. It seemed like most of the audience members supported revising the statement to be more supportive of open source, although not everyone wanted to make it a requirement that source code be available upon publication. To me there are a lot of loose ends here: rather than having a discussion about the principles, it ended up being about individuals’ personal stories that supported or discouraged a requirement of open source. Some people see it as too much of a burden to release their software (it is written either poorly or too hard-coded for their internal build system).

Others argued, in not so many words, that they sell their software and don’t want to be told how they can operate – the argument is couched in statements like “papers should be well-enough written to describe an algorithm so that it can be reproduced”. Great, so now in order to move forward in a field we have to re-implement everything that came before?
There were also arguments against open source along the lines of: “we made our source available for our project and we have 50 users and have never received a bug fix”. So the lack of contribution to one project by its users justifies not encouraging it for all other projects? A poorly managed open-source project will not succeed any more than a poorly managed in-house closed-source project. It takes work to communicate with users and solicit input, and it also depends on the computational savviness of the users in the first place.

Some people lobbied for open source software because they can’t afford to buy commercial software and they need things like R, Bioconductor, BLAST, etc. This parallels the way the government pays twice for research: grants pay for the research, which is published in journals, and then grant overhead pays for journal subscriptions so others can read about the work. Similarly, I fail to see why granting agencies would want to fund software developers in many parallel labs to reimplement the same thing, or to pay a subscription for software that was developed under government grants.
The cost argument shouldn’t be overlooked, but I don’t think it is as convincing as the ability to build on other people’s work. For me it comes down to building larger systems from basic components: only when we have access to the individual pieces, and can tweak and manipulate them, can we construct something larger from them.

Anyway, by the end I felt like I was at a purely ideological debate rather than a potentially interesting discussion. There was mostly rhetoric, and essentially no one was budging after hearing other points of view. I appreciated hearing the concerns of non-open-source proponents, to at least see what the worries about actually releasing code were, but I can’t really support their point of view since it seems rooted in fear of not being able to sell their expensive software, or of being bothered with the hassle of people using their software in ways they didn’t expect.

One thing that clearly needs to be resolved outside this debate is that journals that support open access don’t require that software supporting a paper be made available and open source. This lack of policy from journals was used as an argument against an open-source endorsement from the ISCB. Similarly, there were different interpretations of the National Academies report on open-data sharing (I am failing to find the link right now). There are different reasons people might not want to make source code available (though it is often available on request). Why don’t they want to release it? I think Carole made the point in her keynote somewhere; it must come down to embarrassment over poorly written code, code which doesn’t actually do what they say it does, or some expectation of being able to sell the software. At any rate, it seems like there is a big need for journals, editors, and reviewers to decide whether or not it is critical for source code to be available as part of a publication that uses the software.

Sadly we never got to the discussion of open-data sharing, which I think is just as pertinent. Perhaps there will be a more deliberate effort by people to give feedback to ISCB board members about the policy (email the committee: policy<-at->iscb.org).

6 thoughts on “ISCB BoF on open source and open data”

  1. Another potential reason not to open source the code is simply competition in academia. Closed code or a closed database can serve as a differentiating factor. It takes time to build some resources, and this unfortunate credit system does not value how much a piece of code or a database is used but how many papers get produced in X impact journals. Even if people agree in principle that opening up research is morally correct, the current reward system will tend to push people toward secrecy.

  2. A probably not very popular remark:

    In the old days of bioinformatics, it was mainly the algorithm that was published in a scholarly journal. It was expected to be clever and to solve an important problem, but nobody required that there also be a program implementing the algorithm, let alone its source code. Sometimes there was a program to show that the algorithm actually works, but these were not elaborate software packages, just proofs-of-concept. In a way I think this is how it should be. For me, devising an algorithm is science; coding is not. (Pleeaase, developers, don’t kill me, not yet.)

    Make no mistake, writing good (usable) programs is extremely valuable – what would we do without BLAST, HMMER, TMEV, etc.? Bioinformatics software development (even if I don’t consider it science) is often done by scientists, and I understand that they need credit for this. In science, credit typically means publication. I guess this is why Bioinformatics and other journals have a section like ‘application notes’ that does not deal with algorithms but rather with software.

    So, what does this have to do with open software? In my humble opinion, it is still perfectly o.k. to publish an algorithm without an associated application (open source or otherwise). For these types of publications, I consider it silly to demand the submission of any source code. For application notes, matters might be different. Anyway, I feel that a scientist must at one point make the decision whether to become rich OR famous. In the former case, it is o.k. to sell software but don’t expect to get free advertising space in a science journal. In the latter case, one should publish but abandon the hope of making money with the program.

    Maybe some disclaimers: I used to work in academia, where I wrote some programs myself (all in the public domain, but mostly useless). Nowadays, I work for a biotech company, but have to run on a very tight budget. Thus, I rely on free (as in beer) software. I normally don’t care much about source availability, but I can see that this is a big issue for others. What I don’t like is the (nowadays very common) option to make software freely available to people in academia but have industry people pay ridiculous amounts of money. Again, I can understand the motivation for this move, but the assumption that biotechs are swimming in money is not always justified. Also, I feel that I get hit particularly hard, as I mostly tend to use the software for doing basic science that gets published, not really for making money out of other people’s work.

  3. In addition to cost and redundancy of effort, it is worth considering reproducibility. Any result that depends on private software or software of limited availability (due to price, unportable binaries, etc.) will be less subject to independent verification by other labs. In cases where the specification of the algorithm is incomplete, or where the results depend on a non-trivial parameterization, even verification by independent implementation of the algorithm may not be possible.

  4. Hi Jason,

    My feeling is that “open source” and “open access” tend to cloud an even more important issue of immediate relevance to science. There are many facets of the “open” debates, not all of which are relevant to science, and some of which tend to immediately polarize a debate. But a really important issue to us is what happens with data, software, and materials upon publication.

    The scientific publication system was founded in 1665 explicitly as a reward or a quid pro quo mechanism, comparable in some ways to the patent system: the idea is that publication is an incentive to get scientists to disclose their findings to everyone else. The community allots priority and prestige to the author, and the author gives something of value to the community. This was a great improvement over Newton encrypting his discoveries on secret rings stored with his London lawyer.

    One might argue whether Henry Oldenburg’s personal motivation in 1665 for starting the Philosophical Transactions of the Royal Society rises to the level of a community ethical standard in 2007, but I think most people would agree that it is indeed the “community standard” by which science operates. A 2003 report from the National Research Council articulated this view at length, in the wake of the Science publications of the Celera human genome and the Syngenta rice genome — neither of which was deposited at that time in Genbank despite Science’s own policies. The executive summary of the NAS report is at http://selab.janelia.org/publications/NAS03/NAS03-execsum.pdf.

    The key argument in the report is that upon publication, authors are obligated to deliver enough information about the central result in their paper that other scientists can reproduce the result and build on it.

    For publications where the central result is too large to fit in a journal’s pages, like a genome sequence or a software package, that data, software, or material must be made readily available to everyone in the community — whether they work in academia or industry. (“Academic only” distribution is viewed as inconsistent with the principles of the scientific publication system.)

    If it’s not readily available, reviewers and editors can and should take that into account — the usefulness of a result depends on the degree to which it’s being made available to the community.

    Suicyte’s point is a good one, and the NAS report covered it in some detail. If the central result is an algorithm, then the algorithm is all that is being given to the community – no need for an implementation or source code. But if the central result is “Foo: a program to align everything”, then the program needs to be available at least as an executable (people can build on it by building pipelines around it) and ideally as source (even easier to build on it then).

    The problem with ISCB’s policy is that it says *nothing* about a computational biologist’s ethical obligations to make data and software available upon publication — and what it does say is an attack on open source that is widely interpreted as being a defense of those few people who withhold availability of published results.

    In my view, ISCB ought to show better leadership, and adopt a policy consistent with the ethical principles of scientific publication that are being articulated by NIH, HHMI, the National Academies, and other funding bodies and scientific organizations. Whether ISCB also chooses to wade into the wider “open source” debate after that is not my concern, really, but I don’t think it’s advisable.

    Sean
