Talk:GISAID/Archive 1

All or Avian Influenza?
An anonymous editor changed the "Avian" in the GISAID name to "All". While this makes sense, I can't find any evidence that the name was actually changed, so have reverted. Pol098 (talk) 12:59, 19 June 2009 (UTC)
 * If you look at the Nature article from Aug 2006 (https://www.nature.com/articles/442981a), the "A" originally stood for "Avian". However, by 2010, the "A" seems to have been taken to stand for "All", according to "Influenza pathogen database of global significance set up in Bonn" on the BMEL homepage. — Preceding unsigned comment added by 2620:0:691:4:0:0:0:58 (talk) 22:55, 9 July 2020 (UTC)
 * GISAID's homepage http://gisaid.org/ clearly states under the GISAID Foundation tab that it is called the Global Initiative on Sharing All Influenza Data. Perhaps this was updated after Pol098's entry on 19 June 2009. —Preceding unsigned comment added by 114.251.14.2 (talk) 02:56, 24 March 2010 (UTC)

Logo update?
The GISAID logo seems to be updated with a gradient fill. I don't have experience loading new images, and I hope that's not a copyright issue, but I'll try to figure out how to do it on Commons. - AppleBsTime (talk) 03:36, 5 June 2020 (UTC)

Addition of mpox to list of virus outbreaks GISAID has supported
''This topic is part of a series. For the series summary see [3] above.''

Change the introduction from this:

Since its establishment as an alternative to sharing avian influenza data via conventional public-domain archives, GISAID has been recognized for incentivizing rapid exchange of outbreak data during the H1N1 pandemic in 2009, the H7N9 epidemic in 2013, and the COVID-19 pandemic  in early 2020.

To this:

Since its establishment as an alternative to sharing avian influenza data via conventional public-domain archives, GISAID has facilitated rapid exchange of outbreak data during the H1N1 pandemic in 2009, the H7N9 epidemic in 2013, the COVID-19 pandemic  and the 2022–2023 mpox outbreak.

The most notable change is the addition of the mpox outbreak.

AncientWalrus (talk) 12:11, 25 March 2023 (UTC)

Number of unique sequences in the SARS-CoV-2 database?
Thanks for your contributions to this article. Since last December it has averaged over 200 views per day. That seems to me to be a handsome return for your work.

Is there a reasonably easy way for a mere mortal to obtain the number of unique sequences in the SARS-CoV-2 database?

Their website prominently reports the number of sequences submitted but not the number of unique sequences. This article currently (as of 2021-09-07) says, "by mid-April 2021, GISAID's SARS-CoV-2 database reached over 1,200,000 submissions". The Wikipedia article on Phylogenetic Assignment of Named Global Outbreak Lineages says, "The PANGOLIN web application has assigned more than 512,000 unique SARS-CoV-2 sequences as of January 2021."

Does that say that every other submission was unique?

Also, is it reasonable to say that the number of variants is roughly proportional to the number of new cases? If yes, I estimate that the probability of a new patient generating a new variant may be between 0.2 and 3 percent. I get the upper limit by considering that GISAID is not receiving submissions from a substantial portion of the world. Thanks, DavidMCEddy (talk) 16:05, 7 September 2021 (UTC)


 * I'm just noticing your note now; I appreciate your gratitude, David. This year has certainly seen the GISAID organization being mentioned by the media more than ever, so that would explain the many page views.


 * As for your question about "unique" sequences, I think it may be somewhat dangerous to think of any virus' genetic sequence as being truly unique. There is one strain of SARS-CoV-2, many variants, and probably over 100,000 frameshifts in the various sequences that have been analyzed and uploaded. When you sign your name, your signature is always unique versus every other time you have ever signed your name... but one might also say that "your" signature is unique in the world versus all other people's own signatures. I believe biologically speaking, the chance that you will find what I think you are seeking (a truly "uniquely" identical match of genetic code in SARS-CoV-2 samples taken from different people) would only occur within members of the same family household sharing similar DNA (e.g., a brother and a sister). One more analogy... You could have the same song digitally recorded onto digital tape, compact disc, and MP3 file. To the human ear, when played back these would be entirely considered as perfect duplicates. But we know that they are in fact not "perfect" duplications, as each audio platform has certain limitations at various frequencies, and indeed, you can't even play back a digital tape on a CD player! - AppleBsTime (talk) 05:06, 1 December 2021 (UTC)


 * Thanks for the reply. Since my post above on 2021-09-07, I have included the above computation in a manuscript submitted to the Real-World Economics Review. This manuscript includes the following:


 * "Harvey et al. (2021) claimed that between December 2019 and October 2020, the virus was 'acquiring approximately two mutations per month in the global population.' That's grossly misleading, because the mutation rate is not a function of time: It's proportional to the number of patients infected and spreading a disease. If the number of new cases per unit time is cut in half or by a factor of 10 or a million, the number of new variants per unit time will also be cut by approximately the same factor.  We estimate that there is at least one new viable variant for each 600 cases, and it could be one for every 60 cases or less.[13]"


 * Note 13 reads as follows:


 * "The Wikipedia article on 'Phylogenetic Assignment of Named Global Outbreak Lineages' claimed that more than 512,000 unique SARS-COV-2 sequences had been reported to open, international databases as of January 2021, all or nearly all of which could transmit the disease to another human. Unfortunately, no reference was given for that number. The actual number of unique sequences is almost certainly higher, because  many parts of the world are not submitting sequences to the international database.  The figure of almost 290 million cases by January 2021 came from the Wikipedia article on 'COVID-19 pandemic cases', accessed 12 September 2021, which were extracted from the World Health Organization, 'Coronavirus disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update' (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports)."


 * What do you think? Would you recommend revising this?  If so, how?
 * Also, the 512,000 number in the Wikipedia article on "Phylogenetic Assignment of Named Global Outbreak Lineages" is accompanied by a "citation needed" flag. Do you know a citation for a number like that -- and maybe a more current number that could be used? Or should the discussion of that 512,000 number be revised? If yes, might you be able to revise it appropriately or find someone else competent to do so or help you do so? Thanks, DavidMCEddy (talk) 07:04, 1 December 2021 (UTC)
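
The arithmetic behind the "one new variant per roughly 600 cases" estimate discussed above can be sketched as follows. This is only a minimal sketch, assuming the two figures quoted in the manuscript note (512,000 unique sequences and almost 290 million cases by January 2021), neither of which is verified here. The constant and function names are hypothetical and chosen for illustration.

```python
# Figures as quoted in the discussion above; both are unverified assumptions.
UNIQUE_SEQUENCES = 512_000      # "more than 512,000 unique SARS-CoV-2 sequences"
REPORTED_CASES = 290_000_000    # "almost 290 million cases by January 2021"


def variants_per_case(unique_sequences: int, cases: int) -> float:
    """Return the ratio of unique sequences to reported cases."""
    return unique_sequences / cases


rate = variants_per_case(UNIQUE_SEQUENCES, REPORTED_CASES)
cases_per_variant = 1 / rate

print(f"Unique sequences per case: {rate:.4%}")                       # 0.1766%
print(f"Cases per unique sequence: about {cases_per_variant:.0f}")    # about 566
```

Under these assumptions the ratio comes to roughly 0.18 percent, or about one unique sequence per 566 cases, which is consistent with the "at least one for each 600 cases" lower bound. The higher estimates in the discussion rest on the separate assumption that much of the world is not submitting sequences, which this sketch does not model.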

A concern
Hello, I have been taking some of my time on Wikipedia to improve this article, quite relevant during the COVID-19 pandemic, with newer sources and a more readable intro. I am noticing multiple edits made (and re-inserted) by IP addresses appearing to have the single purpose of editing Wikipedia exclusively about GISAID, yet no other subjects. The basis of these edits repeatedly seeks to convey the perception that GISAID's terms of access are "restrictive". Ironically, GISAID terms of use are not at all dissimilar to those of Wikipedia itself. Participating scientists are free to contribute or read from the database, just as long as they agree to appropriately acknowledge the contributors of the information they use. Contributors of data can freely choose whether they don’t care about any of their rights and deposit in public-domain archives, or whether they share in a transparent manner preserving some of their rights, and thus share with the public via GISAID. It's obviously a model that works -- Wikipedia has millions of articles under the Creative Commons Attribution license, and GISAID has over a million flu and about 50k genomic sequences of the virus causing COVID-19, contributed by thousands of laboratories under its usage license. Calling this a "restriction" is far less accurate than calling it "terms of use" or "regulating" how data are shared. I would like other contributors to consider this and respond, as I fear that this IP editor (or editors) is pushing an agenda and may be unlikely to form consensus. - AppleBsTime (talk) 04:53, 20 June 2020 (UTC)
 * No response to this note, nor anything heard from the IP editors (who were notified). Given that, I am going to revert the single-purpose IP edits at this time.  Happy to engage in more discussion (anything is better than zero), if that's seen as problematic. - AppleBsTime (talk) 15:07, 27 June 2020 (UTC)
 * Response to concern: First, GISAID's terms of use do not only require that scientists cite the data that they use. The terms also require that scientists "agree to make best efforts to collaborate with representatives of the Originating Laboratory responsible for obtaining the specimen(s) and involve them in such analyses and further research using such Data." This is now hidden further in the submission process, but you still have to sign it to get access. Second, these terms are quite dissimilar to those of Wikipedia: Wikipedia does not restrict people from reading it, but GISAID does. Third, I think that GISAID is most naturally compared to other DNA sequence databases, not Wikipedia. Since those databases do not impose terms forbidding users from sharing data or reading data, I think that saying that GISAID "restricts" the use of data helpfully clarifies how GISAID differs from other similar databases. However, I do not think this is a hill worth dying on. If you want to say "govern" I don't care that much. Fourth, the original article sought to suppress debate and discussion about whether restrictive access agreements promote data sharing by simply saying that GISAID promotes data sharing. In theory, this could be true. Perhaps scientists are more willing to share their data when they know they will have more control over it after they share it. But the fact that GISAID has taken this path should not be hidden, nor should disagreement be suppressed about whether this approach leads to science that is more open, or more closed. On a slightly different note... GISAID's divergence from its initial goal seems somewhat puzzling. It seems like the initial goal was to allow scientists to share data before first publication, and allow public domain usage after first publication. This is the general model of scientific data sharing, and it makes sense that scientists would be hesitant to share avian flu data before they had gotten any credit.
However, at some point, this all changed to GISAID's current model of public-domain-never, and it seems very unclear who made this decision, when, and why. It is also unclear (though I see no conspiracy here) when the "A" in GISAID changed from "Avian" to "All". Wouldn't you like to know? — Preceding unsigned comment added by 2620:0:691:4:0:0:0:58 (talk) 22:45, 9 July 2020 (UTC)

An inverse concern
Much of this page (i) appears overly positive, to the extent that it serves as an advertisement for GISAID, and (ii) uses sources that simply quote GISAID's positive description of itself. For example, GISAID's web page says that it overcomes "disincentive hurdles and restrictions". The claim about disincentive hurdles is interesting, though vague. However, no example of "restrictions" is given. And yet, the current page repeats the claim that data sharing was "restricted". Additionally, the History section contains a list of "endorsements", which sounds like an advertisement, not a balanced description. Here is a list of other concerns:
 * what are "submitters rights"? It is neither clear what specific "rights" are being claimed, nor what makes these things a "right".
 * what does it mean that WHO member states were concerned about sharing data? As far as I know, countries and states do not share data: individual scientists do. If I am wrong, that would be interesting. However, if I am right, this makes no sense.
 * what, exactly, does GISAID do to prevent sharing researchers being scooped pre-publication? And how does this differ from post-publication?
 * why exactly is "verification of users" supposed to be a positive thing that public-domain databases do not offer?
Finally, user AppleBsTime has removed interesting facts, seemingly because they reflect negatively on GISAID. For example, the original signed letter calls for shared sequences to be deposited in public databases eventually, which would then allow scientists to share pre-publication data without being scooped while still not restricting post-publication data. But AppleBsTime removed this comment, even though it had a citation. — Preceding unsigned comment added by 2620:0:691:4:0:0:0:1B (talk) 23:28, 15 August 2020 (UTC)

A response from an experienced editor

 * I want to thank the Duke University IP address(es) for this opportunity to re-examine the Wikipedia article about GISAID from his/her perspective. It is reassuring and a healthy process to mutually share a common goal to make this article as informative and accurate as it can be, especially within the policies and guidelines of Wikipedia. The IP editor may not be familiar with all of Wikipedia’s practices, such as registering an account to build trust and gain access to more functionality, or such as signing comments with four tildes. I have been an editor for a number of years, having made over 500 edits to hundreds of different articles, and even created a handful of new articles. This is a nice opportunity to share with you some of what I’ve learned on Wikipedia, since you seem to have experience only with this one article about GISAID.


 * The first concern of the Duke University IP editor is that the page "appears overly positive". Given the significant amount of coverage of the role of GISAID from reliable sources over a considerable time period, versus the rather limited edits to this article during that same time period, no evidence is given to support your concern that "it serves as an advertisement for GISAID and uses sources that simply quote GISAID's positive description of itself."


 * Frankly, I suspect that we’re seeing an outcome of Wikipedia’s reliance on independent sources to reliably document how a subject should be characterized. This Wikipedia article has been built over the past 13 years and at that, very sporadically. If one looks up "problems with GISAID" or "trouble with GISAID" in a search engine, one will not find anything. Try "criticism of GISAID" on any search engine, and you will find about 2-3 results, which appear to be blog entries in the vein of rants, or Reddit posts, rather than the journalistic or peer-reviewed concerns for which Wikipedians search. Sources like websites operated by the originator of a complaint about a subject or a Reddit conversation about a subject are generally not allowed as reference sources in Wikipedia -- unless they become newsworthy themselves (e.g., if Dr. Ghebreyesus or Dr. Fauci were to start participating in the Reddit conversation, and this got picked up by the BBC or Associated Press).


 * A second concern presented by the Duke University IP editor is that some of the sources in the article are simply citations of GISAID's own material. While editors must ensure articles follow Wikipedia's content guidelines, the guideline on self-published sources used as sources of information technically allows a limited amount of self-sourcing (but never in an unduly self-serving way). The suggestion that these edits are driven by GISAID itself seems far-fetched, as it is not substantiated. Nonetheless, it is clear that improvements to this article (in particular in sections that have not been vetted) can and should be made, and that finding independent, third-party sources to replace some GISAID.org sources would improve the article's quality, so it will not be merely categorized as a 'Start-Class Genetics' / 'Low-importance Genetics' / 'C-Class COVID-19' / or 'Low-importance COVID-19' article.


 * Currently, I count 3 references to GISAID materials out of 35 total references in this article. That doesn't seem undue or self-serving, compared to other Wikipedia articles about organizations.


 * With regard to the point-by-point "other concerns" itemized by the Duke University IP editor, allow me to address these as follows.
 * What are "submitters' rights"?
 * GISAID's Terms of Use, aka the Database Access Agreement, states in section 2a: "This Agreement does not transfer any other rights or ownership interests in the Data" and further in section 2c, the rights of the "Originating Laboratory where the clinical specimen or virus isolate was first obtained and the Submitting Laboratory where sequence data have been generated and submitted" are acknowledged.


 * In addition to a significant number of published reliable sources, please take note of the written Statement to the World Health Organization, given by the Federal Republic of Germany in 2015, which makes it clear GISAID employs "a unique sharing mechanism which ensures that inherent rights (e.g. IPR) of contributors of GSD are not forfeit."


 * What does it mean that WHO member states were concerned about sharing data? As far as I know, countries and states do not share data: individual scientists do. If I am wrong, that would be interesting. However, if I am right, this makes no sense.
 * As a matter of fact, all countries and states decide how data are shared when it comes to pathogens, as is evident from governments regulating the safety levels of handling pathogens in the first place (see BSL biosafety levels, for example). The headlines we see read "Indonesia hands over bird flu data to new database", rather than "An individual scientist in Indonesia hands over bird flu data". It's also why we see wording in this article like, "China, Russia and other nations that have withheld virus samples...", rather than "Individual scientists in China, Russia and other nations…", or "… sequences for the novel coronavirus (2019-nCoV) … submitted by Chinese authorities to the GISAID platform", rather than "submitted by a[n] [individual] Chinese scientist".


 * What, exactly, does GISAID do to prevent sharing researchers being scooped pre-publication?
 * The article states "GISAID sought to address medical researchers' reticence about sharing." GISAID's Terms of Use make it crystal clear: "Your rights and privileges under this Agreement will terminate automatically and without need for written notice upon any breach by You of any term of this Agreement." Given that a username/password procedure is in place, GISAID can very well enforce its terms and sanction violators who scoop, irrespective of whether a paper has been peer-reviewed or not. The sheer volume of emerging coronavirus genetic data and metadata in GISAID, and also of influenza data, compared to public-domain archives is evidence that GISAID has somehow addressed researchers' reticence about sharing. Though I should also say, it's not Wikipedia's responsibility to document a process when the claim is merely that the organization sought to address a problem.


 * Why exactly is "verification of users" supposed to be a positive thing that public-domain databases do not offer?
 * I'm doubtful that this is Wikipedia's responsibility to prove. It's our job as editors to find if reliable sources say that it is the case that GISAID provides "verification of users" which, for example, public-domain archives that permit anonymous access do not. The Catherine Saez article in Intellectual Property Watch (a publication utilized in dozens of Wikipedia articles) says that verification of users is something that GISAID provides that other public-domain databases do not. We don't speculate on why that's a positive thing, because that would be original research, which is forbidden by firm policy on Wikipedia.


 * Per the complaint that I removed "interesting facts, seemingly because they reflect negatively on GISAID", sorry to say, I removed some content because it pertained to a correspondence letter that conceived of an idea prior to the formation of the actual organization that the Wikipedia article is about.


 * I considered it misleading to suggest to readers that the correspondence letter in Nature called for sequences to be "deposited in the three publicly available databases participating in the International Sequence Database Collaboration" while omitting the preceding text, i.e., the proposal "to expand and complement existing efforts with the creation of a global consortium".


 * A consortium is an association of two or more individuals, companies, organizations, or governments and will by default not be open to the public. Even a Nature editorial understood at the time that an "Agreement on the principles of GISAID is only a beginning, however. Prompt progress in establishing the ground rules for sharing will be essential to build confidence and momentum." The peer-reviewed Elbe et al. (2017) also addressed this correspondence letter: "However, … notwithstanding its good intentions, the brief letter still lacked much practical detail, and that the core issues of transparency and equity of data sharing would likely remain unresolved if data archives with anonymous access to data (like Genbank) were used."


 * Eighteen months after the correspondence letter appeared in Nature, GISAID did provide ground rules for sharing, by providing immediate access to the public and not merely to a consortium.


 * It would be like saying that Thomas Edison should be extensively criticized in Wikipedia for not sticking to his initial idea that platinum should be the filament in an incandescent light bulb, when he later found that carbonized bamboo was a much more practical, inexpensive, and longer-lasting solution. So, please, I’d ask that you not assail my removal of some content in the interest of making an article less confusing.  It wasn’t about something "reflecting negatively" on the subject.


 * The lede historically had been too promotional, but after it was cut back, it was rather confusing and didn't even mention GISAID's contemporary work on the coronavirus pandemic. That's when I stepped in to edit the article. I'm not trying to paint a rosy picture, but the independent sources say things that simply recognize the success of GISAID.


 * I'll close with an interesting article from Duke University's publication Duke Today, where a professor of immunology, pathology, pediatrics, molecular genetics and microbiology is asked what she trusts for information about COVID-19… and without any complaint about restrictions, user verification, or submitters' rights, she says "For the latest on viral sequence dynamics, I check gisaid.org." - AppleBsTime (talk) 15:17, 5 September 2020 (UTC)

A balanced view
There seems to be something of a disagreement in the 2 sections above. I just heard an excellent radio show/podcast on NPR that gives a lot of info on this topic, but I wanted to check it out here. My general impression is that it considers the same disagreement as above, with views from both sides.

Smallbones( smalltalk ) 22:21, 30 May 2021 (UTC)
 * Journalist Meredith Wadman has kind of flipped back and forth on the issue, herself. - 97.64.141.154 (talk) 13:24, 18 June 2021 (UTC)

Additional discussion
From the Duke researcher mentioned above: these articles from Nature and Science also talk about this disagreement:
 * https://www.nature.com/articles/d41586-021-00305-7
 * https://www.sciencemag.org/news/2021/03/critics-decry-access-transparency-issues-key-trove-coronavirus-sequences
 * https://www.nature.com/articles/d41586-021-01194-6
I also note that http://www.nextstrain.org has begun offering two coronavirus tree reconstructions, one labeled "Latest Global Analysis - GISAID data", and one labeled "Latest Global Analysis - open data".

I am still confused about whether GISAID is trying to restrict PRE-publication data sharing or POST-publication data sharing. It sounds to me like it restricts both equally. 2603:6080:6502:E900:B7F:3D05:7E:3E7C (talk) 04:46, 18 August 2021 (UTC)