Talk:List of sequenced eukaryotic genomes

Untitled
Suggestion: Could this list be ordered by date of completion?

Draft Genomes
The page now says that "draft genomes are not included", but in the genome sequencing world, "draft" usually refers to anything that is the product of a whole genome shotgun project without additional "finishing". By this definition, the sea urchin, fugu, and probably most of the other genomes are "draft" sequences.

I think this conflict is best resolved by including these draft genomes in the list. —Preceding unsigned comment added by 128.42.164.163 (talk) 04:36, 1 March 2009 (UTC)

This article is very weak. Among the plant genomes, the fully sequenced and properly published papaya genome (The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) Ming et al., Nature 452, 991-996 (24 April 2008) | doi:10.1038/nature06856 ) is not included. Two genomes, maize and oilseed rape are included, but these, along with half a dozen other 'complete genomes' like oil palm, soybean, medicago, date palm, are 'published by press release' only, or extremely incomplete. Since these data are not verified, verifiable or available, I do not think they should be included in the listing here. Pat Heslop-Harrison pathh —Preceding unsigned comment added by Pathh (talk • contribs) 17:12, 26 January 2010 (UTC)

Relevance
I notice nobody has mentioned relevance in any of the mammals. 129.174.54.114 (talk) 15:21, 22 March 2012 (UTC)

Dating
This list claims to be up-to-date, but since it is manually maintained it will temporarily become out of date, until edited, every time a new genome is sequenced. It badly needs a "Last updated xxxx" or "as of xxxx" date specification. --mglg(talk) 21:35, 11 September 2006 (UTC)

Common names
Since this is a public encyclopedia, common names of each species (or broader group description, such as "fruit fly") should be added. --mglg(talk) 21:35, 11 September 2006 (UTC)
 * Redirects let people know the common name easily. I was thinking of having a field to describe the type of organism - which I will probably add soon.--Peta 22:16, 11 September 2006 (UTC)
 * Thanks - that is exactly what I am looking for - mglg(talk)
 * Thanks Peta for adding the "Relevance" column. I took the liberty to separate the "Type of organism" (say, "Mosquito") from the "Relevance" ("Vector of malaria"). Feel free to check my entries. --mglg(talk) 22:33, 21 September 2006 (UTC)
 * Looks great, thanks. --Peta 00:02, 25 September 2006 (UTC)

Order
Is alphabetical order by genus name really the best approach to organisation? I would suggest that taxonomic order or order by date would make more sense. This list is going to get really long soon and might be easier to re-sort now. --Aranae 00:42, 10 November 2006 (UTC)


 * Oh, yes, it should be sorted by plant, animal, protist, fungi, bacteria, etc., etc., as even now the list of organisms is overlong for an unsorted or alphabetically sorted list to be usable. KP Botany 00:45, 10 November 2006 (UTC)


 * Yeah, dividing the list in plants, animals, fungi and protists should make it more readable. Tycho 03:08, 14 November 2006 (UTC)


 * I will try doing it --Kupirijo 16:48, 27 December 2006 (UTC)
 * Thanks, and good luck. I am looking into the other matter.  KP Botany 18:56, 27 December 2006 (UTC)

Mitochondrial Genomes
I think organisms which have only their mitochondrial genome sequenced should not be listed since it is misleading to list such a small genome for the whole organism. --Kupirijo 19:51, 29 December 2006 (UTC)
 * There's nothing misleading about it, simply note that it's the mitochondrial genome. It's correct terminology to refer to the mitochondrial genome, and it's by definition, a eukaryotic genome, so it belongs.  Again, just make sure it's clear that that is what has been sequenced.  KP Botany 21:15, 29 December 2006 (UTC)
 * I did so for the Cryptomonad. It was its nucleomorph genome. Let me know if it is OK with you. Cheers. --Kupirijo 23:14, 29 December 2006 (UTC)
 * Oh, I didn't even notice, I simply searched for mitochondrial and chloroplast. Yes, of course it's okay, as it accomplishes what it intends, the indication that it is not nuclear DNA that has been sequenced.  Particularly important with funky DNA, I would think.  Thanks for taking the time to do this, sort by kingdom/division/whatever, and for working with other editors to make the article usable and accurate for readers.   KP Botany 23:41, 29 December 2006 (UTC)

I strongly discourage the addition of organelle genomes to this list as it will make the size of the article unmanageable. The lead text clearly describes this as a list of fully sequenced eukaryote genomes, which in my mind clearly excludes organisms with organelle only information.--Peta 00:10, 31 December 2006 (UTC)


 * That's fine with me that is why I started this section. KP Botany what is your opinion? Happy new year to everybody --Kupirijo 08:54, 31 December 2006 (UTC)


 * That's fine, but they are rather important in plant genomics and phylogenetic systematics, some studies and classifications based on research on what is available in the line of chloroplast genomes, rather than deciding which plants to use in analyses based on characteristics. So, where should they go?  Also, I point out that if "fully sequenced eukaryote genomes" then means nuclear genome only, this isn't "fully sequenced eukaryotic genomes," but, rather "fully sequenced eukaryotic nuclear genomes." KP Botany 19:49, 31 December 2006 (UTC)


 * I think the title of the page is ok. I think "eukaryote" necessarily refers to an organism as a whole; I wouldn't call a chloroplast itself a eukaryote, I'd call it part of a eukaryotic cell.  (You can type "define:eukaryote" in Google to see what others think.)  So I think when users scanning this list see an organism listed on this page, they would assume the organism's full genome has been sequenced, and we shouldn't rely on a note to indicate otherwise.  As for where to put them, maybe we need a new page for sequenced organelle genomes? –Adrian J. Hunter 13:43, 1 January 2007 (UTC)
 * But what about a nucleomorph? Isn't that misleading? There is a chromist listed that has only its nucleomorph genome sequenced. Kupirijo 18:47, 1 January 2007 (UTC)


 * I think you're misunderstanding me. "Fully sequenced eukaryote genomes" seems to mean that ALL of the genomes of the eukaryote in question have been sequenced.  Does it mean this, or does it mean that the nuclear genome has been sequenced?  For a plant, fully sequenced sounds like it means nuclear, plastid and mitochondrial genomes have been sequenced, but actually, you seem to be saying it means the nuclear genome has been sequenced.  If the latter is the case, the article should explicitly state that it is about nuclear genomes only.  KP Botany 20:25, 1 January 2007 (UTC)


 * When I started this page I intended for it to be a list of completed nuclear genomes (or whatever weird equivalent the organism may have) - the list did not include organisms that have organelle sequences but lack nuclear sequence, since many plants and animals have organelle sequences - but don't have fully sequenced nuclear genomes. I think a list of fully sequenced organelles by organism might be interesting - but it'd be too long to include in this page. I should add (although I haven't checked) that I'd be very surprised if any organism with a full nuclear genome sequence didn't also have organelle sequences available. --Peta 01:39, 2 January 2007 (UTC)

To add photosynthetic "cabozoan" chloroplast genomes
The two major photosynthetic "cabozoan" chloroplast genome sequences are available: Euglena gracilis (1993) and Bigelowiella natans (2007). The article by the Keeling group (C-S's nemesis) claims to disprove Cavalier-Smith's cabozoan theory since the chloroplast genomes of the green algae that were phagocytosed by these two organisms are not the same. This means that there were two independent endosymbiosis events of green algae, one for the class Euglenoidea and the other for the class Chlorarachniophyta. See my latest table in algae. I guess C-S was wrong, but I am waiting for his rebuttal. --Kupirijo 01:22, 30 December 2006 (UTC)


 * As above, I don't think this list should include organelle only sequences. Heaps of organisms have full organelle sequences available, which don't really fit here as this is a list of complete genomes. Add the info to the species pages.--Peta 00:18, 31 December 2006 (UTC)

Archaea
I started a list of sequenced archaeal genomes. I don't have much time to finish it, so if anyone wants to move it to the main namespace and finish it, it's here. --Peta 00:27, 31 December 2006 (UTC)
 * The incomplete list is now in the main namespace. --Peta 02:05, 8 January 2007 (UTC)

Linking the headings?
A non-expert who comes to this page will very likely not know what type of organisms Chromista or Alveolata are, and may be curious to learn. It is, of course, central to Wikipedia's intent that such a curious visitor should be able to just click a wikilink and find out. In January I therefore wikilinked the clade headings on this page. This was promptly reverted, because on the surface it conflicts with the recommendation of the Wikipedia Manual of Style to avoid wikilinks in headings. However, the motivation in the MoS for avoiding wikilinks in headings is that "Depending on settings, some users may not see them clearly. It is much better to put the appropriate link in the first sentence under the heading." In other words, putting a link in a heading instead of in the plain text that follows might make a few users (with unusual display preferences) unable to notice that it is a link, and thus deprive them of the utility of the link. The problem is that on this page there is no following plain text, and no other occurrences of these main clade names: the word protist for example never occurs in the table of sequenced protists. Thus the heading is our only opportunity to link to Protist, and the obvious place to do so. I fail to see how having no link, thus depriving every user of the utility of the link, helps further the intent of the MoS, which was to avoid depriving even a small subset of users of the utility of the link. If we link, most people will see a link; if we don't, nobody will. Therefore I argue that on this particular page we link all the clade headings. Comments? --mglg(talk) 23:50, 11 February 2007 (UTC)
 * I think it looks really bad, that's why the MoS says don't do it; why don't we just add a sentence or two before the table explaining (and linking) what the organism is and why it is interesting. --Peta 00:00, 12 February 2007 (UTC)
 * Nice work, Peta. Thanks! --mglg(talk) 23:40, 12 February 2007 (UTC)
 * Thanks, I don't know much about protists, so it'd be great if you could improve the text in any way.--Peta 23:48, 12 February 2007 (UTC)

Updates
This page really needs some updates. The horse genome has been completed, and I suspect many more. Can someone add the horse info? I didn't know how to edit a table like that. This website - http://www.genome.ucsc.edu - lists the rhesus, the cat, the cow and the opossum as all completed, and has lots of information on them. Can someone put this into a table, please?

This page is now very seriously out of date. The "animals list" only includes 6 mammals. By Jan 2008 some 24 full mammal genomes have been sequenced. According to the genome lists on the web http://www.ncbi.nlm.nih.gov/projects/WGS/WGSprojectlist.cgi, with some help from http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Search&db=genomeprj&term=%22Mammalia%22%5BOrganism%5D and http://www.broad.mit.edu/mammals/ the current list of sequenced whole genomes for mammals is: Cow, Dog, Guinea Pig, Nine-banded Armadillo, Horse, European Hedgehog, Cat, Human, African Elephant, Mouse Lemur, Mouse, Pika, Rabbit, Rhesus Macaque, Bushbaby, Chimpanzee, Orangutan, Rat, European Shrew, Squirrel, Northern Tree Shrew, Little Brown Bat, Platypus, Tenrec-Hedgehog. Despite announcements in the press, only about 10% of the genome for the Opossum is available. And Entrez is wrong in saying that the Gorilla sequence is complete. (Mollwollfumble (talk) 23:15, 14 January 2008 (UTC))


 * The above changes have been made, but according to Miller et al. (2008) the page is still missing at least six non-mammal vertebrates: lizard, frog, tetraodon, stickleback, medaka, zebrafish. (Mollwollfumble (talk) 01:42, 24 November 2008 (UTC))


 * There are at least two bird species - Domestic Chicken and Zebra Finch which have been sequenced. 210.50.143.21 (talk) 23:38, 3 September 2010 (UTC) Ian Ison

What is a gene?
There is no consensus from the leading researchers in a field. I think putting "estimated genes" is pointless and this column should be removed since there is no clear definition to the word. Perhaps 'estimated loci' is more accurate. —Preceding unsigned comment added by 130.113.111.210 (talk) 15:27, 18 January 2008 (UTC)

Genome size
To the lay reader (me), the value of "551 Kb" for a genome size is ambiguous. I assume that kilobit is intended here, but the kilo- prefix is ambiguous in this context. Does it mean 551*1000 bits or 551*1024 bits? Thunderbird2 (talk) 12:33, 17 February 2008 (UTC)

Looking at Ref. [1] (Douglas et al 2001), I see that Kb actually means kilobase, not kilobit. My question now becomes, what is a kilobase? Thanks in advance Thunderbird2 (talk) 15:03, 17 February 2008 (UTC)

Never mind. I found this, which seems to answer my question. Thunderbird2 (talk) 15:07, 17 February 2008 (UTC)
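For later readers with the same question, here is a minimal sketch of the unit arithmetic. It assumes the usual genomics convention that the prefixes are decimal (1 kb = 1,000 base pairs, 1 Mb = 1,000,000 base pairs) and that the unit counts bases, not bits or bytes. The function name `to_base_pairs` and the multiplier table are made up for illustration.

```python
# Decimal multipliers for genome-size units (these count bases, not bits or bytes).
# Assumption: sizes are written as "<number> <unit>", e.g. "551 Kb".
UNIT_MULTIPLIERS = {
    "bp": 1,
    "kb": 1_000,
    "mb": 1_000_000,
    "gb": 1_000_000_000,
}


def to_base_pairs(size: str) -> int:
    """Convert a genome size such as '551 Kb' into a number of base pairs."""
    value, unit = size.split()
    multiplier = UNIT_MULTIPLIERS[unit.lower()]
    # round() absorbs floating-point noise for fractional inputs such as '12.1 Mb'.
    return round(float(value) * multiplier)


print(to_base_pairs("551 Kb"))  # 551000
```

So the "551 Kb" entry would read as 551,000 base pairs under that convention, with no 1000-versus-1024 ambiguity.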

TODO: Chlamy, Nematostella, amphioxus
We should add the Chlamydomonas reinhardtii genome. --Kupirijo (talk) 17:55, 8 June 2008 (UTC)

Other animals: Nematostella vectensis, Branchiostoma floridae

Giant panda
The Chinese news agency Xinhua is reporting that the genome of the Giant panda has been sequenced. Scientists complete sequencing giant panda genome. I saw no reference to the research report, or much detail. Perhaps someone who has access to more information could update this entry. TomS TDotO (talk) 12:15, 12 October 2008 (UTC)
 * Later news reports seem to indicate that the sequencing is not yet complete. TomS TDotO (talk) 16:45, 13 October 2008 (UTC)
 * The difficulty is that there is complete and there is complete. Unsurprisingly the MSM does not really understand the difference.--ZayZayEM (talk) 23:21, 13 October 2008 (UTC)

Babesia bovis
Is there any reference, or any information, for the sequencing of the genome of Babesia bovis? TomS TDotO (talk) 15:19, 1 December 2008 (UTC)
 * 8x shotgunned by the J.C.V.Inst. They are now in the process of finding the genes. -- Ayacop (talk) 17:13, 17 December 2010 (UTC)
 * However, UniProt has a complete proteome. So, this is pretty much finished. Of course, now gene function has to be found. -- Ayacop (talk) 16:55, 18 December 2010 (UTC)

Most are drafts
Only the human genome (and just now the mouse genome) has been fully sequenced - see this BBC News article. I propose changing the list to explain this and make it clearer that most of the genomes listed are drafts. Smartse (talk) 12:51, 27 May 2009 (UTC)
 * To explain this complicated process would need an entire article itself IMHO. -- Ayacop (talk) 07:40, 15 December 2010 (UTC)

21 arthropods
I think this article might be getting a little out of date. A paper in BMC Genomics last year lists 21 arthropods and 1 mollusc with "completely sequenced genomes". Is there some reason why these shouldn't be included here? --Stemonitis (talk) 06:36, 12 January 2010 (UTC)




 * I'm pretty sure you're right that this list is out of date, but the lead specifies that only annotated genomes are included. Having said that, I suspect some of the genomes listed here are not annotated. Adrian J. Hunter(talk•contribs) 14:39, 12 January 2010 (UTC)


 * That means removing the 2x coverage genomes (they're just misleading to have here anyway since so little of their genome is sequenced). Narayanese (talk) 16:10, 12 January 2010 (UTC)

Nature, Science, etc.
An editor has marked all references to the famous journals Nature, Science, and so on, noting that there are Wikipedia entries for these journals. This has some appeal, but I wonder whether this would set a precedent for a Wikipedia-wide change - there are, after all, many references to such famous journals. I have no feelings, one way or the other, but I think that it is worth some discussion before going along with this change. (Is there somewhere else that this should be discussed?) TomS TDotO (talk) 11:50, 15 January 2010 (UTC)


 * Hmmm. I've always assumed this was widely done, though looking at a couple of recent FA of the Days (Splendid Fairywren, Ganymede), I see the journal names are not linked.  WP:LINK and WP:CITE would be the relevant project pages, though I don't think either says much about internal wikilinks within references.  I found this discussion in a 2007 archive.  Personally I include as many blue wikilinks as possible within references, as the corresponding articles on journals, newspapers or authors may give readers some clue as to the reliability of a claim (eg a claim sourced to Nature may be more reliable than one sourced to Social Text).  Also, references aren't normally read as one great block of text, so I don't see overlinking as a problem.  If you want to start a discussion about this I'd suggest starting a thread at the talk page of either WP:LINK or WP:CITE, and leaving a note at the other talk page directing people to that thread.  Adrian J. Hunter(talk•contribs) 14:20, 15 January 2010 (UTC)


 * Thanks. Just to keep you up to date: I've started a discussion at Wikipedia talk:Citing sources. TomS TDotO (talk) 17:16, 17 January 2010 (UTC)

Theobroma cacao
Supposedly a preliminary sequencing of Theobroma cacao has been completed. Abductive (reasoning) 21:35, 15 September 2010 (UTC)

Format change? Several proposals
This is now really getting out of control. People won't contribute to the "complicated" tables, and OTOH, with http://genome10k.org things are going into overdrive. I would like to propose switching to a simple list format, each text line containing: species name (trivial name, importance), genome size in Mb, year. The number of genes is subject to change so can be left out, as can the sequencing organisation, since these are the usual players.

Such a simple list could be also created in addition to the existing one. We could agree to stop contributing to the extended format up until and including year 2010. However, if both are kept, information would have to be duplicated in the newer, and confusion will appear nevertheless.

Furthermore, this page is now overripe for division into animals and others. Comments? --Ayacop (talk) 08:12, 15 December 2010 (UTC) P.S. BTW There were several instances where a paper was no longer published on the completion of the genome. I have added links to the NCBI accession entry in these cases. It is safe to assume such papers will appear less and less in the future. — Preceding unsigned comment added by Ayacop (talk • contribs) 08:18, 15 December 2010 (UTC)


 * I agree that the tables aren't particularly ideal and that it could be an idea to change this at some point. Regarding a split, the page is only 68kb at the moment, so I think we can wait a while until splitting. Obviously at some point in the future, this list will become unworkable as the number of species grows exponentially, but I think we should wait until that becomes more of a problem. SmartSE (talk) 15:15, 2 January 2011 (UTC)


 * Maybe renaming the present article to something like "List of early sequencings of eukaryotic genomes" with some arbitrary cutoff date.

Maybe, at some point, it will become so commonplace for a eukaryote to be sequenced that it is no longer "encyclopedic" and the only thing is to give a pointer to a repository of the data. TomS TDotO (talk) 15:28, 2 January 2011 (UTC)


 * Clearly this page can never be completed nor kept up to date. We are just about at the stage where it is possible to expect a new draft genome sequence every day. You can argue whether a draft genome is the same thing as a fully sequenced genome, but really they are all shades of grey. It was announced this week that the very first eukaryotic genome sequence (yeast) has had another major update...15 years after it was first published! So even the most complete genomes are never really finished.
 * So I would suggest that the title of this article should focus on just 'notable' or 'important' (scientifically and/or commercially) species, but I also like the idea of only listing the first 'N' sequenced genomes (though this could also be a separate page). A line has to be drawn somewhere. Nod (talk) 00:47, 5 February 2011 (UTC)
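
The one-line format proposed at the top of this section (species name, trivial name and importance in parentheses, genome size in Mb, year) could be sketched roughly as below. This is only an illustration under stated assumptions: the `GenomeEntry` class, its field names and `format_entry` are hypothetical, and the two sample rows use approximate, illustrative values rather than figures taken from the article.

```python
from dataclasses import dataclass


@dataclass
class GenomeEntry:
    """One row of the proposed simple list (hypothetical field names)."""
    species: str
    common_name: str
    relevance: str
    size_mb: float
    year: int


def format_entry(entry: GenomeEntry) -> str:
    """Render: species name (trivial name, importance), genome size in Mb, year."""
    return (
        f"{entry.species} ({entry.common_name}, {entry.relevance}), "
        f"{entry.size_mb:g} Mb, {entry.year}"
    )


# Illustrative sample rows; sizes are approximate placeholders.
entries = [
    GenomeEntry("Caenorhabditis elegans", "nematode worm", "model organism", 100.0, 1998),
    GenomeEntry("Saccharomyces cerevisiae", "baker's yeast", "model organism", 12.1, 1996),
]

# Sort by year, then by species name, which also covers the "order by date" request.
for entry in sorted(entries, key=lambda e: (e.year, e.species)):
    print(format_entry(entry))
```

Keeping each entry as a single line of text is what would make the list easier to edit by hand than the current tables; gene counts and sequencing organisations are left out, as the proposal suggests.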

Selaginella moellendorffii
The basal fern Selaginella moellendorffii has been sequenced, according to. To which table should it be added? Abductive (reasoning) 14:47, 2 January 2011 (UTC)
 * It needs to go in its own table under plants, maybe we should change "algae" to "other plants". I found a journal reference which can be used as well. Just to nitpick, it's a lycophyte not a fern. SmartSE (talk) 15:11, 2 January 2011 (UTC)

Number of sequenced fungi escalates
...and I don't mean draft genomes, you don't see those 1x or 2x genomes anymore. I mean all entries with a version number bigger than 1 on the JGI page alone. Also, the time the page takes to load after editing is growing beyond usability. To resolve both problems, I propose to make a separate List of sequenced fungi genomes as a one-liner text list, and link to it from this list. --Ayacop (talk) 15:32, 8 September 2011 (UTC)
 * The separate list is now created and linked to. --Ayacop (talk) 17:44, 9 October 2011 (UTC)
 * Good work. I suggest splitting out all the divisions (or at least plants). --Dan Bolser (talk) 00:54, 19 March 2012 (UTC)
 * Fungi now only has a table of the first 5 sequenced, discouraging users who don't (want to) understand why I'm doing this from adding to the table again. --Ayacop (talk) 17:25, 20 April 2012 (UTC)

This page should be automatically created
You should run a bot that uses NCBI GenBank.--92.203.102.189 (talk) 07:54, 8 March 2012 (UTC)
 * Well volunteered! --Dan Bolser (talk) 00:50, 19 March 2012 (UTC)
 * And why do you think NCBI would be a complete source for this task? --Ayacop (talk) 14:23, 20 April 2012 (UTC)
 * Please note the NCBI assemblies released listing resources: genome/assembly/organism LeoBC (talk) 17:57, 11 May 2012 (UTC)

refs for Oryza sativa / japonica seem to be scrambled?
Can someone check the refs for sativa / japonica, as they seem to be scrambled. Cheers, --Dan Bolser (talk) 00:49, 19 March 2012 (UTC)
 * This is corrected in List of sequenced plant genomes. --Ayacop (talk) 17:26, 7 May 2012 (UTC)

ref 33
Can someone check ref 33, it does not link to a plant pathogen genome, but to a pediatric disorder. 82squaremetres (talk) 20:28, 21 April 2012 (UTC)

Aim for organisms with papers?
As I've asked on the talk page of the fungi list, we can no longer manually keep up with the pace of fungal genome sequencing. Also, as predicted, fewer and fewer papers are written about fungal genome sequencing. Maybe we should in general instead aim for all organisms that have sequencing papers? --Ayacop (talk) 07:05, 19 May 2012 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 3 external links on List of sequenced eukaryotic genomes. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20060207040322/http://pgec-genome.ars.usda.gov/STRUCTURAL_DIR/AGI.html to http://pgec-genome.ars.usda.gov/STRUCTURAL_DIR/AGI.html
 * Added archive https://web.archive.org/web/20070927233645/http://www.genomesonline.org/Yeast.html to http://www.genomesonline.org/Yeast.html
 * Added archive https://web.archive.org/web/20120205193208/http://www.fugu-sg.org/project/info.html to http://www.fugu-sg.org/project/info.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 08:49, 2 January 2018 (UTC)