Talk:Computational phylogenetics

Deleted a paragraph from the introduction
"Producing a phylogenetic tree requires a measure of homology among the characteristics shared by the taxa being compared. In morphological studies, this requires explicit decisions about which physical characteristics to measure and how to use them to encode distinct states corresponding to the input taxa. In molecular studies, a primary problem is in producing a multiple sequence alignment (MSA) between the genes or amino acid sequences of interest. Progressive sequence alignment methods produce a phylogenetic tree by necessity because they incorporate new sequences into the calculated alignment in order of genetic distance."

-- I deleted this paragraph, which I think A) contains some fundamental inaccuracies, and B) mostly restates information from earlier in the introduction. Firstly, I think it confuses homology with similarity. I think it also makes contradictory statements about multiple sequence alignments: it states that multiple sequence alignments are central to calculating phylogenetic trees, while also stating that progressive MSA methods can create phylogenetic trees. Since progressive MSA methods create their guide trees from pairwise (rather than multiple) sequence comparisons, it would follow that multiple sequence alignments are not important for phylogenetic tree inference. That is, when you make an MSA with Clustal or MAFFT, you get the guide tree before the alignment, so the alignment can't be important for that tree. A more subtle issue is that multiple alignment guide trees are really not good phylogenetic trees (at all), and I would argue that bringing them up muddies the water to a certain extent, certainly in the first few paragraphs of the article.

Please feel free to reinstate this paragraph if you disagree, but think about fixing those issues if you do. I opted to just take it out completely because it didn't seem to add very much that hadn't been said in the paragraph above. Gearfo (talk) 03:15, 1 May 2018 (UTC)
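
To make the point about guide trees concrete, here is a minimal sketch of how a tree can be built from pairwise comparisons of unaligned sequences alone, before any multiple alignment exists. It is not the actual Clustal or MAFFT procedure: the k-mer Jaccard distance, the UPGMA clustering, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of k-mers found in an unaligned sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard distance between k-mer sets (a crude alignment-free distance).

    Assumes both sequences are at least k characters long.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


def upgma_guide_tree(seqs, k=3):
    """Cluster sequences by UPGMA using pairwise distances only.

    `seqs` maps a name to an unaligned sequence. The result is a nested
    tuple of names; branch lengths are not tracked in this sketch.
    """
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    # Distance keys are always (smaller id, larger id).
    dist = {
        (i, j): kmer_distance(seqs[names[i]], seqs[names[j]], k)
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        for m in trees:
            if m in (i, j):
                continue
            d_im = dist[(min(i, m), max(i, m))]
            d_jm = dist[(min(j, m), max(j, m))]
            # Size-weighted average of the two old distances (UPGMA update).
            dist[(m, next_id)] = (
                sizes[i] * d_im + sizes[j] * d_jm
            ) / (sizes[i] + sizes[j])
        # Drop every distance that involves a merged cluster.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Toy, made-up sequences: no alignment is computed at any point.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "TTGGCCAATT",
    "D": "TTGGCCAAGT",
}
guide = upgma_guide_tree(toy)
print(guide)
```

A progressive aligner would then add sequences to the alignment in the order this tree suggests, which is why the guide tree is a by-product of pairwise distances and not a phylogeny inferred from the finished MSA.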

GA Review

 * GA review (see here for criteria)


 * 1) It is reasonably well written.
 * a (prose): b (MoS):
 * 2) It is factually accurate and verifiable.
 * a (references): b (citations to reliable sources):  c (OR):
 * 3) It is broad in its coverage.
 * a (major aspects): b (focused):
 * 4) It follows the neutral point of view policy.
 * a (fair representation): b (all significant views):
 * 5) It is stable.
 * 6) It contains images, where possible, to illustrate the topic.
 * a (tagged and captioned): b lack of images (does not in itself exclude GA):  c (non-free images have fair use rationales):
 * 7) Overall:
 * a Pass/Fail:
 * Prose: I was starting to fall asleep, and many times I lost track of what I was reading about; it lacks flow.
 * Jargon: plenty of undefined terms lying around, for example "all-to-all matrix", etc.
 * Broad in its coverage: I cannot decide on that point, as the text is too heavy to read, so I lost track of what it was all about.
 * Images: there must be some images in this article showing what it's about, not just text; for example, I still haven't understood whether it's about computer science or biomolecular science.

Result: fail → AzaToth 23:13, 14 October 2006 (UTC)

More GA Review
Sorry, I don't really get the GA formatting so I'm adding another section. For that matter I had initially intended to review this earlier, but couldn't figure out where to do so. Apparently it's here on the article talk page instead of on the GA page (as in FAs). Here is a list of comments I have concerning this article. For now I'd have to agree with the Fail decision listed above. My only deal-breaking reason is the lack of coverage for morphological characters. As it's currently written, a reader might very easily think that computational phylogenetics is restricted to nucleotide/amino acid data. I think misleading a reader to this degree is fundamentally grounds for disqualification as a GA. I'm also uncomfortable with the treatment of this topic as if it is a subdiscipline of aligning sequences. Props still go to Opabinia regalis and other editors who have clearly put a lot of well-researched work into this. --Aranae 05:54, 16 October 2006 (UTC)
 * The article is massively biased toward genetic data. I'm not sure where this comes from, but the article strongly suggests that computational phylogenetics is exclusively for nucleotides/genetic data.  This bias is rampant throughout the article.  Neighbour-joining, Fitch-Margoliash, parsimony heuristic algorithms, and almost everything else discussed here (short of ML based and tree-based alignment methods) were developed for use with morphological data and later applied to molecular data.
 * I'm having trouble identifying the intended scope of the article, so I'm having issues with determining what's needed. Is it about data treatment and analysis?  Is it about any computerized algorithm you'd use to help you build a tree?  Here is a list of things that may apply depending on the intended scope:
 * Positional homology - This is treated very nicely elsewhere, and should be given a prominent set of links and a quick summary here. It's really the homology of characters that's important.  The alignment is important because of homology.  The present second sentence and second paragraph are awkward.  They are constructed as if this subject is an offshoot of the field of sequence alignment.  All the alignment stuff in the intro is very strangely placed.  Discussion of homology for morphological characters should also be given a quick overview.  At this level discussion of genetic and morphological characters would involve similar statements.
 * Coding of characters - Again, morphological characters have been strangely omitted, but how to code characters is very important. Are they coded as binary characters?  Are they ordered or unordered?
 * Model selection - ML, Bayesian, and (though it's not discussed as much in the literature) NJ algorithms are really defined by the model of evolution imposed on the data. There are many approaches to determining which model should be used.
 * Indels - How are gaps treated? Missing data, additional character state, as binary characters?  Most of this literature concerns molecular data, but it's important in dealing with morphological data as well (i.e. it's hard to count toes if you don't have a foot).
 * Outgroup selection - Covered a bit, but warrants more, as outgroups are the only way to root most trees.
 * Ancestral state reconstruction - It's brought up and could be applicable here, but may be better treated in depth elsewhere.
 * Combining data - Different genes and/or molecular + morphology.
 * Nodal support
 * Setup - Right now the article headings are technique based. That's more reasonable in an article (such as Phylogenetics) that discusses the theory behind the approaches, but this article should be more nuts and bolts. Perhaps an outline very roughly along the lines of:
 * Homology
 * Morphology
 * Alignment
 * Missing data and indels
 * Character coding
 * Morphology
 * Nucleotide
 * Amino Acid
 * Approaches
 * Phenetic
 * UPGMA
 * Neighbor-joining
 * Fitch-Margoliash
 * Parsimony
 * Overview
 * MALIGN and POY
 * Maximum likelihood
 * Model selection
 * Character weighting
 * Hierarchical-based model selection
 * Bayesian approach to model selection
 * Searching tree-space
 * Exhaustive
 * Branch and bound
 * Heuristic approaches
 * Identifying and reaching multiple islands
 * Rooting
 * Outgroup rooting
 * Rooting without an outgroup
 * Midpoint rooting
 * Molecular clock
 * Nodal support
 * Bootstrap
 * Jackknife
 * Taxon jackknifing
 * Bayesian posterior probability
 * Other comments:
 * The term "phylogenetic tree construction" should probably be replaced with "phylogenetic tree reconstruction". The tree-builder isn't making the tree for the first time, but is trying to replay history (crudely).
 * The wording suggests that the gene sequence is the OTU, which may be true in some instances (such as in detecting gene duplication events or teasing apart gene trees from species trees). I think most questions are looking at the relatedness among taxa (i.e. species).
 * Most algorithms build an unrooted tree and use a user-selected outgroup to impose a root after the analysis is finished. I find the wording in "Types of phylogenetic trees" to be a bit awkward.
 * UPGMA has enough historical significance and is such a simple technique to understand that I think it warrants as much of a section as NJ.
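
The points above on character coding and on gaps or missing data (the "hard to count toes if you don't have a foot" case) can be illustrated with a small sketch. It counts state changes for a single unordered character using Fitch's small-parsimony method on a rooted binary tree, and compares two treatments of an inapplicable value: as missing data ("?") or as an additional character state ("-"). The tree, taxa, state labels, and function name are hypothetical and chosen only for illustration; this is not taken from any particular software package.

```python
def fitch(tree, states, alphabet):
    """Fitch parsimony for one unordered character.

    `tree` is a leaf name (str) or a pair of subtrees (rooted, binary).
    `states` maps each leaf name to its coded state; "?" means missing.
    Returns (possible states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        s = states[tree]
        # Missing data is compatible with every state in the alphabet.
        return (set(alphabet) if s == "?" else {s}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states, alphabet)
    right_set, right_cost = fitch(right, states, alphabet)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one extra change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical character: number of toes on the hindlimb.
tree = (("human", "chimp"), ("snake", "lizard"))
alphabet = {"4", "5", "-"}

# Treatment 1: the limbless taxon is scored as missing data.
as_missing = {"human": "5", "chimp": "5", "snake": "?", "lizard": "5"}
# Treatment 2: absence of the limb is coded as its own state.
as_state = {"human": "5", "chimp": "5", "snake": "-", "lizard": "5"}

print(fitch(tree, as_missing, alphabet)[1])
print(fitch(tree, as_state, alphabet)[1])
```

Under these assumptions the same observation costs a different number of steps depending on the coding decision, which is the kind of choice the article would need to explain for both morphological and molecular data.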


 * Excellent, thanks for the very detailed comments! Exactly what I was hoping for :) This is what I meant about single authorship being a problem; the lack of discussion of morphology comes from the fact that I have zero experience working with morphological data, and know very little about the methodology (especially character coding, which has always seemed somewhat arbitrary to me). I will look at this in more detail tomorrow, but a couple of quick comments - I originally wrote/expanded this as a subarticle of sequence alignment, which is why it seems to lean in that direction. The usage of "phylogenetic tree construction" was intended to roughly follow the distinction in the articles phylogenetic tree vs evolutionary tree, where the merge "discussion" generally agreed that the latter is the "true" history and the former is a construction. Thanks again for such a thorough review! Opabinia regalis 06:34, 16 October 2006 (UTC)

MP & ML do not require MSA?
Although a phylogenetic tree can always be constructed from an MSA, phylogenetics methods such as maximum parsimony and maximum likelihood do not require the production of an initial or concurrent MSA.

Unless I'm reading this wrong, it says that MP and ML trees can be (sensibly) constructed without aligning the sequences first. I have not been able to find anything online that supports that notion. In fact, many articles implied that the methods are all sensitive to sequence alignments. I'm pretty new to computational phylogenetics, so please pardon me if I've made an elementary mistake here. -- Zephirum Talk 00:40, 22 July 2010 (UTC)
 * Good catch - all phylogenetic methods require alignment of sequences. -- Scray (talk) 00:36, 23 July 2010 (UTC)
 * This could be referring to a "joint-estimation" approach like Bali-Phy which simultaneously aligns as it infers tree topology & parameters. This doesn't really seem to be in common usage as yet (despite a number of benefits) because it's hugely computationally intensive. Simon (talk) 04:00, 24 July 2010 (UTC)
 * I might agree if not for the word "concurrent" in that statement quoted by . -- Scray (talk) 16:36, 24 July 2010 (UTC)

Methods using ancestor sequences as well
From my (admittedly possibly limited :-)) knowledge of the field, it seems that most methods use extant species to reconstruct and infer the tree and the ancestors. Now, assume I had sequences corresponding to some ancestors of my extant species, or that I had at least some information giving me a partial order on my species. Are there methods that take these into account and associate those ancestral sequences with internal nodes in the tree they reconstruct, instead of inserting all available sequences at the leaves? 134.58.45.79 (talk) 15:31, 8 October 2010 (UTC)

Self promotion by khalafvand
First reference has 0 citations? Miguelaglopes (talk) 02:47, 4 October 2023 (UTC)