PHYLIP

PHYLogeny Inference Package (PHYLIP) is a free computational phylogenetics package of programs for inferring evolutionary trees (phylogenies). It consists of 65 programs whose source code is written in C, which makes them portable. As of version 3.696, it is licensed as open-source software; versions 3.695 and older were proprietary freeware. Releases are distributed as source code and as precompiled executables for many operating systems, including Windows (95, 98, ME, NT, 2000, XP, Vista), Mac OS 8, Mac OS 9, OS X, and Linux (Debian, Red Hat); a FreeBSD version is available from FreeBSD.org. Full documentation for all the programs is included in the package. The programs were written by Professor Joseph Felsenstein of the Department of Genome Sciences and the Department of Biology, University of Washington, Seattle.

The methods available in the package, implemented by the individual programs, include parsimony, distance matrix, and likelihood methods, as well as bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.

Each program is controlled through a menu, which asks users which options they want to set and allows them to start the computation. The data are read into the program from a text file, which the user can prepare using any word processor or text editor. This file cannot be in the word processor's own format; it must be saved as flat ASCII (text-only). Some sequence analysis programs, such as the ClustalW alignment program, can write data files in the PHYLIP format. Most of the programs look for the data in a file with a fixed default name. If the PHYLIP programs do not find this file, they ask the user to type in the file name of the data file.

File format
The component programs of PHYLIP use several different formats, all of which are relatively simple. Programs for the analysis of DNA sequence alignments, protein sequence alignments, or discrete characters (e.g., morphological data) can accept those data in sequential or interleaved format, as shown below.

Sequential format:

5 42
Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT
Salmo schiAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT
H. sapiensACCGGTTGGC CGTTCAGGGT ACAGGTTGGC CGTTCAGGGT AA
Chimp     AAACCCTTGC CGTTACGCTT AAACCGAGGC CGGGACACTC AT
Gorilla   AAACCCTTGC CGGTACGCTT AAACCATTGC CGGTACGCTT AA

Interleaved format:

5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
Salmo schiAAGCCTTGGC AGTGCAGGGT
H. sapiensACCGGTTGGC CGTTCAGGGT
Chimp     AAACCCTTGC CGTTACGCTT
Gorilla   AAACCCTTGC CGGTACGCTT
GAGCCCGGGC AATACAGGGT AT
GAGCCGTGGC CGGGCACGGT AT
ACAGGTTGGC CGTTCAGGGT AA
AAACCGAGGC CGGGACACTC AT
AAACCATTGC CGGTACGCTT AA

The numbers are the number of taxa (different species in the example shown above) followed by the number of characters (aligned nucleotides or amino acids in the case of molecular sequences). Restriction site data must include the number of enzymes as well.
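As an illustration of how the two layouts relate, the following Python sketch reads either one into the same name-to-sequence mapping. The function name parse_strict_phylip and its parameters are hypothetical and are not part of PHYLIP. The sketch assumes a 10-column name field, ignores blank lines, and does not handle restriction site data or any program-specific options. The sample data are the first three taxa of the example above.

```python
def parse_strict_phylip(text, interleaved=False, name_width=10):
    """Return {name: sequence} from strict PHYLIP text (illustrative helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(tok) for tok in lines[0].split()[:2])
    body = lines[1:]
    names, seqs = [], []
    if interleaved:
        for i, line in enumerate(body):
            if i < ntax:
                # First block: fixed-width name, then the first characters.
                names.append(line[:name_width].strip())
                seqs.append("".join(line[name_width:].split()))
            else:
                # Later blocks: data only, cycling through the taxa in order.
                seqs[i % ntax] += "".join(line.split())
    else:
        rows = iter(body)
        for _ in range(ntax):
            line = next(rows)
            names.append(line[:name_width].strip())
            seq = "".join(line[name_width:].split())
            # A taxon's data may continue on following lines.
            while len(seq) < nchar:
                seq += "".join(next(rows).split())
            seqs.append(seq)
    for name, seq in zip(names, seqs):
        if len(seq) != nchar:
            raise ValueError(f"{name!r}: expected {nchar} characters, got {len(seq)}")
    return dict(zip(names, seqs))


SEQUENTIAL = "\n".join([
    "3 42",
    "Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT",
    "Salmo schiAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT",
    "H. sapiensACCGGTTGGC CGTTCAGGGT ACAGGTTGGC CGTTCAGGGT AA",
])
INTERLEAVED = "\n".join([
    "3 42",
    "Turkey    AAGCTNGGGC ATTTCAGGGT",
    "Salmo schiAAGCCTTGGC AGTGCAGGGT",
    "H. sapiensACCGGTTGGC CGTTCAGGGT",
    "GAGCCCGGGC AATACAGGGT AT",
    "GAGCCGTGGC CGGGCACGGT AT",
    "ACAGGTTGGC CGTTCAGGGT AA",
])

seq_records = parse_strict_phylip(SEQUENTIAL)
int_records = parse_strict_phylip(INTERLEAVED, interleaved=True)
```

Both calls yield the same mapping, which shows that the two layouts carry identical information and differ only in the order in which the characters are written.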

Names are limited to 10 characters by default. They must be blank-filled to that length and followed immediately by the character data, which use one-letter codes. The 10-character limit can be changed by a minor modification of the code (by changing the name-length constant in phylip.h and recompiling). All printable ASCII/ISO characters are allowed in names, except for parentheses ("(" and ")"), square brackets ("[" and "]"), colon (":"), semicolon (";") and comma (","). Spaces embedded in the alignment are ignored.
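The naming rules can be checked before a file is written. The Python sketch below is one way to do so; the helper strict_phylip_name is hypothetical, and rejecting over-long names (rather than truncating them) is a design choice made here, not something PHYLIP prescribes. Python's str.isprintable is used as an approximation of "printable".

```python
# Characters excluded from names: parentheses, square brackets,
# colon, semicolon and comma.
FORBIDDEN = set("()[]:;,")


def strict_phylip_name(name, width=10):
    """Validate a taxon name and blank-fill it to the fixed width."""
    if not name.strip():
        raise ValueError("name is empty")
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        raise ValueError(f"{name!r} contains forbidden characters: {bad}")
    if not name.isprintable():
        raise ValueError(f"{name!r} contains non-printable characters")
    if len(name) > width:
        # Truncating silently could produce duplicate names, so fail instead.
        raise ValueError(f"{name!r} is longer than {width} characters")
    return name.ljust(width)


padded = strict_phylip_name("Chimp")  # five letters plus five blanks
```

A caller would append the character data directly to the returned string, since the data begin immediately after the name field.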

Many programs for phylogenetic analyses, including the commonly used RAxML and IQ-TREE programs, use the PHYLIP format or a minor modification of that format called the relaxed PHYLIP format.

Relaxed PHYLIP format (sequential):

5 42
Turkey                 AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT
Salmo_schiefermuelleri AAGCCTTGGCAGTGCAGGGTGAGCCGTGGCCGGGCACGGTAT
H_sapiens              ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA
Chimp                  AAACCCTTGCCGTTACGCTTAAACCGAGGCCGGGACACTCAT
Gorilla                AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA

The primary difference in the relaxed PHYLIP format is the absence of the 10-character limit and of the need to blank-fill names to that length (although padding names so that the character matrix starts at the same position can improve readability for users). This example of the relaxed format uses underscores rather than spaces in the names and uses spaces between the names and the aligned character data; it is often good practice to avoid white space within taxon names and to separate the character data from the name when generating files. Like strict PHYLIP format files, relaxed PHYLIP format files can be in interleaved format and can include spaces and line breaks within the sequence data.
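The practices described above (underscores instead of spaces, white space between name and data, aligned columns) can be sketched in Python as follows. The functions write_relaxed_phylip and read_relaxed_phylip are hypothetical helpers, not part of any package, and the reader assumes the simplest case of one line per taxon in sequential layout.

```python
def write_relaxed_phylip(alignment):
    """Serialise {name: sequence} as relaxed PHYLIP text, one taxon per line."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Replace white space inside names so the name is a single token.
    safe = {"_".join(name.split()): seq for name, seq in alignment.items()}
    if len(safe) != len(alignment):
        raise ValueError("names are no longer unique after replacing spaces")
    width = max(len(name) for name in safe)
    lines = [f"{len(safe)} {lengths.pop()}"]
    # Pad names to a common width purely for readability.
    lines += [f"{name.ljust(width)} {seq}" for name, seq in safe.items()]
    return "\n".join(lines) + "\n"


def read_relaxed_phylip(text):
    """Parse relaxed PHYLIP text with one line per taxon."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(tok) for tok in lines[0].split()[:2])
    records = {}
    for line in lines[1:ntax + 1]:
        # The name ends at the first run of white space.
        name, data = line.split(None, 1)
        records[name] = "".join(data.split())
        if len(records[name]) != nchar:
            raise ValueError(f"{name!r}: expected {nchar} characters")
    return records


example = {"Salmo schiefermuelleri": "AAGCCTTGGC", "Chimp": "AAACCCTTGC"}
relaxed_text = write_relaxed_phylip(example)
round_trip = read_relaxed_phylip(relaxed_text)
```

Because the name is delimited by white space rather than by a fixed column, a name that still contained a space would be split in the wrong place, which is why the writer replaces spaces first.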

The programs that use distance data, like the program that implements the neighbor-joining method, also use a simple distance matrix format that includes only the number of taxa, their names, and numerical values for the distances:

PHYLIP distance matrix:

7
Bovine    0.0000 1.6866 1.7198 1.6606 1.5243 1.6043 1.5905
Mouse     1.6866 0.0000 1.5232 1.4841 1.4465 1.4389 1.4629
Gibbon    1.7198 1.5232 0.0000 0.7115 0.5958 0.6179 0.5583
Orang     1.6606 1.4841 0.7115 0.0000 0.4631 0.5061 0.4710
Gorilla   1.5243 1.4465 0.5958 0.4631 0.0000 0.3484 0.3083
Chimp     1.6043 1.4389 0.6179 0.5061 0.3484 0.0000 0.2692
Human     1.5905 1.4629 0.5583 0.4710 0.3083 0.2692 0.0000

The number on the first line is the number of taxa, and the same limitations on taxon names apply. Note that this matrix is symmetric and the diagonal has values of 0 (since the distance between a taxon and itself is zero by definition).
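The two properties noted above, symmetry and a zero diagonal, are easy to verify after parsing. The Python sketch below does this for a square matrix with one row per line and a 10-column name field; parse_distance_matrix and check_distance_matrix are hypothetical names, and the tolerance is an arbitrary choice. The sample uses the Gorilla, Chimp and Human entries from the matrix above.

```python
def parse_distance_matrix(text, name_width=10):
    """Return (names, rows) from a square PHYLIP-style distance matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:ntax + 1]:
        names.append(line[:name_width].strip())
        rows.append([float(tok) for tok in line[name_width:].split()])
    if len(rows) != ntax or any(len(row) != ntax for row in rows):
        raise ValueError(f"expected a {ntax} x {ntax} matrix")
    return names, rows


def check_distance_matrix(rows, tol=1e-9):
    """Raise ValueError unless the matrix is symmetric with a zero diagonal."""
    n = len(rows)
    for i in range(n):
        if abs(rows[i][i]) > tol:
            raise ValueError(f"diagonal entry {i} is not zero")
        for j in range(i + 1, n):
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(f"entries ({i},{j}) and ({j},{i}) differ")


MATRIX = "\n".join([
    "3",
    "Gorilla   0.0000 0.3484 0.3083",
    "Chimp     0.3484 0.0000 0.2692",
    "Human     0.3083 0.2692 0.0000",
])

taxa, distances = parse_distance_matrix(MATRIX)
check_distance_matrix(distances)  # passes silently for a valid matrix
```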

Programs that use trees as input accept the trees in Newick format, an informal standard agreed to in 1986 by authors of seven major phylogeny packages. Output is written to files with default names, and trees written to the tree output file are in the Newick format.