Web-based phylogenomic analysis


In this exercise you will practice your newly acquired skills in phylogenetic inference and "tree thinking" by analysing the evolutionary history of a gene family. Part of the exercise is to collect the data needed for the analysis (homologous protein sequences from different species). Another part is to interpret the result, which first requires you to draw a species tree for the taxa included in the analysis.

Suggestions for gene families to analyse in this exercise

  • Toc75
  • Toc34
  • Tic20
  • POR A
  • Alb3
  • ... or your own favorite gene family.


  1. Select one of the gene families from the list above for your analysis.
  2. Download an amino acid reference sequence for your gene family from the species Arabidopsis thaliana at the NCBI site and save it in a text file using your favorite text editor [Hint: use a search string such as "Arabidopsis thaliana[orgn] Tic22", replacing Tic22 with the name of your gene family].
  3. Download the file "Viridiplantae.pdf" from the course GUL page and look up the phylogenetic position of each species in the list below. Draw a "species tree" for these species and save it for later. The first five species represent your ingroup, and the last one represents your outgroup. You will compare the topology of the gene tree generated in this exercise to the topology of this species tree in order to identify speciation events and gene duplications.
    • Arabidopsis thaliana
    • Medicago truncatula
    • Zea mays
    • Oryza sativa
    • Physcomitrella patens
    • Volvox carteri
  4. Navigate your web browser to www.phytozome.net/.
  5. Click on each species in the species tree to access its BLAST database. Select "Target type: Proteome" and then run a BLASTp search for homologous sequences using your Arabidopsis thaliana reference sequence. Save the sequences you want to include in the analysis in the same file as the reference sequence [Hint: at this stage it is better to save too many sequences than too few]. You can access the sequences by clicking the "Gene view" button, selecting the "Sequences" tab and then "Peptide sequence".
  6. Upload your sequences to the MAFFT alignment server (http://mafft.cbrc.jp/alignment/server/) and align them [Alternatively, you can use a local installation of your favorite alignment program]. Look for sequences that align poorly to the rest, and exclude them if you suspect that they are not homologous to your reference sequence. Keep aligning, inspecting and excluding sequences until you are happy with the alignment. Save the resulting sequences to your computer.
  7. Point your web browser to yet another web site, this time www.phylogeny.fr/. Select the "One Click" function, upload your data and run the analysis using the default settings.
  8. When the analysis has finished you will be presented with a phylogenetic tree. Manipulate the tree by changing the rooting (try "Mid-point rooting" and "Reroot (outgroup)"). Also try the "Flip" and "Swap" options to make it easier to compare the gene tree to the species tree you drew earlier. Does your result make sense? Can you distinguish the nodes representing gene duplications from the ones indicating speciation events? Also play around with the other options to make the tree look its best.
  9. Go back and add more sequences to your analysis, if you think you have missed some homologues. Then redo the steps outlined above.
Good luck!
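
As an optional aid for step 6, the short Python sketch below shows one possible way to spot poorly aligned sequences. It reads an alignment in FASTA format and reports how large a fraction of each aligned sequence consists of gap characters. It is only a rough illustration and not part of the exercise workflow. The sequence names (At_ref, Zm_hit1, Vc_hit9), the toy sequences and the 0.5 cut-off are made-up assumptions. A high gap fraction is merely a hint, so always inspect the alignment by eye before excluding anything.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping sequence name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gap_fractions(alignment):
    """Return the fraction of gap characters ('-') in each aligned sequence."""
    return {n: s.count("-") / len(s) for n, s in alignment.items() if s}


def flag_gappy(alignment, threshold=0.5):
    """List the sequences whose gap fraction exceeds the (arbitrary) threshold."""
    fractions = gap_fractions(alignment)
    return sorted(n for n, f in fractions.items() if f > threshold)


# Toy alignment with hypothetical names and sequences, for illustration only.
example = """>At_ref
MKTLLVAAGS
>Zm_hit1
MKTL-VAAGS
>Vc_hit9
M-------GS
"""

aln = read_fasta(example)
for seq_name, frac in gap_fractions(aln).items():
    print(f"{seq_name}\t{frac:.2f}")
print("Candidates for closer inspection:", flag_gappy(aln))
```

To try it on your own data, replace the toy alignment with the contents of the aligned FASTA file you saved from the alignment server, for example by reading the file with open(...).read().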