Evolution of the POR gene family

Data at github.com/mtop-data/POR
Code at github.com/mtop/misc

Material & Methods

Peter has predicted the transit peptide (TP) region in all the sequences used in previous analyses (stored in "analysis/mrbayes/all_seqs.fst"). The TP regions have been removed from the sequences that are found in the file "all_seqs_No_TP.fst". I have aligned this dataset using mafft...

mats@Slartibartfasts: $ linsi --reorder all_seqs_No_TP.fst > all_seqs_No_TP.linsi.fst 

and analysed the alignment using zorro. The result is found in "all_seqs_No_TP.zorro.out". I have created two files in NEXUS format that will be analysed in MrBayes. One is the original alignment ("all_seqs_No_TP.linsi.nex") and the other is the same alignment masked with the output data from zorro ("all_seqs_No_TP.linsi.zorro.nex"), using a cutoff value of 0.4.
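The masking step can be sketched roughly as follows. This is only an illustration, not the script that was actually used: it assumes that the zorro output holds one confidence score per alignment column (whitespace separated), that "masking" means dropping the columns that score below the cutoff, and that a column scoring exactly at the cutoff is kept. The function names (read_fasta, read_scores, mask_alignment) are hypothetical.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {name: aligned sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def read_scores(text: str) -> List[float]:
    """Read per-column scores (assumed: one number per alignment column)."""
    return [float(token) for token in text.split()]


def mask_alignment(seqs: Dict[str, str], scores: List[float],
                   cutoff: float = 0.4) -> Dict[str, str]:
    """Keep only the columns whose score is >= cutoff (assumed convention)."""
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned (unequal lengths)")
    if lengths.pop() != len(scores):
        raise ValueError("number of scores does not match alignment length")
    keep = [i for i, score in enumerate(scores) if score >= cutoff]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


if __name__ == "__main__":
    # Toy data only; real input would be the linsi alignment and zorro output.
    toy_alignment = read_fasta(">a\nACGT\n>b\nA-GT\n")
    toy_scores = read_scores("0.9\n0.1\n0.4\n1.0\n")
    for name, seq in mask_alignment(toy_alignment, toy_scores).items():
        print(">" + name)
        print(seq)
```

Dropping columns changes the alignment length, so the NEXUS header (nchar) has to match the masked alignment. An alternative that keeps the matrix intact would be to list the low-scoring columns in a MrBayes exclude statement instead.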

Results: Analysis of the masked dataset resulted in lower resolution in the eudicot clade. Analysis of the unmasked alignment resulted in a tree similar to previous results. The only surprise is that the sequence "Nicotiana_tabacum1" ends up as sister to the monocots.

Marchantia paleacea

Peter reported that the M. paleacea sequence included in the analysis lacks a TP region, according to the prediction programs he used. Its 3D structure also differs from that of the other sequences. I will therefore search the NCBI and UniProt databases again to see if I can find any more sequences. I will also search the M. polymorpha sequences that were downloaded from http://www.genome.jp/. The query sequences in the latter analysis will be the three A. thaliana sequences, atPOR1-3, and the Physcomitrella patens sequence (with the TM regions removed) used in the phylogenetic analysis.

Results: Using the P. patens sequence as query of the NCBI nr database (search limited to Marchantia paleacea (taxid:56867)) resulted in four matches, one of them being the M. paleacea sequence used in previous analyses. Limiting the search to Marchantia polymorpha (taxid:3197) resulted in 21 matches. These matches are very short and have high e-values (0.27 and higher). Searching all species in the NCBI and UniProt databases found the query sequence.

Results "blast_and_align.py": All four query sequences found the following sequence:


>gnl|BL_ORD_ID|1111 @Marchathia_polymorpha@T20061:1166 K00218 protochlorophyllide reductase [EC:1.3.1.33] (A)
NFRVVSSISRVPEAVYRMAAVASLGS-ALSVSSAALSQNVCHSNHATKESAFLGLRIGEA
AKFGGVSLSASTVASNETSKPGVVSVNA--VTAPAETMNKPSAKKTATKSTCIITGASSG
LGLATAKALADSGEWHVIMACRDFLKAERAARAVG-------IPKDSYTV--IHCDLASF
DSVRAFVDNFRRTERQLDVLVCNAAVYFPTDK-EPKFSAEGFELSVGTNHMGHFLLARLL
MEDLQ-KAKDSLKRMIIVGSITGNSNTVAGNVPPKAN-LGDLRGLAGGLNGVNSSSMIDG
GEFDGAKAYKDSKVCNMLTMQEFHRRYHAETGITFSSLYP--------------------
------------------------------------------------------------
--------------

It is slightly shorter at both the N- and C-terminal ends than the M. paleacea sequence. Apart from that, only a few residues differ between the two Marchantia sequences.
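A comparison like this can be quantified with a few lines of Python. The sketch below is illustrative only: it assumes that the two sequences are taken from the same alignment (equal length, "-" for gaps) and it ignores every column where either sequence has a gap, so the length difference at the ends is not counted as mismatches. The function name count_differences is hypothetical.

```python
from typing import Tuple


def count_differences(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[int, int]:
    """Return (columns compared, columns that differ) for two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a.upper() != res_b.upper():
            differing += 1
    return compared, differing


if __name__ == "__main__":
    # Toy sequences only, not the Marchantia data.
    compared, differing = count_differences("AC-GT", "ACTGA")
    print(f"{differing} of {compared} compared positions differ")
```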

Conclusion: No other POR sequences from the genus Marchantia are available in public databases.