Toc34 and Toc159 proteins in diatoms

Introduction

Kalanon & McFadden (2008) reported that Cyanidioschyzon merolae encodes two putative GTPase receptor proteins, classifying one (CMP284C) in the Toc34 gene family and the other (CMQ137C) as Toc159-like. Searching the NCBI non-redundant protein (nr) repository for Rhodophyceae (taxid:2763) proteins with either of the two proteins as the query sequence identifies the other as the fifth best match.
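The rank of one protein among the hits of the other can be read off a BLAST tabular report. The following is a minimal sketch, assuming the search was saved in the 12-column tabular format (`-outfmt 6`), where the first two columns are the query and subject identifiers and hits are listed best first. The function name `hit_rank` and the identifiers in the example (`q1`, `hit_A`, ...) are made-up placeholders, not data from the searches described here.

```python
def hit_rank(tabular_text, query_id, subject_id):
    """Return the 1-based rank of subject_id among the distinct subjects
    hit by query_id, in the order they appear in the report, or None."""
    seen = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        if qseqid != query_id:
            continue
        # A subject can have several HSP rows; count it only once.
        if sseqid not in seen:
            seen.append(sseqid)
    return seen.index(subject_id) + 1 if subject_id in seen else None


# Placeholder report: three subjects, the second with two HSP rows.
example = (
    "q1\thit_A\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "q1\thit_B\t45.0\t280\t150\t4\t5\t284\t10\t289\t1e-40\t160\n"
    "q1\thit_B\t40.0\t60\t36\t0\t290\t349\t300\t359\t1e-5\t40\n"
    "q1\thit_C\t38.0\t250\t155\t5\t20\t269\t15\t264\t1e-20\t95\n"
)

print(hit_rank(example, "q1", "hit_C"))  # 3
```

Note that if the query sequence itself is in the searched database, the self-hit is counted as rank 1 by this sketch.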

Also, using these two proteins as query sequences when searching all proteins from the diatoms ...

  • Fragilariopsis cylindrus
  • Phaeodactylum tricornutum
  • Pseudo-nitzschia multiseries
  • Thalassiosira oceanica
  • Thalassiosira pseudonana

... identifies the same proteins as best matches for both queries. When analysed together with a dataset of plant Toc34 and Toc159, Toc90, Toc120 and Toc132 sequences, the sequences separate into two groups in the multiple sequence alignment (built with MAFFT) (https://github.com/mtop-data/diatom_toc_tic/blob/master/toc159/Toc34-159...). Furthermore, the group of diatom sequences in the lower part of the alignment is much smaller and aligns less well with the plant and algal sequences. I suspect they are not homologues of the C. merolae sequences.
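One rough way to put a number on "less well aligned" is the fraction of gap characters each sequence carries in the alignment. The sketch below assumes the alignment is available as aligned FASTA text (the default output format of mafft); the function names and the two toy sequences are illustrative placeholders only, and gap fraction is a crude proxy rather than a test of homology.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into {name: aligned_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier up to first whitespace
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def gap_fraction(aligned):
    """Fraction of alignment columns that are gaps ('-') for each sequence."""
    return {name: seq.count("-") / len(seq) for name, seq in aligned.items() if seq}


# Toy alignment with made-up sequences (8 columns each).
example_alignment = """\
>plant_toc34
MKT-LLGA
>diatom_x
M---L--A
"""

for name, frac in gap_fraction(read_aligned_fasta(example_alignment)).items():
    print(name, frac)
```

Sequences from the poorly aligned group would be expected to show clearly higher gap fractions than the rest.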

This would indicate that there is only one Toc34/159 gene family in diatoms. An alternative explanation is that one of the C. merolae proteins has evolved faster than the other and is now too dissimilar from its diatom homologues for the BLAST searches to detect them. Finding homologues of C. merolae Toc34 and Toc159 in other red algal genomes and using these as query sequences to search the diatom genomes may resolve this.

Code example:

blast_and_align.py -q cm_CMQ137C.fst -p ~/db/diatoms/ -a -l -n 10

TOC homologues in other red algae

Searching the NCBI non-redundant protein (nr) repository for Rhodophyceae (taxid:2763) proteins with CMP284C (annotated Toc34) and CMQ137C (Toc159-like) as query sequences found three additional matching sequences:

  • EME28063.1 [Galdieria sulphuraria]
  • EME28062.1 [Galdieria sulphuraria]
  • CDF36346.1 [Chondrus crispus]

The first two proteins are annotated as "chloroplast envelope protein translocase family isoform" 1 and 2, and the third as "unnamed protein product". These three proteins were then used as query sequences to search the diatom protein databases again.
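The repeated search could be scripted along the following lines. This is only a sketch using the standard BLAST+ `blastp` program directly, not the `blast_and_align.py` script shown earlier, whose options are not documented here. The database prefixes, FASTA file names, hit limit and E-value cutoff are all assumptions chosen for illustration. The code only builds and prints the command lines; it does not run them.

```python
import shlex

# Accessions of the three red algal proteins listed above.
QUERY_ACCESSIONS = ["EME28063.1", "EME28062.1", "CDF36346.1"]

# Hypothetical BLAST protein database prefixes, one per diatom proteome.
DIATOM_DBS = [
    "db/diatoms/F_cylindrus",
    "db/diatoms/P_tricornutum",
    "db/diatoms/P_multiseries",
    "db/diatoms/T_oceanica",
    "db/diatoms/T_pseudonana",
]


def blastp_command(query_fasta, db, out, max_hits=10, evalue=1e-5):
    """Build a blastp command line writing tabular (outfmt 6) output."""
    return [
        "blastp",
        "-query", str(query_fasta),
        "-db", str(db),
        "-outfmt", "6",
        "-max_target_seqs", str(max_hits),  # assumed limit
        "-evalue", str(evalue),             # assumed cutoff
        "-out", str(out),
    ]


# One search per (query, database) pair; file names are assumptions.
commands = [
    blastp_command(f"{acc}.fasta", db, f"{acc}_vs_{db.split('/')[-1]}.tsv")
    for acc in QUERY_ACCESSIONS
    for db in DIATOM_DBS
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run` as a list, once the query FASTA files and databases exist at the assumed locations.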

References

Kalanon and McFadden (2008) The Chloroplast Protein Translocation Complexes of Chlamydomonas reinhardtii: A Bioinformatic Comparison of Toc and Tic Components in Plants, Green Algae and Red Algae. Genetics. 179(1): 95–112.