Skeletonema codon usage


I'm about to calculate the codon usage for Skeletonema marinoi transcripts, using Sandra's Trinity assembly from unfiltered/untrimmed, non-normalised reads (/data4/skeletonema_sandra/trinity_out_dir/Trinity.fasta).

Material and methods

Open reading frames (ORFs) of at least 1000 bp and containing both a start and a stop codon were identified using the command "getorf -minsize 1000 -find 3 Trinity.fasta". This found 15381 ORFs, the longest one being 16890 bp. I also identified ORFs between 1000 and 3000 bp in length using the command "getorf -minsize 1000 -maxsize 3000 -find 3 Trinity.fasta" and found 13714. Cumulative codon frequencies were then calculated using the program GCUA downloaded from
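As a cross-check on the cumulative codon frequencies, the counting step can be sketched in a few lines of Python. This is not GCUA and does not reproduce its output format. It only assumes that the getorf output is ordinary FASTA with each ORF starting in frame at its first nucleotide. The function names (read_fasta, cumulative_codon_counts, per_thousand) are made up for this note.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cumulative_codon_counts(records):
    """Count codons summed over all ORFs (assumption: ORFs start in frame)."""
    counts = Counter()
    for _, seq in records:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def per_thousand(counts):
    """Convert raw counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

Usage would be along the lines of per_thousand(cumulative_codon_counts(read_fasta(open("orfs.fasta")))), where orfs.fasta is a placeholder name for the file holding the getorf output.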


The default behaviour of "getorf" is apparently to not include the stop codon in the reported ORF. Hence, the GCUA output contains no information about stop codon frequencies. I retrieved these codons using the command "getorf -minsize 1000 -maxsize 3000 -find 6 -flanking 0", which returns only the stop codons ("-find 6" = nucleotides flanking ending STOP codons, "-flanking 0" = zero nucleotides flanking the STOP codon).
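The stop codons retrieved this way can be tallied with a short script. The sketch below assumes, as described above, that every FASTA record in the "-find 6 -flanking 0" output is a single three-nucleotide stop codon on one line. Anything that is not TAA, TAG or TGA is reported separately rather than silently counted. The function name stop_codon_frequencies is made up for this note.

```python
from collections import Counter

# Stop codons of the standard genetic code (assumption: standard code applies).
STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def stop_codon_frequencies(fasta_lines):
    """Return (fractions, unexpected) for single-codon FASTA records.

    fractions:  stop codon -> share of all recognised stop codons
    unexpected: Counter of sequences that are not standard stop codons
    """
    stops, unexpected = Counter(), Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        codon = line.upper()
        if codon in STANDARD_STOPS:
            stops[codon] += 1
        else:
            unexpected[codon] += 1
    total = sum(stops.values())
    fractions = {c: n / total for c, n in stops.items()} if total else {}
    return fractions, unexpected
```

A non-empty "unexpected" counter would suggest that the output is not what the note assumes (for example, records longer than one codon), and is worth checking before trusting the fractions.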