Skeletonema codon usage

Introduction

I'm about to calculate the codon usage for Skeletonema marinoi transcripts, using Sandra's Trinity assembly of unfiltered, untrimmed, non-normalised reads (/data4/skeletonema_sandra/trinity_out_dir/Trinity.fasta).

Material and methods

Open reading frames (ORFs) longer than 1000 bp and containing both a start and a stop codon were identified using the command "getorf -minsize 1000 -find 3 Trinity.fasta". This found 15381 ORFs, the longest one being 16890 bp. I also identified ORFs between 1000 and 3000 bp in length using the command "getorf -minsize 1000 -maxsize 3000 -find 3 Trinity.fasta" and found 13714. Cumulative codon frequencies were then calculated using the program GCUA, downloaded from http://bioinf.nuim.ie/gcua/.
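As a cross-check on what "cumulative codon frequencies" means here, the following is a minimal Python sketch that pools in-frame codon counts over all ORFs in a FASTA file. It does not reproduce GCUA's own implementation or output format; the function names and the toy records are hypothetical, and it assumes every ORF sequence starts in frame at its first nucleotide (which is what "-find 3" should give).

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cumulative_codon_counts(sequences):
    """Pool in-frame codon counts over all sequences (frame starts at position 0)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Step through complete triplets only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def codon_frequencies(counts):
    """Convert raw counts to relative frequencies (fractions of all codons)."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy records standing in for getorf output (hypothetical, not real data).
example = """>orf_1
ATGGCTGCT
>orf_2
ATGGCTAAA
""".splitlines()

counts = cumulative_codon_counts(seq for _, seq in read_fasta(example))
freqs = codon_frequencies(counts)
```

To run it on real output, pass an open file handle to read_fasta instead of the toy lines, for example a file that the getorf result was saved to.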

Update

By default, "getorf" apparently does not include the stop codon in the reported ORF. Hence, the GCUA output contains no information about stop codon frequencies. I retrieved these codons using the command "getorf -minsize 1000 -maxsize 3000 -find 6 -flanking 0", which returns only the stop codons ("-find 6" = nucleotides flanking ending STOP codons, "-flanking 0" = zero nucleotides flanking the STOP codon).
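The stop codons retrieved this way still have to be tallied separately, since they never pass through GCUA. A rough Python sketch of that tally follows. It assumes each FASTA record holds a single three-nucleotide sequence line, as described above, and that the standard genetic code applies (TAA, TAG, TGA). The function name and the toy input are hypothetical.

```python
from collections import Counter

# Stop codons of the standard genetic code (assumed to apply here).
STOP_CODONS = ("TAA", "TAG", "TGA")


def tally_stop_codons(fasta_text):
    """Count stop codons in FASTA text where each record is one 3-nt sequence line.

    Returns (counts, other), where 'other' is the number of sequence lines
    that were not one of the expected stop codons.
    """
    counts = Counter({codon: 0 for codon in STOP_CODONS})
    other = 0
    for line in fasta_text.splitlines():
        line = line.strip().upper()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        if line in counts:
            counts[line] += 1
        else:
            other += 1  # unexpected content, worth inspecting by hand
    return counts, other


# Toy input standing in for the "-find 6 -flanking 0" output (hypothetical).
example = ">a\nTAA\n>b\nTGA\n>c\nTAA\n"
stop_counts, unexpected = tally_stop_codons(example)
```

A non-zero count of unexpected lines would suggest that the output is not purely stop codons and that the assumption above needs checking.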