Skeletonema marinoi de novo genome project
We recently received the first sequence datasets from SciLife Lab. They had performed a "quick and dirty" assembly, without any prior filtering or trimming of the data. I started a filtering analysis in "/data7/skeletonema_mats/20130304" on sparc1.
[2013-03-05] Running "fastqc" on the original files. Output in "/data7/skeletonema_mats/20130304/original_files_FQC".
[2013-03-08] The filtering analysis has finished:
[op] 162460792 sequnce pairs are in order
However, this analysis was run with an earlier version of the assemblyPipeline.py code that contained a bug: the last step (pairSeq.py), which separates mate pairs from singlet sequences, was not done correctly. I have therefore restarted this step.
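For reference, the pair/singlet separation can be sketched as below. This is not the actual pairSeq.py code, only a minimal illustration of the idea. It assumes that mates share a read ID apart from a trailing "/1" or "/2" suffix, and that records are already parsed into (id, sequence, quality) tuples. The names base_id and split_pairs are made up for this sketch. It keeps all reverse reads in memory, so it would need a streaming or sorted-merge variant for datasets of this size.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def base_id(read_id: str) -> str:
    """Return the read name without a trailing /1 or /2 mate suffix."""
    name = read_id.split()[0]  # drop any description after whitespace
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def split_pairs(
    fwd: Iterable[Record], rev: Iterable[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record], List[Record]]:
    """Split filtered reads into intact pairs and two lists of singlets."""
    # Index the reverse reads by their shared name (assumption: names are unique).
    rev_by_id = {base_id(rec[0]): rec for rec in rev}

    pairs: List[Tuple[Record, Record]] = []
    fwd_singlets: List[Record] = []
    for rec in fwd:
        mate = rev_by_id.pop(base_id(rec[0]), None)
        if mate is None:
            fwd_singlets.append(rec)  # mate was removed by filtering
        else:
            pairs.append((rec, mate))  # both mates survived, keep them in order

    # Whatever is left in the index never found a forward mate.
    rev_singlets = list(rev_by_id.values())
    return pairs, fwd_singlets, rev_singlets
```

The three return values correspond to the paired output and the two "singlets" files that the pairing step writes.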
[2013-03-08] Started fastqc analyses of the *FXT* files, the *FXF.CA* and *FXF.CA.FQF* files.
[2013-03-08] The pairSeq.py analysis finished:
[op] 129420991 sequnce pairs are in order
The two "singlets" sequence files contain 9'271'882 and 7'783'701 sequences respectively. I'm transferring the files to Albiorix in order to start the assembly analysis later tonight.
[2013-03-10] Started a CLC assembly analysis, using 8 cores, on Albiorix. The assembly of Littorina sequences (started 2013-03-08, now 95% completed) is still running on 32 cores and uses ~40% of the 96 GB of RAM. The two analyses may compete for resources.
Ran the assembly analysis again, this time using the "-e" flag, to estimate fragment sizes.