Skeletonema marinoi de novo genome project

Introduction

This project is conducted in collaboration with Anders Blomberg, Magnus Alm Rosenblad and Anna Godhe. A description of the project can be found at Anna's webpage.

Data

Coming soon

Assembly analyses

20130304

We recently received the first sequence datasets from SciLife Lab. They had performed a "quick and dirty" assembly, without first filtering or trimming the data. Started a filtering analysis in "/data7/skeletonema_mats/20130304" on sparc1.

Result

[2013-03-05] Running "fastqc" on the original files. Output in "/data7/skeletonema_mats/20130304/original_files_FQC".

[2013-03-08] The filtering analysis has finished:

[op] 162460792 sequnce pairs are in order

However, this analysis was run with an earlier version of the assemblyPipeline.py code that contained a bug. The last step (pairSeq.py), which separates mate pairs from singlet sequences, was not done correctly. I have therefore restarted this step.
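The pairing step can be illustrated with a minimal sketch. This is not the actual pairSeq.py code; it only assumes that, after filtering, each read file may have lost some reads, and that mates share an identifier apart from a trailing "/1" or "/2" suffix. The function names (base_id, split_pairs) and the tuple layout are hypothetical.

```python
def base_id(read_id):
    """Strip a trailing /1 or /2 mate suffix (assumed naming convention)."""
    if read_id.endswith("/1") or read_id.endswith("/2"):
        return read_id[:-2]
    return read_id


def split_pairs(reads1, reads2):
    """Separate reads whose mate survived filtering from singlets.

    reads1, reads2: iterables of (read_id, sequence, quality) tuples.
    Returns (pairs, singlets1, singlets2), preserving the order of reads1.
    """
    reads2 = list(reads2)
    # Index the second file by base identifier for constant-time lookup.
    index2 = {base_id(r[0]): r for r in reads2}

    pairs, singlets1, matched = [], [], set()
    for read in reads1:
        key = base_id(read[0])
        mate = index2.get(key)
        if mate is not None:
            pairs.append((read, mate))
            matched.add(key)
        else:
            singlets1.append(read)

    # Anything in the second file without a matched mate is a singlet.
    singlets2 = [r for r in reads2 if base_id(r[0]) not in matched]
    return pairs, singlets1, singlets2


if __name__ == "__main__":
    r1 = [("a/1", "ACGT", "IIII"), ("b/1", "GGCC", "IIII")]
    r2 = [("a/2", "TTTT", "IIII"), ("c/2", "AAAA", "IIII")]
    pairs, s1, s2 = split_pairs(r1, r2)
    print(len(pairs), len(s1), len(s2))
```

The sketch holds everything in memory, which is fine for illustration but would not scale to files with over a hundred million pairs; a real implementation would stream or index the files on disk.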

[2013-03-08] Started fastqc analyses of the *FXT* files, the *FXF.CA* and *FXF.CA.FQF* files.

[2013-03-08] The pairSeq.py analysis finished:

[op] 129420991 sequnce pairs are in order

The two "singlets" sequence files contain 9'271'882 and 7'783'701 sequences, respectively. I'm transferring the files to Albiorix in order to start the assembly analysis later tonight.
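As a quick sanity check, the counts reported above can be tallied to give the total number of sequences going into the assembly. The variable names are illustrative only; the numbers are taken directly from the pairSeq.py output and the singlet file sizes.

```python
# Counts reported by the pairSeq.py run of 2013-03-08.
pairs_in_order = 129420991
singlets = (9271882, 7783701)

# Each pair contributes two sequences; singlets contribute one each.
total_sequences = 2 * pairs_in_order + sum(singlets)
singlet_fraction = sum(singlets) / total_sequences

print(total_sequences)  # 275897565
print(f"{singlet_fraction:.1%} of sequences are singlets")
```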

[2013-03-10] Started a CLC assembly analysis, using 8 cores, on Albiorix. The assembly of Littorina sequences (started 2013-03-08, now 95% completed) is still running on 32 cores and uses ~40% of the 96 GB of RAM. The two analyses may compete for resources.

20130327

Ran the assembly analysis again, this time using the "-e" flag, to estimate fragment sizes.