Amphiura filiformis de novo genome project

Introduction

Coming soon

Data

  • 5_120719_AC0YY4ACXX_2_indexm1_1.fastq - 170'243'702 sequences (Data_1)
  • 5_120719_AC0YY4ACXX_2_indexm1_2.fastq - 170'243'702 sequences (Data_2)
  • 3_130111_BD1HWHACXX_P389_101_indexm1_1.fastq - 158'112'696 sequences (Data_3)
  • 3_130111_BD1HWHACXX_P389_101_indexm1_2.fastq - 158'112'696 sequences (Data_4)

Analyses

20130305

Settings:

[fastx_trimmer]
f: 6
[cutadapt]
q: 15
o: 10
e: 0.1
n: 1
m: 50
[fastq_quality_filter]
p: 95
k: 20
[clc]
min_dist: 100
max_dist: 450
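
The settings block above uses an INI-like layout with "key: value" pairs. As a minimal sketch, assuming the block is stored as text in this form, Python's standard configparser module can read it as follows. The function name load_settings and the way the values are used afterwards are illustrative assumptions; this is not necessarily how assemblyPipeline.py reads its configuration.

```python
import configparser

# The settings exactly as listed for this analysis.
SETTINGS = """\
[fastx_trimmer]
f: 6
[cutadapt]
q: 15
o: 10
e: 0.1
n: 1
m: 50
[fastq_quality_filter]
p: 95
k: 20
[clc]
min_dist: 100
max_dist: 450
"""


def load_settings(text):
    """Parse INI-style text into {section: {key: value}} (hypothetical helper).

    configparser accepts both '=' and ':' as key/value delimiters by default,
    so the "key: value" lines parse without extra options. All values are
    returned as strings and must be converted by the caller.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = load_settings(SETTINGS)
min_dist = int(settings["clc"]["min_dist"])
max_dist = int(settings["clc"]["max_dist"])
print("clc paired distance range:", min_dist, "-", max_dist)
```

Keeping the values as strings until they are needed avoids guessing the type of each option; here only the two clc distances are converted to integers.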

Running a "fastqc" analysis of the original data (four files).

Result

[2013-03-05] Running a "fastqc" analysis of the original data (four files).

[2013-03-06] Started a filtering analysis of the four files. This analysis was terminated with the following error message:

matsto@sparc1:/data6/amphiura_mats/20130305$ assemblyPipeline.py
fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4
fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 12
fastq_quality_filter: Premature End-Of-File (filename ='Data_3.FXT.CA.fastq')
[--] Building initial dictionary of sequence id's in first file.
[--] Attempting memory garbage collection
[--] Comparing id's in second file to the dictionary.
[--] Comparing id's in first file to the dictionary
[--] Check if sequences in 'pair' files are in the same order
[op] 103772835 sequnce pairs are in order
Traceback (most recent call last):
  File "/home/mastto/bin/pairSeq.py", line 226, in <module>
    main()
  File "/home/mastto/bin/pairSeq.py", line 192, in main
    inFile1 = fastqFile(f1, 'r')                # First sequence file to read from
  File "/home/mastto/bin/pairSeq.py", line 103, in __init__
    file.__init__(self, name, mode)
IOError: [Errno 2] No such file or directory: 'Data_3.FXT.CA.FQF.fastq'
matsto@sparc1:/data6/amphiura_mats/20130305$

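The input files "Data_3" and "Data_4" have Phred+33 (Q33) quality scores, but they were read with an offset of 64, which is why fastx_trimmer reported quality value -29 for '#' (ASCII 35; 35 - 64 = -29). I will restart the analysis of "Data_3" and "Data_4" using the correct configuration.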
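
The quality value in the fastx_trimmer error follows directly from the ASCII offset: the character '#' has code 35, and 35 - 64 = -29, whereas 35 - 33 = 2. The sketch below reproduces that arithmetic and shows one simple way to guess the offset of a FASTQ file before filtering. The function names are hypothetical, the example reads are made up for illustration, and the cut-off of 59 is the usual heuristic (no 64-based encoding uses characters below ';'), not something taken from the pipeline itself.

```python
import io


def quality_value(char, offset):
    """Quality value of one FASTQ quality character for a given ASCII offset."""
    return ord(char) - offset


def iter_quality_lines(handle):
    """Yield the quality line (every fourth line) of each FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the lowest quality character.

    Characters below ';' (ASCII 59) cannot occur in 64-based encodings, so
    their presence implies an offset of 33. Otherwise 64 is returned, which
    is only a guess: high-quality Phred+33 reads can look the same.
    """
    lowest = min(
        (ord(c) for line in quality_lines for c in line),
        default=None,
    )
    if lowest is None:
        raise ValueError("no quality characters found")
    return 33 if lowest < 59 else 64


# Made-up two-record FASTQ used only to exercise the functions.
example = io.StringIO(
    "@read1\nACGTACGT\n+\n####IIII\n"
    "@read2\nTTGCAAGC\n+\nIIIIHHHH\n"
)
offset = guess_phred_offset(iter_quality_lines(example))

print(quality_value("#", 64))  # -29, the value reported by fastx_trimmer
print(quality_value("#", 33))  # 2, the value under Phred+33
print(offset)                  # 33
```

Running such a check on the first few thousand records of each input file before starting the pipeline would catch a mismatch between the files and the configured offset early.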

[2013-03-11] Started "fastqc" analyses of all "*.FXT.fastq", "*.FXT.CA.fastq" and "*.FXT.CA.FQF.fastq" files.

[2013-03-25] Transferring files to Albiorix for assembly.

[2013-03-25] Started the assembly analysis.


20130328

Settings:

[fastx_trimmer]
f: 6
[cutadapt]
q: 15
o: 10
e: 0.1
n: 1
m: 50
[fastq_quality_filter]
p: 95
k: 20
[clc]
min_dist: 100
max_dist: 450

The previous assembly was done using the wrong flags with clc_assembler (see Surirella brebissonii assembly 20130218). I'm therefore doing a new assembly (with its own ID number) of the filtered and sorted sequences from the 20130305 analysis.


20130330

Settings:

[fastx_trimmer]
f: 6
[cutadapt]
q: 15
o: 10
e: 0.1
n: 1
m: 50
[fastq_quality_filter]
p: 95
k: 20
[clc]
min_dist: 100
max_dist: 450

Same analysis as "20130328", only this time I'm estimating fragment size for clc_assembler using the "-e" flag.