The first thing I need to do is subsample all the individual metagenomes. I opted for a 1% subsample so that the downstream steps finish within a day, which makes debugging and testing much faster.
As usual I’m using the Enveomics script FastA.subsample.pl to accomplish this.
I used our cluster's multiple job submission format with these two scripts (pbs script, [submission script](/assets/internal_files/submit_multiple_qsub_subsample.sh)).
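The submission pattern looks roughly like the sketch below. It is a guess at the shape of the two scripts, not a copy of them: the function name `submit_subsample_jobs`, the `INFILE` variable and the `subsample.pbs` file name are all hypothetical, and the `FastA.subsample.pl` flags in the comment (`-f` percentage, `-r` replicates, `-o` output prefix) are as I recall them from the script's help text, so check them against your copy before relying on them.

```bash
# Sketch: submit one subsampling job per metagenome on a PBS/Torque cluster.
# Inside the (hypothetical) subsample.pbs, each job would run something like:
#   FastA.subsample.pl -f 1 -r 1 -o "${INFILE%.fa}.sub" "$INFILE"
# (flags recalled from the script's help text; verify before use)

submit_subsample_jobs() {
    local pbs_script=$1
    shift
    local fasta
    for fasta in "$@"; do
        # qsub -v passes the file name into the job's environment
        qsub -v INFILE="$fasta" "$pbs_script"
    done
}
```

It would be called as `submit_subsample_jobs subsample.pbs metagenomes/*.fa`, which queues one independent job per library.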
I then concatenate all the subsampled libraries into 1 fasta file.
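Pooling the libraries is just a `cat`. A minimal sketch, where `concat_fasta` and the file names in the usage line are made up for illustration:

```bash
# Sketch: pool every subsampled library into one FASTA file.
concat_fasta() {
    local out=$1
    shift
    # Inputs are listed explicitly so the output file is never one of them
    cat "$@" > "$out"
}
```

For example, `concat_fasta all_samples.sub.fa subsampled/*.fa`. This assumes every input ends with a newline; otherwise the last sequence line of one file runs into the first header of the next.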
Building the bowtie index of the final contigs file from megahit
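A minimal sketch of the indexing step, assuming Bowtie 2 (the notes only say "bowtie", so this may not match the version actually used). The function name and index prefix are placeholders.

```bash
# Sketch: build a Bowtie 2 index from the assembled contigs.
build_index() {
    local contigs=$1 prefix=$2
    # bowtie2-build takes the reference FASTA, then the index base name
    bowtie2-build "$contigs" "$prefix"
}
```

megahit writes its assembly to `final.contigs.fa` inside its output directory, so a call might look like `build_index megahit_out/final.contigs.fa contigs_index`.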
Mapping individual sample libraries to the index
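Mapping one library could look like the sketch below. It again assumes Bowtie 2, and it assumes the reads are unpaired FASTA (the subsampling step above works on FASTA files); paired-end data would need `-1`/`-2` instead of `-U`. `map_sample` and the default thread count are illustrative choices.

```bash
# Sketch: map one sample's reads against the contig index.
map_sample() {
    local index=$1 reads=$2 sam=$3 threads=${4:-4}
    # -x: index base name, -f: reads are FASTA, -U: unpaired reads,
    # -S: SAM output file, -p: number of threads
    bowtie2 -x "$index" -f -U "$reads" -S "$sam" -p "$threads"
}
```

For example, `map_sample contigs_index sample01.fa sample01.sam 8`.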
Converting the produced SAM file to its corresponding binary BAM file
Running in a loop
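The conversion loop can be sketched like this, assuming samtools 1.3 or newer (older releases use a different `sort` syntax). The sorting step is my addition: the depth calculation that feeds metaBAT expects sorted BAM files. The function name and the `.sorted.bam` suffix are illustrative.

```bash
# Sketch: convert each SAM file to BAM, then coordinate-sort it.
sam_to_sorted_bam() {
    local sam base
    for sam in "$@"; do
        base=${sam%.sam}
        # -b: write BAM, -S: input is SAM (ignored by samtools 1.x, needed by 0.1.x)
        samtools view -b -S -o "$base.bam" "$sam"
        # Sort by coordinate into a new file
        samtools sort -o "$base.sorted.bam" "$base.bam"
    done
}
```

Calling `sam_to_sorted_bam mapping/*.sam` leaves a `.bam` and a `.sorted.bam` next to each input.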
Run metaBAT default command
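One way to run metaBAT with default parameters is the explicit two-step form below: first summarise per-contig depth across all sorted BAM files, then bin. This is a sketch, not necessarily the exact command used here (metaBAT also ships a `runMetaBat.sh` wrapper, and in MetaBAT 2 the binary is called `metabat2`). `run_metabat` and the depth file name are made up for the example.

```bash
# Sketch: compute contig depths, then bin with metaBAT defaults.
run_metabat() {
    local contigs=$1 out_prefix=$2
    shift 2
    local depth="${out_prefix}.depth.txt"
    # Per-contig coverage table from all sorted BAM files
    jgi_summarize_bam_contig_depths --outputDepth "$depth" "$@"
    # -i: assembly, -a: depth table, -o: prefix for the bin FASTA files
    metabat -i "$contigs" -a "$depth" -o "$out_prefix"
}
```

For example, `run_metabat final.contigs.fa bins/bin mapping/*.sorted.bam`.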
## Metagenomic binning on the uncontaminated oil samples
Looks like I went ahead and did a lot of work without documenting everything, shame! Running megahit on the full dataset failed to finish, so I separated the dataset into clean samples and oiled samples. Since I only care about the clean samples, I pooled all of them into one fasta file and ran megahit on that using the above command.
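For reference, an assembly of the pooled clean samples might be launched as in the sketch below. It assumes the pooled file holds unpaired reads in FASTA format, which is why it uses `-r`; the exact options used for the real run are not recorded here. The function name, file names and thread count are placeholders.

```bash
# Sketch: assemble the pooled clean-sample reads with megahit.
assemble_pooled() {
    local reads=$1 out_dir=$2 threads=${3:-8}
    # -r: single-end reads (FASTA or FASTQ), -o: output directory, -t: threads
    megahit -r "$reads" -o "$out_dir" -t "$threads"
}
```

For example, `assemble_pooled clean_samples.fa megahit_clean 16`; the contigs end up in `megahit_clean/final.contigs.fa`.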
From there the process is exactly as described above, except I used multiple job submissions instead of loops to run each sample's coverage.