Index of /examples/bioinformatics/genomestrip


GenomeSTRiP


General Notes:


GenomeSTRiP is developed by the Broad Institute and is a Java-based program in the same family as GATK, Queue, and Picard. Setting the library path and the Java classpath can be tricky; a working example is shown below. More information is available at http://sccsvc.bu.edu/software/#/package/genomestrip/

Interactive Usage:

# Create Testing Directory

[cjahnke@scc4 ~]$ mkdir /scratch/cjahnke/genomestrip
[cjahnke@scc4 ~]$ cd /scratch/cjahnke/genomestrip


# Load Modules

[cjahnke@scc4 genomestrip]$ module load java/1.8.0_66
[cjahnke@scc4 genomestrip]$ module load genomestrip/2.00.1650
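
Because the library path and classpath are the usual trouble spots, it can help to confirm that the paths used later in this example exist before starting a long run. The function below is a minimal sketch: `check_sv_env` is a hypothetical name, not part of GenomeSTRiP, and it assumes the genomestrip module sets `SV_DIR` as the commands in this example imply.

```shell
# Hypothetical pre-flight check (not part of GenomeSTRiP): confirm that the
# jars and the bwa library directory used in this example exist under SV_DIR.
check_sv_env() {
    local missing=0 jar
    if [ -z "${SV_DIR:-}" ]; then
        echo "SV_DIR is not set; load the genomestrip module first" >&2
        return 1
    fi
    for jar in "${SV_DIR}/lib/SVToolkit.jar" "${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar"; do
        if [ ! -f "$jar" ]; then
            echo "missing jar: $jar" >&2
            missing=1
        fi
    done
    if [ ! -d "${SV_DIR}/bwa" ]; then
        echo "missing bwa library directory: ${SV_DIR}/bwa" >&2
        missing=1
    fi
    # Returns 0 only when every path was found.
    return "$missing"
}
```

Calling `check_sv_env && echo "environment looks usable"` after the module loads prints the message only when all three paths are present.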

# Get Data

[cjahnke@scc4 genomestrip]$ wget http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly18.fasta

# Index Data

[cjahnke@scc4 genomestrip]$ bwa index Homo_sapiens_assembly18.fasta
[bwa_index] Pack FASTA...
[bwa_index] Reverse the packed sequence... 11.13 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 15.92 seconds elapse
[bwa_index] Construct BWT for the reverse packed sequence...
[bwa_index] 15.67 seconds elapse.
[bwa_index] Update BWT... 11.28 sec
[bwa_index] Update reverse BWT... 10.55 sec
[bwa_index] Construct SA from BWT and Occ... 17.38 sec
[bwa_index] Construct SA from reverse BWT and Occ... 17.33 sec


# Run ComputeGenomeMask
# http://gatkforums.broadinstitute.org/gatk/discussion/1499/computegenomemask

[cjahnke@scc4 genomestrip]$ export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}
[cjahnke@scc4 genomestrip]$ java -Xmx2g \
-cp ${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
org.broadinstitute.sv.apps.ComputeGenomeMask \
-R Homo_sapiens_assembly18.fasta \
-O Homo_sapiens_assembly18.mask.chr1.36.fasta \
-readLength 36 \
-sequence chr1
INFO 16:46:55,734 HelpFormatter - ----------------------------------------------------------
INFO 16:46:55,737 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeGenomeMask
INFO 16:46:55,743 HelpFormatter - Program Args: -R Homo_sapiens_assembly18.fasta -O Homo_sapiens_assembly18.mask.chr1.36.fasta -readLength 36 -sequence chr1
INFO 16:46:55,748 HelpFormatter - Executing as cjahnke@scc4.bu.edu on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17.
INFO 16:46:55,749 HelpFormatter - Date/Time: 2016/04/04 16:46:55
INFO 16:46:55,749 HelpFormatter - ----------------------------------------------------------
INFO 16:46:55,750 HelpFormatter - ----------------------------------------------------------
INFO 16:46:56,235 GenomeMaskAlgorithm - Initializing bwa ...
# .
# ..
# ...
INFO {timestamp} GenomeMaskAlgorithm - Completed {10000-the whole file} alignments
# ...
# ..
# .
INFO 18:11:23,830 GenomeMaskAlgorithm - Reference cache performance: cache size: 1000000, hits: 247249225 (100.0%), misses: 494 (0.0%), long 0 (0.0%)
INFO 18:11:23,830 CommandLineProgram - Program completed.
[cjahnke@scc4 genomestrip]$ ls -1
Homo_sapiens_assembly18.fasta.amb
Homo_sapiens_assembly18.fasta.ann
Homo_sapiens_assembly18.fasta.bwt
Homo_sapiens_assembly18.fasta.fai
Homo_sapiens_assembly18.fasta.pac
Homo_sapiens_assembly18.fasta.rbwt
Homo_sapiens_assembly18.fasta.rpac
Homo_sapiens_assembly18.fasta.rsa
Homo_sapiens_assembly18.fasta.sa
Homo_sapiens_assembly18.mask.chr1.36.fasta
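
The invocation above hard-codes one sequence (chr1) and one read length (36). As a rough sketch, the same command can be assembled for other values with a small helper. The name `mask_cmd` is hypothetical and not part of GenomeSTRiP; it reuses only the flags shown above, follows the output naming used in this example, and prints the command (a dry run) rather than executing it. It assumes `SV_DIR` is set by the genomestrip module.

```shell
# Hypothetical helper (not part of GenomeSTRiP): print the ComputeGenomeMask
# command for one reference, sequence name, and read length.
mask_cmd() {
    local ref="$1" seq="$2" readlen="$3"
    local base="${ref%.fasta}"   # strip the .fasta suffix for the output name
    printf 'java -Xmx2g -cp %s:%s org.broadinstitute.sv.apps.ComputeGenomeMask -R %s -O %s -readLength %s -sequence %s\n' \
        "${SV_DIR}/lib/SVToolkit.jar" \
        "${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar" \
        "$ref" "${base}.mask.${seq}.${readlen}.fasta" "$readlen" "$seq"
}

# Dry run: print the commands for chr1 and chr2 at read length 36.
for seq in chr1 chr2; do
    mask_cmd Homo_sapiens_assembly18.fasta "$seq" 36
done
```

Printing the command first makes it easy to inspect the classpath and output file name before committing to a run that, in the session above, took well over an hour for chr1.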


Batch Usage:


scc4% qsub -P {project} genomestrip.qsub
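
The contents of genomestrip.qsub are not shown here. As a rough sketch only, a batch script that repeats the interactive ComputeGenomeMask step might look like the following. The file name `genomestrip_sketch.qsub`, the job name, the runtime limit, and the `#!/bin/bash -l` login-shell line are assumptions for illustration; `-cwd`, `-N`, `-j y`, and `-l h_rt` are standard Grid Engine options. The script assumes the reference and its bwa index already exist in the submission directory.

```shell
# Write a hypothetical batch script (NOT the genomestrip.qsub in this directory).
cat > genomestrip_sketch.qsub <<'EOF'
#!/bin/bash -l
# Run in the directory the job was submitted from.
#$ -cwd
# Job name (assumed).
#$ -N genomemask
# Merge stderr into stdout.
#$ -j y
# Runtime limit (assumed; the interactive chr1 run above took about 1.5 hours).
#$ -l h_rt=04:00:00

module load java/1.8.0_66
module load genomestrip/2.00.1650

# Make the bwa shared library under SV_DIR visible to the JVM.
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

java -Xmx2g \
    -cp ${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    org.broadinstitute.sv.apps.ComputeGenomeMask \
    -R Homo_sapiens_assembly18.fasta \
    -O Homo_sapiens_assembly18.mask.chr1.36.fasta \
    -readLength 36 \
    -sequence chr1
EOF
```

A script written this way would be submitted in the same manner as above, with the sketch's file name in place of genomestrip.qsub.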


Documentation: