BMTagger

Introduction

Best Match Tagger (BMTagger) is an efficient tool that discriminates between human reads and microbial reads without doing an alignment of all reads to the human genome.

Running Instructions

Instructions (README) at the software download site are partially repeated here, but not all of them apply directly on WestGrid systems. For example, blastn is already available on many WestGrid systems, so it does not have to be downloaded.

Steps done once per reference genome are:

1. Make index for bmfilter using command
     bmtool -d reference.fa -o reference.bitmask -w 18
   where reference.fa is the fasta file for the screening database. For
   HMP, this can be the human genome. Output is a binary file generated
   in reference.bitmask.
   To make a compressed index, add flag "-z" to the above command line.

2. Make index for srprism
     srprism mkindex -i reference.fa -o reference.srprism -M 7168
   This generates files with prefix reference.srprism

3. Make blastdb for blast
     makeblastdb -in reference.fa -dbtype nucl
   This generates database files for blastn. makeblastdb and blastn
   can be downloaded from blast distribution:
   ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/
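
The three once-per-reference steps above can be collected into one small shell script. The following is only a sketch under stated assumptions: the helper names run and build_indexes and the DRY_RUN variable are hypothetical, the reference is assumed to be a fasta file whose name ends in .fa, and bmtool, srprism and makeblastdb are assumed to be on the PATH. With DRY_RUN=1 the commands are only printed, so they can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: build the three per-reference indexes (bmfilter, srprism, blast).
# run, build_indexes and DRY_RUN are hypothetical names, not part of BMTagger.

DRY_RUN=1   # 1 = only print the commands; set to 0 to execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_indexes() {
    ref="$1"                 # fasta file for the screening database
    prefix="${ref%.fa}"      # strip a trailing .fa, if present
    run bmtool -d "$ref" -o "$prefix.bitmask" -w 18
    run srprism mkindex -i "$ref" -o "$prefix.srprism" -M 7168
    run makeblastdb -in "$ref" -dbtype nucl
}

build_indexes reference.fa
```

Because these steps are done once per reference genome, such a script would normally be run once and its output files reused for every tagging run.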

Commands for running the tagger depend on the data source for reads.
 - For single reads in fasta format, the command is:
     bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -q 0 -1  -o 
 - For paired reads in fasta format, the command is:
     bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -q 0 -1  -2  -o 
 - For single reads in fastq format, the command is:
     bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -q 1 -1  -o 
 - For paired reads in fastq format, the command is:
     bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -q 1 -1  -2  -o 
 - For reads read directly from SRA, the command is:
     bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -A  -o 
   Note that the application assumes that if a run is specified,
   then the SRA structure has been installed as per the directions in the SRA toolkit.
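
The fasta and fastq variants in the list above differ only in the value of -q and in whether a second read file is given with -2. A minimal sketch of a helper that assembles the command line from the file names is shown here. The function name bmtagger_cmd, the extension-to-format mapping and the example file names are assumptions for illustration only, and the helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print a bmtagger.sh command line for single or paired reads.
# bmtagger_cmd is a hypothetical helper; it assumes -1 and -2 take the
# read files and that the format can be guessed from the file extension.

bmtagger_cmd() {
    out="$1"       # output file name (-o)
    r1="$2"        # first (or only) read file
    r2="${3:-}"    # optional second read file for paired reads

    case "$r1" in
        *.fa|*.fasta) q=0 ;;   # -q 0: fasta input
        *.fq|*.fastq) q=1 ;;   # -q 1: fastq input
        *) echo "unrecognised extension: $r1" >&2; return 1 ;;
    esac

    cmd="bmtagger.sh -b reference.bitmask -x reference.srprism -T tmp -q $q -1 $r1"
    if [ -n "$r2" ]; then
        cmd="$cmd -2 $r2"
    fi
    echo "$cmd -o $out"
}

# Example with hypothetical paired fastq files:
bmtagger_cmd human_ids.txt reads_1.fastq reads_2.fastq
```

Printing the command first makes it easy to inspect before it is pasted into a job script.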
In all of the above scenarios, -b, -x, and -T specify the index for bmfilter,
the index for srprism, and the directory to use for temporary files, respectively.
If no temporary directory is specified, the current working directory is used.
Flag -q values of 0 and 1 specify fasta and fastq input files, respectively.

Output specified by -o is a file name if the input is fasta or fastq,
and it is a directory if the input is a run. The output for, say, run SRR059480,
when -o is my_results, will be a file my_results/SRR059480.blacklist
that contains the SRA indexes of reads found to be human rather than the full ids.
Output files for fasta or fastq inputs contain the ids of reads found to be human.
If the input files are fasta or fastq, one can, instead of printing
the list of reads that are tagged as human, get the reads in the original
input format that are NOT tagged as human by adding flag -X.

See README.bmtagger.txt, README.bmfilter.txt and README.bmtool.txt for more details.

For systems with 3GB-9GB of free RAM, the compressed bitmask format is preferred.
If the available RAM is more than 9GB, an uncompressed bitmask will work faster.
Still, producing either a compressed or an uncompressed bitmask requires 9GB of available memory.
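
The memory guidance above amounts to a simple threshold rule, sketched here as a shell function. The name choose_bitmask_format is hypothetical, the amount of free RAM is assumed to be passed in as a whole number of GB, and the text above does not say what to do below 3GB, so that case is reported as "unspecified" rather than guessed.

```shell
#!/bin/sh
# Sketch: pick a bitmask format from the free RAM in whole GB.
# choose_bitmask_format is a hypothetical helper, not part of BMTagger.

choose_bitmask_format() {
    free_gb="$1"
    if [ "$free_gb" -gt 9 ]; then
        echo "uncompressed"    # more than 9GB: uncompressed works faster
    elif [ "$free_gb" -ge 3 ]; then
        echo "compressed"      # 3GB-9GB: compressed format is preferred
    else
        echo "unspecified"     # below 3GB: not covered by the guidance above
    fi
}

choose_bitmask_format 8
```

This covers only the choice of format for running the tagger; building either kind of bitmask still needs 9GB of available memory, as noted above.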
System    Breezy   Jasper
Version   1.1.0    1.0