DiScRIBinATE

Introduction

DiScRIBinATE is a similarity-based binning method. The user first performs a similarity search of the input metagenomic sequences (reads) against the nr protein database using BLASTx. The resulting BLASTx output is then taken as input by the DiScRIBinATE program.

Restrictions / License Information

All files are copyrighted, but a license is hereby granted STRICTLY for academic and non-profit use.

Running Instructions

SYNTAX:

Run the program with no parameters to get usage messages describing the available parameters. To run DiScRIBinATE, navigate to the folder containing the DiScRIBinATE executable, and then execute the program using the following command:

 

./DiScRIBinATE -i <INPUT_FILE> -min <MIN_BIN_SIZE> -l <MINIMUM_BIT_SCORE>

 

These parameters are explained below.

 

INPUT PARAMETERS (to be passed to the Perl program at run time):

 

Argument 1 : INPUT_FILE

    Name of the input file. The output generated after performing a BLASTx search of the metagenomic sequences against the nr database is taken as input for this program.

    

Argument 2 : MIN_BIN_SIZE (Range: 1 to the total number of reads in the input file. Default: 2)

    Minimum number of reads required to create a bin.

 

Argument 3 : MINIMUM_BIT_SCORE (Default: 35)

    BLASTx hits with a bit score lower than the given value are ignored by the DiScRIBinATE program.
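For scripted runs, the command line above can be assembled programmatically. The following is a minimal Python sketch, not part of the DiScRIBinATE distribution. The function names build_discribinate_command and run_discribinate are hypothetical; only the -i, -min and -l flags and the defaults (2 and 35) come from the description above. It assumes the executable is in the current working directory.

```python
import subprocess


def build_discribinate_command(input_file, min_bin_size=2, min_bit_score=35,
                               executable="./DiScRIBinATE"):
    """Return the argument list for one DiScRIBinATE run.

    Hypothetical helper: only the flags -i, -min and -l and their
    documented defaults are taken from the usage text.
    """
    if min_bin_size < 1:
        # The documented range for MIN_BIN_SIZE starts at 1.
        raise ValueError("MIN_BIN_SIZE must be at least 1")
    return [
        executable,
        "-i", str(input_file),
        "-min", str(min_bin_size),
        "-l", str(min_bit_score),
    ]


def run_discribinate(input_file, **kwargs):
    """Run DiScRIBinATE and raise CalledProcessError on a non-zero exit."""
    cmd = build_discribinate_command(input_file, **kwargs)
    subprocess.run(cmd, check=True)
```

Calling build_discribinate_command("reads.blastx") returns the argument list equivalent to ./DiScRIBinATE -i reads.blastx -min 2 -l 35, where reads.blastx is a placeholder file name.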

 

OUTPUT FORMAT: 

 

Each time the program is executed, two files are generated as output:

a. InputFileName.bins : The format of this file is as follows:

column1 : Taxid of the organism/taxon.

column2 : Name of the organism/taxon.

column3 : The total number of reads in the input file which are categorised under that organism/taxon.

column4 : A comma-separated list of all the reads in the input file which are categorised under that particular bin.

 

The last two lines in this file are the following:

NHBin - Number of reads in the input file which have no BLASTx hits.

UnAss - Number of reads in the input file classified as 'unassigned' due to insignificant alignment parameters.
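The .bins layout described above can be read back with a short script. The following Python sketch is illustrative only and rests on assumptions not stated in this document: that columns are tab-separated, and that each of the two trailing lines starts with its label (NHBin or UnAss) and ends with the count. The function name parse_bins is hypothetical; adjust the separator if the actual file differs.

```python
def parse_bins(lines, sep="\t"):
    """Parse the lines of a .bins file.

    Assumptions (not confirmed by the documentation): columns are
    separated by `sep`, and the NHBin / UnAss lines begin with the
    label and end with an integer count.
    """
    bins = {}
    summary = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] in ("NHBin", "UnAss"):
            summary[tokens[0]] = int(tokens[-1])
            continue
        # column1..column4: taxid, name, read count, comma-separated reads
        taxid, name, count, reads = line.split(sep, 3)
        bins[taxid] = {
            "name": name,
            "count": int(count),
            "reads": [r.strip() for r in reads.split(",") if r.strip()],
        }
    return bins, summary
```

In practice the function would be called on an open file, for example parse_bins(open("InputFileName.bins")), and the length of each "reads" list can be compared with its "count" value as a quick sanity check.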

 

 

b. InputFileName.bin_stats : The format of this file is as follows:

column1 : Taxid of the organism/taxon.

column2 : Name of the organism/taxon.

column3 : The total number of reads in the input file which are categorised under that organism/taxon.

 

The last four lines in this file are the following:

NHBin - Number of reads in the input file which have no BLASTx hits.

UnAss - Number of reads in the input file classified as 'unassigned' due to insignificant alignment parameters.

TAss - Total number of assignments.

TReads - Total number of reads in the input file.
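The .bin_stats file lends itself to a quick summary of how the reads are distributed across taxa. The Python sketch below is a hedged illustration under the same unconfirmed assumptions as before: tab-separated columns, and four trailing lines that each start with their label (NHBin, UnAss, TAss, TReads) and end with the count. The names parse_bin_stats and read_fractions are hypothetical.

```python
SUMMARY_LABELS = ("NHBin", "UnAss", "TAss", "TReads")


def parse_bin_stats(lines, sep="\t"):
    """Parse a .bin_stats file into per-taxon counts and summary totals.

    Assumes `sep`-separated columns and summary lines of the form
    "<label> ... <count>"; neither is confirmed by the documentation.
    """
    counts = {}
    summary = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] in SUMMARY_LABELS:
            summary[tokens[0]] = int(tokens[-1])
            continue
        taxid, name, count = line.split(sep)
        counts[taxid] = (name, int(count))
    return counts, summary


def read_fractions(counts, summary):
    """Return, per taxid, the fraction of all input reads assigned to it."""
    total = summary["TReads"]
    return {taxid: n / total for taxid, (_name, n) in counts.items()}
```

Dividing by TReads rather than TAss expresses each bin as a share of all input reads, including those with no hits or left unassigned; dividing by TAss instead would give the share among assigned reads only.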

System Jasper
Version Mar21-2013