
TopHat

Introduction

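TopHat is a splice junction mapper for RNA-Seq reads. It aligns the reads to a reference genome with the Bowtie short-read aligner and then analyzes those alignments to identify splice junctions between exons.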

Running TopHat on Breezy

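Here is a sample job script for a human genome (hg19) analysis on Breezy, which was used in preliminary testing of TopHat. Replace mycasename and myusername with values appropriate for your calculation. The script takes its thread count from $PBS_NUM_PPN, so when you submit it with qsub, request the cores on a single node (for example, -l nodes=1:ppn=N, with N chosen to suit your job).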

#!/bin/bash
#PBS -S /bin/bash

# Sample tophat run on Breezy.
# Define file locations related to this case.
CASE=mycasename

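BAMS_DIR=/global/scratch/myusername/bam_files
OUTPUT_DIR=/global/scratch/myusername/results/$CASE
# Input reads in FASTQ format. For paired-end data, READS1 and READS2 must be
# the two mate files (the _1/_2 suffixes are only an example naming scheme).
# For single-end data, set READS2 to an empty string.
READS1=${BAMS_DIR}/${CASE}_1.fq
READS2=${BAMS_DIR}/${CASE}_2.fq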

# Any additional TopHat command-line options (none in this example).
TOPHAT_OPTIONS=""

# Bowtie 2 index for the reference genome. Give the index basename (a path
# prefix ending in the index name, here hg19), not just the directory.
export HG19=/global/software/data/hg/hg19/hg19

# Define location of software
TOPHAT_DIR=/global/software/tophat/tophat_2.0.10/bin
export PATH=${TOPHAT_DIR}:$PATH

BOWTIE_DIR=/global/software/bowtie/bowtie2-2.1.0/bin
export PATH=${BOWTIE_DIR}:$PATH
SAMTOOLS_DIR=/global/software/samtools/samtools0119/bin
export PATH=${SAMTOOLS_DIR}:$PATH

BOOST_LIB_DIR=/global/software/boost/gcc/lib
export LD_LIBRARY_PATH=${BOOST_LIB_DIR}:$LD_LIBRARY_PATH

# Provide a more recent version of Python than the Breezy default:
module load python/2.7.2

# On systems where $PBS_NUM_PPN is not available, one could use:
#CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
CORES=$PBS_NUM_PPN

cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"

echo "Running on `hostname`"
echo "Running on $CORES cores."
echo "Using tophat: `which tophat`"
echo "Using bowtie2: `which bowtie2`"
echo "Using SAMtools from ${SAMTOOLS_DIR}"
echo "Using Python: `which python`"
echo "Starting run at: `date`"
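# Make sure the results directory exists.
mkdir -p "${OUTPUT_DIR}"
# TOPHAT_OPTIONS and READS2 are deliberately unquoted: the options may hold
# several words, and an empty READS2 (single-end data) should expand to nothing.
tophat ${TOPHAT_OPTIONS} \
-p "$CORES" \
-o "${OUTPUT_DIR}" \
"$HG19" \
"$READS1" \
$READS2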

echo "Finished at: `date`"
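
Before submitting, it can save queue time to confirm that the inputs the script expects actually exist. The functions below are a minimal sketch, not part of the tested Breezy script: the names check_bt2_index and check_reads are hypothetical, and the index check assumes a standard (small) Bowtie 2 index made of six files named PREFIX.1.bt2 through PREFIX.rev.2.bt2. Large indexes built with the .bt2l extension would need the suffixes adjusted.

```shell
# Hypothetical pre-flight checks for the TopHat job script (a sketch, not
# part of the original Breezy script). Paste these functions into the job
# script, or source them from a separate file, before the tophat call.

# Succeed (exit status 0) if all six files of a standard Bowtie 2 index
# exist for the given basename, e.g. /global/software/data/hg/hg19/hg19.
# Assumes the small-index suffix .bt2; large indexes use .bt2l instead.
check_bt2_index() {
    local prefix=$1 suffix missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "Missing Bowtie 2 index file: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Succeed if every named reads file exists and is non-empty.
# Empty arguments (e.g. an empty READS2 for single-end data) are skipped.
check_reads() {
    local f status=0
    for f in "$@"; do
        [ -z "$f" ] && continue
        if [ ! -s "$f" ]; then
            echo "Missing or empty reads file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, placing the following line just before the tophat command makes the job stop early, with a message on standard error, instead of failing partway through a TopHat run:

check_bt2_index "$HG19" && check_reads "$READS1" "$READS2" || exit 1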

For More Information

2014-06-02: Added sample job script for Breezy.