orthomcl

Introduction

OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families, which makes it a useful utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog (recent paralog) pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are then linked in a similarity graph, and MCL (the Markov Clustering algorithm; Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This step is analogous to the manual review in COG construction. Because MCL clusters on the weights between each pair of proteins, the weights are normalized before running MCL to correct for differences in evolutionary distance.

Running Instructions on bugaboo

Please run:

module load orthomcl

in your PBS script prior to your orthomcl command. Please take a look at the software versions page for the versions available on bugaboo.
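As a rough illustration of where the module load line belongs, the following sketch writes a minimal PBS job script to a file. The job name, resource requests, and the file names orthomcl_job.pbs, orthomcl.config, and pairs.log are all assumptions made for this example, and orthomclPairs stands in for whichever orthomcl command your job actually runs.

```shell
# Write a minimal PBS job script; resource values and file names are placeholders.
cat > orthomcl_job.pbs <<'EOF'
#!/bin/bash
#PBS -N orthomcl_pairs
#PBS -l walltime=02:00:00
#PBS -l mem=4gb

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Load OrthoMCL before any orthomcl command is called.
module load orthomcl

# Illustrative step only: orthomcl.config and pairs.log are hypothetical file names.
orthomclPairs orthomcl.config pairs.log cleanup=no
EOF

# Submit with: qsub orthomcl_job.pbs
```

Adjust the walltime and memory requests to your data set before submitting.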

The config file used to connect to MySQL is slightly different from the original config file. Here is a config file template:

dbVendor=mysql
dbname=michael_orthomcl
oracleIndexTablespace=IndexTablespace
similarSequencesTable=SimilarSequences
orthologTable=Ortholog
inParalogTable=InParalog
coOrthologTable=CoOrtholog
interTaxonMatchView=InterTaxonMatch

where dbname is the name of a MySQL database that must already have been created. Please take a look at

https://www.westgrid.ca/support/software/mysql

for more information about the rules for choosing the name of a MySQL database. Please note that you DO NOT need to write your MySQL user name and password in this config file. The correct user name and password are picked up automatically.
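One way to produce this config file is sketched below. It writes the template shown above for a given database name. The variable names DB_NAME and CONFIG_FILE and the file name orthomcl.config are assumptions for this example; the database name is the one from the template and should be replaced with your own existing database.

```shell
#!/bin/bash
# DB_NAME and CONFIG_FILE are hypothetical names used only in this sketch.
DB_NAME="michael_orthomcl"     # must be a MySQL database that already exists
CONFIG_FILE="orthomcl.config"

# Write the template; no MySQL user name or password is included,
# since those are supplied automatically on bugaboo.
cat > "$CONFIG_FILE" <<EOF
dbVendor=mysql
dbname=${DB_NAME}
oracleIndexTablespace=IndexTablespace
similarSequencesTable=SimilarSequences
orthologTable=Ortholog
inParalogTable=InParalog
coOrthologTable=CoOrtholog
interTaxonMatchView=InterTaxonMatch
EOF

# Show the database line as a quick check.
grep '^dbname=' "$CONFIG_FILE"
```

Keeping the database name in a single variable makes it easy to reuse the same template for several databases.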


System: Bugaboo
Version: 2.0.9