Maximum Likelihood Estimation of Weight Matrices for Targeted Homology Search

P. Menzel, J. Gorodkin and P. F. Stadler
GCB09. 2009.


Genome annotation relies to a large extent on the recognition of homologs of already known genes. The starting point for such protocols is a collection of known sequences from one or more species, from which a model is constructed, either automatically or manually, that encodes the defining features of a single gene or a gene family. The quality of these models ultimately determines the success rate of the homology search. We propose here a novel approach to model construction that not only captures the characteristic motifs of a gene, but also adjusts the search pattern by including phylogenetic information. Computational tests demonstrate that this can lead to a substantial improvement of homology search models.