The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. [...] which critically impact the quality of the alignment. We show that several existing alignment methods arise as a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
[...] atoms. The popular DALI [Holm and Sander (1993)] method is an example of this approach. Other techniques are specially tailored to the large-scale computational demands of rapid searching of large protein databases, sometimes utilizing highly redundant representations of the data; these include geometric hashing [Altschul et al. (1990), Fischer et al. (1994), Wallace, Laskowski and Thornton (1996)], graph algorithms [Taylor (2002)] and clustering methods like VAST [Gibrat, Madej and Bryant (1996)]. Finally, some authors combine these ideas with additional heuristics to produce faster or more accurate algorithms, including CE [Shindyalov and Bourne (1998)] and PROSUP [Lackner et al. (2000)]. Detailed reviews of pairwise structural alignment methods can be found in Brown, Orengo and Taylor (1996), Eidhammer, Jonassen and Taylor (2000) and Lemmen and Lengauer (2000). The profusion of methods reflects the difficulties involved in performing structural alignments: in defining how to measure alignment quality, and in computing "best" alignments efficiently.
It has been well documented in the literature that different algorithms can produce alignments sharing very few amino acid pairings, and that they are sensitive to both the initial alignment and the specific choice of algorithm parameters [Gerstein and Levitt (1998), Godzik (1996), Zu-Kang and Sippl (1996)]. Additional complications arise when trying to determine the significance of the resulting alignments. Although considerable effort has been devoted to this point and important progress made [Gerstein and Levitt (1998), Levitt and Gerstein (1998), Lipman and Pearson (1985), Mizuguchi and Go (1995)], the solutions remain based on heuristics and upper bounds that are hard to interpret. Finally, all the methods described above approach structural alignment as an optimization problem, finding a single best alignment. However, structural comparisons are subject to substantial uncertainties arising from evolutionary divergence, population variability, experimental measurement error and protein conformational variability, not to mention sensitivity to the parameters of assessment metrics and optimization algorithms. To address these sources of variability, approaches based on explicit statistical modeling are desired, and the results of structural comparisons require careful analysis to understand the effect of uncertainty. In this paper we develop a Bayesian statistical approach to pairwise protein structure alignment, combining techniques from statistical shape analysis [Dryden and Mardia (1998), Kendall et al. (1999), Small (1996)] and Bayesian sequence alignment [Liu and Lawrence (1999), Webb, Liu and Lawrence (2002), Zhu, Liu and Lawrence (1998)]. This represents one aspect of a general Bayesian framework developed here and elsewhere [Schmidler (2003, 2004)] and subsequently extended by Schmidler (2007a, 2007) and Wang and Schmidler (2008).
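As a minimal illustration of the shape-analysis ingredient mentioned above, the following sketch performs least-squares rigid superposition (the Kabsch, or Procrustes, solution) of two coordinate sets. It is not the model developed in this paper: it assumes the residue correspondence is already known and fixed, and it yields a single point estimate with no uncertainty attached. Under an isotropic Gaussian error model this least-squares fit coincides with the maximum likelihood estimate of the rotation and translation. The function name `kabsch_superpose` and its return values are hypothetical, chosen only for this example.

```python
import numpy as np


def kabsch_superpose(X, Y):
    """Rigidly superpose Y onto X by least squares.

    X, Y: (n, 3) arrays whose rows are already paired (row i of Y is
    matched to row i of X). This fixed matching is an assumption of the
    sketch; choosing the matching is the alignment problem itself.
    Returns the transformed copy of Y, the 3x3 rotation R, and the RMSD.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Remove translation by centering both coordinate sets.
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean

    # Cross-covariance matrix and its singular value decomposition.
    H = Yc.T @ Xc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last singular direction if needed so that R is a proper
    # rotation (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt

    # Apply the rotation to row vectors, then move onto X's centroid.
    Y_fit = Yc @ R + x_mean
    rmsd = np.sqrt(np.mean(np.sum((X - Y_fit) ** 2, axis=1)))
    return Y_fit, R, rmsd
```

Calling `kabsch_superpose(X, Y)` on two matched sets of C-alpha coordinates returns one optimal superposition. A Bayesian treatment of the kind described here would instead treat the rotation, the translation and the matching itself as unknown quantities with posterior distributions.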
Green and Mardia (2006) and Dryden, Hirst and Melville (2007) independently developed related methods for hierarchical Bayesian alignment of protein active sites rather than whole proteins, and of small molecules, respectively. However, our approach differs in a number of important points: we introduce hierarchical priors on the space of alignments that are equivalent to the standard affine gap penalty of classical alignment approaches but allow us [...]