Evolutionary Bioinformatics Suite

The explosion of genomic sequence data over the past decade has created the need for tools to better connect the sequences of biological macromolecules to their fold, behavior and function. While conventional annotation methods are sometimes helpful, they are insufficient to convey fully the information that is contained within genome sequence databases. It is also clear that the simplest evolution-based tools, often referred to as "comparative genomics", do not resolve many common annotation problems.

To overcome these deficiencies, the Foundation is developing a software suite (in collaboration with a molecular modeling company, Hypercube, Inc.) to examine genomic sequences using more sophisticated models within the context of their evolutionary history, with the expectation that more detailed insights can be gained about protein fold, behavior and function. This examination has come to be known as "phylogenomics". When defining phylogenomics, Jonathan Eisen observed that genomics, even in the version known as comparative genomics, had lagged behind other biological disciplines in exploiting the insights that are offered by the vast experiment that constitutes the three billion years of life on Earth.

Phylogenomics seeks to rectify this through more complex analysis of evolutionary patterns. A phylogenomic analysis begins by identifying homologous genes, those related by common ancestry. It then places an evolutionary analysis of the gene family within a species context, identifying ortholog and paralog relationships (genes separated by a speciation event and by a gene duplication event, respectively). The analysis then examines patterns of amino acid replacement, the tempo of those replacements, and the temporal sequence of events in gene evolution. The farther a phylogenomic analysis proceeds, the more information can be extracted about fold, behavior, and function from the natural history of protein families.
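
The ortholog/paralog step can be illustrated with a minimal sketch. It is not the method used in the Foundation's suite; it assumes the common "species overlap" heuristic, in which an internal node of a gene tree is treated as a duplication when the same species appears on both sides of it, and as a speciation otherwise. The toy tree, the "species|gene" leaf labels, and the function names are all hypothetical and chosen only for illustration.

```python
# Toy gene tree as nested 2-tuples; leaves are hypothetical "species|gene" labels.
TOY_TREE = ((("human|HBA", "mouse|Hba"), ("human|HBB", "mouse|Hbb")), "fish|hb")


def leaves(node):
    """Return all leaf labels under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def species(leaf):
    """Extract the species part of a 'species|gene' label."""
    return leaf.split("|")[0]


def classify_pairs(node, result=None):
    """Label each pair of genes by the event at their last common ancestor.

    Assumption (species overlap heuristic): if the two subtrees of a node
    share a species, the node is a duplication and pairs spanning it are
    paralogs; otherwise it is a speciation and the pairs are orthologs.
    """
    if result is None:
        result = {}
    if isinstance(node, str):
        return result
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species(x) for x in left_leaves} & {species(x) for x in right_leaves}
    relation = "paralog" if shared else "ortholog"
    for a in left_leaves:
        for b in right_leaves:
            result[frozenset((a, b))] = relation
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result


relations = classify_pairs(TOY_TREE)
for pair, relation in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", relation)
```

In this toy tree the two human genes are separated by a duplication node and so are reported as paralogs, while each human gene and its mouse counterpart are reported as orthologs. A real analysis would start from an inferred gene tree with branch lengths and support values, and would typically reconcile it against a trusted species tree rather than rely on this heuristic alone.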

Phylogenomic analyses are not limited to primary sequence data, of course. Similar principles can be extended to structures, pathways, and expression patterns. Phylogenomics has also been used to find key regulatory elements in non-coding genomic regions and to delineate specificity determinants in proteins. More broadly, phylogenomic-type analyses are offering fresh viewpoints in immunology, physiology, neuroscience, mental disease, and "Darwinian medicine", which places human health and disease within an evolutionary perspective.

Phylogenomics can have practical implications for the pharmaceutical industry. For example, in the August 2003 issue of Nature Reviews Drug Discovery, David Searls, Head of the Bioinformatics Division of GlaxoSmithKline, discussed the concept of pharmacophylogenomics. Analysis of paralogs has been explored as a way to understand pathophysiology and to move from biologically interesting but problematic targets to more tractable, druggable ones. The Foundation has been active in applied phylogenomics for several years. Indeed, Searls cites several papers that have emerged from the Foundation, and from the laboratories of its consultants and collaborators, as important to pharmacophylogenomics. Some phylogenomic tools invented or developed at the Foundation, or in the laboratories of its consultants and collaborators, are currently being integrated into the HyperProtein bioinformatics suite in collaboration with Hypercube, Inc.