CoGenT++: an extensive and extensible data environment for computational genomics

Research output: Contribution to journal › Article

Authors

  • Paul Janssen
  • Leonid Goldovsky
  • Dag Ahren
  • Benjamin Audit
  • Ildefonso Cases
  • Nikos Darzenta
  • Anton J. Enright
  • Nuria López-Bigas
  • Jose M. Peregrin-Alvarez
  • Mike Smith
  • Sophia Tsoka
  • Victor Kunin
  • Christos Ouzounis

Abstract

Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility.

Description: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction.

Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing.

Availability: The database and component downloads are accessible at http://cgg.ebi.ac.uk/cogentpp.html
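The abstract notes that derived databases such as ProfUse (phylogenetic profiles) are generated from the ProXSim all-against-all similarities. As a rough illustration only, the sketch below turns pairwise hits into presence/absence profiles across genomes. It assumes a hypothetical flat hit format `(query_protein, target_protein, target_genome, evalue)`, an arbitrary E-value cutoff of 1e-6, and made-up protein and genome names; it is not the ProXSim schema or the ProfUse procedure.

```python
from collections import defaultdict

# Hypothetical hit record: (query_protein, target_protein, target_genome, evalue).
# Illustrative format only; not the actual ProXSim storage layout.


def phylogenetic_profiles(hits, genomes, evalue_cutoff=1e-6):
    """Map each query protein to a 0/1 tuple, one entry per genome in `genomes`.

    A genome scores 1 if the query has at least one hit in it with
    E-value <= evalue_cutoff (the cutoff is an assumed, tunable parameter).
    Proteins with no significant hit at all are omitted; in an
    all-against-all run every protein at least hits itself.
    """
    present = defaultdict(set)
    for query, _target, genome, evalue in hits:
        if evalue <= evalue_cutoff:
            present[query].add(genome)
    return {
        protein: tuple(1 if g in seen else 0 for g in genomes)
        for protein, seen in present.items()
    }


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def group_by_profile(profiles):
    """Group proteins that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return dict(groups)


# Toy example with invented identifiers.
genomes = ["E. coli", "B. subtilis", "M. jannaschii"]
hits = [
    ("ecA", "ecA", "E. coli", 0.0),
    ("ecA", "bsA", "B. subtilis", 1e-40),
    ("ecA", "mjA", "M. jannaschii", 0.5),  # above cutoff, so not counted
    ("ecB", "ecB", "E. coli", 0.0),
    ("ecB", "bsB", "B. subtilis", 1e-20),
]
profiles = phylogenetic_profiles(hits, genomes)
```

Here both `ecA` and `ecB` end up with the profile `(1, 1, 0)`. Proteins that share a profile (or differ by a small Hamming distance) are commonly treated as candidates for a functional link; how CoGenT++ itself scores or thresholds profiles is not described in this record.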

Details

Original language: English
Pages (from-to): 3806-3810
Journal: Bioinformatics
Volume: 21
Issue number: 19
Publication status: Published, 1 Oct 2005

Keywords

  • MySQL, CoGenT, genome meta analysis, ProXSim, genome comparison, computational genomics, data mining

ID: 370656