Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications

Research output: Contribution to journal › Article › peer-review

Standard

Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications. / Janssen, Paul; Goldovsky, Leonid; Kunin, Victor; Darzentas, Nikos; Ouzounis, Christos; Benotmane, Rafi (Peer reviewer); Borgermans, Paul (Peer reviewer).

In: EMBO Reports, Vol. 6, No. 5, 01.05.2005, p. 397-399.

Author

Janssen, Paul ; Goldovsky, Leonid ; Kunin, Victor ; Darzentas, Nikos ; Ouzounis, Christos ; Benotmane, Rafi ; Borgermans, Paul. / Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications. In: EMBO Reports. 2005 ; Vol. 6, No. 5. pp. 397-399.

Bibtex - Download

@article{0e440b4a53904b4ead1b2dc57efe12ed,
title = "Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications",
abstract = "In late 2004, 200 complete genomes had been sequenced and made available to the research community. At the time of writing this viewpoint, that number had further risen to 221 and will have undoubtedly increased again before publication. These genomes, which represent a wide range of species from archaea to human, are a highly valuable knowledge resource for the scientific community. However, the sequencing of a full genome is just the first step in research; it must be followed by the functional characterization of genes and proteins. In this context, it is interesting to see how well represented these sequenced species are in terms of publications. We have thus obtained the number of abstracts published per species and normalized that count by the number of genes in that species to obtain a comparable measure for the number of publications per gene for all completed and published genomes. This simple measure highlights the current knowledge gap between various organisms and could further serve as a guideline for selecting genomes for sequencing projects, high-throughput functional genomics and database annotation efforts.",
keywords = "genomics, genome projects, genome literature, text mining, species knowledge index, SKI",
author = "Paul Janssen and Leonid Goldovsky and Victor Kunin and Nikos Darzentas and Christos Ouzounis and Rafi Benotmane and Paul Borgermans",
note = "Score = 10",
year = "2005",
month = may,
day = "1",
doi = "10.1038/sj.embor.7400412",
language = "English",
volume = "6",
pages = "397--399",
journal = "EMBO Reports",
issn = "1469-3178",
publisher = "Wiley - John Wiley \& Sons, Ltd",
number = "5",

}

RIS - Download

TY - JOUR

T1 - Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications

AU - Janssen, Paul

AU - Goldovsky, Leonid

AU - Kunin, Victor

AU - Darzentas, Nikos

AU - Ouzounis, Christos

A2 - Benotmane, Rafi

A2 - Borgermans, Paul

N1 - Score = 10

PY - 2005/5/1

Y1 - 2005/5/1

N2 - In late 2004, 200 complete genomes had been sequenced and made available to the research community. At the time of writing this viewpoint, that number had further risen to 221 and will have undoubtedly increased again before publication. These genomes, which represent a wide range of species from archaea to human, are a highly valuable knowledge resource for the scientific community. However, the sequencing of a full genome is just the first step in research; it must be followed by the functional characterization of genes and proteins. In this context, it is interesting to see how well represented these sequenced species are in terms of publications. We have thus obtained the number of abstracts published per species and normalized that count by the number of genes in that species to obtain a comparable measure for the number of publications per gene for all completed and published genomes. This simple measure highlights the current knowledge gap between various organisms and could further serve as a guideline for selecting genomes for sequencing projects, high-throughput functional genomics and database annotation efforts.

AB - In late 2004, 200 complete genomes had been sequenced and made available to the research community. At the time of writing this viewpoint, that number had further risen to 221 and will have undoubtedly increased again before publication. These genomes, which represent a wide range of species from archaea to human, are a highly valuable knowledge resource for the scientific community. However, the sequencing of a full genome is just the first step in research; it must be followed by the functional characterization of genes and proteins. In this context, it is interesting to see how well represented these sequenced species are in terms of publications. We have thus obtained the number of abstracts published per species and normalized that count by the number of genes in that species to obtain a comparable measure for the number of publications per gene for all completed and published genomes. This simple measure highlights the current knowledge gap between various organisms and could further serve as a guideline for selecting genomes for sequencing projects, high-throughput functional genomics and database annotation efforts.

KW - genomics

KW - genome projects

KW - genome literature

KW - text mining

KW - species knowledge index

KW - SKI

UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/ezp_27237

UR - http://knowledgecentre.sckcen.be/so2/bibref/2869

U2 - 10.1038/sj.embor.7400412

DO - 10.1038/sj.embor.7400412

M3 - Article

VL - 6

SP - 397

EP - 399

JO - EMBO Reports

JF - EMBO Reports

SN - 1469-3178

IS - 5

ER -
