A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads

Research output: Contribution to journal › Article

Standard

A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads. / Górzcak, Katarzyna; Claesen, Jürgen; Burzykowski, Tomasz.

In: Journal of Computational Biology, Vol. 27, No. 0, 31.12.2019, p. 1-16.

Harvard

Górzcak, K, Claesen, J & Burzykowski, T 2019, 'A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads', Journal of Computational Biology, vol. 27, no. 0, pp. 1-16. DOI: 10.1089/cmb.2019.0272

Vancouver

Górzcak K, Claesen J, Burzykowski T. A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads. Journal of Computational Biology. 2019 Dec 31;27(0):1-16. DOI: 10.1089/cmb.2019.0272

Author

Górzcak, Katarzyna; Claesen, Jürgen; Burzykowski, Tomasz / A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads.

In: Journal of Computational Biology, Vol. 27, No. 0, 31.12.2019, p. 1-16.

BibTeX

@article{cd79e44629ae498990a2ae965c9df079,
title = "A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads",
keywords = "Abundance estimation, multireads, next-generation sequencing",
author = "Katarzyna Górzcak and Jürgen Claesen and Tomasz Burzykowski",
note = "Score=10",
year = "2019",
month = "12",
doi = "10.1089/cmb.2019.0272",
volume = "27",
pages = "1--16",
journal = "Journal of Computational Biology",
number = "0",

}

RIS

TY - JOUR

T1 - A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads

AU - Górzcak,Katarzyna

AU - Claesen,Jürgen

AU - Burzykowski,Tomasz

N1 - Score=10

PY - 2019/12/31

Y1 - 2019/12/31

N2 - RNA sequencing (RNA-seq) is widely used to study gene, transcript, or exon expression. To quantify expression levels, millions of short sequenced reads must be mapped back to a reference genome or transcriptome. Read mapping finds a location to which a read is identical or similar. Based on this alignment, expression summaries, that is, read counts, are generated. However, a read may match multiple locations. Such ambiguously mapped reads (multireads) are often ignored in the analysis, which discards information and may bias expression estimates. We present the general principles underlying multiread allocation and unbiased estimation of the expression level of genes, exons, or transcripts in the presence of multiply mapped reads. These principles are derived from a theoretical concept that identifies important sources of information, such as the number of uniquely mapped reads, the total target length, and the length of the shared target regions. We show with simulation studies that methods incorporating some or all of these sources of information estimate the expression levels of genes, exons, and/or transcripts with higher precision and accuracy than methods that do not use this information. Methods that estimate the abundance of genes, exons, and/or transcripts should therefore take these sources of information into account to achieve good precision and accuracy.

KW - Abundance estimation

KW - multireads

KW - next-generation sequencing

UR - http://ecm.sckcen.be/OTCS/llisapi.dll/open/36889787

U2 - 10.1089/cmb.2019.0272

DO - 10.1089/cmb.2019.0272

M3 - Article

VL - 27

SP - 1

EP - 16

JO - Journal of Computational Biology

T2 - Journal of Computational Biology

JF - Journal of Computational Biology

IS - 0

ER -

ID: 5961475