Aurell, E; Innocenti, N (reprint author; Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel); Zhou, HJ
The bulk and the tail of minimal absent words in genome sequences
Source Publication: PHYSICAL BIOLOGY
Language: English
Keyword: Minimal Absent Words; Copy-mutation Evolution Model; Random Sequence
Abstract: Minimal absent words (MAWs) of a genomic sequence are subsequences that are absent themselves but the subwords of which are all present in the sequence. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the concept of a core of a MAW: a sequence that is present in the genome and closest to a given MAW. We show that in E. faecalis, E. coli and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs. We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity.
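The definition in the abstract can be illustrated with a small brute-force sketch. This is not the authors' method or code; the function names `substrings_up_to` and `minimal_absent_words`, the `max_len` cutoff, and the restriction to words of length 2 or more are assumptions made here for illustration. The sketch uses the standard simplification that a word is a MAW when it is absent while its longest proper prefix and longest proper suffix are both present (all shorter subwords are then present automatically).

```python
def substrings_up_to(seq, max_len):
    """Return the set of all substrings of seq with length 1..max_len."""
    present = set()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            present.add(seq[i:i + k])
    return present


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force list of minimal absent words of length 2..max_len.

    A word w is reported when w does not occur in seq, but both
    w[:-1] (longest proper prefix) and w[1:] (longest proper suffix) do.
    Hypothetical helper for illustration; quadratic in memory, so only
    suitable for short sequences and small max_len.
    """
    present = substrings_up_to(seq, max_len)
    maws = []
    for prefix in present:
        if len(prefix) >= max_len:
            continue  # extending would exceed the length cutoff
        for letter in alphabet:
            word = prefix + letter
            # The prefix is present by construction; check absence of the
            # word itself and presence of its longest proper suffix.
            if word not in present and word[1:] in present:
                maws.append(word)
    # Sort by length first, then alphabetically, for a stable output.
    return sorted(maws, key=lambda w: (len(w), w))


if __name__ == "__main__":
    # Toy example: "AAA" is absent from "AACG" although "AA" is present.
    print(minimal_absent_words("AACG", 3))
    # prints ['AG', 'CA', 'CC', 'GA', 'GC', 'GG', 'AAA']
```

On real genomes the number of candidate words grows quickly, so this enumeration only serves to make the definition concrete; counting the returned words per length would give the kind of length distribution the abstract refers to.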
2016
Volume: 13; Issue: 2; Pages: 26004
Subject Area: Biochemistry & Molecular Biology; Biophysics
DOI: http://dx.doi.org/10.1088/1478-3975/13/2/026004
Indexed By: SCI
Funding Organization: Swedish Science Council [621-2012-2982]; Academy of Finland through its Center of Excellence COIN; Natural Science Foundation of China [11225526]
Document Type: Journal article
Identifier: http://ir.itp.ac.cn/handle/311006/21704
Collection: SCI Journal Articles
Corresponding Author: Innocenti, N (reprint author), Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel.
Recommended Citation
GB/T 7714: Aurell, E, Innocenti, N, Zhou, HJ. The bulk and the tail of minimal absent words in genome sequences[J]. PHYSICAL BIOLOGY, 2016, 13(2): 26004.
APA: Aurell, E, Innocenti, N, & Zhou, HJ. (2016). The bulk and the tail of minimal absent words in genome sequences. PHYSICAL BIOLOGY, 13(2), 26004.
MLA: Aurell, E, et al. "The bulk and the tail of minimal absent words in genome sequences". PHYSICAL BIOLOGY 13.2 (2016): 26004.
Files in This Item:
File Name/Size DocType Version Access License
The bulk and the tai(1879KB) Open Access -- Application Full Text

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.