Institutional Repository of the Institute of Theoretical Physics, Chinese Academy of Sciences
Title: The bulk and the tail of minimal absent words in genome sequences
Authors: Aurell, E; Innocenti, N; Zhou, HJ
Journal: PHYSICAL BIOLOGY
Publication date: 2016
Volume: 13, Issue: 2, Article number: 026004
Keywords: minimal absent words; copy-mutation evolution model; random sequence
Subject classification: Biochemistry & Molecular Biology; Biophysics
DOI: http://dx.doi.org/10.1088/1478-3975/13/2/026004
Corresponding author: Innocenti, N (reprint author), Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel.
Article type: Article
Abstract: Minimal absent words (MAWs) of a genomic sequence are words that are themselves absent from the sequence but whose subwords are all present in it. The characteristic distribution of genomic MAWs as a function of their length has been observed to be qualitatively similar for all living organisms, the bulk being rather short, and only relatively few being long. It has been an open issue whether the reason behind this phenomenon is statistical or reflects a biological mechanism, and what biological information is contained in absent words. In this work we demonstrate that the bulk can be described by a probabilistic model of sampling words from random sequences, while the tail of long MAWs is of biological origin. We introduce the concept of a core of a MAW: the sequences that are present in the genome and closest to a given MAW. We show that in E. faecalis, E. coli and yeast the cores of the longest MAWs, which exist in two or more copies, are located in highly conserved regions, the most prominent example being ribosomal RNAs. We also show that while the distribution of the cores of long MAWs is roughly uniform over these genomes on a coarse-grained level, on a more detailed level it is strongly enhanced in 3' untranslated regions (UTRs) and, to a lesser extent, also in 5' UTRs. This indicates that MAWs and associated MAW cores correspond to fine-tuned evolutionary relationships, and suggests that they can be more widely used as markers for genomic complexity.
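The definition of a minimal absent word in the abstract can be illustrated with a small brute-force sketch. This is not the authors' method, and it would not scale to real genomes; it is only meant to make the definition concrete. The function name `minimal_absent_words`, the `alphabet` and `max_len` parameters, and the choice to scan a single strand only are assumptions made for this illustration. It relies on the fact that if the longest proper prefix and the longest proper suffix of a word occur in the sequence, then every proper subword occurs as well.

```python
from itertools import product


def minimal_absent_words(seq, alphabet="ACGT", max_len=6):
    """Naively list the minimal absent words of seq up to length max_len.

    Illustrative sketch only: exhaustive enumeration, single strand.
    """
    # Collect every substring of length <= max_len that occurs in seq.
    present = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    maws = []
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word in present:
                continue  # the word occurs, so it is not absent
            # An absent single letter is minimal by definition. A longer
            # absent word is minimal when its longest proper prefix and
            # longest proper suffix both occur in seq.
            if k == 1 or (word[:-1] in present and word[1:] in present):
                maws.append(word)
    return maws
```

For example, for the toy sequence `"AAC"` over the two-letter alphabet `"AC"`, the words `CA`, `CC` and `AAA` are absent while all of their proper subwords occur, so `minimal_absent_words("AAC", alphabet="AC", max_len=3)` returns `["CA", "CC", "AAA"]`.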
Category [WOS]: Biochemistry & Molecular Biology; Biophysics
Keywords [WOS]: COMMUNITY RECONSTRUCTION; EFFICIENT COMPUTATION; SPONTANEOUS MUTATION; BACTERIA; PHYLOGENY; MATTER
Indexed in: SCI
Funding: Swedish Science Council [621-2012-2982]; Academy of Finland through its Center of Excellence COIN; Natural Science Foundation of China [11225526]
Language: English
Content type: Journal article
URI: http://ir.itp.ac.cn/handle/311006/21704
Appears in Collections: ITP 2016 Knowledge Output_Journal Articles

Files in This Item:
The bulk and the tail of minimal absent words in genome sequences - Aurell, Innocenti, Zhou - 2016.pdf (1879 KB), open access

Recommended Citation:
Aurell, E, Innocenti, N, Zhou, HJ. The bulk and the tail of minimal absent words in genome sequences[J]. PHYSICAL BIOLOGY, 2016, 13(2): 026004.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.