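Long non-coding RNAs (long ncRNAs, lncRNAs) are transcripts longer than 200 nucleotides that do not encode proteins (Perkel 2013). This somewhat arbitrary cut-off distinguishes long ncRNAs from small regulatory RNAs such as microRNAs (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs) and other short RNAs (Ma 2013).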
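Recent studies indicate that only one-fifth of transcription across the human genome is associated with protein-coding genes (Kapranov 2007), which implies at least four times more long non-coding than coding RNA sequence. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM (Functional Annotation of Mammalian cDNA) have revealed the complexity of this transcription (Carninci 2005). The FANTOM3 project identified about 35,000 non-coding transcripts from about 10,000 distinct loci. These transcripts bear many signatures of mRNAs, including 5' capping, splicing and polyadenylation, but have little or no open reading frame (ORF) (Carninci 2005). Although this abundance of long ncRNAs was unanticipated, the figure is a conservative lower estimate, because the approach omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated) (Cheng 2005). Even so, unambiguously identifying ncRNAs within these cDNA libraries remains challenging, because the approach cannot by itself distinguish non-coding transcripts from protein-coding ones.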
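The working definition used so far (a transcript longer than 200 nucleotides with little or no ORF) can be sketched as a simple filter. The Python below is a minimal illustration only: the function names and the 300-nucleotide ORF ceiling are assumptions made for this example and are not taken from the cited studies. As noted above, a heuristic of this kind cannot reliably separate coding from non-coding transcripts.

```python
# Illustrative heuristic only; names and the ORF ceiling are assumptions of this sketch.
MIN_LNCRNA_LENGTH = 200  # nucleotides; the length cut-off stated in the text
MAX_ORF_LENGTH = 300     # nucleotides; hypothetical ceiling for a "small" ORF

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Return the length in nucleotides (stop codon included) of the longest
    ATG-to-stop open reading frame in the three forward frames.
    ORFs that run off the end without a stop codon are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the currently open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def looks_like_lncrna(seq: str) -> bool:
    """True if the transcript exceeds 200 nt and has only a small ORF or none."""
    return len(seq) > MIN_LNCRNA_LENGTH and longest_orf(seq) <= MAX_ORF_LENGTH
```

For instance, `looks_like_lncrna("A" * 250)` returns `True`, whereas a 366-nucleotide sequence consisting of a single uninterrupted ORF is rejected. Real annotation pipelines combine many further lines of evidence, so this filter should be read as a teaching aid rather than a classifier.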
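The mammalian genome is currently described as a landscape of numerous transcriptional "foci" separated by long stretches of intergenic space (Carninci 2005). Although long ncRNAs are located and transcribed within these intergenic stretches, most are transcribed as complex, interlaced networks of overlapping sense and antisense transcripts that often include protein-coding genes (Kapranov 2007). Genomic sequences within these foci are often shared by several coding and non-coding transcripts on the sense and antisense strands (Birney 2007), giving rise to a complex hierarchy of overlapping isoforms. For example, 3,012 of the 8,961 cDNAs previously annotated in FANTOM2 as truncated coding sequences were later redesignated as novel ncRNA variants of protein-coding cDNAs (Carninci 2005). The abundance and conservation of these interleaved arrangements suggest a biological relationship between coding and non-coding RNAs, but the complexity of the foci frustrates easy evaluation.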
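The GENCODE consortium has collated and analysed a comprehensive set of human long ncRNA annotations together with their genomic organisation, modifications, cellular locations and tissue expression profiles (Derrien 2012). Their analysis indicates that human long ncRNAs show a bias towards two-exon transcripts (Derrien 2012).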
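Many small RNAs, such as microRNAs and snoRNAs, are conserved across diverse species (Bentwich 2005). In contrast, most long ncRNAs are not strongly conserved, which is often cited as evidence that they lack function (Brosius 2005; Struhl 2007). However, some well-studied long ncRNAs are themselves poorly conserved (Nesterova 2001), suggesting that ncRNAs may be subject to different selection pressures (Pang 2006). Unlike mRNAs, which must conserve codon usage and avoid frameshift mutations within a single long ORF, long ncRNAs may be under selection to conserve only short regions that are constrained by structure or by sequence-specific interactions. Selection may therefore act on only small regions of a long ncRNA transcript. Even so, despite their low overall conservation, many long ncRNAs contain strongly conserved elements. For example, 19% of highly conserved phastCons elements occur in known introns and another 32% in unannotated regions (Siepel 2005). Furthermore, a representative set of human long ncRNAs shows small but significant reductions in substitution and insertion/deletion rates, indicative of purifying selection that conserves the integrity of the transcript at the levels of sequence, promoter and splicing (Ponjavic 2007).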
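The poor conservation of ncRNAs may be the result of recent and rapid adaptive selection. For instance, ncRNAs may be more pliable to evolutionary pressures than protein-coding genes, as shown by the existence of many lineage-specific ncRNAs (Pang 2006). Indeed, the conserved regions of the human genome that have undergone recent evolutionary change relative to the chimpanzee genome lie mainly in non-coding regions, many of which have been described in detail (Pollard 2006; Pollard 2006). These include an ncRNA gene that has undergone rapid evolutionary change in humans and is expressed specifically in the Cajal-Retzius cells of the human neocortex (Pollard 2006). Many functionally validated RNAs are also reported to be evolving quickly (Pang 2006; Smith 2004), possibly because these sequences are subject to more flexible structure-function constraints, and new evolutionary innovations may be expected in such sequences. This view is supported by the thousands of sequences in the human genome that are poorly conserved at the primary sequence level but show evidence of conserved RNA secondary structure (Torarinsson 2006; Torarinsson 2008).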
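Large-scale sequencing of cDNA libraries and, more recently, transcriptome sequencing based on next-generation sequencing indicate that mammals have long ncRNAs numbering in the tens of thousands. However, although accumulating evidence suggests that most long ncRNAs are functional (Mercer 2009; Dinger 2009), only a relatively small proportion have been shown to be biologically significant. As of December 2012, about 127 long ncRNAs had been functionally annotated in LncRNAdb, a database of long ncRNAs described in the literature (Amaral 2011).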
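In eukaryotes, RNA transcription is a tightly regulated process. NcRNAs can target several aspects of this process, including transcriptional activators or repressors, components of the transcription reaction such as RNA polymerase (RNAP) II, and even the DNA duplex itself, in order to regulate gene transcription and expression (Goodrich 2006). In combination, these mechanisms may form a regulatory network that, together with transcription factors, finely controls gene expression in complex eukaryotes.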
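NcRNAs modulate the function of transcription factors by several mechanisms, including acting as co-regulators themselves, modifying transcription factor activity, or regulating the activity of co-regulators. For example, the ncRNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis (Feng 2006; Panganiban 2002). Evf-2 is transcribed from an ultraconserved element located between two genes, and sonic hedgehog induces its transcription during forebrain development (Feng 2006). Evf-2 then recruits the Dlx2 transcription factor to the same ultraconserved element, where Dlx2 induces expression of its target gene. The mammalian genome contains other ultraconserved or highly conserved elements that are both transcribed and fulfil enhancer functions, which suggests that Evf-2 may illustrate a general mechanism for tightly regulating important developmental genes with complex expression patterns during vertebrate growth (Pennacchio 2006; Visel 2008). Indeed, the transcription and expression of similar non-coding ultraconserved elements has been shown to be abnormal in human leukaemia and to contribute to apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis (Calin 2007).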
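Local ncRNAs can also recruit transcriptional machinery to regulate the transcription of adjacent protein-coding genes. TLS (translocated in liposarcoma) is an RNA-binding protein that binds the CREB-binding protein and the histone acetyltransferase p300 and inhibits their activities at a target gene, cyclin D1, thereby repressing it. In response to DNA damage signals, long ncRNAs are expressed at low levels and tethered to the 5' regulatory regions of the cyclin D1 gene, where they direct the recruitment of TLS to the cyclin D1 promoter (Wang 2008). Moreover, these local ncRNAs act as ligands that modulate the activity of TLS. In a broader sense, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and to integrate their function into transcriptional programmes.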
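Some genes remain transcribed despite X-chromosome inactivation. A recent study of the chromosomal regions that escape inactivation found that long ncRNAs expressed within them may mediate this escape (Reinius 2010).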
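NcRNAs can also target the general transcription factors that RNAP II requires to transcribe all genes (Goodrich 2006). These general factors include components of the initiation complex that assemble on promoters or take part in transcription elongation. An ncRNA transcribed from an upstream minor promoter of the dihydrofolate reductase (DHFR) gene forms a stable RNA-DNA triplex within the major DHFR promoter, preventing the transcriptional co-factor TFIIB from binding (Martianov 2007). Because thousands of triplexes are known to exist in eukaryotic chromosomes (Lee 1987), this novel mechanism of regulating gene expression may in fact represent a widespread means of controlling promoter usage. The U1 ncRNA can induce transcription initiation by binding to TFIIH and stimulating it to phosphorylate the C-terminal domain of RNAP II (Kwek 2002). In contrast, the ncRNA 7SK represses transcription elongation: together with HEXIM1/2 it forms an inhibitory complex that prevents the general transcription factor PTEFb from phosphorylating the C-terminal domain of RNAP II (Kwek 2002; Yang 2001; Yik 2003), thereby repressing global elongation when the cell is under stress. These mechanisms bypass the specific modes of regulation at individual promoters and act directly on the initiation and elongation machinery, providing a means of rapidly effecting global changes in gene expression.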
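Non-coding repetitive sequences have also been shown to mediate global regulation. The short interspersed nuclear (SINE) Alu elements in humans and the analogous B1 and B2 elements in mice are the most abundant mobile elements in these genomes, comprising about 10% of the human genome and about 6% of the mouse genome, respectively (Lander 2001; Waterston 2002). In response to environmental stresses such as heat shock, these elements are transcribed as ncRNAs by RNAP III (Liu 1995). The resulting ncRNAs bind RNAP II with high affinity and prevent the formation of active pre-initiation complexes (Allen 2004; Espinoza 2004; Espinoza 2007; Mariner & Walters 2008). This allows broad and rapid repression of gene expression in response to stress (Allen 2004; Mariner & Walters 2008).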
A dissection of the functional sequences within Alu RNA transcripts has revealed a modular structure analogous to the organisation of domains in protein transcription factors (Shamovsky 2008). The Alu RNA contains two "arms", each of which may bind one RNAP II molecule, as well as two regulatory domains that repress RNAP II transcription in vitro (Mariner 2008). These two loosely structured domains may even be concatenated to other ncRNAs such as B1 elements to impart their repressive role (Mariner & Walters 2008). The abundance and distribution of Alu elements and similar repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution, with the presence of functional repeat sequence domains being a common characteristic of several known long ncRNAs including Kcnq1ot1, Xlsirt and Xist (Mattick 2003; Mohammad 2008; Wutz 2002; Zearfoss 2003).
In addition to heat shock, the expression of SINE elements (including Alu, B1, and B2 RNAs) increases during cellular stress such as viral infection (Singh 1985) and in some cancer cells (Tang 2005), where they may similarly regulate global changes to gene expression. The ability of Alu and B2 RNA to bind directly to RNAP II provides a broad mechanism to repress transcription (Espinoza 2004; Mariner & Walters 2008). Nevertheless, there are specific exceptions to this global response: Alu or B2 RNAs are not found at activated promoters of genes undergoing induction, such as the heat shock genes (Mariner & Walters 2008). This additional hierarchy of regulation, which exempts individual genes from the generalised repression, also involves a long ncRNA, heat shock RNA-1 (HSR-1). It has been argued that HSR-1 is present in all cells in an inactive state but is activated upon stress to induce the expression of heat shock genes (Shamovsky 2006). The authors found that this activation involves a conformational change in HSR-1 in response to rising temperatures, which permits its interaction with the transcriptional activator HSF-1; HSF-1 subsequently undergoes trimerisation and induces the expression of heat shock genes (Shamovsky 2006). In the broad sense, these examples illustrate a regulatory circuit nested within ncRNAs whereby Alu or B2 RNAs repress general gene expression, while other ncRNAs activate the expression of specific genes.
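In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional mRNA processing. As with small regulatory RNAs such as microRNAs and snoRNAs, these functions often involve complementary base pairing with the target mRNA. The RNA duplex formed between a complementary ncRNA and an mRNA may mask key elements within the mRNA that are required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression, including pre-mRNA processing, splicing, transport, translation and degradation.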
The splicing of mRNA can induce its translation and functionally diversify the repertoire of proteins it encodes. The Zeb2 mRNA, which has a particularly long 5’UTR, requires the retention of a 5’UTR intron that contains an internal ribosome entry site for efficient translation (Beltran 2008). However, retention of the intron is dependent on the expression of an antisense transcript that complements the intronic 5’ splice site (Beltran 2008). Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAα2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAα2 mRNA to form two antagonistic isoforms (Munroe 1991).
NcRNAs may also apply additional regulatory pressures during translation, a property particularly exploited in neurons, where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which were originally derived from tRNAs, are expressed in the mouse and human central nervous system, respectively (Tiedge 1993; Tiedge 1991). BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons (Muslimov 1998). Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression (Wang 2005). Indeed, it was recently shown that BC1 is associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum (Centonze 2007), and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety (Lewejohann 2004).
In addition to masking key elements within single-stranded RNA, the formation of double-stranded RNA duplexes can also provide a substrate for the generation of endogenous siRNAs (endo-siRNAs) in Drosophila and mouse oocytes (Golden 2008). The annealing of complementary sequences, such as antisense or repetitive regions between transcripts, forms an RNA duplex that may be processed by Dicer-2 into endo-siRNAs. Long ncRNAs that form extended intramolecular hairpins may also be processed into siRNAs, as compellingly illustrated by the esi-1 and esi-2 transcripts (Czech 2008). Endo-siRNAs generated from these transcripts seem particularly useful in suppressing the spread of transposable elements within the genome in the germline. However, the generation of endo-siRNAs from antisense transcripts or pseudogenes may also silence the expression of their functional counterparts via RISC effector complexes, acting as an important node that integrates various modes of long and short RNA regulation, as exemplified by Xist and Tsix (see below) (Ogawa 2008).
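Epigenetic modifications, including histone and DNA methylation, histone acetylation and sumoylation, affect many aspects of chromosomal biology, chiefly by remodelling broad chromatin domains and thereby regulating large numbers of genes (Kiefer 2007; Mikkelsen 2007). RNA has been known for some time to be an integral component of chromatin (Nickerson 1989; Rodriguez-Campos 2007), but only recently has its significance in pathways of chromatin modification begun to be appreciated (Chen 2008; Rinn 2007; Sanchez-Elsner 2006).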
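In Drosophila, long ncRNAs induce the expression of homeotic genes by recruiting the trithorax protein Ash1 to Hox regulatory elements and directing its chromatin-modifying functions there (Sanchez-Elsner 2006). Similar models have since been proposed in mammals, where strong epigenetic mechanisms are thought to underlie the embryonic expression profiles of the Hox genes, which remain important throughout human development (Mazo 2007; Rinn 2007). Indeed, the human Hox genes are associated with hundreds of ncRNAs that are expressed sequentially along both the spatial and temporal axes of human development and that define chromatin domains of differential histone methylation and RNA polymerase accessibility (Rinn 2007). One of these ncRNAs, HOTAIR, represses transcription across about 40 kb of a locus other than the one from which it is transcribed by altering the histone trimethylation state. HOTAIR is thought to achieve this by directing, in trans, the action of Polycomb chromatin remodelling complexes, which govern the epigenetic state of the cell and subsequent gene expression. Components of the Polycomb complex, including SUZ12, EZH2 and EED, contain RNA-binding domains that may bind HOTAIR and other similar ncRNAs (Denisenko 1998; Katayama 2005). This example illustrates a broader theme in which ncRNAs recruit a suite of chromatin-modifying proteins to specific genomic loci, underscoring the complexity of currently published genomic maps (Mikkelsen 2007). Indeed, the many associations between long ncRNAs and protein-coding genes may help to shape the localised patterns of chromatin modification that regulate gene expression during development. For example, most protein-coding genes have antisense partners; many tumour suppressor genes are frequently silenced in cancer, and some antisense partners silence them through epigenetic mechanisms (Yu 2008). A recent study found that a tumour suppressor gene and an antisense ncRNA show inversely related expression in leukaemia (Yu 2008). Detailed analysis showed that the antisense ncRNA (CDKN2BAS) can induce changes in the heterochromatin and DNA methylation status of the sense gene by an unknown mechanism, thereby regulating its expression (Yu 2008). Misexpression of such antisense ncRNAs may therefore silence the associated tumour suppressor gene and contribute to cancer.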
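The recently emerging theme of ncRNA-directed chromatin modification was first apparent in genomic imprinting, the phenomenon in which an allele is expressed from only one of the maternal or paternal chromosomes. In general, imprinted genes are clustered together on chromosomes, suggesting that the imprinting mechanism acts on local chromatin domains rather than on individual genes. These clusters are often associated with long ncRNAs whose expression correlates with repression of the linked protein-coding gene on the same allele (Pauler 2007). Indeed, detailed analysis has shown that two such ncRNAs play an important role in directing imprinting (Braidotti 2004).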
Almost all the genes at the Kcnq1 locus are maternally expressed, except the paternally expressed antisense ncRNA Kcnq1ot1 (Mitsuya 1999). Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome (Mancini-Dinardo 2006). It appears that Kcnq1ot1 is able to direct the trimethylation of lysine 9 (H3K9me3) and lysine 27 (H3K27me3) of histone 3 to an imprinting centre that overlaps the Kcnq1ot1 promoter and actually resides within a Kcnq1 sense exon (Umlauf 2004). Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the Kcnq1 locus on the paternal chromosome, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation (Umlauf 2004). A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA, Air, that is responsible for the silencing of neighbouring genes at the Igf2r locus on the paternal chromosome (Sleutels 2002; Zwart 2001). The presence of allele-specific histone methylation at the Igf2r locus suggests that Air also mediates silencing via chromatin modification (Fournier 2002).
The inactivation of an X-chromosome in female placental mammals is directed by one of the earliest and best characterised long ncRNAs, Xist (Wutz 2007). The expression of Xist from the future inactive X-chromosome, and its subsequent coating of the inactive X-chromosome, occurs during early embryonic stem cell differentiation. Xist expression is followed by irreversible layers of chromatin modifications. These include the loss of the histone H3K9 acetylation and H3K4 methylation that are associated with active chromatin, and the induction of repressive chromatin modifications including H4 hypoacetylation, H3K27 trimethylation (Wutz 2007), H3K9 hypermethylation and H4K20 monomethylation as well as H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes (Morey 2004). Xist RNA also localises the histone variant macroH2A to the inactive X-chromosome (Costanzi 1998). Additional ncRNAs are present at the Xist locus, including an antisense transcript, Tsix, which is expressed from the future active chromosome and is able to repress Xist expression by the generation of endogenous siRNA (Ogawa 2008). Together these ncRNAs ensure that only one X-chromosome is active in female mammals.
Telomeres form the terminal region of mammalian chromosomes, are essential for chromosome stability and aging, and play central roles in diseases such as cancer (Blasco 2007). Telomeres had long been considered transcriptionally inert DNA-protein complexes until it was recently shown that telomeric repeats may be transcribed as telomeric RNAs (TelRNAs) (Schoeftner 2008) or telomeric repeat-containing RNAs (Azzalin 2007). These ncRNAs are heterogeneous in length, are transcribed from several sub-telomeric loci and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere-specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss (Azzalin 2007). In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity (Schoeftner 2008). Although these studies are at an early stage, they suggest a role for telomeric ncRNAs in various aspects of telomere biology.