Title:
Data-based RNA-seq simulations by binomial thinning.
Authors:
Gerard D; Department of Mathematics and Statistics, American University, Massachusetts Ave NW, Washington, DC, 20016, USA.
Source:
BMC bioinformatics [BMC Bioinformatics] 2020 May 24; Vol. 21 (1), pp. 206. Date of Electronic Publication: 2020 May 24.
Publication Type:
Journal Article
Language:
English
Imprint Name(s):
Original Publication: [London] : BioMed Central, 2000-
MeSH Terms:
Computer Simulation*
Databases, Genetic*
RNA-Seq*
Algorithms ; Gene Expression Profiling ; Humans ; Principal Component Analysis ; Software ; Exome Sequencing
Contributed Indexing:
Keywords: Confounders; Differential expression; Factor analysis; RNA-seq; Scaling factors; Simulation
Entry Date(s):
Date Created: 20200526 Date Completed: 20200623 Latest Revision: 20240328
Update Code:
20240329
PubMed Central ID:
PMC7245910
DOI:
10.1186/s12859-020-3450-9
PMID:
32448189
Academic Journal
Background: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, because real data often violate the assumptions of such models, this can result in unsubstantiated claims about a method's performance.
Results: Rather than generating data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit the realistic (annoying) attributes of real data. This lets developers of RNA-seq methods assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets.
Conclusions: Using data simulated from a theoretical model can substantially impact the results of a study. We developed more realistic simulation techniques for RNA-seq data. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
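The idea named in the title, adding signal to real counts by binomial thinning, can be illustrated with a small two-group sketch. It rests on one property: if Z is drawn as Binomial(Y, q) given a count Y, then E[Z] = q E[Y] whatever the distribution of Y, so thinning one group's counts by q = 2^beta shifts the log2 ratio of group means by beta while leaving the other features of the real data in place. The code below is a hypothetical illustration only, not the seqgendiff API or the paper's exact procedure; the names `thin_two_group` and `binomial_thin`, the 0/1 group design, and the rule of shifting the exponent by max(0, beta) so that every thinning probability stays at or below 1 are assumptions made for this example.

```python
import random
from typing import List, Sequence


def binomial_thin(count: int, prob: float, rng: random.Random) -> int:
    """Draw Binomial(count, prob) by summing `count` Bernoulli(prob) trials."""
    return sum(1 for _ in range(count) if rng.random() < prob)


def thin_two_group(
    counts: Sequence[Sequence[int]],
    group: Sequence[int],
    log2_fc: Sequence[float],
    seed: int = 0,
) -> List[List[int]]:
    """Hypothetical sketch: add a two-group log2 fold change to real counts.

    counts:  genes x samples matrix of non-negative integer counts.
    group:   0/1 indicator per sample (assumed two-group design).
    log2_fc: desired log2 fold change (group 1 vs group 0) for each gene.
    """
    rng = random.Random(seed)
    thinned = []
    for row, beta in zip(counts, log2_fc):
        new_row = []
        for y, x in zip(row, group):
            # Thinning can only shrink counts, so the exponent is shifted to be <= 0:
            # c = beta * x - max(0, beta). A negative beta thins group 1,
            # a positive beta thins group 0 instead; beta = 0 leaves the gene as-is.
            c = beta * x - max(0.0, beta)
            new_row.append(binomial_thin(y, 2.0 ** c, rng))
        thinned.append(new_row)
    return thinned


if __name__ == "__main__":
    counts = [[100, 120, 90, 110], [50, 40, 60, 55]]
    group = [0, 0, 1, 1]
    # First gene: group 1 counts are roughly halved (log2 FC of -1).
    # Second gene: log2 FC of 0, so its counts are returned unchanged.
    print(thin_two_group(counts, group, log2_fc=[-1.0, 0.0], seed=1))
```

Summing Bernoulli draws keeps the sketch dependency-free; a real implementation would more likely call a vectorized binomial sampler, and the actual package additionally handles general designs and confounding, which this example does not attempt.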