Title:
UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets.
Authors:
Hozumi Y; Department of Mathematics, Michigan State University, MI, 48824, USA.
Wang R; Department of Mathematics, Michigan State University, MI, 48824, USA.
Yin C; Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL, 60607, USA.
Wei GW; Department of Mathematics, Michigan State University, MI, 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI, 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI, 48824, USA.
Source:
Computers in biology and medicine [Comput Biol Med] 2021 Apr; Vol. 131, pp. 104264. Date of Electronic Publication: 2021 Feb 22.
Publication type:
Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Language:
English
Imprint Name(s):
Publication: New York : Elsevier
Original Publication: New York, Pergamon Press.
MeSH Terms:
Algorithms*
Databases, Nucleic Acid*
Genome, Viral*
Mutation*
Phylogeny*
COVID-19/*genetics
SARS-CoV-2/*genetics
Humans
Grant Information:
R01 GM126189 United States GM NIGMS NIH HHS
Contributed Indexing:
Keywords: COVID-19; PCA; SARS-CoV-2; UMAP; t-SNE
Entry Date(s):
Date Created: 20210301; Date Completed: 20210329; Latest Revision: 20221215
Update Code:
20240105
PubMed Central ID:
PMC7897976
DOI:
10.1016/j.compbiomed.2021.104264
PMID:
33647832
Scientific journal
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emergent large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. The UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
(Copyright © 2021 Elsevier Ltd. All rights reserved.)
Update of: ArXiv. 2020 Dec 30;:. (PMID: 33398244)
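
Illustrative sketch (not part of the catalog record or of the paper): the abstract describes clustering mutation data through a Jaccard distance-based representation followed by K-means. The Python sketch below shows only those two steps on a toy example, under stated assumptions. The mutation labels `m1` to `m10`, the six samples, and all function names are hypothetical. The dimension-reduction step the paper studies (PCA, t-SNE, or UMAP) is deliberately left out because it needs third-party libraries; it would transform the rows of `features` before they are passed to K-means. The farthest-point initialization is a choice made here for determinism, not something stated in the abstract.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def jaccard_features(samples):
    """Represent each sample by its Jaccard distances to every sample."""
    return [[jaccard_distance(s, t) for t in samples] for s in samples]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far (an assumption of this sketch).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: each sample is the set of mutations it carries.
samples = [
    {"m1", "m2", "m3", "m4"},
    {"m1", "m2", "m3", "m4", "m5"},
    {"m1", "m2", "m4"},
    {"m6", "m7"},
    {"m6", "m7", "m8"},
    {"m6", "m7", "m9", "m10"},
]

features = jaccard_features(samples)
# A reduction step (PCA, t-SNE, or UMAP) would be applied to `features` here.
labels = kmeans(features, k=2)
print(labels)
```

In this toy case the first three samples share no mutations with the last three, so the two groups are separated by a Jaccard distance of 1 and K-means places them in different clusters.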
