Item title:

A generalized linear mixed model association tool for biobank-scale data.

Authors:
Jiang L; Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.; School of Life Sciences, Westlake University, Hangzhou, China.
Zheng Z; Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.
Fang H; School of Life Sciences, Westlake University, Hangzhou, China.; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China.
Yang J; Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.; School of Life Sciences, Westlake University, Hangzhou, China.; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China.
Source:
Nature genetics [Nat Genet] 2021 Nov; Vol. 53 (11), pp. 1616-1621. Date of Electronic Publication: 2021 Nov 04.
Publication type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Imprint Name(s):
Original Publication: New York, NY : Nature Pub. Co., c1992-
MeSH Terms:
Algorithms*
Biological Specimen Banks*/statistics & numerical data
Linear Models*
Models, Genetic*
Adult; Aged; Case-Control Studies; Genetic Variation; Genome-Wide Association Study/statistics & numerical data; Genotype; Humans; Middle Aged; Phenotype; United Kingdom
Grant Information:
MC_PC_17228 United Kingdom MRC_ Medical Research Council; MC_QA137853 United Kingdom MRC_ Medical Research Council
Entry Date(s):
Date Created: 20211105; Date Completed: 20211227; Latest Revision: 20230207
Update Code:
20240105
DOI:
10.1038/s41588-021-00954-4
PMID:
34737426
Scientific journal
Compared with linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. In the present study, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool, fastGWA-GLMM, that is severalfold to orders of magnitude faster than the state-of-the-art tools when applied to the UK Biobank (UKB) data and scalable to cohorts with millions of individuals. We show by simulation that the fastGWA-GLMM test statistics of both common and rare variants are well calibrated under the null, even for traits with extreme case-control ratios. We applied fastGWA-GLMM to the UKB data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin), and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.
(© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.)
