Summary of major biobanks and cohorts v1


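This post lists the major biobanks and cohorts worldwide, for reference only. For now it gives only basic information for each biobank and cohort: sample size (approximate), location, website link, and a brief description. It will be updated over time; the next steps are to fill in abbreviations, add study type and ancestry information, split the sample size into total sample size and genotyped sample size, and add links to the corresponding publicly available data. (This list was compiled by hand, so mistakes are inevitable. If you spot omissions or errors, please point them out in the comments. Thank you!)

This post is part of CTGCatalog (Complex Trait Genetics Catalog), which collects commonly used reference data and resources, publicly available sumstats, and commonly used tools in the field of complex trait genetics.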

Contents: Biobanks and Cohorts v1 (20221006)

  • Biobank of the Americas
  • Biobank Graz
  • BioBank Japan
  • BioMe
  • BioVU
  • CanPath – Ontario Health Study
  • China Kadoorie Biobank
  • Colorado Center for Personalized Medicine
  • deCODE Genetics
  • Estonian Biobank
  • FinnGen
  • Generation Scotland
  • Genes & Health
  • HUNT
  • IARC Biobank
  • Lifelines
  • Massachusetts General Brigham Biobank
  • Michigan Genomics Initiative
  • Million Veteran Program (MVP)
  • National Biobank of Korea
  • Nigerian 100K Genome Project
  • Penn Medicine Biobank
  • Qatar Biobank
  • QIMR Berghofer – QIMR Biobank (QSkin and GenEpi)
  • Taiwan Biobank
  • The Malaysian Cohort (TMC)
  • Tohoku Medical Megabank (TMM)
  • UCLA Precision Health Biobank
  • Uganda Genome Resource
  • UK Biobank


UK Biobank (UKB)

  • SAMPLE SIZE: ~500k
  • URL:
  • DESCRIPTION: UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is regularly augmented with additional data and is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. It is a major contributor to the advancement of modern medicine and treatment and has enabled several scientific discoveries that improve human health.
  • CITATION: Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L. T., Sharp, K., … & Marchini, J. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 562(7726), 203-209.


FinnGen

  • SAMPLE SIZE: ~343k
  • LOCATION: Finland
  • URL:
  • DESCRIPTION: The FinnGen study, launched in Finland in the autumn of 2017, is a unique study that combines genome information with digital health care data. The FinnGen study is an unprecedented global research project representing one of the largest studies of this type. The project aims to improve human health through genetic research, and ultimately to identify new therapeutic targets and diagnostics for treating numerous diseases. The collaborative nature of the project is exceptional compared to many ongoing studies, and all the partners are working closely together to ensure appropriate transparency, data security and ownership.
  • CITATION: Kurki, M. I., Karjalainen, J., Palta, P., Sipilä, T. P., Kristiansson, K., Donner, K., … & Nelis, M. (2022). FinnGen: Unique genetic insights from combining isolated population and national health register data. medRxiv.

Estonian Biobank

  • SAMPLE SIZE: ~200k
  • LOCATION: Estonia
  • URL:
  • DESCRIPTION: The Estonian Biobank has established a population-based biobank of Estonia with a current cohort size of more than 200,000 individuals (genotyped with genome-wide arrays), reflecting the age, sex and geographical distribution of the adult Estonian population. Considering the fact that about 20% of Estonia’s adult population has joined the programme, it is indeed a database that is very important for the development of medical science both domestically and internationally.
  • CITATION: Leitsalu, L., Haller, T., Esko, T., Tammesoo, M. L., Alavere, H., Snieder, H., … & Metspalu, A. (2015). Cohort profile: Estonian biobank of the Estonian genome center, university of Tartu. International journal of epidemiology, 44(4), 1137-1147.


Lifelines

  • SAMPLE SIZE: ~167k
  • LOCATION: Netherlands
  • URL:
  • DESCRIPTION: Lifelines is a large, multigenerational cohort study that includes over 167,000 participants (10%) from the northern population of the Netherlands. We included participants from three generations, who are followed for at least 30 years, to obtain insight into healthy ageing. The aim of Lifelines is to be a resource for the national and international scientific community.
  • CITATION: Scholtens, S., Smidt, N., Swertz, M. A., Bakker, S. J., Dotinga, A., Vonk, J. M., … & Stolk, R. P. (2015). Cohort Profile: LifeLines, a three-generation cohort study and biobank. International journal of epidemiology, 44(4), 1172-1180.


HUNT

  • SAMPLE SIZE: ~88k
  • LOCATION: Norway
  • URL:
  • DESCRIPTION: HUNT Biobank is an established and modern research biobank with high-technology equipment for storage, analysis, sample handling and delivery of samples. Our samples satisfy high quality standards and are stored in accordance with the Data Inspectorate's laws and regulations. HUNT Biobank engages in sample handling from The Nord-Trøndelag Health Study (HUNT) and the Cohort of Norway (CONOR), and can receive samples from other researchers and research projects for storage, analysis and processing of DNA. We do not store samples from private individuals.
  • CITATION: Brumpton, B. M., Graham, S., Surakka, I., Skogholt, A. H., Løset, M., Fritsche, L. G., … & Willer, C. J. (2021). The HUNT Study: a population-based cohort for genetic research. medRxiv.

Generation Scotland

  • SAMPLE SIZE: ~24k
  • LOCATION: Scotland
  • URL:
  • DESCRIPTION: Generation Scotland is a research study looking at the health and well-being of volunteers and their families. Generation Scotland combines responses to questionnaires of health and well-being from birth through life. We combine this with NHS health records and innovative laboratory science to understand health trajectories. We work closely with researchers and our volunteers to create a rich evidence base for understanding health. Through this rigorous, ethical and safe approach to research, we seek to enable meaningful change in public health.  
  • CITATION: Smith, B. H., Campbell, A., Linksted, P., Fitzpatrick, B., Jackson, C., Kerr, S. M., … & Morris, A. D. (2013). Cohort Profile: Generation Scotland: Scottish Family Health Study (GS: SFHS). The study, its participants and their potential for genetic research on health and illness. International journal of epidemiology, 42(3), 689-700.

East London Genes & Health

  • SAMPLE SIZE: ~100k
  • URL:
  • DESCRIPTION: Genes & Health is a huge long-term study of 100,000 people of Bangladeshi and Pakistani origin. We will link genes with health records, to study disease and treatments. Some volunteers may be invited for further studies. We are inviting volunteers to take part in two regions of the UK: East London (East London Genes & Health) and Bradford (Bradford Genes & Health).
  • CITATION: Finer, S., Martin, H. C., Khan, A., Hunt, K. A., MacLaughlin, B., Ahmed, Z., … & van Heel, D. A. (2020). Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. International journal of epidemiology, 49(1), 20-21i.

deCODE Genetics

  • SAMPLE SIZE: ~250k
  • LOCATION: Iceland
  • URL:
  • DESCRIPTION: deCODE leads the world in the discovery of genetic risk factors for common diseases. Our gene discovery engine is driven by our unique approach and resources, including detailed genetic and medical information on some 500,000 individuals from around the globe taking part in our discovery work and proprietary statistical algorithms and informatics tools for gathering, analyzing, visualizing and storing large amounts of data.

The International Agency for Research on Cancer (IARC) Biobank (IBB)

  • SAMPLE SIZE: ~560k
  • LOCATION: France
  • URL:
  • DESCRIPTION: The IARC BioBank (IBB) is one of the largest, most varied and richest international collections of samples in the world. The Biobank is publicly funded (approximately 60% of its budget is provided by IARC Participating States through the regular budget and the remainder is from research grants) and hosts over 50 different studies, led or coordinated by IARC scientists. The IBB contains both population-based collections from research projects focusing on gene-environment interactions (as in the European Prospective Investigation into Cancer and Nutrition (EPIC) study) and disease-based collections which focus on biomarkers (as in the International Head and Neck Cancer Epidemiology (INHANCE)). Study designs include case-series, prevalence studies, case-control and cohort studies, etc. The IBB contains 5.1 million biological samples from 562,000 individuals. Four million of the samples are from the EPIC study (over 370,000 individuals) and about one million samples are from other collections (close to 200,000 individuals). Most of the samples are body fluids, including plasma, serum and urine, as well as extracted DNA samples.

Biobank Graz

  • SAMPLE SIZE: ~1200k
  • LOCATION: Austria
  • URL:
  • DESCRIPTION: Biobank Graz is one of the largest and most well-known clinical biobanks in the world. Around 20 million individual specimens of body fluids and human tissue are stored here. Biobank Graz allows access to these specimens and associated data for scientific research purposes. The common goal is to develop approaches to diagnosing and treating disease.
  • CITATION: Huppertz, B., Bayer, M., Macheiner, T., & Sargsyan, K. (2016). Biobank Graz: the hub for innovative biomedical research. Open journal of bioresources, 3(1).


China Kadoorie Biobank (CKB)

  • SAMPLE SIZE: ~500k
  • LOCATION: China
  • URL:
  • DESCRIPTION: The China Kadoorie Biobank is one of the world’s largest prospective cohort studies. A long-term collaboration between the UK and China, it aims to generate reliable evidence about the lifestyle, environmental and genetic determinants of a wide range of common diseases that can inform disease prevention, risk prediction and treatment worldwide.
  • CITATION: Chen, Z., Chen, J., Collins, R., Guo, Y., Peto, R., Wu, F., & Li, L. (2011). China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. International journal of epidemiology, 40(6), 1652-1666.

Taiwan Biobank (TWB)

  • SAMPLE SIZE: ~150k
  • LOCATION: China, Taiwan
  • URL:
  • DESCRIPTION: The Taiwan Biobank (TWB) is an ongoing prospective study of over 150,000 individuals aged 30-70 recruited from across Taiwan beginning in 2012. A comprehensive list of phenotypes was collected for each consented participant at recruitment and follow-up visits through structured interviews and physical measurements. Biomarkers and genetic data were also generated for all participants from blood and urine samples.
  • CITATION: Feng, Y. C. A., Chen, C. Y., Chen, T. T., Kuo, P. H., Hsu, Y. H., Yang, H. I., … & Lin, Y. F. (2021). Taiwan Biobank: a rich biomedical research database of the Taiwanese population. medRxiv.

BioBank Japan (BBJ)

  • SAMPLE SIZE: ~200k
  • LOCATION: Japan
  • URL:
  • DESCRIPTION: In 2003, BioBank Japan (BBJ) started developing one of the world’s largest disease biobanks, creating a foundation for research aimed at achieving medical care tailored to the individual traits of each patient. From a total of 260,000 patients representing 440,000 cases of 51 primarily multifactorial (common) diseases, BBJ has collected DNA, serum, medical records (clinical information), etc. with their consent. No less than 5,800 items of screened information are available for research, including the patients’ survival information, with 95% of the patients tracked over an average of 10 years. In addition to large-scale genomic analyses, omics analyses including whole genome sequencing and metabolome/proteome analyses have been performed on the DNA, serum and other biological samples collected, producing significant research findings. The genomic information acquired through the analyses continues to be used as data. The biological samples and data are widely distributed and used by researchers.
  • CITATION: Nagai, A., Hirata, M., Kamatani, Y., Muto, K., Matsuda, K., Kiyohara, Y., … & Kubo, M. (2017). Overview of the BioBank Japan Project: study design and profile. Journal of epidemiology, 27(Supplement_III), S2-S8.

Tohoku Medical Megabank (TMM)

  • SAMPLE SIZE: ~157k
  • LOCATION: Japan
  • URL:
  • DESCRIPTION: Tohoku University Tohoku Medical Megabank Organization was founded to establish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. The organization has been developing a biobank that combines medical and genome information during the process of rebuilding the community medical system and supporting health and welfare in the Tohoku area. The information from the brand-new biobank will create a new medical system, and, based on the findings of its analysis, the organization aims to attract more medical practitioners from all over the country to the area, promote industry-academic partnerships, create employment in related fields, and restore the medical system in Tohoku.
  • CITATION: Kuriyama, S., Yaegashi, N., Nagami, F., Arai, T., Kawaguchi, Y., Osumi, N., … & Tohoku Medical Megabank Project Study Group. (2016). The Tohoku medical megabank project: design and mission. Journal of epidemiology, 26(9), 493-511.

National Biobank of Korea

  • LOCATION: Korea
  • URL:
  • DESCRIPTION: The NBK is the national control center for the collection, management, and utilization of human bioresources in Korea. The NBK manages the KBN and contributes to the development of policies related to human bioresources, standardization of human bioresource management, and advancement of domestic biobanks through developing and providing support for human bioresource technologies. To guarantee fairness in bioresource distribution and to develop an efficient distribution system, the NBK also serves as the human bioresource supply hub that supports national healthcare and medical R&D.
  • CITATION: Cho, S. Y., Hong, E. J., Nam, J. M., Han, B., Chu, C., & Park, O. (2012). Opening of the national biobank of Korea as the infrastructure of future biomedical science in Korea. Osong public health and research perspectives, 3(3), 177-184.

Qatar Biobank

  • LOCATION: Qatar
  • URL:
  • DESCRIPTION: Qatar Biobank, a center within Qatar Foundation, was created in collaboration with Hamad Medical Corporation and the Ministry of Public Health to enable local scientists to conduct medical research on prevalent health issues in Qatar.
  • CITATION: Al Kuwari, H., Al Thani, A., Al Marri, A., Al Kaabi, A., Abderrahim, H., Afifi, N., … & Elliott, P. (2015). The Qatar Biobank: background and methods. BMC public health, 15(1), 1-9.

The Malaysian Cohort (TMC)

  • SAMPLE SIZE: ~100k
  • LOCATION: Malaysia
  • URL:
  • DESCRIPTION: The Malaysian Cohort study was initiated in 2005 by the Malaysian government. The top-down approach to this population-based cohort study ensured the allocation of sufficient funding for the project which aimed to recruit 100 000 individuals aged 35–70 years. Participants were recruited from rural and urban areas as well as from various socioeconomic groups. The main objectives of the study were to identify risk factors, to study gene-environment interaction and to discover biomarkers for the early detection of cancers and other diseases.
  • CITATION: Jamal, R., Syed Zakaria, S. Z., Kamaruddin, M. A., Abd Jalal, N., Ismail, N., Mohd Kamil, N., … & Malaysian Cohort Study Group. (2015). Cohort profile: The Malaysian Cohort (TMC) project: a prospective study of non-communicable diseases in a multi-ethnic population. International journal of epidemiology, 44(2), 423-431.


Uganda Genome Resource

  • SAMPLE SIZE: ~6k
  • URL:
  • DESCRIPTION: Genomic studies in African populations provide unique opportunities to understand disease aetiology, human genetic diversity and population history in a regional and a global context. To leverage the relative benefits of different strategies, we undertook a combined approach of genotyping and whole-genome sequencing (WGS) in a population-based study of 6,400 individuals from a geographically defined rural community in South-West Uganda. We present data from 4,778 individuals with genotypes for ~2.2 million SNPs from the Uganda GWAS resource (UGWAS), and sequence data on up to 1,978 individuals spanning 41.5M SNPs and 4.5M indels (UG2G); 343 individuals overlap between the two datasets. We highlight the value of the largest sequence panel from Africa to date as a global resource for variant discovery, imputation and understanding the mutational spectrum and its clinical relevance in African populations. Alongside phenotype data, we provide a rich new genomic resource for researchers in Africa and globally.
  • CITATION: Gurdasani, D., Carstensen, T., Fatumo, S., Chen, G., Franklin, C. S., Prado-Martinez, J., … & Sandhu, M. S. (2019). Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell, 179(4), 984-1002.

Nigerian 100K Genome Project (coming soon)

  • CITATION: Fatumo, S., Yakubu, A., Oyedele, O., Popoola, J., Attipoe, D. A., Eze-Echesi, G., … & Ene-Obong, A. (2022). Promoting the genomic revolution in Africa through the Nigerian 100K Genome Project. Nature Genetics, 54(5), 531-536.


Michigan Genomics Initiative

  • SAMPLE SIZE: ~55k
  • URL:
  • DESCRIPTION: The Michigan Genomics Initiative (MGI) is a collaborative research effort among physicians, researchers, and patients at the University of Michigan (U-M) with the goal of combining patient electronic health record (EHR) data with corresponding genetic data to gain novel biomedical insights. There are currently ~84K consented participants through the MGI and partner studies and the addition of ~10K new participants per year is anticipated. Currently, all MGI participants with available genetic data have received care at the University of Michigan Health System.
  • CITATION: Zawistowski, M., Fritsche, L. G., Pandit, A., Vanderwerff, B., Patil, S., Schmidt, E. M., … & Zoellner, S. (2021). The Michigan Genomics Initiative: a biobank linking genotypes and electronic clinical records in Michigan Medicine patients. medRxiv.

Penn Medicine Biobank

  • SAMPLE SIZE: ~40k
  • URL:
  • DESCRIPTION: The Penn Medicine BioBank (PMBB) is a research program created to study the causes and treatments of many diseases. Any Penn Medicine patient (age 18 and up) can sign up. The PMBB is a collection of biological samples, such as blood or tissue, that are donated by patient volunteers. These samples are then connected to clinical information, such as diseases or lab measures. These data are then used by researchers to discover new ways to detect, treat, and maybe even prevent or cure disease. Some of these studies may be about how genes affect health and disease. Other studies look at how genes affect response to medicines.

UCLA Precision Health Biobank

  • SAMPLE SIZE: ~27k
  • URL:
  • DESCRIPTION: The UCLA ATLAS Precision Health Biobank, under the supervision of the Translational Pathology Core Laboratory (TPCL), collects biological samples from patients who have consented to participate in the UCLA ATLAS Community Health Initiative. As a collaborator with the UCLA ATLAS Community Health Initiative, the UCLA ATLAS Precision Health Biobank manages the collection and distribution of biological samples by removing the personally identifiable information.
  • CITATION: Johnson, R. D., Ding, Y., Bhattacharya, A., Chiu, A., Lajonchere, C., Geschwind, D. H., & Pasaniuc, B. (2022). The UCLA ATLAS Community Health Initiative: promoting precision health research in a diverse biobank. medRxiv.


BioMe

  • SAMPLE SIZE: ~32k
  • URL:
  • DESCRIPTION: The Institute for Personalized Medicine at the Icahn School of Medicine at Mount Sinai is leading the movement toward diagnosis and classification of disease according to the patient’s molecular profile. This approach accommodates differences at all possible levels of exposure (genome, environment, and lifestyle) and at all stages of the process, from prevention to post-treatment follow-up. At the center of this effort is BioMe, an electronic medical record-linked biobank that enables researchers to rapidly and efficiently conduct genetic, epidemiologic, molecular, and genomic studies on large collections of research specimens linked with medical information.


BioVU

  • SAMPLE SIZE: ~120k
  • URL:
  • DESCRIPTION: Planning for BioVU began in mid-2004 and the first samples were collected in February 2007. Prior to collecting DNA samples, all aspects of the BioVU project were extensively tested. BioVU now accrues 500-1000 samples per week, totaling more than 275,000 DNA samples as of January 2022. Vanderbilt clinic patients may sign the BioVU Consent Form if they wish to donate their excess blood samples, or not sign the form if they do not wish to participate.
  • CITATION: Roden, D. M., Pulley, J. M., Basford, M. A., Bernard, G. R., Clayton, E. W., Balser, J. R., & Masys, D. R. (2008). Development of a large‐scale de‐identified DNA biobank to enable personalized medicine. Clinical Pharmacology & Therapeutics, 84(3), 362-369.

Biobank of the Americas

  • SAMPLE SIZE: ~20k
  • URL:
  • DESCRIPTION: Biobank consented samples with associated clinical data from diverse populations from throughout the United States and Latin America via healthcare and biopharma partnerships.

Colorado Center for Personalized Medicine

  • SAMPLE SIZE: ~34k
  • URL:
  • DESCRIPTION: Established in 2014 as a partnership between UCHealth and University of Colorado Anschutz Medical Campus, the Colorado Center for Personalized Medicine (CCPM) brings together multiple disciplines and institutions to uncover advancements in genomics that can improve diagnosis and treatment of disease, and identify more tailored approaches to population health management. To facilitate discoveries in personalized medicine, CCPM has created a Biobank that aims to be one of the largest academic medicine biospecimen repositories in the mountain and midwest regions of the U.S. The CCPM Biobank is able to link biospecimens and genotype information with patient health information from electronic medical records in an enterprise data warehouse (Health Data Compass) to support a broad range of research, operational, and clinical quality improvement agendas.

CanPath – Ontario Health Study

  • SAMPLE SIZE: ~7.3k
  • LOCATION: Canada
  • URL:
  • DESCRIPTION: The Ontario Health Study (OHS) is a resource for investigating the ways in which lifestyle, the environment and genetics affect people’s health. It is one of the regional cohorts that collectively form the Canadian Partnership for Tomorrow’s Health (CanPath), a pan-Canadian cohort with >330 000 participants. The linking of Canada’s rich collection of administrative health data with the cohort’s data represents a powerful means to disseminate high-quality, timely data.
  • CITATION: Kirsh, V. A., Skead, K., McDonald, K., Kreiger, N., Little, J., Menard, K., … & Awadalla, P. (2022). Cohort Profile: The Ontario Health Study (OHS). International Journal of Epidemiology.

Massachusetts General Brigham Biobank

  • URL:
  • DESCRIPTION: The Mass General Brigham Biobank is a large research program designed to help researchers understand how people’s health is affected by their genes, lifestyle, and environment. By participating in the Mass General Brigham Biobank, you can help us better understand, treat, and even prevent the diseases that might affect your health and the health of future generations. 
  • CITATION: Boutin, N. T., Schecter, S. B., Perez, E. F., Tchamitchian, N. S., Cerretani, X. R., Gainer, V. S., … & Smoller, J. W. (2022). The Evolution of a Large Biobank at Mass General Brigham. Journal of Personalized Medicine, 12(8), 1323.
  • CITATION: Castro, V. M., Gainer, V., Wattanasin, N., Benoit, B., Cagan, A., Ghosh, B., … & Murphy, S. N. (2022). The Mass General Brigham Biobank Portal: an i2b2-based data repository linking disparate and high-dimensional patient data to support multimodal analytics. Journal of the American Medical Informatics Association, 29(4), 643-651.

Million Veteran Program (MVP)

  • SAMPLE SIZE: ~900k
  • URL:
  • DESCRIPTION: The Million Veteran Program (MVP) is a national research program to learn how genes, lifestyle, and military exposures affect health and illness. Since launching in 2011, over 900,000 Veteran partners have joined one of the world’s largest programs on genetics and health.
  • CITATION: Gaziano, J. M., Concato, J., Brophy, M., Fiore, L., Pyarajan, S., Breeling, J., … & O’Leary, T. J. (2016). Million Veteran Program: A mega-biobank to study genetic influences on health and disease. Journal of clinical epidemiology, 70, 214-223.


QIMR Berghofer – QIMR Biobank (QSkin and GenEpi)


Home | Global Biobank Meta
