China Kadoorie Biobank: Lessons from a Visionary Cohort Study

In 2003, the Human Genome Project was completed after more than a decade of work, at a staggering cost of US$3 billion.[1] At that time, precision medicine was a distant, futuristic vision. Yet against this backdrop, the team at China Kadoorie Biobank (CKB) embarked on an ambitious journey: to collect and preserve health data and biological samples from over half a million individuals across China. Their vision was to build a data-rich resource for the future of biomedical research and the development of precision health.
Professor Chen Zhengming, UK Principal Investigator for CKB, said, “We were inspired by the Human Genome Project to conduct a massive, long-term cohort study with half a million or more participants. But unlike many research projects at that time, we did not start with a specific hypothesis.”
He continued, “Rather, we wanted to collect as much data as we could. Because with the rapid development of technology—who can predict what we could do one day? More than twenty years ago, it cost US$3 billion to sequence one human genome at 1X. But today, it costs a mere fraction of that to sequence thousands of human genomes at 30X.”
CKB is a collaboration between the University of Oxford and the then China National Centre for Disease Prevention and Control (China CDC), which was later replaced as partner by Peking University. Recruitment began in 2004 and ended in 2008, with over 512,000 adults enrolled from ten geographically defined areas in China.[2] In addition to obtaining biosamples and physical measurements from participants, the CKB team administered a detailed questionnaire on medical history, lifestyle and environmental exposures.
The health of CKB participants has been monitored for almost two decades since.[2] In addition, a random subset of surviving participants has been reassessed every four to five years to yield new data and research insights.
Building a Future-ready Resource
In the early years, CKB focused on collecting and storing data and biosamples. Prof Chen said, “We didn’t test the samples, because with the level of technology available then, there was only a very limited number of biomarkers we could test for. It was only in the last several years that we could begin large-scale assays, covering genetics and omics.”
“The sample collection was very complete and not a single sample was wasted, whether to loss, damage or thaw,” Prof Chen said with a tinge of pride, describing the retrieval of biosamples for genotyping and other assays after twenty years in long-term storage.
He credited this to his team’s meticulous protocols, as well as to CKB’s network of bespoke IT systems, tailored to the biobank’s specialised demands. Many of these systems were proprietary and developed specifically to enable remote site monitoring and the timely collection of high-quality data.[3] On the protocol side, the team created secure data management policies that are standard practice today.
Through their efforts, CKB has become a repository of quality data, capturing over 3.5 million episodes of hospitalisation for over 6,000 disease types by the end of 2024, enabling research into almost all common and less common diseases of adults.
Challenging Prevailing Assumptions
In tandem with the advancement of sequencing technologies, whole genome sequencing for the entire CKB cohort commenced at BGI, Shenzhen, in April 2024, and was completed by March 2025. This will help to address the critical gap in East Asian representation in genomic research, as similar large-scale population studies have predominantly focused on European populations.
To date, CKB data has contributed to over 600 publications, with over 350 published in the last five years. Prof Chen said, “Using our data, researchers have not only expanded our collective knowledge of well-studied diseases but also challenged prevailing assumptions.”
One key example is the common belief that moderate alcohol consumption is beneficial to health. Prof Chen explained, “Previous studies suggested an association between moderate drinking and lower risk of death, especially from cardiovascular diseases. However, our researchers, using robust genetic instruments that are only possible in East Asians, found that the observed ‘J-shaped’ curve between alcohol consumption and mortality rate was due to residual confounding factors and reverse causality, rather than any causal relation.”
By leveraging the high prevalence of genetic variants causing flushing reactions to alcohol in the East Asian population, researchers observed that men with these variants tended to drink less and had lower risks of developing conditions such as cardiovascular diseases or alcohol-related cancers. Consequently, they concluded that even one to two drinks a day could increase the risk for cardiovascular deaths by 15%, alcohol-related cancers by 12%, liver disease by 31%, and overall mortality by 7%.[4]
Beyond showing how alcohol intake may affect the risk of health problems in populations around the world, the study has also influenced policy. “The study provided important causal evidence of the scale of alcohol-related harms, leading to changes in guidelines in many countries,” added Prof Chen.[5]
Leveraging Technology for New Possibilities
Besides making genomic sequencing more accessible, recent technological developments have introduced exciting new possibilities for CKB research, such as proteomics. The technology enables the measurement of thousands of proteins from as little as 50 microlitres of a biological sample such as blood plasma.
“Nearly all drugs target proteins, not genes,” Prof Chen enthused. “Hence, the ability to study proteins opens up new avenues for improving our understanding of disease aetiology, and potentially changes the way we predict, diagnose, prevent and treat diseases.” Collaborations between Prof Chen and UK Biobank researchers have already identified many proteins that could serve as new therapeutic targets for ischaemic heart disease[6] or be used to predict mortality[7] and disease risk.
Looking ahead, Prof Chen emphasised the importance of maintaining the cohort size for CKB to remain a valuable resource. “Around 100,000 of our participants have already passed away. As our cohort ages, we will lose more participants,” he noted. One strategy to mitigate this would be to recruit an additional 50,000 to 100,000 participants, including family members, in the future.
However, expanding the cohort is only one part of ensuring CKB’s longevity. It is equally important for the team to keep deepening what can be learnt from the high-dimensional data for novel discovery. Efforts are therefore underway to convert CKB’s treasure trove of biosamples into data, and to adopt technologies such as machine learning to facilitate data analysis.
While much work remains, Prof Chen and his team remain steadfast in their mission. “The scientific landscape has changed, and so has the depth and breadth of our work. But our principles remain the same—to continue advancing our ability to not just treat health conditions, but also predict and prevent them through analyses of high-dimensional, fully integrated data generated by CKB and other similar biobanks around the world,” he concluded.