Working the Nuts and Bolts in a Cohort Study

17 MAR 2023

The recent rollout of SG100K offers the potential to learn factors associated with health and diseases that Singaporeans—and possibly half the world’s population—are predisposed to. More excitingly, this knowledge promises to shape the precision medicine landscape.

Associate Professor Sim Xueling from the Saw Swee Hock School of Public Health, National University of Singapore, shares insights on what it is like to run the Singapore Population Health Study, which is part of SG100K, Singapore’s largest longitudinal study. She also discusses the progress thus far and her hopes for the study.

What is the Thinking behind SG100K?

Human beings are generally quite similar. But there are exceptions where we see disease occurring more often in some than others. With cohort studies, we are trying to see these similarities and differences between populations and make adjustments to change the trajectory or improve the outcomes.

One of the things that is unique about SG100K is that our data spans the Chinese, Malay and Indian ethnic groups. This makes it potentially translatable to half the world. It also offers an opportunity to compare our findings against those of other international cohort studies. Hopefully, this puts us in a position to better predict health risks and, possibly, to prevent them or delay them from happening.

What does a Successful Cohort Study look like?

One of the earliest cohort studies in the world was the Framingham Heart Study. It started in 1948 in Framingham, Massachusetts, with a cohort of approximately 5,200 men and women. The cardiovascular risk prediction model that was first built from the study is still in use today.

Getting the cohort size right takes some care. If there are only 100 persons in the research, estimates are less precise and it is hard to tell whether this group of people shares the same predisposition merely by coincidence. That probability is reduced when the sample size is bigger. A larger cohort also provides a buffer against dropouts. Cross-checking against other similar studies provides a good gauge.
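
The point about cohort size and precision can be illustrated with a small calculation. The sketch below is not from the SG100K team. It assumes a hypothetical condition affecting 10% of participants and uses the textbook normal-approximation confidence interval for a proportion. The function name, the 10% figure and the choice of cohort sizes (100, roughly Framingham-sized, and 100,000) are illustrative assumptions only.

```python
import math


def proportion_confidence_interval(prevalence, sample_size, z=1.96):
    """Normal-approximation (Wald) interval for an observed proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    standard_error = math.sqrt(prevalence * (1 - prevalence) / sample_size)
    margin = z * standard_error
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, prevalence - margin), min(1.0, prevalence + margin)


# Hypothetical prevalence chosen for illustration; not a figure from any study.
ASSUMED_PREVALENCE = 0.10

for n in (100, 5_200, 100_000):
    low, high = proportion_confidence_interval(ASSUMED_PREVALENCE, n)
    print(f"n={n:>7,}: 95% CI {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Under these assumptions, the margin of error is roughly 6 percentage points either side with 100 participants and under 0.2 percentage points with 100,000, which is why a small group can look unusual purely by chance.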

What does it take to run a Successful Cohort?

Running a cohort has many moving parts. Besides recruiting participants and inviting them back every five years, we need to put in place an infrastructure where the data collected is comparable across different time periods and useful for anyone who needs it. This can sometimes be tricky because five years is a long time and many things, such as lifestyle behaviour, the external environment and generational perspectives, can change. For example, 10 years ago, there was little talk about wholegrain. Today, we have a whole range of wholegrain products from rice to pasta. These changes can have a significant impact on data collection and on how the data relates to what is happening.

Currently, we tackle these challenges by continuously reviewing what we have collected and scanning the environment to check the relevance of what we have collected or if there is a new set of data we should collect.

In addition, we actively encourage researchers to come on board the journey with us by using the data we have collected and suggesting to us what other data they might be interested in. For this to happen, we need to first make sure that our data is curated, reusable, and easily accessible. Data sharing rules and principles are also clearly outlined.

What are some outcomes we can expect from SG100K?

SG100K is a central part of the National Precision Medicine (NPM) Phase II with PRECISE. In the last five years, some of the SG100K cohorts were part of the NPM Phase I (SG10K). That has enabled us to lay the foundation for collaboration and data sharing among partner institutions. As we gain momentum on the SG100K study, we will continue to build on and reinforce the learning and the infrastructure. In particular, we can look forward to larger and more diverse public and private partnerships, e.g., the establishment of the PRECISE-Illumina partnership agreement for the generation of whole genomes for SG100K.

When we run cohorts, there is always something new, be it a new specimen type or a new dimension of data. Interestingly though, while we are collecting and building these data resources for future use, we cannot really future-proof what we have collected, because we only know so much now and that influences the types of samples and data we collect now. There is also the issue of cost and time versus benefit. Sample and data collection can be costly because of the technology used to collect, process, store and extract data. Oftentimes, the collected biological specimens require long-term storage arrangements, which is another hidden cost.

Nonetheless, understanding why diseases develop in some people, but not others, is a critical step in the development of new approaches to prevent, delay, diagnose and treat diseases. Hence, SG100K is really an important infrastructure resource for precision medicine which aligns with the Ministry of Health’s “Healthier SG” blueprint for preventive care. It enables us to track the cohort’s biomarkers and ask the tough question as to whether we can use precision medicine to change things for these people. Quite often, this involves a systemic change in healthcare, from the way things are done in clinics to how the population embraces the knowledge and changes its lifestyle beyond the healthcare settings.

Running a cohort is like running a never-ending experiment. There is never quite an end to it, because how people live constantly changes. There is always more to learn with each wave, and that warrants the next one.