This blog is written based on my invited talk at the Molecular Epidemiology Research Lab, Max-Delbrück-Centrum für Molekulare Medizin (MDC) in Germany. I would like to thank Dr. Sara Moazzen (Postdoctoral Researcher) and Prof. Dr. Tobias Pischon (Molecular Epidemiology Research Group Leader) for the invitation and for arranging my talk.

Disclaimer: Blogs constitute only the opinion of the author.

Introduction

A complex disease, also called a multifactorial disease, is caused by a combination of genetic, environmental, and lifestyle factors, most of which have not yet been identified. Complex diseases account for 70% of all deaths globally. Common examples are noncommunicable diseases such as cancer, diabetes, cardiovascular diseases, Parkinson’s disease, depressive disorders, and psychotic spectrum disorders.

Complex diseases are highly heterogeneous regarding signs and symptoms, underlying causal mechanisms, and the number of underlying genetic and nongenetic risk factors. Disease subtyping is a method for clustering patients into discrete homogeneous subgroups based on multiomics data – phenomics (e.g., clinical and imaging data), metabolomics, epigenomics, proteomics, transcriptomics, and genomics. It is a promising strategy for improving diagnosis, prediction, treatment, prevention, and prognosis.

The availability of high-throughput genome-wide profiling technologies, electronic health records (EHR), large-scale cohort and birth registry data, and novel data-driven methods offers promise for successful subtyping of complex diseases and for tailoring disease prevention and treatment to individual differences. Data-driven methods, also known as unsupervised machine learning algorithms, include clustering models such as group-based trajectory modeling, latent class growth analysis, growth mixture modeling, general growth mixture modeling, latent class analysis, and latent transition analysis. iCluster and biclustering methods are also contemporary approaches for data integration and disease subtyping.

Here, I summarize findings from eight systematic reviews on data-driven subtypes of diabetes mellitus, cancer, depressive disorders, Parkinson’s disease, and schizophrenia spectrum disorders identified using multiomics data. I also highlight challenges and potential solutions to harness the full potential of data-driven methods.

Type 2 diabetes mellitus

Diabetes mellitus (DM) is a chronic, heterogeneous, and lifelong metabolic disorder mainly characterized by elevated blood glucose levels. A study published in 2023 estimated that about 529 million people were living with DM worldwide in 2021, a global prevalence of 6.1%. By 2050, more than 1.31 billion people are projected to have DM, and 89 (43.6%) of 204 countries and territories are projected to have an age-standardized prevalence greater than 10%. The most common types of DM are Type 1 and Type 2 DM. Current DM management is challenging because of heterogeneity in patients’ sociodemographic characteristics, illness severity and trajectory, duration of illness, diagnostic markers (e.g., glycated hemoglobin (HbA1c), insulin sensitivity, and body composition), and treatment response. Clinically meaningful subtypes of DM can be identified using data-driven statistical methods.

Sarría-Santamera and colleagues conducted a systematic review of 14 cross-sectional and longitudinal data-driven studies covering 130,353 patients with diabetes, of whom 130,082 were diagnosed with type 2 diabetes mellitus. The studies used multiomics data extracted from electronic health records, healthcare databases, and previously conducted longitudinal observational cohort studies and surveys.

Eight of the reviewed studies identified subtypes based on age at diagnosis, body mass index (BMI), glutamic acid decarboxylase antibody (GADA), glycated hemoglobin (HbA1c), homoeostasis model assessment 2 estimates of β-cell function (HOMA-2b), and homoeostasis model assessment 2 estimates of insulin resistance (HOMA-IR). Likewise, four studies used HbA1c, proinsulin, insulin, glucagon-like peptide-1 (GLP-1), glucose-dependent insulinotropic polypeptide (GIP), ghrelin, interferon-γ, interleukin 10 (IL-10), antigen-specific autoantibodies (Aabs), islet antigen antibodies (IA-2Ab), glutamic acid decarboxylase 65 antibody, zinc transporter 8 antibody, and gastrointestinal symptoms (upper gastrointestinal dysmotility, diarrhea, constipation, nausea, vomiting). Moreover, one study used gender, BMI, total cholesterol, triglycerides, blood pressure, anti-glutamic acid decarboxylase (GAD) autoantibody, anti-islet antigen-2, anti-thyroid peroxidase, a cumulative genetic score, and the insulin-free period. Another study used up to 73 variables for subtyping.

The reviewed studies identified two to five subtypes of DM. Six of them identified five subtypes named “severe autoimmune diabetes (SAID)”, “severe insulin-deficient diabetes (SIDD)”, “severe insulin-resistant diabetes (SIRD)”, “mild obesity-related diabetes (MOD)”, and “mild age-related diabetes (MARD)”. Patients with the SAID subtype were young, GADA-positive, and had high HbA1c, low BMI, and low HOMA-2b levels. Patients with the SIDD subtype had the same characteristics as SAID but were GADA-negative. Patients with the SIRD subtype were young and had high BMI and HOMA-IR levels. Patients with the MOD subtype were young and had obesity and moderate insulin resistance. Finally, patients with the MARD subtype were old and had moderate metabolic dysregulation.

Cancer

According to a WHO report, cancer is the second leading cause of death worldwide. In 2020, about 10 million deaths were attributed to cancer, of which 70% were in low- and middle-income countries. Cancer is mainly classified by pathologists based on histological appearance and site of origin. However, this only partially reflects the truly heterogeneous character of cancer. Cancer is the disease most studied through data-driven subtyping.

Zhao and colleagues conducted a review of 20 data-driven studies using microarray and RNA-seq, mutations, microRNAs (miRNAs), copy number variation (CNV) and DNA methylation data in patients mainly diagnosed with breast cancer, colorectal cancer, pancreatic ductal adenocarcinoma (PDAC), leukemia, lymphoma, pancreatic cancer, and glioblastoma.

Four studies identified four to ten subtypes of breast cancer using mRNA, CNV, Microarray, RNA sequencing, quantitative polymerase chain reaction (qPCR), NanoString, and Tissue microarray data. Despite inconsistent naming and number of subtypes by different studies, Zhao and colleagues conclude that breast tumors fall primarily into three major subtypes: luminal, HER2 overexpression, and triple-negative breast cancer (TNBC). Patients with the luminal subtype had a good prognosis and were responsive to hormone therapies. Patients with the HER2-overexpressing subtype are more sensitive to Herceptin (Trastuzumab) and chemotherapy than patients with the luminal subtype. Lastly, patients with TNBC subtype were resistant to standard targeted therapies, and had the worst prognosis.

Seven studies identified three to six subtypes of colorectal cancer using mRNA data. Despite these inconsistencies, the Colorectal Cancer Subtyping Consortium (CRCSC) classified colorectal cancer into four robust subtypes. Patients with subtype 1 had a high mutation rate, microsatellite instability, and strong immune activation. Patients with subtype 2 had Wnt and Myc signaling activation, while those with subtype 3 had metabolic dysregulation. Patients with subtype 4 had transforming growth factor-β activation, stromal invasion, and angiogenesis signatures.

Moreover, three studies identified two to three subtypes of pancreatic ductal adenocarcinoma (PDAC) using miRNA and mRNA data, whereas two studies identified two to 16 subtypes of leukemia using mRNA and methylation data. Other studies identified two to four subtypes of pancreatic cancer, lymphoma, and lung cancer based on mRNA data. Finally, one study identified two subtypes of glioblastoma using miRNA data and another study identified 11 subtypes of cancer using Microarray, RNA sequencing, qPCR, NanoString, and Tissue microarray data. Patient characteristics in each subtype were not evaluated.

Major depressive disorder

Major depressive disorder (MDD) is an important contributor to the global burden of disease. According to a 2023 WHO report, more than 280 million people in the world have depression. Depression is about 50% more common among women than among men. Unfortunately, there is huge variation among individuals with the same diagnosis regarding their risk factors, symptom patterns, long-term course trajectories, and treatment responses. Previous evidence shows that identifying homogeneous subgroups of patients, or subtypes, using data-driven methods can improve understanding of patient-specific disease mechanisms and, consequently, support the development of personalized diagnoses and treatments.

Beijers and colleagues conducted a systematic review of 29 cross-sectional and longitudinal data-driven studies published to date among individuals with depression using psychometric, biochemical, neuroimaging and genetic data.

Six of the 29 reviewed studies consistently identified two subtypes (i.e., high and low) of depression based on biochemical/metabolomic markers extracted from plasma (ACTH, TSH, T4, L-TRP, cortisol, DST, CRP), cerebrospinal fluid (5-HIAA, IAA, HVA, MHPG), urine (MHPG, epinephrine, norepinephrine, metanephrine, normetanephrine, vanillylmandelic acid), and basal ganglia (glutamate). Overall, patients with the high biochemical depression subtype were female and had a high response to treatment, a high frequency of psychomotor symptoms, and high BMI. Additionally, they had more severe depression, more smoking, distinct quality of mood, early morning awakening, nonreactivity, decreased weight, lower age of onset (AOO), lower functional integrity in the left basal ganglia, and lower network connectivity. Patients with the low biochemical depression subtype were tall, had a higher chance of suicide attempt, and had no significant response to treatment.

Nine of the 29 reviewed studies reported two to five subtypes of depression using clinician- or subject-scored symptom data assessed with various psychometric instruments. The most common finding (four of the nine studies) was two depression subtypes: high and low symptom frequency. Patients with the severe depressive symptoms subtype had worse scores on all biochemical markers, older age, more DST non-suppressors, lower TSH and T3, more frequent hospitalization, and more frequent melancholic specifier.

Furthermore, five studies identified two to four subtypes of depression using neuroimaging data assessed with resting-state functional magnetic resonance imaging, canonical correlation analysis, and diffusion tensor imaging based on fractional anisotropy (FA) scores of white matter. Four of the five studies identified two subtypes of depression: high and low connectivity. Patients with the low network connectivity subtype had older age, shorter duration, less severe manifestation, and more severe affective symptoms. Patients with the high network connectivity subtype were female and young, and had more comorbid anxiety, longer duration of illness, and more severe manifestation.

Finally, one of the reviewed studies identified two subtypes of depression using SNPs associated with major depressive disorder, whereas another study identified five subtypes of depression by combining sociodemographic data (sex and age), clinical questionnaire scores (BDI, MINI, HAMD, CATS, other psychiatric variables), resting-state functional connectivity measures (BOLD correlations between ROI based on ICA), and various biomarkers (plasma BDNF and cortisol levels, SNPs and DNA methylation for BDNF and serotonin genes). Patient characteristics in each subtype were not evaluated.

Schizophrenia spectrum disorders

Schizophrenia is one of the most common and severe psychiatric disorders with a lifetime prevalence of approximately 1%. Globally, schizophrenia cases almost doubled from 13.1 million in 1990 to 20.9 million in 2016, and it has contributed to 13.4 million years of life lived with disability, which brings a substantial burden on patients, families, community and healthcare systems. Heterogeneity in the traditional classification of mental disorders, including schizophrenia spectrum disorders (SSD) is the main challenge in research and clinical practice.

Habtewold and colleagues conducted a systematic review of 53 cross-sectional and longitudinal data-driven studies published to date among adult patients diagnosed with schizophrenia spectrum disorders (SSD) using positive and negative symptoms and cognitive performance.

Four studies identified two to five subtypes of SSD based on positive symptoms. Despite these inconsistencies, the studies consistently reported four subtypes: progressively ameliorated (22% – 87%), slowly ameliorated (12%), relapse (15%), and progressively deteriorated (8.3% – 13%). Patients with the “progressively ameliorated” subtype had a shorter duration of untreated psychosis (DUP) and less frequent use of cannabis, whereas those with the “progressively deteriorated” subtype had longer DUP, low global functioning, and frequent substance use including cannabis. Patients with the “slowly ameliorated” subtype were male, achieved secondary education, and had longer DUP, frequent schizophrenia diagnosis, and frequent use of cannabis. Finally, patients with the “relapse” subtype had longer DUP and frequent substance use.

Ten studies identified three to five subtypes based on negative symptoms. Even though there was a huge inconsistency among reports, the identified subtypes had four main features: “minimal” (6.2% – 84%), “minor” (5.9% – 55.6%), “moderate” (1.2% – 37.3%), and “severe” (5.4% – 27%). Patients with the “minimal” negative symptoms subtype were young, female, Caucasian, married or living together, and lived independently. They also had high educational status, quality of life, neurocognitive performance, and social, occupational, and global functioning. Moreover, they had a late age of disease onset and a low dosage of chlorpromazine equivalent. Patients with the “minor” negative symptoms subtype were male and frequently diagnosed with schizophrenia. In addition, they had primary education, low premorbid social function, high frequency of unemployment, frequent and long hospitalizations, poor work, social, and family role functioning, poor independent living skills, and severe disorganized symptoms and general psychotic symptoms. Patients with the “moderate” negative symptoms subtype were old, male, unemployed, had attained primary and secondary education, and had inadequate social contact and a long duration of untreated psychosis. They were frequently diagnosed with schizophrenia, brief psychotic disorder, and psychotic disorder not otherwise specified, and received standard treatment. Finally, patients with the “severe” negative symptoms subtype were male and frequently diagnosed with schizophrenia. They also had a family history of non-affective psychosis, high levels of psychotic (i.e., delusion and disorganized symptoms) and depressive symptoms, early illness onset and age at first psychiatric treatment, few episodes of psychiatric hospitalization, and poor psychosocial and global functioning, premorbid adjustment, quality of life, and cognitive performance.

By combining positive and negative symptoms, 11 studies identified two to five subtypes of SSD. Despite the inconsistencies among reports, the reported subtypes had four main features: “minimal” (2.4% – 34.0%), “minor” (1.4% – 90.6%), “moderate” (4.1% – 80.6%), and “severe” (11.9% – 44.0%). Patients with the “minimal” symptoms subtype were female, young, and Hispanic, were treated with ziprasidone, and lived independently. They also had involuntary admission and high social, occupational, and global functioning. Patients with the “minor” symptoms subtype were male, old, Caucasian, and treated with quetiapine or risperidone. They also had severe depressive symptoms, high baseline weight, high treatment tolerability, and longer duration of untreated psychosis, duration of illness, and duration of current hospitalization. Patients with the “moderate” symptoms subtype were young, White, and treated with aripiprazole and olanzapine. They also had more extra-pyramidal symptoms and depressive symptoms, early age at disease onset, antipsychotic premedication use, and a high number of previous hospitalizations and exacerbations. Patients with the “severe” symptoms subtype had a late age of disease onset and low metacognition.

A total of 23 studies reported three to five subtypes of SSD based on cognitive performance, with most of them identifying three subtypes – “no cognitive deficit” (17.4% – 58.4%), “partial/intermediate cognitive deficit” (6.7% – 48.4%), and “global cognitive deficit” (9.2% – 29.2%). Patients with the “no cognitive deficit” subtype were young, highly educated, and frequently diagnosed with affective spectrum disorders, such as bipolar disorder. Additionally, they had minimal positive and negative symptoms, and high socioeconomic status, premorbid IQ, and socio-occupational and community functioning. Patients with the “partial/intermediate cognitive deficit” subtype were male, Caucasian, and unemployed prior to disease onset. They also had older age of disease onset, high polygenic risk scores for schizophrenia and attention deficit disorder, severe anxiety, depression, mania, psychotic, and general psychopathology symptoms, childhood learning difficulties, and poor premorbid adjustment, and premorbid cognitive and social functioning. Moreover, they had severe physical anergia, chronic illness course and received high‐dose chlorpromazine. Patients with the “global cognitive deficit” subtype were male and old. In addition, they had low IQ, low educational status, high unemployment, severe mania, positive and negative symptoms, low socio-occupational and global functioning, and were treated with high‐dose chlorpromazine.

Parkinson’s disease

Parkinson’s disease (PD) is the second most common progressive neurodegenerative disorder. The overall age-standardized incidence rate was 13.43/100,000 in 2019, and it increased by an annual average of 0.61% from 1990 to 2019. The overall age-standardized prevalence was 106.28/100,000 in 2019, and it increased by an annual average of 0.52% from 1990 to 2019. The cardinal symptoms of PD include motor symptoms (tremor, rigidity, bradykinesia, postural instability, and gait disorders) and non-motor symptoms (sleep–wake cycle disorders, cognitive impairment, mood and affective disorders, autonomic dysfunction, and sensory symptoms and pain). The clinical variability between patients diagnosed with PD suggests the existence of subtypes of the disease. Several studies have attempted to subtype PD based on its clinical manifestations.

Recently, Lee and colleagues reviewed data-driven subtyping studies among patients with PD according to clinical, neuroimaging, biochemical, genetic and transcriptomic data. The reviewed studies identified two to four subtypes based on motor symptoms, two subtypes based on non-motor symptoms, and two to four subtypes by combining motor and non-motor symptoms. Two to three subtypes of PD were identified based on structural and functional neuroimaging data. Moreover, two to four subtypes were identified using genetic data, and three subtypes were identified using biochemical/metabolomic data.

Despite these inconsistencies, the reviewed studies consistently reported two subtypes of PD – tremor-dominant (TD) and postural instability and gait difficulty (PIGD). Patients with TD subtype had a severe olfactory impairment, decreased interoceptive accuracy and sensibility, and increased regional homogeneity value in the right parahippocampal gyrus. Patients with the PIGD subtype mainly had severe changes in spatiotemporal parameters during gait, greater loss of functional connectivity in the cerebellum, and lack of an endogenous defense system to prevent oxidative stress from damaging and destroying dopaminergic cells in the substantia nigra.

Another contemporary systematic review of data-driven studies by Pourzinal and colleagues unraveled severity-based and domain-based subtypes of PD based on cognitive impairment. Severity-based subtyping studies (9 studies) identified three to five subtypes, and the majority revealed three subtypes ranging from cognitively intact to severely impaired. Patients with the severe cognitive impairment subtype were old and had lower education. Patients with the minimal cognitive impairment subtype had lower disease duration and disease severity, functional disability, and levodopa intake. Patients with moderate cognitive impairment expressed greater depression, apathy, and anxiety symptoms.

Domain-based studies (11 studies) identified two to six subtypes, with most studies reporting cognitively intact, memory deficit, executive deficit, and globally impaired subtypes. Patients with the memory deficit subtype were male and old, and had later age at onset and higher disease severity. They had reduced functional connectivity within posterior regions of the default mode network and visuospatial network, volumetric grey matter alterations in the amygdala, right rectal gyrus, and right middle occipital gyrus, and unique temporoparieto-occipital atrophy. They also had cortical thinning in precentral, posterior cingulate, and parahippocampal gyri, cuneus, and inferior and superior parietal areas, and reduced bilateral entorhinal-hippocampal (ERC-HIPP) white matter connectivity. Patients with the executive deficit subtype generally had younger age, earlier age at onset, higher education, prominent frontal cortical thinning, lower total brain, putamen, and thalamus volume, reduced right ERC-HIPP and dorsolateral-prefrontal cortex to caudate nucleus (DLPFC-CN) connectivity and cortical atrophy, and increased cerebellum grey matter volume. Patients with the globally impaired subtype had older age, lower education, later age at onset, and greater motor symptoms, disease severity, and disease duration. Neurodegeneration was also rampant in this subtype, with widespread atrophy across frontal, posterior, and subcortical regions of the brain. Patients with the cognitively intact subtype had younger age, higher education, and less severe motor symptoms, disease severity, and disease duration. They also had relatively high functional and structural integrity of the brain.

Van Rooden and colleagues had previously conducted a systematic review of seven data-driven studies among patients with PD. Six of the seven studies used motor symptoms for subtyping, five studies used motor symptoms and measures of cognition, and five studies used depressive symptoms, age-at-onset, and measures of disease progression. Broadly, the reviewed studies reported two to five subtypes of Parkinson’s disease. Six of the seven studies identified an “old age-at-onset and rapid disease progression” PD subtype characterized by predominance of bradykinesia/rigidity, axial impairment, bilateral PD signs at onset, frequent symptomatic orthostasis, low levodopa dose, short disease duration, higher Hoehn and Yahr stage, higher level of disability, and low (physical) quality of life. Five of the seven studies identified a “young age-at-onset and slow disease progression” subtype characterized by predominance of tremors, absence of gait disturbance, unilateral PD signs at onset, severe motor complications, a large proportion using DA, relatively long disease duration, and younger age.

Conclusions

Clustering statistical methods have great potential for identifying latent disease subtypes, unraveling heterogeneity, and providing insights into clinically meaningful phenotypes. This can be beneficial for monitoring patient progression, controlling diseases, and ultimately leading to accurate diagnosis and treatment selection. Integrating omics data for subtyping diseases can offer even better insights into disease biology and more precise predictions compared to analyzing single-domain data. As high-throughput profiling technologies advance, the costs of sample analysis are decreasing, making multi-platform identification and characterization of diseases more commonplace. Clinicians are increasingly using cancer subtyping studies, from high-throughput molecular data to marker panels developed using low- and medium-throughput methods, to inform treatment decisions for cancer patients. Subtyping studies can help select a subset of patients who may benefit from specific drugs or therapies. For example, Rouzier et al. examined the response of four subtypes of breast cancer to chemotherapy and found that the basal-like and ERBB2-overexpressing subtypes were more sensitive to paclitaxel- and doxorubicin-containing preoperative chemotherapy compared to the luminal and normal-like subtypes.

The identification of complex disease subtypes has several advantages. Firstly, it is important for researching the causes and mechanisms of the disease, as patients with similar characteristics are more likely to have similar genetic and pathological features. Secondly, identifying subtypes can help in developing tailored management strategies for patient care. Thirdly, these subtypes can be used in designing clinical trials, although this is a relatively new practice. Fourthly, gene set enrichment analysis (GSEA) is often used to understand the biology underlying the identified subtypes. GSEA examines gene expression data at the level of gene sets, which are groups of genes that share the same biological function, chromosomal location, or regulation. However, there are significant inconsistencies in cluster results between studies, and the validity and generalizability of distinct subtypes can be questionable. This may be partly due to differences in the methodological approach to the subtyping process. Additionally, varying characteristics of study populations, sample sizes, and variables included in the analysis are contributing factors. The optimal number of variables, which provides a balance between validity and economic efficiency in clustering patients, is also an important issue that requires further investigation.
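To make the gene-set idea behind GSEA more concrete, here is a minimal sketch of an unweighted enrichment score: genes are ranked (for example, by differential expression between two subtypes), and a running sum steps up at each gene in the set and down otherwise. The gene names, the ranking, and the gene sets are hypothetical, and the function name `enrichment_score` is my own. Real GSEA tools additionally weight hits by the ranking metric and assess significance by permutation, which this sketch omits.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most to least associated with a subtype.
    gene_set: collection of genes sharing a function, location, or regulation.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partly")

    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a gene in the set, step down for any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, in either direction.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of ten genes and two hypothetical gene sets.
ranked = [f"G{i}" for i in range(1, 11)]
top_set = {"G1", "G2", "G4"}      # concentrated near the top of the ranking
bottom_set = {"G8", "G9", "G10"}  # concentrated near the bottom

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
print(round(es_top, 3), round(es_bottom, 3))
```

A score near +1 means the set clusters at the top of the ranking, and a score near -1 means it clusters at the bottom.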

To advance our knowledge of subtypes of complex diseases and effectively utilize data-driven methods, it is important to consider the following recommendations.

First and foremost, it is crucial to select a sample of patients with a similar duration of illness. This ensures that the data we analyze is consistent and comparable.

Secondly, it is essential to carefully choose a set of clinically relevant variables that represent the full spectrum of diseases and are conceptually similar. These variables should also be capable of distinguishing between different subtype profiles. Additionally, it is equally important to ensure rigorous data processing and quality control to maintain the accuracy and reliability of our findings.

When using omics data for subtyping, it is necessary to integrate and curate the data extensively. We should also conduct studies on the role of omics variables in understanding the pathogenesis of specific diseases of interest.

Thirdly, it is important to acknowledge the limitations of any single data-driven method and validate our results by employing another method that does not share these limitations. Cluster analysis methods, for instance, produce different outcomes in terms of cluster numbers and assignments due to their varied algorithms. To enhance the robustness of clustering, we can consider methods like cluster ensemble or consensus clustering. These methods combine results from multiple runs of clustering methods to derive a consensus result. This helps to reduce bias and increase the reliability of our conclusions.
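As a rough illustration of the consensus idea, the sketch below combines the labels from several clustering runs into a co-association matrix (the share of runs in which two patients land in the same cluster) and then groups patients who co-cluster in more than half of the runs. The three label vectors are hypothetical, and thresholding followed by connected components is a simplification I chose for brevity; published consensus clustering methods usually resample the data and apply hierarchical clustering to the matrix.

```python
def co_association(partitions):
    """Share of runs in which each pair of patients shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Group patients linked by co-association above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over strongly co-clustered patients
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical labels for six patients from three clustering runs.
# Label numbers are arbitrary per run; only co-membership matters.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
matrix = co_association(runs)
consensus = consensus_labels(matrix)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

Here the third patient is assigned inconsistently in one run, yet the consensus still places that patient with the first two because they co-cluster in two of the three runs.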

Fourthly, we should critically evaluate whether the identified subtypes are statistically and clinically/biologically meaningful and interpretable. Additionally, we should assess whether these subtypes differ with respect to variables that were not included in the initial clustering analysis. This comprehensive evaluation ensures that our findings are valid and relevant.
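One simple way to check whether subtypes differ on a variable that was not used for clustering is a permutation test, sketched below. The subtype labels and the external variable (here, a made-up age at onset) are hypothetical, and the between-group sum of squares is only one of several reasonable test statistics.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares of an external variable across subtypes."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """P-value for the null hypothesis that subtype labels are unrelated
    to the external variable, obtained by shuffling the labels."""
    rng = random.Random(seed)
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_ss(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: two subtypes and an age at onset not used in clustering.
subtype = [0] * 8 + [1] * 8
age_at_onset = [48, 52, 50, 47, 55, 51, 49, 53,
                66, 70, 68, 72, 65, 71, 69, 67]

p_value = permutation_p_value(age_at_onset, subtype)
print(p_value)
```

A small p-value suggests the subtypes separate on the external variable, which supports, but does not prove, their clinical relevance.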

Lastly, we should validate our results in independent and more diverse populations. This ensures that our findings are generalizable and applicable to a wider range of individuals.

It is worth noting that current clustering methods involve an iterative process that stops when an optimal solution is achieved. However, this optimal solution may not be the best solution among all possible outcomes; it merely represents a local optimum. The process of partitioning is sensitive to the starting points, so to mitigate the risk of ending up with a local optimum, clustering can be repeated multiple times with randomly chosen starting points. The optimal solution can then be selected from these repeated runs.
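The restart strategy can be sketched with a small k-means implementation: each run starts from randomly chosen points, converges to some (possibly local) optimum, and the run with the lowest within-cluster sum of squares is kept. The simulated two-dimensional data, the function names, and the choice of k-means itself are assumptions for illustration; established libraries offer the same idea through a number-of-initializations setting.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (labels, inertia)."""
    centroids = rng.sample(points, k)  # randomly chosen starting points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged, but possibly only to a local optimum
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, inertia


def best_of_restarts(points, k, n_starts=20, seed=0):
    """Repeat k-means from different starts and keep the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


# Simulated data: three hypothetical patient groups in two dimensions.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(30)
]

best_labels, best_inertia = best_of_restarts(points, k=3)
print(f"best within-cluster sum of squares: {best_inertia:.2f}")
```

Repeating the runs does not guarantee the global optimum, but it makes a poor local optimum much less likely to be reported.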

Out of the six studies mentioned, only five performed independent validation of the results of cluster analysis on a separate sample. One study, on the other hand, split the database into training and test datasets to replicate findings. This demonstrates the importance of validating our results using independent samples to strengthen the reliability of our findings.

References

  1. Craig, J. Complex diseases: Research and applications. Nature Education 2008; 1(1):184.
  2. Biswas S, Hasija Y. Big data analytics in precision medicine. In Big Data Analytics for Healthcare 2022:63-72. Academic Press.
  3. Johansson Å, Andreassen OA, Brunak S, Franks PW, Hedman H, Loos RJ, Meder B, Melén E, Wheelock CE, Jacobsson B. Precision medicine in complex diseases—Molecular subgrouping for improved prediction and treatment stratification. J Intern Med. 2023; 0: 1-9.
  4. Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform. 2019; 20(4):1450-65.
  5. Nguyen T, Tagett R, Diaz D, Draghici S. A novel approach for data integration and disease subtyping. Genome Res. 2017;27(12):2025-39.
  6. Muthén B, Muthén LK. Integrating person‐centered and variable‐centered analyses: Growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res. 2000;24(6):882-91.
  7. Ong KL, Stafford LK, McLaughlin SA, Boyko EJ, Vollset SE, Smith AE, Dalton BE, Duprey J, Cruz JA, Hagins H, Lindstedt PA. Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet. 2023; 402(10397): 203–234.
  8. Sarría-Santamera A, Orazumbekova B, Maulenkul T, Gaipov A, Atageldiyeva K. The identification of diabetes mellitus subtypes applying cluster analysis techniques: a systematic review. Int J Environ Res Public Health. 2020; 17(24):9523.
  9. World Health Organization. Cancer. Accessed on August 13, 2023.
  10. Zhao L, Lee VH, Ng MK, Yan H, Bijlsma MF. Molecular subtyping of cancer: current status and moving toward clinical applications. Brief Bioinform. 2019; 20(2):572-584.
  11. World Health Organization. Depressive disorder (depression). Accessed on August 13, 2023.
  12. Beijers L, Wardenaar KJ, van Loo HM, Schoevers RA. Data-driven biological subtypes of depression: systematic review of biological approaches to depression subtyping. Mol Psychiatry. 2019;24(6):888-900.
  13. Van Loo HM, De Jonge P, Romeijn JW, Kessler RC, Schoevers RA. Data-driven subtypes of major depressive disorder: a systematic review. BMC Med. 2012;10:156.
  14. Charlson FJ, Ferrari AJ, Santomauro DF, Diminic S, Stockings E, Scott JG, McGrath JJ, Whiteford HA. Global epidemiology and burden of schizophrenia: findings from the global burden of disease study 2016. Schizophr Bull. 2018; 44(6):1195-1203.
  15. Habtewold TD, Rodijk LH, Liemburg EJ, Sidorenkov G, Boezen HM, Bruggeman R, Alizadeh BZ. A systematic review and narrative synthesis of data-driven studies in schizophrenia symptoms and cognitive deficits. Transl Psychiatry. 2020; 10(1):244.
  16. Habtewold TD, Hao J, Liemburg EJ, Baştürk N, Bruggeman R, Alizadeh BZ. Deep Clinical Phenotyping of Schizophrenia Spectrum Disorders Using Data-Driven Methods: Marching towards Precision Psychiatry. J Pers Med. 2023; 13(6):954.
  17. Ou Z, Pan J, Tang S, Duan D, Yu D, Nong H, Wang Z. Global trends in the incidence, prevalence, and years lived with disability of Parkinson’s disease in 204 countries/territories from 1990 to 2019. Front Public Health. 2021; 9:776847.
  18. Lee SH, Park SM, Yeo SS, Kwon O, Lee MK, Yoo H, Ahn EK, Jang JY, Jang JH. Parkinson’s disease subtyping using clinical features and biomarkers: literature review and preliminary study of subtype clustering. Diagnostics (Basel). 2022; 12(1):112.
  19. Pourzinal D, Yang J, Lawson RA, McMahon KL, Byrne GJ, Dissanayaka NN. Systematic review of data‐driven cognitive subtypes in Parkinson disease. Eur J of Neurol. 2022; 29(11):3395-417.
  20. Van Rooden SM, Heiser WJ, Kok JN, Verbaan D, Van Hilten JJ, Marinus J. The identification of Parkinson’s disease subtypes using cluster analysis: a systematic review. Mov Disord. 2010; 25(8):969-78.