Multiomics-based data-driven analysis unravels subtypes of complex diseases

This blog is written based on my invited talk at the Molecular Epidemiology Research Lab, Max-Delbrück-Centrum für Molekulare Medizin (MDC) in Germany. I would like to thank Dr. Sara Moazzen (Postdoctoral Researcher) and Prof. Dr. Tobias Pischon (Molecular Epidemiology Research Group Leader) for the invitation and for arranging my talk.

Disclaimer: Blogs constitute only the opinion of the author.

Introduction

A complex disease, also known as a multifactorial disease, is a disease that results from a combination of genetic, environmental, and lifestyle factors, many of which have yet to be identified. Complex diseases account for 70% of all global deaths¹. Common examples of complex diseases include noncommunicable diseases such as cancer, diabetes, cardiovascular diseases, Parkinson’s disease, depressive disorders, and psychotic spectrum disorders¹.

Complex diseases exhibit a high degree of heterogeneity in terms of signs, symptoms, underlying causal mechanisms, and the number of genetic and non-genetic risk factors involved¹. This heterogeneity within classical disease classifications is the main challenge in research and clinical practice, and it makes disease management difficult. The clinical variability between patients diagnosed with the same complex disease suggests the existence of disease subtypes. Classification by clinicians based on clinical manifestations, physical examinations, blood assay results, and histological appearance cannot capture the full picture; in cancer, for example, it only partially reflects the true heterogeneity of the disease. Several contemporary studies have attempted to subtype complex diseases using different kinds of data. Disease subtyping, which involves grouping patients into distinct subgroups based on phenomic, metabolomic, epigenomic, proteomic, transcriptomic, and genomic data, is a promising strategy for improving diagnosis, prediction, treatment, prevention, and prognosis²,³.

The availability of advanced technologies for genome-wide profiling, electronic health records (EHR), large-scale cohort and birth registry data, and novel data-driven methods provides potential for successful subtyping of complex diseases and tailoring disease prevention and treatment based on individual differences⁴. Classical data-driven methods include group-based trajectory modeling, latent class growth analysis, growth mixture modeling, general growth mixture modeling, latent class analysis, and latent transition analysis³. iCluster and biclustering methods are contemporary approaches for big data integration and disease subtyping⁴,⁵.

Over the past decade, data-driven approaches have proven effective in exploring heterogeneity and biological drivers of complex diseases⁶. Additionally, these approaches can aid in identifying high-risk population groups, selecting interventions for specific patient groups with similar phenotypes, and evaluating patient prognosis⁷. Thus, data-driven methods have the potential to contribute to precision medicine by addressing challenges related to heterogeneity in diagnosis and treatment selection. Data-driven clustering methods, also known as unsupervised classification, identify subgroups within a sample on the basis of the observed data only. They differ from supervised classification, in which an algorithm predicting group membership is developed in a sample where membership is known and then applied to new samples where membership is unknown.
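To make the distinction concrete, here is a minimal sketch in Python. It is not taken from any of the reviewed studies: the scores, the group names "mild" and "severe", and all function names are hypothetical, and a simple k-means and a nearest-centroid rule stand in for the many unsupervised and supervised methods used in practice.

```python
import random

# Hypothetical symptom-severity scores for eight patients (illustrative only).
SCORES = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values]


def cluster_unsupervised(values, k=2, iters=20, seed=0):
    """Unsupervised: k-means finds k groups from the observed data only."""
    centers = random.Random(seed).sample(values, k)
    for _ in range(iters):
        labels = assign(values, centers)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return assign(values, centers)


def fit_supervised(values, known_groups):
    """Supervised: learn one centroid per group from labelled patients."""
    by_group = {}
    for v, g in zip(values, known_groups):
        by_group.setdefault(g, []).append(v)
    return {g: sum(vs) / len(vs) for g, vs in by_group.items()}


def predict(centroids, value):
    """Assign a new patient to the group with the nearest centroid."""
    return min(centroids, key=lambda g: abs(value - centroids[g]))


# Unsupervised: no group membership is supplied, only the scores.
labels = cluster_unsupervised(SCORES)

# Supervised: membership is known for a training sample, then applied to a new patient.
model = fit_supervised([1.0, 1.2, 5.0, 5.2], ["mild", "mild", "severe", "severe"])
new_patient_group = predict(model, 4.7)
```

In the unsupervised call the cluster numbers carry no meaning by themselves; the investigator has to interpret them afterwards. In the supervised call the meaning comes from the labels supplied with the training sample.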

In this blog, we share our experience and provide a summary of multiomics-based data-driven subtypes of complex diseases: schizophrenia, depressive disorders, Parkinson’s disease, diabetes, and cancer. These diseases are highly prevalent and place a substantial burden on patients, families, communities, and healthcare systems⁸⁻¹⁰. We also discuss the challenges associated with applying data-driven methods and highlight solutions to fully harness their capabilities.

Precision medicine and multiomics data

The term precision medicine received wider attention in January 2015, when President Obama unveiled plans for a national “precision medicine initiative” to promote the development and use of genomic tools in health care¹¹. Precision medicine is an emerging approach for deep characterization of diseases¹², precise assessment of patients’ risk and prognosis¹³,¹⁴, and prescription of the right drugs to the right patients¹⁵, considering individual variability in molecular biomarkers, environment, and lifestyle¹⁶. It offers a remarkable opportunity to help physicians better comprehend and practice medicine, predict the needs of their patients, share health data, improve health, and address health disparities¹³,¹⁴,¹⁶,¹⁷. The diversity of data for precision medicine can be ensured by including diverse populations and collecting diverse types of data, including the exposome or environtome¹⁸, metabolomics¹⁹, and proteomics¹⁸. Precision medicine will continue to transform healthcare in the coming decade as it expands in key areas: huge cohorts, artificial intelligence (AI), routine clinical genomics, phenomics and environment, and returning value across diverse populations²⁰.

Over the past 20 years, advances in genomic technology have enabled unparalleled access to the information contained within the human genome and provided rich evidence for the implementation of precision medicine. However, the multiple genetic variants associated with various diseases typically account for only a small fraction of the disease risk, and the expression of our genes fluctuates over time and in response to the environment. This may be due to the multifactorial nature of disease mechanisms, the strong impact of the environment, sociocultural factors, gene-environment interactions and epigenomics, and changes in the metabolome and proteome¹⁹,²¹. Thus, the ability to combine and harness the explosion of omics data may offer additional insights to precision medicine and will be critical to improving treatments for patients.

Contemporary studies show that precision medicine has already gone beyond genomics and adopted a panoptic view through deep phenotyping using clinical laboratory tests, metabolomics technologies, and advanced noninvasive imaging data from diverse populations²²⁻²⁵. For example, a three-year precision medicine study showed that integrating whole-genome sequencing with deep phenotyping (metabolomics, advanced imaging, and clinical laboratory tests) in addition to family and medical history helps to identify a high percentage of genotype-phenotype associations in dyslipidemia, cardiomyopathy, arrhythmia, and other cardiac diseases, as well as diabetes and endocrine diseases, in adults²². Other evidence suggests that current diabetes prediction models may become more accurate and useful, and so better support precision medicine, when genetic risk is combined with clinical risk factors, age, race or ethnicity, and the natural history of disease²⁶.

No field in science and medicine today remains untouched by big data¹⁸. Precision medicine lends itself to big data or “informatics” approaches and is focused on storing, accessing, sharing, and studying these data while taking necessary precautions to protect patients’ privacy²⁷. The promises of precision medicine will be more quickly realized by expanding collaborations to rapidly process and interpret the growing volumes of omics data²⁷. A confluence of biological, physical, engineering, computer, and health sciences is setting the stage for a transformative leap toward data-driven, mechanism-based health and health care for each individual. With better control of chronic disease; smaller, faster, and more successful clinical trials; and avoidance of unnecessary tests and ineffective therapies, the slope of the health care cost curve could decline²⁸.

Subtypes of complex diseases

Schizophrenia: Habtewold and colleagues⁶ conducted a systematic review of 53 cross-sectional and longitudinal data-driven studies among adult patients diagnosed with schizophrenia spectrum disorders (SSD) (Table 1). Four studies identified two to five subtypes of SSD based on positive symptoms. Despite these inconsistencies, the studies consistently reported four subtypes. Ten studies identified three to five subtypes based on negative symptoms. Even though there was a huge inconsistency among reports, identified subtypes had four main features. By combining positive and negative symptoms, 11 studies identified two to five subtypes of SSD. Despite the inconsistencies among reports, the reported subtypes had four main features. Moreover, 23 studies reported three to five subtypes of SSD based on cognitive performance, with most of them identifying three subtypes. Patients with different subtypes of SSD had different sociodemographic characteristics, clinical outcomes, and daily functioning and quality of life levels.

Major depressive disorder: Beijers and colleagues²⁹ conducted a systematic review of 29 cross-sectional and longitudinal data-driven studies among individuals with depression using psychometric, biochemical, neuroimaging and genetic data (Table 1). Six of the 29 reviewed studies consistently identified two subtypes (i.e., high and low) of depression based on biochemical/metabolomic markers extracted from plasma, cerebrospinal fluid, urine, and basal ganglia. Nine of the 29 reviewed studies reported two to five subtypes of depression using clinician- or subject-scored symptom data assessed with various psychometric instruments. The most frequent result, reported by four of these nine studies, was two depression subtypes. Furthermore, five studies identified two to four subtypes of depression using neuroimaging data assessed by resting-state functional magnetic resonance imaging, canonical correlation analysis, and diffusion tensor imaging based on fractional anisotropy (FA) scores of white matter. Four of these five studies identified two subtypes of depression. Finally, one of the reviewed studies identified two subtypes of depression using SNPs associated with major depressive disorder, whereas another study identified five subtypes of depression by combining sociodemographic data, clinical questionnaire scores, resting-state functional connectivity measures, and various biomarkers.

Parkinson’s disease: Recently, Lee and colleagues³⁰ reviewed data-driven subtyping studies among patients with PD according to clinical, neuroimaging, biochemical, genetic and transcriptomic data (Table 1). The reviewed studies identified two to four subtypes based on motor symptoms, two subtypes based on non-motor symptoms, and two to four subtypes by combining motor and non-motor symptoms. Two to three subtypes of PD were identified based on structural and functional neuroimaging data. Moreover, two to four subtypes were identified using genetic data, and three subtypes were identified using biochemical/metabolomic data.

Another contemporary systematic review of data-driven studies by Pourzinal and colleagues³¹ unraveled severity-based and domain-based subtypes of PD based on cognitive impairment. Severity-based subtyping studies (9 studies) identified three to five subtypes, most of which revealed three subtypes ranging from cognitively intact to severely impaired. Domain-based studies (11 studies) identified two to six subtypes, with most reporting four subtypes. van Rooden and colleagues³² also previously conducted a systematic review of seven data-driven studies among patients with PD. Six of the seven studies used motor symptoms for subtyping, five studies used motor symptoms and measures of cognition, and five studies used depressive symptoms, age-at-onset, and measures of disease progression. Broadly, the reviewed studies reported two to five subtypes of Parkinson’s disease. Six of the seven studies identified an “old age-at-onset and rapid disease progression” PD subtype. Five of the seven studies identified a “young age-at-onset and slow disease progression” subtype.

Diabetes mellitus: Sarría-Santamera and colleagues³³ conducted a systematic review of 14 cross-sectional and longitudinal data-driven studies among diabetic patients using multiomics data extracted from electronic health records, healthcare databases, and previously conducted longitudinal observational cohort studies and surveys (Table 1). Eight of the 14 reviewed studies used clinical data and biomarkers for subtyping. Four studies used biomarkers only. One study used sociodemographic, clinical, and biomarker data, and one study used 73 variables. The reviewed studies identified two to five subtypes of diabetes mellitus (DM); six of them identified five subtypes.

Cancer: Zhao and colleagues³⁴ conducted a review of 20 data-driven studies using microarray and RNA-seq, mutations, microRNAs (miRNAs), copy number variation (CNV) and DNA methylation data in patients mainly diagnosed with breast cancer, colorectal cancer, pancreatic ductal adenocarcinoma (PDAC), leukemia, lymphoma, pancreatic cancer, and glioblastoma (Table 1). Four studies identified four to ten subtypes of breast cancer using molecular data. Despite inconsistent naming and numbers of subtypes across studies, Zhao and colleagues concluded that breast tumors fall primarily into three major subtypes. Seven studies identified three to six subtypes of colorectal cancer using mRNA data. Despite these inconsistencies, the Colorectal Cancer Subtyping Consortium (CRCSC) classified colorectal cancer into four robust subtypes. Moreover, three studies identified two to three subtypes of PDAC using miRNA and mRNA data, whereas two studies identified two to 16 subtypes of leukemia using mRNA and methylation data. Other studies identified two to four subtypes of pancreatic cancer, lymphoma, and lung cancer based on mRNA data. Finally, one study identified two subtypes of glioblastoma using miRNA data and another study identified 11 subtypes of cancer using microarray, RNA sequencing, qPCR, NanoString, and tissue microarray data. Patient characteristics in each subtype were not evaluated.

Conclusions

In general, clustering methods hold significant potential for identifying latent disease subtypes. This can be instrumental in unraveling heterogeneity and gaining valuable insights into clinically relevant phenotypes. Moreover, such methods aid in monitoring disease progression and patient management, supporting more accurate diagnosis and treatment selection. Integrating omics data to subtype diseases offers an even greater understanding of disease biology and enhances predictive accuracy compared with analyzing data from a single domain. The decreasing costs of high-throughput profiling technologies are making multi-platform identification and characterization of diseases more common. For instance, the integration of high-throughput molecular data and molecular subtyping in cancer research has paved the way for the development of marker panels through low- and medium-throughput methods. As a result, clinicians are increasingly basing treatment decisions for cancer patients on cancer subtyping studies.

Identification of subtypes of complex diseases offers several advantages in an academic context. Firstly, it is crucial for research on the etiology and underlying pathophysiological mechanisms of the diseases. Homogeneous groups of patients are more likely to exhibit similar pathological and genetic features, allowing for more accurate investigations. Secondly, the identification of subtypes can inform tailored management strategies for patient care. An example of this is the response to chemotherapy in different subtypes of breast cancer, which can be determined through hierarchical clustering with the breast cancer intrinsic gene set³⁵. Additionally, the response to antipsychotic medication can vary among subtypes of schizophrenia, as indicated by differences in positive and negative symptom scores. Thirdly, the identification of subtypes can facilitate the design of clinical trials, although this is a relatively recent development³⁶. This has been particularly prevalent in the field of cancer care. Lastly, it enhances the ability to perform gene set enrichment analysis to characterize the underlying biology of the identified subtypes, particularly in cases where poor health outcomes are observed.

Despite these advantages, there are significant inconsistencies in cluster results among studies, and the identification of distinct subtypes can be questionable in terms of replication, generalizability, and meaningfulness in both conceptual and clinical terms. The main challenge lies in attributing meaning to data-driven clusters, as investigators must rely on subjective interpretations based on theory and fit-to-data in the absence of ground truth. This challenge may be partly related to methodological differences in study design and the subtyping process. Moreover, the inconsistent findings can be attributed to the large variability in characteristics of study populations, sample size, variables included (such as a mixture of clinical and nonclinical variables, and the presence or absence of scaling and transformation) in the subtyping analysis, and measurement instruments. Another important issue that requires further investigation is the determination of the optimal number and type of variables that strike a balance between validity and the economic efficiency of clustering patients. Furthermore, the optimal utilization of genetic data or broader multiomics integration has been primarily explored in cancer subtyping, with less application in other complex diseases.

To effectively enhance our understanding of subtypes of complex diseases and fully capitalize on the benefits of data-driven approaches using multiomics data, it is imperative to take into account the following recommendations.

First, it is essential to conduct a comprehensive evaluation to determine the statistical and clinical/biological significance of the identified subtypes. Additionally, it is necessary to assess whether these subtypes exhibit differences in variables that were not considered during the clustering analysis. Prior to any data analysis, it is crucial to incorporate a relevant theoretical framework, as humans have a tendency to assign meaning to observed patterns once they have been recognized³⁷. In other words, the process of identifying and interpreting clusters alone may not be highly persuasive, whereas predicting them based on existing theories would be more compelling³⁷.

Second, it is crucial to carefully choose a sample of patients who possess comparable clinical characteristics, such as the duration of their illness. It is also essential to meticulously select a set of clinically similar variables that adequately capture the range of diseases and can effectively differentiate between different subtypes. It is equally important to employ rigorous data processing and quality control measures, in addition to selecting a homogeneous group of variables. In the case of subtyping based on omics data, a more extensive integration and curation of data is required, along with studies investigating the role of omics variables in the pathogenesis of specific diseases of interest. At present, an enormous amount of unstructured data is generated from the free text of patient records. Given the availability of abundant publicly accessible datasets and various data processing tools that can be utilized to address such inquiries, researchers should fully capitalize on these resources.

Third, it is important to take the limitations of a single data-driven method into account and to validate the results by applying another similar method that does not have these limitations. Because cluster analysis methods are based on different algorithms, they yield different results in terms of cluster numbers and assignments. To enhance the robustness of clustering, a method called cluster ensemble has been proposed, which combines results from different runs of clustering methods into a single consensus result³⁸. A similar methodology is consensus clustering, which, in conjunction with resampling techniques, provides a way to reach consensus from multiple runs of the same clustering method³⁹.
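The idea behind consensus clustering can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure in full: the patient profiles are hypothetical, a plain k-means serves as the base method, and the names PATIENTS, consensus_matrix, runs, and frac are made up for the example. Each run clusters a random 80% subsample, and the consensus matrix records how often each pair of patients lands in the same cluster when both were sampled.

```python
import random

# Hypothetical two-feature profiles for ten patients (illustrative only).
PATIENTS = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (0.5, 0.3), (1.5, 0.0),
            (10.0, 0.1), (11.0, 0.3), (12.0, 0.0), (10.5, 0.2), (11.5, 0.1)]


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))


def kmeans(points, k, rng, iters=25):
    """Plain k-means, used here as the base clustering method."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return [nearest(p, centers) for p in points]


def consensus_matrix(points, k=2, runs=50, frac=0.8, seed=0):
    """Fraction of resampled runs in which each pair of patients co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times a pair shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times a pair was sampled together
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


consensus = consensus_matrix(PATIENTS)
```

Entries close to 1 mark pairs of patients that are grouped together regardless of which subsample was drawn, and entries close to 0 mark pairs that are reliably separated. Values in between point to unstable assignments that deserve a closer look before a subtype is interpreted.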

Fourth, it is crucial to validate the findings in independent, more diverse, and larger populations. Many clustering methods follow an iterative process and terminate when the solution stops improving. However, this solution may be a local optimum rather than the best among all possible options, because the partitioning process is sensitive to initial starting points. To mitigate the risk of converging to a local optimum, the clustering can be repeated multiple times using different random starting points, and then the optimal solution is selected. Researchers often employ subtype characterization as a means of validating the identified subtypes. Such characterizations are both necessary and important. They not only enhance our understanding of subtype characteristics but also provide a validation process for the subtypes. Ideally, distinct molecular and clinical characteristics should exist between the identified subtypes. However, it is common for subtypes to exhibit only statistical differences rather than biological ones. In such cases, re-clustering and reclassification should be performed until more interpretable results are obtained. Only a limited number of studies have validated their results using independent samples, while others have split their databases into training and test datasets to replicate their findings. Validation through external independent samples or cross-validation within a dataset is crucial in assessing the quality of the data-driven analysis conducted. Investigators must exercise caution to avoid the problem of “double dipping” (reusing the same data both to define the subtypes and to confirm them) when splitting their dataset into training and validation datasets. It is also customary for researchers to use their own dataset as the training dataset and publicly available datasets as their validation datasets.
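The random-restart strategy can be sketched in a few lines. This is an illustrative example only: the biomarker values are hypothetical, k-means stands in for iterative clustering methods in general, and the within-cluster sum of squares (WCSS) is assumed here as the criterion for the "optimal" solution, which the text above does not specify.

```python
import random

# Hypothetical biomarker values for nine patients (illustrative only).
VALUES = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: (v - centers[c]) ** 2)
            for v in values]


def kmeans_once(values, k, rng, iters=50):
    """One k-means run from a random start; returns (WCSS, labels)."""
    centers = rng.sample(values, k)
    for _ in range(iters):
        labels = assign(values, centers)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:  # converged, possibly to a local optimum
            break
        centers = updated
    labels = assign(values, centers)
    wcss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return wcss, labels


def kmeans_restarts(values, k, restarts=50, seed=0):
    """Repeat k-means from different random starts and keep the lowest WCSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(values, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0]), runs


(best_wcss, best_labels), all_runs = kmeans_restarts(VALUES, k=3)
```

Comparing the WCSS values in all_runs shows why a single run is risky: a start with two centers inside the same group can end with two true groups merged, which gives a much larger WCSS than the best run. Restarts only address the optimization problem, though. They do not replace validation in an independent sample.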

References

1 Johansson, Å. et al. Precision medicine in complex diseases-Molecular subgrouping for improved prediction and treatment stratification. J Intern Med 294, 378-396 (2023). https://doi.org/10.1111/joim.13640

2 Biswas, S. & Hasija, Y. in Big Data Analytics for Healthcare (ed Pantea Keikhosrokiani) 63-72 (Academic Press, 2022).

3 Muthén, B. & Muthén, L. K. Integrating person-centered and variable-centered analyses: growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res 24, 882-891 (2000).

4 Nguyen, T., Tagett, R., Diaz, D. & Draghici, S. A novel approach for data integration and disease subtyping. Genome Res 27, 2025-2039 (2017). https://doi.org/10.1101/gr.215129.116

5 Xie, J., Ma, A., Fennell, A., Ma, Q. & Zhao, J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 20, 1449-1464 (2019). https://doi.org/10.1093/bib/bby014

6 Habtewold, T. D. et al. A systematic review and narrative synthesis of data-driven studies in schizophrenia symptoms and cognitive deficits. Transl Psychiatry 10, 244 (2020). https://doi.org/10.1038/s41398-020-00919-x

7 Mori, M., Krumholz, H. M. & Allore, H. G. Using Latent Class Analysis to Identify Hidden Clinical Phenotypes. JAMA 324, 700-701 (2020). https://doi.org/10.1001/jama.2020.2278

8 Charlson, F. J. et al. Global Epidemiology and Burden of Schizophrenia: Findings From the Global Burden of Disease Study 2016. Schizophr Bull 44, 1195-1203 (2018). https://doi.org/10.1093/schbul/sby058

9 Ou, Z. et al. Global Trends in the Incidence, Prevalence, and Years Lived With Disability of Parkinson’s Disease in 204 Countries/Territories From 1990 to 2019. Front Public Health 9, 776847 (2021). https://doi.org/10.3389/fpubh.2021.776847

10 Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet 402, 203-234 (2023). https://doi.org/10.1016/s0140-6736(23)01301-6

11 Juengst, E., McGowan, M. L., Fishman, J. R. & Settersten, R. A., Jr. From “Personalized” to “Precision” Medicine: The Ethical and Social Implications of Rhetorical Reform in Genomic Medicine. Hastings Cent Rep 46, 21-33 (2016). https://doi.org/10.1002/hast.614

12 Ashley, E. A. Towards precision medicine. Nat Rev Genet 17, 507-522 (2016). https://doi.org/10.1038/nrg.2016.86

13 Landry, L. G., Ali, N., Williams, D. R., Rehm, H. L. & Bonham, V. L. Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff (Millwood) 37, 780-785 (2018).

14 Fine, M. J., Ibrahim, S. A. & Thomas, S. B. The role of race and genetics in health disparities research. Am J Public Health 95, 2125-2128 (2005). https://doi.org/10.2105/ajph.2005.076588

15 Letai, A. Functional precision cancer medicine-moving beyond pure genomics. Nat Med 23, 1028-1035 (2017). https://doi.org/10.1038/nm.4389

16 Watson, K. S. et al. Adapting a conceptual framework to engage diverse stakeholders in genomic/precision medicine research. Health Expect (2022). https://doi.org/10.1111/hex.13486

17 Geneviève, L. D., Martani, A., Shaw, D., Elger, B. S. & Wangmo, T. Structural racism in precision medicine: leaving no one behind. BMC Med Ethics 21, 17 (2020). https://doi.org/10.1186/s12910-020-0457-8

18 Özdemir, V. et al. Personalized medicine beyond genomics: alternative futures in big data-proteomics, environtome and the social proteome. J Neural Transm (Vienna) 124, 25-32 (2017). https://doi.org/10.1007/s00702-015-1489-y

19 Rattray, N. J. W. et al. Beyond genomics: understanding exposotypes through metabolomics. Hum Genomics 12, 4 (2018). https://doi.org/10.1186/s40246-018-0134-x

20 Denny, J. C. & Collins, F. S. Precision medicine in 2030-seven ways to transform healthcare. Cell 184, 1415-1419 (2021). https://doi.org/10.1016/j.cell.2021.01.015

21 Mostafavi, H. et al. Variable prediction accuracy of polygenic scores within an ancestry group. Elife 9 (2020). https://doi.org/10.7554/eLife.48376

22 Hou, Y. C. et al. Precision medicine integrating whole-genome sequencing, comprehensive metabolomics, and advanced imaging. Proc Natl Acad Sci U S A 117, 3053-3062 (2020). https://doi.org/10.1073/pnas.1909378117

23 Rahman, S. et al. Quo vadis now: Beyond genomics to an era of personalised medicine. J Inherit Metab Dis 45, 129-131 (2022). https://doi.org/10.1002/jimd.12487

24 Snyderman, R. & Spellmeyer, D. Precision medicine: beyond genomics to targeted therapies. Per Med 13, 97-100 (2016). https://doi.org/10.2217/pme.15.48

25 Pfohl, U. et al. Precision Oncology Beyond Genomics: The Future Is Here-It Is Just Not Evenly Distributed. Cells 10 (2021). https://doi.org/10.3390/cells10040928

26 Mercader, J. M., Ng, M. C. Y., Manning, A. K. & Rich, S. S. Predicting diabetes risk in diverse populations: what next? Lancet Diabetes Endocrinol 9, 808-810 (2021). https://doi.org/10.1016/s2213-8587(21)00287-4

27 Madhavan, S., Subramaniam, S., Brown, T. D. & Chen, J. L. Art and Challenges of Precision Medicine: Interpreting and Integrating Genomic Data Into Clinical Practice. Am Soc Clin Oncol Educ Book 38, 546-553 (2018). https://doi.org/10.1200/edbk_200759

28 Hawgood, S., Hook-Barnard, I. G., O’Brien, T. C. & Yamamoto, K. R. Precision medicine: Beyond the inflection point. Sci Transl Med 7, 300ps317 (2015). https://doi.org/10.1126/scitranslmed.aaa9970

29 Beijers, L., Wardenaar, K. J., van Loo, H. M. & Schoevers, R. A. Data-driven biological subtypes of depression: systematic review of biological approaches to depression subtyping. Mol Psychiatry 24, 888-900 (2019). https://doi.org/10.1038/s41380-019-0385-5

30 Lee, S. H. et al. Parkinson’s Disease Subtyping Using Clinical Features and Biomarkers: Literature Review and Preliminary Study of Subtype Clustering. Diagnostics (Basel) 12 (2022). https://doi.org/10.3390/diagnostics12010112

31 Pourzinal, D. et al. Systematic review of data-driven cognitive subtypes in Parkinson disease. Eur J Neurol 29, 3395-3417 (2022). https://doi.org/10.1111/ene.15481

32 van Rooden, S. M. et al. The identification of Parkinson’s disease subtypes using cluster analysis: a systematic review. Mov Disord 25, 969-978 (2010). https://doi.org/10.1002/mds.23116

33 Sarría-Santamera, A., Orazumbekova, B., Maulenkul, T., Gaipov, A. & Atageldiyeva, K. The Identification of Diabetes Mellitus Subtypes Applying Cluster Analysis Techniques: A Systematic Review. Int J Environ Res Public Health 17 (2020). https://doi.org/10.3390/ijerph17249523

34 Zhao, L., Lee, V. H. F., Ng, M. K., Yan, H. & Bijlsma, M. F. Molecular subtyping of cancer: current status and moving toward clinical applications. Brief Bioinform 20, 572-584 (2019). https://doi.org/10.1093/bib/bby026

35 Rouzier, R. et al. Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res 11, 5678-5685 (2005). https://doi.org/10.1158/1078-0432.Ccr-04-2421

36 Prasuhn, J. & Brüggemann, N. Genotype-driven therapeutic developments in Parkinson’s disease. Mol Med 27, 42 (2021). https://doi.org/10.1186/s10020-021-00281-8

37 van Smeden, M., Harrell, F. E., Jr. & Dahly, D. L. Novel diabetes subgroups. Lancet Diabetes Endocrinol 6, 439-440 (2018). https://doi.org/10.1016/s2213-8587(18)30124-4

38 Ghosh, J. & Acharya, A. Cluster ensembles. WIREs Data Mining and Knowledge Discovery 1, 305-315 (2011). https://doi.org/10.1002/widm.32

39 Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52, 91-118 (2003).

About The Author

Tesfa D. Habtewold