Publications
A comprehensive list of my research publications in computational biology and machine learning.
Automatic ploidy prediction and quality assessment of human blastocysts using time-lapse imaging
Published in: Nature Communications
Assessing fertilized human embryos is crucial for in vitro fertilization, a task being revolutionized by artificial intelligence. Existing models used for embryo quality assessment and ploidy detection could be significantly improved by effectively utilizing time-lapse imaging to identify critical developmental time points for maximizing prediction accuracy. Addressing this, we develop and compare various embryo ploidy status prediction models across distinct embryo development stages. We present BELA, a state-of-the-art ploidy prediction model that surpasses previous image- and video-based models without necessitating input from embryologists. BELA uses multitask learning to predict quality scores that are thereafter used to predict ploidy status. By achieving an area under the receiver operating characteristic curve of 0.76 for discriminating between euploid and aneuploid embryos on the Weill Cornell dataset, BELA matches the performance of models trained on embryologists’ manual scores. While not a replacement for preimplantation genetic testing for aneuploidy, BELA exemplifies how such models can streamline the embryo evaluation process.
Citation: Rajendran, S., Brendel, M., Barnes, J. et al. Automatic ploidy prediction and quality assessment of human blastocysts using time-lapse imaging. Nat Commun 15, 7756 (2024). https://doi.org/10.1038/s41467-024-51823-7
High-speed optical imaging with sCMOS pixel reassignment
Published in: Nature Communications
Fluorescence microscopy has undergone rapid advancements, offering unprecedented visualization of biological events and shedding light on the intricate mechanisms governing living organisms. However, the exploration of rapid biological dynamics still poses a significant challenge due to the limitations of current digital camera architectures and the inherent compromise between imaging speed and other capabilities. Here, we introduce sHAPR, a high-speed acquisition technique that leverages the operating principles of sCMOS cameras to capture fast cellular and subcellular processes. sHAPR harnesses custom fiber optics to convert microscopy images into one-dimensional recordings, enabling acquisition at the maximum camera readout rate, typically between 25 and 250 kHz. We have demonstrated the utility of sHAPR with a variety of phantom and dynamic systems, including high-throughput flow cytometry, cardiomyocyte contraction, and neuronal calcium waves, using a standard epi-fluorescence microscope. sHAPR is highly adaptable and can be integrated into existing microscopy systems without requiring extensive platform modifications. This method pushes the boundaries of current fluorescence imaging capabilities, opening up new avenues for investigating high-speed biological phenomena.
Citation: Mandracchia, B., Zheng, C., Rajendran, S. et al. High-speed optical imaging with sCMOS pixel reassignment. Nat Commun 15, 4598 (2024). https://doi.org/10.1038/s41467-024-48987-7
Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation
Published in: Patterns
In healthcare, machine learning (ML) shows significant potential to augment patient care, improve population health, and streamline healthcare workflows. Realizing its full potential is, however, often hampered by concerns about data privacy, diversity in data sources, and suboptimal utilization of different data modalities. This review studies the utility of cross-cohort cross-category (C4) integration in such contexts: the process of combining information from diverse datasets distributed across distinct, secure sites. This paper provides a comprehensive overview of C4 in healthcare, including its present stage, potential opportunities, and associated challenges.
Citation: Rajendran, S., Pan, W., Sabuncu, R. M., Chen, Y., Zhou, J., & Wang, F. (2023). Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation. Patterns, 100913. https://doi.org/10.1016/j.patter.2023.100913
An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals
Published in: Patterns
Clinical risk prediction with electronic health records (EHR) using machine learning has attracted considerable attention in recent years, and one of the key challenges is how to protect data privacy. Federated learning (FL) provides a promising framework for building predictive models by leveraging data from multiple institutions without sharing them. However, data distribution drift across institutions greatly impacts the performance of FL. In this paper, an adaptive FL framework was proposed to address this challenge. Our framework separated the input features into stable, domain-specific, and conditional-irrelevant parts according to their relationships to clinical outcomes.
Citation: Pan, W., Xu, Z., Rajendran, S., & Wang, F. (2023). An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals. Patterns (New York, N.Y.), 5(1), 100898. https://doi.org/10.1016/j.patter.2023.100898
Predicting Embryo Ploidy Status Using Time-lapse Images
Published in: Human Reproduction
Can deep learning models using time-lapse images of embryo development be used to predict embryo ploidy and provide supplemental information to embryologists for clinical decision-making? We developed a general MDBS-Ploidy model that uses time-lapse images and maternal age to predict embryo quality scores and ploidy status.
Citation: S Rajendran and others, O-120 Predicting Embryo Ploidy Status Using Time-lapse Images, Human Reproduction, Volume 38, Issue Supplement_1, June 2023, dead093.147, https://doi.org/10.1093/humrep/dead093.147
Web-Based Social Networks of Individuals With Adverse Childhood Experiences: Quantitative Study
Published in: Journal of Medical Internet Research
Adverse childhood experiences (ACEs), which include abuse and neglect and various household challenges such as exposure to intimate partner violence and substance use in the home, can have negative impacts on the lifelong health of affected individuals. One strategy for mitigating the adverse effects of ACEs is to enhance connectedness and social support for those who have experienced them. However, how the social networks of those who experienced ACEs differ from the social networks of those who did not is poorly understood. In this study, we used Reddit and Twitter data to investigate and compare social networks between individuals with and without ACE exposure.
Citation: Cao, Y., Rajendran, S., Sundararajan, P., Law, R., Bacon, S., Sumner, S. A., & Masuda, N. (2023). Web-Based Social Networks of Individuals With Adverse Childhood Experiences: Quantitative Study. Journal of Medical Internet Research, 25, e45171. https://doi.org/10.2196/45171
Biomedical discovery through the integrative biomedical knowledge hub (iBKH)
Published in: iScience
The abundance of biomedical knowledge gained from biological experiments and clinical practices is an invaluable resource for biomedicine. The emerging biomedical knowledge graphs (BKGs) provide an efficient and effective way to manage the abundant knowledge in biomedical and life science. In this study, we created a comprehensive BKG called the integrative Biomedical Knowledge Hub (iBKH) by harmonizing and integrating information from diverse biomedical resources. To make iBKH easily accessible for biomedical research, we developed a web-based, user-friendly graphical portal that allows fast and interactive knowledge retrieval. We also implemented an efficient and scalable graph learning pipeline for discovering novel biomedical knowledge in iBKH. As a proof of concept, we applied our iBKH-based method to in-silico drug repurposing for Alzheimer’s disease. The iBKH is publicly available.
Citation: Su, C., Hou, Y., Zhou, M., Rajendran, S., Maasch, J. R. M. A., Abedi, Z., Zhang, H., Bai, Z., Cuturrufo, A., Guo, W., Chaudhry, F. F., Ghahramani, G., Tang, J., Cheng, F., Li, Y., Zhang, R., DeKosky, S. T., Bian, J., & Wang, F. (2023). Biomedical discovery through the integrative biomedical knowledge hub (iBKH). iScience, 26(4), 106460. https://doi.org/10.1016/j.isci.2023.106460
Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care
Published in: PLOS Digital Health
With the wider availability of healthcare data such as Electronic Health Records (EHR), more and more data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims at building computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, concerns about the privacy of healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity across local participating sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), the early AI-based prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks.
Citation: Rajendran, S., Xu, Z., Pan, W., Ghosh, A., & Wang, F. (2023). Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care. PLOS Digital Health, 2(3), e0000117. https://doi.org/10.1371/journal.pdig.0000117
A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy: A retrospective model development and validation study
Published in: The Lancet Digital Health
One challenge in the field of in-vitro fertilisation is the selection of the most viable embryos for transfer. Morphological quality assessment and morphokinetic analysis both have the disadvantage of intra-observer and inter-observer variability. A third method, preimplantation genetic testing for aneuploidy (PGT-A), has limitations too, including its invasiveness and cost. We hypothesised that differences in aneuploid and euploid embryos that allow for model-based classification are reflected in morphology, morphokinetics, and associated clinical information. In this retrospective study, we used machine-learning and deep-learning approaches to develop STORK-A, a non-invasive and automated method of embryo evaluation that uses artificial intelligence to predict embryo ploidy status.
Citation: Barnes, J., Brendel, M., Gao, V. R., Rajendran, S., Kim, J., Li, Q., Malmsten, J. E., Sierra, J. T., Zisimopoulos, P., Sigaras, A., Khosravi, P., Meseguer, M., Zhan, Q., Rosenwaks, Z., Elemento, O., Zaninovic, N., & Hajirasouliha, I. (2023). A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy: a retrospective model development and validation study. The Lancet Digital Health, 5(1), e28–e40. https://doi.org/10.1016/S2589-7500(22)00213-8
Creation of a Proof-of-Concept 3D-Printed Spinal Lateral Access Simulator
Published in: Cureus
Minimally invasive lateral lumbar interbody fusion (LLIF) offers advantages over traditional approaches, providing indirect decompression of neural elements and deformity correction while avoiding many challenges and risks of anterior and posterior approaches. Mastering this technique requires a specialized team, advanced equipment, and sufficient case exposure. Current training is limited to the classic educational model, and alternative training methods such as cadaver labs can be inconvenient, inaccessible, expensive, and incompatible with intraoperative neuromonitoring (IONM) systems.
Citation: Pullen, M. W., Valero-Moreno, F., Rajendran, S., Shah, V. U., Bruneau, B. R., Martinez, J. L., Ramos-Fresnedo, A., Quinones-Hinojosa, A., & Fox, W. C. (2022). Creation of a Proof-of-Concept 3D-Printed Spinal Lateral Access Simulator. Cureus, 14(5), e25448. https://doi.org/10.7759/cureus.25448
Distance Traveled to Tertiary Care as Prognostic Indicator in Intracerebral Hemorrhage Outcomes
Published in: Society of Critical Care Medicine
Intracranial hemorrhage (ICH) has high morbidity and mortality, disproportionately affecting rural patients despite adjusting for comorbidities. Inter-hospital transfers for rural patients cause delays in access to specialized care and are associated with adverse outcomes. Published prognostic tools lack distance as a factor; hence, we explored training three machine learning models to predict 30-day mortality, modified Rankin scale on discharge, and discharge disposition in ICH patients using distance from home to tertiary care.
Citation: Rajendran, S., Ong, T., Zameza, P., Wolfe, S., Topaloglu, U., Duncan, P., Anwar, M., Samuel, R., Budigi, B., Lack, C. & Sarwal, A. (2022). 779: DISTANCE TRAVELED TO TERTIARY CARE AS PROGNOSTIC INDICATOR IN INTRACEREBRAL HEMORRHAGE OUTCOMES. Critical Care Medicine, 50(1), 384-384. doi: 10.1097/01.ccm.0000809440.55714.3d.
Predicting criminal recidivism using specialized feature engineering and XGBoost
Published in: CrimRxiv
As the research, development, and evaluation agency of the U.S. Department of Justice, NIJ invests in scientific research across diverse disciplines to serve the needs of the criminal justice community. In 2021, NIJ released the “Recidivism Forecasting Challenge.” With this Challenge, NIJ aims to: (1) encourage “non-criminal justice” forecasting researchers to compete against more “traditional” criminal justice forecasting researchers, building upon the current knowledge base while infusing innovative, new perspectives; and (2) compare available forecasting methods in an effort to improve person-based and place-based recidivism forecasting. Our team entered the Small Team category of the challenge and aimed to utilize state-of-the-art machine learning techniques to assist in this field.
Citation: Rajendran, S., & Sundararajan, P. (2021). Predicting criminal recidivism using specialized feature engineering and XGBoost. CrimRxiv. https://doi.org/10.21428/cb6ab371.d95f8c48
In the Pursuit of Privacy: The Promises and Predicaments of Federated Learning in Healthcare
Published in: Frontiers in Artificial Intelligence
Artificial Intelligence and its subdomain, Machine Learning (ML), have shown the potential to make an unprecedented impact in healthcare. Federated Learning (FL) has been introduced to alleviate some of the limitations of ML, particularly the capability to train on larger datasets for improved performance, which is usually cumbersome for an inter-institutional collaboration due to existing patient protection laws and regulations. Moreover, FL may also play a crucial role in circumventing ML’s exigent bias problem by accessing underrepresented groups’ data spanning geographically distributed locations. In this paper, we have discussed three FL challenges, namely: privacy of the model exchange, ethical perspectives, and legal considerations. Lastly, we have proposed a model that could aid in assessing data contributions of an FL implementation. In light of the expediency and adaptability of using the Sørensen–Dice Coefficient over the more limited (e.g., horizontal FL) and computationally expensive Shapley Values, we sought to demonstrate a new paradigm that, we hope, will become invaluable for sharing any profit and responsibilities that may accompany an FL endeavor.
Citation: Topaloglu MY, Morrell EM, Rajendran S, Topaloglu U. In the Pursuit of Privacy: The Promises and Predicaments of Federated Learning in Healthcare. Front Artif Intell. 2021 Oct 6;4:746497. doi: 10.3389/frai.2021.746497. PMID: 34693280; PMCID: PMC8528445.
Cloud-Based Federated Learning Implementation Across Medical Centers
Published in: JCO Clinical Cancer Informatics
Building well-performing machine learning (ML) models in health care has always been exigent because of the data-sharing concerns, yet ML approaches often require larger training samples than is afforded by one institution. This paper explores several federated learning implementations by applying them in both a simulated environment and an actual implementation using electronic health record data from two academic medical centers on a Microsoft Azure Cloud Databricks platform.
Citation: Rajendran S, Obeid JS, Binol H, D'Agostino R Jr, Foley K, Zhang W, Austin P, Brakefield J, Gurcan MN, Topaloglu U. Cloud-Based Federated Learning Implementation Across Medical Centers. JCO Clin Cancer Inform. 2021 Jan;5:1-11. doi: 10.1200/CCI.20.00060. PMID: 33411624; PMCID: PMC8140794.
Extracting smoking status from electronic health records using NLP and deep learning
Published in: AMIA Summits on Translational Science Proceedings
Half a million people die every year from smoking-related issues across the United States. It is essential to identify individuals who are tobacco-dependent in order to implement preventive measures. In this study, we investigate the effectiveness of deep learning models to extract smoking status of patients from clinical progress notes. A Natural Language Processing (NLP) pipeline was built that cleans the progress notes prior to processing by three deep neural networks: a CNN, a unidirectional LSTM, and a bidirectional LSTM. Each of these models was trained with a pre-trained or a post-trained word embedding layer. Three traditional machine learning models were also employed to compare against the neural networks. Each model generated both binary and multi-class label classifications. Our results showed that the CNN model with a pre-trained embedding layer performed the best for both binary and multi-class label classification.
Citation: Rajendran S, Topaloglu U. Extracting Smoking Status from Electronic Health Records Using NLP and Deep Learning. AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:507-516. PMID: 32477672; PMCID: PMC7233082.
Machine learning for prospective identification of immunotherapy related adverse events (irAEs)
Published in: 2020 ASCO Annual Meeting
The ubiquitous implementation of immunotherapy has significantly improved outcomes in the treatment of cancer patients; however, once-rare adverse events from these therapies have increased in lockstep. We now face an increased burden of identification on providers with limited experience in the diagnosis of irAEs. We use machine learning to develop prediction models that will aid providers in identifying patients at high risk for developing irAEs as well as for multiple downstream applications.
Citation: Margalski, Daniel, et al. “Machine Learning for Prospective Identification of Immunotherapy Related Adverse Events (irAEs).” Journal of Clinical Oncology, vol. 38, no. 15_suppl, 2020, https://doi.org/10.1200/jco.2020.38.15_suppl.e14064.