“Artificial intelligence in cancer research, diagnosis and therapy,” a Viewpoint article from Nature Reviews Cancer, September 17, 2021.
In this Viewpoint article, Nature Reviews Cancer asked four experts for their opinions on how we can begin to implement artificial intelligence while ensuring standards are maintained so as to transform cancer diagnosis and the prognosis and treatment of patients with cancer and to drive biological discovery.
Artificial intelligence and machine learning techniques are breaking into biomedical research and health care, which importantly includes cancer research and oncology, where the potential applications are vast. These include detection and diagnosis of cancer, subtype classification, optimization of cancer treatment and identification of new therapeutic targets in drug discovery. While big data used to train machine learning models may already exist, leveraging this opportunity to realize the full promise of artificial intelligence in both the cancer research space and the clinical space will first require significant obstacles to be surmounted. In this Viewpoint article, we asked four experts for their opinions on how we can begin to implement artificial intelligence while ensuring standards are maintained so as to transform cancer diagnosis and the prognosis and treatment of patients with cancer and to drive biological discovery.
What are some of the emerging and most promising AI applications for the study, diagnosis and treatment of cancer?
Olivier Elemento, Director of the Englander Institute for Precision Medicine. The most mature applications of artificial intelligence (AI) in cancer are undoubtedly those focused on using imaging to diagnose malignancies. The seminal article by Esteva et al.1 showed that it is possible to train a deep neural network to detect malignant lesions from photographs of skin lesions with accuracy that rivals that of trained dermatologists. Since then, deep neural networks have been trained to automatically analyse radiology images and digitized pathology slides for numerous different cancer types. For example, deep learning can be used to detect mammographic lesions with an accuracy that rivals that of certified screening radiologists2. As another example, deep learning can also be used to assess the presence of cancer and produce Gleason scores and tumour purity estimates from digitized haematoxylin and eosin (H&E)-stained slides of prostate tissue with accuracy equivalent to that of a trained pathologist3.
There are a number of rapidly emerging applications. For example, recent work suggests that it may be possible to use AI to analyse videos of colonoscopies to identify polyps in real time and with high accuracy4. There has been enormous interest in using AI to predict responders to certain cancer therapies, such as immune therapies or chemotherapies, whose biological determinants of response are thought to be multifactorial. Despite some undeniable progress in leveraging AI to identify mechanisms associated with response, published studies have seen limited predictive performances when assessed in independent cohorts5.
One of the most exciting potential applications of AI in cancer is the possibility of designing novel anticancer therapies or at least guiding the development of such therapies to decrease the failure rate and decrease the time to approval. There are clear signs that certain types of neural networks (for example, autoencoders) can learn to represent an ensemble of molecules with specific activities and produce novel structures with similar activities6. AI can also be used to accurately predict the mechanism of action of anticancer molecules, thus enabling precise preclinical and clinical positioning and increasing the likelihood of clinical success7. Likewise, AI may be used to predict effective drug combinations, which has become a complicated combinatorial problem as the number of anticancer drugs continues to grow8.
Christina Leslie. I would first distinguish between applications of AI to solve engineering tasks versus uses of these models to address fundamental scientific questions and drive discovery. By an engineering task, I mean that the goal is solely to make accurate predictions — to automate a time-consuming clinical task or avoid the need for a more difficult or expensive experiment or diagnostic test. For example, training a clinical diagnostic tool for digitized pathology images of tumour samples is primarily an engineering problem. Deep learning models that predict protein 3D structure from primary amino acid sequence (and corresponding multiple sequence alignment) are a recent engineering breakthrough9. Certainly, using AI to solve engineering tasks such as protein structure prediction or genomic data imputation10 can support the generation of new scientific knowledge, once the predictions are accurate enough to substitute for experimental data. However, in my own field of regulatory and functional genomics, one can also use machine learning models as a tool to reveal mechanistic information hidden in large genomic datasets rather than strictly as a prediction engine. As experimental datasets grow more complex, researchers are embracing sophisticated algorithmic tools to aid in their interpretation. Designing, training and interrogating the machine learning model has become part of the scientific process to study fundamental biological questions, including in cancer.
To provide some context, I first got involved in machine learning for computational biology approximately 20 years ago and encountered a thriving algorithmic modelling community focused on the development and rigorous theoretical analysis of algorithms for well-defined supervised learning problems (such as classification and regression). The goal was accurate prediction (that is, generalization to unseen test sets), and interpretation of the model was at best a secondary focus. This might have seemed a poor fit for addressing biological questions in genomics. Nevertheless, we and others successfully used predictive modelling as a strategy to decipher gene regulation, for example to decode transcription factor binding signals and epigenomic changes that govern expression changes in cellular differentiation11 or to identify transcription factors underlying T cell progression to exhaustion in tumours and chronic infection12.
The advent of deep learning has seen rapid advances in modelling efforts in regulatory genomics, in particular rich sequence models based on convolutional neural networks that learn the mapping from genomic sequence to epigenomic signals. One can also perform in silico experiments on these expressive models to obtain novel mechanistic insights. A beautiful recent study in this vein introduced BPNet13, a model for prediction of nucleotide-resolution transcription factor occupancy profiles and learning transcription factor ‘motif grammars’, yielding novel insights into the binding patterns and dependencies of pluripotency factors in mouse embryonic stem cells. Another notable study trained a model called ‘Akita’ to predict the local contact matrix of 3D chromatin interactions as measured by Hi-C from DNA sequence14. This model can be used to predict whether structural variants might disrupt 3D chromatin organization so as to prioritize downstream experimental analyses. We have also recently harnessed graph attention networks (a neural network model for graph-structured data) to incorporate 3D chromatin interactions together with 1D epigenomic and sequence data into predictive models of gene regulation15. Interpretation of the model can accurately identify distal enhancers of genes, and the approach has potential extensions to studying transcriptional control and enhancer rewiring in cancer.
Many deep learning models involve learning non-linear or variational embeddings — mappings of high-dimensional input data to a lower-dimensional ‘bottleneck’ or latent space — bringing new tools for discovering latent structure in data and for integrating datasets. A fast-growing application of deep learning is in single-cell genomics. For example, scVI is a deep variational model for single-cell RNA sequencing (scRNA-seq) data that can be used to generate visualizations and correct for batch effects, while enabling clustering and differential expression analyses16. Another recent method called ‘scNym’ learns to predict the cell type annotation from scRNA-seq by training on both labelled (annotated) and unlabelled cells and accounts for batch (or domain) effects with an adversarial training strategy, where the classifier competes against an adversarial model that tries to predict the batch17. New deep learning models for different single-cell modalities, including multiomic readouts, to enable integration, visualization and analysis of large-scale datasets (more than one million cells) continue to emerge at a rapid pace.
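As an illustrative aside, the 'bottleneck' idea described above can be seen in its simplest linear form: project high-dimensional data into a low-dimensional latent space and reconstruct it, which for a linear encoder and decoder reduces to principal component analysis via the singular value decomposition. The sketch below uses a toy, simulated 'expression matrix' (not real single-cell data) purely to show that a two-dimensional latent space can capture most of the structure when the data were generated from two hidden programmes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 'expression matrix': 200 cells x 50 genes driven by 2 latent programmes
Z = rng.normal(size=(200, 2))            # hidden cell states
W = rng.normal(size=(2, 50))             # gene loadings
X = Z @ W + 0.1 * rng.normal(size=(200, 50))

# Linear 'autoencoder': encode to a 2D bottleneck via SVD, then decode
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
latent = Xc @ Vt[:k].T                    # encoder: project to latent space
X_rec = latent @ Vt[:k] + X.mean(axis=0)  # decoder: reconstruct

err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")
```

Deep models such as scVI replace the linear maps with neural networks and a variational objective, which lets the latent space absorb non-linear structure and nuisance factors (for example, batch) that a linear projection cannot.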
Johan Lundin. For hundreds of years, pathologists have looked through a microscope at the stained slides of surgical specimens to make the diagnosis of cancer, to provide prognostic information to clinicians and to explore new approaches to prevention and treatment. This visual paradigm is rapidly changing because physical slides are being converted into digital data. We are now able to digitize, store either locally or in the cloud, transmit and analyse stained and unstained tissue. This advance, from analogue to digital data, will profoundly change pathology and cancer diagnostics.
Digitization will free pathology from the tyranny of physical slides. Pathology data will be instantly transmitted throughout the world, which means that pathology can be performed anywhere in the world. This will improve resource utilization in high-resource settings and it will deliver critical resources to resource-limited settings18. Furthermore, because pathology is becoming digital, its data can be analysed by AI algorithms such as neural networks. Once trained, AI algorithms can provide diagnostic and prognostic predictions.
In terms of diagnosis, AI algorithms can be as good as the best pathologists because they are taught by the best pathologists. In addition, they can assist the pathologist and increase diagnostic efficiency and accuracy19.
In terms of prognosis, AI algorithms can be better than the best pathologist at prognosis because they can find complex patterns that are unobservable to the naked eye20,21. This means that they can be trained on outcome data without the need for expert guidance (that is, they can learn semi-autonomously). Just as AlphaGo22, a neural network algorithm, learned semi-autonomously to play Go on the basis of its game outcomes, and just as it proved its accuracy by beating all the Go grand champions, so too can pathology AI neural networks learn prognosis using medical outcome data. This will improve treatment selection and outcome prediction. In other words, today, almost all our predictive algorithms require expert-guided training. In the future, our AI algorithms will be able to learn by themselves through outcome supervision and discover novel associations between tissue features and both treatments23 and outcomes21.
Georgia Tourassi. In the past decade, we have experienced explosive growth in the application of AI in cancer research and oncology. This trend is due to foundational algorithmic advances (that is, deep learning), advances in digital data collection technologies and increasing computational power. Currently, some of the most promising cancer applications are in (1) medical image analysis for tumour detection, quantification and histopathological characterization, (2) computer-assisted clinical diagnosis, treatment selection, treatment planning and prognosis leveraging multimodal clinical data, (3) anticancer drug development and (4) population cancer surveillance24. Collectively these efforts aim to deliver the promise of precision oncology in which cancer management is personalized on the basis of each patient’s genetic and epigenetic variability to increase early screening efficiency, improve treatment response and ultimately improve the outcomes of patients with cancer.
What do you see as the biggest challenges for implementation of AI in clinical practice?
O.E. There is an unmistakable gap between the thousands of AI models and applications described in the biomedical research literature and AI models actually used in clinical practice. Most AI methods never get implemented in the clinic. There are several reasons for this. One is that up until recently, appropriate guidance from regulatory agencies regarding the steps needed for regulatory approval has been limited. This is rapidly changing; in January 2021, the US Food and Drug Administration (FDA) issued the “Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan”, which stipulates several guidelines for AI implementation. Another reason is that new AI methods need either to integrate within existing clinical workflows or replace existing ones. In the clinical setting, the updated or new workflows need to be validated for accuracy and reproducibility under realistic scenarios and documented, and staff need to be trained. Altogether, this process requires substantial time and resource investments that many medical facilities are unwilling to make.
An important and perhaps understated obstacle to clinical implementation is the frequent absence of user-friendly software to facilitate the use of AI in the clinic. Indeed, AI must be implemented with the primary users in mind, as ultimately practitioners such as radiologists or pathologists are responsible for rendering and communicating clinical diagnoses. In the same vein, not enough emphasis is placed on interpretable AI. AI is most helpful in situations where a clinical decision is otherwise challenging, possibly due to incomplete or conflicting observations. To be helpful, AI must be able to explain its predictions, so that users can gain confidence in them and explain them to patients and colleagues when needed. However, the main limitation of AI is often the unproven robustness of AI models. AI models used in clinical practice must face a wide variety of fluctuations in the input data, such as operator-to-operator and laboratory-to-laboratory differences in data quality, resolution, intensities and differences in disease features. Most AI models are not tested enough to demonstrate robustness in the face of such fluctuations, or when tested, clearly show deterioration in performance. To achieve success in the clinic, AI models must be extensively tested.
C.L. There is certainly a need for realistic performance evaluation in real-world clinical settings, which will occur in the context of clinical trials in the coming years and may identify significant shortcomings of existing AI models. At the time of publication (or press release), AI models have typically been evaluated on only a limited number of benchmark or ‘challenge’ datasets. These benchmarks may not reflect the true level of technical and biological variability in clinical data, the inherent complexity of the prediction task or the clinical costs of different kinds of misclassifications; overtraining on such datasets can yield optimistic estimates of generalization performance. Assembling and labelling benchmark datasets is often very difficult and limits more comprehensive assessment. In addition, it is known that deep learning models can exhibit brittle behaviour: it is possible to design or identify adversarial examples that would never fool a human and yet produce incorrect model predictions25. Obviously, brittle performance in a clinical setting for cancer diagnostic tasks has serious real-world consequences.
Explainable AI strategies — where the AI model yields an explanation of why a specific prediction was made for a given input example — may help to gain the confidence of clinicians and to integrate AI tools into diagnostic workflows. An important step in this direction is feature attribution, which scores the importance of input features towards prediction of a specific example26. Uncertainty estimation will likely also be important as models move into clinical settings. However, much work remains to be done to produce meaningful interpretations and uncertainty estimates of model predictions.
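One of the simplest feature attribution strategies is occlusion: mask each input feature in turn and record how much the prediction changes. The sketch below is a minimal, model-agnostic illustration using a hypothetical `predict` function and a toy linear model; real attribution methods for deep networks (for example, gradient-based scores) are more efficient but follow the same logic of crediting features for their contribution to a specific prediction.

```python
import numpy as np

def occlusion_attribution(predict, x, baseline=0.0):
    """Score each input feature by how much masking it changes the prediction."""
    base_pred = predict(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = baseline        # occlude one feature
        scores[i] = base_pred - predict(x_masked)
    return scores

# Toy linear model: occluding feature i removes exactly w[i] * x[i]
w = np.array([2.0, -1.0, 0.0, 0.5])
predict = lambda x: float(w @ x)
x = np.array([1.0, 1.0, 1.0, 2.0])
print(occlusion_attribution(predict, x))  # → [ 2. -1.  0.  1.]
```

For non-linear models the scores are no longer exact decompositions of the prediction, which is one reason producing faithful interpretations remains an open problem, as noted above.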
J.L. Digitization is a prerequisite for implementing AI in clinical practice. In radiology, the digital transformation has already occurred, but in pathology, digitization has been slow to take hold. Instruments for the digitization of pathology samples have been available for more than 20 years, but progress has been incremental. Recent improvements in the speed of digital imaging and access to cloud storage have greatly increased the rate of digitization. Further innovation is needed to simplify this technology, lower the costs and make it available also in resource-limited settings18,27.
Communication is a central issue in medicine. Currently, pathologists dictate a report that is placed in the electronic health record and sent to the clinician. It is not yet obvious how AI will fit into this communication system. Nor is it obvious how the AI reports will reach the clinician and what that report will look like. Finally, it is not obvious how the clinician will use this information in the clinical management of the patient. Some of these issues will have to be addressed by AI experts working closely with pathologists and clinicians.
G.T. AI is currently accelerating research across many scientific domains and industries. Still, there are many challenges associated with the development and deployment of AI in clinical practice. I believe the biggest challenge is centred on human–AI integration to ensure that AI truly augments, rather than inadvertently handicaps, the clinical user.
First, AI developers will need to offer solutions that are not only ‘on average’ accurate but also offer a measure of trustworthiness at the individual or patient decision level28. The latter would require a detailed explanation of each decision AI makes, as well as a deeper understanding of the conditions under which AI is exceptionally successful or alarmingly flawed. To derive real-world value beyond anecdotal studies, we will need to understand these intricate issues and dive deeper into the possible sources of AI errors and uncertainty.
Second, we will need to pay special attention to how to effectively integrate AI technology with the user (‘human-in-the-loop’). Ultimately, humans and AI technology will have to work well together. But this synergy will not happen easily, as past clinical AI experiences have demonstrated. It is important to train health-care providers in how to remain vigilant so as to avoid mistakes associated with over-reliance on AI and how ultimately to be knowledgeable users of the technology. This critical step in the clinical translation of AI tools is known as user acceptance. The prevailing thought is that clinicians will be reluctant to accept AI input without an appropriate explanation that is consistent with medical knowledge. Consequently, explainable AI has become a hot topic in biomedicine and other application domains29. My opinion is that explainable AI will help to build confidence in the technology as it is integrated into real-world settings. However, as AI tools continue to display robust and reliable performance, demand for explainable AI decisions will decline.
Lastly, AI tools deployed in clinical practice must undergo regular quality monitoring and quality assurance after deployment to confirm robust clinical performance over time and across target populations. As performance deviations may happen due to shifts in either the patient population or human–AI interaction pattern, quality surveillance should be based on performance metrics that assess AI robustness both as a stand-alone technology and with the human-in-the-loop. Rigorous quality control is necessary to identify, understand the cause of and mitigate performance gaps promptly.
How important are transparency, reproducibility and validation in AI, and what steps should we be taking to ensure these standards are met?
O.E. To gain trust among the clinical and research community, AI models need to achieve greater transparency and reproducibility. For example, AI model developers should demonstrate that the training data and especially the testing and validation data reflect the diversity and complexity of real-world scenarios in which such models are meant to be used. Ideally this should be achieved by demonstrating high accuracy on data prospectively collected from multiple medical centres catering to diverse patient populations. To guide the development of reproducible and robust AI, standardized best-practice checklists similar to those used in many clinical laboratories will increasingly be needed.
C.L. Transparency, reproducibility and validation are absolutely critical, and in principle we have tools available to ensure these goals are achieved, at least in the context of scientific research: web-based notebook platforms can execute chunks of code to reproduce results from publications; open source deep learning packages (for example, TensorFlow and PyTorch) and analogous packages for the previous generation of learning methods enable sharing of models; and ‘model zoo’ efforts such as Kipoi for genomics facilitate reproducible prediction, method comparison, fine-tuning and ensembling of pretrained models. All machine learning researchers should adhere to high standards of model sharing and reproducibility, and biomedical journals should enforce such standards as a requirement for publication. However, an emerging barrier to reproducibility of AI results comes from the extreme computational resources needed to train some of the state-of-the-art models. In some cases, it is possible to use the trained model on new test data, but it is practically infeasible to retrain the model from scratch.
J.L. If transparency means that humans can read the algorithm’s parameters and understand what it is doing, then most future AI algorithms will not be transparent. For example, AlphaZero taught itself so well to play the games of chess, shogi and Go that it beat their grandmasters30. But there is no necessary reason to believe that we can, or even should, understand the rules it learned in order to play these games.
If reproducibility means that if we trained an algorithm again on the same dataset we would get the same result, then the trained algorithm must possess model stability. This means that the algorithm’s parameters are stable because they are fixed by the data, and will not change when the data are randomized and presented again. Model stability is achieved when there is a sufficient number of events so that the algorithm parameters become fixed, and this is generally feasible for all but the rarest of cancers.
AI algorithms should be trained, tested and validated31. All algorithms are trained on a dataset. Frequently, a dataset is split into train and test subsets, and the algorithm is then trained on the train subset and tested on the test subset. But these two steps do not tell us about generalizability. For that we need an independent validation dataset. Out-of-sample external validation rarely occurs in pathology or in AI applied to medical image-based diagnostics in general32, likely due to a current lack of larger research consortia with uniform data collection and annotation procedures. It should be noted that accuracy decreases from train and test to validation because the validation dataset is not exactly like the train and test dataset.
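The distinction between an internal test split and an external validation cohort can be made concrete with a small simulation. The sketch below uses entirely synthetic data and a deliberately simple nearest-centroid classifier standing in for any trained model; the `shift` parameter is a hypothetical stand-in for site-to-site differences in staining, scanners or patient populations. The point is only that accuracy on a held-out split of the same cohort overstates performance on a shifted external cohort.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Two-class toy cohort; `shift` mimics site-to-site distribution drift."""
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0, size=(n, 5))
    return X, y

# Internal cohort, split into train and test subsets
X, y = make_cohort(400)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Nearest-centroid classifier stands in for any trained model
c0 = X_tr[y_tr == 0].mean(axis=0)
c1 = X_tr[y_tr == 1].mean(axis=0)
predict = lambda X: (np.linalg.norm(X - c1, axis=1)
                     < np.linalg.norm(X - c0, axis=1)).astype(int)

# External validation cohort from another 'centre' with shifted features
X_ext, y_ext = make_cohort(200, shift=1.0)

test_acc = (predict(X_te) == y_te).mean()
ext_acc = (predict(X_ext) == y_ext).mean()
print(f"internal test accuracy:       {test_acc:.2f}")
print(f"external validation accuracy: {ext_acc:.2f}")
```

Running this shows the external accuracy falling well below the internal test accuracy, which is exactly why independent validation cohorts, rather than repeated splits of one dataset, are needed to claim generalizability.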
G.T. Reproducibility and transparency are critical building blocks for socially responsible AI technology. On the data front, there are pressing questions regarding data quality, data bias and ethical data use. Although we all recognize the scientific value of patient data, the debate over data ownership is ongoing in terms of how best to support transparent AI innovation while mitigating the risks of unethical data handling, intentional or unintentional privacy breaches and adversarial data use. Inference attacks can jeopardize AI algorithms by targeting the training data and/or the trained AI model itself. When we aggregate patient data from different sources, the most vulnerable data source establishes the overall security level. Removing personal identifiers and confidential details is often insufficient, as an attacker can still make inferences to recover aspects of the missing data. Continuing research is needed in this specific area to effectively and safely deploy AI to obtain clinical insights from sensitive patient data while still preserving privacy.
Considering the growing challenges with patient privacy, the scientific community must pay close attention to objective benchmarking of both sensitive datasets and AI algorithms against community-consensus performance metrics. Such benchmarking is necessary to detect, monitor and possibly correct dataset biases or inconsistencies in AI technology performance. First, we need to promote a rigorous statistical framework during the phase of development of AI tools. Such a framework will help us monitor the collected data for potential biases and measure reproducibility and repeatability based on statistically and clinically appropriate standards33,34. Unfortunately, the emergence of ‘continuous learning’ AI systems35 complicates postmarketing quality surveillance of adaptive AI tools due to an unintended consequence known as ‘catastrophic forgetting’36. Continuously learning AI systems are designed to dynamically optimize their inner weights as new data are presented; therefore, monitoring the adaptation strategy is as important as monitoring the performance.
Apart from raising awareness and working towards algorithmic mitigation strategies, building an inclusive and diverse AI workforce is equally important to ensure socially responsible AI. When the community of AI developers reflects the diversity present in the user community and is embedded in that user community from the start, we will be better positioned to safeguard ethical AI use and guard against unintended consequences.
In summary, to ensure transparency, reproducibility and validation in AI, we may need to consider a multipronged approach throughout the AI lifecycle, from data collection to AI development, to clinical deployment and continuous quality control. We should work to communicate to AI users openly and clearly what they should expect across various settings, and we should educate AI users so that they are informed consumers of the technology. Responsible use of AI technology should become part of the mainstream digital education of health-care providers. We cannot anticipate every blind spot, and we should not blame AI for learning from implicit biases in the data because humans do too. By being inclusive, diverse, rigorous and vigilant, we can mitigate many of the aforementioned risks.
Where do you see the future of AI in cancer research and oncology in the short term (next couple of years) and in the long term (10 years or more from now)?
O.E. In the short term we will likely see an increased number of prospective studies designed to test the clinical utility of AI for patients with cancer. Ideal studies would be randomized and assess utility in scenarios comparing practitioners using AI versus practitioners not using AI. The lowest-hanging fruit will likely lie in the mature fields of pathology and radiology AI. As data to train AI models become increasingly available (for example, genomic and transcriptional profiles of tumours), I envision that AI models predicting response to certain treatments will reach maturity and sufficient performance to be implemented into clinical use.
In the longer term, AI may be used to identify combination therapies and their dosage that optimize efficacy and safety on the basis of each patient’s individual profiles. It is only a matter of time before new anticancer drugs are developed using AI to identify promising targets and design novel molecules with almost guaranteed efficacy and limited toxicity. Another promising use of AI may focus on cancer prevention. Even now, there are encouraging signs that AI algorithms may be able to predict the future risk of developing malignancies on the basis of routine imaging (for example, mammography)37. AI may be able to help identify personalized strategies to curb behaviours that increase one’s risk of developing cancers (for example, smoking and overeating). In the even longer term, we may see AI used in unexpected areas (for example, for automated or semi-automated robotic surgery). The current challenges in placing AI in the doctor’s office likely represent growing pains associated with a new and booming field that promises to dramatically impact human health.
C.L. Over the next few years, AI model development in regulatory genomics and single-cell genomics will continue to explode, and we will increasingly see applications to important problems in cancer. Sequence models that predict epigenomic signals and 3D chromatin contacts will be used to systematically assess the function of non-coding somatic variation in cancer genomes, and similar predictive models of splicing and alternative polyadenylation will be used to screen patients for mutations that alter RNA processing. Single-cell embedding and cell-type classification methods will be used to annotate large-scale tumour atlases currently being assembled by consortium projects and to better characterize the tumour immune microenvironment. New models will exploit the representational power of modern AI to harness data generated from diverse epigenomic and transcriptomic readouts, leverage single-cell technologies, including Perturb-seq (pooled genetic perturbation screens with a scRNA-seq readout) and bridge preclinical models and patient samples. Spatial expression and proteomics, microscopy and cryo-electron microscopy will also continue to grow as AI application domains. On the clinical front, machine learning models applied to genomic data from cell-free DNA will be used for early cancer diagnosis, subtype classification and optimizing cancer treatments via longitudinal profiling.
In 10 years, AI models will become part of the standard toolkit for interpreting large-scale experimental datasets — used broadly across cancer research rather than within a smaller computational biology community — to unravel gene expression and epigenetic programmes in cancer cells, model the immune response to cancer and design therapeutics. Ultimately, AI efforts coupled to massive datasets will lead to novel therapeutic targets — identifying druggable vulnerabilities in cancer cells or approaches to modulate tumour immunity — and advance our fundamental understanding of cancer biology and cancer immunology.
J.L. In the near term, the rate of pathology digitization will increase exponentially. Within a few years, all slides will be digital data. Concurrent with this digitization — and accelerating the digital disruption — will be an increase in the use of AI algorithms. They will be used to enhance diagnostic information to increase diagnostic accuracy, and they will be trained on prognostic outcomes so as to provide highly accurate individual patient disease-specific outcome predictions.
AI algorithms will be applied to retrospective data from clinical trials to improve associations between biomarkers and treatment efficacy. This will likely include a large number of companion diagnostic biomarkers currently quantified through visual interpretation by pathologists, such as hormone receptor expression and ERBB2 amplification in breast cancer23. In the long term, the goal of AI algorithms is to improve diagnosis, assist in the selection of optimal individual patient therapies, improve patient outcomes and reduce health-care costs.
G.T. We will continue to see the development of new AI methods and their application across the full spectrum of scientific discovery and health-care delivery. Growth pace and application breadth will depend on the availability of data and computing resources. For example, there will be more AI-driven efforts in multimodal, multiscale biomarker discovery, in guiding and planning the use of radiotherapy and systemic therapy, and in dynamic prediction of the responses of patients with cancer using multimodal data. Although such efforts will keep feeding the hope and hype of AI, clinical translation will continue to lag until we develop rigorous statistical frameworks, regulatory infrastructure and policies for benchmarking and quality control. AI tools that target workflow efficiencies will be the first to be operationalized in clinical care.
In the long term, I expect that continuing advances in privacy-preserving AI and federated learning (that is, training an AI model collaboratively but without centralized training data) will enable broad collaborations and accelerate scientific discovery38. As data generation activities grow39, broad and FAIR (findable, accessible, interoperable and reusable) access to data becomes the norm39 and high-performance computing crosses the exascale barrier40, scientists will start interleaving large-scale modelling and simulation with AI to achieve deeper understanding of the underlying biological mechanisms in cancer, which will accelerate drug discovery and personalized models of responses to treatment. Ultimately, the scientific community will be able to develop a computational framework that supports longitudinal modelling of patient trajectories. Such a framework will empower patients and health-care providers to fully explore in silico various cancer management strategies to determine the ones that best balance each patient’s preferences and outcomes.
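The federated learning idea, that only model updates travel between institutions while patient-level data never leave their source, can be sketched with the federated averaging scheme in miniature. The three 'institutions' below are simulated toy datasets, and the linear-regression model is a deliberately simple stand-in for a clinical AI model; real deployments add secure aggregation and privacy protections on top of this basic loop.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_step(w, X, y, lr=0.1):
    """One local gradient-descent step on a site's private linear-regression data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Three hypothetical institutions, each holding data it never shares
w_true = np.array([1.0, -2.0, 0.5])
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ w_true + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

# Federated averaging: only model weights travel between sites and server
w_global = np.zeros(3)
for _ in range(200):
    local_weights = [local_step(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_weights, axis=0)  # server aggregates

print(np.round(w_global, 2))
```

The aggregated model recovers the underlying relationship to good accuracy even though no site ever exposed its raw data, which is the property that makes the approach attractive for multi-institutional cancer studies.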