A type of artificial intelligence is being used to research Covid-19, but the experts behind it must consider the social implications of their work – and invite participation from diverse voices
Machine learning is a kind of artificial intelligence where algorithms are “trained” on data sets to infer useful patterns and relationships. The resulting generalisations may then be used to gain insights from that data or else to make predictions about unseen cases.
Covid-19 researchers are using machine learning (ML) techniques to distil useful information from the massive volumes of data currently being generated in relation to the virus. The time-saving benefits of ML—as well as its potential to identify useful patterns—are critical as countries are scrambling to reduce the medical threat and plan for a post-pandemic world.
But while it is clear many current challenges posed by the coronavirus response may benefit from ML-based systems, it is important both to set realistic expectations regarding ML successes and also to empower diverse voices in the development of these systems to achieve equitable outcomes.
One current challenge for medical researchers is scouring the ever-increasing number of scientific papers being published on coronavirus (currently numbering in the tens of thousands) to find information useful for their work. The US-based Allen Institute for AI, in conjunction with Microsoft, the White House, and numerous other research organisations, has compiled and made public the Covid-19 Open Research Dataset (CORD-19) of coronavirus research papers.
Kaggle—an online community of data scientists and machine learning experts—is hosting an ML challenge to answer selected research questions using the CORD-19 dataset, for example: “What is known about transmission, incubation, and environmental stability?” and “What do we know about Covid-19 risk factors?”
The goal is for these ML projects to quickly consolidate insights from the scientific literature. Given the urgency of the problems those insights inform, a manual review of the literature would take research teams an inordinate amount of time.
ML and the search for a vaccine
Multiple research projects are using ML in different ways to speed up the search for a novel coronavirus vaccine. For example, Google DeepMind has used its AlphaFold system, which employs ML to predict the structure of a protein based on its genetic sequence, to generate structure predictions for the SARS-CoV-2 virus. This system predicted the structure of the spike protein, which the virus uses to infect human cells, and the prediction has been confirmed by other researchers.
Further, Hong Kong-based Insilico has used machine learning methods first to generate 100,000 molecules with properties that may target the novel coronavirus, and then to narrow this to a pool of 100 promising molecules, from which scientists selected seven for further research.
ML and drug repurposing
Another research area where ML may help is drug repurposing. To shortcut the timeline required for new vaccine research, Singapore company Gero employed ML to find already approved and clinical-stage drugs that have the potential to be repurposed to treat Covid-19. Gero then suggested a list of four candidate drugs so that clinical trials could commence. Whether this approach is effective depends, of course, on whether the researchers’ assumptions about the relevant similarities between SARS and the novel coronavirus are true. Nevertheless, this example illustrates how ML techniques can be used to identify and rank viable candidates from a massive pool.
ML for potential new insights
One example of where ML may signal unexpected insights comes from a pilot study that aimed to develop a tool to predict which cases of Covid-19 would become serious. The small amount of data used (53 patients, of whom five became severe cases) means the tool is nowhere near ready for use. It did, however, deliver a surprising result when identifying the factor most strongly correlated with cases that would become severe: in one case, an ML model based on elevated levels of a liver enzyme achieved 70 percent accuracy. That this feature would have any predictive importance for the severity of a case was unexpected, and it is not clear what its significance is, should it hold in larger studies. But an unexpected find has the potential to increase our understanding or point investigations in new directions.
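One reason such a small and imbalanced sample calls for caution can be shown with simple arithmetic. The Python sketch below uses only the patient counts quoted above. The "always predict not severe" baseline is a hypothetical comparison point, not something taken from the study, and because the source does not say how the 70 percent figure was measured, the two numbers are not directly comparable.

```python
patients = 53  # sample size quoted for the pilot study
severe = 5     # patients who went on to become severe cases

# Hypothetical baseline (an assumption for illustration only):
# predict "not severe" for every single patient.
baseline_correct = patients - severe
baseline_accuracy = baseline_correct / patients
severe_cases_flagged = 0  # this baseline never identifies a severe case

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"severe cases flagged: {severe_cases_flagged} of {severe}")
```

With so few severe cases, a headline accuracy figure says little on its own; what matters is how many of the severe cases a tool actually identifies.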
We have seen that there are clear benefits when experts are augmented by ML in their work, particularly the potential to speed up some tasks and to generate new insights. The above examples are only a selection of the numerous projects applying various ML algorithms in attempts to solve pressing coronavirus-related problems. It is worth keeping in mind, though, that ML-based models and systems are tools that each have limitations, and that expertise from multiple domains (including data science and the application domain) is necessary to create systems that are not only accurate and safe, but also fair.
Another project, COVID-NET, attempts to diagnose Covid-19 infection based on X-ray scans of a patient’s chest. One selling point is speed: the system will classify a case in a fraction of the time it takes a radiologist to form a judgment. Similarly, Chinese technology company Alibaba released its own AI-based system, CT Image Analytics for Covid-19, which performs the same task and has been used by dozens of hospitals in China.
Alibaba boasts that its CT Image system achieves 96 percent accuracy. But the details of the system are important: if, for instance, the 4 percent of errors were false negatives for true Covid-19 cases (i.e. giving a negative result when it should be positive), then, depending on the policy for handling non-Covid-19 patients, the consequences for the treatment and spread of the disease might be dire.
Additionally, the training and test data need to be considered. For example, if the data used to train the ML model are based on advanced infections of Covid-19, then the system might be less useful for early detection, where PCR testing (that is, testing that detects viral RNA) is more reliable. And if the CT images used to train the system are not representative of images from a real hospital context, then it may fail to achieve the claimed level of accuracy.
These kinds of considerations have implications for how the system can best be used to assist in screening. For instance, it might be useful for detecting some cases of Covid-19 in patients who have been admitted for other purposes, but it is unlikely at the current time to replace the need for radiologists and other medical expertise. And while the prospects of AI in medical imaging are bright, it does not yet replace the human elements radiologists provide, such as communicating diagnoses in a way that takes patients’ values into consideration. All this is to say that care needs to be taken to understand the potential limitations of the tools so that the advantages they offer as supporting tools can safely be realised.
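To make the point about error types concrete, the following Python sketch works through two hypothetical scenarios. Only the 96 percent accuracy figure comes from the claim above. The 1,000 scans, the 100 true Covid-19 cases, and the helper name `summarise` are assumptions chosen for illustration, not details of Alibaba's system.

```python
def summarise(tp, fn, fp, tn):
    """Return (accuracy, sensitivity) for a binary screening test."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    return accuracy, sensitivity

# Hypothetical setting: 1,000 scans, 100 true Covid-19 cases, 40 errors.
# Scenario A: all 40 errors are missed cases (false negatives).
acc_a, sens_a = summarise(tp=60, fn=40, fp=0, tn=900)
# Scenario B: all 40 errors are false alarms (false positives).
acc_b, sens_b = summarise(tp=100, fn=0, fp=40, tn=860)

print(f"A: accuracy={acc_a:.2f}, sensitivity={sens_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, sensitivity={sens_b:.2f}")
```

Both scenarios report 96 percent accuracy, yet under these assumed numbers the first misses 40 percent of infected patients while the second misses none.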
Data scientists are also attempting to apply ML to help solve hospital resource planning needs. The UK National Health Service (NHS), in conjunction with researchers from Cambridge University, began a pilot of the Covid-19 Capacity Planning and Analysis System (CPAS) in April. This system has been trained on Covid-19 patient hospital admission data to predict demand for ICU beds, with the aim of ensuring the right resources are at the right hospitals to meet that demand.
It is worth noting there are value judgments to be made in the design of a system like this. For example, the authors state that protected characteristics like ethnicity will only be included if they are predictive of patient health outcomes. But if there are proxies for characteristics like ethnicity in the data, then there may still be bias against particular ethnicities.
Further, if ethnicity is excluded, then it can be extremely difficult to audit the system for biased outcomes. Any attempt to engineer the system to remove negative bias will involve value judgments and trade-offs, which ought to involve participation by affected groups. This is not to say that the designers of the system have not considered issues like this; it is just to say that technical, social and ethical considerations may be entwined in questions of possible discriminatory bias.
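The auditing difficulty can be illustrated with a small Python sketch. Everything in it is hypothetical: the records, the group labels `A` and `B`, and the function `false_negative_rate_by_group` are invented for illustration and are not drawn from CPAS or NHS data. It shows that checking whether errors fall more heavily on one group requires the group label at audit time, even if the model never used it as an input.

```python
from collections import defaultdict

def false_negative_rate_by_group(records):
    """records: iterable of (group, actual, predicted), labels are 0 or 1.

    Here actual == 1 stands for a patient who did need intensive care
    and predicted == 0 for a model that said they would not.
    """
    positives = defaultdict(int)
    missed = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 1:
            positives[group] += 1
            if predicted == 0:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}

# Hypothetical audit records: (group, actual, predicted).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0),
]
rates = false_negative_rate_by_group(records)
print(rates)
```

In this made-up data the model misses a quarter of group A's true cases and three quarters of group B's. Without the group column, the same predictions would show only a single overall miss rate, and the disparity would be invisible.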
It is clear that while ML has great potential for positive impact in current and future pandemic responses, the experts who develop these systems will require not just sufficient technical and application-domain expertise but also appropriate recognition of the social implications of their work. Getting the most value out of machine learning-based systems involves not just technical considerations, but also participation from the diverse communities involved and affected.
Daniel Wilson is a professional teaching fellow in the School of Computer Science at the University of Auckland.
Originally published in newsroom. Republished with permission.