AI in Medicine: The Promise and the Evidence Gap
AI diagnostic tools have been approved by regulators, deployed in hospitals, and celebrated in press releases. The evidence that they improve patient outcomes remains thin.
The context: FDA approvals of AI-enabled medical devices are accelerating; a JAMA systematic review documents persistent evidence gaps; hospital AI deployments continue to generate controversy.
- Where AI Works in Medicine
- The Outcome Evidence Gap
- The Regulatory Gap
- What Responsible Deployment Requires
AI in medicine is simultaneously the most hyped and most technically real of AI application domains. Radiological image analysis, pathology slide reading, ECG interpretation, sepsis prediction, clinical documentation — these are real AI systems, deployed in real hospitals, affecting real patients. The question is whether they are making outcomes better.
Where AI Works in Medicine
The strongest evidence exists in image classification tasks. A 2019 Nature Medicine paper demonstrated that a Google-trained deep learning system could detect diabetic retinopathy from fundus photographs with sensitivity and specificity exceeding those of trained ophthalmologists. Multiple groups have since reported comparable results for dermatological cancer detection, chest X-ray pneumonia detection, and mammography cancer identification.
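The headline numbers in these studies are sensitivity and specificity, both derived from a confusion matrix. As a minimal sketch with synthetic counts (invented for illustration, not taken from any cited paper), the computation looks like this:

```python
# Hedged sketch: how sensitivity and specificity are computed from
# confusion-matrix counts. All numbers below are synthetic.

def sensitivity(tp, fn):
    """True positive rate: fraction of diseased patients the model flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of healthy patients correctly cleared."""
    return tn / (tn + fp)

# Synthetic screening cohort: 1,000 patients, 10% disease prevalence.
tp, fn = 92, 8     # of 100 diseased patients
tn, fp = 873, 27   # of 900 healthy patients

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.92
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.97
```

Note that at a realistic 10% prevalence, even 97% specificity produces 27 false positives against 92 true positives, which is why downstream workload, not just accuracy, matters in screening.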
These results are real, but they come with an important caveat: performance is measured on held-out test sets drawn from the same distribution as the training data. In deployed settings — different hospitals, different equipment, different patient demographics — performance often degrades significantly. The gap between published benchmark performance and real-world deployed performance is a known and serious problem.
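The degradation mechanism can be sketched in a toy simulation. Everything here is invented for illustration: a one-feature "biomarker" classifier whose threshold is tuned at a development site, then applied at a site where the healthy population's measurements are shifted (say, by different equipment calibration):

```python
# Hedged sketch: accuracy loss under distribution shift.
# The biomarker, cohorts, and shift are synthetic, not from any real system.
import random

random.seed(0)

def simulate(healthy_mean, n_healthy=4000, diseased_mean=3.0, n_diseased=1000):
    """Synthetic cohort of (biomarker, label) pairs; label 1 = diseased."""
    cohort = [(random.gauss(healthy_mean, 1.0), 0) for _ in range(n_healthy)]
    cohort += [(random.gauss(diseased_mean, 1.0), 1) for _ in range(n_diseased)]
    return cohort

def accuracy(cohort, threshold):
    """Classify as diseased when the biomarker exceeds the threshold."""
    correct = sum((x > threshold) == bool(y) for x, y in cohort)
    return correct / len(cohort)

threshold = 1.5  # tuned at the development site, where healthy mean = 0.0

dev_test = simulate(healthy_mean=0.0)  # same distribution as training
shifted = simulate(healthy_mean=1.0)   # new site: healthy readings run higher

print(f"development-site accuracy: {accuracy(dev_test, threshold):.3f}")
print(f"shifted-site accuracy:     {accuracy(shifted, threshold):.3f}")
```

The fixed threshold silently misclassifies a large share of the shifted healthy population as diseased, which is the same failure mode reported when models trained at one hospital are deployed at another without recalibration.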
The Outcome Evidence Gap
The more important question is not whether AI can match radiologists on image classification tasks in research conditions, but whether AI deployment improves patient outcomes: reduced missed diagnoses, fewer false positives, shorter time to treatment, lower mortality. This evidence is substantially weaker.
A 2024 systematic review in JAMA found that the majority of published AI clinical studies used retrospective designs, single-center data, and intermediate outcomes (test accuracy) rather than patient outcomes (mortality, quality of life, hospitalization). Of hundreds of FDA-cleared AI medical devices, very few have published prospective randomized controlled trial evidence on patient outcomes.
The Regulatory Gap
The FDA has approved over 700 AI-enabled medical devices as of 2024. The approval pathway for software-based medical devices (510(k) substantial equivalence) was not designed for adaptive AI systems and does not demand the level of clinical-trial evidence required for new drugs. Companies can obtain clearance on retrospective data demonstrating analytical validity — how well the algorithm performs on a test set — without demonstrating clinical validity in a prospective trial.
What Responsible Deployment Requires
The AI Now Institute and academic critics of medical AI have argued for:
- mandatory post-market surveillance studies measuring patient outcomes
- algorithmic audit requirements for equity (many systems perform significantly worse on underrepresented populations)
- transparency requirements so clinicians understand what inputs an AI system's recommendations are based on
- pre-registration of clinical AI studies to reduce publication bias
None of these are currently required by FDA clearance pathways.
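One piece of an equity audit is mechanically simple: stratify a core metric, such as sensitivity, by demographic subgroup. A minimal sketch, with wholly synthetic group names and predictions chosen to show a gap:

```python
# Hedged sketch of a subgroup sensitivity audit. Groups, counts, and
# predictions are synthetic placeholders, not real clinical data.
from collections import defaultdict

# (group, true_label, predicted_label) records; label 1 = diseased.
# Constructed so group_b's diseased patients are missed more often.
records = (
    [("group_a", 1, 1)] * 90 + [("group_a", 1, 0)] * 10
    + [("group_b", 1, 1)] * 70 + [("group_b", 1, 0)] * 30
)

def sensitivity_by_group(records):
    """Fraction of truly diseased patients flagged, per subgroup."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, true_label, pred_label in records:
        if true_label == 1:
            totals[group] += 1
            hits[group] += pred_label == 1
    return {g: hits[g] / totals[g] for g in totals}

for group, sens in sorted(sensitivity_by_group(records).items()):
    print(f"{group}: sensitivity = {sens:.2f}")
```

An aggregate sensitivity of 0.80 would hide that one group is missed three times as often as the other, which is exactly why critics want stratified reporting made mandatory rather than optional.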