Independent Evidence Review · Peer-Reviewed Research

Aidoc's AI tools: What independent research shows in real-world settings.

A structured summary of peer-reviewed studies evaluating the real-world clinical performance of Aidoc's AI triage tools. Intended for health system executives, clinical informaticists, and radiology leaders assessing the evidence base.

7 independent studies  ·  2021–2026  ·  Venues: NPJ Digital Medicine, Radiology, AJNR, AJR, Stroke: Vascular and Interventional Neurology, and PubMed-indexed journals
45.5%
Sensitivity for subacute intracranial bleeds
Emory · NPJ Digital Medicine, 2025
38.7%
Positive Predictive Value for cervical spine fractures
Lahey Hospital · AJNR, 2021
22%
False negative rate for large vessel occlusions in one cohort
Academic Medical Center · Stroke: Vascular and Interventional Neurology, 2025
2 of 2
Prospective controlled studies found no statistically significant improvement in radiologist diagnostic accuracy
UAB · Radiology 2023 & AJR 2024
99.3%
Specificity with AI, vs. 99.8% without it
UAB · AJR, 2024
Published Studies
Filter by clinical condition or view all. Each entry links directly to the peer-reviewed publication.
Intracranial Hemorrhage 2025 NPJ Digital Medicine
Real-World Performance Evaluation of a Commercial Deep Learning Model for Intracranial Hemorrhage Detection
Emory University · Published in NPJ Digital Medicine (Nature)
Key Findings: Sensitivity was notably lower for subacute bleeds (45.5%) and chronic bleeds (54.8%) than for acute presentations. Overall sensitivity in outpatient settings was 72.2%. The authors note this is lower than figures reported in controlled validation studies and attribute the gap in part to the more heterogeneous patient mix and imaging conditions in routine clinical practice.
Read the study
Intracranial Hemorrhage 2024 AJR Online
Prospective Evaluation of Artificial Intelligence Triage of Intracranial Hemorrhage on Noncontrast Head CT
University of Alabama at Birmingham (UAB) · Published in AJR Online
Key Findings: Radiologists using the AI showed no statistically significant difference in diagnostic accuracy compared to reading without it. Specificity was higher without the AI (99.8% vs. 99.3%), suggesting the AI added false positives without a measurable offsetting gain in sensitivity in this cohort; the worked arithmetic after this entry illustrates the scale of that tradeoff.
Read the study
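To make that specificity gap concrete, here is the back-of-envelope arithmetic (ours, not the study's), using the standard definition of specificity:

```latex
\text{false positives per } 1000 \text{ negative scans} = (1 - \text{specificity}) \times 1000
```

Without the AI: (1 − 0.998) × 1000 = 2. With the AI: (1 − 0.993) × 1000 = 7. The half-point drop therefore corresponds to roughly five additional false alarms per 1,000 hemorrhage-negative scans, each a candidate for unnecessary re-review.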
Pulmonary Embolism 2023 Radiology (RSNA)
Prospective Evaluation of AI Triage of Pulmonary Emboli on CT Pulmonary Angiograms
University of Alabama at Birmingham (UAB) · Published in Radiology
Key Findings: The study found no statistically significant improvement in radiologist accuracy, miss rate, or report turnaround time when the AI was used. The tool did reprioritize positive scans in the worklist, but this did not translate into a measurable difference in diagnostic outcomes in this prospective evaluation.
Read the study
Intracranial Hemorrhage 2026 PubMed
Head-to-Head Comparison of Two AI Triage Solutions for Detecting Intracranial Hemorrhage
Baylor / Texas Medical Center Network · Indexed in PubMed
Key Findings: The study documented false negative rates of approximately 6% and nearly 100 false positives within the study cohort. Error analysis found that motion artifacts and scanner hardware variations were common contributing factors, sources of variability that are routine in multi-vendor hospital environments.
Read the study
C-Spine Fractures 2021 AJNR
Diagnostic Accuracy and Failure Mode Analysis of a Deep Learning Algorithm for Cervical Spine Fracture Detection
Lahey Hospital & Medical Center · Published in American Journal of Neuroradiology
Key Findings: Sensitivity was 54.9% with a Positive Predictive Value (PPV) of 38.7%; a worked reading of that figure follows this entry. Failure mode analysis found that degenerative disc changes were a common source of false positive classifications. The authors noted the algorithm had received "little or no external validation" prior to clinical use, and called for more rigorous independent validation before deployment of similar tools.
Read the study
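For readers weighing that figure, the standard definition of positive predictive value makes the operational meaning explicit (our worked reading, not a calculation from the paper):

```latex
\mathrm{PPV} = \frac{TP}{TP + FP} = 38.7\%
\quad\Longrightarrow\quad
\frac{FP}{TP + FP} = 1 - 0.387 = 61.3\%
```

In other words, roughly six of every ten fracture alerts in this cohort were false alarms that a radiologist still had to adjudicate.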
Large Vessel Occlusion 2025 Stroke: Vascular and Interventional Neurology
A Retrospective Analysis Comparing AIDoc and RAPIDAI in the Detection of Large Vessel Occlusions
Academic Medical Center · Published in Stroke: Vascular and Interventional Neurology
Key Findings: The study reported a 22% false negative rate for Aidoc in large vessel occlusion detection within this cohort, meaning roughly 1 in 5 true occlusions were missed (the definition is sketched after this entry). The study concluded that neither major platform was superior and that both required extreme caution given the miss rates observed. The authors stopped short of recommending clinical reliance on either tool without further validation.
Read the study
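The false negative rate quoted above follows the standard definition; the conversion to a miss frequency is our arithmetic, not the study's:

```latex
\mathrm{FNR} = \frac{FN}{FN + TP} = 1 - \text{sensitivity} = 22\%
\quad\Longrightarrow\quad
\frac{1}{0.22} \approx 4.5 \text{, i.e. about one miss per 4--5 true occlusions}
```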
Pediatric ICH 2026 PubMed
Performance Evaluation of a Commercial Deep Learning Software for Detecting ICH in a Pediatric Population
Independent Researchers · Indexed in PubMed
Key Findings: The study found elevated false positive rates in pediatric patients, driven by normal anatomical features in children (choroid plexus calcifications and hyperdense venous sinuses) that the algorithm, trained predominantly on adult data, classified as hemorrhage. The authors conclude that population-specific validation is necessary before applying adult-trained AI tools to pediatric imaging.
Read the study

Operational & Patient Safety Risk Factors
Cross-cutting issues relevant to procurement, deployment governance, and clinical risk management — not limited to individual studies.
⚠️
Performance Gap: Benchmarks vs. Real-World Deployment
Performance figures in vendor materials are typically generated under controlled validation conditions. The independent studies reviewed here document meaningful sensitivity gaps in real-world settings — for example, 72.2% overall outpatient ICH sensitivity (Emory, 2025). This gap between validation-study performance and real-world deployment is a recognized challenge in clinical AI broadly. Health systems should require independent, site-specific validation before treating vendor benchmarks as predictive of their own environment.
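As a rough planning aid for that site-specific validation, the standard binomial sample-size formula gives a sense of scale. The 72% sensitivity figure is borrowed from the Emory study; the ±5-percentage-point precision target at 95% confidence is our illustrative assumption:

```latex
n = \frac{z^2\, p\,(1-p)}{E^2} = \frac{1.96^2 \times 0.72 \times 0.28}{0.05^2} \approx 310
```

On the order of 310 ground-truthed positive cases would be needed to pin down sensitivity that precisely, a nontrivial data-collection commitment for rarer findings.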
🔬
Trained on Adults, Deployed on Children
The pediatric ICH study found that the algorithm — trained predominantly on adult imaging data — produced frequent false positives in children due to normal pediatric anatomical features (choroid plexus calcifications, hyperdense venous sinuses) that differ from adult presentation. This raises a broader due-diligence question health systems should ask of any AI vendor: has the tool been validated on the specific patient populations you serve, including pediatric or other subgroups that may differ from the training cohort?
📉
Prospective Evidence: No Significant Improvement in Radiologist Accuracy
Two prospective UAB studies — one evaluating PE triage, one evaluating ICH triage — found no statistically significant improvement in radiologist diagnostic accuracy when using the AI compared to reading without it. In the ICH study, specificity was lower with the AI than without it. Prospective controlled study designs are generally considered the strongest basis for evaluating clinical impact; the absence of a detectable benefit in two such studies is a meaningful data point for procurement decisions.
🤖
Scanner Dependency and Artifact Sensitivity
Multiple studies found Aidoc's performance varied significantly by scanner manufacturer, imaging protocol, and the presence of artifacts — motion blur, hardware-related noise, and calcifications. Hospitals use diverse scanner fleets; a tool that performs well on one manufacturer's hardware and fails on another's is not reliably deployable at scale.
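One practical consequence for validation design: performance should be stratified by scanner fleet rather than pooled. A minimal sketch of that stratification, assuming a hypothetical validation log whose column names are illustrative rather than any vendor's export format:

```python
import pandas as pd

# Hypothetical validation log: one row per ground-truthed study.
df = pd.DataFrame({
    "scanner_vendor": ["A", "A", "B", "B", "B", "C"],
    "ai_flagged":     [True, False, True, True, False, True],
    "ground_truth":   [True, True, True, False, True, True],
})

# Sensitivity = share of confirmed-positive studies the AI flagged,
# computed per scanner vendor so fleet-specific failures are visible.
positives = df[df["ground_truth"]]
per_vendor_sensitivity = positives.groupby("scanner_vendor")["ai_flagged"].mean()
print(per_vendor_sensitivity)
```

A pooled sensitivity figure can look acceptable while one vendor's scanners account for most of the misses; stratifying exposes that.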
📊
Lack of External Validation Before Deployment
The Lahey cervical spine study explicitly flagged that many similar AI algorithms — including Aidoc's — received "little or no external validation" before being deployed clinically. FDA clearance requires demonstrating performance, but does not require external validation across diverse hospital populations or independent replication of performance claims.
🔄
Model Drift and the Monitoring Gap
AI models can degrade over time as patient populations, scanner firmware, or imaging protocols shift — a phenomenon called model drift. Post-deployment performance monitoring in clinical AI is not yet standardized across the industry. Health systems should ask vendors specifically how ongoing performance is tracked, what thresholds trigger re-validation, and what contractual obligations exist if deployed performance diverges materially from validated benchmarks.
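To make that ask concrete, here is a minimal sketch of the kind of rolling-sensitivity check a health system might request. The 200-case window and 80% threshold are illustrative assumptions, not an industry standard or any vendor's actual tooling:

```python
from collections import deque

class SensitivityMonitor:
    """Rolling sensitivity over radiologist-confirmed positive cases.

    Hypothetical sketch: flags when rolling sensitivity drops below
    a locally agreed re-validation threshold.
    """

    def __init__(self, window=200, threshold=0.80):
        self.results = deque(maxlen=window)  # True if the AI flagged a confirmed positive
        self.threshold = threshold

    def record(self, ai_flagged, confirmed_positive):
        # Only radiologist-confirmed positives count toward sensitivity.
        if confirmed_positive:
            self.results.append(ai_flagged)

    def rolling_sensitivity(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_revalidation(self):
        s = self.rolling_sensitivity()
        return s is not None and s < self.threshold

# Feed in ground-truthed cases as final reports are signed off.
monitor = SensitivityMonitor()
monitor.record(ai_flagged=True, confirmed_positive=True)
monitor.record(ai_flagged=False, confirmed_positive=True)  # a miss
if monitor.needs_revalidation():
    print("Rolling sensitivity below threshold: trigger re-validation review")
```

Whatever the exact mechanism, the point is contractual: degradation should be detectable by the health system, not only by the vendor.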

Sources
All studies are peer-reviewed and published in indexed medical journals.