Study finds AI not ready to run emergency rooms

UPI – AI isn’t ready to run a hospital’s emergency room just yet, a new study concluded.

ChatGPT likely would ask for unnecessary X-rays and antibiotics for some patients, and admit others who don’t really need hospital treatment, researchers reported in the journal Nature Communications.

“This is a valuable message to clinicians not to blindly trust these models,” said lead researcher Chris Williams, a postdoctoral scholar with the University of California, San Francisco.

“ChatGPT can answer medical exam questions and help draft clinical notes, but it’s not currently designed for situations that call for multiple considerations, like the situations in an emergency department,” Williams added in a UCSF news release.

For the new study, researchers challenged the ChatGPT AI model to provide the sort of recommendations an ER doctor would make after initially examining a patient.

The team fed the AI data from 1,000 prior ER visits, drawn from an archive of more than 251,000 visits.

The AI had to answer “yes” or “no” as to whether each patient should be admitted, sent for X-rays or prescribed antibiotics.

Overall, ChatGPT tended to recommend more services than were actually needed, results showed.

The ChatGPT-4 model was eight per cent less accurate than human doctors, and ChatGPT-3.5 was 24 per cent less accurate.

This tendency to overprescribe might be explained by the fact that the AI models are trained on the Internet, Williams said. Legitimate medical advice sites aren’t designed to answer emergency medical questions, but to forward patients to a doctor who can.

“These models are almost fine-tuned to say, ‘seek medical advice,’ which is quite right from a general public safety perspective,” Williams said. “But erring on the side of caution isn’t always appropriate in the ED setting, where unnecessary interventions could cause patients harm, strain resources and lead to higher costs for patients.”

To be more useful in the ER, AI models will need better frameworks built by designers who can thread the needle between catching serious illnesses and avoiding unnecessary exams and treatments, Williams said.

“There’s no perfect solution,” he said. “But knowing that models like ChatGPT have these tendencies, we’re charged with thinking through how we want them to perform in clinical practice.”