Building trust in medical AI
Photo by Evan Krape December 03, 2025
UD computer scientist earns NSF CAREER award to improve health care AI
From recording clinical notes to assisting with insurance pre-authorizations, AI is already helping health care providers. Yet as AI advances, major hurdles remain: preventing bias, protecting patient privacy and ensuring that its output is accurate, ethical and trustworthy.
With support from the National Science Foundation, University of Delaware computer scientist Rahmat Beheshti is exploring how to evaluate, and ultimately improve, the reliability and fairness of AI in medicine. The $600,000 award is part of NSF’s Faculty Early Career Development (CAREER) program.
The CAREER award supports Beheshti’s long-term commitment to pursuing “net positive tech for human well-being.” He recognizes that technology can cause both harm and good, and he seeks to ensure that the balance tilts clearly toward improving health.
“We need bold solutions like AI to address large, foundational issues in medicine,” said Beheshti, an associate professor in the Department of Computer and Information Sciences. “AI is not going away, and we have to learn how to harness its capabilities for good.”
Beheshti approaches these issues from a computer science perspective, exploring algorithmic ways to make medical AI more effective and reliable. But he views the challenges of AI in health care as socio-technical, requiring both technological innovation and close collaboration with clinicians who understand patient care in practice.
This commitment to interdisciplinary research stems from Beheshti’s postdoctoral experience at the Johns Hopkins Bloomberg School of Public Health, where he first witnessed the day-to-day issues clinicians face in providing the best treatment for their patients.
“Patient matters were number one, and there was a true sense of giving back to the community,” he recalled. “It was very eye-opening. It shifted my view of what I want to do in my professional life.”
Beheshti’s Healthy lAIfe laboratory at UD seeks to unleash the power of AI to solve meaningful health problems. In the new project, the team will examine the performance of large language models (LLMs), AI systems trained to understand and generate human language, in clinical scenarios and work to improve their reliability.
Evaluating AI in health care
The project’s first phase seeks to develop rigorous, scalable methods to evaluate the robustness of LLMs in clinical settings.
“You can test a model here or there, but truly scalable evaluation is extremely challenging,” Beheshti explained. “It requires massive computational power, large datasets and robust infrastructure, and even then, it’s not always feasible. Clinicians want to know: Can I trust this tool? Will it actually improve outcomes for my patients?”
To answer these questions, Beheshti’s team will work with physicians and medical students from Nemours Children’s Health and Weill Cornell Medicine to assess scenarios such as predicting a patient’s risk of developing obesity or determining optimal treatment options. Their analyses will compare LLM outputs with established clinical guidelines, meta-analyses and other gold-standard sources to measure accuracy, balance and trustworthiness.
Using retrieval-augmented generation, a method that lets AI check its answers against trusted medical sources, the researchers hope to systematically identify potential biases and errors. Pilot studies will focus on childhood obesity prevention and treatment, but the team expects these automated evaluation approaches to be generalizable across many areas of health care.
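The idea behind retrieval-augmented generation can be sketched in a few lines: before the model answers, passages from trusted sources are retrieved and placed into the prompt so the answer can be grounded in, and later checked against, those sources. The guideline snippets, keyword-overlap retriever and prompt wording below are purely illustrative stand-ins, not part of the project described here:

```python
# Toy sketch of retrieval-augmented generation (RAG).
# A real system would use a vector database and an actual LLM call;
# here a keyword-overlap retriever and a hand-written "corpus" stand in.

GUIDELINES = {
    "obesity-screening": "Clinical guideline: screen children aged 6 and up "
                         "for obesity and offer behavioral interventions.",
    "hypertension": "Clinical guideline: confirm elevated blood pressure "
                    "readings across multiple visits before diagnosis.",
}

def retrieve(query: str, corpus: dict, k: int = 1) -> list:
    """Rank passages by simple word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus.values(),
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Assemble an LLM prompt grounded in the retrieved guideline text."""
    context = "\n".join(retrieve(question, GUIDELINES))
    return (f"Use only the sources below to answer.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

prompt = build_prompt("Should we screen this child for obesity?")
```

Because the retrieved source travels with the prompt, an evaluator can compare the model's answer directly against the guideline text it was given, which is what makes systematic bias and error detection tractable.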
Advancing trustworthy AI
The project’s second goal is to develop methods to improve the fairness and reliability of LLMs in medicine. The team will leverage constitutional AI, an approach that guides models to behave in line with agreed-upon rules.
“Historically, models have been refined using human feedback, in which people review and correct the model’s outputs. This process is subject to human bias,” Beheshti explained. “In constitutional AI, trusted experts — in this case, clinicians — define the golden rules, such as ‘do no harm’ or ‘base recommendations on evidence.’ The model then learns to follow those principles as it generates responses.”
The ultimate goal is for the LLM to help identify and correct its own errors based on the established constitution. In the future, Beheshti hopes this AI-guided feedback will replace time-consuming, expensive manual oversight.
Bringing responsible AI to the community
Education is another cornerstone of Beheshti’s CAREER project. Building on his 2021 human-centered AI course at UD, he plans to train engineers, computer scientists and health professionals to think critically about how AI is used in practice. He hopes to engage a wide range of learners, from K-12 students to seasoned health care professionals, to explore the opportunities and challenges posed by modern AI.
Beheshti’s research and outreach efforts promise to help define what responsible AI looks like in health care, ensuring that technology is consistently guided by evidence.
Funding is provided under NSF award number 2443639.