AI for Clinical Evidence Summarization
Medical evidence plays a key role in healthcare decisions. The most widely used approach to generating high-quality evidence is the systematic review of clinical trials. Since 2020, over 130,000 clinical trials have been registered at ClinicalTrials.gov. This rapid growth in data volume poses a great challenge to the modern systematic review process. In addition, not all clinical trials qualify for inclusion in a systematic review: some are at high risk of bias, while others were not conducted according to their designed protocols. Inaccuracies in clinical trials can propagate into the evidence synthesis and result in harmful clinical decisions. All of the above indicates an urgent need for a reliable, scalable, and efficient pipeline for obtaining medical evidence, a task that is already labor-intensive.
Our approach leverages generative AI technologies, exemplified by large language models (LLMs). Specifically, a systematic review is a process of searching, screening, appraising, synthesizing, and summarizing information from free-text clinical research publications. A large portion of this process can be fully or partially automated by LLMs, including text classification, named-entity recognition, and document summarization. These models, which have proven themselves in tasks such as spam message filtering, are promising for retrieving relevant articles and appraising evidence quality. In addition, LLMs have demonstrated reasoning capabilities that can assist with meta-analysis and results synthesis.
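As a concrete illustration of the screening step, the following minimal sketch uses an instruction-tuned LLM to decide whether an abstract is relevant to a clinical question. The model choice, prompt wording, and `screen_abstract` helper are illustrative assumptions rather than our final design.

```python
# Minimal sketch: LLM-assisted screening of abstracts for relevance to a
# clinical question. Model name, prompt wording, and label set are
# illustrative assumptions, not a finalized component.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def screen_abstract(question: str, abstract: str) -> bool:
    """Return True if the abstract appears relevant to the clinical question."""
    prompt = (
        "You are screening studies for a systematic review.\n"
        f"Clinical question: {question}\n"
        f"Abstract: {abstract}\n"
        "Answer with a single word: INCLUDE or EXCLUDE."
    )
    response = client.chat.completions.create(
        model="gpt-4o",   # placeholder model choice
        temperature=0,    # deterministic screening decisions
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().upper().startswith("INCLUDE")


# Example usage (hypothetical inputs):
# relevant = screen_abstract(
#     "Does drug X reduce mortality in adults with condition Y?",
#     "We conducted a randomized controlled trial of drug X in 500 adults ...",
# )
```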
We primarily serve research scientists who have information needs around specific clinical questions. A recent analysis of systematic reviews found that the average time spent conducting one is 67.3 weeks; by the time a review is completed, its information can already be outdated. We aim to mitigate the difficulties in retrieving and synthesizing medical evidence by automating the pipeline and delivering evidence syntheses in a timely manner. While we do not directly serve healthcare providers and patient groups, these groups are indirectly impacted by the proliferation of misinformation and can also benefit from our solution.
Our lab collaborates closely with domain experts from neurology, rheumatology, radiology, general surgery, nephrology, and other specialties. We have well-established working relationships with these experts in evaluating cutting-edge informatics technologies and a strong record of publications through which we share our findings with the community. Our collaborative publications appear in Nature Digital Medicine, JAMIA, JBI, and other journals.
- Collecting, analyzing, curating, and making sense of big data to ensure high-quality inputs, outputs, and insights.
- Prototype: A venture or organization building and testing its product, service, or business model, but which is not yet serving anyone
- Monitoring & Evaluation (e.g. collecting/using data, measuring impact)
Our solution distinguishes itself through its reliability and trustworthiness in medical evidence summarization. Generative AI has a well-known problem with hallucination. Existing solutions based on retrieval-augmented generation provide a list of references to support each claim, but they do not solve our problem: the included references, i.e., individual clinical trials, may not agree with the final results of the meta-analysis in a systematic review. In contrast, we will enforce quality control on the input resources. While we focus on a medical domain-specific application, we believe the principles apply to general domains as well.
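To make the quality-control idea concrete, here is a minimal sketch that excludes trials from synthesis based on appraisal results; the `TrialRecord` fields and exclusion rule are illustrative assumptions, not a finalized policy.

```python
# Minimal sketch: enforcing quality control on input trials before synthesis.
# The TrialRecord fields and the exclusion rule are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class TrialRecord:
    trial_id: str
    risk_of_bias: str        # e.g., "low", "some concerns", "high"
    followed_protocol: bool  # whether the trial adhered to its registered protocol


def quality_filter(trials: list[TrialRecord]) -> list[TrialRecord]:
    """Keep only trials that meet minimal quality criteria for synthesis."""
    return [
        t for t in trials
        if t.risk_of_bias != "high" and t.followed_protocol
    ]
```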
Our solution aims to alleviate the intensive labor required for clinical evidence review and thereby help patients, clinicians, and researchers better digest the large body of medical literature. Timely updates on the best available medical evidence will help healthcare providers better understand the effects of treatments and avoid medications or procedures that have been demonstrated to harm patients.
We design five modules that involve AI technologies. These modules are responsible for searching for articles relevant to a given clinical question, critically judging the quality of the retrieved information sources, extracting key concepts from the retrieved documents, synthesizing evidence from multiple sources, and summarizing the evidence synthesis. The raw evidence data are typically available in publisher databases, while each task will require specific annotations curated and quality-checked by our collaborating clinicians.
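The minimal sketch below shows how these five modules could be chained into a single pipeline; every function is a stub standing in for a module we plan to build, and the signatures are illustrative assumptions.

```python
# Minimal sketch of the five-module pipeline. Each function is a placeholder
# stub for a module under development; signatures are illustrative assumptions.

def search_articles(question: str) -> list[str]:
    """Module 1: retrieve candidate articles (stub returns an empty corpus)."""
    return []


def appraise_quality(articles: list[str]) -> list[str]:
    """Module 2: keep only articles that pass quality appraisal (stub keeps all)."""
    return articles


def extract_key_concepts(article: str) -> dict:
    """Module 3: extract key concepts, e.g., population/intervention/outcome (stub)."""
    return {"source": article}


def synthesize_evidence(extractions: list[dict]) -> dict:
    """Module 4: combine findings across studies (stub)."""
    return {"studies": extractions}


def summarize_synthesis(synthesis: dict) -> str:
    """Module 5: produce a plain-language evidence summary (stub)."""
    return f"Evidence summary based on {len(synthesis['studies'])} studies."


def run_pipeline(question: str) -> str:
    """Chain the five modules for a single clinical question."""
    articles = search_articles(question)
    appraised = appraise_quality(articles)
    extractions = [extract_key_concepts(a) for a in appraised]
    return summarize_synthesis(synthesize_evidence(extractions))
```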
To strive for fairness, we will focus on eliminating disparity censorship in the underlying data. Due to unequal access to healthcare services, some patient groups are tested less often than others. When developing AI models for risk prediction, assuming that untested patients are healthy is seemingly reasonable but dangerously naive. To avoid such bias in our solution, we will enforce quality control on the input evidence.
Our key result for the next year will be language models specifically developed for summarizing the results of meta-analyses. Within five years, we will also deliver modules for evidence appraisal and synthesis and deploy our system in real review activities.
- Nonprofit
3 full-time staff and numerous collaborators.
2 years
Our team consists of domestic and international scholars and students of various racial and ethnic backgrounds. Our recruitment of team members will never be based on race, gender, religion, or other demographic factors.
We build research prototypes and disseminate the results via publications or patents.
We can secure additional research grants for this particular project.
Human capital: 500K/year
Computing power consumption: 200K/year
Results dissemination (conferences, travel): 20K/year
100K: covers computing credits and part of the conference dissemination costs
We are excited about all aspects of the program, particularly the networking opportunities. Evaluations of AI solutions for healthcare problems that are conducted at only a single site can be biased, and their conclusions may not apply to other organizations. We appreciate this networking opportunity to expand collaboration and arrive at a more comprehensive solution.