Galen: the Medical AI Chatbot
Galen is a solution to the profound challenges faced by the global community of rare disease patients and their caregivers. Given the nature of rare diseases, patients often go through a 'diagnostic odyssey' which involves years of uncertainty, numerous tests, and visits to multiple specialists. Globally, it is estimated that rare diseases affect approximately 400 million individuals. Despite the name, the collective impact of these diseases is significant, yet the research and development for treatments are often limited due to the relatively small patient population for each disease.
A key contributing factor to this problem is the difficulty in accessing and synthesizing the vast and often siloed knowledge about rare diseases. Medical professionals may encounter a particular rare disease only once or twice in their career, if at all, leading to potential misdiagnoses or delayed diagnoses. Similarly, researchers are hampered by the lack of a centralized source of comprehensive, up-to-date information on these diseases.
Galen, a conversational AI chatbot, is designed to improve the diagnostic journey and enhance the efficiency of research related to rare diseases. By using AI to analyze the vast corpus of medical literature and clinical trial data, it can assist in identifying potential diagnoses based on reported symptoms and provide a ranked list of possibilities for further investigation by medical professionals. It can also uncover associations between symptoms, treatments, and outcomes that might not be readily apparent, thus aiding in treatment decision-making.
For researchers, the AI can collate and summarize existing literature, identify knowledge gaps, and even suggest potential hypotheses or experiments. This could significantly speed up the research process, reducing the resources (time, human, financial) needed to arrive at results and thereby reducing the environmental footprint of rare disease research. The AI can also monitor the latest published research and flag relevant articles, allowing researchers to stay updated on the latest developments.
Therefore, Galen targets a major bottleneck in the rare disease domain: the ability to efficiently harness, synthesize, and apply the wealth of existing knowledge for the benefit of patients and researchers alike.
Galen is a conversational AI chatbot, specifically developed for the domain of rare diseases. It works by utilizing advanced language models trained on a vast corpus of medical literature, clinical trial data, and other reliable sources of rare disease knowledge. It's designed to provide comprehensive insights into rare diseases, serving as a powerful and accessible tool for patients, healthcare providers, and researchers alike. You can think of it as a version of ChatGPT that graduated from medical school, then further specialized in rare diseases.
For patients and caregivers, the AI chatbot provides easy access to information on various rare diseases. The user can input symptoms or conditions, and the chatbot will deliver a ranked list of potential diagnoses, along with details on each one, including typical symptoms, standard treatments, and the latest research findings. This can help guide users to appropriate medical specialists and resources. (*Important to note: Galen will always defer to the diagnostic expertise of human physicians.)
For medical professionals, the AI chatbot acts as an instant, accessible, and up-to-date medical reference tool. It can assist in identifying potential diagnoses based on the symptoms presented and can provide a range of information about each disease, including standard treatment approaches and the latest research developments.
For researchers, the AI chatbot offers a powerful research aid. It can provide concise summaries of existing literature on a particular disease, identify knowledge gaps, and even suggest potential hypotheses based on existing data. The tool can also assist in recommending published research, alerting the researcher to relevant articles for their questions.
The technology behind Galen is a large language model, akin to GPT-4, trained specifically for the domain of rare diseases. By processing vast amounts of text data from various sources, this model has learned to generate human-like text that is contextually relevant to the input it's given. It operates by predicting the probability of a word given the previous words used in the text, enabling it to generate complete, coherent, and relevant responses to user queries.
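To make the next-word idea concrete, here is a toy Python sketch. It is not Galen's code: the candidate words and their scores are invented for illustration, and a real model computes scores over tens of thousands of tokens with a neural network. The sketch only shows how raw scores become a probability for each candidate next word.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical scores a model might assign to candidate next words after
# the context "The patient presents with muscle". All values are invented.
toy_scores = {"weakness": 3.1, "pain": 2.4, "atrophy": 1.7, "guitar": -2.0}

next_word_probs = softmax(toy_scores)

# Greedy choice: pick the single most probable next word.
best_word = max(next_word_probs, key=next_word_probs.get)
```

Repeating this step, appending the chosen word to the context each time, is how a full response is generated one word at a time.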
Through its natural language processing capabilities, the chatbot can understand complex questions, context, and semantics, and it can generate responses that provide detailed, accurate, and useful information to the user. The entire system is designed with a user-friendly interface, ensuring that anyone, regardless of their technical ability, can use it effectively.
Galen, therefore, serves as a centralized, accessible, and scientifically-informed source of knowledge for rare diseases, significantly improving the efficiency and effectiveness of diagnosis, treatment, and research efforts in this domain.
Galen aims to serve three primary audiences through its straightforward, conversational interface: rare disease patients and their caregivers, healthcare providers, and medical researchers.
Rare Disease Patients and Caregivers: These individuals often find themselves on a convoluted journey to find a diagnosis, marked by uncertainty and frustration. The scarcity of accessible and comprehensible information on rare diseases can lead to prolonged diagnostic journeys. Galen will provide reliable and digestible information about a wide range of rare diseases, empowering patients and caregivers to pursue suitable medical consultations and treatments. By potentially shortening the diagnostic odyssey, Galen could help save these individuals time, emotional strain, and resources.
Healthcare Providers: Given the nature of rare diseases, it can be a daunting task for clinicians to stay informed about all possible rare diseases they might encounter. Galen will serve as an easily accessible and up-to-date medical reference tool, helping in both the diagnosis and treatment process. This would enable healthcare providers to offer more informed, effective care to their patients.
Medical Researchers: Researchers working in the rare disease space face the significant challenge of sifting through enormous amounts of literature to find pertinent information. Galen can greatly speed up this process by providing concise summaries of existing literature, identifying knowledge gaps, and suggesting potential research directions based on the available data.
As I continue to develop Galen, I've made a point of seeking input from these key user groups to ensure that I fully understand their needs and challenges. I have reached out to rare disease patient advocacy groups, engaged with medical professionals, and spoken with researchers in the field. Their insights, gleaned from interviews, surveys, and usability tests, are critical to my development process, helping me tailor Galen to their needs.
Galen's positive impact extends beyond these user groups. By enhancing the efficiency of rare disease research, it has the potential to expedite the development of new treatments. Moreover, by reducing the time and resources devoted to the diagnostic process and research, Galen could also help lessen the environmental footprint of the rare disease healthcare sector.
Despite being a solo innovator, I believe I am uniquely positioned to deliver this solution due to my close connection to the intersecting communities that Galen serves - the medical community, AI and technology sector, and the communities grappling with rare diseases.
I am a physician. My experience with medical caregiving both here and abroad has given me an understanding of the healthcare landscape and patient needs, especially in the context of rare diseases. I have been a part of this community and have observed its pain points firsthand.
Here is where things get interesting. While I am indeed an MD, I am also an AI engineer. While working at Harvard Medical School on deep learning systems, I learned to combine my medical knowledge with advanced technology to address healthcare challenges.
Among the problems I worked on were rare cancers, such as bone and soft tissue sarcomas. I learned to map diagnostic and prediction-based medical problems to computational models, using AI to help solve them. This gave me a new perspective and the skills to build AI systems that could impact healthcare at a global scale. It is also where I became keenly aware of the problem the Horizon Prize is trying to solve: rare diseases face a data scarcity issue. Because they are rare by nature, it is difficult to consolidate and synthesize information on them, leaving everyone in the dark.
Moreover, my personal experience growing up in rural Ohio and my exposure to communities in Sub-Saharan Africa and the Himalayas have made me acutely aware of the access-to-care issues many face. This understanding deeply influences the design and development of Galen, keeping a sharp focus on accessibility and on the unique challenges associated with rare diseases and access to information.
By leveraging my diverse experiences and maintaining an open channel for community feedback, I aim to ensure that Galen remains a user-centered tool, providing real value to those grappling with rare diseases. My proximity to these communities, both as a healthcare provider and an innovator, makes me well-suited to design and deliver this solution.
- Improve the rare disease patient diagnostic journey – reducing the time, cost, resources, and duplicative travel and testing for patients and caregivers.
- United States
- Prototype: A venture or organization building and testing its product, service, or business model, but which is not yet serving anyone
A couple things:
1. The code for Galen is already written. Here is a snapshot of the code notebook being used to fine-tune a 40 billion parameter large language model (Falcon-40B) on a corpus of biomedical dialogue:

2. The (growing) dataset on rare diseases. That "corpus of biomedical dialogue" mentioned in item #1 is an expanding list of sample prompts and responses specific to medicine. This is the kind of dataset needed to fine-tune an LLM into a conversational chatbot. The LLM learns the subject matter content of the dialogue pairs (in this case, rare diseases), resulting in a new model that has "learned" about the dataset. The more samples you have, the better. I currently have around 70,000 dialogue pairs for medicine, and the number is growing.
This particular portion of training data deals with 48,XXYY syndrome - a disease seen in boys that is caused by a nondisjunction error, resulting in an extra X and an extra Y chromosome:

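As a rough illustration of how prompt/response pairs like these can be prepared for fine-tuning, here is a minimal Python sketch. It is not the notebook itself: the function names, the "### Question / ### Answer" template, and the sample pair are hypothetical and written for this example, not taken from the actual dataset. It renders each pair into a single training string and writes one JSON record per line.

```python
import json


def format_pair(prompt, response):
    """Render one dialogue pair as a single training string (assumed template)."""
    return f"### Question:\n{prompt.strip()}\n\n### Answer:\n{response.strip()}"


def to_jsonl(pairs):
    """Serialize dialogue pairs to JSON Lines, skipping incomplete pairs."""
    lines = []
    for pair in pairs:
        if not pair.get("prompt") or not pair.get("response"):
            continue  # drop pairs that are missing a prompt or a response
        record = {"text": format_pair(pair["prompt"], pair["response"])}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical sample pairs for illustration only (not from the real dataset).
SAMPLE_PAIRS = [
    {
        "prompt": "What causes 48,XXYY syndrome?",
        "response": (
            "48,XXYY syndrome is seen in boys and is caused by a "
            "nondisjunction error that results in an extra X and an "
            "extra Y chromosome."
        ),
    },
    {"prompt": "An incomplete pair with no response yet", "response": ""},
]

jsonl_text = to_jsonl(SAMPLE_PAIRS)
```

Keeping one record per line makes it easy to append new pairs as the dataset grows and to filter out incomplete ones before training.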
Applying for the Horizon Prize holds immense value for the development and potential impact of my AI chatbot, Galen. As an individual innovator currently self-funding this project, several barriers exist that the Prize can help overcome.
Financial: The project, while promising, requires substantial resources for its continued development, maintenance, and scaling. Funds are needed for data acquisition, infrastructure upgrades, user testing, and deployment. Winning the $150,000 prize would significantly alleviate these financial constraints, allowing for robust development and enabling Galen to reach its full potential.
Technical: While I have a medical background as an MD and am enhancing my computer science skills at Carnegie Mellon University, additional technical expertise would accelerate Galen's development and fine-tuning. Access to a network of experts and fellow innovators via the Horizon and MIT Solve community would be invaluable, providing opportunities for collaboration, advice, and potentially even partnerships.
Market Barriers: As a new entrant in the healthcare tech space, getting the solution recognized and accepted by patients, healthcare providers, and researchers poses a significant challenge. I am fortunate to have brand-name value in my training pedigree, having spent time at Harvard Medical School and now pursuing a master's degree in computer science and AI at Carnegie Mellon. However, that alone is not nearly enough traction to stand out in the LLM space right now. Galen is built for an altruistic mission, and needs ways to be discovered and impact lives. The visibility and credibility provided by the Horizon Prize would help overcome existing barriers in the LLM market, boosting trust in Galen and increasing its adoption.
Cultural: Changing behaviors, especially in healthcare, is always a challenge. (If you want proof of this, consider the fact that we are the reason fax machines and pagers still exist.) A solution like Galen requires acceptance not only from patients but also from physicians and researchers who are typically used to traditional modes of information retrieval and patient care. It's my hope that the endorsement from a respected entity like MIT, when combined with a noble mission and fascinating tech, will help overcome cultural resistance to AI technology in healthcare.
Applying for the Horizon Prize offers not just the financial support needed but also the opportunity to leverage the knowledge, network, and credibility associated with MIT Solve and Horizon. With these resources, we could significantly advance and distribute Galen, overcoming the barriers that currently limit its growth and impact.
My deep connection to the communities served by Galen stems from three key aspects: my medical background, passion for AI innovation, and commitment to improving access to care.
Medicine: My personal journey has been shaped by a mission to do good and help others, especially those in need of medical care. These desires are why I went to medical school. The practice of medicine has given me a deep understanding of the human body and the challenges faced by patients and healthcare providers, especially for those dealing with rare diseases.
AI Innovation: My recent work at Harvard Medical School involved developing and researching deep learning systems for surgery and oncology. This experience revealed to me my (previously unrealized) aptitude for AI development and its immense potential to scale healthcare solutions for positive impact. Driven by the transformative potential of AI, I pivoted from my original path towards surgery, choosing instead to pursue a Master's degree in computer science and AI at Carnegie Mellon University. I now plan to use AI & other exponential technologies to apply my medical knowledge for positive change at scale.
Access to Care: Given the hi-tech nature of this whole endeavor, you might be surprised to learn that I actually grew up on a cattle farm. (In fact, that's where I am sitting as I write this!) Growing up in rural Ohio, I witnessed firsthand the barriers to healthcare access due to geographic and financial constraints. This early exposure was compounded by later experiences abroad. During a church missionary trip in South Korea, and later on medical mission trips to Sub-Saharan Africa and the Himalayas, I saw communities struggling with limited healthcare resources. These experiences instilled in me a deep commitment to enhancing healthcare accessibility, a commitment that drives my current work.
These three aspects all influence Galen, which I hope can help address the unique challenges associated with rare diseases. Galen leverages the power of AI to provide comprehensive, accurate, and easily accessible information on rare diseases, thereby serving as a valuable tool for patients, caregivers, healthcare providers, and researchers. By pooling data and knowledge about rare diseases, Galen aims to reduce the informational barriers that can prolong diagnostic journeys and hinder effective treatment.
I hope that Galen can contribute to eliminating the helplessness often associated with rare diseases, leveraging my experiences in medicine, AI, and my commitment to enhanced healthcare access to make a meaningful impact in the rare disease community.
First, Galen addresses the problem of data scarcity, which is a significant hurdle in rare disease research and treatment. By leveraging large language models and open-source databases, Galen provides a unique solution to amass and analyze data about rare diseases. It does this without resorting to personal health records, thereby ensuring privacy and avoiding ethical dilemmas associated with sensitive data usage. This approach circumvents traditional bottlenecks in healthcare data collection, offering a novel and privacy-centric way to tackle the problem.
Second, Galen democratizes access to information. It makes expert knowledge on rare diseases accessible to non-specialists, reducing geographical and knowledge barriers that often impede accurate diagnosis and treatment. While on medical mission trips to both Africa and India, I witnessed how far some patients would travel for their diagnoses, and the resulting toll it took. However, almost all of them used smartphones. A smartphone app could reduce the need to travel so far to get their questions answered. Improved accessibility can expedite the diagnostic journey for patients, alleviating their emotional and financial burden while simultaneously minimizing their carbon footprint due to reduced travel and testing.
Third, Galen fosters cross-pollination of knowledge by bringing together information on diverse diseases. This approach encourages a holistic understanding of rare diseases and their interconnections, opening up new avenues for collaborative research. Researchers across the world can draw insights from this interconnected knowledge, potentially sparking breakthroughs and enhancing research efficiency.
This AI-driven, data-intensive approach to rare disease diagnosis and research makes Galen not only an innovative solution but also sustainable. In terms of environmental sustainability, Galen reduces carbon emissions by minimizing travel, diagnostic redundancies, and single-use plastic usage inherent to repeated testing. In terms of market sustainability, Galen stands poised to change the landscape of healthcare by promoting efficiency, inclusivity, and sustainability. It catalyzes broader impacts by setting a precedent for other AI-based healthcare solutions, encouraging them to follow suit in considering the environmental footprint of healthcare delivery. Galen could potentially transform how we approach not only rare diseases but also healthcare more broadly, prompting a shift towards more environmentally-friendly, patient-centered, and data-driven healthcare delivery systems.
In the next year, our immediate impact goals for Galen include:
Finalizing the AI model, ensuring it provides accurate and valuable information on rare diseases. We will continue refining the algorithms and integrating more open-source data to achieve this goal.
Launching the Galen chatbot to a group of initial users, both patients and healthcare professionals, for beta testing. Feedback from this group will guide further refinement of the system.
Establishing partnerships with rare disease advocacy groups and medical research institutions. These relationships will ensure that Galen's insights reach the people who need them most and are used to guide research towards impactful solutions.
In the next five years, our long-term impact goals include:
Expanding Galen's user base to include healthcare professionals, researchers, and patients worldwide. We aim for Galen to be a globally recognized resource for understanding and addressing rare diseases.
Driving tangible improvements in rare disease diagnosis and treatment. By providing comprehensive, easily accessible information, we hope to reduce the average diagnostic journey and improve access to treatment for rare disease patients.
Contributing to breakthroughs in rare disease research. By pooling and analyzing data on rare diseases, Galen can highlight unexplored connections and potential research avenues, potentially catalyzing novel solutions.
Reducing the environmental footprint of rare disease healthcare. Through minimized travel, decreased redundant testing, and streamlined research efforts, we aim to significantly lower the carbon emissions associated with rare diseases.
Galen is still in development, so my current metric for success is the number of knowledge areas incorporated into the dataset (i.e., the variety of diseases, number of training samples created, etc.).
As time goes on, our team could employ a combination of qualitative and quantitative indicators inspired in part by UN Sustainable Development Goal (SDG) #3 "Good Health and Well-being."
For example, in the short term (next year), Galen's indicators could include:
Number of datasets integrated: We could measure the diversity and scale of data integrated into Galen. This is key to ensuring comprehensive and accurate information on rare diseases, so that's how I'm measuring progress now.
User engagement metrics during beta testing: User feedback, the number of questions asked to Galen, the number of unique users, and session lengths. These metrics will help assess whether Galen is providing real value to users and where improvements are needed. I expect to do a lot of fine-tuning of the model during this period.
Number of partnerships established: This might serve as a direct measure of integrating Galen into the wider rare disease community and ensure its insights are reaching those who need them most.
In the long term (next five years), we could also consider:
Improvement in diagnosis and treatment times: While this will be a more challenging metric to track, user surveys and partnership with healthcare institutions could help us estimate how Galen is contributing to quicker and more accurate diagnoses and better treatment plans.
Number of research initiatives informed by Galen: In the R&D space, we could track mentions of Galen in research papers and feedback from our partner institutions to assess its impact on advancing rare disease research.
Carbon footprint reduction: By quantifying travel and physical resource use saved through Galen's use (e.g., reduced need for patients to travel for specialist consultations or redundant tests), we can estimate the carbon emissions saved.
Global user engagement metrics: To align with SDGs such as target 3.b (providing access to medicines for all) and 3.d (strengthening global health risk management), we will likely measure Galen's reach and usage globally, particularly in developing countries. I'm most excited about this metric, as someone engaged with global health efforts.
These indicators will probably be reviewed and updated over time to accurately reflect the progress and guide our team's ongoing efforts.
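As one possible way to put a number on the carbon footprint indicator above, here is a minimal Python sketch of the arithmetic: trips avoided, times round-trip distance, times an emission factor per kilometer. The function name and every input value are placeholders chosen for illustration; they are not measurements or real emission factors, which would need to come from survey data and a published source.

```python
def estimate_co2_saved_kg(trips_avoided, round_trip_km, kg_co2_per_km):
    """Estimate kilograms of CO2 saved from avoided travel.

    All three inputs are assumptions to be supplied from surveys or
    published emission factors; none are provided by this sketch.
    """
    if min(trips_avoided, round_trip_km, kg_co2_per_km) < 0:
        raise ValueError("inputs must be non-negative")
    return trips_avoided * round_trip_km * kg_co2_per_km


# Placeholder inputs only: 10 avoided trips of 200 km each, with an
# illustrative (not real) factor of 0.5 kg CO2 per km.
example_saved = estimate_co2_saved_kg(
    trips_avoided=10, round_trip_km=200.0, kg_co2_per_km=0.5
)
```

The same structure could be extended with a separate term for avoided redundant tests once a defensible per-test estimate is available.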
The theory is that increased access to comprehensive and accurate rare disease knowledge can streamline diagnoses and improve treatment options, thereby making a profound impact on patient lives while minimizing environmental costs. Here's a rough breakdown:
Activities:
- Development and optimization of the Galen AI chatbot
- Integration of expansive and diverse datasets related to rare diseases
- Establishing partnerships with healthcare institutions, researchers, and patient communities
- Continuous user feedback incorporation and model refinement
Immediate Outputs:
- Comprehensive, user-friendly AI chatbot providing relevant, high-quality information on rare diseases
- Strong network of partnerships with key stakeholders
- Positive user feedback and engagement metrics indicating user value
Longer-term Outcomes:
- Decreased time to diagnosis and improved treatment plans for rare disease patients
- Increased number of research initiatives informed by Galen, fostering advancements in rare disease research
- Reduction of environmental costs associated with rare disease healthcare, such as travel emissions and redundant testing
- Global accessibility of reliable rare disease information, reducing disparities in knowledge and care
There is evidence to support the impact of such an approach. Many studies highlight the inefficiencies in rare disease healthcare, underscoring the necessity and potential impact of a solution. I expect a widely-accessible and simple-to-use technology like Galen will provide that solution. By democratizing access to high-quality rare disease information, we can create a more sustainable and patient-centric model for rare disease healthcare.
Natural Language Processing (NLP): NLP is a branch of artificial intelligence that focuses on the interaction between computers and human languages. It involves several tasks like translation, sentiment analysis, speech recognition, and topic segmentation, among others. For Galen, the NLP component enables the chatbot to understand and process the queries posed by users in natural language, i.e., the way humans speak and write. This is essential for the system to extract relevant information from the query and provide a meaningful, contextually accurate response.
Generative AI: Generative AI refers to a type of artificial intelligence, more specifically a class of machine learning, that is capable of creating new content. It can generate texts, images, voice, and other types of data that resemble the original training data. In Galen's case, the generative AI allows the chatbot to produce human-like text based on the input it receives. This goes beyond a simple question-answer model, as Galen can generate new sentences, even paragraphs, of information that are tailored to the user's inquiry.
Large Language Models (LLMs): LLMs are a type of generative AI that have been trained on vast amounts of text data. They are designed to generate text that is contextually relevant and grammatically correct. They do this by predicting the likelihood of a word given the words that came before it. In Galen, an LLM is used to understand the context of the user's questions and generate relevant, comprehensive, and accurate responses. It's the backbone of the system, allowing Galen to converse naturally, understand complex medical information, and provide insights on rare diseases to even inexperienced end-users, such as elderly members of patient families.
- A new application of an existing technology
- Artificial Intelligence / Machine Learning
- Big Data
- Software and Mobile Applications
- Not registered as any organization
Just me so far!
I've been working on this for about 8 weeks.
It's still early to comment on the leadership and organizational stance, since it's just me. But with regards to the mission of the technology, diversity and equity are central. The entire goal of Galen is to create and democratize a highly-accessible technology to a global community of disenfranchised patients and their loved ones. It stemmed from experiences I've had volunteering as a healthcare worker where the inequalities are jarring, such as the Kingdom of Eswatini and the Outer Himalayas. The hope is that AI technologies for health (both Galen as well as others) can help overcome this divide.
I expect Galen to use a freemium business model to ensure maximum accessibility, especially in third-world nations. At its core, Galen's purpose is to democratize access to rare disease information and provide the necessary support for patients, caregivers, and healthcare providers alike.
Free Tier: The basic version of Galen is available to everyone, offering comprehensive disease information, query-based symptom recognition, and an interface to engage with patient communities. This ensures that anyone, irrespective of their financial means, can benefit from Galen's key features.
Premium Tier: For a subscription fee, users can access advanced features like personalized alerts for new research or clinical trials, advanced analytics about certain disease trends, or a secure telemedicine feature for remote consultations. Extra features like these might be valuable for healthcare providers and research organizations, who are more likely to have the means for this level of subscription.
Partnership with Healthcare Providers and Research Institutions: Galen could provide value as a platform for patient education, symptom tracking, and remote patient monitoring. Later on, Galen might collect data (always with user consent and anonymized for privacy) to generate insights for researchers, helping them identify patterns and correlations in rare disease occurrences, symptoms, and treatments.
Philanthropy and Grants: Given the altruistic mission, Galen could also operate with the help of philanthropy and grants. If we secure enough funding, maybe we could provide the premium services free of charge to underserved communities or individuals who can't afford them. (We would need to design an application process or something, but possible.)
The goal is to ensure that Galen is not only self-sustainable but also continually evolving and improving to serve the needs of the global rare disease community.
- Individual consumers or stakeholders (B2C)
We will begin by acting as a consumer app (B2C), and from there secure enterprise deals with R&D organizations, healthcare systems, etc. Once we have some freedom of operation, we can transition to a freemium model without going under. The free tier will ensure (capped) access for all users, while the premium tier provides advanced features at a reasonable subscription fee. The premium tier would be aimed at those who want extensive use of the AI - primarily healthcare providers, research organizations, and patients who can afford and benefit from the enhanced services.
Two days ago I created a signup list on the Galen website, and it has a couple hundred signups so far. Not actively collecting payments yet, however.

Founder & Physician