The Ersilia Model Hub
Low and Lower Middle Income Countries (LMIC) produce less than 10% of the world’s research output (WHO 2021). This means that their population of 3.3 billion people largely relies on solutions devised elsewhere, often not adapted to the specific needs of their communities. LMIC governments cannot prioritize investment in scientific innovation, with most countries dedicating less than 0.5% of their gross domestic product to R&D activities. The lack of funding and research infrastructure is aggravated by the barriers that local researchers encounter when trying to get involved in studies focused on endemic issues. The practice in which scientists from High Income Countries (HIC) liaise with collaborators in LMIC merely to coordinate data collection (a.k.a. ‘helicopter research’) is acknowledged as a major obstacle to development.
This imbalance is particularly acute in the healthcare sector. Six of the top ten causes of death in Low Income Countries (LIC) are still due to infections, including Malaria, Tuberculosis and HIV/AIDS, but only 10% of the drugs in development target these disease areas. Drug discovery is expensive: approving a new drug takes more than 2 billion USD and at least 10 years, which is why pharmaceutical companies often overlook therapeutic opportunities that have a low return on investment. The best avenue to discover new treatments for the most neglected diseases is to empower the next generation of local scientists to tackle the endemic diseases of their home regions.
Data science and artificial intelligence/machine learning (AI/ML) offer a low-cost solution that may substantially reduce the number of experiments needed to find a new drug, lowering the prohibitive costs of drug discovery in low-resourced settings. However, effective usage is currently restricted to domain experts and is often subcontracted. The main reason behind the poor adoption of AI/ML is the lack of user-friendly, ready-to-use tools that can be integrated into the day-to-day research of non-experts.
The Ersilia Open Source Initiative aims to strengthen the research capacity for infectious and neglected diseases in Low and Middle Income Countries by lowering the barrier to access AI/ML expertise.
The Ersilia Model Hub provides a catalog of ready-to-use open source AI/ML models to be integrated into experimental pipelines without the need to write a single line of code. Simply put, a researcher can select a model of interest (for example, prediction of activity against the Malaria parasite), input their molecules of interest and obtain an estimation of their antimalarial potential. We aim to gather, in a single resource, two classes of models. On the one hand, we collect models developed by third parties and available in scientific publications. On the other hand, we develop models in-house and/or in collaboration with research groups based in LMIC. Thus, part of our philanthropic mission is to increase visibility and facilitate access to AI/ML research developed by the community, and part is to contribute AI/ML tools ourselves in order to fulfill unmet global health needs. We partner with key research organizations in LMIC and jointly develop AI/ML models to answer their scientific questions. In sum, we propose an innovative model of collaboration that, from inception, empowers the local institution by augmenting their research capacity through a sustainable adoption of digital assets.
We started our journey by focusing on preclinical drug discovery, but after a successful pilot project, we are expanding the platform to include AI/ML models that support the clinical and post-approval stages of drug discovery. Most drug regimens have been adapted to Caucasian or Asian populations, and the rich genetic diversity of the African continent is rarely taken into account. We aim to tackle these challenges using the technology we have developed, focusing on the adaptation of drug dosages to African ethnicities and improving the design and patient recruitment for clinical trials on the African continent.
Our main beneficiaries are researchers and clinicians located in Low and Middle Income Countries who work in the field of infectious and neglected diseases. These scientists have little to no support from their governments and depend on funding from large donors from the Global North.
We empower those researchers through the sustainable development and implementation of novel artificial intelligence and machine learning (AI/ML) tools. AI/ML-driven drug discovery methods have been successfully implemented in pharmaceutical companies and tech start-ups. AI/ML-designed drugs are progressing much faster through development phases (in some cases, spanning only one year from design to clinical trials), demonstrating the potential of the technology to accelerate research. By lowering the barrier of access, scientists in low-resourced settings will also be able to harness AI/ML and data science to speed up their research and focus the limited resources available on essential tasks.
We serve our beneficiaries in a three-pronged approach:
Open source access to the platform featuring hundreds of AI/ML models for biomedical research.
In situ implementation and training in data science and AI/ML during research visits. Gaining new skills and ensuring project continuation after the collaboration is a cornerstone of Ersilia’s mission, and enhances our users’ careers and funding opportunities.
Focus on neglected disease areas and support projects that leverage in-country resources such as natural products, or that aim at repurposing existing drugs for new indications.
Our vision is to achieve the development of biomedical data science hubs across the Global South, supporting the research networks of these countries and providing tools to answer their most pressing healthcare needs, traditionally neglected by the westernized research landscape.
The Ersilia Open Source Initiative was co-founded by a team of PhD-level scientists, experienced researchers with over 60 publications in peer-reviewed journals, and collaboration experiences in Europe, North America, and Africa. Our small mission-driven team is motivated by the barriers we faced during our own academic careers:
Gemma Turon, CEO and Co-Founder. Gemma was trained as a molecular biologist in the field of oncology and stem cells. During her PhD research, she experienced first-hand how difficult it was to apply new data analysis and AI-based tools to her experimental pipelines. She has always combined her scientific career with work and volunteering in the third sector, with experiences in organizations in Spain, Zambia, Palestine and South Africa. Her willingness to create impact led her to move from a purely academic career to a job where the main goal is to support others' research. Gemma’s background in cellular biology brings the needed expertise to bridge the gap between dry-lab and wet-lab researchers.
Miquel Duran-Frigola, CSO and Co-Founder. Miquel is a computational pharmacologist who spent 10+ years in academia developing AI/ML algorithms for drug discovery. He has 30+ publications in this field and 1,500+ citations. He has a natural interest in the transformative power of computer science in restricted settings, and has combined his academic career with research stays in El Salvador, Mozambique, Zambia and South Africa. He has first-hand experience in the development of research projects in collaboration with local scientists, with successful examples such as the clinical data management of women living with HIV and cervical cancer (Pry et al, Lancet Global Health, 2021).
Edoardo Gaude, Co-Founder and Trustee. Edoardo is a molecular biologist with expertise in the development of medical devices and solutions adapted to users' needs. He is the Co-Founder and CSO at PockIT Ltd and serves as director of the Board of Trustees for Ersilia, providing essential business management expertise.
- Build fundamental, resilient, and people-centered health infrastructure that makes essential services, equipment, and medicines more accessible and affordable for communities that are currently underserved;
- Growth
We are applying to Solve to scale up our technology and bring it to production. We have successfully completed a Pilot Project demonstrating the feasibility and impact of our approach. We hope to participate in the Solve Program to learn, network and grow our non-profit initiative so that we can serve more beneficiaries. We are moving from basic and pre-clinical research into the clinic to create a larger and more direct impact on populations suffering from healthcare inequalities, and the Solve community expertise would be ideal to help us in this transition.
- Human Capital (e.g. sourcing talent, board development, etc.)
Data science and AI/ML skills are transforming the healthcare industry, from medical research to diagnosis and service delivery, but the field is still very siloed. New AI/ML tools are constantly being developed and published in peer-reviewed journals. However, most of them are not accessible to the majority of researchers. The major roadblock is the lack of open and maintained infrastructure to support the deployment of AI/ML models for non-experts.
A clear example of the reach AI/ML models can have if deployed in a user-friendly manner is HuggingFace. Focused on the field of Natural Language Processing (NLP), this company now serves thousands of clients worldwide and spearheads innovation in the field. In the biomedical sciences arena, a few attempts have been made, for example the Kipoi database for genomics-oriented models and the ModelHub for image analysis. Our solution aims to gather, in a single resource, all open source models related to infectious diseases and drug development. We continuously survey the scientific literature to identify relevant assets and, when those do not cover the needs of our users, we work with them to develop new AI/ML models. The key differential aspects of our solution are:
Co-development and implementation of AI/ML tools together with our users to ensure adoption and the creation of a product that fully covers their needs.
Delivery of seminars, workshops and training to build up skills in data science.
Open Science approach: we are committed to working in the open and giving access to all our tools for free.
Purpose-driven non-profit initiative; our goal is to bring new technologies to those who do not have access to them.
We expect that our initiative will support the creation of a strong network of LMIC researchers who collaborate with well-established institutions and initiatives, helping to place the healthcare needs of LMIC populations on scientific agendas.
Our impact goals for the next year:
Deploy our tools in 10 institutions located in LMIC before the end of 2024: we have successfully completed our first pilot implementation at the H3D Centre in South Africa. We will now replicate this model in Cameroon, Zambia and Kenya in 2022 and 2023. We are also exploring collaboration opportunities in India and Mexico.
Train 1000 young scientists in the applications of AI/ML to support experimental research: we do so during our in-country visits, devising workshops adapted to our partners’ needs and incorporating interns into the organization.
Experimental validation of in-house developed AI/ML models: we have partnered with the Open Source Malaria Consortium to design new patent-free drugs against Plasmodium falciparum, the causative agent of malaria. These AI/ML-designed drugs are currently being tested in the laboratory. This will be a major milestone to support our technology and approach.
Deploy the Ersilia Model Hub: based on our users’ feedback, we will focus on incorporating 150 new models before the end of 2022, to cover all aspects of the drug discovery pipeline, and on making the Hub more accessible by designing a graphical user interface and deploying our infrastructure online through the AWS cloud.
Our impact goals for the coming five years are:
Contribute to the progression in the drug discovery pipeline of 5 candidates for malaria and tuberculosis.
Establish partnerships with 5 governments and healthcare agencies in LMIC to analyze their clinical data and improve patient follow-up guidelines.
Implement at least one disease monitoring tool aided with AI/ML in an LMIC.
Reach 1500 AI/ML models in the Ersilia Model Hub.
We use indicators adapted to each stage of our development process. Currently, we are focused on measuring the effectiveness of our technology: the number of models incorporated in the platform, the number of models validated experimentally, and the number of users we serve. Our long-term impact measurements are: the number of drugs progressing into more advanced clinical stages, the number of scientific publications arising from our collaborations, the number of new projects including data science components in our partners’ institutions, the funding raised to support local researchers, and the involvement of the communities in the research projects (for example, natural plant collectors, field practitioners or community leaders).
Our theory of change is based on a cascade approach, where a few individuals receive intensive training and support and become, in turn, champions and changemakers in their home institutions. This is an effective method to expand the impact of our work. By focusing on the development of new skills as opposed to the delivery of a service, we ensure that the impact of our work on our target beneficiary communities is sustained over time. In particular, the upscaling of in-country research capacity will have a lifelong impact on the lives of currently neglected patients in those countries, thanks to improved healthcare and treatment follow-up.
Ongoing collaborations between pharma companies and tech startups, such as Exscientia, have been able to bring to clinical trials drugs completely designed by AI in less than one year. These approaches demonstrate that lowering the barrier to access these technologies is a crucial and necessary step to enable LMIC to develop their research programs and find new drugs for endemic diseases. The pilot project in South Africa has demonstrated how experimental scientists can effectively adopt our tools and incorporate them in their research.
The core technology of the Ersilia Model Hub is the Chemical Checker (CC), a data-driven drug discovery tool optimized for AI/ML applications. In brief, the CC brings together a massive amount of information about drugs and small molecules available from the public domain, and encodes it in an ultra-dense format (a.k.a. CC signature) that can be used out-of-the-box with any AI/ML algorithm.
The CC was built by our Co-Founder Miquel, and currently incorporates bioactivity data for 1M molecules. It was originally published in the high-impact journal Nature Biotechnology (Duran-Frigola et al, 2020) and has been used in several follow-up studies, including in-silico, in-vitro and in-vivo validations. The CC was also repurposed for COVID-19 research during the pandemic outbreak in 2020.
- A new business model or process that relies on technology to be successful
- Artificial Intelligence / Machine Learning
- Big Data
- Software and Mobile Applications
- 3. Good Health and Well-being
- 9. Industry, Innovation, and Infrastructure
- 10. Reduced Inequalities
- South Africa
- Spain
- United Kingdom
- Cameroon
- Kenya
- South Africa
- Spain
- United Kingdom
- Zambia
- Nonprofit
We acknowledge that the diversity of the founding team is limited (two men and one woman, of Spanish and Italian nationalities). One of our main objectives as we grow is to increase the participation of those underrepresented in tech and to include representatives from our beneficiaries, to ensure that our mission stays aligned with their needs. To this end, we have already started expanding our Board of Trustees, the body that ultimately decides the projects we will develop. Currently, our Board is composed of three self-identified men from 3 nationalities (Spain, Italy and India) and one self-identified woman from Namibia. The goal is to have a young, inclusive and mission-driven Board that brings together the expertise necessary to fulfill our mission. In addition, we are growing a community of volunteers (currently over 20 participants in our Slack channel) including male and female colleagues from Zambia, Kenya, South Africa, Nigeria, Italy, India, UK, Spain and US.
Finally, we are working to provide opportunities to those underrepresented in STEM. We are currently mentoring one PhD student from Zambia and we are participating in the Outreachy (https://outreachy.org) program to incorporate two interns from minority groups this summer.
As a non-profit initiative working in the Open Science domain, our business model is based on the Open Canvas by the Mozilla Foundation, and it has the following key components:
Problem: diseases affecting LMIC are less researched, and the scientists in these countries have fewer resources to study them. AI/ML can speed up research at a lower cost, but many scientists cannot access those skills.
Solution: provide free, ready-to-use AI/ML tools for researchers in LMIC, with a focus on communicable diseases. The service offer includes training and mentoring in data science.
Key Metrics:
Number of AI/ML models available through the Hub
Number of users reached
Number of AI/ML model depositors (data scientists leveraging our infrastructure to publish their tools)
Number of institutions and universities using our tools on a daily basis
Number of scientists that attended a workshop/training each year
Number of new projects in LMIC incorporating Ersilia tools
Resources required:
Open Source AI/ML models published in peer reviewed journals.
Experimental partners that produce data for new models, and validate our tools in the field.
Trainers to develop and deliver courses.
Tech providers (AWS, Google Cloud amongst others) to host the platform online.
Grants and philanthropic support
Contributor profiles:
Scientists working in the neglected and infectious disease domains
Research software engineers
Tech providers (pro-bono and non-profit plans): AWS, Google Cloud, FossHost
Funding Institutions: research bodies (Wellcome Trust, NIH, Horizon Europe, Right Fund) and philanthropic funders (Bill and Melinda Gates Foundation, Chan Zuckerberg Initiative, Sloan Foundation and others)
User Profiles:
Research Institutions/Universities in LMIC interested in increasing their data science capabilities
Computational Biologists that leverage our infrastructure to publish their models
Contributor Channels: GitHub is the central repository for our code and models. We are currently working to create a clear contributor procedure.
User Channels: we reach our users through academic communications (peer-reviewed journals, conferences and seminars) and social networks (Twitter, LinkedIn, YouTube, Slack…).
Unique Value Proposition: we are a mission-driven organization with a team that has over 10 years of expertise in the field of computational biology and previous experience in developing projects in low-resourced settings.
- Organizations (B2B)
Our revenue model is based on fundraising. We aim to achieve sustained donations from philanthropic funders working in the healthcare domain, such as the Bill and Melinda Gates Foundation, as well as education-oriented organizations (Sloan Foundation). We are also targeting funders in the open source domain (Invest in Open, Chan Zuckerberg Initiative, AWS, among others). In addition, we apply for research grants in collaboration with our partners, to support each project individually. Main funders in the domain include the Wellcome Trust, the NIH and Horizon Europe. Finally, we also work with pharmaceutical companies, such as GSK, Novartis and Merck, which are interested in supporting the development of some of our tools.
We have not yet explored the opportunities to obtain earned revenue from the use of our platform, but it is a source of income we want to implement in the coming years.
The Ersilia Open Source Initiative was incorporated with the Charity Commission for England and Wales in November 2020. In one year, we have been able to secure over 60,000 GBP in funding. The major sources of funding include the FundOSS crowdfunding campaign (13,000 USD), the Merck Biopharma Speed Grant (30,000 EUR), the Digital Infrastructure Incubator (5,000 USD), the Fast Forward Accelerator (25,000 USD) and the Rosetrees Trust (10,000 GBP). In addition, we are partnering with Dr. Fidele Ntie-Kang from the University of Buea in the context of the Bill and Melinda Gates Foundation Calestous Juma Fellowship for a 5-year project to develop AI/ML models for drug discovery from African natural products, and we are also in the last stages of approval for funding from the Project Africa Gradient GSK-Novartis. We also participate in intern recruitment programs, and have been able to secure 7,000 USD in funding for one intern from the Outreachy program.
Finally, we also benefit from pro-bono support from a number of organizations. We have been working with the Cranfield Trust to obtain business and strategic advice, and we benefit from the Google and AWS programs for non-profits. As part of the AWS support, we are able to share our data publicly at no cost via the Registry of Open Data. In addition, we are working with two volunteers from Atlassian in the Engage4Good project. Employee volunteer programs allow us to access highly skilled volunteers to further our work. We are also part of communities and networks that help us reach more beneficiaries and improve our practices in the open source domain (The Coaching Fellowship 2021, Software Sustainability Fellowship 2022, Open Life Sciences 2022).
The participation in the Fast Forward Accelerator is helping us improve our fundraising efforts and we are sure this will bring in the necessary funds in the coming year to make Ersilia sustainable in the long term.
The Ersilia Open Source Initiative is committed to transparency and openness at all levels. As such, we report all the grants we have applied for, as well as all our financial transactions, via Open Collective.
