Ersilia Record Linker
Our solution will bridge health care data silos, including data from routine care and implementation campaigns, that dominate the health care data ecosystem in low- and middle-income countries (LMIC), hampering evidence-based patient and program management. Access to country-wide, unified and comprehensive patient information is essential to evaluate existing primary health service delivery, treatment guidelines and disease-prevention strategies. In High Income Countries (HIC), the widespread use of Electronic Health Records (EHR) and the presence of patient Unique Identifiers (UID) simplify the process of data collection, integration and analysis, resulting in good healthcare coverage and data-driven decision-making (Rao and Pilot, 2014). Unfortunately, primary healthcare data collection and analysis in LMIC presents several fundamental challenges, including:
Siloing of health care data in isolated health care service provider or program databases that hamper the management of patients and programs across the continuum of care
Lack of shared Unique Identifiers of patients. While most programmatic databases have their own UID for each patient, these are not consistent across diverse databases. National IDs or other identifiers such as social security numbers are not commonly used, further aggravating this problem and resulting in duplicate data entries, cumbersome integration, and loss of patient follow-up
Differing data reporting guidelines. Oftentimes, data collection in LMIC is a result of a specific, time-framed program implementation funded and/or led by a stakeholder from the Global North, who may operate independently of other stakeholders. As a result, data is not contributed to a centralized resource, or integrated through data linkage procedures.
Insufficient data management capacity. Unaffordable software licenses, lack of expertise in modern techniques such as machine-learning (ML), and demanding hardware requirements often result in externalization of the data analysis steps to partners based in the Global North.
These factors have direct implications for the development of effective healthcare policies, including preventative campaigns such as cancer screenings, mass drug administration for malaria elimination, and adherence to antiretroviral HIV treatment, among many others. Therefore, linking records from primary care programs across patients’ continuum of care, including secondary and tertiary care, is essential to deliver more targeted interventions, and is key to improving disease control in resource-constrained settings such as rural areas or impoverished regions. Overall, the lack of centralized data management systems affects billions of people across the Global South, and has implications both for non-communicable and communicable diseases. An example of our work is shown through our ongoing collaboration with the Centre for Infectious Disease Research in Zambia (CIDRZ), which is the only institution with access to the national HIV database, SmartCare (Moomba et al, 2020). SmartCare was initially intended for HIV testing and treatment, and contains over half a million entries, laboratory data for over a hundred thousand tests, and program data for pregnancies and deliveries. However, due to the lack of a unique identifier, it is currently not possible to directly link the adult (PMTCT) database to the pediatric records, hampering monitoring and evaluation of HIV screening, testing, and initiation of care for HIV-exposed infants. Connecting these otherwise independent resources is necessary to extract crucial, actionable information from the data collected historically, and synchronize it with the data points being collected in the field in real time.
References
Moomba K, Williams A, Savory T, Lumpa M, Chilembo P, Tweya H, Harries AD, Herce M. Effects of real-time electronic data entry on HIV programme data quality in Lusaka, Zambia. Public Health Action. 2020 Mar 21;10(1):47-52. doi: 10.5588/pha.19.0068.
Rao M, Pilot E. The missing link--the role of primary care in global health. Glob Health Action. 2014 Feb 13;7:23693. doi: 10.3402/gha.v7.23693.
We will develop and deploy the Ersilia Record Linker, an open-source tool to match patient records from multiple healthcare datasets. This tool will be incorporated into the machine learning toolbox developed and maintained by Ersilia to provide access to researchers worldwide.
The Ersilia Record Linker allows the merging of two or more datasets by identifying which records correspond to the same patient, based on the (partially) available demographic, biometric and clinical data. This approach has shown improvements in healthcare delivery in HIC (Boyd et al, 2015; Padmanabhan et al, 2019), and has been applied only sparsely in LMIC (Kabudula et al, 2014). Our goal is to provide an easy-to-use, end-to-end pipeline for researchers, clinicians and policymakers to link healthcare data from different facilities and programs. The Ersilia Record Linker will have six unique features:
Easy to use. Pre-processing and comparison of datasets will happen in a fully automated way, including detection of relevant fields in a table, standardization of names and dates, and scoring of record pairs. The tool will offer a graphical user interface as well as a command-line interface for programmatic use.
Scalable. Matching of large datasets containing millions of records and involving billions of pairwise comparisons will be possible within minutes. Smaller datasets (i.e. thousands of entries) will be compared in nearly real time.
Based on machine-learning (ML). Fuzzy/probabilistic record linkage (i.e. inference of a match between two records when a UID is not available) will be enabled by a set of pre-trained ML tools. ML tasks will include comparison of names robust to misspellings and abbreviations, standardization of dates, and detection of clinical information in free-text fields, among others.
Deployed locally. The tool will work with minimum hardware requirements and will function off-line. This feature will allow deployment of the tool in data centers as well as in the health facilities, where internet connectivity is often a problem. Usage of the tool in-country is key to expedite decision making and ensure data governance in local institutions.
Free and open source. The code will be made available through Ersilia’s repository of ML assets and released under the GPLv3 open source license.
Transdisciplinary co-creation. We will work with local stakeholders from LMICs, including Ministries of Health, hospitals/health facilities, and community leaders that are currently in need of such tools, to ensure their inputs and requirements are embedded within the platform.
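As an illustration of the fuzzy name comparison listed among the features above, the following minimal Python sketch scores two patient names in a way that tolerates misspellings, accents, punctuation and swapped name order. It is only a sketch under stated assumptions: it relies on the standard-library difflib module as a stand-in for the pre-trained ML models, and the function names normalize_name and name_similarity are hypothetical, not part of the Ersilia Record Linker.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in without_accents.lower())
    # Sorting tokens makes "Surname, Name" and "Name Surname" comparable.
    return " ".join(sorted(cleaned.split()))


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
```

Under these assumptions, two spellings of the same name that differ by one letter receive a score close to 1, while unrelated names score much lower; a real deployment would replace the string matcher with models trained on locally relevant names.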
We envisage the Ersilia Record Linker as a complementary tool to existing software for healthcare data collection and visualization in LMIC, since it enhances the analytical power of current tools by combining data from different programs, sources, health facilities and regions. Thanks to the functionalities of the tool, we expect record linkage to become a routine practice in many scenarios, for instance, to identify patients from a primary care registry in a larger database as part of an epidemiology study, to deduplicate entries in a clinic’s registry, or to fetch contact details when those are missing and thereby optimize day-to-day care. We will put special emphasis on the appropriate treatment of patient names (crucial in regions of high ethnographic diversity) and on record encoding in cases where anonymization or pseudonymization is necessary. Most of our ML tools work on numerical encodings called “embeddings”, which effectively serve as obfuscated versions of the data (see, for instance, Duran-Frigola et al, 2020, in the context of drug discovery). Additionally, patient privacy will be protected through the fully homomorphic encryption protocols that we’ve recently implemented successfully for a range of ML algorithms. Finally, sharing of results with the broader community will be enabled by a synthetic data generation option that allows users to recreate and demonstrate datasets realistically without exposing private patient information.
The Ersilia Record Linker has already been piloted in low-resource settings. Our first record linkage was performed in Lusaka (Zambia) on the biggest cervical cancer database in the region - the Cervical Cancer Prevention Program in Zambia (CCPPZ). As a result of our data analysis, we evaluated the outcomes of the current patient referral guidelines for HIV-positive and HIV-negative women and proposed to strengthen the surveillance on the onset of patients with higher probability of cervical cancer (Pry et al, The Lancet Global Health, 2021). Linkage procedures were applied to evaluate and quantify follow-up visits within CCPPZ. This study led to the development of an early tool named EasyLinkage that is currently used at CIDRZ, followed by a more advanced, end-to-end command-line interface that uses ML at its core and enables synthetic data generation. Furthermore, Ersilia led the ML arm of a record linkage course (over 50% of the course content) aimed at building local capacity among data managers in low-resource settings offered in collaboration with the Swiss Tropical and Public Health Institute (Swiss TPH).
With the current project, we aim to gather our accumulated expertise on ML-based record linkage into a simple, powerful and innovative tool to be installed in research centers, hospitals and health facilities in LMIC. As a first implementation of the tool, we expect to leverage our ongoing collaborations in Zambia.
References
Boyd, J.H., Randall, S.M., Ferrante, A.M. et al. Accuracy and completeness of patient pathways – the benefits of national data linkage in Australia. BMC Health Serv Res 15, 312 (2015). https://doi.org/10.1186/s12913-015-0981-2
Duran-Frigola M, Pauls E, Guitart-Pla O, Bertoni M, Alcalde V, Amat D, Juan-Blanco T, Aloy P. Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker. Nat Biotechnol. 2020 Sep;38(9):1087-1096. doi: 10.1038/s41587-020-0502-7
Kabudula, C.W., Joubert, J.D., Tuoane-Nkhasi, M. et al. Evaluation of record linkage of mortality data between a health and demographic surveillance system and national civil registration system in South Africa. Popul Health Metrics 12, 23 (2014). https://doi.org/10.1186/s12963-014-0023-z
Padmanabhan S, Carty L, Cameron E, Ghosh RE, Williams R, Strongman H. Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications. Eur J Epidemiol. 2019 Jan;34(1):91-99. doi: 10.1007/s10654-018-0442-4.
Pry JM, Manasyan A, Kapambwe S, Taghavi K, Duran-Frigola M, Mwanahamuntu M, Sikazwe I, Matambo J, Mubita J, Lishimpi K, Malama K, Bolton Moore C. Cervical cancer screening outcomes in Zambia, 2010-19: a cohort study. Lancet Glob Health. 2021 Jun;9(6):e832-e840. doi: 10.1016/S2214-109X(21)00062-0
Our solution will have a direct impact on people living in LMIC, particularly those living in hard-to-reach, isolated communities where access to primary healthcare is limited. Our goal is to facilitate data linkage, analysis and interpretation at a national level, giving researchers and policymakers the tools required to improve existing health service delivery programs and plan more effective and targeted prevention campaigns. Improved patient identification and linkage to care in under-resourced and under-staffed facilities will lead to healthier communities, thus improving their wellbeing and socioeconomic status.
As a specific first use case of the Ersilia Record Linker, we will deploy the tool in the context of cervical cancer screening and HIV care in Sub-Saharan Africa. In 2020, almost 350,000 women died from cervical cancer, with almost 90% of deaths occurring in LMIC. Cervical cancer is highly preventable, and one of the most treatable forms of cancer if diagnosed early. The WHO Global Strategy to Accelerate Elimination of Cervical Cancer established global targets to be reached by 2030, including 90% of girls being vaccinated against HPV, 70% of women screened for cervical pre-cancer and 90% of women with cervical pre-cancer or cancer treated. Girls and women living with HIV are at high risk of developing cervical cancer. Yet, many HIV treatment programs do not have access to HPV vaccination and cervical cancer screening program data, and therefore can neither monitor nor manage whether girls and women receive the care they need. A team at the Swiss TPH under the lead of Prof. Julia Bohlius has conducted a survey across 30 HIV clinics in Sub-Saharan Africa. The findings showed that most facilities were not able to identify the number of girls living with HIV that were vaccinated against HPV, or the number of women screened, treated, and followed for further monitoring. This survey has highlighted the urgent need for record linkages across program databases to understand how many women living with HIV have received appropriate care for cervical cancer prevention. Moreover, the Swiss TPH team has conducted several international stakeholder meetings with patient, community, health care provider and government representatives from Sub-Saharan Africa to identify data reporting and linkage needs. Using a Delphi consensus process, relevant stakeholders from Sub-Saharan Africa have identified and agreed on a set of indicators and their variables for cervical cancer prevention and care performance metrics.
These indicators can be used to improve the management and up-scaling of cervical cancer prevention and care services to girls and women living with HIV.
We have previously worked with Prof. Bohlius on related record linkage methodologies, including technical mentorship of a PhD student co-supervised with CIDRZ. At CIDRZ, our lead collaborator is Dr. Albert Manasyan, MD, Head of the RMNCH department. Given our long-standing partnership with the Swiss TPH and CIDRZ, and the role CIDRZ plays in data management of HIV and cervical cancer programs, the Ersilia Record Linker will have a relevant early adopter, which shall greatly contribute to the refinement of the tool and deployment in a real-world setting.
The Ersilia Open Source Initiative is a non-profit organization with the mission to equip laboratories, clinics and universities in LMIC with ML tools for research in global health. Ersilia partners with institutions across the Global South (such as the H3D Center in Cape Town, the University of Buea in Cameroon or CIDRZ in Zambia) and combines remote working with on-site project development and implementation. Ersilia’s approach is focused on three pillars:
Open Source. Our tools are released under permissive open source licenses, for the benefit of the broader research community.
Strengthening in-country research. We join existing and ongoing research projects where scientific leadership resides within our LMIC partners.
Sustainable collaborations. Our projects include capacity building activities to ensure our assets can be maintained and further developed by local researchers.
All team members on this MIT Solve challenge have experience working with our beneficiary population. Ersilia’s co-founder and CEO, Dr. Gemma Turon, leads the communication and project implementation on-site with our partner organizations, with experience in South Africa, Cameroon and Zambia. Prior to joining Ersilia, Dr. Turon worked with social initiatives for youth development in Zambia and Palestine. Ersilia’s co-founder and Lead Scientist, Dr. Miquel Duran-Frigola, is a computational pharmacologist who spent 10+ years in academia developing ML algorithms for drug discovery. He has 30+ publications in this field and 1,700+ citations. He has an interest in the transformative power of computer science in resource-limited settings, and has combined his academic career with research stays in El Salvador, Mozambique, Zambia and South Africa. He has first-hand experience in the development of research projects in collaboration with local scientists, with successful examples such as the clinical data management of women living with HIV and cervical cancer (Pry et al, 2021) and malaria immunology (Moncunill et al, 2022).
As a non-profit organization operating at the interface between academia and on-field implementation partners, Ersilia is uniquely positioned to develop a tool that is adapted to the needs of the local communities. Factors such as limited infrastructure (including periodic power cuts), ease of use, local language considerations, and privacy preservation across disparate datasets are considered from the inception of the Ersilia Record Linker tool. We have a track record of successfully processing dozens of registry and e-health datasets from LMIC, including (beyond linkage) sustainable automated reporting of clinical laboratory data and development of dashboards for international consortia.
References
Moncunill G, Carnes J, Chad Young W, Carpp L, De Rosa S, Campo JJ, Nhabomba A, Mpina M, Jairoce C, Finak G, Haas P, Muriel C, Van P, Sanz H, Dutta S, Mordmüller B, Agnandji ST, Díez-Padrisa N, Williams NA, Aponte JJ, Valim C, Neafsey DE, Daubenberger C, McElrath MJ, Dobaño C, Stuart K, Gottardo R. Transcriptional correlates of malaria in RTS,S/AS01-vaccinated African children: a matched case-control study. Elife. 2022 Jan 21;11:e70393. doi: 10.7554/eLife.70393.
Pry JM, Manasyan A, Kapambwe S, Taghavi K, Duran-Frigola M, Mwanahamuntu M, Sikazwe I, Matambo J, Mubita J, Lishimpi K, Malama K, Bolton Moore C. Cervical cancer screening outcomes in Zambia, 2010-19: a cohort study. Lancet Glob Health. 2021 Jun;9(6):e832-e840. doi: 10.1016/S2214-109X(21)00062-0
- Employ unconventional or proxy data sources to inform primary health care performance improvement
- Provide improved measurement methods that are low cost, fit-for-purpose, shareable across information systems, and streamlined for data collectors
- Leverage existing systems, networks, and workflows to streamline the collection and interpretation of data to support meaningful use of primary health care data
- Provide actionable, accountable, and accessible insights for health care providers, administrators, and/or funders that can be used to optimize the performance of primary health care
- Prototype
We are applying to this MIT Solve Challenge to materialize the implementation of a set of ML-based record linkage tools that we’ve been developing in collaboration with our partners. We want to achieve a user-friendly platform that can be easily accessible by any researcher in the Global South.
Being part of the MIT Solve program would be crucial to translate research done in the context of academic collaborations into software that can effectively improve primary health care in LMIC. We are seeking, on one hand, financial support to continue the development of the Ersilia Record Linker, with an emphasis on automating its use for non-expert researchers. The goal will be to incorporate this service into the existing catalog of tools we provide to our beneficiaries (including automated ML modeling for drug discovery and privacy-preserving frameworks for chemistry data). On the other hand, we are looking for technical expertise on widely used data collection and visualization tools (such as DHIS2) to ensure seamless integration with our Ersilia Record Linker platform, as well as legal support in matters of data privacy and market expansion to move beyond implementation in Zambia and serve other LMIC.
There is currently no standard free solution for record linkage of health data. A few initiatives, such as CHeReL or G-Link, offer record linkage to researchers on a fee-for-service basis, with probabilistic functionalities and graphical user interfaces. ML-based record linkage tools are still rare or require strong data science skills. The Ersilia Record Linker will be the first open source tool specifically designed to incorporate ML-based record linkage at its core, taking into account the challenges for data management in LMIC settings, and focusing on creating an easy-to-use pipeline that can be run by researchers in the field who might not be record linkage experts. This will increase the capacity of local researchers and policymakers to monitor primary healthcare progress in their regions, identify the most successful interventions and modify the existing guidelines when required. In addition, as part of our mission, we are committed to offering intensive on-site support for the implementation and usage of our tool.
In the first year, our aim is to establish a pilot using HIV and cervical cancer data as a case study of the potential of record linkage to improve the cervical cancer care cascade. This will be done in collaboration with Prof. Julia Bohlius at the Swiss TPH, and will result in guidelines and policies for cervical cancer screening across Sub-Saharan Africa.
Beyond the pilot study, in the next five years we aim to:
Partner with at least 6 other organizations in LMIC working in healthcare delivery.
Monitor and improve clinical outcomes in at least 3 disease areas beyond cervical cancer and HIV.
Have 100 active users and contributors to our Ersilia Record Linker tool.
Deliver training on ML-based record linkage to 200 researchers in LMIC.
Number of case-studies that the Ersilia Record Linker has enabled.
Number of database records successfully linked by our collaborators.
Number of disease areas where our platform has provided novel insights for case control, management and prevention.
Number of scientific publications enabled by the Ersilia Record Linker, and number of citations.
Number of users of our platform.
Number of researchers trained on ML-based record linkage.
The overarching goal of this project is to improve primary healthcare in LMIC by helping researchers to gain insights from comprehensive patient databases that include information collected in multiple clinics, for different disease treatment and prevention programs. To this end, we have developed an ML-based record linkage tool that merges datasets by identifying which records correspond to the same individual patient. During this project, we will focus on:
Devising an easy-to-use platform that removes the requirement of technical expertise.
Adapting the tool to the specific challenges of dealing with health data collected in LMIC.
Using our tool for a case study on HIV and cervical cancer in Sub-Saharan Africa.
Offering capacity building activities on ML-based record linkage.
The immediate outputs of these activities will be:
Recommendations on the improvement of cervical cancer screening guidelines.
Training of LMIC researchers on the use of ML for record linkage.
The most important long-term outcome of this project will be the creation of unified patient healthcare databases in LMIC. These unified databases will start by aggregating historical data collected from siloed programs in a specific region, and then they will be used to analyze the effectiveness of treatments, preventative interventions and the correlation between disease areas. The results of these integrative analyses will help inform governmental and non-governmental primary healthcare providers, and focus resources on the areas and target populations that need them most. Finally, the expansion of record linkage-based analysis will serve to advocate for the improvement of data collection and management in LMIC.
The Ersilia Record Linker is a record linkage platform powered by pre-trained ML models. From the perspective of the user, a simple graphical user interface (desktop or browser app) will expect two or more input files to be linked. The Ersilia Record Linker will automatically detect relevant columns in the table (aka schema matching) and pre-process data accordingly. Thus, minimal intervention on the raw input files will be expected from the user: dates will be standardized, misspellings will be corrected and spurious entries will be detected and amended. As an output, the user will get:
A file of matched entries, along with a confidence/linkage score.
A linkage report in the form of a graphical canvas, where statistics on matched records, duplications, missing data, etc, will be presented.
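To make the date handling mentioned above more concrete, here is a minimal Python sketch of how heterogeneous date strings could be standardized and how an age recorded at a visit could be mapped to an approximate birth year. This is an assumption-laden illustration, not the tool's actual implementation: the list of formats, the day-first convention and the function names standardize_date and birth_year_from_age are all hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of accepted formats (day-first convention assumed);
# a real deployment would tailor this to the conventions of each data source.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def birth_year_from_age(age_years: int, visit_date: date) -> int:
    """Approximate an absolute birth year from an age recorded at a visit."""
    return visit_date.year - age_years
```

Returning None for unparseable values, rather than guessing, keeps spurious entries visible so they can be flagged in the linkage report.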
From a backend perspective, the tool will execute an end-to-end pipeline consisting of five key steps:
Schema matching. Columns of standard types (e.g. names, locations, dates, phone numbers, clinical test outcomes, etc.) will be automatically detected. Column assignment will be performed with pre-trained ML text classifiers.
Data pre-processing. Data will be pre-processed and cleaned according to the identified data type. Misspelling correction ML models will be applied here, as well as name/surname ordering and identification of abbreviations. Dates will be converted to a common format and relative time points such as age will be mapped to absolute dates.
Blocking. To enable scalable linkage, we perform a blocking procedure to filter out easy non-matching pairs. This procedure greatly reduces the number of fine-grained comparisons to be made. We have already implemented a highly efficient blocking procedure based on simple similarity metrics between vector-based (embedding) representations of the data.
Ensemble-based comparisons. Selected pairs for accurate comparison will be scored on the basis of an ensemble of ML models specifically pre-trained to discriminate matching and non-matching pairs based on available column types. Pre-training these ML models has been a major development task, involving generation of thousands of synthetic datasets over a range of scenarios. The algorithm will automatically select the subset of pre-trained ML models that are more relevant to the user input, and use them to provide a “consensus” linkage score for each pair.
Output production and reporting. Linked pairs will be delivered to the user in a simple tabular format, along with a set of plots and statistics about the linkage procedure.
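The blocking and ensemble-scoring steps above can be sketched in a few lines of Python. The sketch below is illustrative only and rests on several assumptions: character bigram counts stand in for the learned embeddings, two hand-written scorers stand in for the pre-trained ML models, the consensus score is a plain average, and all names (embed, block, consensus_score, link, and the record fields name and birth_year) are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import product
from math import sqrt


def embed(text: str, n: int = 2) -> Counter:
    """Toy 'embedding': character n-gram counts (stand-in for a learned vector)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def block(left, right, threshold=0.5):
    """Keep only candidate pairs whose name vectors are similar enough.

    This toy version still visits every pair; a scalable implementation
    would use a nearest-neighbour index over the vectors instead.
    """
    left_vecs = [embed(r["name"]) for r in left]
    right_vecs = [embed(r["name"]) for r in right]
    return [
        (i, j)
        for (i, u), (j, v) in product(enumerate(left_vecs), enumerate(right_vecs))
        if cosine(u, v) >= threshold
    ]


def name_score(a, b):
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def birth_year_score(a, b):
    return 1.0 if a["birth_year"] == b["birth_year"] else 0.0


def consensus_score(a, b, scorers=(name_score, birth_year_score)):
    """Average the individual scorers into one linkage score in [0, 1]."""
    return sum(scorer(a, b) for scorer in scorers) / len(scorers)


def link(left, right, threshold=0.5):
    """Return (left_index, right_index, score) for every pair that survives blocking."""
    return [
        (i, j, consensus_score(left[i], right[j]))
        for i, j in block(left, right, threshold)
    ]
```

Calling link on two lists of record dictionaries returns only the pairs that pass the blocking threshold, each with a consensus score, which mirrors the "file of matched entries, along with a confidence/linkage score" described earlier. Separating the cheap blocking filter from the more expensive scorers is the design choice that keeps the number of fine-grained comparisons small.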
This pipeline runs on a conventional laptop computer and scales to large datasets. A typical run, involving datasets of over 10,000 entries, is completed in seconds. Importantly, as part of the development cycle of the Ersilia Record Linker, we will incorporate a user feedback module that will allow us to improve the tool and the underlying ML models based on the suggestions of our collaborators and beneficiaries.
- A new application of an existing technology
- Artificial Intelligence / Machine Learning
- 3. Good Health and Well-being
- 10. Reduced Inequalities
- Cameroon
- South Africa
- Spain
- United Kingdom
- Zambia
- Cameroon
- Colombia
- South Africa
- Spain
- United Kingdom
- Zambia
Typically, data is collected at the clinics and health facilities as part of funded programmatic projects (e.g. CCPPZ for cervical cancer) or routine data collection, often led by the Ministry of Health in collaboration with implementation partners (e.g. SmartCare for ARV treatment at CIDRZ). CIDRZ and the Swiss TPH are well-established, reputable organizations in Global Health, and their data collection campaigns are incentivized by the will to improve healthcare in LMIC. They rely on competitive funding and have a track record of published studies in clinical research. The Ersilia Open Source Initiative does not perform data collection.
- Nonprofit
The Ersilia Open Source Initiative is committed to becoming a diverse organization, in order to better represent and serve the needs of our beneficiaries worldwide. You can find our DEI statement and code of conduct here.
As Ersilia is an incorporated Charity, the Board of Trustees is the governing body of the organization. Currently, our Board is composed of three self-identified men of three nationalities (Spain, Italy and India) and one self-identified woman from Namibia. The goal is to have a young, inclusive and mission-driven Board that brings together the expertise necessary to fulfill our mission. Ersilia’s co-founder and CEO is a woman of Spanish nationality.
Our hiring and recruitment process follows our DEI Statements, and most of the projects we develop are carried out in collaboration with employees from our partner institutions in LMIC. Working together with our collaborators is central to our mission as an organization to democratize access to data science expertise and training, and whenever possible we expand on those opportunities by offering internships and PhD supervision to students underrepresented in STEM. Ersilia is currently a host organization of the internship program Outreachy. Finally, we are growing a community of volunteers (currently over 35 participants in our Slack community channel) including male and female colleagues from Zambia, South Africa, Nigeria, Cameroon, Italy, India, Pakistan, UK, Spain and the US.
As a non-profit initiative working in the Open Science domain, our business model is based on the Open Canvas by the Mozilla Foundation, and it has the following key components:
Problem: scientific research productivity in LMIC is hampered by insufficient funding, infrastructure and training opportunities, resulting in a lack of solutions to the healthcare problems that affect those countries. For example, six of the top ten causes of death in low-income countries (LIC) are still due to infections, but only 15% of the drugs in development target communicable diseases (WHO, 2022).
Solution: develop free, open source machine learning and data science tools focused on the needs of LMIC researchers, strengthening their research capacity and scientific leadership.
Key Metrics:
Number of institutions and universities in LMIC using Ersilia’s toolbox.
Number of scientists that attended our workshops/seminars.
Number of partnerships with organizations in LMIC.
Number of end users reached.
Number of free, open source tools offered to our beneficiaries.
Number of scientific publications enabled / contributed by Ersilia.
Number of field implementations of our tools (for example, guidelines resulting from Ersilia’s Record Linker).
Resources required:
Open Source AI/ML models published in peer reviewed journals.
Access to health datasets to improve our ML algorithm training.
Trainers to develop and deliver courses.
Tech providers (AWS, Google Cloud amongst others) to host online services.
Grants and philanthropic support.
Contributor profiles:
Scientists working in the neglected and infectious disease domains.
Research software engineers.
Tech providers (pro-bono and non-profit plans): AWS, Google Cloud, FossHost.
Funding Institutions: research bodies (Wellcome Trust, NIH, Horizon Europe, Right Fund) and philanthropic funders (Bill and Melinda Gates Foundation, Chan Zuckerberg Initiative, Sloan Foundation and others).
User Profiles:
Research Institutions/Universities in LMIC interested in increasing their data science capabilities.
Governmental Healthcare Policy Makers.
Foundations and other non-profit initiatives implementing health programs in LMIC.
Contributor Channels: GitHub is the central repository for our code and models.
User Channels: we reach our users through academic communications (peer-reviewed journals, conferences and seminars) and social networks (Twitter, LinkedIn, YouTube, Slack, etc.).
Unique Value Proposition: we are a mission-driven organization with a team that has over 10 years of expertise in the field of computational biology and previous experience in developing projects in low-resourced settings.
- Organizations (B2B)
Our revenue model is based on fundraising. We aim to achieve sustained donations from philanthropic funders working in the healthcare domain, such as the Bill and Melinda Gates Foundation, as well as education-oriented organizations (e.g. the Sloan Foundation). We are also targeting funders in the open source domain (Invest in Open, Chan Zuckerberg Initiative, AWS for Non-Profits, Google For Non-Profits, among others). In addition, we apply for research grants in collaboration with our partners, to support each project individually. Main potential funders include the Wellcome Trust, the National Institutes of Health (NIH) and Horizon Europe. Finally, we also work with pharmaceutical companies, such as GSK, Novartis and Merck, that are interested in supporting the development of some of our tools. We have not yet explored the opportunities to obtain earned revenue from the use of our platform, but it is a source of income we want to implement in the coming years.
The Ersilia Open Source Initiative was incorporated with the Charity Commission for England and Wales in November 2020. In our first year of operation, we raised over 40,000 GBP in funding, and in our second year we will triple our funding, reaching around 150,000 GBP. All supporters are listed on our website.
Ersilia has participated in two US-based non-profit accelerator programs: the Digital Infrastructure Incubator by Code for Science and Society, and the Fast Forward Accelerator. Both programs provided seed funding and have been instrumental in securing further unrestricted funds through corporate sponsorship and donations. In May 2022, Ersilia received a 25,000 USD best non-profit award from BlackRock, and we have received a grant from The Fore, a UK non-profit supporting early stage social initiatives.
Our research projects are funded by a Merck Biopharma Speed grant (November 2021 - July 2022), a GRADIENT grant by GSK-Novartis in collaboration with the H3D Center (August 2022 - August 2024) and a Calestous Juma Fellowship subaward from the Bill and Melinda Gates Foundation for a 5-year project with the University of Buea, Cameroon.
We also apply for specific funding for our capacity building activities. For example, we are organizing a 4-day hands-on workshop on artificial intelligence for drug discovery together with the H3D Foundation (South Africa) as part of the Event Fund by Code for Science and Society.
Finally, we also benefit from pro-bono support from a number of organizations. We have been working with the Cranfield Trust for more than a year to obtain business and strategic advice, and we benefit from Google for Non-Profits program and AWS for Non-Profits program. As part of the AWS support, we are able to share our data publicly at no cost via the Registry of Open Data. We are also partners of A4ID, which connects us with pro-bono legal services. In addition, we are working with volunteers from the Atlassian Foundation, and plan to expand our network via employee volunteer programs from corporate tech organizations that grant us access to highly skilled volunteers. Finally, we are part of communities and networks that help us reach more beneficiaries and improve our practices in the open source domain (e.g. The Coaching Fellowship 2021, Software Sustainability Fellowship 2022, Open Life Sciences 2022).

CEO