Matching official health messages to citizen FAQs on digital platforms
Using Natural Language Processing to automatically match citizens’ questions with official answers from health authorities on text-based digital platforms at national scale in LMICs
Debbie Rogers
Managing Director
Praekelt.org
- Respond (Decrease transmission & spread), such as: Optimal preventive interventions & uptake maximization, Cutting through “infodemic” & enabling better response, Data-driven learnings for increased efficacy of interventions
In both acute health emergencies like COVID-19 and chronic disease management such as TB, HIV, or diabetes, reliable and up-to-date answers to citizens’ health questions can make the difference between life and death. However, the scale and fast-changing nature of many health challenges make it difficult for authorities to provide official information quickly enough. Citizens now routinely go online for health information, but content can be inaccurate, irrelevant, or worse, misleading. Through social media in particular, misinformation can spread far and fast. This is a problem that affects billions of people, especially during a pandemic when citizens need reliable answers to rapidly evolving questions to make informed decisions about their health.
The vast majority of the world’s 4.7 billion internet users and 4.2 billion social media users access the internet and social media through mobile phones. Over 2 billion people use WhatsApp (Source: Hootsuite). Providing health-related information via WhatsApp, USSD, SMS, and phone-based apps is therefore essential, especially in LMICs, home to three quarters of all mobile internet users (Source: GSMA). However, one-way provision of information is insufficient: to inform decisions and improve health outcomes, mobile platforms need to understand citizens’ questions and respond with relevant, up-to-date answers.
Praekelt operates 20 unique digital health platforms, spanning 17 countries, with over 26 million users in the past year. Users engage with Praekelt platforms to obtain reliable information to inform health decisions. These users include: pregnant women unsure whether it’s safe to get a COVID vaccine; young mothers with questions about what to do with a sick child; newly positive HIV patients seeking information about the side effects of new drugs; workers unsure whether a lingering cough means they should get tested for TB; and healthcare workers who need access to clinical and psychosocial support to continue developing as professionals.
To understand user needs and refine its services, Praekelt conducts user surveys, tests messaging in focus groups, and monitors user engagement with platform tools. IDinsight and Praekelt are also conducting randomized controlled trials (RCTs) to rigorously test which messages are most effective in influencing recommended health behaviors among users. This involves user surveys, qualitative interviews, and analysis of app data on user engagement.
To further ensure users obtain answers to their health-related questions, IDinsight and Praekelt will analyze questions users submit, create a dashboard for health officials to understand citizens’ needs, and seek user feedback on content that best matches their questions.
- Scale: A sustainable project or enterprise working in several contexts, communities or countries that is looking to scale significantly, focusing on increased efficiency
- Artificial Intelligence / Machine Learning
- Big Data
- Crowd Sourced Service / Social Networks
- Software and Mobile Applications
The FAQ chatbot will use open source tools and be fully open source itself under the MIT or BSD 3-clause license. The core product will be agnostic to problem context and come with supporting documentation on configuring the solution, so it can be deployed for other similar problems. Customisations and tuning will be done through YAML configuration files. We will also explore publishing any custom embeddings we train, subject to privacy concerns or other restrictions.
The solution has also been modularised into three main components to allow for extensibility:
- The core model: Presents an API that accepts an incoming message and returns matching FAQ content
- The FAQ front-end: A web app that allows users to add or edit FAQ content. It also allows each FAQ to be tagged with keywords to assist the matching algorithm
- The testing interface: A web app that allows users to simulate WhatsApp messages being sent to the core model
Each of these is built as a separate application running inside its own Docker container; together they form a comprehensive solution. Additional components, such as reporting dashboards, can be added to this architecture. We have designed the solution to integrate easily into the existing infrastructure of large-scale messaging services.
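To make the core model's contract concrete, the following is a minimal sketch of a function that accepts an incoming message and returns the best-matching FAQ content, with keyword tags assisting the match. It is an illustration only, not the deployed matching algorithm: the names `Faq`, `match_faqs`, `tag_boost`, and `EXAMPLE_FAQS`, the bag-of-words cosine similarity, and the example FAQ text are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Faq:
    """One FAQ entry as it might be stored by the FAQ front-end (hypothetical shape)."""
    title: str
    answer: str
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_faqs(message: str, faqs: list[Faq], top_k: int = 3, tag_boost: float = 0.2):
    """Return up to top_k (faq, score) pairs, best match first.

    Score = cosine similarity of word counts, plus tag_boost for every
    tag token that also appears in the message (an assumed heuristic).
    """
    msg_tokens = tokenize(message)
    msg_vec = Counter(msg_tokens)
    scored = []
    for faq in faqs:
        faq_vec = Counter(tokenize(faq.title + " " + faq.answer))
        score = cosine(msg_vec, faq_vec)
        tag_tokens = {tok for tag in faq.tags for tok in tokenize(tag)}
        score += tag_boost * len(tag_tokens & set(msg_tokens))
        scored.append((score, faq))
    # Sort on the score only, so ties never compare Faq objects.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(faq, score) for score, faq in scored[:top_k] if score > 0]


# Hypothetical FAQ content, invented for this example.
EXAMPLE_FAQS = [
    Faq(
        "Is the COVID-19 vaccine safe during pregnancy?",
        "Ask your health worker about vaccination while pregnant.",
        tags=["vaccine", "pregnant"],
    ),
    Faq(
        "Where can I get tested for TB?",
        "Visit your nearest clinic for a free TB test.",
        tags=["tb", "cough", "test"],
    ),
]
```

Calling `match_faqs("I am pregnant, can I get the vaccine?", EXAMPLE_FAQS)` ranks the pregnancy FAQ first, largely because of the tag overlap. In practice the word-count similarity would likely be replaced by learned embeddings, while the same message-in, ranked-FAQs-out interface could stay unchanged.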
Our solution targets all citizens with access to a mobile phone and internet who are interested in seeking information to make smart health decisions. We focus on citizens in LMICs who face chronic health issues such as HIV, and who may have limited access to reliable information during acute health crises. For example, based on qualitative interviews with South African COVID-19 Connect users, we know that citizens have questions about the safety and efficacy of COVID-19 vaccines as well as general distrust of the government’s COVID-19 response.
Our solution would allow citizens to obtain accurate and updated responses to their questions on specific health concerns (e.g. efficacy of COVID-19 vaccines). This will address any misinformation or misperceptions citizens have and increase the likelihood that they adopt the recommended health behavior (e.g. getting vaccinated), leading to improved health outcomes.
Further, because Praekelt platforms are built in conjunction with national and international agencies (e.g. South Africa’s Department of Health, WHO), engaging with citizens in a two-way dialogue (listening and responding to their questions) will increase public trust in these agencies. This can increase future compliance with health protocols, leading to an improvement in longer-term health outcomes.
Our current solution focuses on responding to COVID-19 FAQs in English in South Africa. Over the next year, we plan to leverage Praekelt’s deep relationships with organizations like the WHO to expand the COVID-19 FAQ solution to other countries and languages where Praekelt operates COVID-related messaging services. Our goal for year 1 is to improve and deploy the solution in at least 2 additional languages and reach 100,000 monthly active users.
As we move into year 2, we plan to scale the NLP solution to Praekelt’s other digital health platforms, such as TBConnect, MomConnect, HealthWorkerConnect and NurseConnect. By the end of year 3, we aim to have deployed our solution to 10 or more digital health platforms operated by Praekelt in 5 or more countries, with at least 5 million monthly users.
Further, as our solution will be open-source, we hope that it will be adopted and improved upon by other organizations operating digital health platforms. This will further improve the performance and reach of our solution, providing better answers to more users, and successfully scaling its impact.
Praekelt.org closely tracks usage of our digital health platforms to measure progress toward scaling our solution. For example, in November 2020 there were over 8 million unique HealthConnect users in South Africa, and we expect over 10 million by the end of 2021. In total, users in South Africa have completed over 13 million HealthChecks (symptom tracker surveys) on HealthConnect; our target is 20 million.
Using our Natural Language Processing (NLP) solution will enable Praekelt and IDinsight to automate the provision of rapid, reliable, and tailored responses to questions frequently asked by Praekelt’s platform users. We will measure how many user questions our solution is able to automatically match (with pre-written answers) and respond to each month; the percentage of users surveyed who report being satisfied with answers received; the percentage of users asking more than one question (another satisfaction indicator); and the number of countries and languages where we can offer our solution. Our goal is to scale our solution to 10+ digital health programs in 5+ countries and engage at least 5 million monthly users.
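As a rough illustration of how the first three indicators above could be computed from a log of user questions, consider the sketch below. The `QuestionEvent` record, its field names, and the `monthly_metrics` function are hypothetical and do not reflect Praekelt's actual data schema. As a simplification, satisfaction is computed per survey response rather than per surveyed user.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionEvent:
    """One user question as it might appear in a platform log (assumed schema)."""
    user_id: str
    month: str                        # e.g. "2021-03"
    matched: bool                     # was a pre-written answer matched and sent?
    satisfied: Optional[bool] = None  # None when no survey response was collected


def monthly_metrics(events: list[QuestionEvent]) -> dict[str, dict]:
    """Compute per-month indicators from a list of question events."""
    by_month = defaultdict(list)
    for event in events:
        by_month[event.month].append(event)

    report = {}
    for month, evs in by_month.items():
        surveyed = [e for e in evs if e.satisfied is not None]
        questions_per_user = Counter(e.user_id for e in evs)
        repeat_users = sum(1 for n in questions_per_user.values() if n > 1)
        report[month] = {
            # Number of questions automatically matched to an answer.
            "matched_questions": sum(e.matched for e in evs),
            # Share of survey responses reporting satisfaction.
            "pct_satisfied": (
                100 * sum(e.satisfied for e in surveyed) / len(surveyed)
                if surveyed
                else None
            ),
            # Share of users who asked more than one question.
            "pct_repeat_users": 100 * repeat_users / len(questions_per_user),
        }
    return report
```

The fourth indicator, the number of countries and languages served, is a deployment count and would be tracked separately from the question log.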
- Bangladesh
- Burundi
- Congo, Dem. Rep.
- Côte d'Ivoire
- Ethiopia
- India
- Jamaica
- Kenya
- Madagascar
- Malawi
- Mozambique
- Pakistan
- Sierra Leone
- South Africa
- Eswatini
- Timor-Leste
- Uganda
- Australia
- Bangladesh
- Brazil
- Burundi
- Canada
- Congo, Dem. Rep.
- Côte d'Ivoire
- Ethiopia
- India
- Jamaica
- Kenya
- Madagascar
- Malawi
- Mozambique
- Pakistan
- Sierra Leone
- South Africa
- Eswatini
- Timor-Leste
- Uganda
- United Arab Emirates
- United Kingdom
- United States
Financial: Financial support for roll-out at scale remains a challenge. Praekelt.org makes the COVID-19 platform available to governments free of charge, but the long-term costs of implementation need to be met by the partner or an external funder. Praekelt has, however, secured a number of in-kind resources from partners such as WhatsApp and Amazon Web Services to help make the solution cost-effective for partners to run.
Legal: Many LMICs insist on local hosting for health services. This increases the cost of operation and reduces the ability to scale efficiently, as cloud services are primarily hosted in high-income countries and regions.
Cultural: Adapting these tools to a variety of languages and cultures is critical but resource intensive. We envision creating a core model that works across a number of languages. For languages we are unable to cover ourselves, we hope to provide a framework that others can build on to customize our tools to their setting.
Policy: Some country policies are not supportive of telemedicine programs. We have, however, seen that many countries have adjusted these policies in recent months in response to the need created by the COVID-19 pandemic.
- Nonprofit
IDinsight (www.idinsight.org)
This proposal is a “shovel ready” opportunity for philanthropists. We have a team set up, a proof of concept model, and a platform that already exists at scale to implement our technology on. We just need the resources to expand our work.
Our funding runs through August 2021 and the timing of the Trinity Challenge would allow us to continue serving the immediate needs of our COVID-19 response work while building tools for better responses to health crises in the future.
Beyond monetary resources, our technical team would benefit from mentorship and tooling from major tech companies (Google, Microsoft, etc.). Given the strong technical talent in our organizations, we would be able to take full advantage of the technical expertise and resources available, leveraging world-class experts as advisers rather than relying on them to implement solutions for us, which would be unsustainable.
In particular, we would be keen to collaborate with members of Google’s NLP team (for example, the researchers working on BERT) and the team at Google Translate working on low-resource languages. Finally, given the computational resources required for this work, we would appreciate support from the cloud computing teams at major tech companies.
With support from the Global Innovation Fund and Horace W. Goldsmith Foundation, IDinsight and Praekelt are currently collaborating to build a Monitoring and Evaluation Framework for the COVID-19 Connect platform; run multiple experiments to rigorously test how different behavioural science “nudges” can improve health decisions and health-seeking behaviour among the COVID-19 Connect user population; and use statistical modeling techniques to predict and build an early warning system for COVID-19, which would help South Africa’s National Department of Health allocate medical resources and create policies to help curb the spread of the infection.
We are open to additional partners joining our consortium to provide financial support or lend their expertise in health communications or technical product development. In particular, we would benefit from the involvement of those working at the cutting edge of NLP at places like Google or Microsoft, particularly on large-scale language models such as BERT and GPT-3.