MIT Solve

Solution overview

Our Solution

Saajha Manch Mobile Vaani

Tagline

AI for citizenship engagement of low-wage migrants in India

Pitch us on your solution

The entitlements of India's low-wage migrant workers are under-realised because of low awareness, convoluted and opaque procedures, and regional barriers. India’s digital inclusion and universal ID have thus far not addressed these issues, with grave consequences for the democratic participation and the working and living standards of these people.

Gram Vaani has established the use of voice-based IVR technology to build engagement of these populations in entitlements and governance. We now propose to widen and deepen this engagement by shifting from manual transcription, tagging and curation to automation of two key functions: (i) curating and publishing responses to queries and grievances, and (ii) feedback surveys of listeners on policies and entitlements.

The application of technologies such as information retrieval algorithms, speech-to-text and natural language processing to this low-wage, mobile and Hindi-speaking population of 100 million holds out the promise of greater civic engagement and rights realisation for a growing yet critically neglected and disenfranchised section of Indians.

What is the problem you are solving?

Many legal provisions covering India’s 400m-strong low-wage workforce are not used by their intended beneficiaries because people are unaware of them or because, in spite of awareness, they are unable to get through the procedure required to access them. Further, in spite of guarantees in its constitution, an estimated 100m of India’s poorest citizens are prevented from accessing and exercising their rights because they are economic migrants spending most of the year away from their home state. India’s digital inclusion and universal ID have thus far not addressed this, with grave consequences for the democratic participation and the working and living standards of this growing migrant population.

Since 2010, Gram Vaani has been using IVR technology to overcome literacy, digital and geographical barriers, to reach close to 100 thousand unique monthly users with useful awareness material and build their engagement in their rights and entitlements, through peer contributions, Q&A dialogues, feedback surveys, online discussions and interaction around dramas and studio content. The 1000+ hours of audio content generated so far have been manually transcribed, tagged and curated. Besides being expensive and vulnerable to human error, the labour-intensiveness of these processes has delayed the feedback and results loop and impeded scale.

Who are you serving?

The 10,000 or so regular users of our IVR platform for low-wage migrant workers (called Saajha Manch or ‘everyone’s platform’) comprise the initial base for the project. Since its inception in late 2017, Saajha Manch (accessed by calling a unique phone number and taking the call-back) has posted, and provided curated responses to, over 500 queries and has surveyed 700 users on their rights and entitlements.

This Hindi-speaking population is representative of the estimated 100 million circular or seasonal migrants who make up a growing proportion of India’s workforce. Originating largely from the country’s most backward districts in north, east and central India, these people face diminishing returns to farming and have no option but to survive through a combination of small-scale and uncertain cultivation and migration to industrial hubs for work. Though protected by schemes and laws both at home and at their job destination (such as minimum wages, employer-linked health schemes, occupational safety, food rations and pensions), these low-wage workers face widespread violations and enforcement failure, exacerbated by low awareness, poor government accountability and their migrant status.

What is your solution?

Gram Vaani will widen and deepen its engagement with this population through automation of two key functions: (i) curating and publishing responses to queries and grievances, and (ii) feedback surveys of listeners on policies, rights and entitlements. Each of these is detailed below.

Query & grievance responses:

We will upgrade our manual processes to build a voice-based intelligent service through which users access vital and actionable responses to queries and grievances. Similar to a voice-based Quora, the service is accessed by IVR and Android app, where users engage with a virtual assistant using speech commands to retrieve relevant information. The backend is a knowledge base, initially seeded with the close to 1000 archival Q&A pairs available in-house, on topics spanning rights at work, residence-linked entitlements, education and civic services, and eventually, agricultural advisory.

We will promote the service among users and enlist local domain experts to provide specific and actionable query responses, thus incrementally building a ground truth dataset. We will deploy user-centric design and deep learning to ensure a chatbot experience which is easy to access and conversational. Being text-free, accessible over basic phones and available nationwide, the service will be equipped to tackle literacy, technology and geographical barriers.

Feedback surveys:

Gram Vaani undertakes press-key surveys which require participants to select a single option from a list of at most 4 or 5 responses. While a strong survey response rate is maintained by the pre-existing relationship with the listener base, the quality of response is restricted by technology. Using speech-to-text tools and natural language processing, we will develop a refined survey application to enable more complex question paths and improve the quality and accuracy of responses. Establishing inter-conversion between the text-based backend and the voice-based front-end, we will use natural language processing to spot named entities, dates, numbers, etc., to discover the user intent more precisely and build an appropriate dialogue flow for the virtual assistant. This work will greatly extend the scope and scale at which we can collect, and rapidly analyse, unstructured, qualitative feedback on the implementation of policies, access to entitlements and the workings of democratic processes.
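A minimal sketch of how a transcribed survey response might be turned into an intent and a set of numeric slots is given below. It is an illustration under stated assumptions, not a description of the deployed system: the keyword lists, the number-word lexicon, the intent names and the function names are all hypothetical, and a real pipeline would likely use trained models rather than keyword rules.

```python
import re

# Hypothetical, tiny lexicon of Hindi number words; real coverage would be far larger.
NUMBER_WORDS = {"एक": 1, "दो": 2, "तीन": 3, "चार": 4, "पांच": 5}

# Hypothetical intent names and trigger keywords, for illustration only.
INTENT_KEYWORDS = {
    "wage_complaint": ["वेतन", "मजदूरी"],
    "ration_query": ["राशन"],
}


def extract_numbers(text):
    """Collect numbers written as digits, then numbers written as known number words."""
    numbers = [int(tok) for tok in re.findall(r"\d+", text)]
    numbers += [NUMBER_WORDS[tok] for tok in text.split() if tok in NUMBER_WORDS]
    return numbers


def detect_intent(text):
    """Return the first intent whose keyword occurs in the text, else 'unknown'."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "unknown"


def parse_response(text):
    """Combine intent and numeric slots into one record for the dialogue flow."""
    return {"intent": detect_intent(text), "numbers": extract_numbers(text)}


if __name__ == "__main__":
    # Assumed speech-to-text output: "I have not received wages for three months".
    print(parse_response("मुझे तीन महीने से वेतन नहीं मिला"))
```

A record of this kind could let the virtual assistant choose the next question (for example, asking which employer is involved) instead of forcing the caller through a fixed press-key menu.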

This work draws on over 1000 hours of recordings of rural Hindi audio spoken across four different regions, and we are already collaborating with Mozilla to build speech recognition based on 150 hours of data. This dataset grows by 65 hours of new audio each month across all our Hindi-medium platforms.

Select only the most relevant.

Make government and other institutions more accountable, transparent, and responsive to citizen feedback
Ensure all citizens can overcome barriers to civic participation and inclusion

Where is your solution team headquartered?

Delhi, India

Our solution's stage of development:

Prototype

More about your solution

Select one of the below:

New application of an existing technology

Describe what makes your solution innovative.

While resourceful users are able to go online to seek the information they require, our target users are mostly not comfortable navigating complex text interfaces. Our primary innovation is to use voice interfaces to cater to this user segment. One of the several applications of our platform is for our captive user base of migrant labourers to seek information on labour rights and other vital topics. Currently we manually curate the user queries and respond with answers from our panel of experts. This has led to a repository of 1000+ question-answer pairs, which brings us to the second innovation. We are now using IR and NLP tools to build metadata for the questions and answers. This will enable us to use ranking algorithms to return appropriate responses to users when a new query arrives. Using similar techniques, we want to build a voicebot interface that helps our users navigate information through voice conversations and that also serves as a tool for interactive data collection.

While speech interfaces exist for western languages like English, such interfaces are either missing or do not perform well for resource-poor languages like Hindi. We have been running voice interfaces for over 6 years and are currently building speech to text models customised for the Hindi dialect popular among our users.

Describe the core technology that your solution utilizes.

The core of the technology we are currently building uses 3 broad technology areas:

Speech to text: one of the primary components in our new pipeline is converting speech input into text because there are several existing NLP/IR algorithms in the text domain which we want to apply to this input. While there are commercial off-the-shelf speech to text converters, they do not work very well with vernacular Hindi that our users mostly speak. We are building a Deep Neural Network based Speech to Text model customised for vernacular Hindi to use for our applications. This model will be trained on the speech database we have accumulated on our platform over the years. We also intend to make this technology open source if we are funded to build this model.
NLP: we are using Natural Language Processing (NLP) techniques to understand the intent and entities in users' conversations. Using these techniques we want to build a conversational question-answering system (voicebot) which can be used to answer users' queries and also in data collection.
IR: we are using Information Retrieval (IR) techniques to cluster a repository of pre-existing question-answer pairs and build metadata. Using the clusters and metadata we want to design ranking algorithms similar to search engines which will return appropriate responses from the repository to new queries from users.
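The retrieval step described under IR can be sketched as follows. This is a minimal illustration that assumes queries have already been converted to text and uses plain TF-IDF with cosine similarity as a stand-in for whatever ranking algorithm is eventually chosen; the function names, the whitespace tokenizer and the sample question-answer pairs are hypothetical placeholders, not entries from the actual repository.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(questions):
    """Return smoothed IDF weights and one TF-IDF vector per stored question."""
    counts = [Counter(tokenize(q)) for q in questions]
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_answers(query, qa_pairs, top_k=3):
    """Return up to top_k (answer, score) tuples, best match first."""
    idf, vectors = build_index([q for q, _ in qa_pairs])
    q_counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(qa_pairs[i][1], score) for score, i in scored[:top_k] if score > 0.0]


# Placeholder pairs for illustration; the answers are intentionally not real advice.
QA_PAIRS = [
    ("what is the minimum wage in delhi", "placeholder answer about minimum wage"),
    ("how do i apply for a ration card", "placeholder answer about ration cards"),
    ("how can i claim my pension", "placeholder answer about pensions"),
]

if __name__ == "__main__":
    for answer, score in rank_answers("minimum wage for factory workers", QA_PAIRS):
        print(round(score, 3), answer)
```

When no stored question shares a term with the query, the function returns an empty list, which could be the signal to route the query to a human expert and later add the new pair to the repository.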

Please select the technologies currently used in your solution:

Artificial Intelligence
Machine Learning
Social Networks

Why do you expect your solution to address the problem?

We have been operating different voice-based community media instances for over 7 years across more than 6 states in India. Our platforms are accessible over basic phone calls, and recently also through an Android app. The users of our platform, Saajha Manch, are migrant labourers who are not comfortable navigating complex text-based websites and prefer to interact with our platform over voice.

A recent survey we conducted showed that, among our users who have recently moved to smartphones, a large proportion prefer to use voice search to navigate their phones. In another experiment with a similar population, we found that when presented with the same information over text and audio, people retained more information from the audio format.

Our experience with voice and the above experiments indicate that voice is the preferred medium for our users. Augmenting our existing operations with ML/AI algorithms will help us scale and serve more users more efficiently.

Select the key characteristics of the population your solution serves.

Rural Residents
Peri-Urban Residents
Urban Residents
Very Poor/Poor
Low-Income
Refugees/Internally Displaced Persons

In which countries do you currently operate?

India

In which countries will you be operating within the next year?

India

How many people are you currently serving with your solution? How many will you be serving in one year? How about in five years?

Our current user base for Saajha Manch is around 10,000 active users, drawn from low-wage migrant workers around the national capital, Delhi. In a year's time we aim to grow to 100,000 users, and in five years to 1,000,000 users. In the last year, our Q&A service has doubled and we are now providing curated responses to around 100 questions per month. The positive feedback from this service (in the form of audio contributions registered by users and published on the IVR) has grown as fast. What's more, users of the service are increasingly spread across the country yet share the language and working conditions that draw them to this service. We are thus well positioned to grow rapidly across India's key industrial areas through the twin interfaces of app and IVR, and believe that we are now held back mostly by manual processes and, to some extent, technology readiness. We will also leverage our public media platforms (Mobile Vaani, a federated network of community media platforms), which have a reach of about 100K users per month, to promote and publicise the service. We believe the proposed technology enhancements will help expedite our plans to expand our reach and levels of engagement.

What are your goals within the next year and within the next five years?

Within the next year we want to have a robust speech to text engine for vernacular Hindi which gives at least 85-90% accuracy. We also want to put in place an indexing and retrieval system for our question-answer repository, finalise our voicebot framework, and build entity and intent recognition modules for basic conversations in our use case. With these tools in place, we want to scale our operations and make question-answering our biggest USP among our audience, which in turn will draw more users.
Within the next 5 years we want to serve 1,000,000 users, scaling through our USP of automated question-answering on labour queries. We want to become the de facto reference point for all queries related to labour laws. We want to make our speech to text model open for use by other social organisations.
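One common way to track a speech to text accuracy target such as the one above is word error rate (WER), with word accuracy taken as 1 minus WER. The sketch below assumes that this is the intended metric, which the text does not state; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "मुझे वेतन नहीं मिला"   # human transcript (4 words)
    hypothesis = "मुझे वेतन मिला"       # assumed model output, one word dropped
    wer = word_error_rate(reference, hypothesis)
    print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")
```

Under this assumption, an 85-90% accuracy target would correspond to a WER of roughly 0.10 to 0.15 on a held-out set of manually transcribed recordings.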

What are the barriers that currently exist for you to accomplish your goals for the next year and for the next five years?

Our system depends on the efficiency of the speech to text models we use. The off-the-shelf solutions are not customised for vernacular Hindi. Training our own model requires very high computational resources, which we do not have at our disposal. We are collaborating with the Mozilla Foundation to build the model.

How are you planning to overcome these barriers?

We are working with the Mozilla Foundation, which can provision the computational resources required to train a customised model based on our audio database.

If your solution has a website, provide a link here:

https://gramvaani.org/?p=3191

If your solution proposes an app that is available, provide a link to your app online or in an app store here:

https://play.google.com/store/apps/details?id=org.gramvaani.mobilevaani&hl=en

About your team

Select an option below:

For-Profit

How many people work on your solution team?

The Saajha Manch team comprises 5 full-time staff, 1 contractual staff member and around 25 field volunteers who are paid a monthly stipend.

The Gram Vaani team comprises close to 80 full time employees and 50 volunteers spread across 10 locations in India.

For how many years have you been working on your solution?

3 years

Why are you and your team best-placed to deliver this solution?

Our organisation comprises an eclectic mix of computer engineers, content experts, activists, entrepreneurs, academicians and researchers. We have specialised in communications in low-resource, low-literacy and hard-to-reach contexts for over 10 years, with different partners, in different use cases and in different geographies. All of us are driven by a passion to use technology to make a positive impact on the lives of marginalised people in society. Our organisation and research group have conducted extensive research on collecting policy feedback from marginalised populations and on disseminating vital information to them. Our organisation has also worked on voice-based Q&A programmes for sexual and reproductive health and labour rights. We have also built tools to analyse bias in media reporting on different important issues using techniques for entity extraction, sentiment analysis, summarisation and topic modelling.

With what organizations are you currently partnering, if any? How are you working with them?

We have several funding and implementation partner organisations working with us:

The C&A Foundation: Labour Rights
The Gates Foundation and Jeevika: Maternal and child health and nutrition
BIRAC and the University of Montreal: vaccination
CREA: sexual and reproductive health
SIDBI: Financial Inclusion

Your business model & funding

What is your business model?

Our primary flagship platform is Mobile Vaani, which is a voice based community media platform accessible over basic phone calls. This platform is built and maintained by an in-house team of engineers. Over this platform we run district level community media clubs in rural India. These clubs are primarily driven by community journalists who are very passionate about news reporting and making social impact through the media platform.

Over the same technology platform we also run campaigns for our funding and implementation partners. We run campaigns on maternal and child health and nutrition, labour rights, financial inclusion, sexual and reproductive rights, agriculture, education, women's rights, social entitlements, etc. We have an in-house creative content team which produces audio programmes on the different themes to run on the platforms. We also have an extensive field team present in our catchment geographies to conduct field mobilisation for our partners. We also have researchers who design our programmes and also evaluate the impact.

We plough most of our profit back into technology innovation, such as speech to text and applying AI/ML to our platforms.

What is your path to financial sustainability?

There are several revenue models that we operate in:

Through philanthropic grants from foundations wanting to make an impact in society: The funders recognise our expertise in using technology to bring societal change and fund our ideas.
Through service contracts: several social organisations want to scale their footprint through the use of technology. Given our expertise in using the lowest common denominator voice technology to reach the last miles of society and our extensive presence in rural India, several organisations reach out to us to implement the technology arm of their outreach.
Advertisement: We run several independent media clubs in rural districts of India. These operations are partly supported by advertisement revenue generated from local businesses and people.
We are also experimenting with subscription models where individual users contribute monthly subscriptions for our services.

Partnership potential

Why are you applying to Solve?

While we have good expertise in implementing and deploying voice solutions for low-resource users and geographies, we want to use this grant to apply AI/ML techniques to automate and scale our operations, along with building new voice applications for our users. We require mentorship in building the AI/ML techniques. Along with that, we need resources to transcribe and annotate the few thousand hours of vernacular Hindi audio we have on our platforms. We also need access to the computational resources required to build Deep Neural Network based models.

What types of connections and partnerships would be most catalytic for your solution?

Business model
Technology
Funding and revenue model
Talent or board members
Monitoring and evaluation
Media and speaking opportunities

With what organizations would you like to partner, and how would you like to partner with them?

Google: help with Speech to Text, IR, NLP, AI, ML algorithms and computational resources
Organisations working on language models to help us create a model for Hindi
Organisations working on Indian and international labour laws to help expand our knowledgebase of questions and answers
Organisations working on interface design to help us improve our voice interfaces and Android app

If you would like to apply for the AI Innovations Prize, describe how you and your team will utilize the prize to advance your solution. If you are not already using AI in your solution, explain why it is necessary for your solution to be successful and how you plan to incorporate it.

We are building AI innovations into our Hindi voice-platform for migrant labourers:

We are building a speech to text module for vernacular Hindi based on the few thousand hours of audio we have on our platform. We need resources to transcribe and annotate the audio, and computational resources to train the neural network models behind the module.
We are building clustering and ranking algorithms to build a search engine on an existing in-house repository of questions and answers.
We are building voicebot frameworks which will enable our users to navigate complex information and also serve as a data collection tool.

If you would like to apply for the GM Prize on Community-Driven Innovation, describe how you and your team will utilize the prize to advance your solution.

Our solution is driven by the motive to empower migrant labourers, who are one of the most marginalised and exploited groups of people in a developing country like India. These labourers, often poorly literate, are not aware of labour rights and laws and are therefore not able to fully utilise the benefits that the state guarantees them. Our platform serves as an outlet for the voices of such people and also as a go-to resource for queries on labour issues. We want to scale and enrich our operations by building automated question-answering systems using AI and ML algorithms.

If you would like to apply for the Morgridge Family Foundation Community-Driven Innovation Prize, describe how you and your team will utilize the prize to advance your solution.

Our solution is driven by the motive to empower migrant labourers, who are one of the most marginalised and exploited groups of people in a developing country like India. These labourers, often poorly literate, are not aware of labour rights and laws and are therefore not able to fully utilise the benefits that the state guarantees them. Our platform serves as an outlet for the voices of such people and also as a go-to resource for queries on labour issues. We want to scale and enrich our operations by building automated question-answering systems using AI and ML algorithms.