Muscogee Soundwaves: An Interactive Corpus
- United States
- Nonprofit
The Muscogee tribe is endeavoring to revitalize our language. About 300 first-language speakers remain, and in recent years a growing number of tribal members have become interested in learning the language.
Although linguistic documentation of Muscogee has been developed over decades, educators still have little access to these resources. Books are expensive and not searchable. There are also many hours of recordings that could be valuable for classroom use. Language reclamation cannot happen without the work of our teachers, so we must prioritize supporting our educators and giving them intuitive access to linguistic resources.
The proposed solution is a searchable interface that serves as a repository for Muscogee language recordings. It will use AI-based speech recognition to index the recordings so that they can be searched for key terms and example sentences. This will allow educators and learners to hear words and phrases in their original spoken context.

The technology behind this interface will use state-of-the-art approaches to speech recognition and AI in order to locate spoken terms, tokenize words, and find relevant results. Recordings can be downloaded for use in a classroom setting, increasing students’ exposure to hearing the language.
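One way to picture the indexing step described above is an inverted index built from time-aligned transcripts. The sketch below is only an illustration under stated assumptions: it assumes the speech recognizer can emit (word, start, end) triples for each recording, and the names `Hit`, `tokenize`, `build_index`, `search`, and the `SAMPLE` data (including its timings and recording IDs) are hypothetical, not part of the project's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass

PUNCTUATION = ".,;:!?\"'()"


@dataclass(frozen=True)
class Hit:
    """Where a search term was heard (hypothetical structure)."""
    recording_id: str
    start_sec: float
    end_sec: float


def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer: lowercase, split on whitespace, strip punctuation.
    # A real system would likely need Muscogee-specific normalization.
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]


def build_index(transcripts: dict[str, list[tuple[str, float, float]]]) -> dict[str, list[Hit]]:
    # Map each token to every place it occurs across all recordings.
    index: dict[str, list[Hit]] = defaultdict(list)
    for recording_id, words in transcripts.items():
        for word, start, end in words:
            for token in tokenize(word):
                index[token].append(Hit(recording_id, start, end))
    return index


def search(index: dict[str, list[Hit]], term: str) -> list[Hit]:
    # Single-term lookup; phrase search is left out of this sketch.
    tokens = tokenize(term)
    if not tokens:
        return []
    return list(index.get(tokens[0], []))


# Made-up sample: assumed ASR output with invented timings and IDs.
SAMPLE = {
    "rec_001": [("Hesci", 0.0, 0.6), ("mvto", 0.9, 1.4)],
    "rec_002": [("Mvto.", 3.2, 3.8)],
}
```

With this sample, `search(build_index(SAMPLE), "Mvto")` would return one hit in each recording, and the start and end times could be used to jump playback to the matching moment or to clip a segment for download.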
This solution will help educators and learners in the Muscogee tribe, and by extension its 100,766 members. It will bridge the gap between language education and linguistic documentation by supporting teachers in their language revitalization efforts, and it can be used as a resource in community classes as well as in courses at the tribal college.
This project will be a partnership between tribal leaders, community activists, teachers, elders, and students. We will work with these groups regularly to make sure that the tool is useful and intuitive.
All of the members of the Muskogee Language Foundation are Muscogee Nation citizens. The majority of our team members are located in Oklahoma on the Muscogee reservation. Our Team Lead is Muscogee and, like all of us, is deeply invested in the continuation of our language in order to honor our elders and ancestors. We are able to design our system with input directly from the people who will use it and benefit from it.
- Advance community-driven digital sovereignty initiatives in Indigenous communities, including the ethical use of AI, machine learning, and data technologies.
- 4. Quality Education
- 10. Reduced Inequalities
- Concept
We already have a working Muscogee language speech recognition model with a word error rate (WER) that is comparable to state-of-the-art for low-resource languages. We also have nearly 40 hours of existing high-quality studio recordings.
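For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard calculation; the function name `word_error_rate` is our own for illustration, and the whitespace tokenization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, a four-word reference recognized with one wrong word and one dropped word gives a WER of 2 / 4 = 0.5. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.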
We hope to raise awareness of our community’s needs; network with experts in web development, software, intellectual property, and AI; and seek other funding opportunities and partnerships.
- Legal or Regulatory Matters
- Technology (e.g. software or hardware, web development/design)
Our Team Lead is a Muscogee Nation citizen, an active participant in community language classes, and is in regular conversation with tribal leaders and educators. Although Julia is not co-located with the tribe, she has developed relationships with the community over many years and is able to visit on an annual basis.
The Muscogee Soundwaves project seeks to use state-of-the-art AI technology to make Muscogee language recordings available as an educational resource. This could be a starting point for using AI and natural language processing in a practical way that supports community efforts. In the past, Indigenous data has been harvested in harmful ways. This project can reclaim AI and use it in a way that gives back to the community, and it can serve as a model for how AI-powered tools can be built to support human endeavors.
The goal is for this project to have a positive impact on Muscogee language revitalization. Research shows that hearing a language is crucial for acquiring it. Giving educators access to recordings of natural speech will let them use these recordings in their classrooms, increasing learners’ listening exposure to the language and thereby supporting language acquisition.
We can measure impact by how many educators and learners access the system and by how many hours of Muscogee recordings they listen to.
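As a rough sketch of how these two measures could be computed, the example below assumes the interface logs one playback event per listening session. The `PlaybackEvent` fields and the `summarize_usage` function are hypothetical names chosen for illustration; the actual logging format has not been decided.

```python
from dataclasses import dataclass


@dataclass
class PlaybackEvent:
    """One listening session (assumed log format, not a final design)."""
    user_id: str
    recording_id: str
    seconds_played: float


def summarize_usage(events: list[PlaybackEvent]) -> dict[str, float]:
    # Count distinct users and total listening time across all events.
    unique_users = {event.user_id for event in events}
    total_seconds = sum(event.seconds_played for event in events)
    return {
        "unique_users": len(unique_users),
        "hours_listened": total_seconds / 3600,
    }
```

For instance, three events from two users totaling 7,200 seconds of playback would be summarized as 2 unique users and 2.0 hours listened.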
The underlying technology will be automatic speech recognition (ASR), built through transfer learning: a large multilingual model is fine-tuned on Muscogee language data. This model will be combined with a large database of Muscogee recordings and built into a simple, searchable online interface.
- A new application of an existing technology
- Artificial Intelligence / Machine Learning
- Audiovisual Media
- Software and Mobile Applications
Oklahoma and Florida
1
- Individual consumers or stakeholders (B2C)
