ASSISTments: Quick-Comments Live
- Nonprofit (may include universities)
ASSISTments is a research-proven digital learning platform that has reached over 1 million students in the last 5 years. Students complete problems online from popular open-source curricula, like Illustrative Math, receiving immediate feedback and hints on every auto-scored problem. Immediate feedback is one of the learning-science principles that makes ASSISTments effective for improving student math achievement; however, approximately 40% of the problems within the curricula we support are open-ended questions that cannot be auto-scored by a computer. These questions are critical for learning and for supporting students’ ability to communicate mathematical concepts and explain solutions to math problems, a priority of the Common Core standards.
In order for students to receive feedback on these questions, the teacher must manually review their responses and leave feedback. We know from our research that teachers struggle to find the time to do this. As one student shared, “Sometimes I have to wait days or weeks to get feedback.” Our data tells a story of students often never getting feedback at all: out of 45 million instances of students answering open-ended questions in our system, only 2% have feedback from the teacher.
In 2019, we set out to solve this problem by making it more efficient for teachers to leave feedback on student responses with a feature we called Quick-Comments. Modeled after the concept of Google Smart Reply, this feature provides suggested feedback messages a teacher can send to students with the click of a button. This video shows how this feature currently looks within our product. Teachers have reported that they love the efficiencies this feature creates, but we know that students are still not receiving the immediate feedback they need to grow in their learning.
With the introduction of Generative AI, we quickly saw the potential to bring immediate feedback on open-ended questions directly into the student experience. We created a prototype in ASSISTments that bypassed the teacher and gave students text feedback on their answers and an opportunity to try again. We called this Quick-Comments Live. This video shows this prototype in action.
We are requesting funds to move Quick-Comments Live from a beta to a user-ready feature that can impact 90,000 students.
GPT-4 powers this prototype, but it is expensive to run. Our ultimate goal is for the feature to be powered by a free LLM that we train from an open-source LLM (and call GOAT), allowing us to offer this feature equitably via ASSISTments, which we make free for teachers and students.
We are very proud of the research capacity of our prototype. We built it to deliver feedback to students under random assignment, allowing us to determine the effectiveness of the feedback provided, as sketched below. It was in this way that we determined that the GPT-4 feedback was better than our original teacher-facing suggestions, which used our earlier, non-generative AI.
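As an illustration, here is a minimal sketch of that randomized-delivery design in Python; the condition names and logging are hypothetical, not our production code:

```python
# Minimal sketch of randomized feedback delivery (condition names hypothetical).
# Each student response is randomly assigned a feedback source, so downstream
# outcomes (e.g., success on a revised answer) can be compared across conditions.
import random

FEEDBACK_CONDITIONS = ["gpt4_feedback", "legacy_suggestions"]

def assign_condition(response_id: str) -> str:
    """Randomly pick a feedback condition for one student response."""
    condition = random.choice(FEEDBACK_CONDITIONS)
    # In the real system, the assignment would be logged with the response id
    # so that learning outcomes can later be compared across conditions.
    return condition
```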
ASSISTments is the most proven math learning solution on the market, backed by two large-scale randomized controlled trials. These trials demonstrate significant immediate and long-term effects, with the greatest gains for Black and Latino students and students experiencing poverty. This work presents an exciting opportunity to scale this impact further by adding immediate feedback to more of our problems. Our platform primarily serves teachers in grades 3-8 at Title I schools (~75% of our users), and we are well positioned to achieve even greater scaled impact in these communities through emerging partnerships, including one with the Maryland State Department of Education.
Our approach to product development is deeply grounded in and informed by user research, and we make a concerted effort to ensure diverse representation in the users we interview. From our research with over 80 teachers and students, we identified a clear and compelling need for a feature like Quick-Comments Live. We know students are excited to get feedback and to try again using that feedback. Here are a few quotes from the 7th-grade students who participated in our pilot:
- “I can use this feedback. I like how the AI explained what a rate unit is because I was a little confused at the beginning.”
- “It works well. I think the use of AI in this case could be helpful because if the teacher is busy or you are at home this can help you understand.”
- “Very helpful! It showed me what a unit rate was.”
- “I find this helpful because it reminds me to relook at the question and helps me understand the question better.”
ASSISTments has always centered on equity and the needs of students who continue to be underserved in our traditional school system. For this reason, we committed to making our platform forever free for teachers and students. If generative AI is going to scale equitably and be something available to all via free learning platforms like ASSISTments, we must figure out a way to deliver impactful features like Quick-Comments Live without charging. To put the cost of using an existing popular LLM like ChatGPT for such a feature into perspective, we did the math: It would cost $450,000 to provide live feedback to students on the 45 million responses to open-ended questions we received over the last 5 years.
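(For context, that estimate works out to roughly $450,000 ÷ 45,000,000 ≈ $0.01 per student response, a cost that recurs with every answer and therefore grows linearly as our usage scales.)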
If we are going to reach the communities furthest from opportunity sustainably over time, there needs to be a better way. We will chart that course by developing GOAT, a free LLM that we develop and train using existing open-source LLMs and optimize specifically for this context.
Cristina and Neil Heffernan, the founders of ASSISTments, met while teaching in a Title I school in Baltimore in 1995. They personally experienced the challenges of implementing a formative assessment routine. They saw how time-consuming grading, giving meaningful feedback, and analyzing data trends for a class of 32 students can be. They had the foresight to know that, with the inevitable proliferation of computers in classrooms, technology would be instrumental in making formative assessment scalable. With this origin story, the ethos of valuing teacher voice is core to how The ASSISTments Foundation operates. In the past two years alone, we have engaged 450 teachers in user research, 75% of whom work in Title I schools, to inform 23 significant feature improvements to our product.
Through a unique ongoing partnership between The ASSISTments Foundation and the ASSISTments Project @ WPI, the team behind ASSISTments is uniquely positioned to create breakthroughs in how GenAI can impact student achievement. Dr. Heffernan is the William Smith Dean's Professor of Computer Science and director of the Graduate Program in Learning Sciences & Technologies at Worcester Polytechnic Institute. He is one of the leaders in the field of Artificial Intelligence in Education (AIED), and he and his lab are dedicated to research and development related to LLMs for education. The WPI team includes a research scientist and 6 PhD students in Dr. Heffernan's lab. They have published a dozen journal articles and rigorously reviewed conference papers, and have won or been nominated for Best Paper awards 20 times.
Cristina, who is now the Executive Director of The ASSISTments Foundation, has shepherded the platform's scaling into classrooms in all 50 states and grown the organization to 23 employees. Together, they have a track record of scaling a product while innovating in new areas like GenAI, with a commitment to rigorous research.
- Providing continuous feedback that is more personalized to learners and teachers, while highlighting both strengths and areas for growth based on individual learner profiles
- Encouraging student engagement and boosting their confidence, for example by including playful elements and providing multiple ‘trial and error’ opportunities
- Grades 3-5 - ages 8-11
- Grades 6-8 - ages 11-14
- Prototype
ASSISTments as a digital learning platform is focused on growing, with 90,000 currently active students. Quick-Comments Live as a feature is not yet live in our product. We have a working interface as an MVP, and initial user testing points to areas for improvement in the UI and in the usability of the feedback delivered to students. In addition, to provide this feature for free and bring it to scale, we need to develop the underlying technology to support it.
- Yes
We have users in all 50 states.
Our solution involves fine-tuning open-source LLMs to deliver safer, more accurate feedback while keeping costs down.
Everyone expected OpenAI and Google to have a defensive barrier against competitors (i.e., a "moat" preventing someone from stealing their customers): given that it costs tens of millions of dollars to train an LLM with a few billion parameters on a large slice of the web, no one expected there would be free competitors. That changed when Meta built its own LLM and stunned the world by releasing it for free. No one really knows why Mark Zuckerberg open-sourced the model they call Llama, but as soon as they did, it was used massively.
What shocked the world was that Percy Liang's group at Stanford used Llama to fine-tune a model that had close-to-GPT performance but was 100% free! The paper is here.
What Liang and his team did was devise 175 somewhat arbitrary seed tasks of the kind you might want to ask GPT to do. They then paid OpenAI to simulate data for these tasks, and they used that data to fine-tune Llama (calling the resulting model Alpaca, since an alpaca is close to a llama).
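To make that recipe concrete, here is a minimal sketch of Alpaca-style instruction fine-tuning using the Hugging Face transformers and datasets libraries. The base-model name, data file, prompt template, and hyperparameters are illustrative assumptions, not Liang's exact setup or ours:

```python
# Sketch of the Alpaca recipe: fine-tune an open-source base model on
# instruction/response pairs that were generated by a larger paid model.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder open-source base

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Each line of the file: {"instruction": "...", "response": "..."}
data = load_dataset("json", data_files="generated_pairs.jsonl")["train"]

def to_text(ex):
    # Simple Alpaca-like prompt template
    return {"text": f"### Instruction:\n{ex['instruction']}\n\n"
                    f"### Response:\n{ex['response']}"}

def tokenize(ex):
    return tokenizer(ex["text"], truncation=True, max_length=512)

data = data.map(to_text)
data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alpaca-style-ft",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```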
A month later, the infamous leaked Google memo showed that Google executives were scared of what Liang had done. The memo was called "We Have No Moat: And Neither Does OpenAI."
If someone could spend $600 to turn a free model into something that ChatGPT charged money for, that threatened OpenAI's business. Since then, in addition to Meta releasing Llama 2 and 3, other free LLMs have been released, such as Mistral and Falcon (the one from the United Arab Emirates), and the most recent addition, Microsoft's Phi-3. There is an abundance of free models that individuals can fine-tune for their specific business case.
Every day, new and better models become available (on a site with a funny name, huggingface.co). So, our project will be innovative in its use of open LLMs. This does not mean that we won't sometimes call out to ChatGPT to gather more data, but we are not condemned to pay OpenAI for the rest of our lives.
We call the new model that we are fine-tuning GOAT because a goat is like an alpaca and because WPI's mascot is a goat (it's not because we think our model will be the greatest of all time). Rather, GOAT stands for "Generative Open Access Transformer": it is a generative model that will be made open access, and, of course, it is a transformer.
We are using open LLMs and ChatGPT. Dr. Heffernan and his PhD students have already been training their own model for this task, called GOAT.
We conducted a beta test of the MVP. As part of that work, we surveyed both the teachers and the students. Here are some of the results.
We had students keep logs where they wrote notes on the feedback they received from the AI. Here are the top 4 themes from the 296 messages submitted. From this feedback, we know we are close but have some room for improvement with our LLM and our prompts.
- Feedback is helpful/good (173)
- Feedback is not helpful (46)
- Feedback is confusing (34)
- Feedback lacks specifics or clarity (32)
Students also gave us feedback on the user interface. These were some themes from their feedback:
- Revise the three options
- Fix the box that contains the feedback so that it is fully shown
- Don’t correct students when the answer is correct
- Don’t tell students to do things they have already done
- Provide customized feedback based on how much time students spent on the problem, whether the answer is correct or incorrect, and whether the answer was gibberish
- Provide alternate answers
We also received some great quantitative findings. The tables below show those results (n = 90 students; on the 1-5 scales, 5 means "very much," "very helpful," or "very easy").

| Question (scale 1-5) | 1 | 2 | 3 | 4 | 5 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| How much do you like getting feedback on your answers to open-response questions? | 4 | 8 | 28 | 32 | 18 | 90 |
| Overall, how helpful was the AI feedback for you? | 1 | 22 | 21 | 36 | 10 | 90 |
| How easy was it for you to decide if a feedback message was helpful for you or not? | 1 | 4 | 21 | 34 | 30 | 90 |
| How easy was it for you to decide if a feedback message might be wrong? | 4 | 16 | 26 | 25 | 19 | 90 |

| Question (yes/no) | Yes | No | No preference | Total |
| --- | --- | --- | --- | --- |
| Currently the AI feature allows students to revise once. Would you like more chances to revise your answer? | 69 | 21 | n/a | 90 |
| Would you like to receive feedback on the revised answer? | 68 | 7 | 15 | 90 |
| Would you find it helpful to get a score along with the feedback? | 71 | 7 | 12 | 90 |

| What is the maximum number of times you would like to revise your answer? | Responses |
| --- | --- |
| 1 | 21 |
| 2 | 13 |
| 3 | 41 |
| 4 | 6 |
| 5 or more | 9 |
| Total | 69 |
It is clear that students are very interested in getting feedback to help them learn. They expect it, in fact. With more work, we can get them what they want.
A very different type of evidence is a paper we have submitted here. In this paper, we compare GOAT with GPT-4.
We are very aware that LLMs are trained on the web, so they can inherit the biases of the data they are trained on. In fact, the Heffernans’ son did a study with Adam Kalai, one of the most influential researchers on the topic, who showed how biased these technologies can be in his 2016 paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings". The work conducted by the younger Heffernan during his internship with Kalai in 2019 showed that word embeddings were biased by people's last names (e.g., "Mohamed" was more likely to be associated with terrorism than names like Peter or Steve). Their paper is here.
Kalai has volunteered to serve on the ASSISTments board and is now working on safety and fairness at OpenAI. He has agreed to act as an advisor to our lab and help us interpret and consume the many published findings that are emerging from OpenAI.
While it is well known that using AI to review résumés can be strongly biased by the names of individuals, our proposed work is not similar at all: the student does not include their name in this task. However, even though this task is not likely to generate an unfair result, we have already done some fairness testing. In this paper, we looked at what teacher factors might cause bias in an early version of this system (when teachers were writing their own comments rather than being given feedback from an LLM). We found little evidence of unfairness there, but if we were to be funded, we would continue with our work to check and double-check for fairness issues.
The solution team is a partnership between Worcester Polytechnic Institute (WPI) and The ASSISTments Foundation (TAF). Cristina Heffernan runs TAF, which currently has 23 full-time employees, while Neil Heffernan's lab has 3 programmers, a postdoc, and 6 PhD students doing AI research. For this project, WPI will do the AI work, while TAF will interact with our community of teachers and gather feedback from them and their students.
ASSISTments already has over 90,000 student users, so if we can build a good product that does not require spending heavily at OpenAI, we are in a position to scale it quickly by offering it to our users. Not only is that valuable in itself, it will also allow us to improve our work by fine-tuning our LLM, GOAT. This works because we created a workflow in the product where students give feedback: they can up-vote a good comment or down-vote a bad one, and we can fine-tune GOAT every night on that signal, as sketched below.
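As an illustration of that nightly loop, here is a minimal sketch in Python; the log schema, file names, and prompt format are hypothetical, not our production pipeline:

```python
# Minimal sketch (hypothetical log schema and file names) of the nightly loop:
# keep the feedback messages students up-voted and turn them into the next
# fine-tuning batch for GOAT.
import json
from datetime import date

def build_nightly_dataset(vote_log: str, out_path: str) -> int:
    """Convert up-voted feedback into instruction/response training pairs."""
    kept = 0
    with open(vote_log) as src, open(out_path, "w") as dst:
        for line in src:
            rec = json.loads(line)
            # rec: {"problem": ..., "answer": ..., "feedback": ..., "vote": "up" | "down"}
            if rec["vote"] != "up":
                continue  # down-voted feedback is excluded from training
            pair = {
                "instruction": (f"Problem: {rec['problem']}\n"
                                f"Student answer: {rec['answer']}\n"
                                "Write feedback for this student:"),
                "response": rec["feedback"],
            }
            dst.write(json.dumps(pair) + "\n")
            kept += 1
    return kept

if __name__ == "__main__":
    n = build_nightly_dataset(f"votes-{date.today()}.jsonl", "goat-train.jsonl")
    print(f"{n} up-voted examples queued for tonight's GOAT fine-tune")
```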
By creating our own LLM, we can include Quick-Comments Live in ASSISTments, which is free for teachers, enabling students across the country to access this feedback.
We believe in feedback. Up until GenAI, it was not possible to automate feedback on students' open-response answers. Now we can see that it is possible, and we want to ensure that every student using ASSISTments gets the support they need while practicing math. This financial support will help us get this initiative over the finish line by improving our prompts and building a less expensive alternative to ChatGPT.
- Business model (e.g. product-market fit, strategy & development)
- Human Capital (e.g. sourcing talent, board development)
- Technology (e.g. software or hardware, web development/design)
![Cristina Heffernan](https://d3t35pgnsskh52.cloudfront.net/uploads%2F73869_Cristina+Head+Shot+LOW+Res+copy.jpg)
Executive Director