Assessment Improvement Project
- Nonprofit (may include universities)
Our solution will focus on the systems needed to maximize the efficiency of delivering hands-on performance tasks. We would like to use an open-source LLM to create randomized, NGSS-aligned performance tasks, each with a specific rubric, using a set of materials that students would work with throughout the year. For instance, a model might create a task asking students to examine the relationship between a surface's texture and the force needed to pull a block across it. The student would be given all of the materials, and a teacher would proctor and score the student's progress with an A.I. assistant. A student would complete this task along with a few online questions in order to produce a complete diagnostic and meet federal requirements. To recap, the main solution is the A.I. assistant, which creates the task and rubric with a fine-tuned education-domain LLM (EdLLM).
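As a rough illustration of the kind of task-plus-rubric object the EdLLM would generate, here is a minimal sketch. All class and field names are hypothetical, and the NGSS code shown is only an example; this is not the project's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    skill: str    # e.g. "data collection"
    levels: dict  # score -> descriptor the proctoring teacher selects from

@dataclass
class PerformanceTask:
    ngss_standard: str  # performance expectation code, e.g. "3-PS2-1"
    prompt: str         # the hands-on task presented to the student
    materials: list     # classroom materials used throughout the year
    rubric: list = field(default_factory=list)

task = PerformanceTask(
    ngss_standard="3-PS2-1",
    prompt="Investigate how surface texture changes the force needed to pull a block.",
    materials=["wooden block", "spring scale", "sandpaper", "aluminum foil"],
    rubric=[RubricCriterion(
        skill="data collection",
        levels={2: "records repeated trials", 1: "records one trial", 0: "no data"},
    )],
)
```

Structuring the output this way would let the teacher-facing assistant render the rubric directly during proctoring.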
This approach will allow teachers to seamlessly administer and score the assessment without needing extensive technical knowledge. Inquiry-based problems will be laid out for students to solve in order to assess higher-order thinking skills, even for students with limited background knowledge. Teaching styles will need to adapt to prepare students for this new form of assessment and for a drastically changing workforce. Educator input will be the most essential part of the process, as educators will need to review items throughout the creation and administration of the assessment.
Students will spend more time off the computer, fostering social interaction with teachers and peers. It is important for learners to develop a desire to explore and investigate problems rather than a formulaic approach, especially given the changes in the labor market.
Today’s students are dealing with a number of issues related to mental health, knowledge gaps, and more. One idea missing from the conversation is that students, as human beings, want to learn and solve problems that interest them. Transforming summative assessments to reflect that promotes inquiry-based learning and creates more opportunity to foster human connection in the classroom. Marginalized schools often face significantly more pressure to achieve on the assessment, which constrains teaching methods and resources. This can add to students' feeling that instruction does not reflect their interests and individuality.
Inquiry-based learning translates to significant health and educational benefits, which can create profound opportunities for educators to address the specific needs of their students. Traditional "I do, you do" models do not promote student ownership of learning and fail to capitalize on students' natural curiosity. The shift will promote the skills students need to become more adaptable and successful in a rapidly changing workforce.
The metrics currently used to measure school progress are generally met with disdain by most educators. We need to reset what we measure and how it relates to the ultimate goals of independence and employment for students. Once we acquire this meaningful data, we can effectively target gaps among different groups, as our new goal will be tied to future employment statistics. Acquiring data connected to students' future employment prospects will allow us to make better decisions, especially for the most marginalized groups. This is a foundational issue for education, and successfully implementing this in one state will have profound impacts on the entire education system.
I am a current teacher and have taught for the past 10 years in low-income schools serving Black and Brown communities, which influences everything I do in this space. Stories from individuals, teachers, and students are necessary for me to maintain a guiding principle. The biggest issue I saw as a teacher was the lack of job opportunities for students after graduation. Tying these assessments to labor market needs is a direct inspiration from the stories I would hear.
I would classify myself as an upper-mediocre teacher who excels at personal relationships with students. Communication is critical to my sense of purpose as an educator; it serves as fuel when there is chaos. My experience as a teacher dealing with summative assessments has been abysmal, to say the least. I experienced this firsthand as a new teacher, when I had little guidance because I taught a non-tested subject. Resources are devoted to areas that everyone agrees have little beneficial impact on students. The data that summative assessments collect is supposed to be used to better allocate resources; the end result is usually more pressure on students and teachers to make district administrators look good. This is one of the reasons I believe it is important to collect meaningful data that reflects the times. While we struggle to keep pace with the outside world, we are forced to stay behind and improve results on outdated assessments.
We believe it is crucial to stay connected to the realities of the classroom. I would like to continue co-teaching to maintain that contact. Hopefully most of those involved will also be able to connect with classrooms so we can continually gather feedback and respond accordingly. This will yield significant benefits in our overall approach and in our understanding of the communities we serve.
- Analyzing complex cognitive domains—such as creativity, collaboration, argumentation, inquiry, design, and self-regulation
- Providing continuous feedback that is more personalized to learners and teachers, while highlighting both strengths and areas for growth based on individual learner profiles
- Encouraging student engagement and boosting their confidence, for example by including playful elements and providing multiple ‘trial and error’ opportunities
- Other
- Other
- Grades 3-5 - ages 8-11
- Grades 6-8 - ages 11-14
- Concept
We have developed a guiding rubric of criteria from extensive interviews with experts over the past two years.
A portion of the assessment must be short physical tasks that focus on a student's higher-order thinking skills.
The content should be relatable to the population group.
Educators and students should be participants in every step of the process.
States should always have the option of producing this with a local higher ed institution.
Securing an advocate among the decision-makers is essential.
We also have an extensive network of experts and connections who are ready to help build.
- United States
- No, but we will if selected for this challenge
Our solution leverages state-of-the-art technology with some fine-tuning that allows us to apply it specifically to the education domain. We recognize that we don't have the resources to build new AI models from scratch, but we can use existing open-source models and apply them in a unique fashion to this space. Additionally, we have created a different business approach: rather than operating on large contracts and holding generated IP as proprietary, we aim for a milestone-based approach and want to keep the value generated open source.
Most companies can measure the value they bring by how much money they save the customer. Schools and SEAs do not make more money because an EdTech product is used. This is a fundamental flaw in the system, one that causes extreme inefficiency and the overselling of unnecessary products to school districts. Education is not a free market, and we are not pursuing a capitalist approach to scale this solution.
Instead, our approach is to work directly with the state educational agency and a local higher education institution. SpaceX did something similar by going directly to the decision-maker and building the product through milestone-based payments. This approach is attractive because it allows us to compete with larger companies and requires less navigation of complex bureaucracies. We want to provide the state with all the tools necessary to create its own exams as efficiently as possible. Generative AI makes this possible, and we hope to operate by providing the necessary support for states, working more in a consulting fashion.
This approach could provide a model for other states to pursue similar goals without having to rely on vendors with little incentive to improve their product. The market would drastically shift from an oligopoly to a decentralized system in which educators have the power to assess students in an authentic, financially feasible, and human-centered way.
Our core AI technology is based on large language models (LLMs), which have become a mainstay in many domains over the last couple of years with the emergence of ChatGPT. Recently, many open-source models have been developed that rival the performance of ChatGPT on many benchmarks, including Mistral, Gemma, and most recently Llama 3. These models can be fine-tuned on data from specific domains. We seek to fine-tune a suite of these models on public educational resources, which could consist of worksheets, experimental labs, exams, etc. We can use NLP extraction techniques to comb these resources for relevant data, which can then be used to train an open LLM into an education-domain LLM (EdLLM). We follow a pipeline similar to that of Bulathwela et al. in "Scalable Educational Question Generation with Pre-trained Language Models".

Thereafter, we will use retrieval-augmented generation (RAG) to create assessments personalized to specific groups. Groups like NVIDIA have used this technique to give open-source models a folder of files, which the model can then parse and answer detailed questions about. In a similar manner, we will provide EdLLM with files containing a specific group's (e.g., School A Science) education materials. EdLLM can use RAG to identify crucial pieces of the curriculum and then generate assessments that best match the provided materials.

We can ensure the assessments fit NGSS standards in two ways. First, the standards can be added to the files provided to EdLLM for RAG. Second, precise prompt engineering, when requesting assessments, can yield well-defined assessments that satisfy NGSS standards. We also envision a third approach, in which we fine-tune another LLM specifically on NGSS standards; assessments generated by EdLLM would then be passed to this NGSS LLM, which verifies that the assessment satisfies all criteria.
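The retrieval step of the RAG pipeline described above can be sketched as follows. The word-overlap scorer and the document strings are toy stand-ins for illustration only; a production system would use an embedding model and a vector store rather than keyword matching.

```python
# Toy sketch of RAG retrieval: pick the curriculum documents most relevant
# to a request, then assemble them into the prompt sent to the LLM.

def score(query: str, doc: str) -> int:
    """Count shared words between query and document (toy relevance metric)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list) -> str:
    """Prepend the retrieved curriculum context to the generation request."""
    context = "\n".join(retrieve(query, docs))
    return f"Curriculum context:\n{context}\n\nRequest: {query}"

# Hypothetical School A Science materials (file contents in the real system).
docs = [
    "Unit 4 lab: measuring friction with spring scales and wooden blocks",
    "Unit 7 reading: photosynthesis and plant growth",
    "NGSS 3-PS2-1: plan an investigation of balanced and unbalanced forces",
]
prompt = build_prompt("generate a performance task about friction and forces", docs)
```

Because the friction lab and the forces standard score highest, only they end up in the context, so the generated assessment stays grounded in what the class actually covered.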
Technically, our solution has two key phases. The first phase is fine-tuning on general education materials provided by local districts and the state; the goal is to make the general model more specific to the education domain. This has been done successfully for educational question generation, as demonstrated by Bulathwela et al. We expect to need a specific focus on the Pre-K-8 population, but the paper cited above lays the foundation for what we hope to achieve. The second phase focuses on tailoring the model's generated questions to match the content a specific school's population will be exposed to and to cover the required NGSS standards. This approach was deployed in a healthcare context and shown to significantly reduce the occurrence of hallucinations by Miao et al. in "Integrating Retrieval Augmented Generation with Large Language Models in Nephrology: Advancing Practical Application". They focused on aligning to the KDIGO 2023 guidelines, while we would adhere to NGSS standards for our model.
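The standards-verification pass described above (an LLM checking EdLLM's output against NGSS) can be illustrated with a much simpler stand-in: a gate that accepts a draft assessment only if it references every required NGSS performance expectation. The regex and function names are illustrative assumptions; in the full design this gate would be the fine-tuned NGSS LLM, not a pattern match.

```python
import re

# Matches NGSS-style performance expectation codes such as "3-PS2-1".
NGSS_CODE = re.compile(r"\b\d-[A-Z]{2,4}\d?-\d\b")

def satisfies_standards(assessment: str, required: set) -> bool:
    """Accept the draft only if every required standard code appears in it."""
    found = set(NGSS_CODE.findall(assessment))
    return required <= found

draft = "Performance task (3-PS2-1): measure the force needed to pull a block."
assert satisfies_standards(draft, {"3-PS2-1"})
assert not satisfies_standards(draft, {"3-PS2-1", "4-PS3-1"})
```

Drafts that fail the gate would be sent back to EdLLM for regeneration rather than shown to teachers.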
This is quite a difficult task in today's climate, especially in red states like mine. When I speak to state administrators, many of them have their hands tied: they have to follow the laws and are reactive to complaints that could bring negative attention to their agency. Addressing underrepresented learners is critical, and I believe a long-term strategy must be set up for success.
To combat algorithmic bias, we will take a few steps. First, we want teachers to be active participants in the assessment development and implementation cycle. They have the most proximity to the students, which makes them best positioned to respond to each student's needs. Students should perform the hands-on task with materials they have used throughout the year, which would give districts more autonomy to represent their student population.
Second, we will collect input from teachers and students across various events within the assessment ecosystem, including classroom instruction, formative assessments, and surveys. Working with a publicly funded higher-ed institution within the state, we can start to purposefully collect data that represents all populations, so we can avoid bias and assess the true skills of the learner. We want to test students' critical skills and how those skills affect their future employment. This data collection will hopefully help states allocate resources that provide more opportunity and advancement to all learners.
The scores from the assessments will help us understand how skills affect students' employment prospects. States will have better information to allocate resources where they will have the most impact on the students who need the most help.
We will continually improve our algorithms and processes by consistently engaging with stakeholders from the communities we serve. Auditing our systems will be a part of our internal process and we will be as transparent as possible in order to allow for productive feedback.
Prathic Sundararajan: 714-299-6088
Suraj Rajendran: 224-587-0371
Upon reflection, my previous attempts to convince a state to start a feasibility study were misguided. Even though the Assessment Improvement Project is not trying to be a sales organization, there is a lot we can learn from the enterprise sales process. I did not acknowledge the massive risk many state leaders would be taking to even attempt a new type of assessment. Other vendors often bring 9-10 members of their sales team when pitching to states to bid for a contract. Many questions that states had could only be answered by assessment experts such as psychometricians and engineers. A well-thought-out strategy that includes a technical expert, "the money", and a trusted advisor would pay dividends.
The current strategy for becoming pilot-ready involves the following steps:
Assemble a team of experts: We will bring together our group of educators, psychometricians, engineers, and researchers to build a prototype if we win this competition.
Prototype Build: We will build a prototype of the assessment with the experts. This will allow us to show states what the solution could look like.
Engage with state leaders: We will start the full sales process of conversing with state leaders in a strategic manner. This is an intensive undertaking, and using enterprise sales techniques could yield results. Certain states will make better initial options than others.
IADA application: We will submit an IADA application in order to obtain an exemption from current assessments. Taking both assessments simultaneously would be burdensome and unpopular, which could handicap the project.
Build and Scale: We will begin to build and scale over a 5 year period throughout the state.
This would be a state-funded enterprise after we prove viability and get the assessment off the ground. SEAs and LEAs will not have to find any new money to produce this assessment. Summative assessments are incredibly expensive, but our goal is to work within the current state budget and maximize all funds dedicated to this specific assessment. We plan to sell a service and license certain products, but the rest can and should be done within the state education infrastructure. We will operate as a non-profit and would hopefully be phased out after enough states become self-sufficient. Generative AI capabilities make it much easier for states to become less reliant on outside vendors, who often complicate matters and offer limited solutions for SEAs.
As a teacher, I often see the most important facets of education overlooked due to the pressure of high-stakes testing. The pandemic exposed cracks in the education system that students have long known about. School for many kids is a burden, and in my experience they are often not wrong. I want students to be able to find intrinsic value in the thirteen years they spend in school. High-stakes assessments are one of the foundational cracks in the system, and their tentacles spread through many areas. We must change this, and I would like to be a part of the solution.
The largest barrier I faced when pitching this idea was getting a state partner to start the process of creating a new large-scale summative assessment. Having the backing of Solve and the Gates Foundation simplifies the partnership process in a number of ways. Pursuing these partnerships without that backing created a chicken-and-egg situation. I need a champion who can be in the final meetings with me to bring validity to the proposal. In retrospect, I should have placed more emphasis on learning enterprise sales techniques.
- Monitoring & Evaluation (e.g. collecting/using data, measuring impact)
- Public Relations (e.g. branding/marketing strategy, social and global media)