Multimodal Inner State Estimation based on Deep Learning
Over 70% of Americans regularly experience stress. Chronic stress is associated with cancer, cardiovascular disease, depression, and diabetes, and is therefore deeply detrimental to physical health and psychological wellbeing. According to the British Health and Safety Executive (HSE), stress accounted for 37% of all work-related ill health cases in 2015/16. These severe effects of stress call for automated detection methods.
Emotions are valenced responses to relevant stimuli that are directed toward specific targets (e.g., people, objects, or events), differentiated, and relatively short-lasting (Ekman 1993, Frijda 1994). Emotional expressions include not only facial expressions, the traditional focus of research on emotional expression, but also vocal/acoustic, bodily/postural, verbal/textual, and symbolic/pictorial expressions (e.g., emoticons or emojis).
Modeling stress and emotion detection therefore enables the creation of agents with more human-like behavior, increasing the validity of the resulting simulation. This is why we are working on Multimodal Inner State Estimation based on Deep Learning.
To make the assessment of human emotions and stress easier, we combine four different models into a single system.
First modality: Text. The text (speech transcript) modality is implemented with a pre-trained embedding matrix and a deep learning model for inference and improved accuracy.
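A minimal sketch of how a pre-trained embedding matrix can feed a deep learning text classifier; the vocabulary size, sequence length, and label count below are illustrative assumptions, not our final configuration.

```python
import numpy as np
from tensorflow.keras import layers, models, initializers

# Illustrative sizes only; the real vocabulary and labels come from the training data.
VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 10000, 100, 50, 7

# Placeholder for pre-trained word vectors (e.g., GloVe) aligned to our vocabulary.
embedding_matrix = np.random.rand(VOCAB_SIZE, EMBED_DIM).astype("float32")

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),                    # frozen pre-trained embedding matrix
    layers.Bidirectional(layers.LSTM(64)),                # encode the token sequence
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),      # one emotion class per utterance
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```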
Second modality: Audio. Audio features are extracted with Python packages (e.g., Librosa), and a sequential neural network is built on top of them for classification.
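A hedged sketch of this pipeline: MFCC features extracted with Librosa, summarised per clip, and passed to a small Keras Sequential network; the feature count and label set are assumptions for illustration.

```python
import librosa
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # assumed emotion label count

def extract_features(path, n_mfcc=40):
    """Load an audio clip and summarise it as a fixed-length MFCC vector."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time -> (n_mfcc,) vector

model = models.Sequential([
    layers.Input(shape=(40,)),                        # one MFCC summary vector per clip
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```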
Third modality: Images/Video. We plan to use ResNet for image feature extraction, followed by a dedicated classification architecture for better performance.
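A minimal sketch of ResNet-based frame feature extraction, assuming a pre-trained ResNet50 with its classification head removed; the classifier stacked on top is an illustrative placeholder rather than our final architecture.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # assumed emotion label count

# Pre-trained ResNet50 as a frozen feature extractor: one 2048-d vector per frame.
backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                    input_shape=(224, 224, 3))
backbone.trainable = False

model = models.Sequential([
    backbone,
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # emotion prediction from visual features
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Video frames (or face crops) are resized to 224x224 and normalised with
# `preprocess_input` before training or inference.
frames = preprocess_input(np.random.rand(4, 224, 224, 3) * 255.0)  # placeholder batch
print(model.predict(frames).shape)  # -> (4, NUM_CLASSES)
```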
Fourth modality: Physiological data. Physiological signals complement the other modalities to improve the performance of the combined model.
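Since the physiological pipeline is not yet fixed, the following is only a hypothetical sketch: a 1D convolutional network over windowed sensor signals (e.g., heart rate, skin conductance) predicting stress vs. no stress; the window length and channel count are assumptions.

```python
from tensorflow.keras import layers, models

WINDOW_LEN, NUM_CHANNELS = 256, 4   # assumed: samples per window, number of sensor channels

# Binary stress / no-stress classifier over raw physiological windows.
model = models.Sequential([
    layers.Input(shape=(WINDOW_LEN, NUM_CHANNELS)),
    layers.Conv1D(32, kernel_size=7, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability of stress
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```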
Modal fusion: We plan to compare early and late fusion and adopt whichever method performs better.
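A hedged illustration of the two fusion strategies to be compared: early fusion concatenates per-modality feature vectors before a joint classifier, while late fusion averages the per-modality predictions; all dimensions and names are placeholders.

```python
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 7
TEXT_DIM, AUDIO_DIM, IMAGE_DIM, PHYSIO_DIM = 128, 40, 2048, 32  # assumed feature sizes

# --- Early fusion: concatenate modality features, then classify jointly. ---
text_in = layers.Input(shape=(TEXT_DIM,))
audio_in = layers.Input(shape=(AUDIO_DIM,))
image_in = layers.Input(shape=(IMAGE_DIM,))
physio_in = layers.Input(shape=(PHYSIO_DIM,))
fused = layers.Concatenate()([text_in, audio_in, image_in, physio_in])
hidden = layers.Dense(128, activation="relu")(fused)
early_out = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)
early_fusion = models.Model([text_in, audio_in, image_in, physio_in], early_out)

# --- Late fusion: average the softmax outputs of already-trained unimodal models. ---
def late_fusion(per_modality_probs):
    """per_modality_probs: list of (batch, NUM_CLASSES) arrays, one per modality."""
    return np.mean(np.stack(per_modality_probs, axis=0), axis=0)
```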
Our product is a business-to-business solution that helps health professionals assess and confirm the stress levels and emotional state of their patients or clients through the use of technology.
Few multimodal, multi-party emotion detection models on the market serve the millions who need them, especially the poor. Most existing multimodal models use datasets involving conversations between only one or two people. Our solution is trained on multi-party conversations from a television series, as released in the Multimodal EmotionLines Dataset (MELD). In addition, the model includes a stress/no-stress detection component, combining four models in one for better performance.
Approving our application would bring this idea into practice and make it available to a large population easily and at a lower cost. We plan to deploy our technology through mobile applications, web applications, and cloud services such as AWS, and to train medical and health professionals in how to use it.
- Enable continuity of care, particularly around primary health, complex or chronic diseases, and mental health and well-being.
- France
- Concept: An idea for building a product, service, or business model that is being explored for implementation
The product is at the development stage, and the enterprise is a startup.
As recent graduates and a startup enterprise, we are applying to become a Solver to overcome market barriers. We believe that advancing our solution by becoming an MIT Solver, with access to expert mentorship, would help us compete and raise awareness of our project.
- Business Model (e.g. product-market fit, strategy & development)
- Financial (e.g. accounting practices, pitching to investors)
- Human Capital (e.g. sourcing talent, board development)
- Monitoring & Evaluation (e.g. collecting/using data, measuring impact)
- Product / Service Distribution (e.g. delivery, logistics, expanding client base)
- Public Relations (e.g. branding/marketing strategy, social and global media)
- Technology (e.g. software or hardware, web development/design)
What makes our solution different is the inclusion of physiological signals in the model and the use of multi-party conversation data, which allows predictions even when many people are involved.
We plan to impact lives through the use of our models. We will continue improving the product after release and keep working on further solutions to ease stress for the world's population.
- 3. Good Health and Well-being
As a start-up company, we plan to partner with established organizations such as MIT, and to use feedback platforms to evaluate the impact of our models after release.
We expect our product to have both short- and long-term impact. First, we are applying our knowledge to deliver a reliable and affordable product that is useful to rich and poor alike. Second, the model differs from related products in its dataset and in the number of fused models it combines.
This solution will have a great impact because, as noted earlier, we are targeting B2B for the first release and plan to make the product available through mobile apps, web apps, and online services so that it is easy to use for rich and poor alike. The model is built with Python programs and libraries as a general-purpose solution.
- A new application of an existing technology
- Artificial Intelligence / Machine Learning
- Behavioral Technology
- Biotechnology / Bioengineering
- France
- Egypt, Arab Rep.
- Finland
- Germany
- Nigeria
- Poland
- Saudi Arabia
- Hybrid of for-profit and nonprofit
Our approach to making the solution diverse is to explore datasets covering other regions, cultures, and skin tones in the near future and to make the product widely available.

Team member