Standard Biodata
- United States
- Other, including part of a larger organization (please explain below)
Not incorporated for now; will be a Delaware C Corp
High-quality biological data has always been a cornerstone of biological research. As computational methods increasingly demonstrate their potential in areas such as drug discovery and the selection of clinical trial participants, the need for robust data becomes even more apparent. Just three years ago, constructing an LLM was a feat few could achieve. Today, the principal constraint in harnessing the full power of computational technology is no longer building models but the data behind them. To truly advance healthcare, what is essential is the availability of high-quality, diverse, and longitudinal data.
It has become clear that what ultimately matters is not the sheer number of samples but the number of high-quality, longitudinal samples that pair phenotypic with genomic data, drawn from a more diverse set of patients. We therefore collect a holistic dataset on each patient to build a full picture of the individual and derive important insights.
Key Problems in Current Data Collection Practices and Sources
Inadequate Representation of Global Diversity: The majority of medical data originates from populations that do not represent the global diversity in genetics, environments, and lifestyles. This lack of representation leads to significant disparities in health outcomes, as drugs and treatments developed using these datasets may not be effective or could have unintended effects on underrepresented populations.
Structural and Usability Issues with Data: Data from electronic health records (EHRs) and other sources are often poorly structured and incompatible across different systems, posing significant challenges for effective aggregation and analysis. Additionally, these datasets frequently lack comprehensive genomic data and suffer from inconsistent formatting, which complicates efforts to conduct broad, inclusive research.
Bias and Limitations in Research Outputs: The predominance of data from specific demographic groups introduces bias into research findings, potentially skewing clinical guidelines and treatment protocols meant to be universally applicable. This bias limits the understanding of unique genetic variants and conditions prevalent in underrepresented groups, further exacerbating health inequities.
We're establishing a new standard for medical data collection that can be replicated across geographies, with a focus on diverse and underrepresented populations. Our approach features a dynamic, self-updating real-world evidence (RWE) aggregator, which continually incorporates new information on existing patients. This allows researchers to access additional data on patients of interest, a significant improvement over the static datasets offered by existing biobanks and other data providers. In addition, pharmaceutical companies, model trainers, and other stakeholders can request additional information and testing for patients of interest. As real-world evidence gains acceptance in clinical trials, there is an increasing demand for datasets that can be continuously updated throughout the course of these trials.
We create a key feedback loop in which patients can submit requests for additional tests (such as blood tests or MRI) through our app with a doctor's note. We cover the cost of those tests and, in return, gain access to the patient's historical EHR and the respective test results. This feedback loop helps us enrich our data with patients who have unique health conditions, in addition to collecting data on fully healthy individuals, who serve as a control group.
Unlike existing biobanks, we build long-lasting, symbiotic relationships with our patients, allowing us to collect dynamic and updated data.
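To make the idea of a "self-updating" record concrete, here is a minimal sketch of how new results could be appended to an existing patient's history rather than replacing it. The class names, fields, and sample values (`Observation`, `PatientRecord`, `P-001`) are hypothetical and chosen for illustration only; they do not describe our production schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One dated measurement for a patient (hypothetical schema)."""
    taken_on: date
    test_name: str
    value: float
    unit: str


@dataclass
class PatientRecord:
    """Append-only longitudinal record; all names are illustrative."""
    patient_id: str
    observations: list[Observation] = field(default_factory=list)

    def add(self, obs: Observation) -> None:
        # New results are appended, never overwritten, so history is preserved.
        self.observations.append(obs)

    def history(self, test_name: str) -> list[Observation]:
        # Return every result for one test, ordered by collection date.
        matches = [o for o in self.observations if o.test_name == test_name]
        return sorted(matches, key=lambda o: o.taken_on)


# Made-up example values; results may arrive out of chronological order.
record = PatientRecord("P-001")
record.add(Observation(date(2024, 6, 1), "hemoglobin", 13.9, "g/dL"))
record.add(Observation(date(2024, 1, 15), "hemoglobin", 13.2, "g/dL"))
timeline = record.history("hemoglobin")
```

Keeping the record append-only is what separates a longitudinal dataset from a static snapshot: a follow-up test requested by a researcher or patient becomes one more dated entry on the same patient.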
To increase participation rates and reduce "healthy bias", we fairly compensate each participant for their time. We share abnormal test results with patients to benefit the respective community and further incentivize trial participation. We ensure that collaborating with us offers genuine benefits to patients, which serves as the most rewarding incentive.
Target Population
Our solution is designed to serve two main groups that are currently underserved in the medical research domain:
1. Patients from Historically Underrepresented Groups: These include individuals from diverse ethnic backgrounds, geographical regions that are typically overlooked in global health datasets, and socioeconomically disadvantaged communities. Current medical research often fails to include sufficient diversity, leading to a lack of effective treatments and medical understanding for these populations.
2. Researchers and Healthcare Providers: Professionals in the medical field who struggle with the limitations of current data repositories, which are static, non-diverse, and poorly structured, making it difficult to conduct comprehensive and inclusive research.
Current Challenges Faced by the Target Population
1. Patients: Those from underrepresented groups often receive medical treatments that are less effective because they are based on data predominantly collected from non-diverse populations. This can lead to higher rates of misdiagnosis, ineffective treatment options, and greater health disparities.
2. Researchers and Healthcare Providers: They face significant obstacles in developing treatments that are broadly effective across different populations due to the lack of diverse genetic and health data. This limitation hinders progress in personalized medicine and equitable healthcare.
Impact of the Solution
1. For Patients: By integrating a more diverse range of data into medical research, our solution ensures that future medical treatments and drugs are effective for a broader demographic, reducing health disparities. Patients from traditionally underrepresented communities will see improved healthcare outcomes as treatments become more tailored to their specific health profiles. Additionally, by engaging directly with our system, patients gain access to more personalized healthcare management and opportunities for targeted treatments.
2. For Researchers and Healthcare Providers: Our solution provides access to a rich, dynamic database that includes longitudinal health data across diverse populations. This comprehensive data allows researchers to conduct more valid and inclusive studies, ultimately leading to breakthroughs in understanding and treating diseases that affect diverse populations. Healthcare providers can utilize this data to make more informed decisions, leading to better patient care across varied demographic groups.
Sofia's Background: Originating from Eastern Europe and having lived globally, Sofia understands the healthcare challenges in post-Soviet regions and other underrepresented communities. Her experiences shape our strategies, making sure our solutions are culturally sensitive and tailored to these areas.
JZ's Background: Coming from an immigrant family, JZ brings a personal understanding of the diverse needs of underrepresented populations worldwide. His commitment is to ensure our solutions address global disparities in healthcare access and effectiveness.
Our professional backgrounds
Justin architects and leads the technical development of the data platform and handles related technical work. He founded a biometric encryption startup out of high school and was later the founder, CEO, and CTO of a crypto gaming company, which was acquihired. Sofia manages biological data collection, storage, and future expansion to additional biological species, along with any other “bio” tasks. Sofia served as a Principal Investigator on a National Science Foundation grant at age 22 and was the founder of a company working at the intersection of genomics and viruses for over four years. She is intimately familiar with running a biotech venture and managing a research team.
How we work with the communities
Direct Engagement: We actively involve the communities we serve in the design process. By incorporating their feedback and suggestions, we ensure our solution is relevant and effectively meets their needs.
Feedback-Driven Development: Our project development is continuously guided by input from these communities. This approach helps us adjust our methods and technologies to better align with the users' expectations and cultural nuances.
- Ensure health-related data is collected ethically and effectively, and that AI and other insights are accurate, targeted, and actionable.
- 3. Good Health and Well-Being
- 10. Reduced Inequalities
- 17. Partnerships for the Goals
- Prototype
We’ve created the collection procedure for our initial pilot country and have a partnership agreement with an on-the-ground lab to begin collection. In addition, we’ve built out the app and storage infrastructure to support the collection.
We are applying to Solve because we believe it offers a unique platform that aligns perfectly with our mission to revolutionize medical data collection. Solve’s network of partners, mentors, and technologists could provide the critical support we need to overcome the specific barriers we face.
What specific barriers do you hope Solve can help you overcome, and how?
Technical and Market Access: As we introduce a novel approach in medical data collection, gaining traction and acceptance from stakeholders—including researchers, pharmaceutical companies, and patients—is challenging. We believe Solve’s ecosystem can help us establish credibility, facilitate key introductions, and provide market access.
Legal and Regulatory Guidance: Navigating the complex landscape of medical data compliance across different regions, especially regarding patient consent and data usage across the U.S. and E.U., presents a significant hurdle. We anticipate that legal experts within the Solve community could provide invaluable guidance, ensuring our operations are compliant and scalable.
Cultural Barriers: To truly standardize our solution globally, understanding and integrating into diverse cultural contexts is crucial. Solve’s global community offers a perspective-rich environment where we can gain insights into local healthcare ecosystems, which is essential for our model to be effectively adapted and implemented worldwide.
Financial Efficiency: We hope to leverage Solve’s resources to explore innovative funding models and partnerships that support our cost-effective approach.
- Human Capital (e.g. sourcing talent, board development)
- Legal or Regulatory Matters
- Product / Service Distribution (e.g. delivery, logistics, expanding client base)
- Public Relations (e.g. branding/marketing strategy, social and global media)
We believe that the need for “good” healthcare data will be 10 times what it is now in less than five years. “Good” means not just genomic or just phenotypic data: it is a combination of genomic and phenotypic data over time in populations with diverse genetics and environmental pressures. Mining already overmined EHRs in high-income countries (mostly the U.S.) is not the way to meet this demand.
Competitors
Private Initiatives
IQVIA (an S&P 500 company) is the largest player in this field, with its main focus being clinical trials. TriNetX is an emerging player that has some contracts but not significant revenue for this field (~25M for the last year). Many smaller companies focus on mining the already overmined U.S. healthcare records, and the data they collected has not yielded the expected insights.
There are several successful companies that focus on data collection in the cancer field specifically (such as Flatiron, Tempus, and Aster Insights). Yet there is a clear move away from cancer towards other diseases, especially immunological ones for big pharma and aging for small biotechs and other private initiatives. Another comparable initiative was 54gene, a company that raised significant funding (over $50M) to study genetic diversity in Africa but shut down for reasons unrelated to its business model (disputes between founders and investors and internal company issues). deCODE genetics was purchased by Amgen for a little under half a billion dollars. We are likely to exit within five years through an acquisition by big pharma or by a company that uses medical data for model training, such as IQVIA, Google, Microsoft, or the like.
Public Biobanks
There are several existing biobanks, such as the UK Biobank and All of Us. These biobanks are not diverse enough to create training data for robust models (as many of them were created as non-profits for their home country). In addition, they were not built for the purpose of model training and require additional annotation of the data. They also do not allow recontact of the patients in the biobank, limiting the potential for additional targeted testing and clinical trial selection. In other words, we have a live, self-updating RWE aggregator in which new information keeps arriving for “old” patients and additional information can be requested for patients of interest, versus the static datasets provided by biobanks and other data providers. Finally, many of these large biobanks are housed in high-income countries, making extensive testing prohibitively expensive.
Activities
a. Dynamic Data Collection: Implementing a system to continuously update and expand our database with real-time health information from a diverse set of populations.
b. Community Engagement: Working directly with underrepresented communities to ensure their data is included, using culturally appropriate outreach and technology interfaces.
c. Feedback Loop: Establishing a mechanism where patients can update their information and request additional tests, integrating their input into the data collection process.
Immediate Outputs
a. Enhanced Data Sets: The activities lead to the creation of comprehensive, dynamic, and diverse datasets that include up-to-date health information from a wide range of ethnic and geographic backgrounds.
b. Increased Participation: Improved engagement from communities that have historically been underrepresented in medical research, leading to more inclusive data.
Short-Term Outcomes
a. Research Utilization: Researchers begin to use the enhanced datasets, which now reflect a broader demographic spectrum, in their studies and trials.
b. Improved Research Quality: The quality and applicability of medical research improve as studies incorporate more representative data, leading to findings that are valid across diverse populations.
Long-Term Outcomes
a. Better Health Outcomes: Treatments and drugs developed using these datasets are more effective across different populations, reducing health disparities and improving global health equity.
b. Policy Influence: Policymakers use insights from more diverse and accurate data to create better health policies, further supporting public health improvements.
Evidence Supporting the Links
a. Research on Diversity in Medical Data: Studies show that increasing the diversity of data in medical research enhances the effectiveness and safety of treatments across different populations (source: peer-reviewed journals, health policy studies).
b. Feedback from Target Communities: Interviews and surveys with patients from underrepresented communities indicate a high willingness to engage with a system that respects their input and provides health benefits (source: community feedback reports).
c. Pilot Program Results: Initial tests of the dynamic data system show significant improvements in data accuracy and researcher satisfaction (source: internal evaluation reports).
Impact Goals
Increase the Diversity of Health Data: Our primary goal is to enhance the diversity in health datasets, ensuring that medical research reflects the global population's genetic, environmental, and lifestyle diversity.
Improve the Efficacy of Medical Treatments Across Populations: By providing more representative data, we aim to help develop treatments that are effective for diverse populations, thereby reducing health disparities.
Enhance the Responsiveness of Healthcare Research: Facilitate quicker and more accurate responses in medical research through real-time, dynamic data updates.
Measuring Progress Towards Goals
1. Percentage Increase in Data Diversity: Measuring the ethnic, geographic, and socioeconomic diversity of the dataset annually to ensure broad representation. This is compared against baseline data collected at the start of the implementation.
2. Number of Research Studies Using Our Data: Tracking the number of external research entities that utilize our datasets annually, which indicates the relevance and utility of the data provided.
3. Improvement in Treatment Outcomes: Collaborating with healthcare providers to track the efficacy of treatments developed using our data. This involves pre- and post-implementation studies to measure changes in treatment success rates across diverse populations.
4. Patient Engagement Metrics: Monitoring the number of active patient interactions with our platform, such as data updates, test requests, and feedback submissions, to assess engagement and the effectiveness of our community outreach strategies.
5. Feedback from Researchers and Healthcare Professionals: Conducting annual surveys to gauge satisfaction with the data's quality, accessibility, and impact on research and treatment development.
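As an illustration of the first metric, the sketch below shows one way a change in dataset diversity could be quantified against a baseline. It assumes the Shannon index as the diversity measure, which is our choice for this example rather than a metric fixed above, and the group labels and counts are made up.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon index H = -sum(p * ln p) over the share p of each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def percent_change(baseline, current):
    """Relative change from baseline to current, in percent."""
    if baseline == 0:
        raise ValueError("baseline diversity is zero; percent change is undefined")
    return 100.0 * (current - baseline) / baseline


# Hypothetical group labels at the start of implementation and one year later.
baseline_labels = ["group_a"] * 90 + ["group_b"] * 10
current_labels = ["group_a"] * 50 + ["group_b"] * 50

h_baseline = shannon_diversity(baseline_labels)
h_current = shannon_diversity(current_labels)
change = percent_change(h_baseline, h_current)
print(f"Shannon index: {h_baseline:.3f} -> {h_current:.3f} ({change:+.1f}%)")
```

The same function can be applied separately to ethnic, geographic, and socioeconomic labels, giving one figure per dimension to compare with the baseline each year.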
Alignment with UN Sustainable Development Goals (SDGs)
1. SDG 3 (Good Health and Well-Being): Target 3.8 on achieving universal health coverage, including access to quality essential healthcare services. Indicators for this could include the number of healthcare services developed using our datasets that are adopted in underserved regions.
2. SDG 10 (Reduced Inequalities): Target 10.2 to empower and promote the social, economic, and political inclusion of all. Indicators for this might involve measuring the inclusivity of our data collection practices and the reduction in health outcome disparities due to more informed, diverse-based research.
- Standard clinical lab procedures (with lab partners) for data collection and analysis (such as whole blood tests and MRIs)
- −80 °C storage and standard research lab procedures to ensure that we can store samples for decades
- Cloud Computing for sample storage
- AI to organize data arriving in different formats
- App with hospital/clinic and patient interfaces
- AI for data analytics and deriving insights
- Worldcoin for patient authentication in remote locations where government-issued IDs are unavailable or untrustworthy
- A new application of an existing technology
- Artificial Intelligence / Machine Learning
- Big Data
- Biotechnology / Bioengineering
- Crowd Sourced Service / Social Networks
- Software and Mobile Applications
- United States
- Ghana
- Kyrgyz Republic
Justin Zheng and Sofia Sigal-Passeck, both full-time + a couple of contractors for website design, etc. + clinical lab partners for data collection and analysis
3 months, full-time and have already achieved a lot of traction
Our company is built on the principle that a diverse team leads to better solutions. Here’s how we maintain this standard:
Diverse Leadership: Sofia and JZ, who lead our team, come from varied cultural backgrounds which inform our strategic decisions and workplace policies. Their perspectives are crucial in shaping a broad-minded company culture.
Hiring Practices: We focus on hiring talent from diverse backgrounds. Our job postings are crafted to be inclusive, encouraging applications from individuals with diverse experiences and skills, rather than just traditional credentials.
Workplace Environment: We’re committed to maintaining a respectful and supportive work environment. This includes clear policies on respect and inclusion, regular training sessions on these topics, and a feedback mechanism that ensures all voices are heard and acted upon.
By integrating these practices, we ensure our team not only reflects global diversity but also works effectively to tackle global challenges in healthcare data.
We will sell data to pharmaceutical companies for drug discovery, clinical trial recruitment, and AI model training. In the future, we will look to use our proprietary data ourselves, including through spin-outs. For example, we could develop our own therapeutics or train our own models, as well as identify disease targets and data insights to sell to pharma. Further opportunities include decentralized clinical trials and digital twins, as well as working with a broader set of customers, such as model training companies and the U.S. government.
- Organizations (B2B)
There is a clear willingness to pay for this data. Amgen purchased deCODE genetics (an Icelandic biobank) for $415M. GlaxoSmithKline invested $300M in 23andMe for exclusive access to its genetic data (and 23andMe doesn’t have any phenotypic data, which is now considered essential to draw insights from genomic data). GSK also paid Tempus $70M for three-year access to its cancer data portfolio. All of this happened before the “ChatGPT Revolution” and the rapid change in the pharmaceutical ecosystem, in which data is becoming increasingly valuable, particularly data that combines genomic and phenotypic information.