The Light Collective Data Trust
Online patient communities have proven both the value of participation in care and the vulnerabilities of digital health products. The digital footprints created by online patient groups are vulnerable to data breaches, misinformation, and outright exploitation. Our collective data on social media lies fallow, even though our communities see many ways it could improve health outcomes if we could develop collaborations with trustworthy partners. Moreover, our patient communities have no access to the data they produce.
The Light Collective is piloting a civic data trust for genetic mutation groups on social media, together with patients, clinicians, coders, and experts in governance and cybersecurity.
When patient communities have rights to their collective data via civic trust, we accelerate innovation and adoption of clinical interventions and technologies to help people with rare diseases get the right care and get better care.
Rare disease patients who fall through the cracks of the healthcare system increasingly turn to their peers on social media for knowledge, support, and ways to innovate in healthcare. Emerging rare patient groups on social media lack governance structures that lead to fair partnerships and ensure community interests are represented as digital health technologies emerge.
Consider this study reporting the power of Facebook user posts to predict risk of 21 different health conditions.
Predictive algorithms can be useful tools for clinicians and patients to make informed medical decisions. With good data governance, predictive algorithms can help patients with rare diseases get the right care faster and more accurately. With poor governance, predictive algorithms can be biased, inaccurate, and dangerous when making predictions about a group without their knowledge or consent.
Currently, patient groups on social media have no real access or rights to collectively steward the highly sensitive data they produce.
At the same time, patient peer support groups on social media create massive digital footprints, making them easy targets for medical misinformation, discrimination, and exploitation.
We posit that when rare disease support groups are equipped to be stewards of our collective data, we unlock new collaborations and technologies to change the course of our disease.
Digital rights for patients can be described in four words: No aggregation without representation. We are developing a new Civic Data Trust model for peer support groups to self-govern the data that affects their lives and their futures.
Our solution is to work with the early adopters of new technologies starting with fellow rare mutation carriers on social media. While civic trusts are not new, we are applying our model to communities on social media in a way that has never been done.
To help people with rare diseases get the right care faster, we are piloting a civic trust to represent the digital rights of peer support groups on social media. We posit that when patient populations are equipped to self-govern their collective data via a civic data trust, we lay the groundwork for more accurate, unbiased, and fairly implemented predictive algorithms whenever such algorithms are built upon our data to assess disease risk.
The leading submitters of this application are rare mutation carriers working to protect our communities online. We formed The Light Collective to serve the leaders, organizers, and members of social media patient communities for carriers of rare hereditary cancer mutations (BRCA, PTEN, PALB2, Lynch, etc.).
We started with BRCA because of the work we've done with this community to build capacity in data governance since uncovering security and privacy problems on Facebook in 2018. Within the BRCA community, we call ourselves “BRCActivists” because we believe in the power of our collective data to bring about change and improve health outcomes. This includes women, men, people of color, and the LGBTQ community.
Our Understanding of BRCA and Other Hereditary Mutations Has Shifted in the Last Decade
While BRCA is considered rare, the use cases in our community apply to many other groups of genetic mutation carriers. We are learning more about the size and global impact of BRCA and other rare cancer mutations, as well as rare diseases in general.
From global projects like BRCA Challenge, we're learning more about rare hereditary cancer genetic variants every day. It has been reported that the population frequency of pathogenic BRCA1/2 mutations is 1:400, which would be about 19 million people globally (1 in 400 of a world population of roughly 7.8 billion). But there's a lot we don't know, too. Even though BRCA is the most studied gene in the human genome, a 2018 study from Geisinger's MyCode initiative (n = 50,726) showed that 8 out of 10 people who carry a BRCA mutation are unaware that they have it, despite frequently engaging with the healthcare system. And that's just BRCA1/2. There are a host of other genes associated with hereditary cancers, and even more associated with other diseases. Most people who carry these mutations have no idea they do, and thus cannot take the steps needed to prevent illness and sometimes death.
Black women are 40% more likely to die of breast cancer than white women for a range of reasons, including the lack of identified mutations and access to genetic services. We're just at the tip of the iceberg when it comes to understanding how communities of genetic mutation carriers adopt new clinical interventions to make informed decisions based on genetic data. More needs to be done to get the right information to the right people about their genetic risk, and to integrate and broaden use of trustworthy technologies. A key here is defining what's trustworthy and providing patient communities with the tools to self-organize and self-govern their data.
Patient Communities Are Trapped on Social Media
Outside of the walls of clinical and research institutions, the data-sharing landscape where patients engage with peers is fraught with privacy problems. In 2018, our team uncovered major health data leaks in Facebook Groups which led to a congressional inquiry. Current social media platforms where hereditary cancer peer support groups reside are designed for monetization of data through marketing, not for improving health outcomes. Online patient forum alternatives are not adequate to counter the network effects and outsize power of incumbent and emerging venture-backed social media platforms.
Some of our groups wish to leave the social media platforms where we reside. But where do we go? Simply leaving Facebook and Twitter is not an option when these groups are at capacity delivering their core mission of offering social support. As the mother of a son with autism put it in this New York Times op-ed, her group served as a lifeline as she navigated support and resources. Further, younger generations of our community will always gravitate to the next emerging platforms, like Clubhouse and TikTok.
Investors and developers of incumbent platforms have failed to serve as good stewards of vulnerable population data, and have outsize power to create technology that impacts, potentially adversely, the futures of marginalized groups. Financial interests to shareholders are often not aligned with interests of patient communities. And privacy laws fall short of protecting these vulnerable populations. Rare patient communities, as producers of data on social media, must be equipped to collectively self-govern their data. That's why we urgently need a mechanism for independent digital governance for rare health groups on social media.
Our desired impact for patient peer networks on social media is to ensure we're fairly represented in the design of technologies that affect our communities, and to be equipped to utilize our own collective data to shape research and drive better treatment and prevention options. We are actively piloting data-driven strategies to understand how to respond to misinformation, how to communicate good information more effectively and more broadly, and how to utilize the data we generate to drive research and progress on our conditions.
Our proposal takes a “community health worker” approach to putting analytics tools into the hands of patient community leaders and, further, equipping them to collaborate fairly and ethically with research and clinical institutions. At scale, our method bridges data literacy gaps between at-risk patients and the social media platforms where they convene, equips patient communities to self-organize and govern the data they generate, and enables the development of disease-focused learning health networks with organized patient communities at the center.
The Light Collective has identified a broader need in this arena. Community leaders who represent diverse health conditions ranging from rare disease to lupus to Long Covid have reached out to us with interest to participate in activities of The Light Collective and effect critically needed change in this area.
- Unlock collaboration among patients, scientists, and health care providers to improve patient outcomes
A civic data trust puts data directly in the custody of the ultimate stakeholder: patient communities.
Having access to our community's data would benefit mutation carriers by improving our scientific and data literacy and our ability to participate effectively in research, policy, and our own healthcare. Furthermore, data self-governance allows us to develop mutually beneficial partnerships with technology groups and with clinical and research institutions. Not having rights to our collective data not only disenfranchises us--especially when analytics can be used to discriminate or exploit--but also misses an enormous opportunity for patient communities to drive research and innovation around their conditions.
- Pilot: An organization deploying a tested product, service, or business model in at least one community.
Our initial pilot will be kicking off in late summer, and we have secured funding for the first stages.
We originally were funded through the Pioneering Ideas program at Robert Wood Johnson Foundation in 2019 to develop the initial roadmap components of a civic data trust, which are publicly available on our website.
- A new business model or process that relies on technology to be successful
As a group of patients leading participatory design and testing, we help innovators to ensure emerging technology like ethical AI is grounded in real use cases and community behaviors.
Behavior change is hard for one person and even harder for an entire community of people who share a health condition. Many mHealth interventions have shiny features, but often these apps fall short of their promise or fail to be grounded in practical use cases.
The Light Collective is unique because we're "flipping the narrative" of patients getting passively recruited to adopt technologies that impact our lives. Instead of first asking an engineer or a scientist what they think is important, we start with those who are affected by the problem.
Rather than conforming to terms of service for technologies that change as fast as their business models pivot, our community seeks to shift a paradigm to data rights as human rights. As we think about the next generations of technologies we adopt as genetic mutation carriers, the goal of a civic data trust is to proactively define our rights and terms of data use with partners who seek early adopters of new technologies.
We are actively evaluating technology partners whose data impacts the BRCA community, and seeking ways to develop use cases with these groups while testing their solutions. As early adopters, our goal is to formalize partnerships with emerging technologies and platforms, researchers, and clinical institutions that meet our standards and respect our digital rights.
A civic data trust has 4 main components. Here is how it works:

In short, we're relying heavily on APIs developed by technology partners. See Case Study 2 in the next section for an example.
Civic data trusts as core components of learning health networks are being developed and tested through Cincinnati Children’s Hospital and a series of projects centered around different health conditions. Improve Care Now (ICN) is one such project, with 71 sites serving more than one-third of US children and adolescents with IBD. Through this model, the clinical remission rate for patients has increased from 60% to 79%.
The Light Collective proposes to build upon the learning health network model by helping our global networks on social media organize towards improved outcomes in the following ways:
- Improve accuracy of predictive risk for people with rare mutations and/or variants of uncertain significance (1) (2).
- Improve trustworthiness and fairness of predictive algorithms.
- Improve health and data literacy while countering the effects of medical misinformation on social media.
- Improve options for screening, prevention, and treatment of cancer/other disease.
- Improve access for all patients, including patients from historically marginalized groups, to quality peer support and evidence-based information.
Our project centers around a civic data trust model. Civic data trusts, their components, and their benefits have been described extensively by our partner, attorney Sean McDonald, here and here.
Civic data trusts as core components of learning health networks are being developed and tested through Cincinnati Children’s Hospital and a series of projects centered around different health conditions, including Improve Care Now, described in the previous section.
Once our pilot civic data trust model is established in Year 1, we will evaluate partnerships with existing technologies, both by organizing data resources already in use, and evaluating ways data produced can be self-governed.
Case Study 1: BRCA Exchange
We will prioritize partnership with public repositories such as BRCA Exchange, which have curated evidence at a global scale. BRCA Exchange is the world’s largest public source for information on BRCA1/2 variants.
Community advocates from The Light Collective work to find strategies to help our community classify variants of uncertain significance (VUS) for rare or unidentified partners.
By implementing a civic trust, we would be better equipped to formalize our community/researcher/clinician partnerships and help people with variants of uncertain significance (VUS) connect to researchers, ultimately lowering the VUS rate. We would also seek to co-design patient-facing resources for genetic literacy on genetic mutations. (Video Here).
Case Study 2: Project Domino
Project Domino is in the vanguard of medical misinformation AI, and is a potential partner within the MIT Solve ecosystem. Project Domino combines fused forensics/NLP/social network analysis, continuous large-scale automation, and open innovation, and specializes them for medical domains.
A partnership with Project Domino could have applications beyond misinformation. Our role would be to help Project Domino test real-world use cases driven by communities. See our Solve for Health Security Challenge submission here.
The cost of harm to patient groups is immeasurable and undefined at this time, as the harm expands beyond the exploitation of data. With this grant, we will also explore ways to measure this harm.
- Big Data
- Biotechnology / Bioengineering
- Crowd Sourced Service / Social Networks
Security, privacy, and data sovereignty are our primary concerns. Use cases and/or threat models for vulnerable patient data will vary by community and by digital asset. We will work closely with experts in cybersecurity and research ethics to carefully map out risks for any new data set or technology we bring into the data trust.
Part of our challenge will also be to focus on - and balance the interests of - a cohort of groups to achieve tangible and measurable outcomes. Those outcomes have to be defined by the groups we choose, so we’re in a bit of a catch-22. To mitigate this, the initial ‘stakeholder mapping’ process in the first part of the project will identify and map the community leaders who are best equipped to engage in this project.
Finally, while civic data trusts provide communities with a tool to better use data in a transparent and equitable way, it’s important to understand that this model is not a panacea, a replacement for laws and regulations, or a magical pathway to build capacity for patient communities to self-govern. Governance is really hard.
So many initiatives begin with the best of intentions, and end falling short of their promises. We learned this the hard way by becoming enmeshed in a monopolistic platform that completely ignores the security and privacy of its users. This is how that turned out.
- Women & Girls
- LGBTQ+
- Low-Income
- Middle-Income
- Minorities & Previously Excluded Populations
- Persons with Disabilities
- 3. Good Health and Well-being
- 8. Decent Work and Economic Growth
- 9. Industry, Innovation and Infrastructure
- 10. Reduced Inequality
- 11. Sustainable Cities and Communities
- 17. Partnerships for the Goals
- Canada
- Iceland
- Israel
- United States
- Argentina
- Canada
- Iceland
- Israel
- Portugal
- United States
Year 1:
Consider, for example, the cancer tag ontology (CTO) hashtags on Twitter, which were described in a study co-authored by the patient leaders who established these networks. In total, CTO hashtags were used in 1,813,515 tweets with a median of 14,706 tweets per hashtag (range, 46 to 612,768 tweets). Hashtags had a median of 3,733 users (range, 38 to 60,011 users). The eight hashtags with moderated Twitter chats accounted for 1,167,009 (87.3%) of all tweets. Annual volume rose each year from 28,277 in 2011 to 446,026, with the largest absolute annual increase occurring in 2014, after the November 2013 publication of the CTO hashtag collection.
Consider also the hereditary cancer community on Facebook, of which BRCA Sisterhood alone has represented 10,000 members. These communities include a mix of BRCA, PTEN, and PALB2 mutation carriers along with people who have rare or unidentified genetic mutations. In year one we would aim to directly serve these communities of 60,000 to 100,000 people.
Year 5:
By year 5 we would like to expand our reach to any rare disease or health group on social media at a global scale.
Year 1
Identify Target Partners. The initial partners will be community partners, i.e., leaders of communities seeking to participate in the data trust. Secondary to this are the potential data holders and/or technology partners seeking to work with patient communities on social support solutions.
Establish Rights. To establish the rights of patient communities, community leaders will define how communities make collective decisions about data use.
Identify Data Artifacts. Data artifacts include existing datasets licensed by the trust from target partners.
Translating Network Purpose into Data-Sharing Activities. As we define how the network will carry out our purpose, our pilot cohort will need to decide on the types of activities the data trust will allow and how individual data-sharing decisions are made.
Phase II: Negotiate
Negotiation with Partners. To establish the rights of patient communities, community leaders will define how the requester can use network data and its derivatives in ways that benefit not only the requester but also the patient communities in question, and that are not harmful or risky to those communities. We will work with our legal counsel to represent The Light Collective in partnership agreements.
Develop Contracts. We develop data-sharing agreements based on established purpose, licensing, and rights sought by the community.
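As a rough illustration of the steps above, the sketch below models how a trust might record a community-defined decision rule and check a partner's data-use request against it. Everything here is hypothetical: the names (`DecisionRule`, `DataUseRequest`, `evaluate_request`), the quorum and approval thresholds, and the example purpose are assumptions made for illustration, and do not describe an existing Light Collective system or agreement.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DecisionRule:
    """Hypothetical community-defined rule for collective data decisions."""
    allowed_purposes: frozenset  # purposes the community agreed to consider
    quorum: int                  # minimum number of votes that must be cast
    approval_share: float        # fraction of "yes" votes required to approve


@dataclass
class DataUseRequest:
    """Hypothetical request from a partner to use a licensed data artifact."""
    requester: str
    purpose: str
    votes: dict = field(default_factory=dict)  # voter id -> True (yes) / False (no)


def evaluate_request(rule: DecisionRule, request: DataUseRequest) -> str:
    """Return 'rejected', 'pending', or 'approved' for a request."""
    # Purposes outside the community's agreed scope are rejected outright.
    if request.purpose not in rule.allowed_purposes:
        return "rejected"
    cast = len(request.votes)
    # Without enough votes (or with none at all), no decision is made yet.
    if cast == 0 or cast < rule.quorum:
        return "pending"
    yes = sum(1 for vote in request.votes.values() if vote)
    return "approved" if yes / cast >= rule.approval_share else "rejected"


# Assumed example values: three community leaders, two-thirds approval.
rule = DecisionRule(
    allowed_purposes=frozenset({"variant classification research"}),
    quorum=3,
    approval_share=2 / 3,
)
request = DataUseRequest("example-partner", "variant classification research")
request.votes.update({"leader_a": True, "leader_b": True, "leader_c": False})
status = evaluate_request(rule, request)
print(status)
```

In practice the rule itself (who votes, what counts as quorum, which purposes are in scope) would be whatever the pilot cohort decides, and the resulting decision would feed into the negotiated data-sharing agreement rather than replace it.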
Year 2-5
We essentially repeat this process each year, expanding from a cohort of 3-5 patient support groups in our pilot to 10-20 in year 2 and beyond.
Goal 3: Ensure healthy lives and promote well-being for all. Data is knowledge, and knowledge helps communities make better health decisions. Our intervention for patient-driven data governance through social media can have a global impact if we successfully establish an independent oversight board to help patients serve as stewards of the health data and content they collectively produce.
Goal 8: Decent work for all. Leaders and organizers on social media support groups for cancer and rare disease serve as unpaid content moderators without any support from health institutions or training to be effective data stewards. Some moderators work 20-30 hours per week while the data they produce can be monetized by platforms without consent from the group. For example, Facebook made over a billion in revenue targeting Pharma ads to consumers in 2020 without supporting or fairly compensating health group moderators, furthering economic disparities of these groups.
Goal 10: Reduced inequalities. When patients cannot access healthcare services, they turn to social media. Yet the very mechanisms that let health support groups engage on social media can also be used to raise insurance rates for group members and deny them jobs, without their knowledge or consent. This is a systemic problem: underserved communities that participate in health groups for support are being exploited in aggregate through the data they produce.
- Nonprofit
Our team bios can be found here.
We are an interdisciplinary collective of patients, clinicians, policy experts, coders, cybersecurity experts, and researchers.
Full time staff: 2
Part Time staff: 4
Legal Counsel: 1
Pro Bono Advisors: 15
Volunteers: 50
Leaders and organizers of social media communities representing a patient population of over 190,000 people have reached out with interest in joining The Light Collective. Our challenge is funding and capacity to serve the needs of groups beyond our initial cohort from 2019.
The Light Collective has uncovered a broader need that extends from the BRCA community to other peer support groups. Groups representing diverse health conditions, ranging from rare disease to Long Covid, have reached out to us with interest in participating in activities of The Light Collective.
Over a period of 2 years, The Light Collective has worked closely with its Community Advisors & Council of the Wise to determine a strategy for this project that will enable The Light Collective to establish a legal framework for patient communities to engage in collective negotiations.
Notably, we have been mentored and guided by clinical mentors Bob Cook-Deegan MD and former FDA CIO Eric Perakslis MD.
Representation matters - and it is important that The Board of The Light Collective accurately reflects the constituency that we represent. Our board leadership is composed of:
- 28% African American Women
- 71% Women
- 14% LGBTQ
- 86% Rare or Unidentified Mutation Carriers
While many organizations think of diversity and inclusion as bringing in a patient advisory board to consult or advise, we are a working board of patients affected by the problem we seek to solve. With patients at the center of our leadership, we bring in expert advisors we trust to advise on our priorities and to help us build our capacity as an organization seeking to tackle complex and deeply technical problems.

Our ePatient Board was elected by The Light Collective's broader membership in 2020. Our seven-person board is composed entirely of patient leaders, five of whom are members of rare disease communities, including the team leader.
Our Board consists of a diverse group of ePatient leaders. While our ePatient Board serves as the Board leadership for The Light Collective, we are supported by "The Council of The Wise," an interdisciplinary group of coders, policy experts, clinicians, and healthcare informatics experts. As a matter of principle, we also work with a paid community advisory board on all of our projects.
- Individual consumers or stakeholders (B2C)
We'll start with why we're doing this work. Consider this opening quote from The Immortal Life of Henrietta Lacks:
“We must not see any person as an abstraction. Instead, we must see in every person a universe with its own secrets, with its own treasures, with its own sources of anguish, and with some measure of triumph.”
- Elie Wiesel
When a digital community is seen as less than human or not real, it's easier to do things that take advantage of us. On social media, our communities are seen as "users" ... an abstraction. But to us, our data represents the suffering of our friends, our families, and generations in our community. Our collective data must be treated with dignity and respect -- not abused, owned, or exploited.
We're very interested in tapping into the mentorship that the MIT Solve community can provide. While we have had traction raising funding thus far through the Robert Wood Johnson Foundation, we will need to diversify our funders.
From a financial perspective, we're submitting to this challenge to help cover our plans for expansion, and we are seeking funders/partners to help us expand beyond the first year.
- Human Capital (e.g. sourcing talent, board development, etc.)
- Business model (e.g. product-market fit, strategy & development)
- Financial (e.g. improving accounting practices, pitching to investors)
- Legal or Regulatory Matters
- Monitoring & Evaluation (e.g. collecting/using data, measuring impact)
- Product / Service Distribution (e.g. expanding client base)
- Technology (e.g. software or hardware, web development/design, data analysis, etc.)
It would be great to connect with similar resources at MIT that can offer mentorship and expert guidance to our community leaders in The Light Collective to contribute to our civic trust.
For example, one early collaboration that has been fruitful for us has been leveraging open source resources developed by Keith Porcaro at Duke's Digital Governance Design Studio. We've worked closely with Eric Perakslis, Chief Scientific Officer at DCRI, to think about security and cyber hygiene principles. Careset Systems has generously contributed hundreds of hours of pro bono technical expertise to help patients build capacity around digital literacy.
Another collaboration has been with Melissa Cline at UCSC, with whom we've worked on patient-facing strategies for genomic and data literacy for BRCA Exchange and BRCA Challenge. All are examples of projects where we are beginning to have an impact at a global scale through co-design and partnership, and we are thankful to the stakeholders who have helped.
Our biggest hurdle right now is connecting with funders and institutional support that can scale beyond year 1 across a network of peer support groups and hospital systems.
Health Information Networks in TEFCA: Together with the Office of the National Coordinator for Health Information Technology (ONC), the Sequoia Project is building the Trusted Exchange Framework and Common Agreement (TEFCA) established by the 21st Century Cures Act. TEFCA will facilitate exchange of health information on a nationwide scale in the US, simplify connectivity among networks, and create efficiency by establishing a standardized approach to exchange policies and technical frameworks. We're seeking to learn how networks of organized patients can participate in the governance of this effort in order to scale.
Mentors at MIT: Any groups that would be interested to help us co-develop solutions within our civic trust.
Peer Support Groups on Social Media: Our primary constituency is social media support groups for hereditary cancer and the BRCA community, expanding to networks of other rare disease groups that emerged in the same way as the BRCA community and need resources and support as they adopt new technology.
Solvers: There are a range of Solvers in this challenge who would be interesting to partner with in order to co-design interventions, clinical trials, and studies. As a global community of BRCA mutation carriers on social media, we want to partner with Solvers who meet our standards and are willing to formally negotiate with us within the civic trust model.

Co-Founder, BRCA Advocate, Security Researcher

Founder & CEO

Co-Founder, The Light Collective

Board Member, The Collective; Rare Disease Patient; Global PCORI Ambassador; Cybersecurity and Patient Data Governance Researcher; Board President, the American Living Organ Donor Fund