Category: Study Design
Labels: rationale, patient identification, data protection
Related questions: NA
Creation date and Author: Maria G., Feb 7th 2025
Last Update and Author: Astrid C., June 12th 2025
The ROSALIND Study aims to uncover biological signatures associated with exceptional long-term survival in aggressive cancers.
The study relies on pseudonymized data: each center is responsible for pseudonymizing any patient Personal Data before uploading it to the Platform. Every patient is assigned a unique code, and only the center maintains the key linking this code to the patient's identifying information. CURE51 commits to never attempting to re-identify any patient.
According to the protocol, access to Personal Health Information (PHI) prior to pseudonymization is essential for patient selection, case-control matching, and data integration, and complies with the proportionality principle.
The ROSALIND Study protocol was developed in collaboration with a scientific committee to ensure that the data collected are strictly limited to what is necessary to achieve the study’s objectives, in line with the proportionality principle and the GDPR.
1. PHI is Necessary for Patient Identification and Cohort Selection
- The study seeks to identify patients diagnosed within the last 20 years, requiring access to historical medical records to determine survival status and eligibility.
- CURE51 relies on study sites to identify long-term survivors (cases) and matched controls. This classification cannot be done with de-identified data; it requires clinical history, survival duration, and treatment outcomes, all of which are PHI at the time of collection.
- Once eligibility is confirmed, PHI is pseudonymized, meaning patient identifiers are replaced with unique codes.
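To illustrate the pseudonymization step described above (identifiers replaced by unique codes, with the linking key held only by the center), here is a minimal Python sketch. It is not the study's actual procedure: the field names in `DIRECT_IDENTIFIERS`, the `PT-` code format, and the `pseudonymize` function are all hypothetical choices made for illustration.

```python
import secrets

# Hypothetical list of directly identifying fields; the real list is
# defined by each center and the study protocol, not by this sketch.
DIRECT_IDENTIFIERS = ("name", "date_of_birth", "hospital_id")


def pseudonymize(records, key_table=None):
    """Replace direct identifiers with a random code.

    Returns (pseudonymized_records, key_table). The key_table maps each
    code back to the identifiers and is meant to stay at the center.
    """
    key_table = {} if key_table is None else key_table
    pseudonymized = []
    for record in records:
        # Draw a random code and retry in the unlikely event of a collision.
        code = "PT-" + secrets.token_hex(8)
        while code in key_table:
            code = "PT-" + secrets.token_hex(8)
        # The key linking code and identity is stored separately.
        key_table[code] = {
            field: record[field] for field in DIRECT_IDENTIFIERS if field in record
        }
        # Only non-identifying fields plus the code leave the center.
        clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        clean["patient_code"] = code
        pseudonymized.append(clean)
    return pseudonymized, key_table
```

In this sketch only the first return value would be uploaded; the key table never leaves the center, which is what makes the data pseudonymized rather than anonymized.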
2. PHI is Required for Case-Control Matching
- The study depends on a case-control design, where each long-term survivor is matched to a control patient with typical survival outcomes.
- Matching requires detailed medical data, such as age, overall survival, and treatment history.
- This level of detail is what makes precise comparisons between cases and controls possible, and it remains PHI until pseudonymization occurs.
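As a rough illustration of why matching needs record-level detail, the sketch below pairs each case with one control. The matching rule is an assumption, not the protocol's: it requires the same cancer type and treatment and picks the unused control closest in age, within a hypothetical `max_age_gap`. The field names and the `match_controls` function are likewise invented for this example.

```python
def match_controls(cases, controls, max_age_gap=5):
    """Greedy 1:1 matching of cases to controls.

    Assumed rule (illustrative only): same cancer type and treatment,
    age at diagnosis within max_age_gap years, closest age wins.
    Returns a list of (case_id, control_id or None) pairs.
    """
    used = set()  # record_ids of controls already paired
    pairs = []
    for case in cases:
        candidates = [
            c for c in controls
            if c["record_id"] not in used
            and c["cancer_type"] == case["cancer_type"]
            and c["treatment"] == case["treatment"]
            and abs(c["age_at_diagnosis"] - case["age_at_diagnosis"]) <= max_age_gap
        ]
        if not candidates:
            # No eligible control: report the case as unmatched.
            pairs.append((case["record_id"], None))
            continue
        best = min(
            candidates,
            key=lambda c: abs(c["age_at_diagnosis"] - case["age_at_diagnosis"]),
        )
        used.add(best["record_id"])
        pairs.append((case["record_id"], best["record_id"]))
    return pairs
```

Greedy matching is the simplest option and depends on the order in which cases are processed; it is shown here only to make the data requirements concrete.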
3. PHI is Needed to Link Clinical and Molecular Data
- The ROSALIND Study integrates multi-omic analyses with clinical data.
- To correlate biological signatures with survival outcomes, each tumor sample must be linked to treatment history, disease progression, and patient outcomes.
- Since de-identified data would not allow these connections, pseudonymization ensures that critical relationships between clinical and molecular data are preserved while protecting patient identities.
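The linkage described above can be sketched as a join on the pseudonym code. This is a minimal illustration, assuming each clinical record and each molecular sample carries a hypothetical `patient_code` field and each sample a hypothetical `sample_id`; the actual Platform schema is not described in this document.

```python
def link_by_code(clinical_records, molecular_samples):
    """Join molecular samples to clinical records via the pseudonym code.

    Returns (linked, unmatched): linked is a list of merged dicts, and
    unmatched lists the sample_ids that have no clinical record.
    """
    # Index clinical records by code for constant-time lookup.
    clinical_by_code = {r["patient_code"]: r for r in clinical_records}
    linked, unmatched = [], []
    for sample in molecular_samples:
        clinical = clinical_by_code.get(sample["patient_code"])
        if clinical is None:
            unmatched.append(sample["sample_id"])
        else:
            # Merge clinical and molecular fields into one row per sample.
            linked.append({**clinical, **sample})
    return linked, unmatched
```

Because the join key is the code rather than a name or hospital number, the relationship between a tumor sample and its clinical course is preserved without exposing who the patient is.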