Data Archiving & Permissions
Policies, best practices, and resources for responsibly sharing cancer genomics and biomarker data with the Journal of Cancer Genetics and Biomarkers (JCGB).
Championing Open, Ethical, and Reusable Data
JCGB is committed to accelerating precision oncology through transparent, FAIR-compliant data sharing. This page outlines how to prepare, deposit, and license the datasets, code, and metadata that underpin your manuscript. Following these guidelines maximises reproducibility, fosters collaboration, and honours the trust patients and communities place in biomedical research.
- Transparency: Readers must be able to trace results back to raw or processed data and analytical code.
- Reusability: Data should be deposited with clear licenses, metadata, and formats that enable integration and re-analysis.
- Respect: Sensitive datasets require governance aligned with participant consent, community expectations, and legal frameworks.
- Longevity: Use stable repositories that offer persistent identifiers (DOIs, accession numbers) and long-term preservation guarantees.
Choose repositories according to data type, access requirements, and compliance mandates.
Genomic & Transcriptomic Data
- European Genome-phenome Archive (EGA)
- NCBI Sequence Read Archive (SRA)
- Gene Expression Omnibus (GEO)
- dbGaP for controlled-access human genetics
Proteomics & Metabolomics
- PRIDE (Proteomics Identifications Database)
- MassIVE
- Metabolomics Workbench
Imaging & Radiomics
- The Cancer Imaging Archive (TCIA)
- BioImage Archive
- PhysioNet for physiological signals
Clinical & Real-World Data
- Vivli or Project Data Sphere for clinical trials
- HealthData.gov for U.S. public datasets
- Institutional data enclaves with controlled access
Multi-Omics & Integrative Studies
- Zenodo or Figshare for curated multi-modal packages
- Synapse for collaborative consortia projects
- Open Science Framework (OSF) for protocol registries
Code & Computational Pipelines
- GitHub/GitLab with version tagging
- Zenodo integration to mint DOIs
- Dockstore for containerised workflows
- Study overview (objectives, design, cohort characteristics)
- Data dictionary describing variables, units, and codes
- File structure map indicating directories, naming conventions, and dependencies
- Detailed methods or laboratory protocols (include reagents, instruments, software)
- Quality control procedures and filtering rules
- Version history documenting updates or corrections
- Use community schemas (MIAME for microarrays, MINSEQE for sequencing experiments such as RNA-seq, HUPO-PSI standards for proteomics).
- Include controlled vocabulary terms (MeSH, HPO, ICD-10) for disease classification.
- Provide anonymised sample-level metadata (age, sex, ethnicity, tumour type, staging).
- Document data processing pipelines, software versions, and parameter settings.
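One way to keep the data dictionary and the sample-level metadata in step is to check one against the other before deposit. The sketch below is illustrative only: the column names, the allowed values, and the two inline CSV tables are assumptions made up for this example, not a JCGB or repository schema.

```python
import csv
import io

# Hypothetical data dictionary: one row per variable in the sample table.
DATA_DICTIONARY_CSV = """variable,description,unit,allowed_values
sample_id,Pseudonymised sample identifier,,
age_band,Age at diagnosis in 10-year bands,years,
sex,Sex recorded at enrolment,,female|male|unknown
tumour_type,Primary tumour type,,
stage,Clinical stage at diagnosis,,I|II|III|IV|unknown
"""

# Hypothetical sample table with two deliberate problems:
# an undocumented "batch" column and an invalid stage value.
SAMPLES_CSV = """sample_id,age_band,sex,tumour_type,stage,batch
S001,50-59,female,breast carcinoma,II,B1
S002,60-69,male,colorectal adenocarcinoma,V,B1
"""


def load_dictionary(text):
    """Map each variable name to its set of allowed values (empty set = free text)."""
    rows = csv.DictReader(io.StringIO(text))
    return {
        row["variable"]: set(filter(None, row["allowed_values"].split("|")))
        for row in rows
    }


def check_samples(samples_text, dictionary):
    """Return a list of human-readable problems found in the sample table."""
    reader = csv.DictReader(io.StringIO(samples_text))
    # Every column in the sample table should be described in the dictionary.
    problems = [
        f"column '{name}' is not described in the data dictionary"
        for name in reader.fieldnames
        if name not in dictionary
    ]
    # Coded variables should only contain values listed in the dictionary.
    for line_no, row in enumerate(reader, start=2):
        for name, allowed in dictionary.items():
            value = row.get(name, "")
            if allowed and value not in allowed:
                problems.append(f"line {line_no}: {name}={value!r} is not an allowed value")
    return problems


problems = check_samples(SAMPLES_CSV, load_dictionary(DATA_DICTIONARY_CSV))
for problem in problems:
    print(problem)
```

In practice the two tables would be read from the deposited files rather than inline strings, and the allowed values would come from the controlled vocabularies named above.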
JCGB encourages open licenses while recognising the need for controlled access in sensitive contexts.
License | Use Case | Notes |
---|---|---|
CC0 / Public Domain | Non-identifiable datasets, benchmark resources | Maximises reuse; attribution is not legally required, but citing the original creators remains expected scholarly practice |
CC BY 4.0 | General-purpose sharing with attribution | Recommended for most JCGB datasets |
CC BY-NC | Restrict commercial use of sensitive datasets | Ensure “non-commercial” aligns with funder policies |
DUO (Data Use Ontology) terms | Controlled-access human genomic data | Not a license in itself; use DUO terms to specify consent-based restrictions (e.g., disease-specific research) alongside the access agreement |
- Controlled Access: Deposit in repositories offering access committees (EGA, dbGaP, controlled Synapse projects). Provide Data Use Agreements (DUAs).
- De-identification: Remove direct identifiers, convert dates to offsets relative to diagnosis, and aggregate geolocation data.
- Genomic Sovereignty: For indigenous or marginalised communities, align with community-specific governance, benefit-sharing agreements, and indigenous data frameworks (CARE principles).
- Third-Party Data: Obtain permissions from original custodians; document agreement terms in the manuscript.
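The date-offset step in the de-identification item above can be sketched as follows. This is a minimal illustration, assuming hypothetical field names (`diagnosis_date`, `biopsy_date`, `last_follow_up_date`, `region`); it copies only an explicit allow-list of fields, so direct identifiers are dropped by default. It is not a complete de-identification procedure, and it does not replace review by a privacy board.

```python
from datetime import date


def to_offset_days(event_date, diagnosis_date):
    """Days from diagnosis to the event (negative if the event came first)."""
    return (event_date - diagnosis_date).days


def deidentify(record):
    """Return a new record with only allow-listed fields and dates as offsets."""
    diagnosis = record["diagnosis_date"]
    return {
        "sample_id": record["sample_id"],
        "region": record["region"],  # assumed to be aggregated already
        "biopsy_day": to_offset_days(record["biopsy_date"], diagnosis),
        "last_follow_up_day": to_offset_days(record["last_follow_up_date"], diagnosis),
    }


# Hypothetical input record; all values are invented for illustration.
record = {
    "sample_id": "S001",
    "patient_name": "EXAMPLE ONLY",
    "region": "North-West",
    "diagnosis_date": date(2021, 3, 15),
    "biopsy_date": date(2021, 3, 1),
    "last_follow_up_date": date(2022, 3, 15),
}

shared = deidentify(record)
print(shared)
```

Building the shared record from an allow-list, rather than deleting known identifiers from a copy, means a newly added identifying column is excluded unless someone deliberately adds it.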
Include a dedicated data availability section in the manuscript after the acknowledgments. Examples:
- “Whole-genome sequencing data are available in the European Genome-phenome Archive (EGA) under accession EGAS00001008721. Access requests can be submitted through the EGA Data Access Committee.”
- “Metabolomic peak tables associated with this study are deposited in Metabolomics Workbench (ST002345) under a CC BY 4.0 license.”
- “Due to participant confidentiality agreements, de-identified clinical data are available upon request to the corresponding author and subject to institutional review.”
- Host code in public repositories with version control.
- Tag releases corresponding to the manuscript and archive them via Zenodo for DOI generation.
- Provide README files with setup instructions, dependencies, and example data.
- Use containerisation (Docker, Singularity) or workflow description languages (CWL, Nextflow, WDL) for complex pipelines.
- Indicate licensing (MIT, Apache 2.0, GPL) to clarify reuse rights.
- Compress large datasets (tar.gz) and split into manageable segments (≤5 GB) where repository limits apply.
- Provide checksum files (MD5, SHA256) to verify integrity.
- For extremely large data (petabyte-scale imaging), coordinate with JCGB to arrange cloud-based sharing or institutional hosting.
- JCGB permits repository embargoes aligned with journal publication. Specify embargo end date in the metadata.
- If presenting at conferences, ensure repository links honour embargo deadlines and note them in the cover letter.
- JCGB supports preprints (bioRxiv, medRxiv). Link datasets upon preprint release for transparency.
- During peer review, JCGB may request reviewer access to datasets or code. Provide temporary credentials where necessary.
- Post-acceptance, JCGB verifies repository accessibility, metadata completeness, and licensing.
- Failure to provide verifiable data may delay publication or result in rejection.
- Use repository versioning to issue updates or corrections.
- Notify JCGB of significant changes so errata or addenda can be published.
- Document version history in README files and repository metadata.
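For the large-file guidance above (splitting archives into segments and providing checksum files), the sketch below shows one possible approach using only the Python standard library. The file name `cohort.tar.gz`, the manifest name `SHA256SUMS`, and the tiny segment size are assumptions for the demonstration; for real deposits, set `max_bytes` to the limit your repository states.

```python
import hashlib
import tempfile
from pathlib import Path


def split_file(path, max_bytes, buffer_bytes=1024 * 1024):
    """Split `path` into numbered parts of at most `max_bytes`; return the part paths."""
    parts = []
    with path.open("rb") as source:
        while True:
            part_path = path.with_name(f"{path.name}.part{len(parts):03d}")
            written = 0
            with part_path.open("wb") as target:
                while written < max_bytes:
                    chunk = source.read(min(buffer_bytes, max_bytes - written))
                    if not chunk:
                        break
                    target.write(chunk)
                    written += len(chunk)
            if written == 0:
                # Nothing left to read: remove the empty trailing part and stop.
                part_path.unlink()
                break
            parts.append(part_path)
    return parts


def sha256_of(path, buffer_bytes=1024 * 1024):
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(buffer_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Write one '<digest>  <file name>' line per file."""
    lines = [f"{sha256_of(p)}  {p.name}" for p in paths]
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as workdir:
    archive = Path(workdir) / "cohort.tar.gz"      # hypothetical file name
    archive.write_bytes(b"x" * 2500)               # stand-in for a real archive
    parts = split_file(archive, max_bytes=1000)    # e.g. 5 * 10**9 for 5 GB segments
    manifest = Path(workdir) / "SHA256SUMS"
    write_manifest(parts, manifest)
    part_names = [p.name for p in parts]
    manifest_text = manifest.read_text(encoding="utf-8")
    rejoined = b"".join(p.read_bytes() for p in parts)

print(part_names)
print(manifest_text)
```

The manifest uses the digest, two spaces, then the file name, which is the layout that `sha256sum -c` reads, so downloaders can verify each segment before concatenating the parts in order.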
FAIR Principles
Understand frameworks for Findable, Accessible, Interoperable, Reusable data.
Visit GO FAIR
CARE Principles
Respect collective benefit, authority, responsibility, and ethics for indigenous data.
Explore CARE
Data Management Plan Templates
Download DMP templates aligned with NIH, Horizon Europe, and Wellcome requirements.
Access DMPTool
What if my dataset exceeds repository limits?
Contact the repository support team for expansion options or use institutional cloud services. Inform JCGB so we can document the approach in your data availability statement.
Can I restrict commercial reuse?
Yes. Use licenses such as CC BY-NC when consistent with funder mandates. Clarify restrictions in metadata and the manuscript.
How do I handle multi-institutional permissions?
Coordinate Data Transfer Agreements (DTAs) among partner institutions early. Provide JCGB with copies or summaries in supplementary materials.
What about legacy data without consent for sharing?
Describe the limitations and provide summary statistics. Seek IRB guidance on re-consenting or de-identifying. JCGB will consider justified exceptions with clear explanations.
Complete this checklist before final submission to ensure your archiving plan meets JCGB’s standards and legal obligations:
- Consent Alignment: Confirm that participant information sheets and consent forms explicitly cover the level of data sharing you intend (open, controlled, summary-only). Document any deviations and secure re-consent or waivers when necessary.
- Institutional Approvals: Obtain a letter or email confirmation from your institutional data governance office or privacy board approving the external release of data. Store the approval in project records and reference it in the manuscript if required.
- Data Transfer Agreements: Execute DTAs among collaborating institutions, specifying permitted uses, security obligations, breach response procedures, and data retention timelines.
- Security Architecture: Encrypt datasets prior to upload, enforce multi-factor authentication, and maintain access logs for controlled repositories. For cloud-hosted data, use services that comply with ISO/IEC 27001 or equivalent standards.
- Retention & Destruction Policy: Define how long raw and processed data will be retained post-publication and outline procedures for secure destruction when retention periods end.
- Community Engagement: For indigenous or community-led research, document engagement sessions, co-authorship agreements, and benefit-sharing commitments in alignment with the CARE principles.
- Machine-Readable Metadata: Validate JSON, XML, or CSV metadata files against repository schemas. Ensure ontology tags, units, and identifiers are accurate and consistent across files.
- Emergency Contacts: Provide repositories with a generic institutional email (e.g., [email protected]) to ensure continuity if personnel change.
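As an illustration of the Machine-Readable Metadata item in the checklist above, the sketch below runs a few basic checks on a JSON metadata file using only the Python standard library. The required fields and the `age_unit` consistency rule are hypothetical; each repository publishes its own schema, and a dedicated schema validator is the better tool where one is available.

```python
import json

# Hypothetical required fields and types; a real repository defines its own schema.
REQUIRED_FIELDS = {
    "title": str,
    "license": str,
    "contact_email": str,
    "samples": list,
}


def validate_metadata(text):
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        metadata = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg} (line {error.lineno})"]
    if not isinstance(metadata, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")

    # Consistency check: all samples should report age in the same unit.
    samples = metadata.get("samples")
    if isinstance(samples, list):
        units = {s.get("age_unit") for s in samples if isinstance(s, dict)}
        if len(units) > 1:
            problems.append(f"inconsistent age_unit values: {sorted(map(str, units))}")
    return problems


# Invented example with two deliberate problems.
example = json.dumps({
    "title": "Example tumour cohort",
    "license": "CC-BY-4.0",
    "samples": [
        {"sample_id": "S001", "age_unit": "years"},
        {"sample_id": "S002", "age_unit": "months"},
    ],
})

problems = validate_metadata(example)
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one lets authors fix every issue in a single pass before resubmitting the metadata.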
Need Help with Data Archiving?
Email [email protected] for repository recommendations, metadata templates, or compliance checklists. JCGB’s data editors are ready to support you.
Last updated: September 2025. JCGB reviews data policies annually to align with evolving legal, ethical, and technological standards.