Journal of Big Data Research

Journal of Big Data Research – Data Archiving and Permissions

Open Access & Peer-Reviewed

Data Archiving and Permissions

Promoting Transparency, Reproducibility, and Open Science

JBR's Commitment to Open Science

Journal of Big Data Research (JBR) strongly supports the principles of open science, research transparency, and reproducibility. As a journal focused on data-driven research, we recognize that data sharing is fundamental to validating findings, enabling replication, and accelerating scientific progress in big data analytics, machine learning, and artificial intelligence.

JBR encourages all authors to make their research data, code, algorithms, and materials openly available whenever possible, while respecting ethical constraints, privacy regulations, intellectual property rights, and commercial sensitivities. This policy aligns with international best practices and funder mandates for data transparency.

By promoting data accessibility, JBR aims to enhance the credibility of published research, facilitate meta-analyses and systematic reviews, support education and training, and maximize the societal impact of big data science.

Data Availability Requirements

All manuscripts submitted to JBR must include a Data Availability Statement describing how readers can access the data underlying the study's findings. This statement should be included as a separate section before the References.

Acceptable Data Availability Statements

✓ Data Publicly Available

Recommended approach. Data deposited in a recognized public repository with a persistent identifier (DOI or accession number).

Example: "The datasets generated and analyzed during the current study are available in the Zenodo repository: [DOI link]. Source code is available on GitHub: [repository URL]."

✓ Data Available on Request

Data available from the corresponding author upon reasonable request, with clear contact information.

Example: "The datasets used in this study are available from the corresponding author upon reasonable request. Contact: [email address]."

✓ Data Included in Manuscript

All data presented in tables, figures, or supplementary files within the publication.

Example: "All data generated or analyzed during this study are included in this published article and its supplementary information files."

✓ Data Not Publicly Available

Data cannot be shared due to privacy, ethical, legal, or commercial restrictions. Explanation required.

Example: "Data cannot be shared publicly due to patient privacy restrictions under HIPAA regulations. De-identified data may be available to qualified researchers upon approval from the institutional review board."

Important: If data cannot be shared, authors must provide clear justification based on ethical, legal, privacy, or commercial grounds. Vague statements like "data available on request" without contact information are insufficient.
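Authors who want to self-check a draft statement before submission could use something like the following minimal sketch. It is not an editorial tool used by JBR; the function name check_statement, the keyword rules, and the regular expressions are illustrative assumptions, and passing the check does not mean a statement will be accepted.

```python
import re

# Illustrative patterns only; real DOIs, URLs, and emails vary more than this.
DOI = re.compile(r"\b10\.\d{4,9}/\S+")
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def check_statement(text):
    """Return a list of likely gaps in a draft Data Availability Statement."""
    problems = []
    lowered = text.lower()
    has_locator = bool(DOI.search(text) or URL.search(text))
    has_email = bool(EMAIL.search(text))

    # "Available on request" needs a way to actually make the request.
    if "request" in lowered and not has_email:
        problems.append("on-request statement lacks a contact email")
    # A repository deposit needs a persistent link or identifier.
    if "repository" in lowered and not has_locator:
        problems.append("repository statement lacks a DOI or URL")
    # Restricted data needs a stated reason (assumed here to follow "due to" or "because").
    restricted = "cannot be shared" in lowered or "not publicly available" in lowered
    if restricted and "due to" not in lowered and "because" not in lowered:
        problems.append("restriction stated without a justification")
    return problems


if __name__ == "__main__":
    draft = "The datasets are available from the corresponding author upon reasonable request."
    for problem in check_statement(draft):
        print("Check:", problem)
```

The keyword matching is deliberately crude; it only flags the most common omissions described above and will miss unusual phrasings.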

Recommended Data Repositories

JBR recommends depositing research data, code, and supplementary materials in established general-purpose, domain-specific, or institutional repositories that provide persistent identifiers and long-term preservation:

General Purpose Repositories

  • Zenodo - Multidisciplinary, DOI assignment
  • Figshare - All file types, visualization support
  • Dryad - Curated research datasets
  • Mendeley Data - Integrated with reference manager
  • OSF (Open Science Framework) - Project management + data

Code Repositories

  • GitHub - Version control, collaboration
  • GitLab - Private/public repositories
  • Bitbucket - Academic plans available
  • Code Ocean - Computational reproducibility
  • Software Heritage - Long-term archiving

Domain-Specific Repositories

  • UCI ML Repository - Machine learning datasets
  • Kaggle Datasets - Competition and public data
  • GenBank - Biological sequence data
  • IEEE DataPort - Engineering datasets
  • arXiv - Preprints with supplementary data

Institutional Repositories

  • University library repositories
  • Research center data archives
  • National/international data centers
  • Subject-specific consortium repositories

Repository Selection: Choose repositories that provide DOIs or persistent URLs, offer long-term preservation (minimum 10 years), allow unrestricted access (or controlled access with clear procedures), and are recognized in your field. Include repository links and accession numbers in your Data Availability Statement.

Code and Algorithm Sharing

For research involving computational methods, algorithms, machine learning models, or data analysis pipelines, JBR strongly encourages code sharing to enable reproducibility and facilitate adoption by the research community:

Code Sharing Best Practices

  • Version control: Use Git-based platforms (GitHub, GitLab) for code management
  • Documentation: Include README files with installation instructions, dependencies, and usage examples
  • Licensing: Apply appropriate open-source licenses (MIT, Apache 2.0, GPL) or specify restrictions
  • Environment specification: Provide requirements.txt, environment.yml, or Docker containers
  • Reproducible workflows: Include scripts that reproduce key results from raw data
  • Version tagging: Tag the repository version that corresponds to the published manuscript
  • DOI assignment: Use Zenodo integration with GitHub to assign DOIs to code releases
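As one way to approach the environment specification item above, the following minimal sketch records the Python version and the installed versions of the packages an analysis depends on, and renders them as pinned requirements.txt lines. The function names environment_snapshot and to_requirements and the example package list are assumptions for illustration; tools such as pip freeze or conda env export serve the same purpose.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return the interpreter version and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


def to_requirements(snapshot):
    """Render the installed packages as pinned requirements.txt lines."""
    return [
        f"{name}=={version}"
        for name, version in sorted(snapshot["packages"].items())
        if version is not None
    ]


if __name__ == "__main__":
    # The package list is an example; use the packages your analysis imports.
    snapshot = environment_snapshot(["numpy", "pandas"])
    print(json.dumps(snapshot, indent=2))
    print("\n".join(to_requirements(snapshot)))
```

Committing the resulting file alongside the tagged release lets readers rebuild the environment that produced the published results.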

What Code Should Be Shared?

  • Novel algorithms and methods described in the manuscript
  • Data preprocessing and cleaning scripts
  • Statistical analysis code and model training procedures
  • Visualization scripts used to generate figures
  • Trained model weights and hyperparameters (if computationally feasible)
  • Benchmark comparison code and evaluation scripts

Proprietary Code: If commercial software or proprietary algorithms are used, clearly state this in your methods section and provide sufficient algorithmic description to enable understanding, even if source code cannot be shared.

Data Formats and Standards

To maximize data usability and long-term accessibility, use standardized, non-proprietary file formats whenever possible:

Tabular Data

Preferred: CSV, TSV, JSON
Acceptable: Excel (.xlsx), HDF5
Include: Variable definitions, units, missing data codes
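A minimal sketch of how a CSV file and its accompanying data dictionary might be produced together is shown below. The column names, the "NA" missing-data code, and the dictionary layout are illustrative assumptions, not a format required by JBR.

```python
import csv
import io
import json

MISSING = "NA"  # assumed missing-data code; document whichever code you use

# Hypothetical data dictionary: one entry per column, with definition and unit.
DATA_DICTIONARY = {
    "sensor_id": {"definition": "Unique sensor identifier", "unit": None},
    "temperature_c": {"definition": "Air temperature", "unit": "degrees Celsius"},
    "humidity_pct": {"definition": "Relative humidity", "unit": "percent"},
}


def write_table(rows, dictionary, handle):
    """Write rows as CSV, using the dictionary keys as the column order."""
    columns = list(dictionary)
    writer = csv.DictWriter(handle, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Replace absent or None values with the documented missing-data code.
        writer.writerow({c: MISSING if row.get(c) is None else row[c] for c in columns})


def dictionary_json(dictionary):
    """Serialize the data dictionary and the missing-data code as JSON."""
    return json.dumps({"missing_value_code": MISSING, "variables": dictionary}, indent=2)


if __name__ == "__main__":
    rows = [{"sensor_id": "s1", "temperature_c": 21.5, "humidity_pct": None}]
    buffer = io.StringIO()
    write_table(rows, DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    print(dictionary_json(DATA_DICTIONARY))
```

Deriving the CSV header from the dictionary keeps the two files in agreement, so every column a reader sees has a definition and a unit.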

Images & Figures

Preferred: PNG, TIFF, SVG
Acceptable: JPEG, PDF
Resolution: Minimum 300 DPI for raster images (vector formats such as SVG are resolution-independent)

Code & Scripts

Preferred: Plain text (.py, .R, .m, .java)
Include: Comments, documentation, example usage
Specify: Language version, dependencies

Specialized Formats

Domain-specific: NetCDF, GeoTIFF, FASTA
ML models: ONNX, pickle, TensorFlow SavedModel
Include: Format documentation and reading instructions

Ethical Considerations and Privacy Protection

While JBR promotes data sharing, we recognize that certain types of data cannot be publicly released due to privacy, ethical, or legal constraints:

Human Subject Data

For research involving human participants, data must be de-identified and anonymized before sharing. Remove or aggregate all personally identifiable information (names, addresses, phone numbers, dates of birth, medical record numbers, etc.). If complete anonymization prevents meaningful analysis, consider controlled access repositories that require data use agreements. Always comply with informed consent provisions and institutional review board requirements.
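The removal and aggregation steps described above can be sketched as follows. This is an illustration under stated assumptions, not a compliance procedure: the field names are hypothetical, dates of birth are assumed to be ISO 8601 strings (YYYY-MM-DD), and replacing a record number with a keyed hash is pseudonymization rather than full anonymization. Remaining fields may still allow re-identification in combination, so the result should be reviewed against your consent terms and review board requirements.

```python
import hashlib
import hmac

# Hypothetical direct identifiers to drop before sharing.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "medical_record_number"}


def pseudonym(value, secret_key):
    """Derive a stable pseudonym with HMAC-SHA256; the key must stay private."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Drop direct identifiers, pseudonymize the record number, coarsen the birth date."""
    shared = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "date_of_birth"
    }
    shared["participant_id"] = pseudonym(record["medical_record_number"], secret_key)
    if record.get("date_of_birth"):
        # Aggregate the full date to the year only.
        shared["birth_year"] = int(record["date_of_birth"][:4])
    return shared


if __name__ == "__main__":
    record = {
        "name": "Example Person",
        "address": "1 Example Street",
        "phone": "000-0000",
        "medical_record_number": "MRN-0001",
        "date_of_birth": "1980-05-17",
        "diagnosis_code": "E11",
    }
    print(deidentify(record, secret_key=b"replace-with-a-private-key"))
```

A keyed hash is used instead of a plain hash so that someone who knows the record-number format cannot recompute the pseudonyms without the key.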

Proprietary and Commercial Data

If data are obtained from commercial providers, under non-disclosure agreements, or subject to intellectual property restrictions, clearly state these limitations in your Data Availability Statement. Provide information about how qualified researchers can request access through appropriate channels or licensing agreements.

Third-Party Data

When using publicly available datasets from external sources, provide complete citation information, access URLs, and version numbers. Describe any preprocessing or transformations applied. If you created derivative datasets, clarify whether the original data source's license permits redistribution of modified versions.

National Security and Sensitive Data

Research involving classified information, critical infrastructure data, or information with national security implications may have legitimate restrictions on data sharing. Provide as much methodological detail as possible to enable assessment of research validity while respecting security requirements.

Permissions for Third-Party Materials

When including materials created by others (figures, tables, text excerpts, photographs), you must obtain and document appropriate permissions:

When Permissions Are Required

  • Copyrighted figures/tables: Reproducing or adapting content from other publications
  • Photographs: Images of identifiable individuals or copyrighted artworks
  • Extensive text quotations: Quoting more than 250-400 words from a single source
  • Unpublished data: Using datasets or results from collaborators or other researchers
  • Proprietary materials: Company logos, trademarks, commercial datasets

When Permissions Are NOT Required

  • Open Access articles: Content published under CC BY, CC BY-SA, or similar permissive licenses (proper attribution still required)
  • Public domain: Works with expired copyright or explicitly dedicated to public domain
  • Your own work: Materials from your previous publications (if you retained copyright or license permits reuse)
  • Fair use: Limited quotations for criticism, commentary, or scholarly analysis (jurisdiction-dependent)
  • Government publications: Public data and documents from government agencies (varies by country)

How to Obtain Permissions

  1. Contact the copyright holder (original publisher, author, or rights owner)
  2. Specify exactly what you want to reuse (figure number, table, text excerpt)
  3. Explain how it will be used (in your JBR manuscript)
  4. Request permission for use in an open access publication under CC BY 4.0 license
  5. Obtain written confirmation (email is acceptable)
  6. Include a permission statement in the figure or table caption
  7. Provide copies of permission correspondence if requested during peer review

Permission Statement Example

"Figure 3: Adapted from Smith et al. (2022) with permission from IEEE. Original figure copyright © 2022 IEEE."

FAIR Data Principles

JBR encourages authors to follow FAIR (Findable, Accessible, Interoperable, Reusable) data principles:

Findable

  • Assign persistent identifiers (DOIs)
  • Provide rich metadata
  • Register in searchable repositories
  • Use descriptive file names

Accessible

  • Use standard protocols (HTTP, FTP)
  • Ensure long-term availability
  • Provide authentication when needed
  • Keep metadata accessible even if the data are restricted

Interoperable

  • Use standardized formats (CSV, JSON, XML)
  • Apply controlled vocabularies
  • Include format documentation
  • Enable integration with other data

Reusable

  • Provide clear licensing (CC BY, CC0)
  • Include provenance information
  • Meet community standards
  • Document processing steps
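Several of the items above (rich metadata, clear licensing, provenance) can be captured in a small machine-readable record deposited with the data. The sketch below builds such a record with a SHA-256 checksum for each file. The field names are illustrative assumptions and do not follow any particular repository's metadata schema; most repositories collect equivalent information through their own deposit forms.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest in chunks, so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(title, license_id, files, identifier=None):
    """Assemble a metadata record with a size and checksum for each file."""
    return {
        "title": title,
        "identifier": identifier,  # fill in the DOI once the repository assigns one
        "license": license_id,
        "files": [
            {
                "name": Path(p).name,
                "bytes": Path(p).stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        data_file = Path(folder) / "data.csv"
        data_file.write_bytes(b"a,b\n1,2\n")
        record = build_metadata("Example dataset", "CC-BY-4.0", [data_file])
        print(json.dumps(record, indent=2))
```

Checksums let readers confirm that a downloaded file matches the version analyzed in the manuscript.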

Questions About Data Sharing?

Our editorial team can help you determine appropriate data sharing approaches for your research, identify suitable repositories, and navigate ethical or legal constraints.