Version Policy Guidelines
A significant percentage of data submitted to the Ag Data Commons are altered within the first year of being published, or on a regularly occurring basis as data reports are updated. These changes may be initiated by the data producer or the Ag Data Commons administrators. Notifications of changes in data can also be submitted by data users through the contact email listed at the end of this document. The Ag Data Commons has adopted a set of criteria which distinguish between significant and minor changes to a dataset to determine whether a new dataset version is merited, as well as a protocol for creating new dataset versions.
These guidelines ensure that data versioning criteria are consistently applied to changes in datasets, data files, and data documentation (including error corrections, amendments, additional variables, changes in access conditions, and format changes) for inclusion in the Ag Data Commons. Applying them will often involve collaboration between the Ag Data Commons administrators and the data producers.
The Ag Data Commons recommends new versions of datasets for significant changes to the dataset or data resource files contained within.
Significant changes are those that will have a high impact on the use or interpretation of the data, whereas minor changes are those that will have a low impact on interpretation or use for research purposes.
Significant changes include the addition of new variables, corrections to incorrectly supplied or miscoded data, formatting changes, substantial documentation changes, changes in access conditions, changes in authors or primary investigators, and withdrawal of data elements or documentation files. Changes of this nature warrant a new dataset version.
Minor changes such as small changes in variable labels, spelling corrections in metadata, or minor changes in documentation can be made to relevant content in existing datasets. The updated datasets will follow the same review process as a newly created dataset prior to publishing any changes. New dataset versions will not be created for minor changes to data or datasets.
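The significant/minor distinction above can be sketched as a simple lookup. This is only an illustration: the category names and the `needs_new_version` helper are hypothetical labels paraphrased from this policy, not part of any Ag Data Commons tooling, and borderline cases still call for the judgment of the data producer and the administrators.

```python
# Hypothetical change categories, paraphrased from the policy text.
SIGNIFICANT_CHANGES = {
    "new_variables",
    "incorrect_data_corrected",
    "miscoded_data_corrected",
    "format_change",
    "substantial_documentation_change",
    "access_conditions_change",
    "author_or_investigator_change",
    "data_or_documentation_withdrawn",
}
MINOR_CHANGES = {
    "variable_label_change",
    "metadata_spelling_correction",
    "minor_documentation_change",
}


def needs_new_version(changes):
    """Return True if at least one of the listed changes is significant."""
    unknown = set(changes) - SIGNIFICANT_CHANGES - MINOR_CHANGES
    if unknown:
        # Unclassified changes need a human decision, so fail loudly.
        raise ValueError(f"Unrecognized change types: {sorted(unknown)}")
    return any(change in SIGNIFICANT_CHANGES for change in changes)
```

A set of changes that mixes minor and significant items is treated as significant, so a single new variable is enough to call for a new version.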
Give the dataset and resource files unique names when creating new versions. The Ag Data Commons does not require a specific naming protocol, but asks data creators to follow the general advice in the Submission Manual for filling out the Title field. In addition, consider the following version naming best practices:
- Include a version number in the Title field, e.g., "v1," "v2," or "v2.1".
- The numbering protocol should follow a logical and consistent sequence for the entire data series or program.
- The title should remain identical (aside from the version number) for all versions of a particular dataset.
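As a rough illustration of the title practices above, the following sketch checks that a set of titles differs only by version number. It assumes the version tag is the last token of the title (for example "... v2.1"); that placement, the example titles, and the helper names are assumptions made for this example, since the Ag Data Commons does not require a specific naming protocol.

```python
import re

# Assumed convention: the title ends with a tag such as "v1", "v2", or "v2.1".
VERSION_SUFFIX = re.compile(r"\s+v(\d+(?:\.\d+)?)$")


def split_title(title):
    """Split 'Some Dataset v2.1' into ('Some Dataset', '2.1')."""
    match = VERSION_SUFFIX.search(title)
    if match is None:
        raise ValueError(f"No version tag found in title: {title!r}")
    return title[:match.start()], match.group(1)


def same_series(titles):
    """Return True if all titles share one base title and differ only by version."""
    base_titles = {split_title(title)[0] for title in titles}
    return len(base_titles) == 1
```

For instance, `same_series(["Soil Moisture Survey v1", "Soil Moisture Survey v2"])` holds, while changing the wording of either base title would make the check fail.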
Resource File Names:
- Keep file titles consistent from version to version, with the following in mind:
- Major version updates often receive whole numbers (1.0, 2.0, 3.0, etc.) and minor version / draft updates receive decimal numbers (1.1, 1.2, etc.).
- Helpful information to add to a file name includes project or experiment name or acronym, location/spatial coordinates, researcher name/initials, date or date range of experiment, type of data, conditions, and file version number.
- If data files are added to a dataset over time as part of a regularly scheduled update, such as quarterly or monthly, consider titling them as such (Q1, Q2; Jan, Feb; etc.). Outline the update schedule in the dataset Description and note whether the data are unique for each file or cumulative with each new file added (the newest file being the most up-to-date data).
- Avoid confusing labels such as “revision”, “final”, “final 2”, etc. in the file name.
- Follow this advice for both the File Name field as well as the filename of the uploaded resource.
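One way to apply the file naming advice above is sketched below. The underscore-separated pattern, the choice of fields, and both helper names are assumptions for illustration only; any consistent scheme that carries the version number would serve equally well. The label check is deliberately crude and simply rejects names containing "revision" or "final".

```python
import re

# Labels the guidelines advise against; a simple substring check.
CONFUSING_LABELS = re.compile(r"revision|final", re.IGNORECASE)


def bump_version(version, major):
    """Return the next version: '1.2' -> '2.0' (major) or '1.3' (minor).

    Expects a 'major.minor' string such as '1.0'.
    """
    major_part, minor_part = (int(part) for part in version.split("."))
    if major:
        return f"{major_part + 1}.0"
    return f"{major_part}.{minor_part + 1}"


def build_file_name(project, data_type, date_range, version, extension):
    """Assemble a descriptive file name that ends with the version number."""
    name = f"{project}_{data_type}_{date_range}_v{version}.{extension}"
    if CONFUSING_LABELS.search(name):
        raise ValueError(f"Avoid labels like 'final' or 'revision': {name!r}")
    return name
```

With these helpers, `build_file_name("SMS", "yield", "2019-2021", bump_version("1.2", major=True), "csv")` produces `SMS_yield_2019-2021_v2.0.csv`, where "SMS" stands in for a hypothetical project acronym.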
Linkage and Distinctions
When creating a new version of a dataset, record how the dataset and/or resource files have been changed. The following standards / actions will apply:
- Include information about what changes were made in the new dataset version (e.g. "normalized", new processes implemented, temporal or geographic limits extended, new authors added, etc.) in the Description field as well as in the appropriate information fields of the dataset.
- The Ag Data Commons administrators will issue a unique DOI (digital object identifier) to every approved and published dataset, including new versions of a dataset.
- The most recent dataset version in the series should be cloned when creating an updated version. This will keep primary field content identical between versions. Updated description information can then be added and new data resource files can be uploaded.
- The new version of a dataset should link to the previous version(s) through the “Related Content” - “Related To” field. This link can be accomplished automatically through cloning the original dataset version, or by manually linking the old version(s) to the new. The dataset creator is responsible for ensuring all appropriate content is linked to any new dataset versions.
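The clone-and-link workflow described above can be pictured with a small in-memory model. This is not the Ag Data Commons interface: the `DatasetVersion` record, its fields, and `clone_as_new_version` are hypothetical names used only to show which content carries over unchanged and which is updated.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical, simplified stand-in for a dataset record."""
    title: str                       # includes the version number, e.g. "... v1"
    authors: Tuple[str, ...]         # example of a primary field kept on cloning
    description: str
    related_to: Tuple[str, ...] = () # titles of earlier versions in the series


def clone_as_new_version(previous, new_title, change_summary):
    """Copy the latest version, record what changed, and link back to it."""
    return replace(
        previous,
        title=new_title,
        description=f"{previous.description}\n\nChanges in this version: {change_summary}",
        related_to=previous.related_to + (previous.title,),
    )
```

Because each clone starts from the most recent version, the list of related versions accumulates: a third version cloned from the second links back to both of its predecessors.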
Entities affected by this policy
Ag Data Commons content authors, data users, data curators, and data administrators.
If you have questions about specific issues regarding the Ag Data Commons Version Policy, please contact firstname.lastname@example.org