These guidelines ensure consistent application of data versioning criteria to changes in datasets, data files, and data documentation in the Ag Data Commons.
Version Policy Guidelines
Data producers alter a significant percentage of data submitted to the Ag Data Commons within the first year of publication, or on a regularly occurring basis in conjunction with updates in data reports. Data producers or the Ag Data Commons administrators may initiate these changes. Data users may also submit notifications of changes in data through the National Agricultural Library contact form. The Ag Data Commons has adopted a set of criteria that distinguishes significant from minor changes to a dataset and determines when to create a new dataset version, along with a protocol for creating new dataset versions.
These guidelines ensure consistent application of data versioning criteria to changes in datasets, data files, and data documentation (including corrections of errors, amendments, additional variables, changes in access conditions, and format changes) for inclusion in the Ag Data Commons. Applying them will often involve collaboration between the Ag Data Commons administrators and the data producers.
The Ag Data Commons recommends new versions of datasets for significant changes to the dataset or data resource files contained within.
Significant changes comprise those that will result in a high impact on the use or interpretation of the data, whereas minor changes comprise those that will have a low impact in relation to interpretation or use for research purposes.
Significant changes include the addition of new variables, the correction of incorrectly supplied or miscoded data, formatting changes, substantial documentation changes, changes in access conditions, changes in authors or primary investigators, and the withdrawal of data elements or documentation files. Changes of this nature warrant a new dataset version.
Minor changes include small changes in variable labels, spelling corrections in metadata, and minor changes in documentation. Users may make such changes directly to the relevant content in existing datasets. Updated datasets go through the same review process as newly created datasets before any changes are published. Users should not create new dataset versions for minor changes to data or datasets.
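As an illustration of the significant/minor distinction described above, the following minimal Python sketch decides whether a set of changes calls for a new dataset version. The change-type labels and the function name `needs_new_version` are hypothetical, paraphrased from the criteria in this policy. They are not part of any Ag Data Commons tool.

```python
# Hypothetical labels paraphrasing the policy's examples of significant changes.
SIGNIFICANT_CHANGES = {
    "new_variables",
    "incorrect_data_corrected",
    "miscoded_data_corrected",
    "format_change",
    "substantial_documentation_change",
    "access_conditions_change",
    "author_or_pi_change",
    "withdrawal_of_data_or_documentation",
}

# Hypothetical labels paraphrasing the policy's examples of minor changes.
MINOR_CHANGES = {
    "variable_label_change",
    "metadata_spelling_correction",
    "minor_documentation_change",
}


def needs_new_version(changes):
    """Return True if at least one listed change is significant."""
    unknown = set(changes) - SIGNIFICANT_CHANGES - MINOR_CHANGES
    if unknown:
        # Unrecognized changes need a human decision rather than a guess.
        raise ValueError(f"Unrecognized change types: {sorted(unknown)}")
    return any(change in SIGNIFICANT_CHANGES for change in changes)


print(needs_new_version(["metadata_spelling_correction"]))
print(needs_new_version(["metadata_spelling_correction", "new_variables"]))
```

A single significant change is enough to call for a new version, even when it arrives together with minor ones. Anything outside the two lists raises an error so that it can be reviewed by a person.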
Give the dataset and resource files unique names when creating new versions. The Ag Data Commons does not require a specific naming protocol, but asks data creators to follow the general advice in the Submission Manual for filling out the Title field. In addition, consider the following version naming best practices:
- Include a version number in the Title field, e.g., "v1," "v2," or "v2.1".
- The numbering protocol should follow a logical and consistent sequence for the entire data series or program.
- The title should remain identical (aside from the version number) for all versions of a particular dataset.
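The title practices above can be checked mechanically. The sketch below assumes the version tag is the last token of the title (for example "... v2.1"). The policy does not mandate that position, and the helper names `split_title` and `same_series` and the sample titles are hypothetical.

```python
import re

# Assumed convention: the title ends with a version tag such as "v1" or "v2.1".
_VERSION_RE = re.compile(r"^(?P<base>.+?)\s+v(?P<version>\d+(?:\.\d+)?)$")


def split_title(title):
    """Split a title into (base title, version string)."""
    match = _VERSION_RE.match(title.strip())
    if match is None:
        raise ValueError(f"No trailing version tag found in title: {title!r}")
    return match.group("base"), match.group("version")


def same_series(titles):
    """Return True if all titles share one base title and differ only by version."""
    bases = {split_title(title)[0] for title in titles}
    return len(bases) == 1


# Hypothetical example titles.
titles = [
    "Soil Moisture Survey v1",
    "Soil Moisture Survey v2",
    "Soil Moisture Survey v2.1",
]
print(split_title(titles[2]))
print(same_series(titles))
```

A check like this could be run before submission to catch a title that drifted between versions.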
Resource File Names:
Keep file titles consistent from version to version while keeping the following in mind:
- Major version updates often receive whole numbers (1.0, 2.0, 3.0, etc.) and minor version / draft updates receive decimal numbers (1.1, 1.2, etc.).
- Helpful information to add to a file name includes the project or experiment name or acronym, location or spatial coordinates, researcher name or initials, date or date range of the experiment, type of data, conditions, and version number of the file.
- If data files are added to a dataset over time as part of a regularly scheduled update, such as quarterly or monthly, consider titling them as such (Q1, Q2; Jan, Feb; etc.). Outline the update schedule in the dataset Description and note whether the data is unique for each file or cumulative with each new file added (so that the newest file holds the most up-to-date data).
- Avoid confusing labels such as “revision”, “final”, “final 2”, etc. in the file name.
- Follow this advice for both the File Name field as well as the filename of the uploaded resource.
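The file-naming advice above can be sketched as two small helpers: one bumps a version number and one assembles a file name. The underscore-separated layout, the "major.minor" version string, and the names `next_version` and `build_file_name` are assumptions made for illustration. The Ag Data Commons does not require a specific naming protocol.

```python
# Labels the policy advises against; matched as simple substrings (a crude check).
CONFUSING_LABELS = ("revision", "final")


def next_version(current, major):
    """Bump an assumed 'major.minor' version string, e.g. '1.2'."""
    major_part, minor_part = (int(part) for part in current.split("."))
    if major:
        # Major updates receive the next whole number.
        return f"{major_part + 1}.0"
    # Minor or draft updates increment the decimal part.
    return f"{major_part}.{minor_part + 1}"


def build_file_name(project, data_type, date_range, version, extension):
    """Join descriptive parts and the version into one file name."""
    stem = "_".join([project, data_type, date_range, f"v{version}"])
    lowered = stem.lower()
    for label in CONFUSING_LABELS:
        if label in lowered:
            raise ValueError(f"Avoid the label {label!r} in file names")
    return f"{stem}.{extension}"


# Hypothetical project acronym, data type, and date range.
version = next_version("1.2", major=True)
print(build_file_name("SMS", "soil-moisture", "2019-2021", version, "csv"))
```

The same generated name can be used for both the File Name field and the uploaded file, which keeps the two in step.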
Linkage and Distinctions
When creating a new version of a dataset, record the changes in the dataset and/or resource files. The following standards / actions apply:
- Include information about the changes to the new dataset version (e.g. "normalized", new processes implemented, temporal or geographic limits extended, new authors added, etc.) in the Description field as well as in the appropriate information fields of the dataset.
- Clone the most recent dataset version in the series when creating an updated version. This will keep primary field content identical between versions. Add updated description information and upload a new data resource file.
- The new version of a dataset should link to the previous version(s) through the “Related Content” - “Related To” field. This link is created automatically when the original dataset version is cloned, or it can be created manually by linking the old version(s) to the new one. Responsibility for ensuring links to all appropriate content and to any new dataset versions lies with the dataset creator.
- The Ag Data Commons administrators issue a unique DOI (digital object identifier) to every approved and published dataset, including new versions of a dataset, according to the outlined DOI Policy.
Entities affected by this policy
Ag Data Commons content authors, data users, data curators, and data administrators.