This submission manual provides practical information for contributors to the Ag Data Commons data repository and registry. The manual can be accessed as a single, printer-friendly document, or in chapter-by-chapter view for convenient on-screen viewing:
User Edition v1.3
Prepared March 2016
Knowledge Services Division | National Agricultural Library
10301 Baltimore Ave | Room 207
Beltsville, Maryland 20705
Contact: NAL-ADC-Curator@ars.usda.gov
Research datasets and tools that relate to those datasets have value of their own, regardless of whether they are associated with published peer-reviewed literature. This means they can have a different set of authors or publishers, their own digital object identifier (DOI), and their own unique description. The presence or absence of peer-reviewed papers that describe them does not affect a dataset’s qualification to be hosted on Ag Data Commons (henceforth referred to as ADC).
On the ADC, a “dataset” refers to the entire metadata record for one or more resources on the ADC; the dataset refers to information entered on the “Edit dataset” page. A “resource” refers to the data files you upload or link to on the “Add data” page. This may be a spreadsheet, database, photographic collection, software tool, or other resource that you wish to make available for the purpose of advancing scientific knowledge. Again, multiple resources may be included as part of a single dataset, depending on how you wish to define and describe your data.
Before proceeding, determine whether your data is suitable for inclusion on the Ag Data Commons.
Was the research behind your data all or partially funded by the USDA?
a. If YES, continue.
b. If NO, your data is unsuitable for the ADC at this time. As the ADC grows, we may consider expanding to include data with other sources of funding.
Is your data able to be freely distributed and re-used by others?
a. If YES, continue.
b. If NO, your data is unsuitable for the ADC at this time. We host and support open data to facilitate sharing and reuse.
c. If you are unsure, please see the section on determining licenses on the Description of fields on the "Edit Dataset" page.
The Ag Data Commons requires inclusion of the full and complete data cited in papers or used to create any charts, graphs, tables, and so on. Do you have the ability to publish the full, complete data, or link to the complete data if it is published elsewhere?
a. If YES, continue.
b. If NO, your data is unsuitable for the ADC at this time. We gladly accept supplementary materials resulting from the data (e.g. tables, charts, figures), but in order to comply with the ADC mission, we also need access to the full original data that the figures were derived from.
c. If NO, but the data will be available in the future: you can create a public record as a placeholder before the data is made publicly available. The ADC allows an embargo of up to 3 years on a data resource that is not immediately available for public viewing. See “Scheduling Options” in the Description of fields on "Add Data" page for more information.
Video tutorial: Register for an account on the Ag Data Commons
Select “Register” at the upper right corner of the ADC home page.
Enter your credentials, and select “Create new account” at the bottom of the page. An ADC curator will be notified of your account request.
Once your account has been approved (which will usually be within 5 business days), you will receive an email with details on how to complete your account registration. You must complete the registration process before you can submit data.
Decide how you want to organize your data: Do you have multiple unique datasets or only one dataset containing multiple files of data? As the data submitter, you are best suited to make this decision. Try to imagine yourself as a user of your data, and ask yourself what the most useful grouping would be.
You would create distinct ADC records if:
There are enough differences among your data files that they require unique descriptions of methods and data dictionaries (i.e. lists of the measurements included in the files)
The datasets will likely be reused independently, and should have unique citations
The authorship varies from dataset to dataset
You would keep all the data under a single record if:
It would be confusing and redundant to generate several unique ADC records where one will suffice
Video tutorial: Create a dataset on the Ag Data Commons
The process to add a new dataset record to the ADC consists of two main forms: the "Create Dataset" form, where you will add details about the overall dataset, and the "Add data" form, where you will upload and describe one or more specific data resources that you choose to include with your dataset entry. To create a new dataset record, use the following steps as a guide:
Select “Log in” at the upper right corner of the ADC home page if you are not already logged into the ADC.
Select “Submit Dataset” at the upper right of the screen. Now you should be on the “Create Dataset” form.
Note that the gray rows toward the end of the page (such as “Purpose and Methods” and “Temporal Information”) are expandable. Select each header to view the full range of fields ADC has for describing your data.
Video tutorial: Add a data resource to your dataset on the Ag Data Commons
If you choose the latter, you should be on the “Add data” form, and are ready to submit your data files and resources.
Upload or link to your data files and any related content. Make sure they are named and formatted the way in which you want them to appear. See the Description of fields on "Add Data" page for detailed explanations of each field. See the Guideline for remote data resources and related content page of this manual for more information on appropriate resource files to upload or link.
If you are adding only one resource at this time, select “Save” at the bottom of the page when finished. If you want to begin adding another resource immediately, select “Save and add another” at the bottom of the page instead; this saves your changes and lets you add an additional resource.
You will repeat this "add data" process and create an additional resource for each data file you upload or each API you link. You can create multiple data resources within each dataset, allowing you to describe each resource more accurately. If you would like to include multiple files in a single downloadable resource, you may upload a single zip file containing multiple data files. If you choose this method of adding your data, please describe the contents of the zip file(s) thoroughly since users will not be able to preview zipped files to the same degree as individual files.
Remember to upload your data dictionary as a resource, and check the box titled “Make this resource the data dictionary” near the center of the page. See our Data Dictionary Guidelines toward the end of this document for more information on preparing data dictionaries.
You can always return to your dataset to add data and other resources at a later time by using the "+ Add Resource" button at the top left of the dataset page.
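The advice above about describing zip file contents can be partly automated. The following is a minimal sketch, assuming you prepare the archive yourself before uploading; the file names and contents are hypothetical, and the helper functions are illustrative, not part of the ADC. It packs several files into one zip and produces a plain-text listing that could be pasted into the resource description.

```python
import io
import zipfile


def build_zip(files):
    """Pack a {file name: bytes} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def describe_zip(zip_bytes):
    """Return one 'name (size bytes)' line per file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            f"{info.filename} ({info.file_size} bytes)"
            for info in archive.infolist()
        ]


# Hypothetical file names and contents, for illustration only.
example_files = {
    "soil_moisture_2015.csv": b"site,date,moisture\nA,2015-06-01,0.21\n",
    "data_dictionary.csv": b"column,description\nmoisture,volumetric water content\n",
}
manifest = describe_zip(build_zip(example_files))
print("\n".join(manifest))
```

A listing like this only names the files; the description should still explain what each file contains.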
At this point, you should also decide whether you want your data resource(s) to be published immediately, or if you would like to place an embargo period on the data. An embargo would create a placeholder for your data resource, which will automatically be published when the embargo expires. We allow for resources to be embargoed for up to 3 years. See “Scheduling Options” on the Description of fields on “Add Data” page for more information.
To embargo your data:
Create a new resource for your dataset (see instructions above) and enter a title and description of the data you expect to publish. Upload a data file to accompany this resource record.
Scroll to the bottom of the page above the "Save" button and click "Scheduling options"
Enter the date you would like to make your data publicly available and a reason for the embargo period
When the dataset is published, a message regarding the embargo will appear to anyone interested in your data
When you view your saved datasets/resources on the ADC Datasets page(s), they will be highlighted in pink. The highlight indicates only that they are unpublished (see the example of an unpublished dataset, Transforming Drainage, in the image below).
Only you can view unpublished datasets you have created while logged into your user account. In order for your data to be viewed by everyone, you must first submit the dataset for review.
Ensure that remotely stored and linked resources are indeed data and not just links to web pages or articles. The Ag Data Commons can accommodate many types of related and linked data, but it is important to include each type of information in the appropriate fields. Use the following guideline to determine where information should be included.
Data should be included as a Resource. Data are materials like tabular data, tools, images, etc. that the user can download and use.
Do not include previously published data as direct upload resource. Link to the published version of that data instead. DOIs will not be issued by the Ag Data Commons unless data is uploaded locally and originally published by the Ag Data Commons.
Any remote materials that are not data can be included in the appropriate category:
Articles should be included in the Citations field (Primary, Related, Methods).
URLs should be included in the Related Content field.
If a piece of supplemental material is not data but is a direct upload, verify that it belongs with and adds value to the data. As long as the item is not published elsewhere, it can be added as a resource if it is a suitable supplement to the data.
Video tutorial: Clone a dataset on the Ag Data Commons
There are many cases where a content author wants to submit an updated version of a dataset already in the Ag Data Commons, or wants to upload a similar record to one that has already been created. If many similarities exist between an existing record and one not yet created, content authors can choose to clone one of their existing datasets as opposed to creating the dataset from scratch. As a new version, the cloned dataset has slight differences in metadata and will receive a new DOI if applicable, but most of the metadata between the original and new datasets is the same.
Log in with your user account
Navigate to an existing dataset you have created whose metadata is almost exactly the same as the dataset you want to create
Click on the “Clone Dataset” button at the top of the record
You are now on the confirmation page. Click on the “Clone” button to create the new dataset
A new dataset is created pre-populated with the metadata from the original dataset, but not including any resources
You can then change any of the fields for the newly created dataset
The new dataset automatically includes a pointer to the old dataset
Video tutorial: Submit your dataset draft for review on the Ag Data Commons
Once you have finished editing your dataset record and uploaded or linked all your data and resources, you must save and submit the dataset and data resource drafts for review so that they can be approved and published. Note that each dataset AND data resource you create must be submitted for review separately. You can submit your datasets and data resources for review in one of two ways.
This method makes the most sense if you want to submit multiple datasets for review at a time, or are submitting a dataset that does not need further editing.
To submit a single dataset, find the dataset in this list that you wish to submit for review, and in the right hand column click the “Submit for Review” button
To choose more than one dataset to submit for review, click the checkbox to the left of all datasets you would like to submit, and then click the “Submit for Review” button at the top of the list
To submit all of your drafts for review at once, click “Select all items on this page” and, when a checkbox appears next to every dataset, click the “Submit for Review” button at the top of the list
Or...
This method makes the most sense if you are submitting a dataset for review immediately after you finish editing it. Note that the dataset or data resource draft must be saved at least once before it can be submitted for review.
Click on the dataset you would like to edit and submit for review (either from the “Datasets” page view or “My Workbench” view)
Edit the dataset as needed
When finished editing, click on either "Revision information" / “Moderation State” at the bottom of the page above the “Save” button, or on “Moderate” in the menu bar at the top of the page
In the dropdown menu, select the moderation state “Needs Review”
Click “Save” or “Apply”, respectively, to move the dataset into the queue for moderation
An ADC curator will now review your dataset and either approve it for publication or inform you of any changes that must be made prior to publishing. You will receive an email notification when the status of your dataset changes in any way. The ADC reserves the right to refuse publication of a dataset for any reason.
Once you submit your dataset for review, an ADC curator will be notified. The curator will:
Review your metadata and contact you if further information is needed to ensure the data is sufficiently described
Add ADC and National Agricultural Library Thesaurus (NALT) keywords to enhance retrievability, making your dataset easier to find in the repository
Obtain a DOI if your dataset does not already have one
Contact you once your data goes live on ADC so that you can review and approve the final record
If you need to add data or other resources to your dataset record at a later date, you can return to your record and select “Add Resource” at the top of the Dataset page. To edit datasets or resources at a later date, select the dataset or resource title, then select “Edit” at the top of the page.
Note that you can always edit your dataset, even after you submit it for review and it is in either the Needs Review or Needs Supervisor Review boxes, as well as after it is published. Just remember to re-submit the dataset or data resource for review when you are finished editing it so the changes can be approved and published.
Fields marked with an asterisk are required. All other fields are optional. If N/A, leave the field blank.
Video tutorial: Create a dataset on the Ag Data Commons
Title* : Enter a descriptive dataset title as you want it to appear; include dates, locations, and specific metrics that make your dataset unique. If the data is from a Primary Article Citation (see below), use the naming convention "Data from: title of article"
Description* : A rich free text description that provides as much explanation as possible about the dataset: how and why it was generated, and how it should (or should not) be used. This can be modified from article text (e.g. Abstract, Methods, Objectives), but should focus on characterizing the data, not the journal article.
Please provide explanations for all acronyms and abbreviations. Get more guidance on filling out the Description field.
Summary: A shorter description of the dataset, usually no more than a sentence or two. This information will appear in the main dataset list as a teaser line under the Title to briefly communicate the contents and purpose of your dataset. (Note: You can toggle between the Summary and Description boxes by clicking the "Edit Summary" and "Hide Summary" links - both boxes will not be visible in the submission form at the same time.)
Author *:
Name *: Enter the Last name, First name, Middle initial followed by a period (e.g. Doe, John A.) of all persons involved in the data collection. Authors can be different from those listed in a primary/related article (or presented in a different order); multiple allowed.
Identifier Kind: Select from the drop-down list to select the unique author identifier kind, if applicable. For example, ORCID, ResearcherID, etc.
Identifier: Enter the unique author identifier here, if applicable. For example, an ORCID identifier should be entered as 1234-5678-9123-4567.
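Before entering an ORCID identifier, it can be useful to check that it is well formed. The sketch below is not part of the ADC; it assumes the standard ORCID structure of sixteen characters in four hyphen-separated groups, where the last character is an ISO 7064 MOD 11-2 check digit (0-9 or X). The function names are illustrative.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier):
    """Return True if the identifier has the ORCID shape and a correct check digit."""
    if ORCID_PATTERN.fullmatch(identifier) is None:
        return False
    digits = identifier.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A passing check only shows that the identifier is well formed; it does not confirm that it belongs to the named author.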
Dataset DOI (digital object identifier): DOI for the dataset, not the journal article that may be referencing it. If a DOI does not already exist and you are uploading data directly to the Ag Data Commons (as opposed to linking to externally hosted data), the Ag Data Commons will obtain one for you. See our section on opting out of this option if you do not want a DOI created for your dataset.
ISO Topic: High-level subject categorization, also referred to as ISO Topic Categories. Select one or more from the drop-down list; multiple allowed.
Product Type: This field automatically defaults to Dataset. However, it may be changed to better describe the main or most important part of the dataset's actual content.
Video tutorial: Filling out the Purpose and Methods section of the Ag Data Commons data submission form
Intended Use: Explain the intended use and benefits of the dataset. What purpose do you expect the data to serve? For example, precipitation data may be collected to study patterns of groundwater recharge, to validate watershed models, etc; Life-cycle assessment (LCA) data may be intended for a wide range of impacts in private or public use and/or for product comparisons, etc.
Use Limitations: Explain the limitations regarding the dataset's usability. For example, estimates may be biased over water, equipment may have malfunctioned during a specified time, granularity may mean it is unsuitable for certain kinds of analysis.
Equipment or software used: Name the equipment and software used to collect and process the data. Provide make and model, name and version number, and a stable URL for each tool used to collect and process the data.
Video tutorial: Filling out the Geographic section of the Ag Data Commons data submission form
State or Territory: Select as many as are applicable from the drop-down list of states; multiple allowed.
Spatial Description: This free text can be an address, city, state, region, or other spatial description. Geonames are recommended but not required.
Global Map: You may also use the interactive map (projected in WGS84) to indicate where your data were collected. See the left side of the map for buttons to manipulate data input. These features can be used exclusively or in combination to enter multiple points, polygons, or bounding boxes to a single record if applicable.
The + and - enable you to zoom in or out, depending on the level of geographic detail you wish to represent. If you have global data, you may zoom out as much as necessary to indicate data were collected across multiple countries.
Select the third button on the left to enter one or more polygons
Select the fourth button to enter one or more bounding boxes
Select the fifth button to drop one or more points on the map
Select the tabs at the top of the map to enter data in other ways. The GeoJSON tab enables you to enter raw GeoJSON data. GeoJSON is a technical standard that will appeal to GIS enthusiasts. See DCAT spatial/geographical coverage for more info.
Select the last tab on the right, Points, to manually input one or more points in decimal degrees. This is recommended if you know the exact point where data were collected. You can find the exact coordinates of an address or location at a variety of sites, including http://www.gps-coordinates.net/
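For the GeoJSON tab, the raw text can be generated from decimal-degree coordinates rather than typed by hand. The following is a minimal sketch using only the Python standard library. The coordinates are illustrative, the function names are hypothetical, and whether the ADC form expects a FeatureCollection or a bare geometry is an assumption that should be checked against the form itself.

```python
import json


def point_feature(latitude, longitude):
    """Build a GeoJSON Point feature from decimal degrees (WGS84)."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("latitude must be in [-90, 90], longitude in [-180, 180]")
    # GeoJSON positions are ordered [longitude, latitude].
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
    }


def bounding_box_feature(south, west, north, east):
    """Build a GeoJSON Polygon feature covering a rectangular bounding box."""
    # The ring is closed (first position repeated last) and runs counterclockwise.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Illustrative coordinates only (roughly the Beltsville, Maryland area).
collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(39.03, -76.91),
        bounding_box_feature(38.9, -77.1, 39.1, -76.8),
    ],
}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Note the coordinate order: the Points tab asks for decimal degrees, while GeoJSON always lists longitude before latitude.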
Video tutorial: Filling out the Temporal section of the Ag Data Commons data submission form
Temporal Coverage: The span of time during which data were collected. Add temporal coverage in one of the following formats (start date/end date; if data collection is ongoing, leave the end date blank):
YYYY-MM-DD/YYYY-MM-DD
YYYY-MM/YYYY-MM
YYYY/YYYY
See DCAT temporal coverage for more info.
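The three formats above can be checked mechanically before submission. This is a minimal sketch, not the ADC's own validation: it assumes that the start and end parts use the same precision, that the start must not fall after the end, and that ongoing collection is written as a start date followed by "/" with nothing after it. That last representation is an assumption made for illustration.

```python
from datetime import datetime

# Map the length of each accepted date form to its strptime pattern.
_FORMATS = {10: "%Y-%m-%d", 7: "%Y-%m", 4: "%Y"}


def _parse_part(text):
    """Parse one side of the range; raise ValueError if it is not an accepted form."""
    pattern = _FORMATS.get(len(text))
    if pattern is None:
        raise ValueError(f"unsupported date form: {text!r}")
    return datetime.strptime(text, pattern)


def is_valid_coverage(value):
    """Return True for 'start/end' ranges in YYYY-MM-DD, YYYY-MM, or YYYY form."""
    start, separator, end = value.partition("/")
    if not separator:
        return False
    try:
        start_date = _parse_part(start)
        if end == "":
            return True  # assumed notation for ongoing data collection
        if len(end) != len(start):
            return False  # both sides should use the same precision
        return start_date <= _parse_part(end)
    except ValueError:
        return False


for sample in ["2010-01-01/2015-12-31", "2010-01/2015-12", "2010/2015", "2015/2010"]:
    print(sample, is_valid_coverage(sample))
```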
Frequency: The frequency with which the dataset is published, e.g. None, Daily, Weekly, Monthly, Annually, Continuously, Irregularly, Decennial - R/P10Y, Quadrennial - R/P4Y, Bimonthly - R/P2M, etc. For example, data with a publish frequency repeating once every 10 years would be designated Decennial - R/P10Y. See DCAT frequency for more info.
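Codes such as R/P10Y are ISO 8601 repeating intervals: "R" means the interval repeats, and "P10Y" is a period of ten years. The sketch below spells out simple codes of this kind in words. It is illustrative only and handles just whole-number year, month, week, and day periods like the examples above; it is not a full ISO 8601 parser.

```python
import re

_FREQUENCY_PATTERN = re.compile(r"R/P(\d+)([YMWD])")
_UNIT_NAMES = {"Y": "year", "M": "month", "W": "week", "D": "day"}


def describe_frequency(code):
    """Turn a simple repeating-interval code such as 'R/P10Y' into plain words."""
    match = _FREQUENCY_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a simple repeating-interval code: {code!r}")
    count = int(match.group(1))
    unit = _UNIT_NAMES[match.group(2)]
    plural = "" if count == 1 else "s"
    return f"repeats every {count} {unit}{plural}"


for code in ["R/P10Y", "R/P4Y", "R/P2M"]:
    print(code, "->", describe_frequency(code))
```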
Video tutorial: Filling out the Citations section of the Ag Data Commons data submission form
Primary Article
Full Citation: Enter the full bibliographic citation, in APA, to a published article that directly describes this dataset (i.e. a data paper). Leave blank if there is no primary article connected to this dataset.
Article DOI: Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"
PubAg AGID: If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.
Methods Citation
Full Citation: Enter the full bibliographic citation, in APA, to a published article that describes the procedures for data assembly in greater detail; multiple allowed.
Article DOI: Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"
PubAg AGID: If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.
Related Article
Full Citation: Enter the full bibliographic citation, in APA, to a published article that is related to this dataset. Use your discretion to list only the most useful and relevant one(s); multiple allowed.
Article DOI: Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"
PubAg AGID: If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.
Preferred Dataset Citation: If you have a specifically formatted citation for this dataset that you want others to use, enter it here. This will ensure your dataset is cited to your satisfaction if/when it is re-used. If this field is left blank, a standard dataset citation will be constructed for you.
Metadata Sources: If you are using metadata that has already been prepared for this dataset, indicate your sources of metadata here.
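The Article DOI and PubAg AGID fields above both expect a bare identifier, not a URL. The following is a minimal sketch of how the values could be extracted from text copied out of a browser. The helper names are hypothetical, and the DOI pattern is deliberately loose so that it accepts the manual's own placeholder "10.123/ABC/123".

```python
import re
from urllib.parse import urlparse


def bare_doi(value):
    """Strip a 'doi:' prefix or a doi.org resolver URL and return the bare DOI."""
    text = value.strip()
    text = re.sub(
        r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", text, flags=re.IGNORECASE
    )
    if re.fullmatch(r"10\.\d+/\S+", text) is None:
        raise ValueError(f"does not look like a DOI: {value!r}")
    return text


def pubag_agid(url):
    """Return the digits that follow '/catalog/' in a PubAg article URL."""
    parsed = urlparse(url.strip())
    match = re.fullmatch(r"/catalog/(\d+)/?", parsed.path)
    if parsed.netloc != "pubag.nal.usda.gov" or match is None:
        raise ValueError(f"not a PubAg catalog URL: {url!r}")
    return match.group(1)


print(bare_doi("doi:10.123/ABC/123"))
print(bare_doi("http://dx.doi.org/10.123/ABC/123"))
print(pubag_agid("https://pubag.nal.usda.gov/catalog/61025"))
```

The "Cites other datasets" field works the other way around: it asks for the DOI as a full URL, so the bare form should not be used there.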
Video tutorial: Filling out the Related Content section of the Ag Data Commons data submission form
Is Part Of: Is this dataset part of a larger dataset on the Ag Data Commons? If so, type the first letters of the container dataset’s title, then select from the autocomplete suggestions.
Cites other datasets
Full Citation: Provide the full citation(s) for datasets that were used to construct this one. This is particularly relevant for datasets that are aggregations or adaptations of other data.
DOI or other link: Provide the cited dataset's DOI as a full URL, e.g. "http://dx.doi.org/10.123/ABC/123". If there is no DOI, provide another link to the dataset.
Related To: Provide a name and URL to a resource that provides additional context to the dataset, e.g. a project website, manual, or data documentation. This field is not to denote relationship among datasets, but relationships among the dataset and non-dataset resources; multiple allowed.
Video tutorial: Filling out the Contact information section of the Ag Data Commons data submission form
Contact Name: Enter a long-term contact person for the dataset. This can be different from the dataset author/originator e.g. a data curator or administrator. See Project Open Data for more info. Format as Last name, First name.
Contact Email: Contact person’s email address.
Reviewer: Name of person other than author who has reviewed and approved the data (not the metadata); multiple allowed.
Publisher: Publisher of the dataset. This will be displayed in the dataset citation Ag Data Commons suggests to users of your data and is required to obtain a DOI. The field will default to “Ag Data Commons” unless you change it.
Video tutorial: Filling out the Keywords section of the Ag Data Commons data submission form
User-supplied Tags: Free text keywords to facilitate discovery of the dataset. This field is not part of a controlled vocabulary, and is intended to capture meaningful tags to support search and browse; multiple allowed.
Program: Choose from the hierarchy to select a program designation, if applicable. These programs are specific to the National Agricultural Library.
Video tutorial: Filling out the Administrative section of the Ag Data Commons data submission form
License: Enter the license assigned to this dataset; for federally generated data, this should be a Public Domain dedication such as CC Zero (Creative Commons CC Zero) or US Public Domain. Unsure which license to choose? Walk through the following questions to help determine the right one. The following is based on the latest guidance from the Federal CIO council.
1. Are all the data owners federal employees who generated the data as part of their work?
a. If YES, then the license recommendation is CC Zero (Creative Commons CC Zero).
b. If YES, but you are concerned about international public domain, then the license recommendation is US Public Domain.
c. If NO, see #2.
2. If not all data owners are federal government employees, were the data generated under a grant, collaborative agreement, or other contract agreement?
a. If YES, check the terms of that agreement for a “rights in data” or related clause. Use the licensing information specified there.
b. If YES, but no specific license is mentioned in your agreement, see #3.
3. If your licensing agreement allows non-federal data owners to choose a license, the license recommendation is CC Zero (preferred), CC BY (Creative Commons Attribution), or CC BY SA (Creative Commons Attribution-ShareAlike). License definitions and additional information can be found at http://opendefinition.org/licenses/ or https://creativecommons.org/licenses/
4. If your data was generated with no US federal funding, then it is unsuitable for the Ag Data Commons at this time. As the Ag Data Commons grows, we may consider expanding to include data from externally funded researchers. If you have questions or need assistance, email us at NAL_ADC_Curator@nal.usda.gov.
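The license questions above form a small decision tree, which can be written out as follows. This is an illustrative sketch of the guidance as stated in this manual, not an official tool. The function and parameter names are hypothetical, and the sketch raises an error for the one combination the questions do not address (federally funded, not all owners federal employees, no agreement).

```python
def recommend_license(
    all_owners_federal,
    international_concern=False,
    under_agreement=False,
    agreement_license=None,
    us_federal_funding=True,
):
    """Return recommended licenses, most preferred first; [] means unsuitable."""
    # Question 1: all data owners are federal employees acting as part of their work.
    if all_owners_federal:
        return ["US Public Domain"] if international_concern else ["CC Zero"]
    # Question 2: data generated under a grant, collaborative agreement, or contract.
    if under_agreement:
        if agreement_license:
            return [agreement_license]  # follow the agreement's "rights in data" clause
        # Question 3: the agreement leaves the choice to the data owners.
        return ["CC Zero", "CC BY", "CC BY SA"]
    # Final case: no US federal funding means the data is unsuitable at this time.
    if not us_federal_funding:
        return []
    raise ValueError("this combination is not covered by the manual's questions")


print(recommend_license(all_owners_federal=True))
print(recommend_license(all_owners_federal=False, under_agreement=True))
```

If the result is unclear for your situation, the curator contact given above remains the authoritative route.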
Video tutorial: Choosing a license for your data on the Ag Data Commons
Public Access Level: Enter the degree to which this dataset could be made publicly available, regardless of whether it has been. See Project Open Data for more info.
Bureau Code: Please specify one or more bureau codes. For example USDA-Agricultural Research Service (ARS) funded research should be “005:18”. See Project Open Data for more info.
Program Code: Please specify one or more program codes. ARS and National Institute of Food and Agriculture (NIFA) funded research should be “005:037 - Research and Education”. See Project Open Data for more info.
Funding Source(s): Indicate all sources of funding for the dataset. Include the institution name and project number, if available; multiple allowed. If this is an ARS or NIFA funded project, put the CRIS number in the Project or grant number field.
ARIS Log Number: This only applies to datasets created by Agricultural Research Service employees. Enter the ARIS log number, if applicable.
Resources: allows you to attach resources to your dataset that are already uploaded to other datasets
Highlight Image: This image should be an informative and attractive picture or graphic, and may be used to accompany your dataset. Files must be less than 4 GB, and in one of the following file types: .png .gif .jpg or .jpeg. Ag Data Commons curators reserve final approval for all image selections.
Revision log message: Provide an explanation of the changes you are making. This will help other authors understand your motivations.
Moderation state: Set the moderation state for this content - this is one place content authors can submit their draft for review.
Video tutorial: Submit your dataset for review on the Ag Data Commons
Include the following in the narrative Description field of your dataset, if applicable:
Even though some of the following have dedicated fields that allow for greater detail, we encourage summarizing them in the dataset Description field as well
Make sure the Description describes the data, not the project or article
Description of the experiment setting: location, influential climatic conditions, controlled conditions (e.g. temperature, light cycle)
Processing methods and equipment used
Study date(s) and duration
Study spatial scale (size of replicates and spatial scale of study area)
Level of true replication
Sampling precision (within-replicate sampling or pseudoreplication)
Level of subsampling (number and repeat of within-replicate sampling)
Study design (before–after, control–impacts, time series, before–after-control–impacts)
Description of any data manipulation, modeling, or statistical analysis undertaken
Description of any gaps in the data or other limiting factors
Outcome measurement methods and equipment used
Data description pointers are based on information from the following publication:
Haddaway, N. & Verhoeven, J. (2015). Poor methodological detail precludes experimental repeatability and hampers synthesis in ecology. Ecol Evol, 5(19), 4451-4454. http://dx.doi.org/10.1002/ece3.1722
Video tutorial: Add a data resource to your dataset on the Ag Data Commons
Fields marked with an asterisk are required. All other fields are optional. If N/A, leave the field blank.
File Name*: Enter a descriptive resource title as you want it to appear; if uploading multiple resources, include information that distinguishes this one from the others you will provide. It is particularly helpful to include the format of the resource in the title and it is not necessary to repeat details that are in the title of the dataset.
Option to Link to a file, Link to an API, or Upload a File:
Upload a file: If you are uploading a file, multiple files can be dragged into the interface window at once. Or, select “+ Add Files” in the lower left corner of the dialog box. Remember to select “Start upload.”
If your file is in csv or text format, select the delimiter used, if applicable.
Please note, the Ag Data Commons does not accept executable files of any type.
Link to a file: Enter a link to one of the following file formats: csv, html, xls, json, xlsx, doc, docx, rdf, txt, jpg, png, gif, tiff, pdf, odf, ods, odt, tsv, geojson, or xml.
Link to an API: Link to an API if you want to link to a web page.
File Format: Specify format of the resource (e.g. CSV, HTML, XML, etc.). All resources linked to from an API should be designated as HTML regardless of the type of resource found at that link.
Description: Provide a detailed description of your resource. For example, if it is an Excel file with multiple tables, describe what the different tables contain. Imagine that you are a novice user: what information would you need to make sense of this resource?
Text Format: This refers to the text format of the description block. Although it should not be necessary, you can change the format to suit your preference.
“Make this resource the data dictionary” checkbox: Check this if the resource you are uploading is the data dictionary. See the sections on data dictionaries in this document for more information.
Recommended Software: Provide a name, version number, and stable URL for software tools recommended to view or run this resource.
Dataset: This field will auto-populate with the title of the dataset associated with this resource. Do not edit or add to this field.
Weight: This field determines the order in which the resources will be displayed on the dataset page. The lowest value will be displayed first.
Described by: This field is restricted to ADC curators. Submitters should leave this field blank.
URL Path Settings: A URL will automatically be generated for this metadata record unless you uncheck the “Generate automatic URL alias” box; once it’s unchecked, you may enter a customized URL if you wish.
Revision Information: If this is a revision to a previously uploaded file, check the box and provide an explanation as to why you are revising the submission.
Scheduling Options: If you would like to place an embargo on the publication of your resource, enter the date you would like it to be published here. Format is YYYY-MM-DD. You may enter a date up to three years in the future. If you would like your data resource to be published immediately upon curator approval, with no embargo, leave this field blank.
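The Scheduling Options rule above (YYYY-MM-DD format, no more than three years ahead, blank for no embargo) can be checked before you fill in the form. The following is a minimal sketch, not part of the ADC software: the function name check_embargo_date is hypothetical, and two points are assumptions on our part, namely that a past date is rejected and that "three years" is measured to the same calendar day.

```python
import re
from datetime import date, datetime


def check_embargo_date(text, today=None):
    """Return the embargo date, or None if the field is blank (no embargo).

    Raises ValueError when the text is not a usable embargo date.
    """
    today = today or date.today()
    text = text.strip()
    if not text:
        return None  # blank field: publish upon curator approval
    # Require exactly four, two, and two digits separated by hyphens.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError("use the format YYYY-MM-DD")
    # strptime raises ValueError for impossible dates such as 2016-13-40.
    embargo = datetime.strptime(text, "%Y-%m-%d").date()
    # Assumption: an embargo date in the past is treated as an error.
    if embargo <= today:
        raise ValueError("embargo date must be in the future")
    # Assumption: "three years" means the same calendar day three years on.
    try:
        limit = today.replace(year=today.year + 3)
    except ValueError:  # today is Feb 29 and the target year has no Feb 29
        limit = today.replace(year=today.year + 3, day=28)
    if embargo > limit:
        raise ValueError("embargo date is more than three years away")
    return embargo
```

For example, check_embargo_date("2017-01-15", today=date(2016, 3, 1)) returns the date 2017-01-15, while an empty string returns None.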
By default, if you upload resources for the Ag Data Commons to publish, the Ag Data Commons will mint a new DOI for your dataset upon successful review of your submission. If you already have a DOI for your data, or otherwise wish to opt out of this service, please e-mail: agrefquestion@libraryresearch.info
In the body of your email, include the title and author(s) of the dataset that should not receive a DOI.
You may provide multiple formats for the same data.
Consider submitting the files used by your statistical program as they are already machine-readable.
Provide tabular data as CSV (comma-separated values) wherever possible. You may also submit spreadsheets to capture formulas. For more information on creating CSV files, see the next page.
Include meaningful column headers in the first row of the file. This will allow the data to be converted into other formats and displayed successfully. Avoid subheadings and summary information, which will not make sense in CSV format.
Author your spreadsheet as one table per tab, as opposed to more than one table in a tab. This ensures the data will be machine readable.
Avoid blank rows or columns between data elements.
Do not use zeros or blank cells to represent missing data. Select a code to identify missing data; using -999 or -9999 is a common convention. NA is also an acceptable value for missing or inapplicable data. Indicate the code for missing data in the data dictionary.
When exporting data to another format, check to ensure that no cells with missing data have zeros, or are blank in the resulting file. Check to be sure that the resulting rows and columns make sense.
Use standard terminology wherever possible (include both the Latin and common names for plants and animals). As linked data becomes the norm, this will increase the impact of your dataset by making it easier to find.
As a scientific researcher, you may have access to disciplinary thesauri or ontologies that you wish to use. If not, two suggestions are the National Agricultural Library Thesaurus (a broad thesaurus of agricultural terms) and the Integrated Taxonomic Information System (which provides standardized names for plants, animals, fungi, and microbes).
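The missing-data guidance above can be partly automated. The sketch below, using only the Python standard library, replaces blank data cells in CSV text with a missing-data code and reports how many it changed. The function name fill_blank_cells and the default code "NA" are illustrative assumptions; use whichever code you document in your data dictionary. Zeros are deliberately left alone, because a script cannot tell a true zero from a zero standing in for missing data.

```python
import csv
import io

MISSING_CODE = "NA"  # assumption: the code documented in your data dictionary


def fill_blank_cells(csv_text, missing_code=MISSING_CODE):
    """Return (new_csv_text, count) with blank data cells set to missing_code."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    replaced = 0
    for row_number, row in enumerate(reader):
        if row_number > 0:  # leave the header row untouched
            for i, cell in enumerate(row):
                if cell.strip() == "":
                    row[i] = missing_code
                    replaced += 1
        writer.writerow(row)
    return out.getvalue(), replaced


sample = "plot,yield_kg\nA,12.5\nB,\n"
cleaned, count = fill_blank_cells(sample)
print(count)    # number of blank cells that were filled
print(cleaned)
```

Review the count before submitting: a large number of filled cells may point to a problem with the export rather than to genuinely missing observations.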
Video tutorial: Convert data files to CSV format
CSV (Comma Separated Values) is the preferred format for most data in the Ag Data Commons. Not only is a dataset more versatile as a CSV, but viewers can take advantage of built-in features like data visualizations and previews with CSV data files.
If your database can output data as CSV, opt for that choice to create your data file.
If you have an Excel spreadsheet and want to convert it to CSV, follow these steps:
Make sure there is a single column header row to label your variables. If your current data has more than one header row, consider combining these into one row in a way that makes sense.
Combine your data onto a single spreadsheet page. Delete any additional pages by right-clicking the blank pages at the bottom of your spreadsheet and choosing "Delete". Note that one multi-tab spreadsheet might become several CSV files.
In order to use the built-in chart previews, remove any special characters from your column header rows. Some characters in the column headers prevent the embed / link features from working with charts and graphs in the Ag Data Commons. This step is not necessary to create a CSV, but is necessary to use the Embed feature for sharing your data visualizations created in the Ag Data Commons. This is a list of incompatible and acceptable special characters.
Characters that should NOT be used in column headers for full compatibility with built-in data visualizations:
Character   Description                   Character   Description
`           accent                        !           exclamation point
@           "at" symbol                   #           number / hashtag symbol
$           dollar symbol                 %           percent symbol
^           caret                         &           "and" symbol
*           asterisk                      ( )         parenthesis
< >         greater / less than symbols   ?           question mark
[ ]         brackets                      { }         brackets
|           pipe                          \           back slash
'           apostrophe                    ~           tilde
Acceptable characters to use in column headers for full compatibility with built-in data visualizations:
Character   Description     Character   Description
/           forward slash   .           period
"           double quotes
Remove any commas from your document. Because the delimiter is a comma, extra commas in your text can cause errors in interpreting the data.
Save your document: choose "Save as: CSV (Comma delimited)".
Upload this document as a data resource on the Ag Data Commons, and check Grid, Graph, and Embed to take advantage of the full functionality of the Ag Data Commons data previews.
(Please note: CSV format is not required, but it is highly recommended if it is compatible with your data.)
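Before uploading, you may want to confirm that your column headers avoid the incompatible characters listed above. The following is a minimal sketch that reads the first row of CSV text and reports any header containing one of those characters. The function name find_bad_headers is hypothetical; the character set is copied from the list in this manual.

```python
import csv
import io

# Characters this manual lists as incompatible with built-in visualizations.
INCOMPATIBLE = set("`!@#$%^&*()<>?[]{}|\\'~")


def find_bad_headers(csv_text):
    """Map each problem column header to the incompatible characters in it."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = {}
    for name in header:
        bad = sorted(set(name) & INCOMPATIBLE)
        if bad:
            problems[name] = bad
    return problems


sample = "site,yield (kg),temp/day\n1,2,3\n"
print(find_bad_headers(sample))
```

In this sample, "yield (kg)" is reported because of its parentheses, while "temp/day" passes because the forward slash is on the acceptable list. Renaming the column to something like "yield_kg" would resolve the problem.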
Before submitting your dataset for review, check the following items to ensure your dataset is described and formatted the best way possible:
✔ Ensure all acronyms and abbreviations are spelled out
✔ Check for typos
✔ If the Title is based directly on a paper, use the “Data from:” convention
✔ Make the Title descriptive - Include locations, dates, and informative keywords, if applicable
✔ Make sure the Description describes the data, not the project or article
✔ Make sure the Summary is filled in and consists of an appropriate sentence to represent the dataset
✔ Include Geographic / Temporal Information and Use Limitations, if applicable
✔ Add Author IDs. USDA ID: when you click on an author’s name, their ID will be everything after display/ in the URL, for example ARS-ABC01234. If no USDA ID exists, check ORCID, then Scopus and ResearcherID. Often multiple IDs will exist; choose the one with the most publications attached to it.
✔ Provide a Contact Name and Contact Email
✔ Choose an appropriate License (Note that funding sources affect the type of license assigned to the dataset)
✔ Add Bureau / Program Codes if appropriate - If ARS funded, these should be 005:18 / 005:037, respectively
✔ Add Funding Source(s) and Project or grant number
✔ Add User-supplied tags (this usually requires subject area research). Add Latin names for plants/animals as user-supplied tags. Add National Program Number (e.g. NPxxx) if applicable (ARS-specific) in this field.
✔ Add Program hierarchy tags if appropriate
✔ Make sure the following fields are populated to reserve a DOI: title, author, publisher, contact name and email, and product type (dataset, etc.)
✔ Submit appropriate resources (data, not just figures), preferably in a machine readable format (csv is preferred for data files; XLS can be converted and added as an additional CSV resource)
✔ Provide a Description of the data file
✔ Review resources to ensure they contain column headers, that file titles are meaningfully descriptive, and that all links / downloads work as advertised
✔ Check the Graph and Grid boxes if your data is tabular / Check the Map box if data is geospatial with coordinates / Check the Embed box to provide a link to data visualizations created from your data
✔ Provide a Data Dictionary / read me file
✔ Change Moderation State on your dataset AND each resource file to "Needs Review" when ready to publish
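If you keep your metadata in a script or notes file before entering it on the ADC, the DOI item in the checklist above can be checked mechanically. This is only a sketch: the dictionary keys below are hypothetical names chosen for illustration and are not ADC field identifiers.

```python
# Fields the checklist says must be populated to reserve a DOI.
# The key names are hypothetical; they are not ADC field identifiers.
DOI_REQUIRED = [
    "title", "author", "publisher",
    "contact_name", "contact_email", "product_type",
]


def missing_doi_fields(record):
    """Return the required fields that are absent or blank in record."""
    return [f for f in DOI_REQUIRED if not str(record.get(f) or "").strip()]


draft = {
    "title": "Data from: Example study",
    "author": "A. Researcher",
    "publisher": "",
    "contact_name": "A. Researcher",
    "product_type": "dataset",
}
print(missing_doi_fields(draft))
```

An empty list means every field needed to reserve a DOI has a value; it does not check that the values themselves are correct.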
Video tutorial: Data Dictionaries on the Ag Data Commons
Data dictionaries are used to provide detailed information about the contents of a dataset or database, such as the names of measured variables, their data types or formats, and text descriptions. A data dictionary provides a concise guide to understanding and using the data. Ideally, all Ag Data Commons (ADC) records for datasets and databases should include or point to a data dictionary. It is preferred that these data dictionaries be machine readable, in csv format.
If your data are managed in a standard relational database you will likely be able to generate a data dictionary through your software. This will provide a document that is consistently formatted and contains what is needed for others to understand your data. See the following section for more information.
If your data are managed in spreadsheets, text files, or comma separated values, you will need to manually prepare a data dictionary. To support machine-readability, we recommend preparing your data dictionary as a spreadsheet. If you prefer to prepare it as a .doc or .pdf, we recommend embedding a data dictionary table in your document that can be easily extracted. A data dictionary template and examples can be found toward the end of this document.
If your data is stored in a relational database, it may be able to generate a data dictionary for you.
If your data is stored in a spreadsheet, you will need to manually create a data dictionary.
The following are recommended guidelines for data dictionaries, not requirements. These guidelines are subject to change, as best practices are evolving.
Enterprise-level databases often contain built-in tools for automatically generating data dictionaries. Consult your database administrator or software documentation for instructions specific to your system.
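As one illustration of generating a data dictionary from a database, the sketch below lists every column of every table in a SQLite database and writes the result as CSV text. SQLite is used here only because Python supports it out of the box; it is an assumption for the example, not a system this manual prescribes. The function name and the output column names are likewise hypothetical.

```python
import csv
import io
import sqlite3


def sqlite_data_dictionary(conn):
    """Return CSV text with one row per column of every user table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["table", "element", "data_type", "required", "primary_key"])
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            writer.writerow(
                [table, name, col_type, "y" if notnull else "n", "y" if pk else "n"]
            )
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, crop TEXT NOT NULL, yield_kg REAL)"
)
print(sqlite_data_dictionary(conn))
```

The output covers only what the database itself knows (names, types, and constraints). Element definitions and acceptable values still need to be added by hand.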
To generate a data dictionary from MS Access, select the "Database Tools" tab, then select "Database Documenter" (under “Analyze”). You should now see the Documenter dialog box. If you have entered descriptions of your fields in Design View, they will carry over to the generated data dictionary. It should resemble the following:
Submit a spreadsheet (.XLS or .XLSX) with one tab for introductory information, and separate data dictionary tabs that correspond to each existing tab in your dataset. For example, if your dataset consists of three tabs, your data dictionary will have four tabs: the first for introductory, background information, and three more to correspond to the three tabs of data. Consider using our data dictionary template to get started.
Submit your data as a spreadsheet or csv
One table per tab
No extraneous comments
No empty cells, columns, or rows (enter n/a if nothing applies)
Spell out all abbreviations
Element definitions should be stated in the singular, be succinct, and be able to stand alone from other element definitions
If you would rather submit a DOCX or PDF, embed tables in your document so they will be exportable. Submit a .DOCX or .PDF with the following:
Introductory and explanatory text
Explain context: is it from a single research article, or a larger project?
Provide a URI (Uniform Resource Identifier), which will usually be a URL or a DOI (Digital Object Identifier) for the dataset or related journal article.
Other pertinent information (such as version, date released, etc.)
A listing of elements (fields), in addition to the following:
Element source table
Element definition
Element variables
Element data type
Element field length
Required y/n and/or null value note
Use this option only if you are unable to automatically generate a machine-readable data dictionary. Guidelines for automatically generating a data dictionary from a database are in the previous section of this document.
We suggest submitting your data dictionary as a spreadsheet (consider using our blank template). If you would rather submit a DOCX or PDF, embed tables in your document so they will be exportable. Submit a DOCX or text searchable PDF with the following:
Introductory and explanatory text
Explain context: is it from a single research article, or a larger project?
Provide a URI (Uniform Resource Identifier), which will usually be a URL or a DOI (Digital Object Identifier) for the dataset or related journal article.
Other pertinent information (such as version, date released, etc.)
A listing of elements (fields), in addition to the following:
Element source table
Element definition
Element variables
Element data type
Element field length
Required y/n and/or null value note
If possible, a data diagram or data model showing the relationships among tables.
For example:
Source: The Pacific Northwest Forest Inventory and Analysis Database
This blank template can be used to manually create a data dictionary. Use one row for each data element, and do not leave rows, columns, or cells blank. Add rows and columns as necessary, and enter n/a if nothing applies. See below for an explanation of column headers.
Use the following link to access a blank template to download and customize:
This template is read-only. Please copy the information into your own spreadsheet to create your data dictionary.
Explanation of column headers:
Spreadsheet tab: If your spreadsheet has multiple tabs, identify the tab you are describing.
Element or value display name: What is the name used in your data file?
Description: Write a brief definition, stated in the singular, that could stand alone from other element definitions.
Data type: For example, indicate varchar, integer, date, etc.
Character length: For example, the maximum length for Excel is 255, so indicate 255 or less.
Acceptable values: List all acceptable values, separated by pipes ( | ). This may be a field name or a range of values.
Required?: Enter y/n to indicate whether this field is required.
Accepts null value?: This is required to run calculations on your data. Indicate y/n if null value is allowable.
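The column headers explained above can be used to start a data dictionary directly from a data file. The following is a minimal sketch that reads CSV text and writes one template row per column. The function name starter_dictionary and the "TODO" placeholder are assumptions made for this example; the placeholder marks cells you still need to complete by hand, so that no cell is left blank. The "Character length" value is the longest entry observed in the data, which is only a starting point and not necessarily the maximum the field allows.

```python
import csv
import io

TEMPLATE_COLUMNS = [
    "Spreadsheet tab", "Element or value display name", "Description",
    "Data type", "Character length", "Acceptable values",
    "Required?", "Accepts null value?",
]
PLACEHOLDER = "TODO"  # hypothetical marker for cells to complete by hand


def starter_dictionary(csv_text, tab_name="n/a"):
    """Return CSV text with one template row per column header in csv_text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(TEMPLATE_COLUMNS)
    for i, name in enumerate(header):
        values = [row[i] for row in data if i < len(row)]
        # Longest value seen in the data, not a declared field limit.
        longest = max((len(v) for v in values), default=0)
        writer.writerow(
            [tab_name, name, PLACEHOLDER, PLACEHOLDER, longest,
             PLACEHOLDER, PLACEHOLDER, PLACEHOLDER]
        )
    return out.getvalue()


print(starter_dictionary("plot,crop\nA1,maize\nB12,rye\n"))
```

Open the result in a spreadsheet program and replace each placeholder with a real description, data type, list of acceptable values, and y/n answers before submitting.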
Examples 1 & 2: Created by the ADC team using the source dataset, Data from: Enabling proteomic studies with RNA-Seq: the proteome of tomato pollen as a test case:
Example 3: Selected columns from the USDA National Finance Center Insight Training Participant Guide Version 1.0, June 2013. Source: USDA National Finance Center Insight Training Participant Guide Version 1.0 June 2014
The following are more USDA, ecological, and agricultural data dictionary examples:
The Pacific Northwest Forest Inventory and Analysis Database
ICASA Version 2.0 Data Standards for Agricultural Field Experiments and Production
If you still have questions regarding data dictionaries, e-mail us at agrefquestion@libraryresearch.info