Ag Data Commons Data Submission Manual v1.3

This submission manual provides practical information for contributors to the Ag Data Commons data repository and registry. The manual can be accessed as a single, printer-friendly document or in a chapter-by-chapter view for convenient on-screen viewing.

About this manual

User Edition v1.3

Prepared March 2016

Knowledge Services Division | National Agricultural Library

10301 Baltimore Ave | Room 207

Beltsville, Maryland 20705

Contact: NAL-ADC-Curator@ars.usda.gov

Principles

Research datasets and tools that relate to those datasets have value of their own, regardless of whether they are associated with published peer-reviewed literature. This means they can have a different set of authors or publishers, their own digital object identifier (DOI), and their own unique description. The presence or absence of peer-reviewed papers that describe them does not affect a dataset’s qualification to be hosted on Ag Data Commons (henceforth referred to as ADC).

On the ADC, a “dataset” refers to the entire metadata record for one or more resources on the ADC; the dataset refers to information entered on the “Edit dataset” page. A “resource” refers to the data files you upload or link to on the “Add data” page. This may be a spreadsheet, database, photographic collection, software tool, or other resource that you wish to make available for the purpose of advancing scientific knowledge. Again, multiple resources may be included as part of a single dataset, depending on how you wish to define and describe your data.

Is the Ag Data Commons right for your data?

Before proceeding, determine whether your data is suitable for inclusion on the Ag Data Commons.

  1. Was the research behind your data all or partially funded by the USDA?

    a. If YES, continue.

    b. If NO, your data is unsuitable for the ADC at this time. As the ADC grows, we may consider expanding to include data with other sources of funding.

  2. Is your data able to be freely distributed and re-used by others?

    a. If YES, continue.

    b. If NO, your data is unsuitable for the ADC at this time. We host and support open data to facilitate sharing and reuse.

    c. If you are unsure, please see the section on determining licenses on the Description of fields on the "Edit Dataset" page.

  3. The Ag Data Commons requires inclusion of the full and complete data cited in papers or used to create any charts, graphs, tables, and so on. Do you have the ability to publish the full, complete data, or link to the complete data if it is published elsewhere?

    a. If YES, continue.

    b. If NO, your data is unsuitable for the ADC at this time. We gladly accept supplementary materials resulting from the data (e.g. tables, charts, figures), but in order to comply with the ADC mission, we also need access to the full original data from which the figures were derived.

    c. If NO, but data will be available in the future: If you would like to create a public record as a placeholder before the data is made publicly available, the ADC allows an embargo of up to 3 years on a data resource that is not immediately available to be viewed publicly. See “Scheduling Options” on the Description of fields on “Add Data” page for more information.
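
The screening questions above amount to a short decision procedure. The sketch below is one hypothetical way to encode it, for example as part of a lab's internal checklist. The function name, parameter names, and returned strings are illustrative assumptions and are not part of any ADC tool.

```python
def adc_suitability(usda_funded: bool,
                    freely_reusable: bool,
                    full_data_available: bool,
                    available_later: bool = False) -> str:
    """Walk through the three screening questions in order (illustrative only)."""
    if not usda_funded:
        return "unsuitable: research was not funded, in whole or in part, by the USDA"
    if not freely_reusable:
        return "unsuitable: data cannot be freely distributed and re-used"
    if full_data_available:
        return "suitable: proceed to register and submit"
    if available_later:
        # The manual allows a placeholder record with an embargo of up to 3 years.
        return "suitable with embargo: create a placeholder record"
    return "unsuitable: full, complete data cannot be published or linked"
```

If you are unsure about the second question, the license guidance in the Description of fields on the "Edit Dataset" page remains the place to look; this sketch does not replace it.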

Register for an Ag Data Commons account

Video tutorial: Register for an account on the Ag Data Commons

  1. Select “Register” at the upper right corner of the ADC home page.

  2. Enter your credentials, and select “Create new account” at the bottom of the page. An ADC curator will be notified of your account request.

  3. Once your account has been approved (which will usually be within 5 business days), you will receive an email with details on how to complete your account registration. You must complete the registration process before you can submit data.

Submit your data

Organization

Decide how you want to organize your data: Do you have multiple unique datasets or only one dataset containing multiple files of data? As the data submitter, you are best suited to make this decision. Try to imagine yourself as a user of your data, and ask yourself what the most useful grouping would be.

You would create distinct ADC records if:

  • There are enough differences among your data files that they require unique descriptions of methods and data dictionaries (i.e. lists of the measurements included in the files)

  • The datasets will likely be reused independently, and should have unique citations

  • The authorship varies from dataset to dataset

You would keep all the data under a single record if:

  • It would be confusing and redundant to generate several unique ADC records where one will suffice

    • Remember that you can denote relationships among datasets in the ADC like “Is Part of” and “Cites other datasets.” In this way it is possible to have separate but related records for your data if you feel that is the most appropriate way to represent it.

Create a dataset

Video tutorial: Create a dataset on the Ag Data Commons

The process to add a new dataset record to the ADC consists of two main forms: the "Create Dataset" form, where you will add details about the overall dataset, and the "Add data" form, where you will upload and describe one or more specific data resources that you choose to include with your dataset entry. To create a new dataset record, use the following steps as a guide:

  • Select “Log in” at the upper right corner of the ADC home page if you are not already logged into the ADC.

  • Select “Submit Dataset” at the upper right of the screen. Now you should be on the “Create Dataset” form.

  • Proceed to populate the fields shown. See our Description of fields on “Edit Dataset” page for detailed explanations of each field.

    • The only difference between "creating" and "editing" a dataset is that you only need to create the dataset once. After you create a dataset, you can then go back at any point and edit the content in the fields. The fields on these pages are exactly the same.

Note that the gray rows toward the end of the page (such as “Purpose and Methods” and “Temporal Information”) are expandable. Select each header to view the full range of fields ADC has for describing your data.

  • After you have finished filling out the dataset creation form, select "Save" or “Next: Add data” at the bottom of the screen. If you would like to add your resources at a later date, choose "Save". If you are ready to add your resources now, choose "Next: Add data".

Add the data

Video tutorial: Add a data resource to your dataset on the Ag Data Commons

If you chose “Next: Add data” on the dataset form, you should now be on the “Add data” form and are ready to submit your data files and resources.

  • Upload or link to your data files and any related content. Make sure they are named and formatted the way in which you want them to appear. See the Description of fields on "Add Data" page for detailed explanations of each field. See the Guideline for remote data resources and related content page of this manual for more information on appropriate resource files to upload or link.

  • If you are only adding one resource at this time, when finished, select “Save” at the bottom of the page to save your changes. If you finish adding that resource but want to immediately begin adding another resource, when finished, select “Save and add another” at the bottom of the page to save your changes and add an additional resource.

  • You will repeat this "add data" process and create an additional resource for each data file you upload or each API you link. You can create multiple data resources within each dataset, allowing you to describe each resource more accurately. If you would like to include multiple files in a single downloadable resource, you may upload a single zip file containing multiple data files. If you choose this method of adding your data, please describe the contents of the zip file(s) thoroughly since users will not be able to preview zipped files to the same degree as individual files.

  • Remember to upload your data dictionary as a resource, and check the box titled “Make this resource the data dictionary” near the center of the page. See our Data Dictionary Guidelines toward the end of this document for more information on preparing data dictionaries.

  • You can always return to your dataset to add data and other resources at a later time by using the "+ Add Resource" button at the top left of the dataset page.
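
If you decide to bundle several files into one zip resource, as described above, a short script can build the archive and list its contents so that you can paste the listing into the resource description. This is a minimal sketch using Python's standard `zipfile` module; the function name and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_files(paths, zip_path):
    """Write the given files into one zip archive and return the names stored in it."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            # Store only the file name so the archive carries no local folder structure.
            archive.write(path, arcname=path.name)
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For example, `bundle_files(["plot_a.csv", "plot_b.csv"], "field_data.zip")` would return the two file names in the order they were added, assuming both files exist in the working directory.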

Embargo

At this point, you should also decide whether you want your data resource(s) to be published immediately, or if you would like to place an embargo period on the data. An embargo would create a placeholder for your data resource, which will automatically be published when the embargo expires. We allow for resources to be embargoed for up to 3 years. See “Scheduling Options” on the Description of fields on “Add Data” page for more information.

To embargo your data:

  1. Create a new resource for your dataset (see instructions above) and enter a title and description of the data you expect to publish. Upload a data file to accompany this resource record.

  2. Scroll to the bottom of the page above the "Save" button and click "Scheduling options"

  3. Enter the date you would like to make your data publicly available and a reason for the embargo period

  4. When the dataset is published, a message regarding the embargo will appear to anyone interested in your data
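
Before entering a release date under "Scheduling options", you may want to confirm that it falls within the 3-year limit. The sketch below assumes the limit is counted from the day the resource record is created; the manual does not state the exact starting point, so treat that as an assumption. The function names are hypothetical.

```python
from datetime import date


def latest_embargo_release(start: date, max_years: int = 3) -> date:
    """Return the last allowed release date, max_years after start."""
    try:
        return start.replace(year=start.year + max_years)
    except ValueError:
        # start is 29 February and the target year is not a leap year.
        return start.replace(year=start.year + max_years, day=28)


def embargo_is_allowed(release: date, start: date) -> bool:
    """True if the release date is after start and within the 3-year limit."""
    return start < release <= latest_embargo_release(start)
```

For a record created on 1 March 2016, `embargo_is_allowed(date(2018, 1, 1), date(2016, 3, 1))` is true, while a release date of 2 March 2019 would fall outside the window.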

Viewing unpublished datasets

When you view your saved datasets/resources on the ADC Datasets page(s), they will be highlighted in pink. The pink highlight indicates only that they are unpublished (see example of an unpublished dataset, Transforming Drainage, in the image below).

Only you can view unpublished datasets you have created while logged into your user account. In order for your data to be viewed by everyone, you must first submit the dataset for review.

Guideline for remote data resources and related content

Ensure that remotely stored and linked resources are indeed data and not just links to web pages or articles. The Ag Data Commons can accommodate many types of related and linked data, but it is important to include each type of information in the appropriate fields. Use the following guideline to determine where information should be included.

Data

Data should be included as a Resource. Data are materials like tabular data, tools, images, etc. that the user can download and use.

Do not include previously published data as a direct upload resource. Link to the published version of that data instead. DOIs will not be issued by the Ag Data Commons unless data is uploaded locally and originally published by the Ag Data Commons.

Other materials

Any remote materials that are not data can be included in the appropriate category:

Articles

Articles should be included in the Citations field (Primary, Related, Methods).

  • Unless an article is no longer available and it is not a copyright violation to reproduce it publicly, articles should only be included in the Citations section.
  • Include the DOI and AGID when available so there is a persistent link.

URLs

URLs should be included in the Related Content field.

  • This field is intended to link to outside resources that provide additional context to the dataset. Examples are group web sites, blog posts, remote images, webinars, videos, etc.
  • This will cover any remote web page that is not data.
  • This field allows a title and URL.
  • If an explanation is needed for the linked web site, it should be included in the dataset description.

If a piece of supplemental material is not data but is a direct upload, verify that it belongs with and adds value to the data. As long as the item is not published elsewhere, it can be added as a resource if it is a suitable supplement to the data.

Clone an existing dataset

Video tutorial: Clone a dataset on the Ag Data Commons

There are many cases where a content author wants to submit an updated version of a dataset already in the Ag Data Commons, or wants to upload a similar record to one that has already been created. If many similarities exist between an existing record and one not yet created, content authors can choose to clone one of their existing datasets as opposed to creating the dataset from scratch. As a new version, the cloned dataset has slight differences in metadata and will receive a new DOI if applicable, but most of the metadata between the original and new datasets is the same.

  • Log in with your user account

  • Navigate to an existing dataset you have created whose metadata is almost exactly the same as the dataset you want to create

  • Click on the “Clone Dataset” button at the top of the record

  • You are now on the confirmation page. Click on the “Clone” button to create the new dataset

  • A new dataset is created pre-populated with the metadata from the original dataset, but not including any resources

  • You can then change any of the fields for the newly created dataset

  • The new dataset automatically includes a pointer to the old dataset

    • If you do not want the newly cloned dataset to link back to the original dataset, while on the edit screen, navigate to the Related Content drop down and delete the content in the "Related to" fields (Title and URL). This will break the link between the old and new datasets.

Submit your dataset draft for review

Video tutorial: Submit your dataset draft for review on the Ag Data Commons

Once you have finished editing your dataset record and uploaded or linked all your data and resources, you must save and submit the dataset and data resource drafts for review so that they can be approved and published. Note that each dataset AND data resource you create must be submitted for review separately. You can submit your datasets and data resources for review in one of two ways.

Submit one or more datasets or resources for review from the “My Drafts” page

This method makes the most sense if you want to submit multiple datasets for review at a time, or are submitting a dataset that does not need further editing.

  • Click on “My Workbench” in the menu bar to see all of the datasets you have created

  • Click on “My Drafts” in the menu bar to see all published and unpublished datasets you have created - there will be a circle with a number in it noting the number of items in this category

  • To submit a single dataset, find the dataset in this list that you wish to submit for review, and in the right hand column click the “Submit for Review” button

  • To choose more than one dataset to submit for review, click the checkbox to the left of all datasets you would like to submit, and then click the “Submit for Review” button at the top of the list

  • To submit all of your drafts for review at once, click “Select all items on this page” and, when a checkmark appears next to every dataset, click the “Submit for Review” button at the top of the list

Or...

Submit a single dataset or resource directly from the dataset that needs review during the editing process

This method makes the most sense if you are submitting a dataset for review immediately after you finish editing it. Note, the dataset or data resource draft must first be saved at least once in order to submit it for review.

  • Click on the dataset you would like to edit and submit for review (either from the “Datasets” page view or “My Workbench” view)

  • Edit the dataset as needed

  • When finished editing, click on either "Revision information" / “Moderation State” at the bottom of the page above the “Save” button, or on “Moderate” in the menu bar at the top of the page

  • In the dropdown menu, select the moderation state “Needs Review”

    • From the dataset view:

    • From the Moderate view:

  • Click “Save” or “Apply”, respectively, to move the dataset into the queue for moderation

An ADC curator will now review your dataset and either approve it for publication or inform you of any changes that must be made prior to publishing. You will receive an email notification when the status of your dataset changes in any way. The ADC reserves the right to refuse publication of a dataset for any reason.

Review by ADC curator

Once you submit your dataset for review, an ADC curator will be notified. The curator will:

  1. Review your metadata and contact you if further information is needed to ensure the data is sufficiently described

  2. Add ADC and National Agricultural Library Thesaurus (NALT) keywords to enhance retrievability, making your dataset easier to find in the repository

  3. Obtain a DOI if your dataset does not already have one

  4. Contact you once your data goes live on ADC so that you can review and approve the final record

    • Note that ADC reserves the right to deny publishing your dataset for any reason.

Updating and revisiting your data entry

If you need to add data or other resources to your dataset record at a later date, you can return to your record and select “Add Resource” at the top of the Dataset page. To edit datasets or resources at a later date, select the dataset or resource title, then select “Edit” at the top of the page.

Note that you can always edit your dataset, even after you submit it for review and it is in either the Needs Review or Needs Supervisor Review boxes, as well as after it is published. Just remember to re-submit the dataset or data resource for review when you are finished editing it so the changes can be approved and published.

Description of fields on “Edit Dataset” page

Fields marked with an asterisk are required. All other fields are optional. If N/A, leave the field blank.

Primary fields

Video tutorial: Create a dataset on the Ag Data Commons

  • Title​* ​: Enter a descriptive dataset title as you want it to appear; include dates, locations, and specific metrics that make your dataset unique. If the data is from a Primary Article Citation (see below), use the naming convention "Data from: title of article"

  • Description​* ​: A rich free text description that provides as much explanation as possible about the dataset: how and why it was generated, and how it should (or should not) be used. This can be modified from article text (e.g. Abstract, Methods, Objectives), but should focus on characterizing the data, not the journal article.
    Please provide explanations for all acronyms and abbreviations. Get more guidance on filling out the Description field.

  • Summary: A shorter description of the dataset, usually no more than a sentence or two. This information will appear in the main dataset list as a teaser line under the Title to briefly communicate the contents and purpose of your dataset. (Note: You can toggle between the Summary and Description boxes by clicking the "Edit Summary" and "Hide Summary" links - both boxes will not be visible in the submission form at the same time.)

  • Author *​:

    • Name *​: Enter the Last name, First name, Middle initial followed by a period (e.g. Doe, John A.) of all persons involved in the data collection. Authors can be different from those listed in a primary/related article (or presented in a different order); multiple allowed.

    • Identifier Kind: If applicable, select the kind of unique author identifier (for example, ORCID or ResearcherID) from the drop-down list.

      • Visit the ORCID registration page to search for any existing accounts under your name and to create a new account if you do not already have one. Your ORCID iD connects with your ORCID Record that can contain links to your research activities, affiliations, awards, other versions of your name, and more. You control this content and who can see it. The search bar is at the very top of the page. Make sure you do not have an existing account before creating another one.
    • Identifier: Enter the unique author identifier here, if applicable. For example, an ORCID identifier should be entered as 1234-5678-9123-4567.

  • Dataset DOI (digital object identifier)​: DOI for the dataset, not the journal article that may be referencing it. If a DOI does not already exist and you are uploading data directly to the Ag Data Commons (as opposed to linking to externally hosted data), the Ag Data Commons will obtain one for you. See our section on opting out of this option if you do not want a DOI created for your dataset.

  • ISO Topic​: High-level subject categorization, also referred to as ISO Topic Categories. Select one or more from the drop-down list; multiple allowed.

  • Product Type​: This field automatically defaults to Dataset. However, it may be changed to better describe the main or most important part of the dataset's actual content

    • Acceptable content includes: Dataset, Database, Photograph collection, Presentation, Software tool, Computer model, Animations/Simulations, Figures/Plots, Genome/Genetics Data, Interactive Data Maps, Multimedia, Numeric Data, and Still Images or Photos.
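
The Identifier field above expects an ORCID iD as four hyphen-separated groups of four characters. ORCID documents that the final character is a checksum computed with ISO 7064 MOD 11-2, so a mistyped iD can often be caught before it is entered. The sketch below is an illustrative check with hypothetical function names; it is not something the ADC form is known to run.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Check the 0000-0000-0000-0000 layout and the final check character."""
    if not ORCID_PATTERN.fullmatch(identifier):
        return False
    digits = identifier.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

The check only confirms that an iD is well formed. It cannot tell you whether the iD belongs to the author you intend, so searching the ORCID registry is still necessary.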

Purpose & Methods

Video tutorial: Filling out the Purpose and Methods section of the Ag Data Commons data submission form

  • Intended Use: Explain the intended use and benefits of the dataset. What purpose do you expect the data to serve? For example, precipitation data may be collected to study patterns of groundwater recharge or to validate watershed models; life-cycle assessment (LCA) data may be intended for a wide range of impacts in private or public use and/or for product comparisons.

  • Use Limitations​: Explain the limitations regarding the dataset's usability. For example, estimates may be biased over water, equipment may have malfunctioned during a specified time, granularity may mean it is unsuitable for certain kinds of analysis.

  • Equipment or software used: ​Name the equipment and software used to collect and process the data. Provide make and model, name and version number, and a stable URL for each tool used to collect and process the data.

Geographic Information

Video tutorial: Filling out the Geographic section of the Ag Data Commons data submission form

  • State or Territory​: Select as many as are applicable from the drop-down list of states; multiple allowed.

  • Spatial Description​: This free text can be an address, city, state, region, or other spatial description. Geonames are recommended but not required.

  • Global Map​: You may also use the interactive map (projected in WGS84) to indicate where your data were collected. See the left side of the map for buttons to manipulate data input. These features can be used exclusively or in combination to enter multiple points, polygons, or bounding boxes to a single record if applicable.

    • The + and - enable you to zoom in or out, depending on the level of geographic detail you wish to represent. If you have global data, you may zoom out as much as necessary to indicate data were collected across multiple countries.

    • Select the third button on the left to enter one or more polygons

    • Select the fourth button to enter one or more bounding boxes

    • Select the fifth button to drop one or more points on the map

    • Select the tabs at the top of the map to enter data in other ways. The GeoJSON tab enables you to enter raw GeoJSON data. GeoJSON is a technical standard that will appeal to GIS enthusiasts. See DCAT spatial/geographical coverage for more info.

    • Select the last tab on the right, Points, to manually input one or more points in decimal degrees. This is recommended if you know the exact point where data were collected. You can find the exact coordinates of an address or location at a variety of sites, including http://www.gps-coordinates.net/
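
For the GeoJSON tab mentioned above, it can help to generate the text programmatically instead of typing it by hand. The sketch below builds a point and a bounding box following the GeoJSON standard (RFC 7946), in which coordinates are ordered longitude first, then latitude. The coordinates are hypothetical, and whether the ADC form expects a FeatureCollection or a bare geometry is an assumption you should confirm by pasting a small example first.

```python
import json


def point(lon: float, lat: float) -> dict:
    """GeoJSON Point; coordinates are ordered [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lon, lat]}


def bounding_box(west: float, south: float, east: float, north: float) -> dict:
    """GeoJSON Polygon tracing a bounding box; the ring closes on its first corner."""
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical coordinates in decimal degrees, for illustration only.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": point(-76.9, 39.0)},
        {"type": "Feature", "properties": {},
         "geometry": bounding_box(-77.0, 38.9, -76.8, 39.1)},
    ],
}

print(json.dumps(feature_collection, indent=2))
```

A common mistake is to enter latitude before longitude, which places the point in the wrong part of the world; the helper signatures above make the order explicit.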

Temporal Information

Video tutorial: Filling out the Temporal section of the Ag Data Commons data submission form

  • Temporal Coverage​: The span of time during which data were collected. Add temporal coverage in one of the following formats (start date/end date; if data collection is ongoing, leave the end date blank):

  • Frequency: The frequency with which the dataset is published, e.g. None, Daily, Weekly, Monthly, Annually, Continuously, Irregularly, Decennial - R/P10Y, Quadrennial - R/P4Y, Bimonthly - R/P2M, etc. For example, data with a publish frequency repeating once every 10 years would be designated Decennial - R/P10Y. See DCAT frequency for more info.
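
The codes in the Frequency field, such as R/P10Y, are ISO 8601 repeating intervals: "R" means repeating, "P" introduces a period, and the number and letter give its length. The sketch below decodes the simple whole-number forms quoted above. The function name is hypothetical, and the parser deliberately does not handle fractional or time-of-day periods that may appear in the full list.

```python
import re

# Labels and codes taken from the examples in the Frequency field description.
FREQUENCY_CODES = {
    "Decennial": "R/P10Y",
    "Quadrennial": "R/P4Y",
    "Bimonthly": "R/P2M",
}

UNITS = {"Y": "year", "M": "month", "W": "week", "D": "day"}


def describe_frequency(code: str) -> str:
    """Turn a simple repeating-interval code such as R/P10Y into plain English."""
    match = re.fullmatch(r"R/P(\d+)([YMWD])", code)
    if match is None:
        raise ValueError(f"not a simple R/P<n><unit> code: {code!r}")
    count, unit = int(match.group(1)), UNITS[match.group(2)]
    return f"repeats every {count} {unit}{'s' if count != 1 else ''}"
```

For instance, `describe_frequency("R/P10Y")` gives "repeats every 10 years", matching the Decennial example.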

Citations

Video tutorial: Filling out the Citations section of the Ag Data Commons data submission form

  • Primary Article

    • Full Citation: ​Enter the full bibliographic citation, in APA, to a published article that directly describes this dataset (i.e. a data paper). Leave blank if there is no primary article connected to this dataset.

    • Article DOI: ​Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"

    • PubAg AGID: ​If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.

  • Methods Citation

    • Full Citation: ​Enter the full bibliographic citation, in APA, to a published article that describes the procedures for data assembly in greater detail; multiple allowed.

    • Article DOI: ​Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"

    • PubAg AGID: ​If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.

  • Related Article

    • Full Citation: ​Enter the full bibliographic citation, in APA, to a published article that is related to this dataset. Use your discretion to list only the most useful and relevant one(s); multiple allowed.

    • Article DOI: ​Enter the article’s DOI (just the number itself, not as a full URL, and excluding the prefix “doi:”). For example, "10.123/ABC/123"

    • PubAg AGID: If the article is in PubAg, enter the AGID here. When you navigate to an article in PubAg, the URL will look like this: https://pubag.nal.usda.gov/catalog/61025 . The AGID is the string of numbers following “/catalog/”. Leave blank if not applicable.

  • Preferred Dataset Citation​: If you have a specifically formatted citation for this dataset that you want others to use, enter it here. This will ensure your dataset is cited to your satisfaction if/when it is re-used. If this field is left blank, a standard dataset citation will be constructed for you.

  • Metadata Sources: ​If you are using metadata that has already been prepared for this dataset, indicate your sources of metadata here.
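
The Article DOI and PubAg AGID fields above both ask for a bare identifier instead of a full link. If you are copying these from a reference manager or a browser address bar, a small helper can strip the extra parts. This is a sketch with hypothetical function names; the example values are the placeholders used in this manual.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def bare_doi(value: str) -> str:
    """Strip a leading resolver URL or "doi:" prefix, leaving only the DOI itself."""
    value = value.strip()
    value = re.sub(r"^https?://(dx\.)?doi\.org/", "", value, flags=re.IGNORECASE)
    value = re.sub(r"^doi:\s*", "", value, flags=re.IGNORECASE)
    return value


def agid_from_pubag_url(url: str) -> Optional[str]:
    """Return the digits that follow /catalog/ in a PubAg URL, or None if absent."""
    match = re.search(r"/catalog/(\d+)", urlparse(url).path)
    return match.group(1) if match else None
```

With these helpers, `bare_doi("doi:10.123/ABC/123")` returns `10.123/ABC/123` and `agid_from_pubag_url("https://pubag.nal.usda.gov/catalog/61025")` returns `61025`.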

Related Content

Video tutorial: Filling out the Related Content section of the Ag Data Commons data submission form

  • Is Part Of​: Is this dataset part of a larger dataset on the Ag Data Commons? If so, type the first letters of the container dataset’s title, then select from the autocomplete suggestions.

  • Cites other datasets

    • Full Citation​: Provide the full citation(s) for datasets that were used to construct this one. This is particularly relevant for datasets that are aggregations or adaptations of other data.

    • DOI or other link​: Provide the cited dataset's DOI as a full URL, e.g. "http://dx.doi.org/10.123/ABC/123". If there is no DOI, provide another link to the dataset.

  • Related To​: Provide a name and URL to a resource that provides additional context to the dataset, e.g. a project website, manual, or data documentation. This field is not to denote relationship among datasets, but relationships among the dataset and non-dataset resources; multiple allowed.

Contact

Video tutorial: Filling out the Contact information section of the Ag Data Commons data submission form

  • Contact Name​: Enter a long-term contact person for the dataset. This can be different from the dataset author/originator e.g. a data curator or administrator. See Project Open Data for more info. Format as Last name, First name.

  • Contact Email​: Contact person’s email address.

  • Reviewer​: Name of person other than author who has reviewed and approved the data (not the metadata); multiple allowed.

  • Publisher​: Publisher of the dataset. This will be displayed in the dataset citation Ag Data Commons suggests to users of your data and is required to obtain a DOI. The field will default to “Ag Data Commons” unless you change it.

Keywords

Video tutorial: Filling out the Keywords section of the Ag Data Commons data submission form

  • User-supplied Tags​: Free text keywords to facilitate discovery of the dataset. This field is not part of a controlled vocabulary, and is intended to capture meaningful tags to support search and browse; multiple allowed.

  • Program​: Choose from the hierarchy to select a program designation, if applicable. These programs are specific to the National Agricultural Library.

Administrative

Video tutorial: Filling out the Administrative section of the Ag Data Commons data submission form

  • License​: Enter the license assigned to this dataset; for federally generated data, this should be a Public Domain dedication such as CC Zero (Creative Commons CC Zero) or US Public Domain. Unsure which license to choose? Walk through the following questions to help determine the right one. The following is based on the latest guidance from the Federal CIO council.

    1. Are all the data owners federal employees who generated the data as part of their work?

      • If YES, then the license recommendation is CC Zero (Creative Commons CC Zero).

      • If YES, but you are concerned about international public domain, then the license recommendation is US Public Domain.

      • If NO, see #2.

    2. If not all data owners are federal government employees, were the data generated under a grant, collaborative agreement, or other contract agreement?

      • If YES, check the terms of that agreement for a “rights in data” or related clause. Use the licensing information specified there.

      • If YES, but no specific license is mentioned in your agreement, see #3.

    3. If your licensing agreement allows non-federal data owners to choose a license, the license recommendation is CC Zero (preferred), CC BY (Creative Commons Attribution), or CC BY SA (Creative Commons Attribution-ShareAlike). License definitions and additional information can be found at http://opendefinition.org/licenses/ or https://creativecommons.org/licenses/

      • If more restrictive licensing is requested by non-federal data owners, then the data are unsuitable for the Ag Data Commons. We host and support open data to facilitate sharing and reuse.
    4. If your data were generated with no US federal funding, then they are unsuitable for the Ag Data Commons at this time. As the Ag Data Commons grows, we may consider expanding to include data from externally funded researchers. If you have questions or need assistance, email us at NAL_ADC_Curator@nal.usda.gov.

Video tutorial: Choosing a license for your data on the Ag Data Commons
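
The four licensing questions in the License field description can be read as a decision tree. The sketch below encodes that tree as a single function. The function and parameter names are hypothetical, and checking funding first (question 4) is an ordering choice made here for brevity, not something the manual prescribes. Always defer to the terms of your own agreement.

```python
from typing import Optional


def recommend_license(federally_funded: bool,
                      all_owners_federal: bool,
                      international_concern: bool = False,
                      agreement_license: Optional[str] = None,
                      wants_more_restrictive: bool = False) -> str:
    """Follow the four licensing questions in the License field description."""
    if not federally_funded:
        # Question 4: no US federal funding.
        return "unsuitable for the Ag Data Commons at this time"
    if all_owners_federal:
        # Question 1: all owners are federal employees working in that capacity.
        return "US Public Domain" if international_concern else "CC Zero"
    if agreement_license:
        # Question 2: the grant or agreement names a license.
        return agreement_license
    if wants_more_restrictive:
        # Question 3: owners request something more restrictive than the open options.
        return "unsuitable for the Ag Data Commons"
    return "CC Zero (preferred), CC BY, or CC BY SA"
```

For example, `recommend_license(True, True)` returns "CC Zero", and passing `agreement_license="CC BY"` for a mixed team returns the license named in the agreement.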

  • Public Access Level: ​Enter the degree to which this dataset could be made publicly available, regardless of whether it has been. See Project Open Data for more info.

  • Bureau Code: Please specify one or more bureau codes. For example, USDA-Agricultural Research Service (ARS) funded research should be “005:18”. See Project Open Data for more info.

  • Program Code​: Please specify one or more program codes. ARS and National Institute of Food and Agriculture (NIFA) funded research should be “005:037 - Research and Education”. See Project Open Data for more info.

  • Funding Source(s)​: Indicate all sources of funding for the dataset. Include the institution name and project number, if available; multiple allowed. If this is an ARS or NIFA funded project, put the CRIS number in the Project or grant number field.

  • ARIS Log Number: ​This only applies to datasets created by Agricultural Research Service employees. Enter the ARIS log number, if applicable.

  • Resources: This field allows you to attach resources to your dataset that are already uploaded to other datasets.

    • This feature is useful if your data is part of a larger program that uses the same documentation or data dictionary
    • Begin typing the name of the resource, and choose the correct one from the list
    • The resources must be exactly the same among all your datasets. If you make a change to one, it changes on every dataset that this resource is uploaded to because it is the same file.
    • If you need to make customizations in the title or description or any other part of the resource, it is best to skip this section and upload the resource directly to this dataset with your customizations
  • Highlight Image​: This image should be an informative and attractive picture or graphic, and may be used to accompany your dataset. Files must be less than 4 GB, and in one of the following file types: .png .gif .jpg or .jpeg. Ag Data Commons curators reserve final approval for all image selections.

  • Revision log message: Provide an explanation of the changes you are making. This will help other authors understand your motivations.

  • Moderation state: Set the moderation state for this content - this is one place content authors can submit their draft for review.

Video tutorial: Submit your dataset for review on the Ag Data Commons

Data Description Field Pointers

Include the following in the narrative Description field of your dataset, if applicable:

Even though some of the following have dedicated fields that allow for greater detail, we encourage summarizing them in the dataset Description field as well.

  • Make sure the Description describes the data, not the project or article

  • Description of the experiment setting: location, influential climatic conditions, controlled conditions (e.g. temperature, light cycle)

  • Processing methods and equipment used

  • Study date(s) and duration

  • Study spatial scale (size of replicates and spatial scale of study area)

  • Level of true replication

  • Sampling precision (within-replicate sampling or pseudoreplication)

  • Level of subsampling (number and repeat or within-replicate sampling)

  • Study design (before–after, control–impacts, time series, before–after-control–impacts)

  • Description of any data manipulation, modeling, or statistical analysis undertaken

  • Description of any gaps in the data or other limiting factors

  • Outcome measurement methods and equipment used

Data description pointers are based on information from the following publication:

Haddaway, N. & Verhoeven, J. (2015). Poor methodological detail precludes experimental repeatability and hampers synthesis in ecology. Ecol Evol, 5(19), 4451-4454. http://dx.doi.org/10.1002/ece3.1722

Description of fields on “Add Data” page

Video tutorial: Add a data resource to your dataset on the Ag Data Commons

Fields marked with an asterisk are required. All other fields are optional. If N/A, leave the field blank.

  • File Name​*​: Enter a descriptive resource title as you want it to appear; if uploading multiple resources, include information that distinguishes this one from the others you will provide. It is particularly helpful to include the format of the resource in the title and it is not necessary to repeat details that are in the title of the dataset.

    • Helpful information to add to a file name includes project or experiment name or acronym, location/spatial coordinates, researcher name/initials, date or date range of experiment, type of data, conditions, version number of file
    • Avoid confusing labels such as “revision”, “final”, “final 2”, etc. in the file name
  • Option to Link to a file, Link to an API, or Upload a File​:

    • Upload a file: ​If you are uploading a file, multiple files can be dragged into the interface window at once. Or, select “+ Add Files” in the lower left corner of the dialog box. Remember to select “Start upload.”

      • If your file is in csv or text format, select the delimiter used, if applicable.

      • Please note, the Ag Data Commons does not accept executable files of any type.

    • Link to a file: ​Enter a link to one of the following file formats: csv, html, xls, json, xlsx, doc, docx, rdf, txt, jpg, png, gif, tiff, pdf, odf, ods, odt, tsv, geojson, or xml.

    • Link to an API: ​Link to an API if you want to link to a web page.

  • File Format​: Specify format of the resource (e.g. CSV, HTML, XML, etc.). All resources linked to from an API should be designated as HTML regardless of the type of resource found at that link.

  • Description​: Provide a detailed description of your resource. For example, if it is an Excel file with multiple tables, describe what the different tables contain. Imagine that you are a novice user: what information would you need to make sense of this resource?

  • Text Format​: This refers to the text format of the description block. Although it should not be necessary, you can change the format to suit your preference.

  • “Make this resource the data dictionary” checkbox​: Check this if the resource you are uploading is the data dictionary. See the sections regarding data dictionaries in this document for more information on data dictionaries.

  • Recommended Software:​ Provide a name, version number, and stable URL for software tools recommended to view or run this resource.

  • Dataset:​ This field will auto-populate with the title of the dataset associated with this resource. Do not edit or add to this field.

  • Weight​: This field determines the order in which the resources will be displayed on the dataset page. The lowest value will be displayed first.

  • Described by​: This field is restricted to ADC curators. Submitters should leave this field blank.

  • URL Path Settings​: A URL will automatically be generated for this metadata record unless you uncheck the “Generate automatic URL alias” box; once it’s unchecked, you may enter a customized URL if you wish.

  • Revision Information​: If this is a revision to a previously uploaded file, check the box and provide an explanation as to why you are revising the submission.

  • Scheduling Options​: If you would like to place an embargo on the publication of your resource, enter the date you would like it to be published here. Format is YYYY-MM-DD. You may enter a date up to three years in the future. If you would like your data resource to be published immediately upon curator approval, with no embargo, leave this field blank.
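The embargo rule in the Scheduling Options field (a YYYY-MM-DD date no more than three years in the future, or blank for no embargo) can be checked before you submit. The following is a minimal Python sketch and is not part of the Ag Data Commons software. The function name `check_embargo_date` and its `today` parameter are hypothetical, and the sketch assumes that only the three-year upper limit needs checking, since that is the only limit the manual states.

```python
import re
from datetime import date, datetime


def check_embargo_date(text, today=None):
    """Return the embargo date, or None when the field is left blank.

    Raises ValueError if the text is not YYYY-MM-DD or is more than
    three years after `today` (hypothetical helper, for illustration only).
    """
    today = today or date.today()
    text = text.strip()
    if text == "":
        return None  # blank field: publish upon curator approval

    # Require the exact YYYY-MM-DD layout before parsing.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError("embargo date must be formatted as YYYY-MM-DD")
    embargo = datetime.strptime(text, "%Y-%m-%d").date()

    # Latest allowed date: three years from today.
    try:
        limit = today.replace(year=today.year + 3)
    except ValueError:
        # today is 29 February and the target year is not a leap year
        limit = today.replace(year=today.year + 3, day=28)

    if embargo > limit:
        raise ValueError("embargo date is more than three years in the future")
    return embargo
```

For example, `check_embargo_date("2017-01-15", today=date(2016, 3, 1))` returns the parsed date, while a date written as 01/15/2017 raises an error.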

DOI Opt-out

By default, if you upload resources for the Ag Data Commons to publish, the Ag Data Commons will mint a new DOI for your dataset upon successful review of your submission. If you already have a DOI for your data, or otherwise wish to opt-out of this service, please e-mail: agrefquestion@libraryresearch.info

In the body of your email, include the title and author(s) of the dataset that should not receive a DOI.

Guidelines: Formatting Your Data

  • You may provide multiple formats for the same data.

  • Consider submitting the files used by your statistical program as they are already machine-readable.

  • Provide data as CSV (comma-separated values) wherever possible for tabular data. You may also submit spreadsheets to capture formulas. For more information on creating CSV files, see the “Create a CSV data file” section.

  • Include meaningful column headers in the first row of the file. This will allow the data to be converted into other formats and display successfully. Avoid subheadings and summary information -- these will not make sense in CSV format.

  • Author your spreadsheet as one table per tab, as opposed to more than one table in a tab. This ensures the data will be machine readable.

  • Avoid blank rows or columns between data elements.

  • Do not use zeros or blank cells to represent missing data. Select a code to identify missing data; using -999 or -9999 is a common convention. NA is also an acceptable value for missing or inapplicable data. Indicate the code for missing data in the data dictionary.

  • When exporting data to another format, check to ensure that no cells with missing data have zeros, or are blank in the resulting file. Check to be sure that the resulting rows and columns make sense.

  • Use standard terminology wherever possible (include both the Latin and common names for plants and animals). As linked data becomes the norm, this will increase the impact of your dataset by making it easier to find.

  • As a scientific researcher, you may have access to disciplinary thesauri or ontologies that you wish to use. If not, a couple suggestions are the National Agricultural Library Thesaurus (a broad thesaurus of Agricultural Terms), and the Integrated Taxonomic Information System (which provides standardized names for plants, animals, fungi, and microbes).
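Several of the guidelines above (a header row with meaningful names, no blank rows, and no blank cells standing in for missing data) can be checked automatically before upload. The following is a minimal Python sketch using only the standard library. The function name `find_formatting_problems` and the wording of its messages are hypothetical. It does not flag zeros, because a zero can be a legitimate measurement; that decision is left to the data owner.

```python
import csv
import io


def find_formatting_problems(csv_text):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # Every column needs a header in the first row.
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {col}: header is blank")

    # Row numbers count parsed rows, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_no}: blank row")
            continue
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
        for col, cell in enumerate(row, start=1):
            if not cell.strip():
                problems.append(
                    f"row {row_no}, column {col}: blank cell "
                    "(use a missing-data code such as NA or -9999)"
                )
    return problems
```

An empty list means none of these particular problems were found; it does not replace a visual check that the rows and columns make sense.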

Create a CSV data file

Video tutorial: Convert data files to CSV format

CSV (Comma Separated Values) is the preferred format for most data in the Ag Data Commons. Not only is a dataset more versatile as a CSV, but viewers can take advantage of built-in features like data visualizations and previews with CSV data files.

  • If your database can output data as CSV, opt for that choice to create your data file.

  • If you have an Excel spreadsheet and want to convert it to CSV, follow these steps:

  1. Make sure there is a single column header row to label your variables. If your current data has more than one header row, consider combining these into one row in a way that makes sense.

  2. Combine your data onto a single spreadsheet page. Delete any additional pages by right-clicking the blank pages at the bottom of your spreadsheet and choosing "Delete". Note that one multi-tab spreadsheet might become several CSV files.

  3. To use the built-in chart previews, remove any special characters from your column header row. Some characters in the column headers prevent the embed / link features from working with charts and graphs in the Ag Data Commons. This step is not necessary to create a CSV, but it is necessary to use the Embed feature for sharing data visualizations created in the Ag Data Commons. The tables below list the incompatible and acceptable special characters.

Characters that should NOT be used in column headers for full compatibility with built-in data visualizations:

Character   Description                       Character   Description
`           accent                            !           exclamation point
@           "at" symbol                       #           number / hashtag symbol
$           dollar symbol                     %           percent symbol
^           caret                             &           "and" symbol
*           asterisk                          ( )         parenthesis
< >         greater / less than symbols       ?           question mark
[ ]         brackets                          { }         brackets
|           pipe                              \           back slash
'           apostrophe                        ~           tilde

Acceptable characters to use in column headers for full compatibility with built-in data visualizations:

Character   Description                       Character   Description
/           forward slash                     .           period
"           double quotes

  4. Remove any commas from your document. Because the delimiter is a comma, extra commas in your text can cause errors in interpreting the data.

  5. Save your document: choose "Save as: CSV (Comma delimited)".

  6. Upload this document as a data resource on the Ag Data Commons, and check Grid, Graph, and Embed to take advantage of the full functionality of the Ag Data Commons data previews.

(Please note: CSV format is not required, but it is highly recommended if it is compatible with your data.)
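The special-character step above can be checked with a short script before you upload. The following is a minimal Python sketch, not an Ag Data Commons tool. The names `INCOMPATIBLE` and `incompatible_header_characters` are hypothetical, and the character set is copied from the list of characters that should not be used in column headers.

```python
# Characters the manual lists as incompatible with built-in visualizations.
INCOMPATIBLE = set("`!@#$%^&*()<>?[]{}|\\'~")


def incompatible_header_characters(header):
    """Map each offending column header to the sorted characters to remove."""
    found = {}
    for name in header:
        bad = set(name) & INCOMPATIBLE
        if bad:
            found[name] = sorted(bad)
    return found


# Example header row (made-up column names, for illustration only).
example = ["site", "yield (kg/ha)", "temp_C", "N%"]
report = incompatible_header_characters(example)
for name, characters in report.items():
    print(name, "contains", " ".join(characters))
```

Headers that use only letters, digits, underscores, forward slashes, or periods pass this check unchanged.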

Submitter Checklist

Before submitting your dataset for review, check the following items to ensure your dataset is described and formatted the best way possible:

General Submission Checklist

✔ Ensure all acronyms and abbreviations are spelled out

✔ Check for typos

✔ If the Title is based directly on a paper, use the “Data from:” convention

✔ Make the Title descriptive - Include locations, dates, and informative keywords, if applicable

✔ Make sure the Description describes the data, not the project or article

✔ Make sure the Summary is filled in and consists of an appropriate sentence to represent the dataset

✔ Include Geographic / Temporal Information and Use Limitations, if applicable

✔ Add Author IDs. USDA ID - When you click on an author’s name, their ID will be everything after display/ in the URL, for example ARS-ABC01234. If no USDA ID exists, check ORCID, then Scopus and ResearcherID. Often multiple IDs will exist; choose the one with the most publications attached to it.

✔ Provide a Contact Name and Contact Email

✔ Choose an appropriate License (Note that funding sources affect the type of license assigned to the dataset)

✔ Add Bureau / Program Codes if appropriate - If ARS funded, these should be 005:18 / 005:037, respectively

✔ Add Funding Source(s) and Project or grant number

✔ Add User-supplied tags (this usually requires subject area research). Add Latin names for plants/animals as user-supplied tags. Add National Program Number (e.g. NPxxx) if applicable (ARS-specific) in this field.

✔ Add Program hierarchy tags if appropriate

✔ Make sure the following fields are populated to reserve a DOI: title, author, publisher, contact name and email, and product type (dataset, etc.)

Resource Review Checklist

✔ Submit appropriate resources (data, not just figures), preferably in a machine readable format (csv is preferred for data files; XLS can be converted and added as an additional CSV resource)

✔ Provide a Description of the data file

✔ Review resources to ensure they contain column headers, that file titles are meaningfully descriptive, and that all links / downloads work as advertised

✔ Check the Graph and Grid boxes if your data is tabular / Check the Map box if data is geospatial with coordinates / Check the Embed box to provide a link to data visualizations created from your data

✔ Provide a Data Dictionary / read me file

When Finished...

✔ Change Moderation State on your dataset AND each resource file to "Needs Review" when ready to publish

Data Dictionary - Purpose

Video tutorial: Data Dictionaries on the Ag Data Commons

Data dictionaries are used to provide detailed information about the contents of a dataset or database, such as the names of measured variables, their data types or formats, and text descriptions. A data dictionary provides a concise guide to understanding and using the data. Ideally, all Ag Data Commons (ADC) records for datasets and databases should include or point to a data dictionary. It is preferred that these data dictionaries be machine readable, in csv format.

If your data are managed in a standard relational database you will likely be able to generate a data dictionary through your software. This will provide a document that is consistently formatted and contains what is needed for others to understand your data. See the following section for more information.

If your data are managed in spreadsheets, text files, or comma separated values, you will need to manually prepare a data dictionary. To support machine-readability, we recommend preparing your data dictionary as a spreadsheet. If you prefer to prepare it as a .doc or .pdf, we recommend embedding a data dictionary table in your document that can be easily extracted. A data dictionary template and examples can be found toward the end of this document.

The following are recommended guidelines for data dictionaries, not requirements. These guidelines are subject to change, as best practices are evolving.

Automatically generating a data dictionary

Enterprise-level databases often contain built-in tools for automatically generating data dictionaries. Consult your database administrator or software documentation for instructions specific to your system.

To generate a data dictionary from MS Access, select the "Database Tools" tab, then select "Database Documenter" (under “Analyze”). Now you should see the Documenter dialog box. If you have entered a description of your fields in Design View, they will carry over to the generated data dictionary here. It should resemble the following:

Manually creating a data dictionary

Video tutorial: Data Dictionaries on the Ag Data Commons

For a spreadsheet

Submit a spreadsheet (.XLS or .XLSX) with one tab for introductory information, and separate data dictionary tabs that correspond to each existing tab in your dataset. For example, if your dataset consists of three tabs, your data dictionary will have four tabs: the first for introductory, background information, and three more to correspond to the three tabs of data. Consider using our data dictionary template to get started.

Best Practices:

  • Submit your data as a spreadsheet or csv

  • One table per tab

  • No extraneous comments

  • No empty cells, columns, or rows (enter n/a if nothing applies)

  • Spell out all abbreviations

  • Element definitions should be stated in the singular, be succinct, and be able to stand alone from other element definitions

If you would rather submit a DOCX or PDF, embed tables in your document so they will be exportable. Submit a .DOCX or .PDF with the following:

  • Introductory and explanatory text

    • Explain context: is it from a single research article, or a larger project?

    • Provide a URI (Uniform Resource Identifier), which will usually be a URL or a DOI (Digital Object Identifier) for the dataset or related journal article.

    • Other pertinent information (such as version, date released, etc.)

  • A listing of elements (fields), in addition to the following:

    • Element source table

    • Element definition

    • Element variables

    • Element data type

    • Element field length

    • Required y/n and/or null value note

For a database

Use this option only if you are unable to automatically generate a machine-readable data dictionary. Guidelines for automatically generating a data dictionary from a database are in the previous section of this document.

We suggest submitting your data dictionary as a spreadsheet (consider using our blank template). If you would rather submit a DOCX or PDF, embed tables in your document so they will be exportable. Submit a DOCX or text searchable PDF with the following:

  • Introductory and explanatory text

    • Explain context: is it from a single research article, or a larger project?

    • Provide a URI (Uniform Resource Identifier), which will usually be a URL or a DOI (Digital Object Identifier) for the dataset or related journal article.

    • Other pertinent information (such as version, date released, etc.)

  • A listing of elements (fields), in addition to the following:

    • Element source table

    • Element definition

    • Element variables

    • Element data type

    • Element field length

    • Required y/n and/or null value note

  • If possible, a data diagram or data model showing the relationships among tables.

For example:

Source: The Pacific Northwest Forest Inventory and Analysis Database

Data Dictionary - Blank Template

This blank template can be used to manually create a data dictionary. Use one row for each data element, and do not leave rows, columns, or cells blank. Add rows and columns as necessary, and enter n/a if nothing applies. See below for an explanation of column headers.

Use the following link to access a blank template to download and customize:

Blank Template

This template is read-only. Please copy the information into your own spreadsheet to create your data dictionary.

Explanation of column headers:

  • Spreadsheet tab​: If your spreadsheet has multiple tabs, identify the tab you are describing.

  • Element or value display name​: What is the name used in your data file?

  • Description​: Write a brief definition, stated in the singular, that could stand alone from other element definitions.

  • Data type​: For example, indicate varchar, integer, date, etc.

  • Character length​: For example, the maximum length for Excel is 255, so indicate 255 or less.

  • Acceptable values​: List all acceptable values, separated by pipes ( | ). This may be a field name or a range of values.

  • Required?​: Enter y/n to indicate whether this field is required.

  • Accepts null value?: This is required to run calculations on your data. Indicate y/n if null value is allowable.

    • A note on null values: Null is the absence of a recorded value for a field. A null value differs from a value of zero in that zero may represent the measure of an attribute, while a null value indicates that no measurement has been taken.
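If your data file is already a CSV, a starting point for the data dictionary can be generated from its header row. The following is a minimal Python sketch under stated assumptions: the function name `data_dictionary_skeleton`, the `tab_name` and `placeholder` parameters, and the "TO BE COMPLETED" marker are all hypothetical, and the column names are taken from the explanation of column headers above. Only the display name and the longest observed value length are filled in; every other cell must still be completed by hand.

```python
import csv
import io

# Column headers as described in the blank template explanation.
TEMPLATE_COLUMNS = [
    "Spreadsheet tab",
    "Element or value display name",
    "Description",
    "Data type",
    "Character length",
    "Acceptable values",
    "Required?",
    "Accepts null value?",
]


def data_dictionary_skeleton(csv_text, tab_name="n/a", placeholder="TO BE COMPLETED"):
    """Return CSV text with one data dictionary row per column of the input."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(TEMPLATE_COLUMNS)
    for index, name in enumerate(header):
        # Longest value seen in this column; 0 if the file has no data rows.
        longest = max(
            (len(row[index]) for row in data if index < len(row)), default=0
        )
        # No cell is left blank, in line with the template instructions.
        writer.writerow(
            [tab_name, name, placeholder, placeholder, longest,
             placeholder, placeholder, placeholder]
        )
    return out.getvalue()
```

The observed length is only a hint; record the maximum length your own system allows if it differs.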

Data Dictionary - Examples

Examples 1 & 2: Created by the ADC team using the source dataset, Data from: Enabling proteomic studies with RNA-Seq: the proteome of tomato pollen as a test case:

Example 3: Selected columns from the USDA National Finance Center Insight Training Participant Guide Version 1.0, June 2013. Source: USDA National Finance Center Insight Training Participant Guide Version 1.0 June 2014

The following are more USDA, ecological, and agricultural data dictionary examples:

If you still have questions regarding data dictionaries, e-mail us at agrefquestion@libraryresearch.info