• You may provide multiple formats for the same data.

  • Consider submitting the files used by your statistical program, since they are already machine-readable.

  • Provide tabular data as CSV (comma-separated values) wherever possible. You may also submit spreadsheets to capture formulas. For more information on creating CSV files, see the next page.

  • Include meaningful column headers in the first row of the file. This allows the data to be converted into other formats and displayed successfully. Avoid subheadings and summary information; these will not make sense in CSV format.

  • Author your spreadsheet with one table per tab, rather than several tables in one tab. This keeps the data machine-readable.

  • Avoid blank rows or columns between data elements.

  • Do not use zeros or blank cells to represent missing data. Select a code to identify missing data; -999 or -9999 is a common convention. NA is also an acceptable value for missing or inapplicable data. Record the code for missing data in the data dictionary.

  • When exporting data to another format, check that no cells with missing data contain zeros or are blank in the resulting file. Also check that the resulting rows and columns make sense.

  • Use standard terminology wherever possible (include both the Latin and common names for plants and animals). As linked data becomes the norm, this will increase the impact of your dataset by making it easier to find.

  • As a scientific researcher, you may have access to disciplinary thesauri or ontologies that you wish to use. If not, two suggestions are the National Agricultural Library Thesaurus (a broad thesaurus of agricultural terms) and the Integrated Taxonomic Information System (which provides standardized names for plants, animals, fungi, and microbes).
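
Several of the CSV guidelines above (a header for every column, no blank rows, no blank cells) can be checked automatically before submission. The following is a minimal sketch, assuming Python and its standard csv module; the function name check_csv and the sample data are hypothetical and only illustrate the idea. It cannot tell whether a zero stands for missing data, so that check still has to be done by eye.

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found in CSV text (hypothetical helper)."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    # Every column needs a non-empty header in the first row.
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {col}: missing header")

    for row_no, row in enumerate(rows[1:], start=2):
        # A fully blank row often separates two tables or a summary block.
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_no}: blank row")
            continue
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
        # Blank cells should carry a missing-data code such as NA or -999.
        for col, cell in enumerate(row, start=1):
            if not cell.strip():
                problems.append(f"row {row_no}, column {col}: blank cell")
    return problems


# Illustrative data only: one blank cell and one blank row.
sample = (
    "site,species,count\n"
    "A,Quercus alba,12\n"
    "B,Acer rubrum,\n"
    "\n"
    "C,Pinus strobus,-999\n"
)
for problem in check_csv(sample):
    print(problem)
```

An empty result list means none of these particular problems were found; it does not confirm that the headers are meaningful or that the missing-data code is documented in the data dictionary.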