Why We Harvest

The Ag Data Commons serves as the central registry and repository for open, USDA-funded data resources: primarily datasets, databases, software tools, APIs, and related services. Researchers may deposit their data with any repository that meets federal open data requirements, as long as they create a record of that data in the Ag Data Commons for cataloging purposes. Harvests allow the Ag Data Commons to ingest records for large groups of datasets from a given repository all at once, and to repeat the process on a recurring basis to capture additions or changes in the records. When datasets are harvested, their resources are added as remote files: links to the original files on the remote server rather than local copies.

DKAN Harvest Module

The DKAN Harvest module provides a common harvesting framework for DKAN. It supports custom extensions and adds Drush commands and a web UI for managing harvest sources and jobs. To “harvest” data is to use the public feed or API of another data portal to import items from that portal’s catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state, and local data portals.
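
To make the idea of a public feed concrete, the following is a minimal Python sketch of reading dataset entries from a data.json catalog. It is an illustration only, not part of DKAN: the sample catalog, its identifiers and URLs, and the function name list_catalog_items are hypothetical, and the field names (dataset, identifier, title, distribution, downloadURL) assume a feed that follows the Project Open Data metadata schema.

```python
import json

# Hypothetical, trimmed-down data.json catalog. Field names assume the
# Project Open Data metadata schema; all values are made up for illustration.
SAMPLE_CATALOG = """
{
  "dataset": [
    {
      "identifier": "example-001",
      "title": "Example Soil Survey",
      "distribution": [
        {"downloadURL": "https://example.org/files/soil.csv", "mediaType": "text/csv"}
      ]
    },
    {
      "identifier": "example-002",
      "title": "Example Crop Yields",
      "distribution": []
    }
  ]
}
"""


def list_catalog_items(catalog_text):
    """Return (identifier, title, download URLs) for each dataset in the feed."""
    catalog = json.loads(catalog_text)
    items = []
    for dataset in catalog.get("dataset", []):
        # A dataset may list zero or more distributions (files or endpoints).
        urls = [
            dist["downloadURL"]
            for dist in dataset.get("distribution", [])
            if "downloadURL" in dist
        ]
        items.append((dataset.get("identifier"), dataset.get("title"), urls))
    return items


if __name__ == "__main__":
    for identifier, title, urls in list_catalog_items(SAMPLE_CATALOG):
        print(identifier, title, urls)
```

A harvester would read the catalog text from the source portal's published feed rather than from an inline string. The sample is kept in memory here so the sketch runs without network access.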

DKAN Harvest is built on top of the widely used Migrate framework for Drupal. It follows a two-step process to import datasets:

  • Process a source URI and save resulting data locally to disk as JSON
  • Perform migrations into DKAN with the locally cached JSON files, using mappings provided by the DKAN Migrate Base module
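
The two steps above can be sketched in Python as follows. This is a simplified illustration of the cache-then-migrate pattern, not DKAN's actual implementation (DKAN Harvest is a Drupal module built on Migrate). The function names, the FIELD_MAP mapping, the cache file naming, and the local record layout are all hypothetical, and the catalog is passed in already parsed so the sketch runs without network access.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping from source field names to local field names.
# In DKAN, mappings come from the DKAN Migrate Base module; these are
# illustrative stand-ins.
FIELD_MAP = {"title": "title", "description": "body", "identifier": "uuid"}


def cache_source(catalog, cache_dir):
    """Step 1: save each dataset from a parsed source catalog to disk as JSON."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, dataset in enumerate(catalog.get("dataset", [])):
        path = cache_dir / f"{index:05d}.json"
        path.write_text(json.dumps(dataset), encoding="utf-8")
        paths.append(path)
    return paths


def migrate_cached(cache_dir):
    """Step 2: build local records using only the cached JSON files."""
    records = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        dataset = json.loads(path.read_text(encoding="utf-8"))
        record = {local: dataset.get(remote) for remote, local in FIELD_MAP.items()}
        # Resources are kept as links to the remote files, not downloaded copies.
        record["resources"] = [
            {"remote_url": dist["downloadURL"]}
            for dist in dataset.get("distribution", [])
            if "downloadURL" in dist
        ]
        records.append(record)
    return records


def harvest(catalog):
    """Run both steps against a temporary cache directory."""
    with tempfile.TemporaryDirectory() as cache_dir:
        cache_source(catalog, cache_dir)
        return migrate_cached(cache_dir)
```

Separating the steps means the second step never touches the remote source: a migration can be re-run or inspected against the cached files even if the source portal is unavailable.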

Harvest Sources

The following are examples of the types of data sources the Ag Data Commons harvests:

See our Harvest Policy for more details on our criteria for harvesting metadata records from outside repositories.