About Our Technology

The Ag Data Commons can best be described as an ecosystem of distributed repositories. Data are typically stored in the most appropriate discipline-specific database or repository. If no appropriate database or repository exists, the files can be stored in the Ag Data Commons. In either case, a catalog record with rich metadata should exist on the Ag Data Commons. This page describes our standards and technology stack and points to documentation and tools that developers can use to work with both data and metadata.

Our Platform

The Ag Data Commons core platform uses DKAN, a Drupal 7 installation profile. In our beta version, we added customizations and configurations to support researchers and research data. In partnership with CivicActions, we are releasing these features for community use as DKAN Science. DKAN source code and documentation are freely available. To track the progress of DKAN Science, visit the DKAN Science GitHub page.

This effort includes:

  • Productizing our open source platform for general, scalable use
  • Linking datasets with literature, related datasets, and other online resources
  • Grouping data from the same research program into collections
  • Providing guidelines and assistance for complex data dictionaries
  • Incorporating metadata elements from a variety of relevant standards

The DKAN GitHub page tracks the current development of DKAN, including issues submitted by users and notes on current fixes and future development.

Our Metadata

The Ag Data Commons is a metadata repository that uses the Project Open Data (POD) v1.1 metadata schema and its metadata resources. The POD v1.1 fields form our core set, extended by selected fields from:

  • ISO 19115-2 and ISO 19115-3 - Define the schema required for describing geographic information and services by means of metadata. They provide information about the identification, extent, quality, spatial and temporal aspects, content, spatial reference, portrayal, distribution, and other properties of digital geographic data and services.
  • ISO 19115 Topic Category - General subjects for which geospatial data may be relevant. It consists of a small number of broad terms.
  • DataCite metadata schema (DOIs) - A list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions.
  • Data Catalog Vocabulary (DCAT) - An RDF vocabulary designed to facilitate interoperability between data catalogs published on the web. Project Open Data 1.1 is DCAT compliant.
  • Author ORCIDs - Provide a persistent digital identifier that distinguishes researchers from one another. Through integration in key research workflows such as manuscript and grant submission, ORCIDs support automated linkages between authors and their professional activities, ensuring that their work is recognized.
  • National Agricultural Library Thesaurus and Glossary - Online vocabulary tools of agricultural terms in English and Spanish, cooperatively produced by the National Agricultural Library, USDA, and the Inter-American Institute for Cooperation on Agriculture, as well as other Latin American agricultural institutions belonging to the Agriculture Information and Documentation Service of the Americas (SIDALC).
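
To make the core metadata set more concrete, the following is a minimal sketch of a Project Open Data v1.1 dataset record expressed as a Python dictionary, with a check for the fields the POD v1.1 schema marks as always required. The record values, the contact details, and the helper name missing_required are placeholders chosen for illustration; they are not taken from the Ag Data Commons, and the extension fields listed above are not shown.

```python
# Fields the Project Open Data v1.1 schema marks as always required.
# (bureauCode and programCode are additionally required for U.S. federal agencies.)
POD_REQUIRED = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def missing_required(record):
    """Return the required POD v1.1 fields that are absent or empty in a record."""
    return [name for name in POD_REQUIRED if not record.get(name)]


# Hypothetical record: every value below is a placeholder.
example_record = {
    "title": "Example soil moisture observations",
    "description": "Placeholder description of an example dataset.",
    "keyword": ["soil moisture", "example"],
    "modified": "2020-01-15",
    "publisher": {"@type": "org:Organization", "name": "Example Research Unit"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Example Contact",
        "hasEmail": "mailto:contact@example.org",
    },
    "identifier": "example-dataset-0001",
    "accessLevel": "public",
}

print(missing_required(example_record))
```

A check like this only confirms that the core fields are present; it does not validate their formats or any of the extension fields.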

Our Application Programming Interfaces (APIs) and Endpoints

The following APIs are available to users of the Ag Data Commons. No API key is currently required to use these features.

DKAN Project Open Data

DKAN's catalog features provide JSON and RDF endpoints that comply with the Project Open Data requirements.

The Ag Data Commons uses the standard DKAN endpoint to provide a JSON listing of all published datasets and their metadata. We have also created a customized version of that endpoint: a filtered subset of the full data.json, restricted to datasets with an Agricultural Research Service bureau code. This custom endpoint regenerates every Sunday and serves ARS metadata to the USDA Enterprise Data Inventory, which in turn is ingested into the United States Government data.gov catalog.
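
The filtering described above can be approximated on the client side. The sketch below assumes a catalog that follows the Project Open Data v1.1 layout (a top-level "dataset" list whose entries carry a "bureauCode" list). The function names are hypothetical, the bureau code 005:18 is an assumption about how ARS is coded and should be verified, and no actual Ag Data Commons URL is implied.

```python
import json
from urllib.request import urlopen

# Assumption: ARS is identified by this agency:bureau code pair. Verify before use.
ARS_BUREAU_CODE = "005:18"


def fetch_catalog(url):
    """Download and parse a data.json catalog from the given URL."""
    with urlopen(url) as response:
        return json.load(response)


def filter_by_bureau(catalog, bureau_code=ARS_BUREAU_CODE):
    """Return only the datasets whose bureauCode list includes bureau_code."""
    return [
        dataset
        for dataset in catalog.get("dataset", [])
        if bureau_code in dataset.get("bureauCode", [])
    ]


# Usage (the URL is a placeholder, not a real endpoint):
#   catalog = fetch_catalog("https://example.org/data.json")
#   ars_datasets = filter_by_bureau(catalog)
```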

DKAN Datastore API

DKAN provides a datastore API. This API allows machine retrieval of the data contained in each tabular file in our repository. For example, the entire Ag Data Commons metrics dataset, which provides monthly statistics about the Ag Data Commons, can be retrieved by looking up the resource id of the resource belonging to that dataset.
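
A lookup by resource id might be assembled as in the sketch below. It assumes the datastore is exposed at the path /api/action/datastore/search.json, as in the DKAN 7.x documentation, and that the path is unchanged on a given site. The base URL and resource id in the usage line are placeholders, not real Ag Data Commons values.

```python
from urllib.parse import urlencode


def datastore_url(base_url, resource_id, limit=10, offset=0):
    """Build a datastore lookup URL for one resource (assumed DKAN 7.x path)."""
    query = urlencode({"resource_id": resource_id, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/action/datastore/search.json?{query}"


# Placeholder base URL and resource id, for illustration only.
print(datastore_url("https://example.org", "00000000-0000-0000-0000-000000000000", limit=5))
```

Raising offset in steps of limit pages through a table that is larger than a single response.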

Agricultural Research Service (ARS) National Program API

The ARS National Program API counts and retrieves titles for Agricultural Research Service (ARS) datasets in the Ag Data Commons. This API also counts and retrieves datasets by keyword, particularly the National Program Numbers. The ARS National Programs page describes the National Programs and their broader categories.

API Dataset Records

The Ag Data Commons contains records with APIs for some of its data. There are currently three ways to find these API dataset records:

  1. Visit the search results page of the "application programming interface" NALT keyword tag
  2. Visit the search results page of the "API" user-supplied tag
  3. Visit the search results page of the resources designated as API format to see dataset records containing or relating to APIs

Search

There are multiple ways to search for content in the Ag Data Commons.

Search Bar

DKAN offers a faceted search similar to CKAN. This functionality is provided by the Search API and Search API DB modules. DKAN can easily be updated to use Apache Solr to power the search using the Search API Solr module. The search bar at the top of the homepage allows users to search metadata text using the Ag Data Commons interface, and supports a full free-text search of metadata record fields. Users can type their search into the box and then choose the magnifying glass icon in the search box to retrieve results.

Sidebar Facets

Users can also filter content with the help of the sidebar facets to the left of the dataset lists. Facets include keywords, author names, funding sources, file format types, and more.

Choosing an option from the facets list narrows down the dataset records accordingly.

DKAN Datastore API

The DKAN datastore API offers search capabilities as well. This is a custom endpoint for the Drupal Services module and can be searched using a variety of functions and parameters.
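
As an illustration of searching with parameters, the sketch below builds a query string using a few parameters named in the DKAN 7.x datastore documentation: q for full-text search, filters[column] for exact matches, and fields for column selection. Treat the exact parameter syntax as an assumption to check against the running site. The function name and the column names in the usage line are hypothetical.

```python
from urllib.parse import urlencode


def datastore_search_query(resource_id, q=None, filters=None, fields=None, limit=10):
    """Build a datastore search query string (assumed DKAN 7.x parameter names)."""
    params = [("resource_id", resource_id), ("limit", limit)]
    if q:
        # Full-text search across the resource.
        params.append(("q", q))
    for column, value in (filters or {}).items():
        # Exact-match condition on a single column.
        params.append((f"filters[{column}]", value))
    if fields:
        # Restrict the returned columns.
        params.append(("fields", ",".join(fields)))
    return urlencode(params)


# Hypothetical resource id and column names.
print(datastore_search_query("abc", q="corn", filters={"state": "IA"}, fields=["state", "year"]))
```

The resulting string is appended after the "?" of the datastore search endpoint URL.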

Harvests

Why We Harvest

The Ag Data Commons serves as the central registry and repository for open, USDA-funded data resources -- primarily datasets, databases, software tools, APIs, and related services. Researchers are permitted to deposit their data with any repository that meets federal open data requirements, as long as they create a record of that data in the Ag Data Commons for cataloging purposes. Harvests allow the Ag Data Commons to ingest records for large groups of data from a given repository all at once and on a recurring basis to capture additions or changes in the records. When datasets are harvested, the resources are added as remote files, which means they are links to the original files on the remote server.

DKAN Harvest Module

The DKAN Harvest module provides a common harvesting framework for DKAN. It supports custom extensions and adds Drush commands and a web UI to manage harvesting sources and jobs. To “harvest” data is to use the public feed or API of another data portal to import items from that portal’s catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state and local data portals.

DKAN Harvest is built on top of the widely-used Migrate framework for Drupal. It follows a two-step process to import datasets:

  • Process a source URI and save resulting data locally to disk as JSON
  • Perform migrations into DKAN with the locally cached JSON files, using mappings provided by the DKAN Migrate Base module
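
The two steps above can be sketched in a few lines of Python. This is an illustration of the cache-then-migrate pattern only, not DKAN Harvest itself, which is implemented in PHP on top of the Migrate framework. The function names, the file naming scheme, and the field mapping are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def cache_source(catalog, cache_dir):
    """Step 1: write each dataset from a parsed catalog to its own JSON file."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, dataset in enumerate(catalog.get("dataset", [])):
        path = cache_dir / f"{index:05d}.json"
        path.write_text(json.dumps(dataset), encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical mapping from source field names to local field names.
FIELD_MAP = {"title": "title", "description": "body", "identifier": "uuid"}


def migrate_cached(cache_dir, field_map=FIELD_MAP):
    """Step 2: read the cached files and map source fields to local field names."""
    records = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        source = json.loads(path.read_text(encoding="utf-8"))
        records.append({local: source.get(remote) for remote, local in field_map.items()})
    return records


# Placeholder catalog standing in for a harvested data.json source.
sample_catalog = {
    "dataset": [
        {"title": "Example A", "description": "Placeholder", "identifier": "a-1"},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    cache_source(sample_catalog, tmp)
    print(migrate_cached(tmp))
```

Separating the two steps means the migration can be rerun against the cached files without contacting the source again.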

Harvest Sources

The following are examples of the types of data sources the Ag Data Commons harvests:

See our Harvest Policy for more details on our criteria for harvesting metadata records from outside repositories.

Our Metrics