Ag Data Commons

Current and projected research data storage needs of Agricultural Research Service researchers in 2016

dataset
posted on 2023-11-30, 07:49 authored by Cynthia Parr

The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to have all members of their unit respond, or to collate responses from their unit themselves before reporting in the survey.

Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated that their total current data fell in either the "more than 10 to 100 TB" range or the "over 100 TB" range (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.
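
The Big Data / Small Data split described above can be sketched roughly as follows. This is only an illustration: the range labels and the function name `classify_user` are hypothetical and are not taken from the released files, so they would need to be matched to the actual Q5 response text. The estimation step for non-respondents is not shown because its method is not described here.

```python
# Hypothetical Q5 range labels (lowercased); the real survey wording may differ.
BIG_DATA_RANGES = {"more than 10 to 100 tb", "over 100 tb"}


def classify_user(q5_range: str) -> str:
    """Label a respondent by their reported total current data range (Q5)."""
    normalized = q5_range.strip().lower()
    if normalized in BIG_DATA_RANGES:
        return "Big Data user"
    # Every other range is treated as a Small Data user.
    return "Small Data user"


if __name__ == "__main__":
    for label in ["Over 100 TB", "1 to 10 TB"]:
        print(label, "->", classify_user(label))
```

Matching on normalized text keeps the sketch tolerant of capitalization and stray whitespace in exported survey responses.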

We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival.

To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
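
A minimal sketch of this per-person calculation, under stated assumptions: the function and parameter names are hypothetical (they are not column names from the released files), and the sketch assumes that a Big Data user's reported or estimated value simply replaces the high end of the range before dividing by the group size.

```python
from typing import Optional


def per_person_storage_tb(
    range_high_tb: float,
    group_size: int = 1,
    reported_tb: Optional[float] = None,
) -> float:
    """Per-person storage need in TB.

    range_high_tb: high end of the reported storage range.
    group_size:    G, the number of individuals covered by the response
                   (1 for an individual response).
    reported_tb:   actual or estimated value for a Big Data user, if any
                   (assumption: it replaces the range's high end).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    total_tb = reported_tb if reported_tb is not None else range_high_tb
    return total_tb / group_size


if __name__ == "__main__":
    print(per_person_storage_tb(10.0))                  # individual response
    print(per_person_storage_tb(10.0, group_size=5))    # group response, G = 5
    print(per_person_storage_tb(100.0, group_size=4, reported_tb=60.0))
```

Using the high end of each range makes the estimate conservative, since it will tend to overstate rather than understate storage needs.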


Resources in this dataset:

  • Resource Title: Appendix A: ARS data storage survey questions.

    File Name: Appendix A.pdf

    Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down menu not shown here.

    Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/


  • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.

    File Name: Machine-readable survey response data.csv

    Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).


  • Resource Title: Responses from ARS Researcher Data Storage Survey.

    File Name: Data Storage Survey Data for public release.xlsx

    Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.

    Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Funding

Agricultural Research Service

Data contact name

Parr, Cynthia

Data contact email

cynthia.parr@ars.usda.gov

Publisher

Ag Data Commons

Intended use

Digital data storage needs assessment for intramural United States federal researchers in agricultural domains.

Use limitations

The sample may be biased if researchers with large storage concerns were more likely to respond, or if some kinds of units were more effective at getting a response. Web-accessible databases and repositories, administrative data, and backup storage needs were not included.

Temporal Extent Start Date

2016-10-01

Temporal Extent End Date

2016-11-01

Theme

  • Not specified

ISO Topic Category

  • farming
  • structure

National Agricultural Library Thesaurus terms

scientists; issues and policy; surveys; databases; Agricultural Research Service; data collection; information management; infrastructure

OMB Bureau Code

  • 005:18 - Agricultural Research Service

OMB Program Code

  • 005:040 - National Research

Pending citation

  • No

Public Access Level

  • Public

Preferred dataset citation

Parr, Cynthia (2017). Current and projected research data storage needs of Agricultural Research Service researchers in 2016. Ag Data Commons. https://doi.org/10.15482/USDA.ADC/1346946
