The work undertaken by this group began with a conversation to develop a common understanding of the competencies that we were all seeking to teach. The basis for the conversation was Carlson et al. (2011), which proposed twelve competencies and listed skills that demonstrate mastery of each. We agreed that these twelve competencies were a reasonable starting point for exploring the data management processes of a range of STEM disciplines. We hypothesized from the beginning that disciplines would interpret these concepts differently and would require different skills to demonstrate these competencies. This has proven to be the case. For each of the competencies below, a loose description is included, as well as one or more examples of skill-building activities incorporated in the curriculum developed by our teams. Where a competency was not addressed by the work within our grant, background on how that competency can be addressed via curricular integration is included.

  • Introduction to Databases and Data Formats

Understands the concept of relational databases, how to query those databases, and becomes familiar with standard data formats and types for their discipline. Understands which formats and data types are appropriate for different research questions.

The University of Oregon team focused on relational databases in their two-hour, laboratory-wide training session. They also created an assessment instrument to capture the impact of their presentation on the awareness of the laboratory members. The instrument was a flat-file (paper) organizing exercise that assessed the participants' ability to organize data and describe the relationships within it (one-to-one, one-to-many, many-to-one). The assessment showed that the laboratory members had a basic understanding of relational database structure and that no further training was needed in this area. (See U Oregon internal report.)
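
The relationship types named in the assessment can also be illustrated with a small relational schema. The following is a minimal sketch and not the Oregon team's material: the table names (lab, sample), the columns, and the rows are all hypothetical. It uses Python's built-in sqlite3 module with an in-memory database to model a one-to-many relationship (one lab has many samples) and to query across it with a join.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a one-to-many relationship.

    All table names, columns, and rows are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    # Ask SQLite to enforce the foreign key declared on sample.lab_id.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE lab (lab_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sample ("
        " sample_id INTEGER PRIMARY KEY,"
        " lab_id INTEGER NOT NULL REFERENCES lab(lab_id),"
        " label TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO lab (lab_id, name) VALUES (?, ?)",
        [(1, "Lab A"), (2, "Lab B")],
    )
    # Several samples may point at the same lab: the "many" side.
    conn.executemany(
        "INSERT INTO sample (lab_id, label) VALUES (?, ?)",
        [(1, "s-001"), (1, "s-002"), (2, "s-003")],
    )
    conn.commit()
    return conn


def samples_per_lab(conn):
    """Join the two tables and count the samples that belong to each lab."""
    rows = conn.execute(
        "SELECT lab.name, COUNT(sample.sample_id)"
        " FROM lab LEFT JOIN sample ON sample.lab_id = lab.lab_id"
        " GROUP BY lab.lab_id ORDER BY lab.name"
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
counts = samples_per_lab(conn)
```

A one-to-one relationship could be sketched the same way by adding a UNIQUE constraint to the referencing column, so that each lab row could be referenced at most once.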

  • Discovery and Acquisition of Data

Locates and utilizes disciplinary data repositories. Not only identifies appropriate data sources, but also imports data and converts it when necessary, so it can be used by downstream processing tools.

Purdue Team 2 used disciplinary repositories to provide practical examples of how metadata functions within a search, of good and poor uses of metadata, and of how to identify useful datasets based on limited metadata.

  • Data Management and Organization

Understands the life cycle of data, develops data management plans, and keeps track of the relation of subsets or processed data to the original data sets. Creates standard operating procedures for data management and documentation.

Purdue Team 1 embedded themselves within a highly structured engineering design class that includes several sections of software design teams. These software design teams have lingering problems with poor documentation, which in turn delay delivery of the project. Purdue Team 1 found that most documentation for these teams is stored in a variety of places (design notebooks, SVN code repository, SharePoint). Students enter the class with a variety of skill levels and very different understandings of what makes for good documentation. Therefore, Purdue Team 1 created a software documentation rubric that was handed out in class for use by the students. The resulting documentation was then assessed using a coding schema developed by Purdue Team 1 researchers.

  • Data Conversion and Interoperability

Becomes proficient in migrating data from one format to another. Understands the risks and potential loss or corruption of information caused by changing data formats. Understands the benefits of making data available in standard formats to facilitate downstream use.
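
The risk of information loss during format migration can be shown with a small example. This is a minimal sketch under stated assumptions: the column names (station_id, flow_cms) and values are hypothetical, and the conversion is from CSV text to JSON using only Python's standard library. It contrasts a conversion that coerces only measurement columns to numbers with a naive one that coerces every column and silently drops the leading zeros of an identifier.

```python
import csv
import io
import json

# Hypothetical source data: station_id is an identifier, not a quantity.
CSV_TEXT = "station_id,flow_cms\n00123,4.50\n04567,12.0\n"


def csv_to_json(text, numeric_columns):
    """Convert CSV text to a JSON string.

    Only columns named in numeric_columns are coerced to float;
    every other column is kept as the original string.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for key, value in row.items():
            record[key] = float(value) if key in numeric_columns else value
        records.append(record)
    return json.dumps(records)


# Careful conversion: the identifier survives as text.
careful = json.loads(csv_to_json(CSV_TEXT, numeric_columns={"flow_cms"}))

# Naive conversion: the identifier becomes a number and loses its zeros.
naive = json.loads(
    csv_to_json(CSV_TEXT, numeric_columns={"station_id", "flow_cms"})
)
```

Here careful[0]["station_id"] is still the string "00123", while naive[0]["station_id"] is the number 123.0, so the original identifier can no longer be recovered from the converted file.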

  • Quality Assurance

Recognizes and resolves any apparent artifacts, incompletion, or corruption of data sets. Utilizes metadata to facilitate understanding of potential problems with data sets.
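
One simple form of this check is to scan a data set for absent values and for placeholder values that the metadata documents as "no data". The sketch below is illustrative only: the field name temp_c, the sentinel value -9999.0, and the readings are hypothetical, and real data sets will document their own conventions.

```python
def find_suspect_values(records, field, missing_sentinel=-9999.0):
    """Return (index, reason) pairs for suspect values of one field.

    A value is flagged when the field is absent from the record or when
    it equals the sentinel that the (hypothetical) metadata documents
    as meaning "no data".
    """
    problems = []
    for index, record in enumerate(records):
        value = record.get(field)
        if value is None:
            problems.append((index, "missing"))
        elif value == missing_sentinel:
            problems.append((index, "sentinel"))
    return problems


# Hypothetical readings: one sentinel value and one empty record.
readings = [{"temp_c": 12.1}, {"temp_c": -9999.0}, {}, {"temp_c": 13.4}]
issues = find_suspect_values(readings, "temp_c")
```

Without the metadata, the value -9999.0 could be mistaken for a measurement and distort any average computed from the field, which is why the sentinel is passed in explicitly rather than guessed.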

  • Metadata

Understands the rationale for metadata and proficiently annotates and describes data so it can be understood and used by self and others. Develops the ability to read and interpret metadata from external disciplinary sources. Understands the structure and purpose of ontologies in facilitating better sharing of data.

Purdue Team 2 worked with their faculty partner to identify a need for further training in the use of metadata specific to hydrologic data. An in-laboratory workshop addressed this issue. Students started by discussing the concept of metadata through a “Peanut Butter Sandwich exercise to demonstrate how description can make a difference in how or how well individuals understand instructions and the need to be explicit and complete when describing something.” (Purdue Team 2 Internal Report) Once they understood the general concept of metadata, the students searched a disciplinary repository for data sets relevant to their work, developed an understanding of what poor metadata looks like, and hypothesized what good metadata might look like. The session was assessed via a self-assessment of change in perception of metadata. In the next session, the students were encouraged to create their own metadata file for a project using an online form. The completed forms were also assessed by Purdue Team 2.
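
The contrast between poor and good metadata can be made concrete with a completeness check on a metadata record. The following is a minimal sketch, not the online form used by Purdue Team 2: the list of required fields and both example records are hypothetical and chosen only for illustration.

```python
# Hypothetical required fields; a real project would take these from
# its disciplinary metadata standard or repository guidelines.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "date_collected",
    "location",
    "units",
    "method",
)


def missing_metadata(record):
    """List the required fields that are absent or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# A sparse record of the kind that makes a data set hard to reuse.
poor = {"title": "data_final_v2", "creator": ""}

# A fuller record (all values are invented for this example).
better = {
    "title": "Daily streamflow at an example gauge",
    "creator": "A. Student",
    "date_collected": "2012-06-01",
    "location": "Example Creek, upstream of the bridge",
    "units": "cubic meters per second",
    "method": "stage-discharge rating curve",
}

gaps_in_poor = missing_metadata(poor)
gaps_in_better = missing_metadata(better)
```

A check like this only confirms that fields are filled in. Whether the descriptions are explicit and complete enough for another person to use the data still requires the kind of human review described above.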

  • Data Curation and Re-use

Recognizes that data may have value beyond the original purpose, to validate research or for use by others. Understands that curating data is a complex, often costly endeavor that is nonetheless vital to community-driven e-research. Recognizes that data must be prepared for its eventual curation at its creation and throughout its lifecycle. Articulates the planning and actions needed to enable data curation.

Purdue Team 1

  • Cultures of Practice

Recognizes the practices, values, and norms of his/her chosen field, discipline, or sub-discipline as they relate to managing, sharing, curating, and preserving data. Recognizes relevant data standards of his/her field (metadata, quality, etc.)

Purdue Team 2/Cornell

  • Data Preservation

Recognizes the benefits and costs of data preservation. Understands the technology, resource, and organizational components of preserving data. Utilizes best practices in preservation appropriate to the value and reproducibility of data.

The University of Minnesota team taught graduate students basic information about the preservation of data via an online course module. The module had two objectives: that students could explain the lifespan of potential use for their data, in order to recognize its long-term value, and that they could identify the relevant preservation-friendly file formats for their research data, in order to ensure long-term access to their digital information. The assessment tied to this module was a portion of a data management plan that covered preservation planning for a research project personally relevant to the graduate student.

  • Data Analysis

Becomes familiar with the basic analysis tools of the discipline. Uses appropriate workflow management tools to automate repetitive analysis of data.

  • Data Visualization

Proficiently uses basic visualization tools of discipline. Avoids misleading or ambiguous representations when presenting data. Understands the advantages of different types of visualization, for example, maps, graphs, animations, or videos, when displaying data.

In the six-week course taught by Cornell, students learned how to effectively visualize their data using R.


  • Ethics, including citation of data

Develops an understanding of intellectual property, privacy and confidentiality issues, and the ethos of the discipline when it comes to sharing data. Appropriately acknowledges data from external sources.

In the case of the Oregon team, it was determined that the existing trainings provided by the Institutional Review Board at Oregon were sufficient to meet the needs of the graduate students.