The State of the Craft in Research Data Management - Part III
November 30, 2020 | Susan Tussy
Part III in a Five-Part Series
Data volumes are exploding, and the need to efficiently store and share data quickly, reliably and securely—along with making the data discoverable—is increasingly important. Deliberate planning and execution are necessary to properly collect, curate, manage, protect and disseminate the data that are the lifeblood of the modern research enterprise.
To better understand the current state of research data management, we reached out to several research computing leaders at institutions around the world to gain insight into their approaches and their recommendations for others. We wanted to explore how the data management tools, services, and processes deployed at their institutions help the research community reduce the time spent addressing technological hurdles and accelerate their time to science.
At these and many other leading institutions, new methodologies and technologies are being developed and adopted to create a frictionless environment for scientific discovery. Self-service data portals and point-and-click data management tools such as Globus are being made available so that researchers can spend more time on science and less on technology. Education, documentation, and a data management plan further that goal, and hackathons, webinars, and researcher resource books are just some of the ways institutions are teaching researchers best practices. Science Node published a summary of our findings, and we present here a more complete discussion with Sharon Broude Geva from the University of Michigan, the third interview in our five-part series.
Sharon Broude Geva, Director of Advanced Research Computing, Office of Research, University of Michigan
How has research data management evolved over time?
As a community, we first identified the need for reproducibility and data sharing, while understanding the wariness and concerns surrounding both practices. People have been thinking about data management for a long time, and funding agencies will eventually (if they do not already) require them to implement it, so the discussion can no longer be postponed. Initially, the discussion was all about funding and storage resources at the individual level. Clearly, it is now a much bigger issue and requires institutional commitment.
Three key questions have emerged:
1. “What is the data we're managing?” It used to be small and large files and data sets, but software is now also seen as data. Anything that has to do with reproducibility is data: scripts, publications, notes, etc. And people are looking for more recognition for the things they do: intellectual property that the institution and the individuals should get credit for, even for things like answering questions on Stack Overflow.
2. “What does management mean?” To different people it means different things, and those need to be differentiated. For example, retention is not the same as preservation.
Libraries have always been the experts in preservation. Research IT is often chartered to look at it through the lens of retention, and the Vice President for Research (VPR), Legal, and Compliance have other necessary interpretations and perspectives. Looking at it just from an IT perspective doesn’t cover curation, metadata, findability, etc.; you are primarily thinking about where researchers can put the data so it can be used with computational and other tools, and how it can be shared with collaborators at another institution with different logins, for example. Looking at it from a preservation perspective doesn’t necessarily account for the shorter-term needs of continuing projects and reproducibility. This is a top-down problem that needs to include all perspectives of data management and enable flow between the different data “stations”.
3. “Who is responsible for this at an institution?” Even if there is a clear assignment of institutional responsibility, this is not a problem to be solved by one part of an institution. It requires involvement and work from university stakeholders across disciplines, including groups responsible for oversight of the research enterprise, General Counsel, Privacy and Compliance groups, Libraries, IT providers, and, very importantly, the researchers themselves. At the University of Michigan (U-M), the Provost and VPR created a task force for Public Access to Research Data to continue the discussions started at AAU/APLU workshops on that topic, and we continue working to broaden the available guidance, training, and resources on campus.
Are you requiring people to preserve software and data?
Requirements are coming from funding agencies and publishers. In general, universities are trying to decide how much they should require and how to do it. We essentially need to lower the barriers so researchers can meet the requirements. For years, U-M has had a Data Management Services team in the Library. They are doing a phenomenal job supporting Data Management Plans as well as an institutional data repository. But in reality, libraries are often not funded to provide operational support services, as opposed to knowledge and expertise support services. It goes back to having institutional skin in the game to provide all the requirements and tools, expertise, and training, and U-M has been working on this. The University has actually had a number of repositories for quite a long time (e.g., ICPSR - www.icpsr.umich.edu, Deep Blue, and others). But now we have to think about researchers dealing with much larger datasets than before, as well as other types of data to be managed. That requires a lot more time to curate, a lot more storage, and funding from various sources.
What are you doing to provide active data services?
Even with tools like Globus, researchers still need to address questions like “How do I best move the data I need to my computational and data resources?”, “How do I allow collaborators to access my data?”, and “How do I share my data publicly for reproducibility of my research?” You need training for researchers beyond providing the technical tools, and we need to make it easier for researchers to use them. At U-M, Library Data Management Services provides free consultation and assistance in writing data management plans. We also have CSCAR, a consulting and training unit in the Office of Research with a staff of 14 plus 6 grad students, a budget, and a very long tradition of providing support for statistics, research computation, and analytics to any researcher on campus. CSCAR is appointment based - 1 hr/week free for any U-M researcher, in person or by video conference (even pre-pandemic!) - with additional consulting via email and walk-in appointments for issues that require immediate assistance. Their consulting and workshops cover the "middle ground" of data management and address specific issues researchers run into, such as “What do I do with my data?”, “How do I analyze it?”, “Are there tools I can use for applying ML?”, “How do I organize it?”, and “What about structured vs. unstructured data?” The scientific expertise of these consultants is very helpful in understanding research needs, and they are often included in research teams to provide deeper, specialized expertise to the project, beyond the scope of the appointments.
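For readers curious what a scripted answer to the first two of those questions can look like, here is a minimal sketch using the Globus Python SDK. The client ID, collection UUIDs, paths, and collaborator identity are hypothetical placeholders, not part of the interview; a real deployment would substitute an institution's own collections and preferred authentication flow.

```python
# Minimal sketch, assuming the Globus Python SDK (pip install globus-sdk).
# All UUIDs and paths below are placeholders -- substitute your own.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"      # hypothetical registered app
SRC_COLLECTION = "SOURCE-COLLECTION-UUID"    # e.g., a lab server collection
DST_COLLECTION = "DEST-COLLECTION-UUID"      # e.g., an HPC scratch collection

# Interactive login to obtain a Transfer API token
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all"
)
print("Log in at:", auth_client.oauth2_get_authorize_url())
code = input("Enter the authorization code: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# 1) Move data to the computational resource
tdata = globus_sdk.TransferData(
    tc, SRC_COLLECTION, DST_COLLECTION, label="stage input data"
)
tdata.add_item("/project/raw/", "/scratch/myrun/raw/", recursive=True)
task = tc.submit_transfer(tdata)
print("Transfer task:", task["task_id"])

# 2) Grant a collaborator read access on a guest (shared) collection
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "COLLABORATOR-IDENTITY-UUID",  # placeholder identity
    "path": "/shared/results/",
    "permissions": "r",
}
tc.add_endpoint_acl_rule("GUEST-COLLECTION-UUID", rule)
```

The same pattern, with a public access rule on a guest collection, is one way institutions expose data publicly for reproducibility without moving it out of managed storage.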
What steps has the University of Michigan taken to ease data management?
Going back to some of the things I mentioned earlier, we have an institutional data repository that is self-service and free, and free consulting of various types is also available. Data Management Services works hand in hand with the Libraries’ publishing division. U-M, through the Library, is a founding member of the Data Curation Network, a collaboration among twelve universities in which each member makes the expertise of its individual data curators available to all the members.
Unfortunately, most external funding sources don’t cover (non-preservation) retention costs, and PIs need to figure out how to pay for them. We also need more discussion about what to keep and what not to, which is facilitated by the curation and preservation expertise on campus. Through the Office of Research and the Libraries, U-M participates in groups like the AAU/APLU Public Access to Research Data initiative and workshops, which really get into many of these issues and allow us to learn from other institutions and organizations and share what we have been doing.
###
To learn more about Globus subscriptions, watch our short video now.