Monday, 1 June 2015

How do we make data count?

Below is a copy of a post I recently wrote for The Sydney Conference: Scholarly Communication Beyond Paywalls blog. I'm really looking forward to this event which promises to deliver in an unconference participant-driven style. I'll be co-facilitating Thread 5: Dive in and out of communications (multi dimensional).
-------------------------------------------------------------

Data generated through the course of research is as valuable an asset as research publications. Access to research data enables the validation and verification of published results, allows the data to be reused in different ways, helps to prevent duplication of research effort, enables expansion on prior research and therefore increases the returns from investment. Yet the quality and quantity of a researcher’s publications continue to provide the key measure of their research productivity. Sharing data, it seems, still does not count for nearly enough.

In recent years there has been a proliferation of policies strongly encouraging and sometimes even requiring researchers to share their data for the reasons outlined above. This includes policies from governments (e.g. USA, Australia), publishers (e.g. PLOS, Nature), and research funders (e.g. NIH, ARC). These policies are certainly opening up more data but even more research data remains locked away and therefore undiscoverable. So how do we unlock more data? One of the ways is to figure out how to make data count so that researchers have more incentives to undertake the extra (and in the main, unfunded) work required to share their data.

A 2013 study by Heather Piwowar and Todd Vision looked into the link between open data and citation counts. They found that the citation benefit intensified over time: publications from 2004 and 2005 were cited 30 per cent more often if their data was freely available; every 100 papers with open data prompted 150 "data reuse papers" within five years; and original authors tended to use their data for only two years, while others reused it for up to six years. More studies like this one are needed to demonstrate and track over time the link between opening up data and making it count, in this case in the form of citations, which, like it or not, are still the primary measure of research impact.

Counting data citations, whether to gather citation metrics or alternative metrics (altmetrics), is challenging in and of itself because data is cited very differently to publications. Data can be cited within an article text rather than in the references section, which means the article must be open access in order for the citation to be discovered. Sometimes the article that referenced the data is cited rather than the data itself, even where the reference applies only to the data. Reference managers don’t tend to recognise datasets and therefore don’t record the Digital Object Identifier (DOI), which creates difficulties since DOIs make it so much easier to track citations. There are also many self-citations, where researchers cite their own data, so it is difficult to distinguish an article that has cited another person’s data. And there are likely to be differences between how data is cited in the sciences as compared to the humanities.

Fortunately, the California Digital Library, PLOS and DataONE have partnered in an NSF-funded project called Make Data Count. The project will “design and develop metrics that track and measure data use i.e data-level metrics”. The findings promise to be highly valuable and may also shape future recommendations for the way data should be cited in order for it to be counted.

Sharing impact stories of data reuse is perhaps another way to help make data count. A number of organisations around the world that promote better data management have been collecting data reuse stories (e.g. DataONE, ANDS). Some researchers may see these stories as a negative because they show that “someone else might get the scoop on ‘my’ data”. But these stories can also inspire researchers to spend the extra effort to make their data available when they feel they are ready to. The rewards may not only be in the metrics but in the unexpected ‘buzz’ of seeing ‘your’ data have a longer life and be reused in ways you had not even imagined. Are there other ways that we can help make data count? It’s worth thinking about because "data sharing is good for science, good for you".

Monday, 4 May 2015

Guest blog: ORCID in Australia

I've been doing a lot of work around ORCID (Open Researcher and Contributor ID) over the past year including organising two national ORCID roundtables, participating in a national ORCID Working Group and giving ORCID presentations at conferences, workshops and events. In April, I became an ORCID Ambassador and had the pleasure of writing a guest blog post for ORCID about what's been happening in Australia.


Monday, 9 March 2015

Library Research Data Services - how does Australia compare?

Carol Tenopir and her co-authors recently published an article, ‘Research data management services in academic research libraries and perceptions of librarians’, in Library & Information Science Research. The article draws on the results of two studies: librarians' research data services (RDS) practices in U.S. and Canadian academic research libraries, and the RDS-related library policies in those or similar libraries.

In the article, research data services are categorised in two ways:
1. Informational or consulting services such as consultation on data management plans or providing reference support for finding and citing data and datasets.
2. Technical or hands-on services such as providing technical support for a data repository or directly participating with researchers on a project (as a team member).
The article fleshes out the categorisation with additional examples and the authors found that the first type is more commonly offered than the second, but not by much. Unsurprisingly perhaps, the authors reveal that "The most commonly offered or planned informational RDS, finding and citing datasets, is a service that simply extends a familiar library reference service into the realm of data".

Research Data Services in Australia
Thinking about this in the Australian context, I don’t think we could draw the same distinctions. Our library research data services, which are still few and far between, seem far more blended and I’ve not yet heard of an Australian RDS that offers reference support for finding datasets. In fact, I think our RDS are probably more weighted toward the second (technical) category, but only just. This is probably because our RDS came into being sometime after significant funding was provided to institutions by ANDS to develop infrastructure and grow the research data commons. The focus was on technical infrastructure and services, not consultancy services, which were developed somewhat later (if at all).

Looking back at the presentations on ‘Developing library research data services’ at an ANDS webinar in September last year, we learned that:
  • Flinders University offered a wide range of RDS including: referral to collaboration options; eResearch tools and services; assistance with ARC funding applications; training for HDR students; metadata creation; advice about compliance with funder/publisher mandates.
  • University of Western Australia’s RDS offered: data management planning tool; institutional research data storage tool; ‘Research Data Online’ for access and discovery of UWA datasets.
  • University of the Sunshine Coast RDS offered: data management planning and a central storage space.

All three services were still developing and staff were on a steep learning curve. Amanda Nixon from Flinders called RDS “giving people what they didn’t know they wanted”. She highlighted the role of the library in offering RDS, leveraging its natural link with researchers and their research outputs.

It’s interesting that in Australia, our libraries collaborate more widely than our North American counterparts with respect to Research Data Services (judging by the aforementioned article). All three of the Australian RDS examples above mentioned collaborations with a range of internal partners (e.g. IT services, ethics, research office) and external partners (e.g. statewide eResearch providers, national infrastructure providers such as RDSI).

Training opportunities
The article by Tenopir and colleagues also notes that, "There appears to be somewhat of a mismatch between what academic research library directors believe they offer to their librarians and what the librarians themselves perceive to be available to them in the way of RDS training opportunities. Nonetheless, these results portend well for the future of RDS, as there are clearly some opportunities for training of librarians in RDS skills."


I wonder whether the same mismatch occurs in Australia. I had the pleasure of facilitating an ANDS-CAUL workshop on research data in Auckland last year specifically for the heads of university libraries. The workshop reflected a real concern among university librarians about how to provide their librarians with training in managing research data. Our North American colleagues said that attending conferences was the preferred method of training, followed by courses and in-house training. In Australia, our conference opportunities are less frequent. ANDS has been providing workshops for librarians, research managers and others that can help fill some of the gaps; however, serious attention needs to be paid to this area. It will be interesting to see this develop over the next few years as libraries continue to grow their offerings of research data services and develop the skills of their librarians in this key area.

Wednesday, 4 March 2015

Flying solo: data librarians working outside of (traditional) libraries

Today’s ANDS webinar on ‘Flying solo: data librarians working outside of (traditional) libraries’ was rich with stories from three great presenters:
  • Siobhann McCafferty, Research Data Coordinator for the National Agricultural Nitrous Oxide Research Program
  • Michelle Teis, Senior Consultant at Glentworth
  • Jane Frazier, Data Librarian with ANDS and formerly a data curator at Dryad data repository.
Each told their own unique story; however, there were elements common to all. These included:
  • Being prepared to move jobs, even countries
  • Being in contract roles, not permanent positions
  • Having a real thirst to learn new skills and technologies
  • Drawing on basic library cataloguing skills and developing strong metadata skills (they were all queens of metadata!)
  • Being prepared to engage with content creators, being skilled at interviewing to get or manage content
  • Being flexible and adaptive
  • Being outspoken and proactive – about your skills, about what librarians can do

Data librarianship is a new and emerging profession which is ill-defined in terms of roles and responsibilities and is equally ill-defined in job titles. The speakers advised those wanting to move into the field to:
  • Look beyond job titles – jobs may not be called ‘Data Librarian’ but may actually be data librarian roles
  • Draw on traditional library cataloguing/metadata skills
  • Think big, talk up your skills and your keenness to learn to potential employers.


In response to a question about how those in traditional library jobs could move into non-traditional roles like data librarian, Siobhann made an interesting comment that employers are looking for something different because they don’t really know what librarians can do. She – and the other panellists – advised being proactive and vocal about what librarians (and libraries) can offer in the way of research data management.

The recording will be posted on the ANDS YouTube channel soon and appear under the 'Data Librarians' playlist.

Tuesday, 17 February 2015

DataQ project is promising

DataQ is an interesting project coming out of the USA. The project is funded by the Institute of Museum and Library Services and run by the University of Colorado Libraries, the Greater Western Library Alliance and the Great Plains Network. Although the project is based in the States, it is international in scope. The outcome of the project is a knowledge base of research data questions and answers curated for and by the library community. This will be achieved by inviting library staff from any institution around the world to submit questions on research data topics. Answers to the questions will go through the editorial team and will also be crowd-sourced, with the results posted to the DataQ website, building up a knowledge base with links to resources, tools, best practices and practical approaches to addressing specific data-related issues. It’s a community-driven project that will allow libraries to share knowledge, support staff skills development and better understand researcher needs in the area of research data management (RDM).

I think this project has a lot of merit and promise. I will be watching with interest to see how the DataQ service will be used. What kind of questions will be asked? Will the responses be able to be generalised? How will it cater for international questions where the response may require a country-specific answer, e.g. a data licensing question? Will they, for example, tag those questions with a country code to enable searchers of the database/website to pull out content specific to their region? I hope to see some questions posted, and some answers given, from the Australian library community.

Tuesday, 3 February 2015

Research Support Community Day - impact stories

Yesterday I attended the Research Support Community Day (#rscday2015) at UNSW in Sydney. This annual event is run by a small, dedicated group of research support librarians who generously organise it in their "spare time". This year, my key learnings were:

1. Telling stories is critical. In this day and age, where libraries must justify every dollar spent, librarians need to be able to effectively 'sell' the importance of the library to its institution. Much time is spent gathering statistics and turning them into eye-catching visuals, supplemented by some qualitative data e.g. researcher testimonials. Sue Henczel made the point that 'people forget facts but they never forget a good story'. The ideas presented at the support day were about combining facts and qualitative data into a story that tells in a powerful way the impact the library has had. If the library's value to the institution cannot be clearly evidenced then the very jobs of librarians will be in jeopardy.

2. Some researchers love and use social media while others do not. I listened with interest to researcher and academic, Keith Parry (@sportinaus), as he talked about his day using social media. He talked about the value of tools such as Twitter and YouTube in connecting with his students, promoting his work, and connecting with colleagues. He reflected on the value of publishing in The Conversation over academic journal publishing in terms of audience reach. He also talked about the perils of engaging in social media, such as trolling and distraction. A lot of Keith's time is also spent looking at the statistics provided by social media. ImpactStory is of real value here (sadly it was not mentioned by Keith, though it was mentioned by a subsequent speaker). Librarians could learn from looking at this need and the types of statistics provided by social media: can these be used in a library context, e.g. added to repository statistics provided to researchers about their publications? It was acknowledged that the library can play a key role in promoting the work of its institution's researchers via social media, and Twitter in particular, e.g. by tweeting recent publications.


3. Research data is still not seen as a profile boost for researchers. While Keith was the only researcher who spoke at the event, his comments are familiar: 'why would someone want my data? I'm not sure if I can share my data anyway'. Data is of increasing value to publishers, funders, institutions and some researchers but many researchers have yet to come on board the data express. The benefits, the carrots, are still not so clear and juicy-looking. For more reasons not to share data, see the funny yet tragic Open Data Excuse Bingo.

4. Research data services are important to institutions. While researchers may still struggle with concepts of data management and data sharing, institutions are putting in infrastructure and services to help with that because they see value in it. The standout example from the Research Support Community Day was UNSW's 'ResData', which is an impressive web-based tool built in-house. To use the research data storage at UNSW, you need to complete a data management plan (DMP). ResData allows you to do this, and so much more, and ensures that should the funding bodies mandate DMPs, then UNSW will be ready to go. They have also delved into new territory: data support for HDR candidates. Maude Frances, with her brand new 'Dr' title (yes, I'm jealous!), presented on ResData and nailed it when she said that the rollout of a new system is as much about the support model as it is about the system. There is much to learn here.


This was the second Research Support Day that I have attended and I highly recommend it to librarians working in this space. 

Monday, 12 May 2014

Becoming a Data Librarian

Today I attended an Australian National Data Service (ANDS) webinar on ‘Becoming a Data Librarian – everything you wanted to know….’ The topic obviously struck a nerve, with some 124 connections to the webinar, many of which would have involved more than one person listening in. As there are actually very few ‘Data Librarians’ in Australia, I assume that the webinar attracted a broad range of people such as librarians in other roles, librarians who do some data management but who are not called ‘Data Librarian’, library managers, data managers, library/information management students and more.

There were three speakers at the webinar: Cathy Miller from University of Adelaide, Philippa Broadley from QUT and David Groenewegen from Monash University. Cathy and Philippa shared their ‘path’ to becoming a data librarian while David took a bird's-eye, senior manager's perspective. Each reflected on the skills and knowledge that were needed as a data librarian. There were common themes in all three talks:

  • There is no well-worn path to becoming a data librarian. This is a new role, which many libraries are yet to create, and so there is no clear or common career path leading into this role.
  • The role of data librarian is not well defined and is continually evolving, with the job responsibilities differing widely between institutions. Rather than being too narrow, the role is often much broader than the title indicates, though data management is of course at the core of the work.
  • The role requires generic “soft” skills like communication. This is critical for conducting aspects of the role such as face-to-face data interviews with researchers, writing policy documents, training a group of researchers in data management and so forth. Another “soft” skill was obviously flexibility – both Cathy and Philippa changed roles frequently before their current data librarian roles and were prepared to take fixed-term contract (rather than permanent) positions. These “soft” skills are not exclusive to librarianship and may be gained outside of the library.
  • There are some “hard” skills that help the transition into the data librarian role. In particular: metadata, technical and repository skills and knowledge. Knowledge of the broader research environment and project management skills were also an asset.
  • Most skills and knowledge are acquired on the job. David reflected on what librarians bring to the roles, such as knowledge of the scholarly environment and skills in organising and describing information resources. However, knowledge of eResearch and the research data environment, along with skills in things like metadata, were gained on the job, while in the role, rather than acquired beforehand (e.g. through a library degree or short course).

These common threads, reflected by the speakers, were heartening to me as they confirm the findings that Sam Searle and I wrote about in our VALA 2014 paper Redefining ‘the Librarian’ in the context of emerging eResearch services.


Cathy and Philippa had both been involved in multiple ANDS-funded projects prior to or as part of their data librarian roles, which gave them advanced skills and knowledge in relation to data management. At a senior level, David was an ANDS Director before he moved into a new role at Monash University. This shows how important the role of ANDS has been in fostering better data management practices at Australian institutions. However, it also raises a concern: now that many of the ANDS-funded projects at Australian institutions have concluded, where to for the data librarian roles? How many libraries are willing to create permanent roles for Data Librarians, or rewrite position descriptions of existing library roles so that they include data management? Will more libraries make data management mainstream, core library business? I like to think the answer is yes, and that it will be fuelled by the needs of researchers, changes in the requirements of research funding bodies including government, and by local policy implementation.