The Good, The Bad, and the Ugly
Friday, February 3, 2023 (9:30am – 4:15pm EST)
Library data is something that we all have to deal with – the good, the bad, and the ugly. Sometimes it can be fun and exciting when you have an interesting story to tell. Other times, it can be terrifying sifting through old forgotten admin logins and new report layouts. As with much of our work, we train ourselves on the data as we go.
Do you have a passion for library data? Are you interested in learning about useful tools and methods that help you comb through data? Are you trying to get a better handle on analyzing and sharing COUNTER 5 and SUSHI data? Would you like to learn new ways to communicate about your data to different audiences? This conference is for you! Come join your library colleagues at the SUNYLA Midwinter Virtual Conference on Friday, February 3, 2023 (9:30am – 4:15pm EST).
Technology requirements for attendance: Computer, internet connection, microphone/speakers (headset recommended) or telephone. Zoom Webinar will be used for this conference and is free for use by attendees.
SUNYLA’s Midwinter Virtual Conference Committee:
Jennifer DeVito, Stony Brook University
Jennifer Jeffery, SUNY Potsdam
Bill Jones, SUNY Geneseo
Jill Locascio, SUNY Optometry (chair)
Carrie Marten, SUNY Purchase
Jessica McGivney, SUNY Farmingdale
9:30am – 9:35am
9:35am – 10:05am
Session 1: Communicating Library Impact By Creating a Simple Dashboard, by Sharon Clapp, Carl Antonucci, PhD, and Martha Kruy (Central Connecticut State University)
As libraries face an increasingly lean budget environment, communicating library impact data to funding authorities and library users has become a critical task. This presentation will discuss the creation and launch of a simple library impact dashboard at Central Connecticut State University on LibGuides. There will be discussion of the dashboard’s construction process, data collected, design choices made, and ideas about the dashboard’s future development based on this experience.
10:05am – 10:35am
Session 2: From Scratches on Paper to Auto-Generated Pivot Tables: The Evolution of Reference Statistics at SUNY Morrisville, by Angela M. Rhodes (SUNY Morrisville)
Almost all college libraries are in the habit of keeping reference transaction statistics. For a small college library on a budget, keeping stats must be simple for everyone to understand, with no associated cost. Back in 2010, SUNY Morrisville was old-school: a piece of paper at the desk with hash marks to indicate a transaction. With a little push from an Excel nerd, Morrisville launched its digital version of hashes on paper, and a world of stats by day of week, time of day, and type of question flourished. After twelve years of use and updates, Excel has served its time. Now the library seeks answers to more questions: WHO needs us, and HOW are they accessing us? Morrisville is moving away from Excel spreadsheet hashes to a Microsoft Form built to answer those questions.
10:35am – 11:05am
Session 3: Visualization of COUNTER 5 data, by Bridgett Bonar (Dartmouth College)
This presentation will discuss how to create visualizations of COUNTER 5 data. Instead of focusing on a particular program, we will discuss the pros and cons of different data points and visualization styles and how to create clear, useful, and communicable visualizations.
11:05am – 11:15am
11:15am – 11:45am
Session 4: Data Visualization: Using Tableau to Analyze Library Website Usage Data, by Qiong Xu and Robin Naughton (CUNY Queens College)
The Queens College Library (QCL) employs Tableau, an interactive visual analytics software, to visualize library website usage data collected from multiple platforms. As a digital space hosting accessible resources, the QCL website combines multiple systems, including WordPress for webpages and Springshare LibGuides for research guides, the A-Z Database list, and electronic course reserves, to create a unified user experience. QCL has adopted Google Analytics to capture users’ behavior data (page views) and used Springshare to capture LibGuides usage data. Analyzing data collected from multiple systems and pulling them together with data visualizations can help our leaders and stakeholders better understand the library’s role in supporting college teaching and research. The presentation provides a context for incorporating data visualization into QCL’s assessment of user behavior to fuel better decision-making regarding the library website. The presentation will also share examples of data dashboards that showcase usage of the QCL website over the past five years, including trends in library webpage views, rankings of topic webpage views, and LibGuides visits. The presentation will conclude with a discussion about potential projects to visualize more library data to assist in future library assessment efforts.
11:45am – 12:15pm
Session 5: True Grit: Prospecting Through Inherited Data, by Nate Beyerink and Kyle Constant (University of Central Missouri)
Have you ever been faced with questions like “How did they get these numbers?” or comments like “That’s not how we do that here” or “That’s how we’ve always done it”? If so, this session may be for you! Interpreting data requires a thorough understanding of what information was collected and the methods that were used. As organizations across the country deal with a shifting labor force, some of this knowledge is being lost in the shuffle. A common obstacle in transition planning is existing documentation and workflows that are no longer relevant or effective. Removing these obstacles, as well as creating new paths to efficiency, can be critical to institutional health. This can be particularly difficult when existing staff remain in key roles: their familiarity with their tools and workflows can present challenges to change. In this presentation, two newly hired librarians will discuss their experiences piecing together information from past documentation, reconstructing forgotten procedures, and building upon that foundation while integrating a new ILS (Ex Libris) with legacy frameworks. Providing practical tips and tools to develop and improve your own documentation, this presentation will be perfect for anyone preparing for staffing transitions. Even if you are not expecting a transition, succession planning will be integral to maintaining operations no matter what life throws your way.
12:15pm – 12:45pm
Session 6: How to Combine Journals Data from Multiple Data Sources Cleanly & Efficiently, by Nat Gustafson-Sundell (Minnesota State University, Mankato)
At Minnesota State University, Mankato, we have iteratively developed methods and tools for combining multiple journals-related data sources. The basic approach is to make four connections on ISSN and one more on the ‘StandardTitle,’ which is a processed title. This approach requires a further step to validate results before applying a standard number to facilitate report production. For this presentation, I will demonstrate how to clean and format ISSNs using Microsoft (MS) Excel, how to apply a StandardTitle using Jupyter Notebook, how to match lists using MS Access, and how to validate the results. I’ll explain why we’ve come to prefer these technologies for these specific tasks. I will provide the formulae to clean and check ISSNs, as well as the Python code to create StandardTitles. In addition to demonstrating the full method for data matching, I’ll present a simplified method which can produce ‘good enough’ results. Finally, I’ll also demonstrate how quickly a variety of reports can be produced once the underlying data processing is completed.
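The abstract does not include the presenters’ actual Excel formulae or Python code. As a rough illustration only, here is a minimal Python sketch of the two preparation steps the abstract names — cleaning and validating ISSNs, and deriving a processed “StandardTitle” for matching (function names are hypothetical):

```python
import re

def clean_issn(raw):
    """Normalize an ISSN to NNNN-NNNN form; return None if invalid.
    Validates the ISSN mod-11 check digit (weights 8..2; 10 -> 'X')."""
    digits = re.sub(r"[^0-9Xx]", "", str(raw)).upper()
    if not re.fullmatch(r"\d{7}[\dX]", digits):
        return None
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    if digits[7] != ("X" if check == 10 else str(check)):
        return None
    return digits[:4] + "-" + digits[4:]

def standard_title(title):
    """A processed title for matching: lowercase, strip punctuation,
    drop a leading English article, collapse whitespace."""
    t = re.sub(r"[^a-z0-9 ]", " ", title.lower()).strip()
    t = re.sub(r"^(the|a|an)\s+", "", t)
    return re.sub(r"\s+", " ", t)
```

With both an ISSN key and a StandardTitle key in hand, lists from different sources can be joined on either, with a validation pass over the title-only matches before assigning a standard record number.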
12:45pm – 1:05pm
1:05pm – 1:35pm
Session 7: It’s 10pm, do you know what’s happening in the library? An Exploration of Hourly Library Usage Data, by Hilary Thompson and James Spring (University of Maryland Libraries)
The COVID-19 pandemic and rising minimum wage prompted access services managers to delve deeper into data we were already collecting in order to better understand when and how the main library is used. This exploratory project involved gathering, reconciling, and identifying trends in hourly usage data from different systems, with the goals of maximizing what the library can offer with its current resources and of advocating for more funding, if needed. We’ll share our experience undertaking this work, discuss how we have applied and will apply this data, and offer suggestions for others interested in doing something similar at their institutions.
1:35pm – 2:05pm
Session 8: Casting a fishing net for SUSHI data, by Laura Beane and Marie Day (Kennesaw State University)
Recently the Kennesaw State University Library System implemented the SUSHI protocol to automate the collection of COUNTER 5 usage statistics in our library services platform (Alma). There remained, however, a need to evaluate the results on a regular basis to verify that the linked accounts were working, without the time-consuming process of checking each account individually. This presentation outlines our method for identifying possible problems with the automated harvesting of usage data on a regular basis so that our Systems and Online Services unit can easily maintain and proactively troubleshoot SUSHI accounts.
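The abstract does not spell out the presenters’ verification method; as one illustrative approach, the COUNTER 5 SUSHI API reports problems as Exceptions in the JSON Report_Header, so a harvested report can be triaged automatically. A minimal sketch (classification scheme is an assumption, not the presenters’ method):

```python
def check_sushi_report(report):
    """Classify a parsed COUNTER 5 SUSHI JSON report as 'ok', 'warning',
    or 'error' based on Exceptions in its Report_Header."""
    exceptions = report.get("Report_Header", {}).get("Exceptions", []) or []
    worst = "ok"
    for ex in exceptions:
        code = int(ex.get("Code", 0))
        # Codes 3030/3031 mean no usage (yet) for the requested dates,
        # which is often benign; treat anything else as a real failure.
        if code in (3030, 3031):
            worst = "warning" if worst == "ok" else worst
        else:
            worst = "error"
    return worst
```

Running a check like this across every linked account after each scheduled harvest yields a short exception list to investigate, instead of a platform-by-platform manual review.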
2:05pm – 2:35pm
Session 9: Automating a Data Pipeline for a Rapid Weeding Project Dashboard, by David Arredondo and Matt McDowall (University of Nebraska at Kearney)
In the fall of 2022, our library began preparing for a renovation project by rapidly weeding our main stacks. Using our LSP reporting platform, Alma Analytics, as the primary data source, we automated a data pipeline to generate, collect, process, and update a data dashboard daily. Primary tools used were Alma Analytics, Power Automate, and PowerBI. The dashboard provides daily updates to reflect the project’s progress to everyone in the Library. The techniques used for this project can be applied to many different scenarios. However, the ugly question remains – if you build it, will they care?
2:35pm – 2:45pm
2:45pm – 3:15pm
Session 10: OpenRefine for Collections Data, by Karen Kohn (Temple University)
The open-source data cleanup tool OpenRefine describes itself as “a powerful tool for working with messy data.” While this may seem like a niche role for a software product, OpenRefine can be useful to libraries in many situations. Bibliographic and holdings data are often inherently messy: for example, publisher names appear with slight variations, books can have several different ISBNs, and the same ebooks are offered on multiple platforms. OpenRefine is relatively easy to use and makes dealing with these complications much simpler. Besides handling larger files than Excel with faster processing, it includes built-in features for finding similar data and standardizing it, splitting data from one cell into several, and merging data from multiple rows into one. The presentation will show screenshots of scenarios in which OpenRefine is particularly well suited for cleaning and standardizing bibliographic data and will offer resources for learning more about the tool.
3:15pm – 3:45pm
Session 11: Intro to Python and API: Finding LC class data with ISBN for collections, by Selena Chau (University of California Santa Barbara)
Collection-centered data for ebook use assessment come with varied descriptive metadata. COUNTER provides consistency across vendor platforms, but is limited to supplying ISBNs. This session introduces a Python script that fills in Library of Congress classification data so that collection assessment by subject is possible. The presentation will be “show-and-tell” style, with code hosted on Binder so that participants can interact with it as they learn basic concepts of how Python and the API are used in this tool.
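The abstract does not name the API the script uses. As one illustration of the general pattern, Open Library’s books endpoint exposes an `lc_classifications` field that can be keyed on ISBN (this source and the function names below are assumptions for the sketch, not necessarily the presenter’s tool):

```python
import json
from urllib.request import urlopen

# Illustrative data source; the session's actual API may differ.
OPENLIBRARY_URL = "https://openlibrary.org/isbn/{}.json"

def lc_class_from_record(record):
    """Pull the first LC classification string from a parsed Open Library
    book record, or None if the field is absent."""
    classes = record.get("lc_classifications") or []
    return classes[0] if classes else None

def lc_class_for_isbn(isbn):
    """Look up an ISBN over the network and return its LC classification."""
    with urlopen(OPENLIBRARY_URL.format(isbn)) as resp:
        return lc_class_from_record(json.load(resp))
```

Mapping each ISBN in a COUNTER title report through a lookup like this attaches an LC class (and hence a broad subject) to every usage row, which is what makes assessment by subject possible.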
3:45pm – 4:15pm
Session 12: Casting a Wide Net: Utilizing Python as a Tool in Data Remediation, by Jennifer Scholl (Florida International University)
Following migration to Alma/Primo VE in July 2021, the Florida International University (FIU) Libraries participated in a statewide shared bibliographic database remediation task force on post-migration data cleanup. This presentation explains how the FIU Libraries built and utilized custom Python scripts for remediation-related large-scale data analysis. This included processing over 24,000,000 rows of data in a single run, overcoming limitations of established utilities such as Microsoft Excel and OpenRefine. The scripts allowed for easy identification of bibliographic records containing unnecessary data. This presentation will demonstrate how Python programming was used to efficiently analyze the records requiring data remediation, helping ensure a better end-user experience while saving staff time.
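The scripts themselves are not in the abstract, but the scale argument is easy to sketch: streaming a CSV row by row uses constant memory, so row counts far beyond Excel’s roughly 1,048,576-row ceiling are not a problem. A minimal Python sketch (column names and predicate are hypothetical placeholders):

```python
import csv

def flagged_rows(rows, column, predicate):
    """Yield rows (dicts) whose value in `column` satisfies `predicate`."""
    for row in rows:
        if predicate(row.get(column, "")):
            yield row

def scan_csv(path, column, predicate):
    """Stream a large CSV one row at a time (constant memory) and yield
    the rows that match -- e.g. bib records carrying unneeded local data."""
    with open(path, newline="", encoding="utf-8") as fh:
        yield from flagged_rows(csv.DictReader(fh), column, predicate)
```

For example, `scan_csv("export.csv", "local_9xx", bool)` would surface every record whose hypothetical `local_9xx` column is non-empty, without ever loading the full 24-million-row export into memory.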