Seaside Librarian: Why the LEADING Fellowship?

Amanda Whitmire

After I shared my post yesterday, it dawned on me that an easy way to share more of my thoughts around libraries, data science, and my job, would be to simply share the essay that I included in my application for the LEADING Fellowship¹ program.

Here was the prompt

Please include a one-page statement sharing your interest in the LEADING program and the selected fellowship sites. Your one-page statement must address why you seek to learn more about the intersection of library science and data science.

My essay

I am interested in participating in the LEADING Program because gaining skills in data science will increase both the efficacy and impact of my work as a marine science liaison librarian. I will be able to leverage our collections in new ways without being constrained by reliance upon my colleagues who have data science expertise.

In my previous life as an oceanographer, I was frequently limited in my work by the availability of data. There are only so many oceanographic research ships, and time at sea is rare and expensive. We always felt a need to have more observations and more samples over wider areas and longer time spans. This chronic feeling of needing access to more observational data became foundational to my academic worldview. It instilled in me a passion for rescuing datasets and making them available and useful again. I distinguish between “available” and “useful” because simply digitizing material and putting it online doesn’t necessarily make it useful to researchers. This transcendental zone of opportunity between materials being “available” and being “useful” is the space where the intersection of library science and data science can have meaningful impact. My interest in the LEADING Program is directly related to projects that I have underway, and the reality that I am limited by my inability to do the computational work on my own. The opportunity to develop my data science skills will have a direct impact on how I can engage in these projects and help make them successful.

I’m currently the Head Librarian of a marine science branch library located at a coastal marine research station. Our station has been at this location for a hundred years (literally), and I have several historical collections that contain data on environmental conditions and species observations over decades of time. One of the collections that I’m working with is a set of 746 undergraduate research papers that were submitted in annual batches as part of a spring course, from 1963 – 2011. These papers are an extremely valuable corpus for conducting historical ecology research, but the papers cannot be made openly available due to copyright restrictions. Even if we could make the papers available, however, species occurrence observations, for example, are essentially hidden within the text of hundreds of papers. Even though all of the papers have been digitized, this isn’t a dataset that is currently useful. Computational methods for text analysis are rapidly evolving, and initial testing on a subset of the student papers corpus demonstrates significant potential. For example, we’ve tested spaCy, a Python library for natural language processing, to automate identification of genus and species names in student reports using named entity recognition (NER). In partnership with developers in the Digital Library Systems and Services group, we are now testing a predicate-argument structure semantic model to link NER-tagged species names with locations. The goal of the project is to create a dataset of species occurrences to upload into ²(Global Biodiversity Information Facility), a resource that would be much more useful to biodiversity researchers than a corpus of unprocessed student papers.

There is significant intersection between the work I am currently doing and the 2021 LEADING Project “From Natural History Literature to Linked Open Data Biodiversity Knowledge Graph.” Adding data science skills to my librarian toolkit through this fellowship will support my overarching career goal to extract as much legacy data as possible from historical collections, transform them from being available to being useful, and ultimately contribute toward building out the biodiversity linked open data ecosystem. Thank you for considering my application.

Chaser

Because a post without an image is no fun at all, here is an example of the kind of work you can do with long-term biodiversity observations: detect habitat range shifts in animals. The lighter red bars show the historical latitudinal ranges of southern barnacle species (from the 1970s), and how far north they were observed during marine heat waves (dark red bars).

Figure 6. Geographic ranges of common intertidal/shallow subtidal barnacles of California (after Newman). This assemblage includes species that are cosmopolitan (black bars), primarily northern (blue bars), and primarily southern (red bars). The highest species richness of intertidal barnacles occurs in central/north central California between Point Conception and Point Arena. For southern species, red bars indicate northern range limits during the 1970s, and dark red bars indicate geographic range expansions to current poleward boundaries (see Supplementary Information). Species are coded as follows: (1) Pollicipes polymerus, (2) Balanus glandula, (3) Chthamalus dalli, (4) Balanus nubilus, (5) Balanus crenatus, (6) Semibalanus cariosus, (7) Paraconcavus pacificus, (8) Chthamalus fissus, (9) Tetraclita rubescens, (10) Megabalanus californicus.

*Thank you to the authors of this paper for publishing their paper with a Creative Commons Attribution license, which allows me to share their work in this post.

LEADING website https://cci.drexel.edu/mrc/leading/↩︎
GBIF↩︎

Comment on this article Share:

Why the LEADING Fellowship?

Here was the prompt

My essay

Chaser

Corrections

Citation