Exploring Global Names Finder

In which we replace an API-based taxonomic tool with a command-line application we can run in RStudio

Amanda Whitmire https://amandawhitmire.github.io/ (Stanford Libraries & Hopkins Marine Station)
August 5, 2021

Background

In my last post, I explored Global Names Recognition and Discovery (GNRD), a web app and API that finds taxonomic names in text that you provide. I used an R package, taxize, to make the API call, but I wasn’t able to switch on a verification step that I wanted to use (my own shortcoming, to be sure), and that’s where I stopped for the day. While researching the other tools available through Global Names Architecture, I came upon a recommendation to use Global Names Finder (GNfinder) instead of GNRD. As this recommendation came from the developer of both tools, I decided to give GNfinder a try.

Workspace setup

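Load the libraries we need. The R code later in this post relies on read_csv() (readr), as_tibble() (tibble), and paged_table() (rmarkdown).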

Install and configure GNfinder

To run GNfinder [1], you need to install it as a command-line application. See the installation instructions in the GNfinder repository (https://github.com/gnames/gnfinder). I am a Mac user, so I used the Homebrew option.
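After installing, it can help to confirm that the shell can actually find the gnfinder binary before trying it from an R Markdown chunk. Here is a minimal sketch that uses only the standard `command -v` builtin. The helper name `have_cmd` is hypothetical, made up for this illustration, and is not part of GNfinder.

```bash
#!/usr/bin/env bash
# have_cmd is a hypothetical helper (not part of GNfinder): it succeeds
# when the named program can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd gnfinder; then
  echo "gnfinder found at: $(command -v gnfinder)"
else
  echo "gnfinder is not on the PATH; revisit the install instructions"
fi
```

If the second message appears inside RStudio but not in a regular terminal, the two environments may be using different PATH settings.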

The setup instructions also encourage you to edit the gnfinder.yml file if you want to operate under non-default settings (saving you from having to specify them every time). I took advantage of this to request verified results from the taxonomic name search: verification resolves the names it finds against all available data sources (e.g., Catalogue of Life, Wikispecies, WoRMS) and returns the best match. YES PLEASE. Also, just for kicks, I turned on the option to return the five words before and after each taxonomic name.

Running GNfinder

GNfinder can run in three ways: as a command-line app, as a library, or as a Docker container. I have no idea what running it “as a library” means (mea culpa!), and I haven’t yet explored Docker containers, so … to the command line! RStudio very conveniently offers bash chunks in R Markdown files, so we can do this without even leaving this script.

For this exercise, I’ve moved some text files into my blog’s data directory.

You can’t see the open/close parts of the bash chunk, but if you could they would look like:

```{bash, echo=TRUE}
the code from the chunk below
```

## A test of GNfinder running in a bash chunk in Rmd
# Run gnfinder on a section of the Proceedings of the Academy of Natural Sciences of Philadelphia
gnfinder /Users/thalassa/Rcode/blog/data/v37sxx01.txt -U -v -w 5 > /Users/thalassa/Rcode/blog/data/v37sxx01.csv

Import the output from GNfinder as a tibble and look at it:

# read_csv() already returns a tibble, so no as_tibble() wrapper is needed
dat <- read_csv("/Users/thalassa/Rcode/blog/data/v37sxx01.csv")
paged_table(dat)

Thoughts

So, this is *great*! We get better results than with [my implementation of] GNRD, and we get more features (words before and after, verification, etc.). Happy with this. I have limited experience working in the terminal, so file handling might be a leap for me. It feels wonky to write a new text file from bash only to read it back in via read_csv, but perhaps having CSV files to look at isn’t a terrible thing. I will need to spend some time learning how to automate file naming in the terminal so I can run this in batches. What I would really prefer is to run GNfinder against text in a tibble, or even a list. Maybe this is possible and I just haven’t figured it out yet. Let me know if you have ideas!
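On the batch question, here is a minimal sketch of how the file naming could be automated in a bash chunk. It reuses the exact flags from the command earlier in this post (`-U -v -w 5`). The function names `csv_name`, `run_batch`, and `find_names_in_text` are hypothetical, made up for illustration. The last function assumes that gnfinder reads from standard input when no file is given; I believe its README describes this, but treat it as an assumption and check it against your installed version.

```bash
#!/usr/bin/env bash
# Sketch only: csv_name, run_batch, and find_names_in_text are
# hypothetical helper names, not part of GNfinder.

# Derive the output name by swapping a trailing .txt for .csv,
# e.g. data/v37sxx01.txt -> data/v37sxx01.csv
csv_name() {
  printf '%s\n' "${1%.txt}.csv"
}

# Run gnfinder on every .txt file in a directory, with the same flags
# used earlier in this post, writing one CSV next to each input file.
run_batch() {
  for txt_file in "$1"/*.txt; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$txt_file" ] || continue
    gnfinder "$txt_file" -U -v -w 5 > "$(csv_name "$txt_file")"
  done
}

# Assumption: gnfinder reads text from standard input when it is not
# given a file name, so a string can be piped straight into it.
find_names_in_text() {
  printf '%s\n' "$1" | gnfinder -v -w 5
}

# Example calls (the path is the data directory used in this post):
# run_batch /Users/thalassa/Rcode/blog/data
# find_names_in_text "Pomatomus saltatrix was caught offshore."
```

If the standard-input assumption holds, it might also offer a route from a tibble column to GNfinder without intermediate text files. R's system2() has an `input` argument that feeds a character vector to a command's standard input, though that route is untested here.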

/a


  1. Mozzherin, Dmitry, Alexander Myltsev, and Harsh Zalavadiya. Gnames/Gnfinder: V0.14.2. Zenodo, 2021. https://doi.org/10.5281/zenodo.5111562.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Whitmire (2021, Aug. 5). Seaside Librarian: Exploring Global Names Finder. Retrieved from https://amandawhitmire.github.io/blog/posts/2021-08-05-exploring-gnfinder/

BibTeX citation

@misc{whitmire2021exploring,
  author = {Whitmire, Amanda},
  title = {Seaside Librarian: Exploring Global Names Finder},
  url = {https://amandawhitmire.github.io/blog/posts/2021-08-05-exploring-gnfinder/},
  year = {2021}
}