In which we replace an API-based taxonomic tool with a command-line application we can run in RStudio
In my last post, I explored using Global Names Recognition and Discovery (GNRD), a web app and API that will find taxonomic names in text that you provide. I used an R package, taxize, to implement the API call, but I wasn’t able to switch on a verification step that I wanted to use (my own shortcoming, to be sure), and that’s where I stopped for the day. In doing some research into the other tools available through Global Names Architecture, I came upon a recommendation to use Global Names Finder (GNfinder) instead of GNRD. As this recommendation came from the developer of both tools, I decided to give GNfinder a try.
Load the libraries we need.
To run GNfinder [1], you need to install it as a command-line application; see the project's installation instructions. I am a Mac user, so I used the Homebrew option.
The setup instructions also encourage you to edit the gnfinder.yml file if you want to operate under non-default settings (saving you from having to specify them every time). I took advantage of this to specify that I would like verified results from the taxonomic name search, which resolves found taxonomic names against all available data sources (e.g., Catalogue of Life, Wikispecies, WoRMS, etc.) and returns the best match. YES PLEASE. Also, just for kicks, I turned on the option to return the five words before and after the taxonomic name.
GNfinder will run in three ways: as a command-line app, as a library, or as a Docker container. I have no idea what running it “as a library” means (mea culpa!), and I haven’t yet explored Docker containers, so … to the command line! RStudio very conveniently offers bash chunks in R Markdown files, so we can do this without even leaving this script.
For this exercise, I’ve moved some text files into my blog’s data directory.
You can’t see the open/close parts of the bash chunk, but if you could they would look like:
```{bash, echo=TRUE}
the code from the chunk below
```
## A test of GNfinder running in a bash chunk in Rmd
# Run gnfinder on a section of the Proceedings of the Academy of Natural Sciences of Philadelphia
gnfinder /Users/thalassa/Rcode/blog/data/v37sxx01.txt -U -v -w 5 > /Users/thalassa/Rcode/blog/data/v37sxx01.csv
Import the output from GNfinder as a tibble and look at it:
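# read_csv() (readr) already returns a tibble, so no as_tibble() wrapper is needed
dat <- read_csv("/Users/thalassa/Rcode/blog/data/v37sxx01.csv")
# paged_table() (rmarkdown) renders the tibble as a pageable table in the post
paged_table(dat)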
So, this is *great*! We get better results than with [my implementation of] GNRD, and we get more features (words before and after, verification, etc.). Happy with this. I have limited experience working in the terminal, so it might be a leap for me to deal with file handling. It feels wonky to have to create a new text file in bash to then read back in via read_csv, but perhaps having CSV files to look at isn’t a terrible thing. I will need to spend some time learning how to automate file-naming in the terminal so I can run this in batches. What I would really prefer is to be able to run GNfinder against text in a tibble, or even a list. Maybe this is possible and I just haven’t figured it out yet. Let me know if you have ideas!
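One possible starting point for the batch idea is sketched below. It is a minimal, untested-against-real-data sketch, assuming every input file ends in `.txt` and that each CSV should sit next to its source file. The function names `csv_name_for` and `run_batch` are hypothetical helpers made up for illustration; the gnfinder flags are simply the ones from the single-file command earlier in this post.

```shell
#!/usr/bin/env bash

# Map an input path to its output path: data/foo.txt -> data/foo.csv
# (hypothetical helper, not part of gnfinder)
csv_name_for() {
  printf '%s\n' "${1%.txt}.csv"
}

# Run gnfinder on every .txt file in a directory (hypothetical helper).
# Flags are copied from the single-file command used earlier in the post.
run_batch() {
  local dir="$1"
  local txt
  for txt in "$dir"/*.txt; do
    # If no .txt files exist, the glob stays literal; skip it.
    [ -e "$txt" ] || continue
    gnfinder "$txt" -U -v -w 5 > "$(csv_name_for "$txt")"
  done
}
```

Calling `run_batch /Users/thalassa/Rcode/blog/data` from a bash chunk would then write one CSV per text file in that directory. Note that an existing CSV with the same base name would be overwritten without warning.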
[1] Mozzherin, Dmitry, Alexander Myltsev, and Harsh Zalavadiya. Gnames/Gnfinder: V0.14.2. Zenodo, 2021. https://doi.org/10.5281/zenodo.5111562.
For attribution, please cite this work as
Whitmire (2021, Aug. 5). Seaside Librarian: Exploring Global Names Finder. Retrieved from https://amandawhitmire.github.io/blog/posts/2021-08-05-exploring-gnfinder/
BibTeX citation
@misc{whitmire2021exploring,
  author = {Whitmire, Amanda},
  title = {Seaside Librarian: Exploring Global Names Finder},
  url = {https://amandawhitmire.github.io/blog/posts/2021-08-05-exploring-gnfinder/},
  year = {2021}
}