Back in 1770, Captain Cook’s Endeavour ran aground on the Great Barrier Reef and had to stop for repairs. While the ship was being fixed, botanists Joseph Banks and Daniel Solander collected a large number of plant specimens.
Fast forward to now, and one of those pressed plants is part of a stunning collection of 170,000 specimens sitting pretty at the University of Melbourne’s herbarium.
Across the globe, herbaria hold over 395 million specimens. Together, they paint an unparalleled picture of life as it once was, documenting Earth’s plant and fungal diversity.
Eager to make this trove of information easier to access, we looked for quick and effective methods. Our *new research* introduces Hespi (short for “herbarium specimen sheet pipeline”), an AI-powered tool that could dramatically change how we interact with biodiversity data, opening up loads of new research opportunities.
The Digitization Hurdle
If we want to get the most out of herbaria, it’s key to digitize them. This colossal task means snapping high-res photos of every specimen and turning each label’s contents into searchable digital info.
Once these records are digitized, they can be shared online through platforms like the University of Melbourne Herbarium Collection Online. They also flow into big biodiversity resources like the Australasian Virtual Herbarium, the Atlas of Living Australia, and the Global Biodiversity Information Facility. This effort exposes centuries of botanical wisdom to researchers everywhere.
However, digitizing isn’t a walk in the park. Large herbaria, such as the National Herbarium of New South Wales and the Australian National Herbarium, employ speedy conveyor belt systems that enable them to image millions of specimens rapidly. Even with all this automation, it took over three years to digitize the 1.15 million specimens at the National Herbarium of NSW!
For smaller institutions, the process is much slower and often relies on volunteers and citizen scientists to snap pictures and painstakingly transcribe data by hand.
If we keep going at the current pace, many collections won’t see the light of day for decades. This makes it ridiculously tough for researchers in ecology, evolution, climate science, and conservation to get their hands on substantial and accurate biodiversity data. We need to find a quicker way to sort this out!
Thanks to AI, Things Are Getting Faster
To tackle this issue, we developed Hespi, an open-source software tool designed to extract information from herbarium specimens automatically.
It combines computer vision and AI techniques, including object detection, image classification, and large language models.
Here’s how it works: Hespi starts with an image of the specimen sheet, which shows the pressed plant and its identifying text. It then locates and extracts that text using a blend of optical character recognition and handwriting recognition.
Since handwritten notes can be hard to read reliably, Hespi passes the recognized text through OpenAI’s GPT-4o large language model to correct mistakes, which substantially improves accuracy.
In mere seconds, Hespi can find and read the main label data on herbarium sheets, covering taxonomic names, collector info, location details (latitude and longitude), and collection dates. It swiftly transforms this info into a digital format that’s ready for research!
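For readers who think in code, the flow described above can be sketched roughly as follows. This is a minimal illustration, not Hespi’s actual code or API: the function names, the canned label values, and the small genus list are all made up for the example, and a simple fuzzy match from Python’s standard library stands in for the large language model correction step.

```python
import difflib
from typing import Callable, Dict

# Hypothetical reference list; a real system would draw on a full taxonomic database.
KNOWN_GENERA = ["Ecklonia", "Eucalyptus", "Acacia"]


def run_pipeline(
    image_path: str,
    detect_fields: Callable[[str], Dict[str, str]],
    recognise_text: Callable[[str], str],
    correct_text: Callable[[str, str], str],
) -> Dict[str, str]:
    """Detect label fields, read each one, then correct the recognised text."""
    record = {}
    for field_name, region in detect_fields(image_path).items():
        raw = recognise_text(region)
        record[field_name] = correct_text(field_name, raw)
    return record


# Stand-in stages so the sketch runs without any models (all values are invented).
def fake_detect(image_path: str) -> Dict[str, str]:
    # Pretend object detection found two labelled regions on the sheet.
    return {"genus": "region-a", "year": "region-b"}


def fake_recognise(region: str) -> str:
    # Pretend OCR output, including a typical misread ("1" instead of "l").
    return {"region-a": "Eck1onia", "region-b": " 1883 "}[region]


def fuzzy_correct(field_name: str, raw: str) -> str:
    # Stand-in for the language model step: snap genus names to the closest known name.
    text = raw.strip()
    if field_name == "genus":
        matches = difflib.get_close_matches(text, KNOWN_GENERA, n=1, cutoff=0.6)
        return matches[0] if matches else text
    return text


record = run_pipeline("sheet.jpg", fake_detect, fake_recognise, fuzzy_correct)
print(record)
```

Keeping each stage as a separate, swappable function is one way to test the stages independently, in the spirit of the per-stage test datasets mentioned below.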
As one example, Hespi correctly identified and extracted all the relevant label elements from a large brown algae specimen collected in 1883 at St Kilda.
We tested Hespi on thousands of specimen images from the University of Melbourne Herbarium and beyond, building separate test datasets to assess the performance of each stage of the pipeline.
The results? A remarkably high accuracy level that presents a huge time-saving advantage when compared to extracting data manually.
We’re now working on a user-friendly interface so herbarium curators can check and fine-tune the results manually.
Just the Beginning
Herbaria are already making huge contributions to society: from identifying species and supporting taxonomy to ecological monitoring, conservation efforts, education, and even forensic investigations.
With innovative AI systems like Hespi, we’re opening the floodgates for exciting new applications at an entirely new scale.
AI has already been used to extract detailed leaf measurements and other traits from digitized specimens, opening up centuries-old collections for rapid research into plant evolution and ecology.
And this is just the beginning: AI and computer vision technologies promise to broaden botanical research opportunities in the near future.
Expanding Beyond Herbaria
Pipelines like Hespi have the potential to extract text from the labels of any museum or archival collection, as long as high-quality digital images exist.
What’s next? We’re teaming up with Museums Victoria to tailor Hespi into an AI digitization pipeline suited for their museum collections. This will allow us to unlock biodiversity data for around 12,500 specimens in their globally important fossil graptolite collection.
We’re also embarking on a fresh project with the Australian Research Data Commons (ARDC) to craft a more adaptable version of the software, enabling curators in museums and other institutions to tweak Hespi for various collections, far beyond just plant specimens.
A New Era of Innovation
Just as AI is stirring up changes in everyday life, it has the power to completely reshape how we access biodiversity data. Collaborations between humans and AI can effectively address one of the most significant roadblocks in data digitization: that slow, manual transcription of collection labels.
Bringing to light the hidden information within herbaria, museums, and archives worldwide is vital for the extensive cross-disciplinary research needed to confront the biodiversity crisis.
This article is republished from The Conversation under a Creative Commons license. You can read the original article.
Originally published on Phys.org.
