Google-built AI software boosts NZ birdsong study

Google-built artificial intelligence software has been trained to recognise the different calls of threatened native birds.

In one of the quirkiest applications of machine learning yet, researchers at Victoria University enlisted tech company NEC New Zealand to help them trawl through tens of thousands of hours of birdsong.

The recordings, capturing hihi, saddleback and kakariki, were picked up by acoustic sensors at 50 locations in and around Wellington sanctuary Zealandia, as part of a three-year study by Victoria researchers.

But when PhD researcher Victor Anton sought to pick out the location and number of different calls - something which would help him reveal factors affecting threatened birds outside Zealandia - he was overwhelmed by the vast databank facing him.

So he turned to machines.

Using deep neural network software originally developed by engineers at Google Brain, NEC's system learned to recognise different bird calls, effectively measuring the activity of each bird species at specific times and locations.

As more audio was processed, the system's accuracy improved.

"That's the beauty of machine learning - the system learns and improves on the job," Anton said.

The AI system took audio that had been recorded and stored, chopped it into minute-long segments, and then converted each segment into a spectrogram.
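The segmentation and spectrogram step can be sketched in a few lines of Python. This is an illustration only, not NEC's code: the function names, the toy sample rate, the frame length and the hop size are all assumptions, and a synthetic tone stands in for real audio. It uses a naive discrete Fourier transform from the standard library so it runs anywhere; a real pipeline would almost certainly use an FFT library.

```python
import cmath
import math

def split_into_segments(samples, sample_rate, segment_seconds=60):
    """Split a recording into fixed-length segments (the last may be shorter)."""
    step = sample_rate * segment_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def magnitude_spectrogram(samples, frame_len, hop):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n, x in enumerate(frame)]
        # Naive DFT, keeping only the non-negative frequency bins.
        bins = []
        for k in range(frame_len // 2 + 1):
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            bins.append(abs(total))
        frames.append(bins)
    return frames

# Toy input (hypothetical values): 150 seconds of a 2 Hz tone sampled at 8 Hz.
sample_rate = 8
recording = [math.sin(2 * math.pi * 2 * n / sample_rate)
             for n in range(sample_rate * 150)]

segments = split_into_segments(recording, sample_rate)
spectrogram = magnitude_spectrogram(segments[0], frame_len=16, hop=8)
```

Here the 150-second toy recording yields two full minute-long segments and one shorter remainder, and each spectrogram row describes how the energy in one short frame is spread across frequencies.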

After the spectrograms were chopped into chunks, each spanning less than a second, they were processed individually by a deep convolutional neural network.

A recurrent neural network then tied together the chunks and produced a continual prediction of which of the three birds was present across the minute-long segment.
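The data flow of this two-stage design can be sketched as follows. Everything here is a hypothetical stand-in: `chunk_score` takes the place of the convolutional network and simply averages energy, while `recurrent_pass` is a single recurrent unit with hand-picked weights rather than a trained recurrent network. The point is only to show chunks being scored one at a time and a state being carried from one chunk to the next.

```python
import math

def split_into_chunks(spectrogram, frames_per_chunk):
    """Group consecutive spectrogram frames into fixed-size chunks."""
    usable = len(spectrogram) - len(spectrogram) % frames_per_chunk
    return [spectrogram[i:i + frames_per_chunk]
            for i in range(0, usable, frames_per_chunk)]

def chunk_score(chunk):
    """Stand-in for the per-chunk CNN: mean energy of the chunk."""
    values = [v for frame in chunk for v in frame]
    return sum(values) / len(values)

def recurrent_pass(scores, w_in=2.0, w_state=0.5, bias=-1.0):
    """One recurrent unit with assumed weights: each output depends on the
    current chunk score and on the state carried over from earlier chunks."""
    state, outputs = 0.0, []
    for score in scores:
        state = math.tanh(w_in * score + w_state * state + bias)
        outputs.append(state)
    return outputs

# Toy spectrogram: 12 frames of 4 bins, silent except for frames 4 to 7.
spectrogram = [[1.0] * 4 if 4 <= t < 8 else [0.0] * 4 for t in range(12)]

chunks = split_into_chunks(spectrogram, frames_per_chunk=4)
scores = [chunk_score(c) for c in chunks]
predictions = recurrent_pass(scores)
```

In this toy run the first and third chunks are equally silent, yet the third gets a higher output than the first because the recurrent state still remembers the loud chunk in between. A trained system would produce one such running output per bird species.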

The system was developed using TensorFlow, an open-source library for numerical computation and machine learning.

"TensorFlow allowed us to take advantage of heterogeneous computing resources, including massively parallel environments, such as GPUs, to train networks efficiently on large quantities of data," NEC artificial intelligence expert Paul Mathews said.

"There's still a significant implementation overhead - TensorFlow provides the basic operations but we still had to design, implement, train and evaluate specific neural network configurations so that we could package up the system and send it to Victor to analyse more of his data."

Essentially, the system was trained on data Anton had already labelled.

"The goal is for the networks to learn patterns indicated by the labelling so that they are reliably applied to new data that has not been listened to by a human," Mathews said.

Because most neural network systems for audio are modelled on those built for large-scale speech recognition, there were few existing examples designed for detecting birds, which made birdsong tougher to analyse.

But the biggest challenge was training the system to function reliably when exposed to new data - something AI engineers call generalisation.
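One common way to estimate generalisation is to hold out whole recording locations, so the system is evaluated on sites it never saw during training. That is an assumption here rather than something the researchers describe. The sketch below uses hypothetical site names and labels.

```python
def split_by_location(clips, held_out_sites):
    """Separate labelled clips into training and evaluation sets by site."""
    train = [c for c in clips if c["site"] not in held_out_sites]
    evaluate = [c for c in clips if c["site"] in held_out_sites]
    return train, evaluate

# Hypothetical labelled clips; site names and labels are made up.
clips = [
    {"site": "site_01", "label": "hihi"},
    {"site": "site_01", "label": "saddleback"},
    {"site": "site_02", "label": "kakariki"},
    {"site": "site_03", "label": "hihi"},
    {"site": "site_03", "label": "none"},
]

train, evaluate = split_by_location(clips, held_out_sites={"site_03"})
```

Splitting by site rather than at random keeps clips from the same sensor, with the same background noise, from appearing on both sides of the split.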

"Achieving a good level of generalisation from small quantities of labelled data is one of the biggest problems in machine learning," Mathews said.

"Our system has to work for data Victor hasn't yet collected.

"Additionally, the system must also learn to ignore a lot of additional noise in the bush to pick out just the birds we care about."

That included the racket of forested areas, parks, and people's backyards.

At times, the algorithms classified noise from construction and traffic - and even doorbells - as birdsong.
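Mistakes of this kind are usually counted as false positives. As a minimal sketch with made-up labels (not results from the study), precision and recall for one species can be computed by comparing the system's predictions with human labels:

```python
def precision_recall(predicted, actual, target):
    """Precision and recall for a single target label."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == target and a == target for p, a in pairs)
    false_pos = sum(p == target and a != target for p, a in pairs)
    false_neg = sum(p != target and a == target for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical clips: what a human heard versus what the system predicted.
actual = ["hihi", "traffic", "doorbell", "hihi", "hihi", "construction"]
predicted = ["hihi", "hihi", "hihi", "hihi", "none", "none"]

precision, recall = precision_recall(predicted, actual, "hihi")
```

In this toy example, traffic and a doorbell are mistaken for hihi, so only half of the predicted hihi calls are real, while one genuine call is missed.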

"Another challenge we face is variation of calls among birds of the same species," Anton explained.

"Birds have various calls and use them for different purposes.

"For example, the call they sing to mark their territory is quite different to calls when they're looking for a partner.

"Training the system to identify specific call types is challenging because sometimes two or more birds are calling at the same time or the bird doesn't sing the entire call.

"We aim for the algorithms to learn these variations."

In future, Anton believed, acoustic monitoring could be particularly useful for better understanding New Zealand's unique fauna.

Thanks to conservation efforts, birds like tui and kaka were becoming more abundant outside protected areas, yet other species, such as hihi and saddleback, were still struggling to shift outside refuges.

"Due to limited information about their distribution outside wildlife sanctuaries it's hard to know how we can maximise conservation efforts," Anton said.

"By combining acoustic sensors and AI we can gather enough information to identify the location and visiting frequency of these threatened birds outside protected areas."