User:Tscofield034/sandbox/Contemporary Machine Learning Algorithms for Animal Bioacoustics

Contemporary Machine Learning Algorithms in Animal Bioacoustics Applications

Machine learning is a field of study in which users program computers to automatically learn specified phenomena through experience. In particular, machine learning can assist in improving tasks such as pattern recognition, statistical classification, and outlier detection. Machine learning is utilized in many fields of research today and bioacoustics is no exception.

Bioacoustics is the study of sound production, dispersion, and reception in the living world: animals and humans. Given that the study of sound in human-based applications is so broad, this article will include only animal-based bioacoustics examples. Bioacoustics can greatly aid our understanding of the animal world by shedding light on instinctual behavior, communication, relationships, and the anatomy of the organs involved in receiving acoustic signals, among other things.

The study of bioacoustics is important because it can help researchers answer specific biological, ecological, and management questions. Examples of questions that research in this area can address include which animals live in an area, the approximate size of a population, the effect of communication within a species, and how animal density changes over time, among other conservation and mitigation topics. Applying machine learning to such tasks can help researchers keep up with the volume of research being conducted and expedite the process of answering some of these important questions.

Background
To give a brief overview of the subjects at hand, it is important to touch on both the early studies of bioacoustics before machine learning and the development of machine learning algorithms.

Bioacoustics History
Since the beginning of humankind, humans have unintentionally relied on bioacoustics for survival. For example, humans have listened to animal sounds in order to hunt efficiently, communicate with others, and protect themselves from predators. However, bioacoustics was not established as an official scientific discipline until 1925, when Yugoslavian biologist Ivan Regen began studying insect sounds. Specifically, Regen worked with and studied species of katydids and crickets. Through his experiments, he was able to determine that a male katydid of the species Thamnotrizon apterus responds to the chirps of other males. The males carry on an exchange of alternating chirping patterns that depends solely on the insect's sense of hearing.

Many other animal species have since been studied and recorded. The resulting sound collections cover a wide range of frequencies and media, and some of the recorded signals are not sounds in the everyday sense of compression waves that a human eardrum can detect, because some collections include frequencies outside the human hearing range. These signals can still be collected and studied by researchers with the help of computers. Bioacoustics data can be found in several repositories, including the Macaulay Library at the Cornell Lab of Ornithology, Xeno-Canto, the Moby Sound database, and government sources such as the British Library and the United States' National Centers for Environmental Information.

Machine Learning History
The term machine learning was first introduced and discussed in the late 1950s and early 1960s, building on a model of human brain cell interaction created by Donald Hebb in 1949. The perceptron, an early precursor of neural networks, and nearest neighbor algorithms were among the first to be developed. Although research support and algorithmic frameworks were in place in the mid-to-late 1900s, the field did not start flourishing until the 1990s, when computer performance improved.

There are two main categories of machine learning: supervised and unsupervised learning. Supervised learning algorithms require the presence of class labels in the data. Usually, this means that a human must label the data manually. Unsupervised learning algorithms, on the other hand, do not require class labels. Instead, they discover groupings or structure on their own, using only the information available in the data.

Under the umbrella of supervised and unsupervised learning there are many different categories of machine learning algorithms. Some examples of algorithm categories include regression algorithms, instance-based algorithms, decision tree algorithms, Bayesian algorithms, clustering algorithms, and deep learning algorithms. See a full list of machine learning algorithms here.

Data Collection
Typically, researchers have two viable options for obtaining their data: collect it themselves or use a previously collected dataset, perhaps from one of the repositories mentioned above. Publicly available datasets are often attractive to researchers because they require less money, time, and effort. Many researchers also choose to train and test on more than one dataset to evaluate their model's robustness, which makes repository data an even better option. What's more, most machine learning algorithms require an abundance of training data to make accurate predictions without overfitting.

Preprocessing
In most cases, the raw data recorded directly from a microphone in the environment is not a practical input to a machine learning algorithm. Some signals in the raw data are unusable due to noise, irrelevance, or other factors, and the usable signals often need to be parsed into a different form. For example, researchers P. Somervuo, A. Harma, and S. Fagerlund parsed their bird signals into individual syllables before applying machine learning algorithms. They achieved this by applying an envelope function and fixing a threshold level at half of the initial noise level. Candidate syllables were then defined as the signals located above this threshold. Lastly, candidate signals were grouped together if they were very close in time (less than 15 ms apart). This approach is only one example of a possible preprocessing method and of a way to parse signals into individual syllables.
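The thresholding and grouping steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name segment_syllables and its parameters are hypothetical, and it assumes that the envelope has already been computed and is expressed in decibels relative to the peak (so values are negative and halving the noise level raises the threshold).

```python
def segment_syllables(envelope_db, sample_rate, noise_level_db, max_gap_s=0.015):
    """Return (start, end) sample index pairs of candidate syllables.

    Assumption: envelope_db and noise_level_db are in dB relative to the
    peak, so half of the noise level lies between the noise floor and 0 dB.
    """
    threshold = 0.5 * noise_level_db

    # Step 1: collect runs of samples whose envelope lies above the threshold.
    segments = []
    start = None
    for i, value in enumerate(envelope_db):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(envelope_db)))

    # Step 2: group runs separated by less than max_gap_s (15 ms by default).
    merged = []
    for seg_start, seg_end in segments:
        if merged and (seg_start - merged[-1][1]) / sample_rate < max_gap_s:
            merged[-1] = (merged[-1][0], seg_end)
        else:
            merged.append((seg_start, seg_end))
    return merged
```

With a 1 kHz envelope, two bursts 5 ms apart would be reported as one syllable, while a burst 40 ms later would be reported separately.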

On the other hand, researchers studying bird audio such as D. Stowell and M. Plumbley experimented with splitting signals both into syllables and into longer segments of sound (i.e., bird songs) to see which method was more effective for their application. Additional preprocessing methods include the Discrete Fourier Transform (DFT), the log magnitude spectrum, filter banks built from various filters (e.g., band-pass, low-pass (LPF), high-pass (HPF), and triangular filters), and the Discrete Cosine Transform (DCT). All of these were applied in the work of M. Roch, M. Soldevilla, J. Burtenshaw, et al. on Gaussian mixture model classification of odontocetes in the Southern California Bight and the Gulf of California. Another preprocessing method relevant to acoustics and bioacoustics classification is dynamic time warping (DTW). The benefit of DTW is that it allows signals that unfold at different speeds to be compared directly. An example is comparing the syllables of two speakers with drastically different speaking speeds. It is very difficult to match up the two speakers and evaluate specific syllables when those syllables occur at different rates and at varying moments in time. DTW may therefore prove to be a helpful preprocessing algorithm when studying bioacoustics data.
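To make the idea of DTW concrete, the following is a minimal sketch of the classic dynamic-programming formulation for two one-dimensional sequences. The function name dtw_distance is illustrative, and the absolute difference used as the local cost is an assumption; real applications typically compare feature vectors frame by frame.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b stays on the same item
                cost[i][j - 1],      # b advances while a stays on the same item
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

A sequence and a copy of it played at half speed (every value repeated twice) have a DTW distance of zero, even though a point-by-point comparison would not even be defined for sequences of different lengths.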

Feature Extraction
As with most stages of a machine learning model, researchers have flexibility in which feature extraction algorithms, if any, they apply to the data. A more thorough analysis of the model is possible if the researchers try more than one feature extraction algorithm, or perhaps a cascaded approach. Many scientists choose to extract values that are specifically engineered for audio applications, such as signal bandwidth, spectral flux, spectral centroid, frequency range, and zero-crossing rate. Others take a statistician's approach, like W. Steiner in his article, and record each signal's mean, standard deviation, and variance along with some of its frequency characteristics.
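As a rough illustration of a few of the features named above, the sketch below computes the zero-crossing rate, the spectral centroid, and basic summary statistics for a short signal. The function names and the returned dictionary keys are hypothetical, and the naive DFT is used only to keep the example dependency-free; a practical implementation would use an FFT library.

```python
import cmath
import statistics


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def magnitude_spectrum(x):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(x, sample_rate):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mags = magnitude_spectrum(x)
    freqs = [k * sample_rate / len(x) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def extract_features(x, sample_rate):
    """Collect a small, illustrative feature vector for one signal."""
    return {
        "mean": statistics.fmean(x),
        "std": statistics.pstdev(x),
        "zero_crossing_rate": zero_crossing_rate(x),
        "spectral_centroid_hz": spectral_centroid(x, sample_rate),
    }
```

For a pure tone, the spectral centroid sits at the tone's frequency, which gives a quick sanity check before applying the function to recorded calls.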

In addition, sinusoidal models, Mel-Cepstrum models, and a variant of the Mel-Cepstrum called Mel-frequency cepstral coefficients (MFCC) have each been widely applied to acoustic data. The Mel scale underlying the Mel-Cepstrum is a psychoacoustic scale that gives more resolution to the frequency ranges in which humans can most accurately perceive changes in frequency.
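The effect of the Mel scale can be seen in a small sketch. The conversion below uses one commonly cited formula for the scale (other variants exist), and the helper mel_center_frequencies is a hypothetical name for a function that places filter centers at equal Mel spacing, as is typically done for the triangular filter banks used in MFCC computation.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using a commonly cited formula (one of several)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

Because the filters are evenly spaced in Mel rather than in Hz, they are packed closely at low frequencies and spread out at high frequencies. Whether this human-oriented spacing suits a given animal's vocal range is a separate question that each study has to weigh.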

Lastly, spherical k-means can be a supportive feature extraction algorithm, as seen in Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. In contrast to the standard k-means algorithm, spherical k-means iteratively searches for unit vectors that minimize the angular distance to the data points. Note that both classical k-means and spherical k-means are machine learning algorithms, but D. Stowell and M. Plumbley implemented them as a feature extraction method instead of as a classification algorithm.
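The difference from standard k-means can be illustrated with a minimal sketch: data points and centroids are kept at unit length, and points are assigned by cosine similarity rather than Euclidean distance. This is a simplified illustration, not the implementation used by Stowell and Plumbley; the function names, the random initialization, and the fixed iteration count are all assumptions.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def assign(data, centroids):
    """Index of the centroid with the highest cosine similarity, per point."""
    return [max(range(len(centroids)), key=lambda j: dot(x, centroids[j])) for x in data]


def spherical_kmeans(points, k, iterations=20, seed=0):
    """Return (unit-length centroids, cluster label for each point)."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    centroids = [list(c) for c in rng.sample(data, k)]
    for _ in range(iterations):
        labels = assign(data, centroids)
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                # The new centroid is the normalized sum of its members.
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return centroids, assign(data, centroids)
```

When used for feature extraction, the learned centroids act as a dictionary: each audio frame can be described by its similarities to the centroids rather than by a single cluster label.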

Supervised
Southern California researchers J. Barlow, J. Oswald, and T. Norris implemented both multivariate discriminant analysis and decision trees in their study classifying nine species of odontocetes by their acoustic signals. Discriminant analysis is a classification method that can also serve for dimensionality reduction, and it is a practical baseline due to its simplicity, reproducibility, and performance. The model is trained by estimating the class means and variances from the training data, and it then makes predictions for the testing set using Bayes' theorem. With multivariate discriminant analysis as the classification algorithm, an accuracy of 41.1% was achieved. The researchers also tested the effectiveness of classification trees, a form of decision tree learning, for classifying delphinid species. Decision trees are so named because the branches correspond to tests on features or observations and the leaves correspond to class labels. The models are built through binary recursive partitioning, so that each internal node splits the data in two. The simplicity and ease of visualization of this algorithm make it a popular choice for many applications. In this work, classification trees provided the more effective model, with 51.4% accuracy.
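The train-then-apply-Bayes procedure described for discriminant analysis can be sketched in a simplified form. The version below assumes a diagonal covariance shared by all classes (a pooled variance per feature), which is a deliberate simplification; full linear discriminant analysis uses the complete covariance matrix. The function names are hypothetical, and this is not the model used in the odontocete study.

```python
import math
from collections import defaultdict


def fit_diagonal_lda(X, y):
    """Estimate class means, pooled per-feature variances, and class priors."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n, d = len(X), len(X[0])
    means = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in by_class.items()}
    priors = {c: len(rows) / n for c, rows in by_class.items()}
    pooled = [0.0] * d
    for row, label in zip(X, y):
        for j in range(d):
            pooled[j] += (row[j] - means[label][j]) ** 2
    dof = max(n - len(by_class), 1)
    variances = [max(s / dof, 1e-12) for s in pooled]
    return means, variances, priors


def predict_diagonal_lda(model, row):
    """Pick the class with the highest log posterior (Bayes' theorem)."""
    means, variances, priors = model

    def log_posterior(c):
        # log prior + Gaussian log likelihood, dropping class-independent terms
        return math.log(priors[c]) - 0.5 * sum(
            (x - m) ** 2 / v for x, m, v in zip(row, means[c], variances)
        )

    return max(means, key=log_posterior)
```

Because the variance terms are shared across classes, the decision boundaries between classes are linear, which is part of what makes the method a simple and reproducible baseline.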

The next featured research, by S. Fagerlund, applied machine learning algorithms to classify bird species based on the sounds they produce. The algorithms used are decision trees and binary support vector machine (SVM) classifiers combined in a cascaded approach. Fagerlund used two separate data sets, one with 6 species of birds and the other with 8. During the classification step, each node at every layer corresponds to a comparison of two species. In every comparison, an instance-based SVM model is constructed to choose between the two species. One species is eliminated at every layer until only one is left, which is the predicted species for the given signal. This research had encouraging results, with an average accuracy of 91% for the data set with 6 species and 98% for the data set with 8 species.
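The elimination scheme described above can be sketched independently of the underlying classifier. In the sketch below, pairwise_classifier is a hypothetical callable standing in for a trained binary SVM that returns whichever of two candidate species better matches the signal; the order in which species are compared is also an assumption.

```python
def cascade_predict(signal, species, pairwise_classifier):
    """Eliminate one species per comparison until a single one remains.

    pairwise_classifier(signal, a, b) is assumed to return either a or b.
    """
    remaining = list(species)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[1]
        winner = pairwise_classifier(signal, a, b)
        loser = b if winner == a else a
        remaining.remove(loser)
    return remaining[0]
```

With n candidate species, this structure needs n - 1 binary decisions per signal, compared with n(n - 1)/2 if every pair were evaluated.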

S. Fagerlund, A. Harma, and P. Somervuo also conducted research on bird acoustics and utilized Gaussian mixture models (GMM), a probabilistic clustering model applied here in a supervised setting. The aim of this work was to classify fourteen different bird species based on their sounds. The researchers segmented the audio signals into syllables, utilized DTW, and applied MFCC features. They obtained less than desirable classification results with GMM, with an accuracy of slightly over 50%.

The next featured research predicts the species of bumblebee by analyzing flight buzzing sounds. The dataset contained twelve species, nine of which had examples of both worker and queen bees, for a total of twenty-one classes. The researchers tested a collection of machine learning classifiers (J48 decision tree, Naïve Bayes, SVM, and random forests) to compare their accuracies. A full list of results can be reviewed here; the best results were obtained with a random forest classifier, which reached 86% accuracy.

The research in Frog classification using machine learning techniques shows another approach to applying supervised machine learning to bioacoustics. Here, the scientists aimed to automatically distinguish between five different species of frogs using an instance-based k-nearest neighbors (kNN) algorithm and an SVM. The results from both classifiers were satisfactory, with accuracy rates of 89.05% and 90.30%, respectively.
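The kNN algorithm is simple enough to sketch in a few lines: a query is assigned the majority label among its k closest training examples. The function name, the Euclidean distance, and the default k below are illustrative assumptions and are not taken from the frog study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training points nearest to the query."""
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

The term instance-based reflects the fact that there is no separate training phase: the stored examples themselves are the model, so prediction cost grows with the size of the training set.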

Unsupervised
An example of unsupervised machine learning in bioacoustics is the research of J. Burtenshaw, M. Roch, and M. Soldevilla in Gaussian mixture model classification of odontocetes in the Southern California Bight and the Gulf of California. Here, the scientists distinguished between short-beaked common, long-beaked common, Pacific white-sided, and bottlenose dolphins off the coast of California. The scientists' goals in this work were twofold: acoustic call detection and species classification. Dolphins produce three different types of calls (echolocation clicks, burst-pulsed calls, and whistles), which makes the machine learning task more complex. GMMs were utilized in this work, but in contrast to typical supervised GMM models, the researchers used Hidden Markov Models (HMM), which optimize the clustering through an unsupervised expectation-maximization (EM) algorithm. With this approach, accuracy rates of 67-75% were obtained.
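To show what the EM algorithm does in this setting, the following is a minimal sketch that fits a one-dimensional Gaussian mixture to unlabeled values. It illustrates EM for a plain GMM only, not an HMM and not the model from the dolphin study; the function names, the caller-supplied initial means, and the fixed iteration count are assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = len(initial_means)
    means = list(initial_means)
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            probs = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(probs) or 1e-300
            resp.append([p / total for p in probs])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-6
            )
    return weights, means, variances
```

No labels enter the procedure at any point: the components emerge from the data alone, which is what makes the EM step unsupervised.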

Next, we show an example that incorporates unsupervised learning in identifying bird species. The researchers built a robust model and utilized four different data sets with 87, 88, 77, and 501 species, respectively. The chosen machine learning classifier was random forests, an ensemble learning method that constructs a multitude of decision trees. Random forest classifiers are normally trained on labeled data, so the unsupervised element in work of this kind lies in learning features from unlabeled audio rather than in the classifier itself. The metrics used to quantify the effectiveness of the model were the area under the ROC curve (AUC) and mean average precision. Although the results are not directly comparable to other research in this article given the chosen metrics, the researchers found promising results with their set of parameters and algorithms.
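Because these two metrics are less familiar than accuracy, a small sketch may help. Both operate on ranked scores rather than hard predictions. The function names are illustrative, labels are assumed to be 1 for positive and 0 for negative, and mean average precision is taken here to be the average of the per-class (or per-query) average precision values.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive item."""
    ranked = [l for _, l in sorted(zip(scores, labels), key=lambda pair: -pair[0])]
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(per_class):
    """Average the AP over an iterable of (labels, scores) pairs."""
    values = [average_precision(l, s) for l, s in per_class]
    return sum(values) / len(values)
```

Neither metric depends on a decision threshold, which is one reason such results cannot be placed side by side with the accuracy figures reported in the other studies.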

Using deep learning, researchers have automatically detected and monitored populations of bats using only their audio signals. Convolutional neural networks (CNN) were used to detect and analyze ultrasonic, full-spectrum, search-phase calls from echolocating bats. The CNNs were trained and tested alongside a random forest classifier and three existing closed-source commercial detection systems. The chosen metrics were recall and precision, which were recorded for each classification method. The researchers' findings were promising, with precision and recall rates for the CNN much higher than those of the three commercial detection systems and the random forest classifier. This work concludes that deep learning is a strong option for analyzing and classifying real-world, potentially low-quality sound clips.
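For a detection task such as this, precision is the fraction of reported detections that are real calls, and recall is the fraction of real calls that were detected. A minimal sketch of both follows, assuming one true label and one predicted label per audio window; the function name and the label encoding are illustrative.

```python
def precision_recall(true_labels, predicted_labels, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The two values usually trade off against each other: a detector tuned to report more calls tends to gain recall and lose precision, which is why both are reported for each method.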

Conclusion and Summary
As outlined above, many different feature extraction algorithms and machine learning algorithms have been applied to different problems within bioacoustics, and there is no state-of-the-art or "go-to" algorithm for research of this kind. As a result, many methods must be attempted and documented using different datasets. Much of the research outlined above is also difficult to compare because of the different constraints, parameters, algorithms, metrics, and datasets used. Even so, we can try to quantify the effectiveness of each project's results and use them to push forward in this area of research.

Two developments are likely to push machine learning applications further within bioacoustics: an increase in data collection and greater use of unsupervised machine learning. The more data that scientists have access to, the more research can be conducted with different methods on various subjects. An increase in data is likely to come with growing interest in bioacoustics and the falling price of recording equipment. In addition, unsupervised machine learning algorithms reduce the need for labeled data. This alone could dramatically decrease the number of human hours needed to conduct the research, especially as more data is collected. Specifically, more research using unsupervised deep learning may prove fruitful given the promising results of Bat Detective - Deep Learning Tools for Bat Acoustic Signal Detection and the fact that deep learning algorithms are the current state of the art in several other machine learning applications.

All in all, the application of machine learning within bioacoustics is an interesting area of research that is gaining momentum. Continued work in this field may help scientists gain a deeper understanding of specific species, populations, and ecosystems through the acoustic signals they produce, with a helping hand from the powerful computers available today.