
Application of Deep Learning in Remote Sensing Monitoring of Large Herbivores- A Case Study in Qinghai Tibet Plateau



Wei Luo 1,2,3, Yongtao Jin 1,2,3, Xuqing Li 1,2,3 and Ke Liu 1,2,3*

1School of Remote Sensing Information, North China Institute of Aerospace Engineering, Langfang 065000, Hebei Province, China

2Collaborative Innovation Center of Aerospace Remote Sensing Information Processing and Application of Hebei Province, Langfang 065000, Hebei Province, China

3National Joint Engineering Research Center of Space Remote Sensing Information Application Technology, Langfang 065000, Hebei Province, China


In this study, we monitored large herbivores in Maduo County, Qinghai Province, by means of unmanned aerial vehicle (UAV) remote sensing. The monitored animals comprised three kinds of domestic herbivores (Tibetan sheep, yaks, and horses) and three kinds of wild herbivores (Tibetan antelopes, Tibetan wild asses, and blue sheep). All kinds of large herbivores in the aerial images were detected and located using the Mask R-CNN (region-based convolutional neural network) deep-learning model; the average recall, accuracy, and leakage rates were 89%, 98.4%, and 10.8%, respectively. Furthermore, the contour vectors of the herbivores were obtained by extracting the masks generated during Mask R-CNN detection, from which the population number and distribution of each kind of large herbivore were estimated. Compared with the end-of-2018 stock figures for domestic herbivores provided by the General Grassland Station of Qinghai Province, the estimates for Tibetan sheep, yaks, and horses differ by 7.5%, 8.1%, and 18.7%, respectively.

Article Information

Received 05 December 2019

Revised 11 February 2020

Accepted 22 February 2020

Available online 12 March 2021

(early access)

Published 17 December 2021

Authors’ Contribution

WL designed the model and wrote the paper. YJ finalized the research model. XL collected and preprocessed the data. KL designed the framework and the accuracy evaluation method of this paper.

Key words

large herbivores, UAV remote sensing, deep learning, object detection, population number


* Corresponding author:

0030-9923/2022/0001-0413 $ 9.00/0

Copyright 2022 Zoological Society of Pakistan


Owing to improvements in remote sensing image resolution and the continuous development of computer vision technology, researchers have developed many automatic or semi-automatic methods for recognizing and counting animals (Chabot et al., 2018; Descamps et al., 2011; Groom et al., 2013; Rey et al., 2017). However, most studies so far have only performed small-scale experiments. The study area was usually only a few km² or covered by only a few images; furthermore, the monitored animals had to contrast clearly with the background, and the environment was relatively uniform (Fretwell et al., 2017; Hollings et al., 2018).

Image detection and classification algorithms are generally divided into pixel-based and object-based methods. Pixel-based methods are the simplest and most common automatic or semi-automatic animal-detection methods; they mainly include supervised classification, unsupervised classification, and threshold segmentation (Christiansen et al., 2014; Liu et al., 2015; Seymour et al., 2017). The images used are mainly low-resolution satellite images (Fretwell et al., 2017; Laliberte and Ripple, 2003) and thermal infrared aerial images (Gonzalez et al., 2015; Seymour et al., 2017). Because animals usually occupy only a few pixels in these images, object-oriented methods offer limited help (Descamps et al., 2011; Fretwell et al., 2017; Seymour et al., 2017). The monitored animals include chicken (Christiansen et al., 2014), spoonbill (Liu et al., 2015), seal (Seymour et al., 2017), koala and deer (Gonzalez et al., 2015), southern right whale (Fretwell et al., 2014), and great red stork (Gonzalez et al., 2015), among others.

Compared with pixel-based methods, which use only spectral, textural, and similar features for object detection, object-oriented methods can also exploit the shape, spectrum, texture, and context of segmented objects, so their accuracy can be considerably higher (Chabot et al., 2018). Some studies have used object-based classification, or a fusion of object-based and pixel-based classification, to improve detection accuracy. Yang et al. (2014) combined pixel-based and object-based classification to detect African wildebeest and zebra in GeoEye-1 images, with an average count error of only 8.2%, 6.6% of objects missed, and 13.7% of objects misclassified. Chabot et al. (2018) developed an object-oriented method for detecting and counting snow geese that adapts to multiple complex environments and to variable lighting and exposure conditions; the correlation with manual counts was R² = 0.998, with a regression coefficient of 0.974 (n = 41). Pixel-based and object-oriented classification methods are simple and easy to use, and they sometimes achieve high accuracy. However, because the detection features must be selected manually, the accuracy depends considerably on the user's experience and skill (Fretwell et al., 2017; Hollings et al., 2018; Terletzky and Ramsey, 2014).

The emergence of centimeter-level aerial images, including unmanned aerial vehicle (UAV) images, provides much richer detail of animal features, and the related machine-learning based detection algorithms have developed rapidly (Longmore et al., 2017; Olivares-Mendez et al., 2015; Torney et al., 2016). Christiansen et al. (2014) developed an algorithm based on discrete-cosine-transform feature extraction and a K-nearest-neighbor classifier to automatically detect hare and chicken in thermal infrared images; at a flight altitude of 3–10 m, the detection accuracy was 93.3%. Rey et al. (2017) developed an active learning system based on a support vector machine and used it to detect large mammals in savanna grassland from 6500 UAV images; however, the recall rate was 75% and the accuracy only 10%. Xue et al. (2017) developed a machine-learning based algorithm to detect African wildebeest and zebra in GeoEye-1 images. The algorithm uses an adaptive neural network, which can not only use existing expert knowledge but also learn animal features from the data, thereby obtaining higher accuracy (0.79 versus 0.58) than the traditional threshold-based segmentation method.

In recent years, deep-learning theory and practice have made a breakthrough. Deep learning can automatically learn features from big data that are difficult for humans to extract manually; therefore, deep-learning based methods can achieve higher precision and better results than traditional shallow machine-learning models (such as ANN). Relevant achievements have been reported in journals such as Nature (LeCun et al., 2015; Reichstein et al., 2019) and PNAS (Waldrop, 2019). Accordingly, deep learning has gradually become an indispensable tool in big-data processing fields such as remote sensing (Zhu et al., 2018) and earth system science (Reichstein et al., 2019). Kellenberger et al. (2018) employed a convolutional neural network (CNN) to detect more than 20 kinds of large mammals in thousands of RGB UAV images of 4-cm resolution (all the animals in the experiment were classified into one group); they achieved higher accuracy (30% @ 80% recall versus 10% @ 75% recall) than that achieved with traditional shallow machine learning (Rey et al., 2017). Furthermore, Norouzzadeh et al. (2018) combined nine kinds of deep neural network models, such as AlexNet, GoogLeNet, and ResNet, to detect and classify the animals in images from ground-based infrared-triggered cameras; they achieved accuracy similar to that of human recognition (the accuracy of judging whether an image contained animals was 96.6%). In this paper, we use UAV remote sensing to carry out large-scale monitoring of herbivores on the Qinghai-Tibet Plateau and extract six kinds of large herbivores from the images with the Mask R-CNN deep-learning model.


Overview of the study area

The Qinghai–Tibet Plateau is rich in alpine biodiversity; according to the statistics, it has 69 species of national key protected animals, of which 16 are national level I and 53 are national level II key protected animals (Shao and Fan, 2012). Since the establishment of the Source of Three Rivers National Nature Reserve, the ecological environment of the area has improved significantly and, consequently, the number of large herbivores has increased year by year. Obtaining the population number and distribution of large herbivores in this area is the key to protecting them scientifically and reasonably.

Acquisition of UAV remote sensing images

From July 8 to 18, 2019, the authors and other experts carried out aerial photography in the Source of Three Rivers area, mainly in Maduo County, Qinghai Province, China. The tracks and identifiers of the UAV investigation are depicted in Figure 1. The flight coverage area is approximately 200 km², with a resolution of 8 cm. After image stitching, the area used for monitoring large herbivores is approximately 150 km².

The aerial photography mainly employed two independently developed UAVs, both of which had been used by the Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (IMHE CAS) in plateau areas; both the UAVs are depicted in Figure 2.

At 700-m altitude, the shooting width of a single camera is 0.8 km. To increase the coverage of each investigation belt, a dual-camera shooting system was adopted, following which the width of each investigation belt increased to 1.2 km. The detailed parameters of the dual-camera system are listed in Table I.



Table I. Load parameters of the UAV of IMHE CAS.

Number of integrated cameras: (missing)
Angle between two cameras: (missing)
Side overlap: (missing)
Fore-and-aft overlap: (missing)
Camera type: Sony ILCE-5100
Focal length: 30 mm
Camera resolution: 6000 × 4000 pixels


Interpretation features and data preprocessing

In this experiment, 10000 remote sensing images from 14 UAV tracks in Maduo County were selected as the sample database; of these, 7000 images were used as the training set and the remaining 3000 as the test set. The objects of detection included three kinds of domestic herbivores (yak, Tibetan sheep, and horse) and three kinds of wild herbivores (Tibetan wild ass, Tibetan antelope, and blue sheep). Following the standard elements of remote sensing image interpretation, Tables II and III summarize the features used as the basis for automatic detection. In addition, the LabelMe tool was used to mark the large herbivores in each image of the training set with the help of Tables II and III (Luo et al., 2019).
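The 7000/3000 division described above can be sketched as follows. The paper does not state how images were assigned to each set, so the random shuffle, the fixed seed, and the file names here are assumptions made for illustration only.

```python
import random


def split_dataset(image_names, train_fraction=0.7, seed=42):
    """Shuffle image names reproducibly and split them into train and test lists."""
    names = sorted(image_names)   # fixed starting order, independent of input order
    rng = random.Random(seed)     # local generator; leaves the global random state alone
    rng.shuffle(names)
    n_train = round(len(names) * train_fraction)
    return names[:n_train], names[n_train:]


# Hypothetical file names standing in for the 10000 UAV images.
images = [f"maduo_{i:05d}.jpg" for i in range(10000)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, so that accuracy figures computed on the test set refer to the same images across runs.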

Training the model of deep learning

The Mask R-CNN is a CNN built on the earlier Faster R-CNN architecture; it detects targets effectively while producing a high-quality segmentation mask for each one. Its main idea is to extend Faster R-CNN by adding a branch that predicts the target mask in parallel with the existing detection branch. The network is relatively easy to implement and train, and it runs relatively fast. In addition, it can be readily applied to other tasks, such as target recognition, scene segmentation, and key-point detection, with reported results better than those of earlier algorithms.

First, to strengthen the backbone network, the VGG network used in Faster R-CNN is replaced by a residual network (ResNet) with stronger feature-expression ability, and a feature pyramid network (FPN) is used to mine multi-scale information. ResNeXt-101 + FPN is used as the feature-extraction network to achieve state-of-the-art performance. The structure of the Mask R-CNN is depicted in Figure 3.

Second, the ROI pooling layer of the original Faster R-CNN is replaced by the ROI Align layer, which allows small objects in the image to be detected more effectively and accurately. Therefore, adopting the Mask R-CNN is very helpful in improving the animal-detection probability in this experiment. The detection results of all kinds of large herbivores are depicted in Figure 5.

Table II. Interpretation features of domestic herbivores.

Tone
Domestic yak: Mainly dark tones such as black and gray-black.
Tibetan sheep: Mainly light tones such as white and gray.
Horse: Mainly dark tones such as black, brown-black, and brown-red.

Color
Domestic yak: Black, gray-black, and white.
Tibetan sheep: White, gray-white, dirty white, and black.
Horse: Black, tan-black, tan-red, and occasionally white individuals.

Texture
Domestic yak: Pure color or large-block pure-color mosaic texture.
Tibetan sheep: Same as domestic yak.
Horse: Usually a pure color, with occasional color-block mosaic texture.

Size
Domestic yak: The body length of an adult yak is mostly 1.6–2.2 m. Taking 4-cm resolution images as an example, the individual length is approximately 40–50 pixels. Young yaks can be as small as 0.8 m, but they are not found in isolation.
Tibetan sheep: The body length of an adult Tibetan sheep is mostly 1.2–1.5 m. Taking 4-cm resolution images as an example, the individual length is approximately 25–35 pixels. Young Tibetan sheep can be as small as 0.4 m, but they are not found in isolation.
Horse: The body length of an adult horse is mostly 1.6–2.2 m. Taking 4-cm resolution images as an example, the individual length is approximately 40–55 pixels. Young horses can be as small as 0.9 m, but they are not found in isolation.

Shape
Domestic yak: Nearly elliptical or rectangular; aspect ratio mostly between 1.5:1 and 3:1.
Tibetan sheep: Nearly elliptical or water-drop shaped; aspect ratio mostly between 1.5:1 and 3:1.
Horse: Nearly long rectangular or long rectangular; aspect ratio mostly between 3:1 and 5:1.

Group image
Domestic yak: Supplementary Figure 1
Tibetan sheep: Supplementary Figure 2
Horse: Supplementary Figure 3

Individual sample
Domestic yak: Supplementary Figure 4
Tibetan sheep: Supplementary Figure 5
Horse: Supplementary Figure 6

Appearance characteristic
Domestic yak: Supplementary Figure 7
Tibetan sheep: Supplementary Figure 8
Horse: Supplementary Figure

Table III. Interpretation features of wild herbivores.

Tone
Tibetan wild ass: Mainly protective tones such as smoke brown and ochre brown.
Tibetan antelope: Mainly protective tones such as yellow, tan, and grayish yellow.
Blue sheep: Mainly dark tones such as cyan-gray and gray.

Color
Tibetan wild ass: Main body is smoke brown, earthy yellow, and ochre brown, with white and brown-black color blocks on the edge.
Tibetan antelope: Main body is yellowish, yellowish brown, and grayish yellow, with white color blocks at one end of the edge.
Blue sheep: Main body is cyan-gray, gray, and dirty white.

Texture
Tibetan wild ass: Mosaic texture generated by brown on the back and white on the limbs and abdomen.
Tibetan antelope: Pure earthy yellow or similar tone, sometimes with a white block at one end (buttocks).
Blue sheep: Cyan-gray to grayish-white gradient.

Size
Tibetan wild ass: The body length of an adult Tibetan wild ass is mostly 1.6–2.3 m. Taking 4-cm resolution images as an example, the individual length is approximately 40–60 pixels. A young Tibetan wild ass can be as small as 0.9 m, but it is not found in isolation.
Tibetan antelope: The body length of an adult Tibetan antelope is mostly 0.8–1.0 m. Taking 4-cm resolution images as an example, the individual length is approximately 20–30 pixels.
Blue sheep: The body length of an adult blue sheep is mostly 1.2–1.4 m. Taking 4-cm resolution images as an example, the individual length is approximately 25–35 pixels.

Shape
Tibetan wild ass: Nearly long rectangular or long rectangular; aspect ratio mostly between 4:1 and 5:1.
Tibetan antelope: Nearly long rectangular or long rectangular; aspect ratio mostly between 3:1 and 5:1.
Blue sheep: Nearly long rectangular; aspect ratio mostly between 3:1 and 4:1.

Group image
Tibetan wild ass: Supplementary Figure 10
Tibetan antelope: Supplementary Figure 11
Blue sheep: Supplementary Figure 12

Individual sample
Tibetan wild ass: Supplementary Figure 13
Tibetan antelope: Supplementary Figure 14
Blue sheep: Supplementary Figure 15

Appearance characteristic
Tibetan wild ass: Supplementary Figure 16
Tibetan antelope: Supplementary Figure 17
Blue sheep: Supplementary Figure 18
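The pixel sizes quoted in Tables II and III follow from dividing body length by ground resolution. The sketch below recomputes them as a rough check; the function name is illustrative, and the tables' own pixel ranges are approximate values that need not match this simple division exactly.

```python
def length_in_pixels(body_length_m, resolution_m):
    """Number of pixels spanned by an object of the given length."""
    return body_length_m / resolution_m


# Adult body-length ranges in metres, taken from Tables II and III.
BODY_LENGTH_M = {
    "domestic yak": (1.6, 2.2),
    "Tibetan sheep": (1.2, 1.5),
    "horse": (1.6, 2.2),
    "Tibetan wild ass": (1.6, 2.3),
    "Tibetan antelope": (0.8, 1.0),
    "blue sheep": (1.2, 1.4),
}

# 4 cm is the example resolution used in the tables; 8 cm is the survey resolution.
for resolution_m in (0.04, 0.08):
    for name, (shortest, longest) in BODY_LENGTH_M.items():
        low = length_in_pixels(shortest, resolution_m)
        high = length_in_pixels(longest, resolution_m)
        print(f"{name} at {resolution_m * 100:.0f} cm: {low:.0f}-{high:.0f} px")
```

At the 8-cm resolution of the survey flights, each animal spans half as many pixels as at 4 cm, which is why sensitivity to small objects matters for this data set.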



Generation and vectorization of herbivores mask

The Mask R-CNN performs pixel-level segmentation; therefore, a mask image is output together with each detection. In this experiment, Python was used to automatically extract the herbivore masks and convert them into contour vectors (Luo et al., 2019). The resulting contour vectors can be imported into the geoscience analysis software ArcGIS to obtain the population number, area, and distribution of the herbivores.
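As a minimal illustration of how a count and a ground area can be read from a binary mask, the sketch below labels 4-connected foreground regions with a breadth-first search. This is a simplified stand-in, not the authors' workflow, which vectorizes contours and analyses them in ArcGIS; the tiny mask and the function name are hypothetical.

```python
from collections import deque


def component_sizes(mask):
    """Return the pixel count of each 4-connected foreground region in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one region, starting from its first pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Hypothetical 4 x 6 mask containing two separate animals.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
sizes = component_sizes(mask)
resolution_m = 0.08  # ground resolution of the survey images, in metres per pixel
areas_m2 = [n * resolution_m ** 2 for n in sizes]
print(len(sizes), sizes, areas_m2)
```

The number of regions gives the head count, and each region's pixel count multiplied by the squared ground resolution gives its area on the ground.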


Generation of deep learning model loss curve

The loss curves recorded while training the Mask R-CNN deep-learning model (with yak taken as an example) are depicted in Figure 4. Clockwise from the upper left corner, they are the classification loss curve, regression loss curve, mask segmentation loss curve, and RPN regression loss curve.
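For orientation, the first three curves correspond to the terms of the multi-task loss that the Mask R-CNN formulation defines for each sampled region of interest. The symbols below follow the general Mask R-CNN literature rather than this paper, and the region proposal network (RPN) contributes its own loss terms during training.

```latex
L = L_{cls} + L_{box} + L_{mask}
```

Here $L_{cls}$ is the classification loss, $L_{box}$ the bounding-box regression loss, and $L_{mask}$ the per-pixel mask segmentation loss.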

Detection results of all kinds of large herbivores

Using the model obtained from training, the test program detected and located all kinds of large herbivores in the test set. Typical detection and location results are depicted in Figure 5.


Accuracy evaluation of all kinds of large herbivores

In this experiment, accuracy, recall, and leakage were used as the indexes for accuracy evaluation, and the effect of herbivore recognition is presented in Table IV.
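The paper does not give formulas for these three indexes. One common reading, assumed here for illustration only, treats accuracy as the share of detections that are correct, recall as the share of true animals that are detected, and leakage as the share of true animals that are missed. The counts in the example are hypothetical.

```python
def detection_metrics(true_positives, false_positives, false_negatives):
    """Return (accuracy, recall, leakage) under the assumed definitions above."""
    detected = true_positives + false_positives   # everything the model reported
    present = true_positives + false_negatives    # everything actually in the images
    accuracy = true_positives / detected
    recall = true_positives / present
    leakage = false_negatives / present
    return accuracy, recall, leakage


# Hypothetical counts for one herbivore category.
accuracy, recall, leakage = detection_metrics(
    true_positives=90, false_positives=2, false_negatives=10
)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} leakage={leakage:.3f}")
```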

Distribution results of all kinds of large herbivores

By summarizing the statistical results for all kinds of large herbivores in each investigation belt (converted into sheep units, as shown in Table V) using the method described in the previous section, the density of large herbivores in the different investigation belts can be estimated, as depicted in Figures 6, 7 and 8.

Table IV. Accuracy evaluation of large herbivores.

Columns: Herbivore category; Test set.
Rows: Domestic yak; Tibetan sheep; Tibetan wild ass; Tibetan gazelle; Blue sheep.




Population number estimation of all kinds of large herbivores

Based on the UAV investigation results and the area of Maduo County, the population of large herbivores in the county can be estimated. According to this estimate, there are 102200 Tibetan sheep, 70800 domestic yaks, 29095 Tibetan wild asses, 15433 Tibetan gazelles, 15686 blue sheep, and 1200 horses in Maduo County. The number of large wild herbivores is considerably smaller than that of domestic herbivores, only 20.66% of the latter, as listed in Table VI.
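The extrapolation from the investigation belts to the whole county amounts to multiplying a density by an area. The sketch below shows that arithmetic; the animal count and the county area are hypothetical placeholders, not values from the paper, and only the 150 km² monitored area comes from the text.

```python
def estimate_population(count_in_belts, belt_area_km2, county_area_km2):
    """Scale a count observed in the surveyed belts up to the whole county."""
    density = count_in_belts / belt_area_km2      # animals per square kilometre
    return density, round(density * county_area_km2)


# Hypothetical inputs: 300 animals counted, 25000 km2 county area.
# The 150 km2 monitored area is the figure stated for the stitched images.
density, county_total = estimate_population(
    count_in_belts=300, belt_area_km2=150, county_area_km2=25000
)
print(density, county_total)
```

This simple scaling assumes that the surveyed belts are representative of the county as a whole.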

Comparative analysis with the official number of domestic herbivores

Starting from the 2019 estimates for domestic herbivores in Maduo County and assuming a birth rate of 30%, the stock at the end of 2018 should have been 78615 Tibetan sheep and 54461 domestic yaks. The birth rate of horses need not be considered; therefore, the number of horses at the end of 2018 should have been 1200.

According to the data provided by the General Grassland Station of Qinghai Province, at the end of 2018 there were 73133 Tibetan sheep, 59235 domestic yaks, and 1476 horses on hand in Maduo County. The percentage differences between these figures and our estimates for Tibetan sheep, domestic yaks, and horses are 7.5%, 8.1%, and 18.7%, respectively, as depicted in Figure 9.
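The comparison can be reproduced from the figures given in the text. The sketch below assumes that the 2019 estimate is divided by (1 + birth rate) to recover the end-of-2018 stock, that the horse estimate of 1200 is used unchanged, and that the difference is expressed relative to the official figure; these assumptions reproduce the reported percentages.

```python
BIRTH_RATE = 0.30

# 2019 UAV-based estimates and end-of-2018 official stock, both from the text.
ESTIMATED_2019 = {"Tibetan sheep": 102200, "Domestic yak": 70800, "Horse": 1200}
OFFICIAL_2018 = {"Tibetan sheep": 73133, "Domestic yak": 59235, "Horse": 1476}
APPLY_BIRTH_RATE = {"Tibetan sheep": True, "Domestic yak": True, "Horse": False}


def back_calculate(estimate_2019, apply_birth_rate):
    """Recover the end-of-2018 stock implied by a 2019 estimate."""
    if apply_birth_rate:
        return estimate_2019 / (1 + BIRTH_RATE)
    return float(estimate_2019)


def percent_difference(estimate, official):
    """Absolute difference relative to the official figure, in percent."""
    return abs(estimate - official) / official * 100


results = {}
for species, estimate_2019 in ESTIMATED_2019.items():
    stock_2018 = back_calculate(estimate_2019, APPLY_BIRTH_RATE[species])
    results[species] = (stock_2018, percent_difference(stock_2018, OFFICIAL_2018[species]))
    # int() truncates, matching the whole-animal figures quoted in the text.
    print(f"{species}: {int(stock_2018)} vs {OFFICIAL_2018[species]}, "
          f"difference {results[species][1]:.1f}%")
```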


Table V. Conversion table of standard sheep units for all kinds of herbivores.

Columns: Herbivore category; Sheep unit.
Rows: Tibetan wild ass; Tibetan gazelle; Blue sheep; Domestic yak; Tibetan sheep.



Table VI. Estimation results of large herbivores by land area in Maduo County.

Columns: Large herbivore species; Density in investigation belt (number/km²); Area of Maduo County (km²); Estimated number in the whole county.
Rows: Tibetan wild ass; Tibetan gazelle; Blue sheep; Domestic yak; Tibetan sheep.







UAV remote sensing is an effective way to monitor large herbivores in high-altitude, extremely cold areas. Given the wide study area and the large number of high-resolution images obtained by aerial investigation, manual visual interpretation of herbivores is difficult. Using a deep-learning model to detect and locate large herbivores in the images considerably improves the efficiency of interpretation while achieving high accuracy. In addition, the Mask R-CNN performs pixel-level segmentation; therefore, a mask image is output together with each detection. By automatically extracting the herbivore masks and converting them into contour vectors, we can obtain the population number, area, and distribution of large herbivores in the study area. Compared with the official data, the estimates obtained with the proposed method differ by 7.5% to 18.7% for the three kinds of domestic herbivores. Deep learning has obvious advantages in dealing with remote sensing big data; however, the trained model is difficult to interpret (Zhu et al., 2018). Therefore, in practical applications, in addition to labeling a large number of samples, the model should be adjusted to the actual situation in order to obtain higher accuracy and efficiency (Kellenberger et al., 2018; LeCun et al., 2015).


This work was supported by the Open Research Fund of National Earth Observation Data Center (No. NODAOP2020014). We are grateful to Prof. Quanqin Shao from the Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, for providing some of the research content.

Supplementary material

There is supplementary material associated with this article. Access the material online at:

Statement of conflict of interest:

The authors have declared no conflict of interest.


Chabot, D., Dillon, C. and Francis, C.M., 2018. An approach for using off-the-shelf object based image analysis software to detect and count birds in large volumes of aerial imagery. Avian Conserv. Ecol., 13: 15.

Christiansen, P., Steen, K., Jørgensen, R. and Karstoft, H., 2014. Automated detection and recognition of wildlife using thermal cameras. Sensors, 14: 13778.

Descamps, S., Béchet, A., Descombes, X., Arnaud, A. and Zerubia, J., 2011. An automatic counter for aerial images of aggregations of large birds. Bird Study, 58: 302-308.

Fretwell, P.T., Scofield, P. and Phillips, R.A., 2017. Using super-high resolution satellite imagery to census threatened albatrosses. Ibis, 159: 481-490.

Fretwell, P.T., Staniland, I.J. and Forcada, J., 2014. Whales from space: Counting southern right whales by satellite. PLoS One, 9: e88655.

Gonzalez, L.F., Montes, G.A., Puig, E., Johnson, S., Mengersen, K. and Gaston, K.J., 2015. Unmanned aerial vehicles (UAVs) and artificial intelligence revolutionizing wildlife monitoring and conservation. Sensors, 16: 1-18.

Groom, G., Stjernholm, M., Nielsen, R.D., Fleetwood, A. and Petersen, I.K., 2013. Remote sensing image data and automated analysis to describe marine bird distributions and abundances. Ecol. Inf., 14: 2-8.

Hollings, T., Burgman, M., van Andel, M., Gilbert, M., Robinson, T. and Robinson, A., 2018. How do you find the green sheep? A critical review of the use of remotely sensed imagery to detect and count animals. Methods Ecol. Evol., 9: 881-892.

Kellenberger, B., Marcos, D. and Tuia, D., 2018. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sens. Environ., 216: 139-153.

Laliberte, A. and Ripple, W., 2003. Automated wildlife counts from remotely sensed imagery. Wildl. Soc. Bull., 31: 362-371.

LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521: 436-444.

Liu, C.-C., Chen, Y.-H. and Wen, H.-L., 2015. Supporting the annual international black-faced spoonbill census with a low-cost unmanned aerial vehicle. Ecol. Inf., 30: 170-178.

Longmore, S.N., Collins, R.P., Pfeifer, S., Fox, S.E., Mulero-Pazmany, M., Bezombes, F., Goodwin, A., Ovelar, M.D., Knapen, J.H. and Wich, S.A., 2017. Adapting astronomical source detection software to help detect animals in thermal images obtained by unmanned aerial systems. Int. J. Remote Sens., 38: 2623-2638.

Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. and Clune, J., 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. natl. Acad. Sci., 115: E5716-E5725.

Olivares-Mendez, M.A., Fu, C.H., Ludivig, P., Bissyande, T.F., Kannan, S., Zurad, M., Annaiyan, A., Voos, H. and Campoy, P., 2015. Towards an autonomous vision-based unmanned aerial system against wildlife poachers. Sensors, 15: 31362-31391.

Shao, Q. and Fan, J.W., 2012. Comprehensive monitoring and evaluation of ecosystem in the area of Sanjiangyuan. Science Press, Beijing.

Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N. and Prabhat, 2019. Deep learning and process understanding for data-driven Earth system science. Nature, 566: 195-204.

Rey, N., Volpi, M., Joost, S. and Tuia, D., 2017. Detecting animals in African Savanna with UAVs and the crowds. Remote Sens. Environ., 200: 341-351.

Seymour, A.C., Dale, J., Hammill, M., Halpin, P.N. and Johnston, D.W., 2017. Automated detection and enumeration of marine wildlife using unmanned aircraft systems (UAS) and thermal imagery. Sci. Rep., 7: 10.

Terletzky, P. and Ramsey, R.D., 2014. A semi-automated single day image differencing technique to identify animals in aerial imagery. PLoS One, 9: e85239.

Torney, C.J., Dobson, A.P., Borner, F., Lloydjones, D.J., Moyer, D., Maliti, H.T., Mwita, M., Fredrick, H., Borner, M. and Hopcraft, J.G.C., 2016. Assessing rotation-invariant feature classification for automated wildebeest population counts. PLoS One, 11: e0156342.

Luo, W., Wang, D., Xia, L. and Chen, S., 2019. A method of forestry resources survey based on deep learning. Forest Science and Technology.

Waldrop, M.M., 2019. News feature: What are the limits of deep learning? Proc. natl. Acad. Sci., 116: 1074-1077.

Xue, Y., Wang, T. and Skidmore, A.K., 2017. Automatic counting of large mammals from very high resolution panchromatic satellite imagery. Remote Sens., 9: 878.

Yang, Z., Wang, T., Skidmore, A.K., De, L.J., Said, M.Y. and Freer, J., 2014. Spotting East African mammals in open savannah from space. PLoS One, 9: e115989.

Zhu, X.X., Tuia, D., Mou, L., Xia, G.S. and Fraundorfer, F., 2018. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag., 5: 8-36.


Pakistan Journal of Zoology


Vol. 54, Iss. 1, Pages 1-501

