Happy Humpbacks: A machine learning-ready drone dataset for whale detection model development
GB/NERC/BAS/PDC/02202
Summary
Abstract:
We present the Happy Humpbacks dataset, a machine learning-ready collection of drone imagery and annotations for object detection. The dataset comprises 5,281 images containing 10,401 instances of humpback whales. Imagery was collected in the waters surrounding Palmer Station, Western Antarctic Peninsula, between January and March 2020. Data acquisition was conducted over 55 flights using a DJI Phantom 4 Pro multirotor drone, as part of the Palmer Long Term Ecological Research Program. Images were manually annotated by the British Antarctic Survey using LabelMe software, with bounding boxes delineating individual whales. Annotations are provided in both LabelMe and COCO formats, along with predefined training, validation, and test splits. This dataset captures substantial variability in whale behaviour, morphology, and environmental conditions, reflecting the challenges of real-world remote sensing imagery. It is intended to support the development and benchmarking of object detection models for automated whale monitoring.
This work was supported by a Natural Environment Research Council grant (Grant no. NE/S007164/1). Data were collected as part of the Palmer Long Term Ecological Research Program (Grant no. 1440435).
Keywords:
Cetacean, computer vision, deep learning, drone, humpback whale, object detection, remote sensing, unoccupied aerial systems, unoccupied aerial vehicles
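Because the annotations are distributed in COCO format, headline counts such as those in the abstract (images and whale instances) can be recomputed directly from the annotation files. The sketch below assumes only the standard COCO top-level keys (`images`, `annotations`, `categories`). The names of the four COCO files are not given in this record, so any path passed to `load_coco` is an assumption, and the toy dictionary is illustrative rather than taken from the data.

```python
import json
from collections import Counter
from pathlib import Path


def load_coco(path):
    """Read a COCO-format annotation file (path is an assumption) into a dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarise_coco(coco):
    """Count images and bounding-box instances in a COCO-style dict."""
    # Number of annotations (instances) attached to each image ID.
    per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    n_images = len(coco["images"])
    n_instances = len(coco["annotations"])
    return {
        "images": n_images,
        "instances": n_instances,
        "mean_per_image": n_instances / n_images if n_images else 0.0,
        "max_per_image": max(per_image.values(), default=0),
        # Images listed under "images" that carry no annotation at all.
        "empty_images": sum(1 for img in coco["images"] if per_image[img["id"]] == 0),
    }


if __name__ == "__main__":
    # Illustrative toy dict in COCO layout; not taken from the dataset.
    toy = {
        "images": [
            {"id": 1, "file_name": "a.JPG", "width": 5472, "height": 3648},
            {"id": 2, "file_name": "b.JPG", "width": 5472, "height": 3648},
        ],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 200, 300, 150]},
            {"id": 2, "image_id": 1, "category_id": 1, "bbox": [900, 400, 250, 120]},
            {"id": 3, "image_id": 2, "category_id": 1, "bbox": [50, 60, 400, 180]},
        ],
        "categories": [{"id": 1, "name": "humpback whale"}],
    }
    print(summarise_coco(toy))
    # {'images': 2, 'instances': 3, 'mean_per_image': 1.5, 'max_per_image': 2, 'empty_images': 0}
```

Running `summarise_coco(load_coco(path))` on each split file gives per-subset counts; whether the fourth COCO file is a combined file or something else is not stated here, so totals should not be summed blindly.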
Citation
Houliston, H.R., Cheng, Y., Johnston, D.W., Larsen, G.D., Friedlaender, A.S., Fretwell, P.T., Jackson, J.A., Cubaynes, H.C., Schonlieb, C., & Aviles-Rivero, A. (2026). Happy Humpbacks: A machine learning-ready drone dataset for whale detection model development (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/7a952870-9880-415d-a8ab-194fedf01a26
Access Data
Reference Materials:
- https://pallter.marine.rutgers.edu/docs/publications/sitreps/2020/01.pdf
- https://pallter.marine.rutgers.edu/docs/publications/sitreps/2020/02.pdf
- https://pallter.marine.rutgers.edu/docs/publications/sitreps/2020/03.pdf
Constraints
| Access Constraints: | Data are under embargo until publication of the associated manuscript. |
|---|---|
| Use Constraints: | These data are governed by the NERC data policy http://www.nerc.ac.uk/research/sites/data/policy/ and supplied under Open Government Licence v.3 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. |
Basic Information
| Creation Date: | 2026-05-11 |
|---|---|
| Dataset Progress: | Complete |
| Dataset Language: | English |
| ISO Topic Categories: | |
| Parameters: | |
| Parent Dataset: | N/A |

Personnel:

| Name | Role(s) | Organisation |
|---|---|---|
| UK Polar Data Centre | Metadata Author | British Antarctic Survey |
| Holly R Houliston | Investigator, Technical Contact | British Antarctic Survey |
| Yanqi Cheng | Investigator | University of Cambridge |
| David W Johnston | Investigator | Duke University |
| Gregory D Larsen | Investigator | Alaska Department of Natural Resources |
| Ari S Friedlaender | Investigator | University of California, Santa Cruz |
| Peter T Fretwell | Investigator | British Antarctic Survey |
| Jennifer A Jackson | Investigator | British Antarctic Survey |
| Hannah C Cubaynes | Investigator | British Antarctic Survey |
| Carola-Bibiane Schonlieb | Investigator | University of Cambridge |
| Angelica I Aviles-Rivero | Investigator | University of Cambridge |
Additional Information
| Quality: | All bounding box annotations were manually reviewed after conversion to COCO format to ensure consistency and accuracy. Detectability of whales varied with environmental conditions and individual appearance, including adverse weather, the low contrast of dark-bodied individuals, and partial occlusion. In addition, known limitations of manual annotation in remote sensing, such as annotator fatigue and subjectivity, may have introduced variability in the labels. |
|---|---|
| Lineage/Methodology: | Imagery was collected using a DJI Phantom 4 Pro multirotor drone with its standard gimbal-stabilised RGB camera (1" CMOS sensor; 5472 × 3648 pixels), capturing the red, green, and blue bands of the visible spectrum. The drone was launched and recovered by hand from a small boat. Data acquisition was conducted over 55 drone flights covering approximately 40 km² as part of the Palmer Long Term Ecological Research Program. Images were captured in bursts when whales surfaced, with the drone pilot triggering the camera shutter manually. Although the primary purpose of data collection was photogrammetry, images were acquired at a range of off-nadir angles, and whales were not always fully within the frame. Images were annotated manually in LabelMe, with a bounding box delineating the extent of each humpback whale and a single class label ("humpback whale"). Non-target objects, including other wildlife and boats, were not annotated. The annotations were subsequently converted from LabelMe format to COCO format to support object detection model development. The dataset was partitioned into training (70%), validation (10%), and test (20%) subsets. Because images were collected in bursts, a flight-wise split was applied where possible, so that near-identical images from the same burst do not end up in different subsets. This dataset was developed for training and evaluating an object detection model for automated humpback whale detection (in preparation). |
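A flight-wise split of the kind described above can be sketched as follows. This is not the authors' procedure: it assumes each image can be mapped to a flight identifier (how flights are encoded in the released files is not stated in this record), and it uses a simple greedy rule that places whole flights, largest first, into whichever subset is furthest below its 70/10/20 target. All names here are hypothetical.

```python
import random
from collections import defaultdict

# Target shares taken from the dataset description (70/10/20).
TARGET_SHARES = {"train": 0.7, "val": 0.1, "test": 0.2}


def flight_wise_split(image_to_flight, shares=TARGET_SHARES, seed=0):
    """Assign whole flights to subsets so that no flight spans two subsets.

    image_to_flight maps an image name to a flight identifier; how flights
    are encoded in the released data is an assumption here.
    """
    images_by_flight = defaultdict(list)
    for image, flight in image_to_flight.items():
        images_by_flight[flight].append(image)

    total = len(image_to_flight)
    targets = {name: share * total for name, share in shares.items()}
    splits = {name: [] for name in shares}

    # Shuffle first so ties between equal-sized flights are broken randomly;
    # Python's sort is stable (also with reverse=True), so ties keep that order.
    flights = list(images_by_flight)
    random.Random(seed).shuffle(flights)
    flights.sort(key=lambda f: len(images_by_flight[f]), reverse=True)

    for flight in flights:
        # Greedy rule: give the flight to the subset furthest below its target.
        name = max(splits, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(images_by_flight[flight])
    return splits


if __name__ == "__main__":
    # Toy input: 10 hypothetical flights with 10 images each.
    toy = {f"flight{f}_img{i}.JPG": f"flight{f}" for f in range(10) for i in range(10)}
    result = flight_wise_split(toy)
    print({name: len(imgs) for name, imgs in result.items()})
    # {'train': 70, 'val': 10, 'test': 20}
```

Keeping flights whole means the achieved proportions only approximate the targets when flight sizes are uneven, which is consistent with the record's note that the flight-wise split was applied "where possible".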
Locality
| Temporal Coverage: | |
|---|---|
| Start Date | 2020-01-15 |
| End Date | 2020-03-18 |
| Spatial Coverage: | |
| Latitude | |
| Southernmost | -64.81877 |
| Northernmost | -64.72691 |
| Longitude | |
| Westernmost | -64.20687 |
| Easternmost | -63.91881 |
| Altitude | |
| Min Altitude | N/A |
| Max Altitude | N/A |
| Depth | |
| Min Depth | N/A |
| Max Depth | N/A |
| Location: | |
| Location | Antarctica |
| Detailed Location | Western Antarctic Peninsula, Palmer Station |
Instrumentation
| Data Collection: | Data were collected using a DJI Phantom 4 Pro multirotor drone. Manual image annotation was conducted in LabelMe (version 5.11.1). The resulting LabelMe annotations were converted to COCO format using the labelme2coco Python package, run under Python 3.11.14. |
|---|---|
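The geometric core of the LabelMe-to-COCO conversion is small: a LabelMe rectangle stores two opposite corners in `points`, while COCO stores `bbox` as `[x_min, y_min, width, height]` in pixels. The sketch below illustrates that mapping for rectangle shapes only. It is a hypothetical re-implementation for illustration, not the code of the labelme2coco package, and the helper names and the `category_ids` mapping are assumptions.

```python
def rectangle_to_coco_bbox(points):
    """Convert a LabelMe rectangle (two opposite corners) to COCO [x, y, w, h]."""
    (x1, y1), (x2, y2) = points
    # Either corner may be stored first, so normalise with min/abs.
    return [min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)]


def labelme_to_coco(labelme, image_id, category_ids, first_ann_id=1):
    """Build one COCO image entry and its annotations from a LabelMe dict.

    category_ids is a hypothetical mapping such as {"humpback whale": 1}.
    Non-rectangle shapes are skipped in this sketch.
    """
    image = {
        "id": image_id,
        "file_name": labelme["imagePath"],
        "width": labelme["imageWidth"],
        "height": labelme["imageHeight"],
    }
    annotations = []
    for shape in labelme.get("shapes", []):
        if shape.get("shape_type") != "rectangle":
            continue
        x, y, w, h = rectangle_to_coco_bbox(shape["points"])
        annotations.append({
            "id": first_ann_id + len(annotations),
            "image_id": image_id,
            "category_id": category_ids[shape["label"]],
            "bbox": [x, y, w, h],
            "area": w * h,
            "iscrowd": 0,
        })
    return image, annotations


if __name__ == "__main__":
    # Toy LabelMe-style record; coordinates are made up.
    record = {
        "imagePath": "example.JPG",
        "imageWidth": 5472,
        "imageHeight": 3648,
        "shapes": [
            {"label": "humpback whale", "shape_type": "rectangle",
             "points": [[1300.0, 900.0], [1000.0, 750.0]]},
        ],
    }
    image, anns = labelme_to_coco(record, image_id=1, category_ids={"humpback whale": 1})
    print(anns[0]["bbox"], anns[0]["area"])
    # [1000.0, 750.0, 300.0, 150.0] 45000.0
```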
Storage
Data Storage:
- Images: 5,281 files (.JPG), 42.46 GB
- LabelMe annotations: 5,281 files (.json), 13.34 GB
- COCO annotations: 4 files (.json), 4.3 MB
- Total volume of data: 55.8 GB
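As a quick consistency check on the storage figures, assuming decimal units (1 GB = 1000 MB), the component sizes sum to the stated total, and the average file sizes follow directly:

```latex
\begin{aligned}
42.46\,\mathrm{GB} + 13.34\,\mathrm{GB} + 0.0043\,\mathrm{GB} &= 55.8043\,\mathrm{GB} \approx 55.8\,\mathrm{GB},\\
\frac{42.46\,\mathrm{GB}}{5281\ \text{images}} &\approx 8.04\,\mathrm{MB}\ \text{per image},\\
\frac{13.34\,\mathrm{GB}}{5281\ \text{files}} &\approx 2.53\,\mathrm{MB}\ \text{per LabelMe file}.
\end{aligned}
```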