Abstract:
This dataset presents the suitable area(s) for very high-resolution optical satellite imagery to monitor live and stranded cetaceans around the UK and UK Overseas Territories, based on five-year monthly median 'Total Cloud Cover' and '10m Wind Speed' ERA5 global reanalysis data.
Monitoring live and stranded cetaceans can be expensive and logistically challenging, resulting in knowledge gaps. Very high-resolution (VHR) optical satellites are considered a potential solution to addressing some of these gaps. Despite success at detecting live and stranded cetaceans, satellites have only been trialled on restricted spatial and temporal scales. We established a framework for assessing the feasibility of VHR optical satellite-based monitoring of cetaceans at high temporal frequency and at local to global scales, focusing on the UK and UK Overseas Territories as a case study. We assessed the primary environmental conditions necessary for successful application of this technology: cloud cover and wind speed. Here we present the spatial feasibility of satellite monitoring around the UK, the Caribbean and the Falkland Islands (Islas Malvinas), based on five-year (2018-2022) monthly median 'Total Cloud Cover' and '10m Wind Speed' ERA5 global reanalysis data. The data are in .tif format, depicting the five-year (2018-2022) monthly median of the respective environmental variable, to which a user-defined threshold is applied to generate feasibility maps in vector (polygon shapefile) format, depicting the 'suitable area(s)' mapped to the study area. For live cetacean monitoring, 'suitable area(s)' delineate where both five-year monthly median environmental variables met the predefined threshold over open water; for stranded cetaceans, they delineate where 'Total Cloud Cover' alone met the threshold along the coastline (1km either side of the coastline). The suitable areas are merged (and dissolved) for projects interested in monitoring both live and stranded cetaceans, which can be extended to include monitoring of floating dead cetaceans.
This research has been supported by the Natural Environment Research Council (NERC) through a SENSE CDT studentship (grant no. NE/T00939X/1) and the Joint Nature Conservation Committee.
Keywords:
ECMWF-ERA5, cloud cover, marine mammals, remote sensing, wind
Clarke, P., Skachkova, A., Jackson, J., Cubaynes, H., & Jones, G. (2025). Suitable areas for very high-resolution optical satellite imagery to monitor live and stranded cetaceans around the UK and UK Overseas Territories, based on ERA5 reanalysis data (2018-2022) (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/451a6a5d-a17b-4d71-aad0-188739403d8c
Access Constraints: | None |
---|---|
Use Constraints: | Data supplied under Open Government Licence v3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. |
Creation Date: | 2025-07-02 |
---|---|
Dataset Progress: | Complete |
Dataset Language: | English |
ISO Topic Categories: |
|
Parameters: |
|
Personnel: | |
Name | UK Polar Data Centre |
Role(s) | Metadata Author |
Organisation | British Antarctic Survey |
Name | Penny J Clarke |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Aliaksandra C G Skachkova |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Dr Jennifer Jackson |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Dr Hannah C Cubaynes |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Dr Gwawr Jones |
Role(s) | Investigator |
Organisation | Joint Nature Conservation Committee |
Parent Dataset: | N/A |
Reference: | Associated paper: Clarke et al. (2025) Talking about the weather: the feasibility of using very high-resolution optical satellite imagery to monitor live and stranded cetaceans around the UK and UK Overseas Territories. Marine Mammal Science. The input data needed to recreate this dataset can be accessed from: Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2023): ERA5 monthly averaged data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.f17050d7 [Accessed February 8, 2024]. Guidance on how to access and download the data and assign the correct naming convention for use with the reproducible code (https://github.com/PennyJClarke/feasibility-mapping) can be found in the GitHub repository (downloading_era5_data.pdf) alongside the code, or in the associated manuscript (supplemental_material_2). Additional datasets necessary to reproduce this data include: (1) Global land shapefile (1:10m): Natural Earth. 2025. Downloads [Online]. Available: https://www.naturalearthdata.com/downloads/ [Accessed September 7, 2023]. (2) Global coastline shapefile: OpenStreetMap. 2019. Coastlines [Online]. Available: https://osmdata.openstreetmap.de/data/coastlines.html [Accessed February 10, 2025]. (3) Exclusive Economic Zone(s) world: Flanders Marine Institute. 2023. Maritime Boundaries Geodatabase, version 12 [Online]. Available: https://www.marineregions.org/. https://doi.org/10.14284/628 [Accessed July 06, 2024]. (4) Exclusive Economic Zone UK: UK Hydrographic Office. 2025. UK Hydrographic Office Maritime Limits and Boundaries - Overview [Online]. Available: https://data.admiralty.co.uk/portal/home/item.html?id=bf77b2ac1b654efc95dc3665c0501e23 Licence: https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ [Accessed July 30, 2024]. |
|
---|---|---|
Quality: | The datasets were extracted from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA5 data. The ERA5 data used in the feasibility mapping exercise are a synthetic version of Earth's atmospheric conditions, and their performance depends on the quality of the observed and modelled data. As observation records are spatially and temporally biased and inconsistent in their standards, modelled data will be less accurate in regions with fewer observations. We used monthly averages to map feasibility; more refined daily products available through ERA5 could provide higher resolution information about satellite imaging feasibility than our exploration. Regardless of the timescale (hourly, daily, monthly or yearly), analysing averages can conceal anomalous periods. ERA5 is a global dataset, which is advantageous for the replicability and comparability of these feasibility assessments on a global scale. However, in the future, regional reanalysis datasets or locally derived observation datasets may provide a higher resolution alternative to ERA5, which, when combined with local knowledge, could achieve more accurate local-level assessments. The spatial resolution of ERA5 data is much coarser than that of the VHR optical satellite imagery we are interpreting feasibility for. The data are also much coarser than the conditions they represent, and than conservation managers require to make informed decisions. ERA5 is a gridded dataset with a native horizontal resolution of approximately 31km (0.28125 degrees), which, when downloaded, is reprojected to a 0.25 by 0.25 degree latitude/longitude grid (the north-south extent of a grid cell in kilometres is consistent across latitudes, whereas the east-west extent decreases poleward as the meridians converge). |
|
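The poleward narrowing of a 0.25 degree grid cell noted in the Quality field can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not part of the published pipeline: it treats the Earth as a sphere with a mean radius of 6371 km, and the function name `cell_size_km` is hypothetical. The example latitudes are taken from the bounding boxes listed under Spatial Coverage.

```python
import math

# Assumption for this sketch: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0
# Download grid spacing stated in the Quality field (degrees).
CELL_DEG = 0.25


def cell_size_km(lat_deg, cell_deg=CELL_DEG):
    """Return (north_south_km, east_west_km) for a grid cell centred at lat_deg."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = cell_deg * km_per_degree
    # East-west extent shrinks with the cosine of latitude as meridians converge.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Latitudes drawn from the three bounding boxes in this record.
for lat in (5, 35, 45, 65, -45, -58):
    ns, ew = cell_size_km(lat)
    print(f"lat {lat:>4}: {ns:.1f} km north-south, {ew:.1f} km east-west")
```

Under these assumptions the north-south extent stays the same at every latitude, while the east-west extent at the northern edge of the UK bounding box is well under half of its value near the equator.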
Lineage/Methodology: | This dataset uses the following input data: the 'Total cloud cover' and '10m wind speed' sub-variables of 'ERA5 monthly averaged data on single levels from 1940 to present' (see https://doi.org/10.24381/cds.f17050d7). 'Total cloud cover' (between 0 and 1, or 0% and 100%) is the proportion of a grid cell covered by cloud, and integrates all cloud information from the Earth's surface to the top of the atmosphere. '10m wind speed' (ms-1) represents horizontal wind speed in metres per second at 10m above the Earth's surface. For details on precisely how to download ERA5 data, see supplemental_material_2 in the associated manuscript or view downloading_era5_data.pdf in the associated GitHub repository: https://github.com/PennyJClarke/feasibility-mapping. The bounding box coordinates of the three study areas were selected to encompass the entirety of the Exclusive Economic Zone (EEZ) of each study area, and were used to delineate and extract wind and cloud data from the ECMWF reanalysis ERA5 data. The assessment of all data was conducted using the open-source QGIS 3.28 PyQGIS console. An open-source pipeline to identify areas that meet a threshold of environmental variables in extracted ERA5 data is available at https://github.com/PennyJClarke/feasibility-mapping, and a description follows. All years of data (1940-2023) were downloaded and later subset to extract the most recent five years (2018-2022) of data for the environmental variables 'Total cloud cover' and '10m wind speed'. To perform the threshold analysis, we first calculated, for each month, the five-year monthly median of 'Total cloud cover' and '10m wind speed' from 2018-2022 (.tif files). The median was selected as the more representative measure of central tendency. The monthly median outputs were then masked to retain only those areas that met or were below the predefined threshold. The masked areas of both variables were converted into vector format (polygon shapefile).
For live cetaceans, both vectors were used to perform an intersection (extraction of overlapping features from the two input layers) of the matching monthly 'Total cloud cover' and '10m wind speed' layers. The resulting final 'suitable area(s)' (vector polygon shapefiles), where both five-year monthly median environmental variables met the predefined threshold, were mapped to the study area to develop feasibility maps. For stranded cetaceans, the outputs for 'Total cloud cover' alone, masked by the coastline (to include a 1km buffer either side of the coastline), form the final 'suitable area(s)' used to develop feasibility maps. Finally, for projects interested in exploring stranded and floating dead/live cetaceans ('float_and_strand'), the 'suitable area(s)' for both live and stranded cetaceans were merged and their boundaries dissolved to develop feasibility maps. In addition, to understand the (interannual) variability in the dataset, the standard deviation was extracted and mapped (.tif format). To evaluate monthly average 'Total cloud cover', we devised thresholds to define the likelihood of achieving a suitable image collection (see Table 1 below and further details in the associated paper). The term 'suitable image' describes an image that is cloud free or contains thin or spatially restricted cloud that does not render the image unusable and, for live cetaceans only, where the sea state is below Beaufort 4 (equal to 8 ms-1; for studies monitoring small cetacean species, it may be sensible to be conservative and reduce the wind speed threshold to Beaufort 3, equal to 5 ms-1).
'TCC' is given as a value between 0 and 1, just as a probability can be expressed as a value between 0 and 1 or between 0% and 100% inclusive. Therefore, to assess the percentage likelihood of achieving a suitable image collection based on 'TCC', the ERA5 grid cell values were evaluated in this project as a percentage (between 0% and 100%).
Table 1 - ERA5 'Total Cloud Cover (TCC)' monthly average thresholds (%) and associated likelihood of achieving a suitable image collection.
ERA5 'Total Cloud Cover (TCC)' monthly average threshold (%) | Likelihood of achieving a suitable image collection based on 'TCC' |
---|---|
<20 | Certain |
20-40 | Almost certain |
40-60 | Likely |
60-80 | Possible |
80-99 | Unlikely |
100 | Impossible |
Three study areas were selected to assess the feasibility of collecting useful VHR optical satellite imagery at large temporal and spatial scales in UK and UKOT waters. (a) The entire British Isles was selected to address the EFRA's recommendation that VHR optical satellites represent a principal solution for addressing the UK's cetacean monitoring gaps. Overseas study areas (b) the Falkland Islands (Islas Malvinas) and (c) the Caribbean, encompassing five UKOTs, were also selected, given their highly diverse environmental conditions when compared with the UK. |
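The threshold analysis described in this field (five-year monthly median, masking at or below a threshold, then intersecting cloud and wind) can be sketched on a toy grid. This is an illustration only and not the published PyQGIS pipeline: the grid values are made up, the function names are hypothetical, and a cell-wise logical AND on the raster masks stands in for the vector intersection performed in QGIS. The thresholds of 20% cloud and 5 ms-1 wind mirror the 'cloud20_wind5' example in the file naming convention.

```python
from statistics import median


def monthly_median(yearly_grids):
    """Cell-wise median across years for one calendar month.

    yearly_grids is a list of equally sized 2-D lists, one per year.
    """
    rows, cols = len(yearly_grids[0]), len(yearly_grids[0][0])
    return [
        [median(grid[r][c] for grid in yearly_grids) for c in range(cols)]
        for r in range(rows)
    ]


def meets_threshold(grid, threshold):
    """True where a cell meets or is below the threshold."""
    return [[value <= threshold for value in row] for row in grid]


def intersect(mask_a, mask_b):
    """Cell-wise AND; a raster stand-in for the vector intersection step."""
    return [[a and b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


# Made-up April values for five years (2018-2022) on a 2 x 2 grid.
cloud_fraction = [  # 'Total cloud cover' as a 0-1 fraction
    [[0.10, 0.50], [0.30, 0.90]],
    [[0.15, 0.55], [0.20, 0.80]],
    [[0.20, 0.45], [0.25, 0.95]],
    [[0.05, 0.60], [0.10, 0.85]],
    [[0.30, 0.40], [0.35, 0.70]],
]
wind_ms = [  # '10m wind speed' in ms-1
    [[4.0, 3.0], [9.0, 6.0]],
    [[4.5, 3.5], [8.0, 7.0]],
    [[3.5, 2.5], [10.0, 6.5]],
    [[5.5, 4.0], [7.5, 5.5]],
    [[4.2, 3.2], [9.5, 8.0]],
]

# Evaluate cloud cover as a percentage, as described for 'TCC'.
cloud_median_pct = [[100 * v for v in row] for row in monthly_median(cloud_fraction)]
wind_median = monthly_median(wind_ms)

cloud_ok = meets_threshold(cloud_median_pct, 20)  # cloud threshold 20%
wind_ok = meets_threshold(wind_median, 5)         # wind threshold 5 ms-1
live_suitable = intersect(cloud_ok, wind_ok)      # both variables must pass
print(live_suitable)
```

In this toy example two cells pass the wind threshold but only one of them also passes the cloud threshold, so a single cell remains in the 'live' suitable area. For stranded cetaceans, only the cloud mask would be used, clipped to the coastal buffer.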
Temporal Coverage: | |
---|---|
Start Date | 2018-01-01 |
End Date | 2022-12-31 |
Spatial Coverage: | |
Latitude | |
Southernmost | 5 |
Northernmost | 35 |
Longitude | |
Westernmost | -90 |
Easternmost | -50 |
Altitude | |
Min Altitude | N/A |
Max Altitude | N/A |
Depth | |
Min Depth | N/A |
Max Depth | N/A |
Latitude | |
Southernmost | 45 |
Northernmost | 65 |
Longitude | |
Westernmost | -30 |
Easternmost | 10 |
Altitude | |
Min Altitude | N/A |
Max Altitude | N/A |
Depth | |
Min Depth | N/A |
Max Depth | N/A |
Latitude | |
Southernmost | -58 |
Northernmost | -45 |
Longitude | |
Westernmost | -67 |
Easternmost | -50 |
Altitude | |
Min Altitude | N/A |
Max Altitude | N/A |
Depth | |
Min Depth | N/A |
Max Depth | N/A |
Location: | |
Location | United Kingdom |
Detailed Location | N/A |
Location | Falkland Islands |
Detailed Location | N/A |
Location | Caribbean |
Detailed Location | N/A |
Data Collection: | The assessment of all data was conducted using the open-source QGIS 3.28 PyQGIS console. An open-source pipeline to identify areas that meet a threshold of environmental variables in extracted ERA5 data is available at: https://github.com/PennyJClarke/feasibility-mapping |
---|
Distribution: | |
---|---|
Distribution Media | Online Internet (HTTP) |
Distribution Size | N/A |
Distribution Format | N/A |
Fees | N/A |
Data Storage: | This dataset consists of 6340 files in 178 folders (1.11GB). The data are provided in the following formats: .tif (with an accompanying .tfw world file, which contains information on the location, scale and rotation of the raster stored in the associated .tif file) and .shp (final suitable area(s)). The folder structure is as follows:
1. 'final_suitable_area' - the suitable area(s) outputs for the 'live' and 'float_and_strand' cetacean types separately. 'live' refers to free-swimming cetaceans, with suitable areas based on both the 'Total Cloud Cover' and '10m Wind Speed' variables at sea; 'float_and_strand' merges the suitable area(s) for live and stranded cetaceans (stranded cetaceans only consider 'Total Cloud Cover' 1km either side of the coastline). Within each cetacean type folder, the results are stored per location/region of interest, e.g., 'uk', 'caribbean', or 'falklands'. The file naming convention for live is 'uk_april_final_aoi_cloud20_wind5_live.shp', where uk = location, april = month, final_aoi = final area of interest, cloud20 = cloud threshold 20%, wind5 = wind threshold 5 ms-1, and live = cetacean type. 'float_and_strand' includes two files per region, per month, and per threshold: (1) ending with '_live_merge.shp' and (2) ending with '_float_and_strand.shp'. '_live_merge.shp' is the merged 'live' and 'stranding' suitable area(s), and '_float_and_strand.shp' is the final product, in which the boundaries between the merged layers have been dissolved. For example, 'uk_april_final_aoi_cloud20_wind5_float_and_strand.shp' and 'uk_april_final_aoi_cloud20_wind5_live_merge.shp'.
2. 'total_cloud' - for each cetacean type, 'live' or 'stranding', contains a sub-folder per region of interest, e.g., 'uk', 'caribbean', or 'falklands', which in turn contains a sub-folder per statistic, 'median' or 'std_dv' (standard deviation). The 'median' folder contains the final .tif (and accompanying .tfw) for each five-year monthly median 'Total Cloud Cover', and .shp polygon shapefiles depicting the suitable area(s) that meet each user-defined threshold for 'Total Cloud Cover'. The file naming convention for the .tif and .tfw files is 'uk_april_total_cloud_subset_median.tif', where uk = location, april = month, total_cloud = environmental variable, subset = five-year subset, and median = statistic (for standard deviation the statistic is noted as std_dv). The file naming convention for the .shp file is 'uk_april_total_cloud_subset_median_20_sa_live.shp', where uk = location, april = month, total_cloud = environmental variable, subset = five-year subset, median = statistic, 20 = cloud cover threshold, sa = suitable area, and live = cetacean type (for stranded cetaceans the cetacean type is stranding).
3. 'wind_speed' - for cetacean type 'live' only, contains a sub-folder per region of interest, e.g., 'uk', 'caribbean', or 'falklands', which in turn contains a sub-folder per statistic, 'median' or 'std_dv' (standard deviation). The 'median' folder contains the final .tif (and accompanying .tfw) for each five-year monthly median '10m Wind Speed', and .shp polygon shapefiles depicting the suitable area(s) that meet each user-defined threshold for '10m Wind Speed'. The file naming convention for the .tif and .tfw files is 'uk_april_wind_speed_subset_median.tif', where uk = location, april = month, wind_speed = environmental variable, subset = five-year subset, and median = statistic (for standard deviation the statistic is noted as std_dv). The file naming convention for the .shp file is 'uk_april_wind_speed_subset_median_8_sa_live.shp', where uk = location, april = month, wind_speed = environmental variable, subset = five-year subset, median = statistic, 8 = wind speed threshold, sa = suitable area, and live = cetacean type.
Please note: the dataset contains a number of .shp files with zero extent. These files represent the months, for a given region, in which no 'suitable area(s)' exist (where no environmental parameters met the user-defined threshold). They are included because null areas are as valuable to document as suitable ones, indicating regions or months not suitable for investment in VHR satellite imagery to monitor live and stranded cetaceans. The list of all .shp files with zero extent, organised by the location in which they are stored, can be found in the readme.txt file at the top of the folder tree of this dataset. |
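The 'final_suitable_area' file naming convention lends itself to automated filtering, for example to select all files for one month or threshold pair. The following is a minimal sketch that parses only the three final-product name endings quoted in this field. The function name `parse_final_aoi_name` and the returned dictionary keys are hypothetical, and the pattern assumes lowercase locations and months and whole-number thresholds, as in the quoted examples.

```python
import re

# Pattern built from the examples quoted in the Data Storage field.
# Assumption: thresholds are whole numbers, names are lowercase.
FINAL_AOI_PATTERN = re.compile(
    r"^(?P<location>[a-z]+)_(?P<month>[a-z]+)_final_aoi_"
    r"cloud(?P<cloud>\d+)_wind(?P<wind>\d+)_"
    r"(?P<product>live|live_merge|float_and_strand)\.shp$"
)


def parse_final_aoi_name(filename):
    """Split a 'final_suitable_area' shapefile name into its components.

    Returns None when the name does not follow the convention.
    """
    match = FINAL_AOI_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "location": match.group("location"),
        "month": match.group("month"),
        "cloud_threshold_pct": int(match.group("cloud")),
        "wind_threshold_ms": int(match.group("wind")),
        "product": match.group("product"),
    }


print(parse_final_aoi_name("uk_april_final_aoi_cloud20_wind5_live.shp"))
print(parse_final_aoi_name("uk_april_final_aoi_cloud20_wind5_float_and_strand.shp"))
```

Names from the 'total_cloud' and 'wind_speed' folders follow a different convention and return None here. Because some shapefiles have zero extent, any downstream code that loads the matched files should also handle empty layers.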