Abstract:
This dataset contains model input and output data on emperor penguin population dynamics for a Bayesian analysis carried out on multivariate classification results. Model input data comprises multivariate classification analysis results derived from very-high-resolution (VHR) satellite imagery of 16 emperor penguin colonies, spanning the Bellingshausen Sea to the Weddell Sea, between 2009 and 2023. Model output data comprises population estimates for each year for each colony, global trends per year, global change for the dataset overall, global abundance pertaining to individual colonies, as well as statistical parameter estimates provided by the model.
Data collection was carried out by personnel at BAS.
Funding was provided by WWF UK (GB095701), project NE/Y00115X/1 "Understanding emperor penguin populations in the Weddell Sea and Antarctic Peninsula", and previous WWF funding over the 15-year period.
Keywords:
Antarctica, Bayesian analysis, Climate change, Emperor penguins, Population trajectory, Sea ice
Fretwell, P., Bamford, C., Skachkova, A., Trathan, P., & Forcada, J. (2025). Model input and output statistics on fifteen-year assessment of emperor penguin breeding populations from the Bellingshausen sea to the Weddell sea from VHR satellite imagery, 2009-2023 (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/c8d8ffe6-0aff-493c-b40f-e63f0c35f081
Access Constraints: | Dataset is under embargo until the publication of the associated manuscript. |
---|---|
Use Constraints: | Data supplied under Open Government Licence v3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. |
Creation Date: | 2025-04-16 |
---|---|
Dataset Progress: | Planned |
Dataset Language: | English |
ISO Topic Categories: |
|
Parameters: |
|
Personnel: | |
Name | UK Polar Data Centre |
Role(s) | Metadata Author |
Organisation | British Antarctic Survey |
Name | Dr Peter T Fretwell |
Role(s) | Investigator, Technical Contact |
Organisation | British Antarctic Survey |
Name | Dr Connor C G Bamford |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Aliaksandra C G Skachkova |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Dr Philip N Trathan |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Name | Dr Jaume Forcada |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Parent Dataset: | N/A |
Reference: | Associated publication: Fretwell, P.T., Bamford, C., Skachkova, A., Trathan, P.N., Forcada, J. (2025) Regional emperor penguin population declines exceed modelled projections. Nature Comms Earth and Environment, in press. Methodology references: Gelman, A., & Shirley, K. (2011) Inference from Simulations and Monitoring Convergence. In S. Brooks, A. Gelman, G. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo (1st ed., pp. 163-174). Chapman and Hall/CRC. LaRue et al. (2024) Advances in remote sensing of emperor penguins: first multi-year time series documenting trends in the global population. Proc. R. Soc. B 291: 20232067. |
|
---|---|---|
Quality: | To convert the area of penguins in each image to an index of abundance, we followed an approach similar to that of LaRue et al. (2024), using a state-space analysis of emperor penguin population dynamics and the observation process, modified for satellite observations without additional aerial or ground counts. The population process accounted for daily changes in satellite counts over the survey period (August to November) in each of the 16 analysed colonies. Colony-level trends and annual fluctuations were assessed considering the persistence of colonies at the same locations given physical changes (e.g., fast ice conditions), and occupancy (i.e., presence or absence for unknown reasons). The observation process accommodated bias and precision in satellite image counts due to data collection, and changes over the survey period due to chick mortality and subsequent emigration by attendant and non-attendant adults. Expected abundance for each colony and year was modelled according to LaRue et al. (2024). Because of the relatively low sample size and the sparse data for some colonies, colony-level effects were modelled as random effects. Initial population states were assumed to have a log-normal distribution with hyperparameters estimated empirically from the data. The observation process assumed that the sizes of the areas occupied by penguins in satellite images were normally distributed, and corrected for estimate bias due to interpretation of satellite images. Further, a log-linear effect was used to account for the decline in the number of adults over the spring survey period, where the variable described the proportional change in expected count for each day elapsed in the survey. To accommodate observer bias, image quality scores were related to these variables in a discriminant analysis of principal components, using the R package adegenet.
Here, we selected the optimum number of principal components using cross-validation and used observer-assigned image quality as the grouping variable in the discriminant analysis. The results provided a covariate with predicted image quality, and the subsequently fitted beta coefficients assumed that satellite observations had a constant proportional bias. |
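The log-linear day effect described in the Quality section can be illustrated with a minimal sketch. It assumes the simplest possible form, in which the expected count is multiplied by a constant factor for each day elapsed; the function name, parameter names and numerical values are hypothetical and are not taken from the fitted model.

```python
import math


def expected_count(initial_count: float, daily_log_change: float, days_elapsed: int) -> float:
    """Expected count after `days_elapsed` days under a log-linear day effect.

    log(expected) = log(initial_count) + daily_log_change * days_elapsed,
    so exp(daily_log_change) is the proportional change per day.
    All names and values here are illustrative assumptions.
    """
    return initial_count * math.exp(daily_log_change * days_elapsed)


# Hypothetical example: a 1% proportional decline per day over 30 days.
daily_log_change = math.log(0.99)
print(expected_count(1000.0, daily_log_change, 30))
```

A negative coefficient gives a steady proportional decline through the survey window, which is why counts from images taken late in the season are not directly comparable with early-season counts until this effect is accounted for.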
|
Lineage/Methodology: | This dataset contains csv tables showing supplemental data from the results of the Bayesian analysis of the multivariate classification results. Table 5 summarises the Bayesian results, while Tables 1-3 give specific outputs and statistical fits. Table 4 includes the underlying results from the multivariate classification analysis derived from satellite imagery of each of the 16 emperor penguin colonies in the segment of Antarctica studied over the 15-year period between 2009 and 2023. The classification was carried out in ArcGIS on VHR satellite imagery; details of each image and the associated locations are recorded in the table. It also contains the information on image quality and the satellite information used to parameterise the Bayesian model. Tables are described in more detail in the data storage section.
Satellite imagery: We used optical satellite data from the MAXAR VHR satellite constellation (https://resources.maxar.com/brochures/the-maxar-constellation) that was speculatively tasked over each colony location each year, accessed from the MAXAR search and Discovery platform (https://discover.maxar.com/ and similar previous versions). A single section of each image with a minimum window of 25 km² was obtained for each colony each year. In several cases, images were unsuitable because low cloud, low sun angle or poor environmental conditions obscured penguins or made them difficult to identify. Where possible, additional images were acquired. Additional information on image ID and scene quality is contained in Table 4. The analysis was restricted to the austral spring, between the end of August, when the sun comes up at the more northerly colonies, and the end of November, before the adults depart the colonies as chicks fledge. The general method of pre-processing followed that of LaRue et al. (2024), first loading each image into ArcGIS (ESRI ArcGIS version 10.6 2024 and earlier versions), and projecting the images into the local UTM projection for more accurate area assessment. Imagery was pan-sharpened using ArcGIS, and enhanced using a "Standard Deviation" histogram stretch. The location of the colony was then isolated, and the colony was cropped manually from the surrounding image to avoid excessive processing time. A supervised classification analysis was then conducted on each image, using a multivariate classification analysis in ArcGIS, which involves training a model on manually chosen pixels of penguin, guano, snow and shadow. These training data were chosen manually by experienced observers and usually amounted to between 50 and 200 examples from each class, depending upon the homogeneity of the background image. Once the classes have been identified, the multivariate classification algorithm divides the image into the available classes, one of which will be penguins. This process is iterative, and the training data usually need to be refined multiple times before an acceptable confidence (>95% agreement between manual and automated observations) is reached. From these results the penguin pixels were isolated and converted into a shapefile, which gives the area occupied by penguins in each image. The area information from each shapefile is included in Table 4, together with image IDs and image quality. This process was undertaken for a total of 241 images over the 15-year period.
Model fitting and assessment: We used Markov Chain Monte Carlo methods in program JAGS, run from R (v.4.4.0; R Core Team, 2024) using package jagsUI, to fit the state-space models. We ran three Markov chains for 550,000 iterations each, using dispersed parameter values as starting values, discarded the first 50,000 samples of each chain as burn-in, and thinned the remainder to every 50th sample, which produced 10,000 posterior samples from each Markov chain. We assessed chain convergence visually using trace plots, through the mixing of the chains and sample autocorrelation plots, and using the potential scale reduction factor (R-hat) statistic. We selected prior distributions similar to those of LaRue et al. (2024), and we assessed the model's goodness-of-fit using posterior predictive checks. We calculated the root mean square error of the simulated data and of the observed data, and from these obtained an estimated Bayesian p-value. P-values close to 0.5 indicate a reasonable fit, as expected when the fitted statistical model is equivalent to the 'true' model that generated the data. |
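The MCMC settings and fit diagnostics named in the Lineage/Methodology section can be checked with a short, library-free sketch. It is an illustration only: the R-hat shown is the classic (non-split) Gelman-Rubin form and may differ in detail from the value reported by jagsUI, and the Bayesian p-value is computed as the share of posterior draws in which the simulated-data discrepancy is at least as large as the observed-data discrepancy. All function names and the toy inputs are hypothetical.

```python
from statistics import mean, variance


def retained_samples(n_iter: int, n_burnin: int, n_thin: int) -> int:
    """Posterior samples kept per chain after burn-in and thinning."""
    return (n_iter - n_burnin) // n_thin


def gelman_rubin(chains: list[list[float]]) -> float:
    """Classic potential scale reduction factor for one parameter.

    `chains` holds equal-length sample lists, one per Markov chain.
    """
    n = len(chains[0])
    within = mean(variance(chain) for chain in chains)          # W
    between = n * variance([mean(chain) for chain in chains])   # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def bayesian_p_value(sim_discrepancy: list[float], obs_discrepancy: list[float]) -> float:
    """Share of draws where the simulated discrepancy >= the observed one."""
    hits = sum(s >= o for s, o in zip(sim_discrepancy, obs_discrepancy))
    return hits / len(sim_discrepancy)


# Settings stated in the text: 550,000 iterations, 50,000 burn-in, thinning by 50.
print(retained_samples(550_000, 50_000, 50))  # 10000 per chain
```

Values of R-hat close to 1 suggest that the chains have mixed, while chains stuck in different regions give values well above 1; a Bayesian p-value near 0 or 1 would indicate that the model systematically under- or over-predicts the discrepancy.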
Temporal Coverage: | |
---|---|
Start Date | 2008-08-01 |
End Date | 2023-12-31 |
Spatial Coverage: | |
Latitude | |
Southernmost | 0 |
Northernmost | 90 |
Longitude | |
Westernmost | -77 |
Easternmost | -64 |
Altitude | |
Min Altitude | N/A |
Max Altitude | N/A |
Depth | |
Min Depth | N/A |
Max Depth | N/A |
Data Resolution: | |
Latitude Resolution | N/A |
Longitude Resolution | N/A |
Horizontal Resolution Range | N/A |
Vertical Resolution | 0.6m - 0.3m |
Vertical Resolution Range | N/A |
Temporal Resolution | N/A |
Temporal Resolution Range | N/A |
Location: | |
Location | Antarctica |
Detailed Location | Weddell Sea |
Location | Antarctica |
Detailed Location | Bellingshausen Sea |
Location | Antarctica |
Detailed Location | Dronning Maud Land |
Location | Antarctica |
Detailed Location | Antarctic peninsula |
Distribution: | |
---|---|
Distribution Media | Online Internet (HTTP) |
Distribution Size | 175 KB |
Distribution Format | ASCII |
Fees | N/A |
Data Storage: | There are 5 .csv files and 1 .pdf file, totalling 175 KB. Table_1_global_abudance.csv: individual results of the areas associated with each penguin colony for each year analysed, showing the area calculated from the Bayesian model and the relevant statistics. Columns show the statistical confidence in the model and include the mean, standard error, median, and lower and upper confidence levels of 2.5%, 5%, 95% and 97.5%. Table_2_global_change.csv: the overall statistics from the model of change for the whole dataset. Columns show the probability of overall population decline, the probability (between 0 and 1) of a decline of 30% and of 50%, the confidence levels (at 2.5%, 5%, 50%, 95% and 97.5%), and the mean and standard error of those figures. Table_3_global_trend.csv: the estimated trend per year as a ratio, with relevant statistics. These include the confidence levels at 2.5%, 5%, 50%, 95% and 97.5%, as well as the mean, standard error and probability of a decline (negative trend). Table_4_images_and_image_quality.csv: the spreadsheet recording the images used in the classification analysis, the resulting areas from each of these analyses and the various quality assessments used in the Bayesian model. Table_5_colony_summary.csv: parameters, areas and estimates for each year for each colony from the Bayesian model, based upon the statistical analysis of the classified area estimates. Model_specifications_metadata.pdf: additional specifications of the Bayesian model construction. |