Abstract:
We present a concurrent series of 144 monthly reanalyses of Super Dual Auroral Radar Network (SuperDARN) plasma velocity measurements, using the method of data-interpolating Empirical Orthogonal Functions (EOFs). For each monthly reanalysis, the 5-minute median values of the northern polar region's radar-measured line-of-sight Doppler plasma velocities are binned in an equal-area grid defined in quasi-dipole latitude and quasi-dipole magnetic local time (MLT). The grid cells each have an area of approximately 110,000 km², and the grid extends to 30 degrees colatitude. Within this spatial grid, the sparse binned data are infilled to provide a measurement at every spatial and temporal location via two different EOF analysis models: one tailored to instances of low data coverage, the other tailored to higher data coverage. These two models each comprise 144 monthly sets of orthogonal modes of variability (spatial and temporal patterns), along with the timestamps of each epoch, and the spatial coordinate information of all bin locations. A companion dataset determines which of the two models should be chosen in each location for each month, in order to ensure the best accuracy of the infill solution. We also provide the temporal mean of the data in each spatial bin, which is removed prior to the EOF analysis. Collectively, the reanalysis delivers the SuperDARN data in terms of cardinal north and east vector components (in the quasi-dipole coordinate frame), without its usual extreme sparseness, for studies of ionospheric electrodynamics for the period 1997.0 to 2009.0.
Funding was provided by NERC Standard grant NE/N01099X/1, titled 'Thermospheric Heating Modes and Effects on Satellites' (THeMES) and the NERC grant NE/V002732/1, titled 'Space Weather Instrumentation, Measurement, Modelling, and Risk: Thermosphere' (SWIMMRT).
Keywords:
Data Interpolating Empirical Orthogonal Functions, Ionospheric electrodynamics, Plasma velocity, SuperDARN reanalysis, Upper atmosphere dynamics
Shore, R., Freeman, M., Chisham, G., Lam, M., & Breen, P. (2022). Dominant spatial and temporal patterns of horizontal ionospheric plasma velocity variation covering the northern polar region, from 1997.0 to 2009.0 - VERSION 2.0 (Version 2.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/2b9f0e9f34ec44679e02abc771070cd9
Access Constraints:  None 

Use Constraints:  This data is governed by the NERC data policy http://www.nerc.ac.uk/research/sites/data/policy/ and supplied under Open Government Licence v.3 http://www.nationalarchives.gov.uk/doc/opengovernmentlicence/version/3/ 
Creation Date:  2022-03-30 

Dataset Progress:  Complete 
Dataset Language:  English 
ISO Topic Categories: 

Parameters: 

Personnel:  
Name  UK PDC 
Role(s)  Metadata Author 
Organisation  British Antarctic Survey 
Name  Robert M Shore 
Role(s)  Investigator 
Organisation  British Antarctic Survey 
Name  Mervyn Freeman 
Role(s)  Investigator, Technical Contact 
Organisation  British Antarctic Survey 
Name  Gareth Chisham 
Role(s)  Investigator 
Organisation  British Antarctic Survey 
Name  Mai Mai Lam 
Role(s)  Investigator 
Organisation  British Antarctic Survey 
Name  Paul Breen 
Role(s)  Investigator 
Organisation  British Antarctic Survey 
Parent Dataset:  N/A 
Reference:  This paper describes the methodology in detail: Shore, R. M., Freeman, M. P., & Chisham, G. (2021). Data-driven basis functions for SuperDARN ionospheric plasma flow characterization and prediction. Journal of Geophysical Research: Space Physics, 126, e2021JA029272. https://doi.org/10.1029/2021JA029272. This paper describes the reanalysis dataset in detail: 'A reanalysis of SuperDARN plasma velocity variability for solar cycle 23', by Shore et al., intended for publication in JGR Space Physics. 


Quality:  The SuperDARN data were processed to remove ground scatter, and to eliminate measurements with too low power (lower than 3 dB), or which had a poor-quality flag (identified in RSTv4.0). When binning the data, range gates below 11 and above 150 (where those values correspond to multiples of 45 km range distance from the radar array location) were not used, since these gave inaccurate locational estimates.  
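The selection criteria above can be sketched as a boolean mask. This is only an illustrative sketch, not the processing code used for the dataset: the function name, argument names, and array layout are assumptions, and the thresholds are simply those quoted in the Quality statement (power of at least 3 dB, range gates 11 to 150 inclusive).

```python
import numpy as np

MIN_POWER_DB = 3.0                       # measurements below this power are discarded
MIN_RANGE_GATE, MAX_RANGE_GATE = 11, 150  # gates outside this span are not binned


def quality_mask(power_db, range_gate, ground_scatter, poor_quality):
    """Return True where a measurement passes all of the quoted criteria.

    All arguments are hypothetical per-measurement arrays of equal length:
    power in dB, integer range gate, and two boolean flags.
    """
    power_db = np.asarray(power_db, dtype=float)
    range_gate = np.asarray(range_gate)
    return (
        (power_db >= MIN_POWER_DB)
        & (range_gate >= MIN_RANGE_GATE)
        & (range_gate <= MAX_RANGE_GATE)
        & ~np.asarray(ground_scatter, dtype=bool)
        & ~np.asarray(poor_quality, dtype=bool)
    )
```

Applying the mask to a velocity array (for example `velocity[quality_mask(...)]`) would leave only the measurements that go forward to the binning step.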
Lineage:  The data were gathered using the northern hemisphere radars of the SuperDARN global array, and the fitted Doppler velocities were processed from the original autocorrelation functions using version 4.5 of the radar software toolkit (RSTv4.5) and within that toolkit, fitting routine 'FitACF v2.5'. Following the data binning into an equal-area grid and 5-min medians as described in section 2 above, the data gaps are infilled as follows. We initially infill the data gaps in the sparse binned data with zeros, and then we apply the method of data-interpolating Empirical Orthogonal Functions (EOFs). This allows global (i.e., the full extent of the binned data set) spatial and temporal basis vector patterns to be obtained. These basis vectors collectively describe the full variability of the dataset. The form (i.e., morphology/shape) of the basis vectors is controlled by the cross-correlations within the dataset. Since the ionospheric plasma velocity is strongly correlated in space and time, the spatial and temporal behaviour of the basis vectors with the largest eigenvalues (i.e., those which describe most of the variability in the dataset) are defined by the underlying physics of the ionospheric plasma. In contrast, since the missing data are relatively uncorrelated in space and time, the missing data contribute to lower-eigenvalue basis vectors. This provides us with a method to infill the missing values with the largest-eigenvalue basis vector, which is a better guess for the underlying plasma velocity field than the initial infill of zeros. Moreover, we have done this without any a priori specification of source geometry.
The two models described in section 2 differ in how they compute the infilled values, as follows. In one of the two models, we apply a set of weights to the data before we compute the covariance matrix (from which the eigenvectors of the reanalysis are determined). These weights act to decrease the relative contribution of poorly sampled epochs to the solution for the variance of the data. In the second of the two models, we likewise apply weights, but this time the weights act to increase the relative contribution of poorly sampled spatial regions to the solution for the variance of the data. By combining both models together (dependent on which one reconstructs the existing data with best accuracy for a given location), we can optimise the full reanalysis for both high and low data coverage conditions.
The EOF-solution-and-infill process is repeated iteratively, until the amplitude of the infill converges with that of the data measurements, where both overlap. This infill only converges when it reinforces patterns present in the original data, thus providing a self-consistent description of the plasma velocity at the original temporal resolution of the SuperDARN data set. This gives complete spatial and temporal coverage without resorting to climatological averages, spatially smoothed models, or a priori relationships determined from solar wind drivers.
Following this retrieval of the unmeasured variability of the data, we fit a sinusoid model to translate the basis vectors from their line-of-sight (i.e., radar look direction) basis to a basis of cardinal north and east plasma velocity vector components. This method is described in full in this paper: Shore, R. M., Freeman, M. P., & Chisham, G. (2021). Data-driven basis functions for SuperDARN ionospheric plasma flow characterization and prediction. Journal of Geophysical Research: Space Physics, 126, e2021JA029272. https://doi.org/10.1029/2021JA029272. 
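The iterative infill and the line-of-sight conversion described in the Lineage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it omits the per-model weighting, uses a plain singular value decomposition of a mean-removed (epochs x bins) matrix in place of the covariance eigen-decomposition, and stops on a simple change threshold rather than the amplitude-convergence criterion of the paper. The function names, the `n_iter` and `tol` parameters, and the azimuth convention (degrees clockwise from north) are all assumptions made for illustration.

```python
import numpy as np


def eof_infill(data, n_iter=50, tol=1e-6):
    """Infill NaN gaps in an (epochs x bins) matrix using the leading EOF mode."""
    missing = np.isnan(data)
    filled = np.where(missing, 0.0, data)  # initial infill of zeros
    for _ in range(n_iter):
        # Columns of u are temporal patterns; rows of vt are spatial patterns.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Rank-1 reconstruction from the largest-eigenvalue mode only.
        recon = s[0] * np.outer(u[:, 0], vt[0])
        if not missing.any():
            break
        change = np.max(np.abs(recon[missing] - filled[missing]))
        filled[missing] = recon[missing]  # measured values are never altered
        if change < tol:
            break
    return filled


def fit_north_east(los_velocity, look_azimuth_deg):
    """Least-squares fit of v_los = v_north*cos(az) + v_east*sin(az)."""
    az = np.radians(np.asarray(look_azimuth_deg, dtype=float))
    design = np.column_stack([np.cos(az), np.sin(az)])
    (v_north, v_east), *_ = np.linalg.lstsq(design, los_velocity, rcond=None)
    return v_north, v_east
```

The sinusoid fit needs at least two distinct look directions per location to be well determined; with a single look direction only the line-of-sight projection is constrained.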
Temporal Coverage:  

Start Date  1997-01-01 
End Date  2008-12-31 
Spatial Coverage:  
Latitude  
Southernmost  60 
Northernmost  90 
Longitude  
Westernmost  -180 
Easternmost  180 
Altitude  
Min Altitude  N/A 
Max Altitude  N/A 
Depth  
Min Depth  N/A 
Max Depth  N/A 
Location:  
Location  Ionosphere 
Detailed Location  F-region 
Distribution:  

Distribution Media  Online Internet (HTTP) 
Distribution Size  294 MB 
Distribution Format  netCDF 
Fees  N/A 
Data Storage:  We present 8 NetCDF-formatted datasets, named HybridPlasmaVelocityModesds##.nc, as follows.
File ds01. Contains the location information for the two-dimensional spatial bins used in this analysis. Coordinates are in the Quasi-Dipole reference frame. Contains 4 variables: bin_centroids_colatitude, bin_centroids_longitude, bin_limits_colatitude and bin_limits_longitude. The centroids are the locations of the centre of each bin, and the limits of the bins give the region over which the EOF prediction at the centroid is assumed to apply. There are 559 bins in the northern polar region, which is the area of focus for this analysis. They are ordered approximately by latitude, then longitude, but this does not always apply near the 0/360 degree longitude boundary. The temporal dependence of the Quasi-Dipole coordinates (required to translate them to other coordinate systems) is supplied by the values in file ds02, described below.
File ds02. Contains the temporal dependence for each analysis used in this study. The analyses are each one calendar month in length. This NetCDF file contains 144 variables, each named 'bin_times_YYYYMM', where YYYY is the year of the analysis, and MM is the month of the analysis. Each of the 144 variables is a set of 5-minute average epochs (the rows are sequentially ordered in time), whose 6 columns are in the format [year, month, day, hour, minute, second]. Each timestamp is the centroid epoch of a 5-minute span, within which the existent SuperDARN data were used to compute the median velocity for that epoch (these velocity values were used as input to the EOF modelling process).
File ds03. Contains the temporal means of the binned, 5-min-median SuperDARN velocity data. These temporal means were removed from each monthly binned data set prior to the EOF analysis, and should be added back on for an accurate prediction of the plasma velocity. The file contains two sets of 144 variables, each named 'bin_means_YYYYMM_[component]', where YYYY is the year of the analysis, MM is the month of the analysis, and [component] indicates the cardinal direction component that the values pertain to. These two components are in Quasi-Dipole coordinates, and are either north or east. The units of the values in this file are m/s. Each variable is a vector of 559 rows and 1 column, and the order of the rows is the same as that of the variables in file ds01.
File ds04. Contains the spatial eigenvectors of the 144 EOF analyses for 'model 1', which is tailored to work best in higher coverage data regions. The file contains 1440 sets of two variables, each named 'model_1_eig_s_YYYYMM_modeXX_[component]', where YYYY is the year of the analysis, MM is the month of the analysis, XX is the number of the mode that the eigenvector relates to (where the modes are ranked according to decreasing eigenvalue, with mode 01 corresponding to the largest eigenvalue), and [component] is the same information as given for data set ds03, described above. Each variable has 559 rows and 1 column. The row order is the same as that of the variables in data set ds01. The units of these data are non-physical, but the product of a pair of spatial and temporal eigenvectors has units of m/s.
File ds05. Contains the spatial eigenvectors of the 144 EOF analyses for 'model 2', which is tailored to work best in lower coverage data regions. The file contains 1440 sets of two variables, each named 'model_2_eig_s_YYYYMM_modeXX_[component]', where YYYY is the year of the analysis, MM is the month of the analysis, XX is the number of the mode that the eigenvector relates to (where the modes are ranked according to decreasing eigenvalue, with mode 01 corresponding to the largest eigenvalue), and [component] is the same information as given for data set ds03, described above. Each variable has 559 rows and 1 column. The row order is the same as that of the variables in data set ds01. The units of these data are non-physical, but the product of a pair of spatial and temporal eigenvectors has units of m/s.
File ds06. Contains the temporal eigenvectors of the 144 EOF analyses for 'model 1', which is tailored to work best in higher coverage data regions. The file contains 1440 variables, each named 'model_1_eig_t_YYYYMM_modeXX', where YYYY is the year of the analysis, MM is the month of the analysis, and XX is the number of the mode that the eigenvector relates to (where the modes are ranked according to decreasing eigenvalue, with mode 01 corresponding to the largest eigenvalue). Each variable has 1 column, and the same number (and order) of rows as the corresponding temporal variable in data set ds02. The units of these data are non-physical, but the product of a pair of spatial and temporal eigenvectors has units of m/s.
File ds07. Contains the temporal eigenvectors of the 144 EOF analyses for 'model 2', which is tailored to work best in lower coverage data regions. The file contains 1440 variables, each named 'model_2_eig_t_YYYYMM_modeXX', where YYYY is the year of the analysis, MM is the month of the analysis, and XX is the number of the mode that the eigenvector relates to (where the modes are ranked according to decreasing eigenvalue, with mode 01 corresponding to the largest eigenvalue). Each variable has 1 column, and the same number (and order) of rows as the corresponding temporal variable in data set ds02. The units of these data are non-physical, but the product of a pair of spatial and temporal eigenvectors has units of m/s.
File ds08. Contains the 'model choice' values to pick which of the two models should be chosen in each location for each month, to ensure the best accuracy of the infill solution. The accuracy was determined from which of models 1 or 2 had the higher prediction efficiency in regions where the data and the model infill coexisted for a given month. The prediction efficiencies for a given location were computed for each look direction for a given month, and then summed over all look directions for that location for that month; the model with the higher sum was picked as the best for that location, and this choice is assumed to apply for the whole month. The file contains 144 variables, each named 'hybrid_model_choice_YYYYMM', where YYYY is the year of the analysis, and MM is the month of the analysis. Each variable has 559 rows and 1 column. The row order is the same as that of the variables in data set ds01. The values of the variables are either 1 or 2, indicating that model 1 or 2 (respectively) is the best choice for that location for that month.
The volumes of each of the above files are as follows:
HybridPlasmaVelocityModesds01.nc: 80 kB
HybridPlasmaVelocityModesds02.nc: 58 MB
HybridPlasmaVelocityModesds03.nc: 1.3 MB
HybridPlasmaVelocityModesds04.nc: 13 MB
HybridPlasmaVelocityModesds05.nc: 13 MB
HybridPlasmaVelocityModesds06.nc: 97 MB
HybridPlasmaVelocityModesds07.nc: 97 MB
HybridPlasmaVelocityModesds08.nc: 680 kB
In all files, data null values are NaN. 
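As an illustration of how the eight files fit together, the following sketch rebuilds one month of velocities for one component by summing the products of spatial and temporal eigenvectors, selecting model 1 or 2 per bin, and adding the temporal mean back on. It is a sketch under stated assumptions rather than an official reader: the function names are hypothetical, each argument is assumed to be a plain mapping from variable name to NumPy array (already read from the NetCDF files by whichever reader is preferred), the count of 10 modes per month is inferred from 1440 variables over 144 months, the spatial eigenvectors of ds05 are assumed to follow the 'model_2_' prefix used in ds07, and the exact text of the [component] suffix must be taken from the files themselves.

```python
import numpy as np
from datetime import datetime

N_MODES = 10  # assumption: 1440 variables / 144 monthly analyses


def epochs_to_datetimes(bin_times):
    """Convert ds02 rows of [year, month, day, hour, minute, second]."""
    return [datetime(*(int(v) for v in row)) for row in np.asarray(bin_times)]


def reconstruct(month, component, means, eig_s, eig_t, choice):
    """Return an (epochs x bins) array of velocities in m/s.

    month     : 'YYYYMM' string
    component : suffix exactly as it appears in the variable names
    means     : mapping for ds03
    eig_s     : {1: mapping for ds04, 2: mapping for ds05}
    eig_t     : {1: mapping for ds06, 2: mapping for ds07}
    choice    : mapping for ds08
    """
    best = np.asarray(choice[f"hybrid_model_choice_{month}"], dtype=float).ravel()
    mean = np.asarray(means[f"bin_means_{month}_{component}"], dtype=float).ravel()
    per_model = {}
    for model in (1, 2):
        total = 0.0
        for mode in range(1, N_MODES + 1):
            s_name = f"model_{model}_eig_s_{month}_mode{mode:02d}_{component}"
            t_name = f"model_{model}_eig_t_{month}_mode{mode:02d}"
            s = np.asarray(eig_s[model][s_name], dtype=float).ravel()
            t = np.asarray(eig_t[model][t_name], dtype=float).ravel()
            total = total + np.outer(t, s)  # temporal x spatial pattern, m/s
        per_model[model] = total
    velocity = np.where(best == 1, per_model[1], per_model[2])
    velocity[:, np.isnan(best)] = np.nan  # no model choice recorded for this bin
    return velocity + mean  # add back the temporal mean removed before the EOFs
```

Row i of the result corresponds to row i of the matching 'bin_times_YYYYMM' variable, and column j to bin j of file ds01.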