Surface UV irradiances and doses derived from remotely sensed data acquired with instruments on satellites are important as ground level measurements of these variables are rather sparse, with a much lower spatial resolution than temperature and rainfall. It is frequent to rely on remote sensing when studying effects of solar UV radiation on human, animal and plant health at a large scale. This package supports reading of data from files downloaded from open-access internet servers. These data have very few restrictions to their use, in most cases only the requirement to cite the source.
The main Surface UV data product supported is “Surface UV” from the AC SAF (atmospheric composition) project of EUMETSAT. In addition, “Surface UV” data from the OMI/Aura project hosted at NASA are supported giving access to additional variables. In each case, two file formats are supported. Currently, data are read much faster from HDF5 files than from the other formats.
O3M SAF Offline surface ultraviolet radiation product
(OUV). Data files from the Surface UV offline data product of
AC SAF EUMETSAT can be downloaded from the server of the Finnish
Meteorological Institute (FMI) EUMETSAT AC
SAF website. Two different data “ingestion” (sUV_read_
)
functions cater for two different types of files: HDF5 files with data
on a geographic grid and one file per day, and text (.txt) files with
time series data for a single geographic location. For more information
on these and other related data products, please, see the EUMETSAT AC SAF website.
OMI/Aura Surface UVB Irradiance and Erythemal Dose Daily L3
Global Gridded 1.0 degree x 1.0 degree V3 (OMUVBd). The
OMI/Aura Surface UV offline data are available through NASA and can be
downloaded from the NASA
EARTHDATA website as NetCDF4 files with the possibility of
sub-setting before download. HDF5 files are also available for download
without possibility of sub-setting. In both cases the data are provided
as one file per day on a geographic grid basis. Two different data
“ingestion” (sUV_read_
) functions cater for the two
different types of files. The OMI/Aura Surface UV offline data are
available through NASA and can be downloaded from the NASA
EARTHDATA website.
Note: HDF5 files are binary and contain data and metadata organized hierarchically. NetCDF4 files make use of a subset of the features available in HDF5, and also comply with the HDF5 format. The functions in package ‘surfaceuv’ call functions from package ‘rhdf5’ (part of Bioconductor) to read these files. The wrapper functions provide a simplified user interface based on the known structure of the files used to distribute specific Surface UV radiation data sets.
The AC SAF project of EUMETSAT on atmospheric composition provides several different data products, including “Surface UV”, for ultraviolet radiation doses and irradiances. The AC SAF “Surface UV” data are on a 0.5∘ × 0.5∘ longitude E and latitude N grid. That is to say that latitudes south of the equator and longitudes west of Greenwich are expressed as negative numbers. The data consist in several different variables, both daily doses and daily maximum irradiances, biologically weighted and not weighted, estimate errors for them as well as quality flags.
Data can be downloaded in two different formats, suitable for
different uses. The most efficient way of downloading data for a single
location is to download then as a time series in a text file.
When we intend to download data for a region or the whole Earth gridded
data are the right approach. These are available as binary files in HDF5
format, that are smaller and allow selective reading of variables and
grid regions, allowing fast “ingestion” of the data. This package
provides functions that make it easy to selectively import data in
either format into R data frames. Please, see the online-only
article for step-by-step instructions on how to obtain the data files
from the server. This must be done at a web page as there is no API
available and in addition the files can be downloaded only after a
delay, as preparation is involved processing in the server. However,
multiple gridded files are bundled into a single .zip
file
making downloading them very easy.
The files are text files with a header protected with #
as comment marker and the data are in aligned columns separated by one
space character. The column names are not stored as column headings, but
instead in the header of the file, one variable per row. Thus, decoding
the file header is key to the interpretation of the data, while reading
the data is simple, although setting the correct R classes to the
different variables is also important. The top of the file we will use
in the examples is shown below.
#AC SAF offline surface UV, time-series
#OUV EXTRACTOR VERSION: 1.20
#LONGITUDE: 25.000 (0-based index 410)
#LATITUDE: 60.000 (0-based index 300)
#COLUMN DEFINITIONS
#0: Date [YYYYMMDD]
#1: DailyDoseUva [kJ/m2]
#2: DailyDoseUvb [kJ/m2]
#3: DailyMaxDoseRateUva [mW/m2]
#4: DailyMaxDoseRateUvb [mW/m2]
#5: QC_MISSING
#6: QC_LOW_QUALITY
#7: QC_MEDIUM_QUALITY
#8: QC_INHOMOG_SURFACE
#9: QC_POLAR_NIGHT
#10: QC_LOW_SUN
#11: QC_OUTOFRANGE_INPUT
#12: QC_NO_CLOUD_DATA
#13: QC_POOR_DIURNAL_CLOUDS
#14: QC_THICK_CLOUDS
#15: QC_ALB_CLIM_IN_DYN_REG
#16: QC_LUT_OVERFLOW
#17: QC_OZONE_SOURCE
#18: QC_NUM_AM_COT
#19: QC_NUM_PM_COT
#20: QC_NOON_TO_COT
#21: Algorithm version
#DATA
20240501 1.224e+03 1.558e+01 3.932e+04 6.628e+02 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 2.2
20240502 1.235e+03 1.648e+01 3.951e+04 6.974e+02 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 2.2
20240503 9.368e+02 1.345e+01 3.224e+04 5.871e+02 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 2.2
... etc.
All functions in the package are vectorised for parameter
files
and can read one or more files in a single call to
any of them. However, in our first example we read a single.
We fetch the path to an example file included in the package, originally downloaded from the FMI server. The grid point closest to Viikki, Helsinki, Finland. In normal use this step is unnecessary as the user will already know the folder where the files to be read are located.
one.file.name <-
system.file("extdata", "AC_SAF-Viikki-FI-6masl.txt",
package = "surfaceuv", mustWork = TRUE)
Query functions make it possible to find out the names of the variables contained in a file and the coordinates of location corresponding to the time series data. They are very useful as these depend on what is requested at the time the file was downloaded.
sUV_vars_OUV_txt(one.file.name)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "QC_MISSING"
#> [7] "QC_LOW_QUALITY" "QC_MEDIUM_QUALITY" "QC_INHOMOG_SURFACE"
#> [10] "QC_POLAR_NIGHT" "QC_LOW_SUN" "QC_OUTOFRANGE_INPUT"
#> [13] "QC_NO_CLOUD_DATA" "QC_POOR_DIURNAL_CLOUDS" "QC_THICK_CLOUDS"
#> [16] "QC_ALB_CLIM_IN_DYN_REG" "QC_LUT_OVERFLOW" "QC_OZONE_SOURCE"
#> [19] "QC_NUM_AM_COT" "QC_NUM_PM_COT" "QC_NOON_TO_COT"
#> [22] "Algorithm version"
To skip the quality control (QC
) flags from the listed
variables we can add keep.QC = FALSE
to the call above.
sUV_vars_OUV_txt(one.file.name, keep.QC = FALSE)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "Algorithm version"
The geographic coordinates of the location are returned based on the file header.
The variables included in downloaded files can be chosen when the
request is submitted on-line. The default in function
sUV_read_OUV_txt()
is to read all the variables present in
the file.
summer_viikki.tb <- sUV_read_OUV_txt(one.file.name)
dim(summer_viikki.tb)
#> [1] 153 22
colnames(summer_viikki.tb)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "QC_MISSING"
#> [7] "QC_LOW_QUALITY" "QC_MEDIUM_QUALITY" "QC_INHOMOG_SURFACE"
#> [10] "QC_POLAR_NIGHT" "QC_LOW_SUN" "QC_OUTOFRANGE_INPUT"
#> [13] "QC_NO_CLOUD_DATA" "QC_POOR_DIURNAL_CLOUDS" "QC_THICK_CLOUDS"
#> [16] "QC_ALB_CLIM_IN_DYN_REG" "QC_LUT_OVERFLOW" "QC_OZONE_SOURCE"
#> [19] "QC_NUM_AM_COT" "QC_NUM_PM_COT" "QC_NOON_TO_COT"
#> [22] "Algorithm version"
The returned data frame has 153 rows (= days) and 22 columns
(variables). We can see above that several of the variables have names
starting with “QC” for quality control. It is good to check them, as
even though bad data are set as NA
, some of these flags
also report weaknesses of the estimates that even if not fatal can be
important.
The class of the different columns varies, while dose and irradiance
data are stored as numeric
values, the “QC” variables are
stored in the data frame as integer
values. The dates are
stored in a variable of class Date
and the
Algorithm version
as character
.
str(lapply(summer_viikki.tb, class))
#> List of 22
#> $ Date : chr "Date"
#> $ DailyDoseUva : chr "numeric"
#> $ DailyDoseUvb : chr "numeric"
#> $ DailyMaxDoseRateUva : chr "numeric"
#> $ DailyMaxDoseRateUvb : chr "numeric"
#> $ QC_MISSING : chr "integer"
#> $ QC_LOW_QUALITY : chr "integer"
#> $ QC_MEDIUM_QUALITY : chr "integer"
#> $ QC_INHOMOG_SURFACE : chr "integer"
#> $ QC_POLAR_NIGHT : chr "integer"
#> $ QC_LOW_SUN : chr "integer"
#> $ QC_OUTOFRANGE_INPUT : chr "integer"
#> $ QC_NO_CLOUD_DATA : chr "integer"
#> $ QC_POOR_DIURNAL_CLOUDS: chr "integer"
#> $ QC_THICK_CLOUDS : chr "integer"
#> $ QC_ALB_CLIM_IN_DYN_REG: chr "integer"
#> $ QC_LUT_OVERFLOW : chr "integer"
#> $ QC_OZONE_SOURCE : chr "integer"
#> $ QC_NUM_AM_COT : chr "integer"
#> $ QC_NUM_PM_COT : chr "integer"
#> $ QC_NOON_TO_COT : chr "integer"
#> $ Algorithm version : chr "character"
As bad data values are filled with NA
in the
measured/derived variables, a smaller data frame can be obtained by not
reading the QC
(quality control) variables.
summer_viikki_QCf.tb <-
sUV_read_OUV_txt(one.file.name, keep.QC = FALSE)
dim(summer_viikki_QCf.tb)
#> [1] 153 6
colnames(summer_viikki_QCf.tb)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "Algorithm version"
In some cases we may want to read only specific variables out of the
file. This is possible by passing the names of the variables as an
argument through parameter vars.to.read
. In the example
below we read only two data variables plus Date
that is
always included.
# read two variables
summer_viikki_2.tb <-
sUV_read_OUV_txt(one.file.name,
vars.to.read = c("DailyDoseUva", "DailyDoseUvb"))
dim(summer_viikki_2.tb)
#> [1] 153 3
colnames(summer_viikki_2.tb)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
To read variables based on name matching we can first retrieve the
names of all variables, and select some of them with grep()
before passing them to parameter vars.to.read
as
argument.
# read UV-A and UV-B variables and QC flags
uvb.vars <- grep("uvb$|uva$|^QC_",
sUV_vars_OUV_txt(one.file.name), value = TRUE, ignore.case = TRUE)
summer_viikki_3.tb <-
sUV_read_OUV_txt(one.file.name,
vars.to.read = uvb.vars)
dim(summer_viikki_3.tb)
#> [1] 153 21
colnames(summer_viikki_3.tb)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "QC_MISSING"
#> [7] "QC_LOW_QUALITY" "QC_MEDIUM_QUALITY" "QC_INHOMOG_SURFACE"
#> [10] "QC_POLAR_NIGHT" "QC_LOW_SUN" "QC_OUTOFRANGE_INPUT"
#> [13] "QC_NO_CLOUD_DATA" "QC_POOR_DIURNAL_CLOUDS" "QC_THICK_CLOUDS"
#> [16] "QC_ALB_CLIM_IN_DYN_REG" "QC_LUT_OVERFLOW" "QC_OZONE_SOURCE"
#> [19] "QC_NUM_AM_COT" "QC_NUM_PM_COT" "QC_NOON_TO_COT"
In the case of time-series data one may want to read several files
for the same location, for example for different time periods at the
same location or for a few different locations for the same time period.
This can be achieved by passing to parameter files
a vector
of file names.
One consideration is that when reading multiple time-series files, we
must ensure that the variables we intend to read are present in all the
files. The presence of additional variables in only some files is not a
problem as the files are read selectively, and the name and position of
columns found from the header of each file. We can find out which
variables are present in all the files we intend to read as shown below,
relying on the default behavior of function
sUV_vars_OUV_txt()
. We first locate the folder where the
files are stored and then search for files with names matching a
pattern.
path.to.files <-
system.file("extdata",
package = "surfaceuv", mustWork = TRUE)
two.file.names <-
list.files(path.to.files, pattern = "*masl\\.txt$", full.names = TRUE)
We can find out if the two files contain data for the same location.
sUV_grid_OUV_txt(two.file.names)
#> Longitude Latitude
#> AC_SAF-Salar-Olaroz-AR-3900masl.txt -66.8 -23.5
#> AC_SAF-Viikki-FI-6masl.txt 25.0 60.0
We can ignore the quality control variables to keep the output simpler and easier to understand. A warning is issued because the files contain different sets of variables. By default the returned vector contains the names of the variables that are present in all files.
shared.variables <- sUV_vars_OUV_txt(two.file.names, keep.QC = FALSE)
#> Files contain different variables, applying 'intersect'.
In the case of reading multiple time series for different locations
it is important to include the geographic coordinates in the returned
data frame. The default is to include these coordinates when more than
one file is read in a single call to
sUV_read_OUV_txt()
.
two_locations.tb <-
sUV_read_OUV_txt(two.file.names, vars.to.read = shared.variables, keep.QC = FALSE)
dim(two_locations.tb)
#> [1] 519 8
colnames(two_locations.tb)
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "Algorithm version"
#> [7] "Longitude" "Latitude"
The returned value is a single data frame with the concatenated contents of the files.
We can check the variables present in each file.
vars.ls <- lapply(two.file.names, FUN = sUV_vars_OUV_txt, keep.QC = FALSE)
names(vars.ls) <- basename(two.file.names)
vars.ls
#> $`AC_SAF-Salar-Olaroz-AR-3900masl.txt`
#> [1] "Date" "DailyDosePlant" "DailyDoseUva"
#> [4] "DailyDoseUvb" "DailyMaxDoseRatePlant" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "SolarNoonUvIndex" "Algorithm version"
#>
#> $`AC_SAF-Viikki-FI-6masl.txt`
#> [1] "Date" "DailyDoseUva" "DailyDoseUvb"
#> [4] "DailyMaxDoseRateUva" "DailyMaxDoseRateUvb" "Algorithm version"
Being the returned object an R data frame plotting and other
computations do not differ from the usual ones. One example follows
showing subsetting based on dates. In the time series there are
occasionally days missing data (NA
), and this may need to
be addressed.
We may be interested in computing the total UV-B dose accumulated over the duration of an experiment. There are different ways of doing this computation, here I use base R functions.
subset(summer_viikki_2.tb,
Date >= as.Date("2024-07-15") & Date < as.Date("2024-08-15")) |>
with(sum(DailyDoseUvb))
#> [1] 542.083
The file headers are copied to attribute “file.header”, in the form of a list with one character vector per file read.
attr(two_locations.tb, "file.header")
#> $`AC_SAF-Salar-Olaroz-AR-3900masl.txt`
#> [1] "#AC SAF offline surface UV, time-series"
#> [2] "#OUV EXTRACTOR VERSION: 1.20"
#> [3] "#LONGITUDE: -66.800 (0-based index 226)"
#> [4] "#LATITUDE: -23.500 (0-based index 133)"
#> [5] "#COLUMN DEFINITIONS"
#> [6] "#0: Date [YYYYMMDD]"
#> [7] "#1: DailyDosePlant [kJ/m2]"
#> [8] "#2: DailyDoseUva [kJ/m2]"
#> [9] "#3: DailyDoseUvb [kJ/m2]"
#> [10] "#4: DailyMaxDoseRatePlant [mW/m2]"
#> [11] "#5: DailyMaxDoseRateUva [mW/m2]"
#> [12] "#6: DailyMaxDoseRateUvb [mW/m2]"
#> [13] "#7: SolarNoonUvIndex [-]"
#> [14] "#8: QC_MISSING"
#> [15] "#9: QC_LOW_QUALITY"
#> [16] "#10: QC_MEDIUM_QUALITY"
#> [17] "#11: QC_INHOMOG_SURFACE"
#> [18] "#12: QC_POLAR_NIGHT"
#> [19] "#13: QC_LOW_SUN"
#> [20] "#14: QC_OUTOFRANGE_INPUT"
#> [21] "#15: QC_NO_CLOUD_DATA"
#> [22] "#16: QC_POOR_DIURNAL_CLOUDS"
#> [23] "#17: QC_THICK_CLOUDS"
#> [24] "#18: QC_ALB_CLIM_IN_DYN_REG"
#> [25] "#19: QC_LUT_OVERFLOW"
#> [26] "#20: QC_OZONE_SOURCE"
#> [27] "#21: QC_NUM_AM_COT"
#> [28] "#22: QC_NUM_PM_COT"
#> [29] "#23: QC_NOON_TO_COT"
#> [30] "#24: Algorithm version"
#> [31] "#DATA"
#>
#> $`AC_SAF-Viikki-FI-6masl.txt`
#> [1] "#AC SAF offline surface UV, time-series"
#> [2] "#OUV EXTRACTOR VERSION: 1.20"
#> [3] "#LONGITUDE: 25.000 (0-based index 410)"
#> [4] "#LATITUDE: 60.000 (0-based index 300)"
#> [5] "#COLUMN DEFINITIONS"
#> [6] "#0: Date [YYYYMMDD]"
#> [7] "#1: DailyDoseUva [kJ/m2]"
#> [8] "#2: DailyDoseUvb [kJ/m2]"
#> [9] "#3: DailyMaxDoseRateUva [mW/m2]"
#> [10] "#4: DailyMaxDoseRateUvb [mW/m2]"
#> [11] "#5: QC_MISSING"
#> [12] "#6: QC_LOW_QUALITY"
#> [13] "#7: QC_MEDIUM_QUALITY"
#> [14] "#8: QC_INHOMOG_SURFACE"
#> [15] "#9: QC_POLAR_NIGHT"
#> [16] "#10: QC_LOW_SUN"
#> [17] "#11: QC_OUTOFRANGE_INPUT"
#> [18] "#12: QC_NO_CLOUD_DATA"
#> [19] "#13: QC_POOR_DIURNAL_CLOUDS"
#> [20] "#14: QC_THICK_CLOUDS"
#> [21] "#15: QC_ALB_CLIM_IN_DYN_REG"
#> [22] "#16: QC_LUT_OVERFLOW"
#> [23] "#17: QC_OZONE_SOURCE"
#> [24] "#18: QC_NUM_AM_COT"
#> [25] "#19: QC_NUM_PM_COT"
#> [26] "#20: QC_NOON_TO_COT"
#> [27] "#21: Algorithm version"
#> [28] "#DATA"
Worldwide coverage consists in 720 × 360 = 2.592 × 105 grid points. As for time series, the number of data columns varies. However, one difference is that QC information is collected into a single variable. The format of the files is HDF5, which are binary files that allow selective reading. There are additional optimizations used to reduce the size, the main one is that the geographic coordinates of the grid points are not saved explicitly but instead the information needed to compute them is included as metadata. The data are provided as one file per day, with the size of the files depending on the number of grid points included as well as the number of variables. As these are off-line data available with a delay, in most cases we are interested in data for a certain period of time.
The HDF5 files have a specific format and content organization that
makes it possible to efficiently read subsets of the data they contain.
The functions in package ‘surfaceuv’ call functions from package ‘rhdf5’
to access these files. The column names are stored as metadata and can
be queried without reading the whole file. Thus, decoding is simpler
than for the time series files in text format. Reading the data is also
simple as it is stored as numeric values no requiring interpretation.
The dates, in contrast, need to be decoded from the file names, making
it crucial that if users rename the files they preserve the date string
and the _
at its boundary. The HDF5 files are contained in
a .zip
archive. The .zip
archives can be
freely renamed if desired.
As HDF5 gridded data files contain data for a single day, frequently we need to read several of them concatenating the data they contain. Anyway, in the first example we read a single file for simplicity.
As above for the time series file, we fetch the path to an example file included in the package, originally downloaded from the FMI server. It covers the whole of the Iberian peninsula and the Balearic islands. Only variables for UV-B and UV-A daily dose and daily maximum irradiances, not biologically weighted, were requested from the server. In normal use this step is unnecessary as the user will already know the folder where the file to be read is located.
one.file.name <-
system.file("extdata", "O3MOUV_L3_20240621_v02p02.HDF5",
package = "surfaceuv", mustWork = TRUE)
Two query functions make it possible to find out the names of the variables contained in a file and the coordinates of the grid.
sUV_vars_OUV_hdf5(one.file.name)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "QualityFlags"
By default only the boundaries of the grid are returned.
With defaults all variables are read, and because the data can
include multiple geographic grid points, Longitude
and
Latitude
are always returned in the data frame.
midsummer_spain.tb <- sUV_read_OUV_hdf5(one.file.name)
dim(midsummer_spain.tb)
#> [1] 221 8
colnames(midsummer_spain.tb)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "QualityFlags"
Variable names are consistent between the data frames returned by
sUV_read_OUV_hdf5()
and sUV_read_OUV_txt()
,
but the position of the columns, can vary. Use names rather than
numeric positional indices to extract columns!
str(lapply(midsummer_spain.tb, class))
#> List of 8
#> $ Date : chr "Date"
#> $ Longitude : chr "numeric"
#> $ Latitude : chr "numeric"
#> $ DailyDoseUva : chr "numeric"
#> $ DailyDoseUvb : chr "numeric"
#> $ DailyMaxDoseRateUva: chr "numeric"
#> $ DailyMaxDoseRateUvb: chr "numeric"
#> $ QualityFlags : chr [1:2] "AsIs" "integer64"
Quality control information is encoded differently in the two types
of downloaded files. As seen above, in .txt
individual QC
variables, taking as values single-digit integer
values are
present. In the .HDF5
files the flags are collapsed into a
single 64 bit variable, that needs decoding. As R does not support 64
bit integers, the package returns the QualityFlags
variable
as an columns of class integre64
, a class defined in
package ‘bit64’.
We can as before read only specific variables if needed by passing
their names as argument to vars.to.read
.
We can read multiple files, with a limit to their maximum number imposed by the available computer RAM as data frames as used reside in RAM during computations. The amount of RAM required varies with the geographic area covered and number of variables read. In practice, this limit is unlikely to be a problem only with data with world-wide or continental coverage. Running time increases linearly with the number of files and roughly proportionally with the number of variables read, at least for large numbers of files with global coverage.
We fetch the paths to the example files included in the package. In
normal use, this step is not needed as the user will know the paths to
the files to read, or will use function list.files()
with a
search pattern if he/she knows the folder where the files to be read
reside. In normal use this step is unnecessary as the user will
already know the folder where the file to be read is located.
five.file.names <-
system.file("extdata",
c("O3MOUV_L3_20240620_v02p02.HDF5",
"O3MOUV_L3_20240621_v02p02.HDF5",
"O3MOUV_L3_20240622_v02p02.HDF5",
"O3MOUV_L3_20240623_v02p02.HDF5",
"O3MOUV_L3_20240624_v02p02.HDF5"),
package = "surfaceuv", mustWork = TRUE)
The only difference to the case of reading a single file is in the
length of the character vector containing file names. The different
files read in the same call to sUV_read_OUV_hdf5()
should
share identical grids and contain all the variables to be read (by
default all those in the first file read). If this is not the case,
currently sUV_read_OUV_hdf5()
should be used to read them
individually and later combined, which is a slower approach.
summer_5days_spain.tb <- sUV_read_OUV_hdf5(five.file.names)
dim(summer_5days_spain.tb)
#> [1] 1105 8
colnames(summer_5days_spain.tb)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "QualityFlags"
We can silence progress reporting.
As shown above for text files, query functions can be used to extract information about the files, in this case without reading them in whole.
sUV_vars_OUV_hdf5(five.file.names)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb" "QualityFlags"
sUV_vars_OUV_hdf5(five.file.names, keep.QC = FALSE)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
sUV_grid_OUV_hdf5(five.file.names, expand = TRUE) |>
head(10)
#> Longitude Latitude
#> 1 -10.75 35.25
#> 2 -10.25 35.25
#> 3 -9.75 35.25
#> 4 -9.25 35.25
#> 5 -8.75 35.25
#> 6 -8.25 35.25
#> 7 -7.75 35.25
#> 8 -7.25 35.25
#> 9 -6.75 35.25
#> 10 -6.25 35.25
sUV_date_OUV_hdf5(five.file.names)
#> O3MOUV_L3_20240620_v02p02.HDF5 O3MOUV_L3_20240621_v02p02.HDF5
#> "2024-06-20" "2024-06-21"
#> O3MOUV_L3_20240622_v02p02.HDF5 O3MOUV_L3_20240623_v02p02.HDF5
#> "2024-06-22" "2024-06-23"
#> O3MOUV_L3_20240624_v02p02.HDF5
#> "2024-06-24"
sUV_date_OUV_hdf5(five.file.names, use.names = FALSE)
#> [1] "2024-06-20" "2024-06-21" "2024-06-22" "2024-06-23" "2024-06-24"
If the files differ in the variables they contain, or we want to only
extract some, as shown above, we can select them with
grep()
or some other function once we have a list of their
names. (Of course, we can also simply pass the names of the variables as
an inline character vector in the call, as shown above for time-series
data files.)
all.vars <- sUV_vars_OUV_hdf5(five.file.names)
daily.dose.vars <- grep("^DailyDose", all.vars, value = TRUE)
doses_5days_spain.tb <-
sUV_read_OUV_hdf5(five.file.names, vars.to.read = daily.dose.vars)
dim(doses_5days_spain.tb)
#> [1] 1105 5
colnames(doses_5days_spain.tb)
#> [1] "Date" "Longitude" "Latitude" "DailyDoseUva" "DailyDoseUvb"
As memory is pre-allocated the set of variables to be read is decided
when or before the first file is read. If no argument is passed to
vars.to.read
, the default becomes the set of variables
found in the first file read. In contrast, if an argument is passed, it
determines the set of variables to read. If any variable in this set is
not found in a file, currently, an error is triggered.
We can use
r
sUV_vars_OUV_hdf5()to obtain a vector with the names of all the variables that are present consistently across all the files. This is the case with the default argument
set.oper
= “intersect”. Passing
“union”` lists all the variables
found in at least one of the files.
shared.variables <- sUV_vars_OUV_hdf5(gridded.files, keep.QC = FALSE)
#> Files contain different variables, applying 'intersect'.
shared.variables
#> [1] "Date" "Longitude" "Latitude" "DailyDoseUva" "DailyDoseUvb"
We can check the variables present in each file.
vars.ls <- lapply(gridded.files, FUN = sUV_vars_OUV_hdf5, keep.QC = FALSE)
names(vars.ls) <- basename(gridded.files)
vars.ls
#> $O3MOUV_L3_20240620_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
#>
#> $O3MOUV_L3_20240621_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
#>
#> $O3MOUV_L3_20240622_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
#>
#> $O3MOUV_L3_20240623_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
#>
#> $O3MOUV_L3_20240624_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude"
#> [4] "DailyDoseUva" "DailyDoseUvb" "DailyMaxDoseRateUva"
#> [7] "DailyMaxDoseRateUvb"
#>
#> $O3MOUV_L3_20241021_v02p02.HDF5
#> [1] "Date" "Longitude" "Latitude" "DailyDoseDna"
#> [5] "DailyDoseEry" "DailyDosePlant" "DailyDoseUva" "DailyDoseUvb"
#> [9] "DailyDoseVitd"
By default progress is reported when the function is called interactively. Thus, within the vignette they are not shown by default. We can force its display.
six.days.tb <-
sUV_read_OUV_hdf5(gridded.files,
vars.to.read = shared.variables,
keep.QC = FALSE,
verbose = TRUE)
#> Reading: O3MOUV_L3_20240620_v02p02.HDF5
#> Reading: O3MOUV_L3_20240621_v02p02.HDF5
#> Reading: O3MOUV_L3_20240622_v02p02.HDF5
#> Reading: O3MOUV_L3_20240623_v02p02.HDF5
#> Reading: O3MOUV_L3_20240624_v02p02.HDF5
#> Reading: O3MOUV_L3_20241021_v02p02.HDF5
#> Read 6 OUV grid-based HDF5 file(s) into a 56.6 kB data frame [1326 rows x 5 cols] in 0.079 secs
As you surely expect by now, the returned object is a data frame.
dim(six.days.tb)
#> [1] 1326 5
colnames(six.days.tb)
#> [1] "Date" "Longitude" "Latitude" "DailyDoseUva" "DailyDoseUvb"
head(six.days.tb)
#> Date Longitude Latitude DailyDoseUva DailyDoseUvb
#> 1 2024-06-20 -10.75 35.25 1161.042 27.79345
#> 2 2024-06-20 -10.25 35.25 1377.761 31.70621
#> 3 2024-06-20 -9.75 35.25 1451.853 33.52253
#> 4 2024-06-20 -9.25 35.25 1571.595 35.90476
#> 5 2024-06-20 -8.75 35.25 1543.349 34.08216
#> 6 2024-06-20 -8.25 35.25 1558.068 34.33302
As shown above, being the returned object an R data frame plotting and other computations do not differ from the usual ones.
Currently only a trace of the origin of the data is preserved in the
data frame object’s comment
attribute.
comment(six.days.tb)
#> [1] "Data imported from 6 HDF5 files: /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20240620_v02p02.HDF5, /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20240621_v02p02.HDF5, /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20240622_v02p02.HDF5, /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20240623_v02p02.HDF5, /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20240624_v02p02.HDF5, /tmp/RtmpngnTMQ/Rinst97d1ff57957/surfaceuv/extdata/O3MOUV_L3_20241021_v02p02.HDF5."
The individual variables also have metadata, including units of expression, as attributes. These are copied from the attributes in the first file read, which is safe, as the downloaded data are always consistent in this respect.
Kujanpää, J. (2019) PRODUCT USER MANUAL Offline UV Products v2 (IDs: O3M-450 - O3M-464) and Data Record R1 (IDs: O3M-138 - O3M-152). Ref. SAF/AC/FMI/PUM/001. 18 pp. EUMETSAT AC SAF.
The OMI/Aura Surface UVB Irradiance and Erythemal Dose Daily L3 Global Gridded 1.0 degree x 1.0 degree V3 (OMUVBd) data can be accessed as well as the documentation through NASA’s EARTHDATA GES DISC webpage. Compared to the AC SAF Surface UV data, these data are based on an older algorithm and provide lower spatial resolution. The OMI/Aura “Surface UV” data are on a 1∘ × 1∘ longitude E and latitude N grid. However, although this data set lacks some very useful variables, it includes solar-noon-time irradiances for specific UV wavelengths and cloud depth estimates not included in the AC SAF data.
Data can be downloaded in different formats, but always on a grid and
one file per day: they are not available as time series. The original
HDF5 files are available for download only with global coverage. Subsets
of data can be downloaded in NetCDF4 formatted files or as text files.
The files to be downloaded are not bundled into .zip
files
but instead have to be downloaded individually. Individual files can be
downloaded through the web page but this is tedious when data for many
days are needed. It possible to download a list of links, and then use
wget
or curl
to download by batch. However,
this requires setting up the computer following the instructions
provided at the NASA EARTHDATA website.
This package provides functions that make it easy to selectively import data in both NetCDF4 and HDF5 formats into R data frames.
The functions to be used work similarly to those described above. The examples below are based on those above.
Worldwide coverage consists in 360 × 180 = 6.48 × 104 grid points. The actual number of grid points and data columns varies depending on what is selected for download. The format of the files is NetCDF4, which are binary files that allow selective reading. The data are provided as one file per day, with the size of the files depending on the number of grid points included as well as the number of variables. The data are available for download without delay. In most cases we are interested in data for a certain period of time rather than for a single day.
As NetCDF4 gridded data files contain data for a single day, frequently we need to read several of them concatenating the data they contain. Anyway, in the first example we read a single file for simplicity.
As above for HDF5 files, we fetch the path to an example file included in the package, originally downloaded from the NASA EARTHDATA server. These data cover a small region that includes Helsinki, Finland. A selection of variables were requested from the server, including several not available in the AC SAF Surface UV data product. In normal use this step is unnecessary as the user will already know the folder where the file to be read is located. (The file name is shortened compared to the original name.)
one.file.name <-
system.file("extdata", "OMI-Aura_L3-OMUVBd_2023m1001_v003.nc4",
package = "surfaceuv", mustWork = TRUE)
Two query functions make it possible to find out the names of the variables contained in a file and the coordinates of the grid.
sUV_vars_OMUVBd_nc4(one.file.name)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "CloudOpticalThickness" "ErythemalDailyDose" "ErythemalDoseRate"
#> [7] "Irradiance305" "Irradiance310" "Irradiance324"
#> [10] "Irradiance380" "UVindex"
By default only the boundaries of the grid are returned.
With defaults all variables are read, and because the data can
include multiple geographic grid points, Longitude
and
Latitude
are always returned in the data frame.
autumn_helsinki.tb <- sUV_read_OMUVBd_nc4(one.file.name)
dim(autumn_helsinki.tb)
#> [1] 9 11
colnames(autumn_helsinki.tb)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "CloudOpticalThickness" "ErythemalDailyDose" "ErythemalDoseRate"
#> [7] "Irradiance305" "Irradiance310" "Irradiance324"
#> [10] "Irradiance380" "UVindex"
Grid and date variables names are consistent between the data frames
returned by sUV_read_OUV_hdf5()
,
sUV_read_OUV_txt()
and sUV_read_OMUVBd_nc4()
.
For other variables the original names are preserved and may differ. The
position of columns may vary between functions and also between versions
of the package. Use names rather than numeric positional indices to
extract columns!
str(lapply(autumn_helsinki.tb, class))
#> List of 11
#> $ Date : chr "Date"
#> $ Longitude : chr "numeric"
#> $ Latitude : chr "numeric"
#> $ CloudOpticalThickness: chr "numeric"
#> $ ErythemalDailyDose : chr "numeric"
#> $ ErythemalDoseRate : chr "numeric"
#> $ Irradiance305 : chr "numeric"
#> $ Irradiance310 : chr "numeric"
#> $ Irradiance324 : chr "numeric"
#> $ Irradiance380 : chr "numeric"
#> $ UVindex : chr "numeric"
We can as before read only specific variables if needed by passing
their names as argument to vars.to.read
.
We can read multiple files, with a limit to their maximum number imposed by the available computer RAM as data frames as used reside in RAM during computations. The amount of RAM required varies with the geographic area covered and number of variables read. In practice, this limit is unlikely to be a problem only with data with world-wide or continental coverage.
We fetch the paths to the example files included in the package. In
normal use, this step is not needed as the user will know the paths to
the files to read, or will use function list.files()
with a
search pattern if he/she knows the folder where the files to be read
reside. In normal use this step is unnecessary as the user will
already know the folder where the file to be read is located.
path.to.files <-
system.file("extdata",
package = "surfaceuv", mustWork = TRUE)
three.file.names <- list.files(path.to.files, pattern = "\\.nc4$", full.names = TRUE)
The only difference to the case of reading a single file is in the
length of the character vector containing file names. The different
files read in the same call to sUV_read_OMUVBd_nc4()
should
share identical grids and contain all the variables to be read (by
default all those in the first file read). If this is not the case,
currently sUV_read_OMUVBd_nc4()
should be used to read them
individually and later combined, which is a slower approach.
three.days.helsinki.tb <- sUV_read_OMUVBd_nc4(three.file.names)
dim(three.days.helsinki.tb)
#> [1] 27 11
colnames(three.days.helsinki.tb)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "CloudOpticalThickness" "ErythemalDailyDose" "ErythemalDoseRate"
#> [7] "Irradiance305" "Irradiance310" "Irradiance324"
#> [10] "Irradiance380" "UVindex"
We can silence progress reporting.
As shown above for text files, query functions can be used to extract information about the files, in this case without reading them in whole.
sUV_vars_OMUVBd_nc4(three.file.names)
#> [1] "Date" "Longitude" "Latitude"
#> [4] "CloudOpticalThickness" "ErythemalDailyDose" "ErythemalDoseRate"
#> [7] "Irradiance305" "Irradiance310" "Irradiance324"
#> [10] "Irradiance380" "UVindex"
sUV_grid_OMUVBd_nc4(three.file.names, expand = TRUE)
#> Longitude Latitude
#> 1 24.5 58.5
#> 2 25.5 58.5
#> 3 26.5 58.5
#> 4 24.5 59.5
#> 5 25.5 59.5
#> 6 26.5 59.5
#> 7 24.5 60.5
#> 8 25.5 60.5
#> 9 26.5 60.5
sUV_date_OMUVBd_nc4(three.file.names)
#> OMI-Aura_L3-OMUVBd_2023m1001_v003.nc4 OMI-Aura_L3-OMUVBd_2023m1002_v003.nc4
#> "2023-10-01" "2023-10-02"
#> OMI-Aura_L3-OMUVBd_2023m1003_v003.nc4
#> "2023-10-03"
sUV_date_OMUVBd_nc4(three.file.names, use.names = FALSE)
#> [1] "2023-10-01" "2023-10-02" "2023-10-03"
If the files differ in the variables they contain, or we want to only
extract some, as shown above, we can select them with
grep()
or some other function once we have a list of their
names. (Of course, we can also simply pass the names of the variables as
an in-line character vector in the call.)
all.vars <- sUV_vars_OMUVBd_nc4(three.file.names)
daily.dose.vars <- grep("^Irradiance", all.vars, value = TRUE)
irardiances_helsinki.tb <-
sUV_read_OMUVBd_nc4(three.file.names, vars.to.read = daily.dose.vars)
dim(irardiances_helsinki.tb)
#> [1] 27 7
colnames(irardiances_helsinki.tb)
#> [1] "Date" "Longitude" "Latitude" "Irradiance305"
#> [5] "Irradiance310" "Irradiance324" "Irradiance380"
Worldwide coverage consists in 360 × 180 = 6.48 × 104 grid points. The actual number of grid points is always the same as there is no possibility to subset the files before download. The format of the files is HDF5, which are binary files that allow selective reading. The data are provided as one file per day, with the size of the files depending on the number of variables. The data are available for download without delay. In most cases we are interested in data for a certain period of time rather than for a single day.
As HDF5 gridded data files contain data for a single day, frequently we need to read several of them concatenating the data they contain. Anyway, in the first example we read a single file for simplicity.
As above for other files, we fetch the path to an example file included in the package, originally downloaded from the NASA EARTHDATA server. These data have global coverage and all variables. In normal use this step is unnecessary as the user will already know the folder where the file to be read is located.
path.to.files <-
system.file("extdata",
package = "surfaceuv", mustWork = TRUE)
one.file.name <-
list.files(path.to.files, pattern = "\\.he5$", full.names = TRUE)
Two query functions make it possible to find out the names of the variables contained in a file and the coordinates of the grid.
sUV_vars_OMUVBd_he5(one.file.name)
#> [1] "Date" "Longitude"
#> [3] "Latitude" "CSErythemalDailyDose"
#> [5] "CSErythemalDoseRate" "CSIrradiance305"
#> [7] "CSIrradiance310" "CSIrradiance324"
#> [9] "CSIrradiance380" "CSUVindex"
#> [11] "CloudOpticalThickness" "ErythemalDailyDose"
#> [13] "ErythemalDoseRate" "Irradiance305"
#> [15] "Irradiance310" "Irradiance324"
#> [17] "Irradiance380" "LambertianEquivalentReflectivity"
#> [19] "SolarZenithAngle" "UVindex"
#> [21] "ViewingZenithAngle"
By default only the boundaries of the grid are returned.
With defaults all variables are read, and because the data can
include multiple geographic grid points, Longitude
and
Latitude
are always returned in the data frame.
global_one_day.tb <- sUV_read_OMUVBd_he5(one.file.name)
dim(global_one_day.tb)
#> [1] 64800 21
colnames(global_one_day.tb)
#> [1] "Date" "Longitude"
#> [3] "Latitude" "CSErythemalDailyDose"
#> [5] "CSErythemalDoseRate" "CSIrradiance305"
#> [7] "CSIrradiance310" "CSIrradiance324"
#> [9] "CSIrradiance380" "CSUVindex"
#> [11] "CloudOpticalThickness" "ErythemalDailyDose"
#> [13] "ErythemalDoseRate" "Irradiance305"
#> [15] "Irradiance310" "Irradiance324"
#> [17] "Irradiance380" "LambertianEquivalentReflectivity"
#> [19] "SolarZenithAngle" "UVindex"
#> [21] "ViewingZenithAngle"
Grid and date variables names are consistent between the data frames
returned by sUV_read_OUV_hdf5()
,
sUV_read_OUV_txt()
and sUV_read_OMUVBd_nc4()
.
For other variables the original names are preserved and may differ. The
position of columns may vary between functions and also between versions
of the package. Use names rather than numeric positional indices to
extract columns!
str(lapply(global_one_day.tb, class))
#> List of 21
#> $ Date : chr "Date"
#> $ Longitude : chr "numeric"
#> $ Latitude : chr "numeric"
#> $ CSErythemalDailyDose : chr "numeric"
#> $ CSErythemalDoseRate : chr "numeric"
#> $ CSIrradiance305 : chr "numeric"
#> $ CSIrradiance310 : chr "numeric"
#> $ CSIrradiance324 : chr "numeric"
#> $ CSIrradiance380 : chr "numeric"
#> $ CSUVindex : chr "numeric"
#> $ CloudOpticalThickness : chr "numeric"
#> $ ErythemalDailyDose : chr "numeric"
#> $ ErythemalDoseRate : chr "numeric"
#> $ Irradiance305 : chr "numeric"
#> $ Irradiance310 : chr "numeric"
#> $ Irradiance324 : chr "numeric"
#> $ Irradiance380 : chr "numeric"
#> $ LambertianEquivalentReflectivity: chr "numeric"
#> $ SolarZenithAngle : chr "numeric"
#> $ UVindex : chr "numeric"
#> $ ViewingZenithAngle : chr "numeric"
We can as before read only specific variables if needed by passing
their names as argument to vars.to.read
.
As for other functions in the package,
sUV_read_OMUVBd_he5()
accepts a character
vector of file names to read and returns a single concatenated data
frame.
Jari Hovila, Antti Arola, and Johanna Tamminen (2013), OMI/Aura Surface UVB Irradiance and Erythemal Dose Daily L3 Global Gridded 1.0 degree x 1.0 degree V3, NASA Goddard Space Flight Center, Goddard Earth Sciences Data and Information Services Center (GES DISC).