Model Climate Data

Author: Graziano Giuliani <ggiulian@ictp.it>

Model climate datasets, starting from the Coupled Model Intercomparison Project Phase 5 (CMIP5), are expected to be in netCDF format with a well-defined convention on metadata, attributes and naming to allow easy intercomparison of the results. This is because, over time, the number of tools and interfaces for netCDF data has outnumbered what is available for any other data format.

In particular, for netCDF formatted data, simple command line tools allow the user to operate on the data at the file level, creating even complex processing chains without the need to write the I/O part of the task.

The tools mostly follow this usage pattern:

toolname [options] operator input-file.nc [output-file.nc]

By combining a series of operations with temporary intermediate files, and even mixing different tools, the user can produce new information from the data without actually being a skilled programmer.

Although the netCDF library contains very basic programs to handle the conversion between ASCII/XML and netCDF and the format change between all the possible internal netCDF formats, the most valuable sets of tools come from independent third-party projects.

Examples of those are the netCDF Operators (NCO)

http://nco.sourceforge.net

and the Climate Data Operators (CDO)

https://code.zmaw.de/projects/cdo

The first project has produced a number of small tools with a common philosophy to perform simple operations on netCDF files, both at the level of attributes and data, plus a Swiss-army-knife scriptable processor. The second project is a stand-alone monolithic program that can execute a list of operations in an internal pipeline from the input to the output file.

Exercises on netCDF, CDO and NCO

We will now examine the capabilities offered by the netCDF library programs, the NCO and the CDO.

netcdf programs

The netCDF library provides the API user with three programs:

  1. ncdump, which reads the binary format and writes the file content in ASCII format (either as XML or in CDL, the netCDF text representation)
  2. ncgen, which creates from the CDL ASCII representation either the binary file or the C/Fortran/Java source code that creates the same file
  3. nccopy, which can convert a netCDF data file into any other internal format supported by the library, optionally adding filters or restructuring the data (chunking) to make different access patterns faster.

Simple example:

  1. Create a simple netCDF file starting with a text file.
netcdf simple{
 dimensions:
   i = 2 ;
   j = 2;
 variables:
   float var(j,i) ;
     var:comment = "this is a text comment " ;

     :file_description = "This is a test file" ;
 data:
  var = 1, 2, 3, 4 ;
}

We save the above text in a file called simple.cdl. We can check the syntax with:

ncgen simple.cdl
echo $?
0
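As a minimal sketch, the CDL text can also be written to simple.cdl with a shell here-document and checked in one go. The guard around ncgen is an addition for illustration only: it skips the check when the program is not installed.

```shell
# Write the CDL description shown above to simple.cdl.
cat > simple.cdl <<'EOF'
netcdf simple {
 dimensions:
   i = 2 ;
   j = 2 ;
 variables:
   float var(j,i) ;
     var:comment = "this is a text comment " ;

     :file_description = "This is a test file" ;
 data:
  var = 1, 2, 3, 4 ;
}
EOF

# Run the syntax check only if ncgen is available on this machine.
if command -v ncgen >/dev/null 2>&1; then
    if ncgen simple.cdl; then
        echo "simple.cdl: syntax OK"
    else
        echo "simple.cdl: syntax errors" >&2
    fi
else
    echo "ncgen not found, skipping the syntax check" >&2
fi
```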

Creating a binary file is easy:

ncgen -b simple.cdl

This creates the simple.nc file in the current directory. The C code that creates the same file can be generated just as easily:

ncgen -lc simple.cdl

The netCDF binary file structure can then be dumped to XML format with:

ncdump -x simple.nc

while we can extract data values with:

ncdump -fc simple.nc

with C notation for the position of each value in the data matrix. Information about the internal file format can be obtained with:

ncdump -k simple.nc

We next modify the internal file type to the new netCDF4 format using

nccopy -k 4 simple.nc simple_nc4.nc

and test the new format by

ncdump -k simple_nc4.nc

The new format adds new “hidden” attributes that can be seen with the -s option:

ncdump -s simple_nc4.nc

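One simple use of nccopy is to compress the data with zlib deflation (the algorithm used by GNU zip), here with byte shuffling enabled (-s) and the maximum deflation level (-d 9):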

nccopy -k 4 -s -d 9 simple.nc simple_nc4.nc

This usually reduces the data storage requirements by about one third.
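To check what the compression actually buys on a given file, the two file sizes can be compared. The sketch below is only an illustration: the helper names size_of and saving_percent are made up for this example, and the result is a whole percentage obtained with integer arithmetic.

```shell
# Size of a file in bytes (hypothetical helper).
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Percentage of storage saved going from file $1 to file $2 (hypothetical helper).
saving_percent() {
    before=$(size_of "$1")
    after=$(size_of "$2")
    echo $(( (before - after) * 100 / before ))
}

# Report the saving only if both files from the nccopy step are present.
if [ -f simple.nc ] && [ -f simple_nc4.nc ]; then
    echo "saved: $(saving_percent simple.nc simple_nc4.nc)%"
fi
```

For a file as small as simple.nc the netCDF4 copy may well come out larger than the original, because the format overhead dominates; the saving becomes visible on real model output.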

NCO basic tutorial

The text below is taken from http://ccr.aos.wisc.edu/climate_modeling/modeling/data_processing/nco/index.php and slightly adapted for us. It is still to be reviewed.

What is NCO and why should I use it?

NCO is a group of programs that allow you to manipulate netCDF files. You can edit file attributes, rename variables, extract variables for certain dimensions (such as a given time slice or geographical area), and create ensembles and averages. You can do many of the same things with NCL, but NCO is much easier to use. The commands are given at the Unix command prompt (in our examples, plum-%), or they can be strung together in a shell script, depending on the amount of manipulation that needs to be done. The official documentation for NCO can be found at http://nco.sourceforge.net. The examples below use some CCR files.

How to use NCO and these examples: first, make a backup copy of any files you wish to process. NCO will usually give you an error message if something goes wrong, but it is always best to have a copy of your unaltered data.

NCEA

Ncea creates ensemble averages. It is generally used when you have several years of data and you want to create an ensemble, or average, of those years (i.e., going from individual year files to “mone” files).

Example: you have five files, each of which has twelve months of data in it (Jan-Dec). You want to create an ensemble (average) of these five years.

If the files are:

ha.F1.PBC_TPBOX.00510.nc
ha.F1.PBC_TPBOX.00511.nc
ha.F1.PBC_TPBOX.00512.nc
ha.F1.PBC_TPBOX.00513.nc
ha.F1.PBC_TPBOX.00514.nc

Then the command would be:

plum-% ncea -n 5,5,1 ha.F1.PBC_TPBOX.00510.nc ha.F1.PBC_TPBOX.510_514.nc

where

  • 5 is the number of files in the ensemble
  • 5 is the digit_number, the fixed number of numeric digits comprising the numeric_suffix (00510, 00511, 00512, etc.)
  • 1 is the numeric_increment, the constant, integer-valued difference between the numeric_suffix of any two consecutive files
  • ha.F1.PBC_TPBOX.00510.nc is the first input file in the series
  • ha.F1.PBC_TPBOX.510_514.nc is the output file name

NCEA can also be used to create seasonal files from monthly files. For example, if you have three files representing March (moni03.nc), April (moni04.nc), and May (moni05.nc), you can create a file that is the average of those three files: a spring average (March-April-May).

The command would be:

plum-% ncea -n 3,2,1 moni03.nc mami.nc

As before, 3 is the number of files, 2 is the number of digits in the file name, and 1 is the increment.
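To make the meaning of the three numbers concrete, the sketch below prints the file names that a given -n triple stands for. It is a plain shell emulation written for this tutorial, not NCO code: the function name expand_n is hypothetical, and it assumes that the file names end in .nc and that the numeric suffix sits immediately before that extension, preceded by at least one other character.

```shell
# Print the input file names implied by "-n count,digits,increment first-file".
expand_n() {
    count=$1
    digits=$2
    incr=$3
    first=$4

    stem=${first%.nc}                    # file name without the .nc extension
    prefix_len=$(( ${#stem} - digits ))  # length of the part before the number
    prefix=$(printf '%s' "$stem" | cut -c "1-$prefix_len")
    start=$(printf '%s' "$stem" | cut -c "$((prefix_len + 1))-")

    # Drop leading zeros so the shell does not read the number as octal.
    start=$(printf '%s' "$start" | sed 's/^0*//')
    start=${start:-0}

    i=0
    while [ "$i" -lt "$count" ]; do
        n=$(( start + i * incr ))
        printf "%s%0${digits}d.nc\n" "$prefix" "$n"
        i=$(( i + 1 ))
    done
}

expand_n 3 2 1 moni03.nc
```

The call above prints moni03.nc, moni04.nc and moni05.nc, one per line; expand_n 5 5 1 ha.F1.PBC_TPBOX.00510.nc lists the five yearly files of the first example.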

NCRCAT

Ncrcat concatenates (glues together) files across the record dimension, which is usually time. It can be used to create files that contain multiple years of the same month or season, such as “anni” or “juli” files. It may also be used to glue individual year files into a single file, thus creating a “moni” file.

Example: you have five files, each of which has twelve months of data in it (Jan-Dec). You want to glue them together to create a single file that will contain all the months for five years: a “moni” file. Therefore, instead of having five files with twelve months each, you will end up with a file with 60 time steps (5 years of 12 months = 60 months).

Using the files from the previous example, the command would be:

plum-% ncrcat -n 5,5,1 ha.F1.PBC_TPBOX.00510.nc ha.F1.PBC_TPBOX.moni.nc

The ‘-n 5,5,1’ convention is the same as for ncea listed above. If you are curious, you can inspect the header information of the new file, ha.F1.PBC_TPBOX.moni.nc, with ncdump -h.

Example: you have five files, each of which has twelve months of data in it (Jan-Dec). You want to create files that contain 5 years of January, 5 years of February ... 5 years of December (i.e., jani, febi, ... deci files). You also want to create an “anni” file (5 years of annual average). First we create the individual month files, then the anni file. Using the same files as in the above example, the command to create the January individual year file (jani, which we call “moni01”) is:

plum-% ncrcat -F -d time,1,,12 -n 5,5,1 ha.F1.PBC_TPBOX.00510.nc moni01.nc

To create the febi (“moni02”) file, you would change the starting index of time to 2, keeping the stride of 12:

plum-% ncrcat -F -d time,2,,12 -n 5,5,1 ha.F1.PBC_TPBOX.00510.nc moni02.nc

And so on. Once you have made the twelve individual files, (moni01.nc through moni12.nc), you can create an anni file using the ncea command:

plum-% ncea -n 12,2,1 moni01.nc anni.nc

Notice that the file input option “-n 12,2,1” has changed, since we now have 12 input files (moni01-moni12), and each file name has two digits (moni01.nc-moni12.nc).
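Typing the twelve ncrcat commands by hand is error prone, so they can be generated with a loop. The sketch below only prints the commands (the function name monthly_commands is made up for this example); removing the leading echo would execute them instead.

```shell
# Print one ncrcat command per month: start index m, stride 12, 1-based (-F).
monthly_commands() {
    m=1
    while [ "$m" -le 12 ]; do
        out=$(printf 'moni%02d.nc' "$m")
        echo ncrcat -F -d "time,$m,,12" -n 5,5,1 ha.F1.PBC_TPBOX.00510.nc "$out"
        m=$(( m + 1 ))
    done
}

monthly_commands
```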

NCKS

Ncks, or nc kitchen sink, is probably the most useful of the NCO operators. You can use it to extract a variable from a file, or to extract multiple variables, or to get variables at certain times or geographical areas. It is especially helpful when you are moving files from NCAR to CCR. You can extract just the variables you want, making the file sizes much smaller and more manageable.

Example: extract a single variable from a netCDF file:

plum-% ncks -v PRECTMM GEN2AGCM-CONB-MONE0509.nc \
         GEN2AGCM-CONB-MONE0509.prectmm.nc

Example: extract multiple variables from a netCDF file:

plum-% ncks -v PRECTMM,PSLEVMB,CLOUD GEN2AGCM-CONB-MONE0509.nc \
          GEN2AGCM-CONB-SEAE0509.3vars.nc

Example: extract a single variable from a netCDF file for the month of June from a mone file.

Note: NCO counts from zero, so June is index 5. To count from one, use the -F flag in any operation.

plum-% ncks -d time,5 -v PRECTMM GEN2AGCM-CONB-MONE0509.nc \
          GEN2AGCM-CONB-MONE0509.prectmm.jun.nc

You would get the same result using the following:

plum-% ncks -F -d time,6 -v PRECTMM GEN2AGCM-CONB-MONE0509.nc \
          GEN2AGCM-CONB-MONE0509.prectmm.jun.nc

NCDIFF

Ncdiff allows you to subtract one netCDF file from another and create a new file with the resulting deltas.

Example: create a difference file from two yearly FOAM files:

plum-% ncdiff ha.F1.PBC_TPBOX.00511.nc ha.F1.PBC_TPBOX.00510.nc \
              ha.F1.PBC_TPBOX.00511-00510.nc

In the example, we are subtracting the file ha.F1.PBC_TPBOX.00510.nc from the file ha.F1.PBC_TPBOX.00511.nc and creating a new file that will contain the differences, ha.F1.PBC_TPBOX.00511-00510.nc.

The three files involved are:

ha.F1.PBC_TPBOX.00510.nc
ha.F1.PBC_TPBOX.00511.nc
ha.F1.PBC_TPBOX.00511-00510.nc

NCRENAME

Ncrename is used to rename variables in a netCDF file.

Example: In many CCR files, precipitation-evaporation is named P-E. However, GrADS and many other software packages will not recognize variables with a minus sign in them. So in order to look at P-E, it must be renamed.

The following command will change the name P-E to PME. The file GEN2AGCM-CONB-MONE0509.rename.nc will be identical to GEN2AGCM-CONB-MONE0509.nc except that P-E will be named PME (the file header can be checked with ncdump -h):

plum-% ncrename -v P-E,PME GEN2AGCM-CONB-MONE0509.nc \
        GEN2AGCM-CONB-MONE0509.rename.nc

If you wanted to, you could overwrite the original file instead of creating a new one:

plum-% ncrename -v P-E,PME GEN2AGCM-CONB-MONE0509.nc

Ncrename can also be used to rename coordinate variables such as latitude and longitude. This is a little trickier because latitude and longitude are both dimensions and variables within a file. Therefore, you have to rename both the dimension and the variable.

Example: rename latitude and longitude to lat and lon, respectively, using the same file as above, GEN2AGCM-CONB-MONE0509.nc:

plum-% ncrename -d longitude,lon -d latitude,lat \
            -v longitude,lon -v latitude,lat \
            GEN2AGCM-CONB-MONE0509.nc GEN2AGCM-CONB-SEAE0509.rename2.nc

cdo command

CDO is a collection of command-line operators to manipulate and analyze climate and numerical weather prediction data; it includes support for netCDF-3, netCDF-4, GRIB1, GRIB2, and other formats.

The cdo command usage is in the form:

cdo [options] operator[,arg,arg...] [-operator] in.nc [in2.nc] [out.nc]

The options most relevant when working with netCDF files are:

  • -a to convert from relative to absolute time axis
  • -r to convert from absolute to relative time axis
  • -f to select the output format; together with the -z zip[_1-9] flag it can be used to obtain compressed netCDF4 output
  • -m to set the default missing value (default: -9e+33)
  • -O to overwrite output

The cdo command provides more than 400 operators for the following topics:

  • File information and file operations
  • Selection and Comparison
  • Modification of meta data
  • Arithmetic operations
  • Statistical analysis
  • Regression and Interpolation
  • Vector and spectral Transformations
  • Formatted I/O
  • Climate indices

The alphabetical list of operators is really long and is not reproduced here.

We will look at how to use cdo to interpolate data onto a different grid in order to compare model and observations, and then extract statistical values from the data.

We will start with the CRU (Climatic Research Unit) dataset, which is regridded monthly observation data for the period 1901-2011 for the Temperature, Precipitation and Evaporation variables.

We can have a look at the data with the cdo command:

cdo sinfoc cru_ts3.20.1901.2011.tmp.dat.nc
File format: netCDF
-1 : Institut Source  Table Code   Ttype   Dtype  Gridsize Num  Levels Num
 1 : unknown  unknown     0   -1   instant  F64    259200   1       1   1
Horizontal grids :
 1 : lonlat       > size      : dim = 259200  nlon = 720  nlat = 360
                    lon       : first = -179.75  last = 179.75  inc = 0.5  degrees_east  circular
                    lat       : first = -89.75  last = 89.75  inc = 0.5  degrees_north
Vertical grids :
 1 : surface                  : 0
Time axis :  1332 steps
RefTime =  1900-01-01 00:00:00  Units = days  Calendar = STANDARD
YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
1901-01-16 00:00:00  1901-02-15 00:00:00  1901-03-16 00:00:00  1901-04-16 00:00:00
[...]
2011-09-16 00:00:00  2011-10-16 00:00:00  2011-11-16 00:00:00  2011-12-16 00:00:00
cdo sinfoc: Processed 1 variable over 1332 timesteps ( 0.15s )

The cdo command gives us a sketchy view of the content, but is not able to identify the variable in the file. Let us get a description of the grid:

cdo griddes cru_ts3.20.1901.2011.tmp.dat.nc
#
# gridID 0
#
gridtype  = lonlat
gridsize  = 259200
xname     = lon
xlongname = longitude
xunits    = degrees_east
yname     = lat
ylongname = latitude
yunits    = degrees_north
xsize     = 720
ysize     = 360
xfirst    = -179.75
xinc      = 0.5
yfirst    = -89.75
yinc      = 0.5
cdo griddes: Processed 1 variable ( 0.00s )

Then we get the same information from a CMIP5 GCM model output:

cdo sinfo tas_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc
File format: netCDF
-1 : Institut Source   Param       Ttype   Dtype  Gridsize Num  Levels Num
 1 : unknown  unknown  -1          instant  F32     27840   1       1   1
Horizontal grids :
 1 : lonlat       > size      : dim = 27840  nlon = 192  nlat = 145
                    lon       : first = 0  last = 358.125  inc = 1.875  degrees_east  circular
                    lat       : first = -90  last = 90  inc = 1.25  degrees_north
                    available : xbounds ybounds
Vertical grids :
 1 : surface                  : 0
Time axis :  252 steps
 RefTime =  1859-12-01 00:00:00  Units = days  Calendar = 360DAYS  Bounds = true
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 1984-12-16 00:00:00  1985-01-16 00:00:00  1985-02-16 00:00:00  1985-03-16 00:00:00
 [...]
 2005-08-16 00:00:00  2005-09-16 00:00:00  2005-10-16 00:00:00  2005-11-16 00:00:00
 cdo sinfo: Processed 1 variable over 252 timesteps ( 0.02s )

and the same for the griddes. Next we will regrid the model output onto the CRU grid, choosing a conservative remapping and selecting the period 1990-2000. Note that a range of years is written with a slash (1990/2000); a comma-separated list would select only the listed years.

cdo griddes cru_ts3.20.1901.2011.tmp.dat.nc > crugrid.txt
cdo remapcon,crugrid.txt -selyear,1990/2000 \
  tas_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc hadgem.nc
cdo selyear,1990/2000 cru_ts3.20.1901.2011.tmp.dat.nc cru.nc

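Then we will extract the multi-year seasonal averages from the two datasets and take their difference. The CRU temperature is renamed from tmp to tas and shifted by 273.15 so that both fields are in Kelvin.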

cdo yseasavg hadgem.nc hadgem_seasonal_1990-2000.nc
cdo yseasavg cru.nc cru_seasonal_1990-2000.nc
cdo chname,tmp,tas -addc,273.15 cru_seasonal_1990-2000.nc \
        cru_rename_seasonal_1990-2000.nc
cdo sub hadgem_seasonal_1990-2000.nc \
        cru_rename_seasonal_1990-2000.nc comparison.nc
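Since cdo can chain operators in an internal pipeline, the four commands above can also be collapsed into a single call that needs no intermediate files. The sketch below is one possible way to write it, not the only one. The run helper is a hypothetical dry-run wrapper added for this example: by default it only prints the command, and setting DRY_RUN=0 executes it (which requires cdo, hadgem.nc and cru.nc to be present).

```shell
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sub takes two inputs: the seasonal model average and the seasonal CRU
# average, the latter converted to Kelvin and renamed from tmp to tas.
run cdo sub \
    -yseasavg hadgem.nc \
    -chname,tmp,tas -addc,273.15 -yseasavg cru.nc \
    comparison.nc
```

Chained operators are applied from right to left, so cru.nc is averaged first, then shifted and renamed, before the subtraction takes place.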

We can now use a graphical program to plot the difference between the HadGEM2 model output and the CRU dataset.