Climate Data

Author: Graziano Giuliani <ggiulian@ictp.it>

What is Climate Science

The study of climate was born as a science in the last century, from the growing evidence of the human origin of the rapid increase in global average temperatures registered over the last 60 years. Given also that global climate change is an ongoing phenomenon and that human activities can enhance or mitigate its effects in the near future, a consistent effort is made by governments and institutions to assess the possible impacts and the required policies.

This decision-making process must be given the best available information on the expected change: scientists have a role in the process as brokers of information.

Datasets

The information needed can be divided into:

  1. Past data to reconstruct the history of the climate of the whole Planet over the last 12000 years. These data are in the form of

    • Measurements from proxies such as tree rings, pollen, polar ice sheets drillings, etc.
    • Historical documents from the different human cultures
    • Reconstruction of landscapes from paintings, images
    • Hand-recorded measurements taken from instruments in the last 400 years
  2. Records of near-past weather information collected for short-term forecasts over the last 50 years. These data are in the form of

    • Digitized measurements from instruments both at the surface and on vertical profiles in the atmosphere or in the depth for oceans
    • Satellite measurements for the last 30 years
    • Economic, population and emission datasets
    • Geological and Ecological data to best describe the Ecosystem
  3. Future scenario data from models, to evaluate the expected change for mitigation purposes or the outcome of possible policies on the change.

The large amount of data, especially for the last two points above, requires a big effort to establish standards for data exchange between different organizations and different communities.

http://www.realclimate.org/index.php/data-sources

The Magic Words here are COMMUNITY CONTROLLED STANDARDS.

Let’s go with an example:

  • In country A a government organization has a daily temperature time series of measurements from an instrument to share. Its national convention, however, is to register the values in degrees Fahrenheit, and the data are digitally recorded on magnetic media in a binary format where the date is expressed as the three values DDMMYY for day, month, year in the local calendar.
  • The organization in country B receives the binary data files, along with a text document. Using the data then requires the intervention of a skilled computer programmer to read them, change the unit of measure and fix the date, in order to merge the measurements into a larger dataset, as in the sketch below.
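The kind of ad-hoc glue code that the organization in country B ends up writing might look like the following minimal Python sketch; the record layout, field names and values are purely hypothetical, standing for whatever the accompanying text document happens to describe.

    from datetime import datetime

    def convert_record(raw_date, temp_fahrenheit):
        # Parse the packed DDMMYY date string (assuming the Gregorian calendar).
        date = datetime.strptime(raw_date, "%d%m%y").date()
        # Convert degrees Fahrenheit to degrees Celsius.
        temp_celsius = (temp_fahrenheit - 32.0) * 5.0 / 9.0
        return date.isoformat(), temp_celsius

    # Example: 1 July 1987, 77 F  ->  ('1987-07-01', 25.0)
    print(convert_record("010787", 77.0))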

Repeat this for each measurement, instrument, satellite or model dataset, multiply the number of involved organizations by a thousand, and possibly throw in the commercial lock-ins of for-profit organizations.

IF a STANDARD for data storage and transmission can be respected, then the global knowledge coming from the data can be used much more easily, empowering researchers worldwide and freeing their time for the final scientific goal.

The Format and the Conventions

In the Climate and Forecast community netCDF is the data format, and the CF conventions define metadata that provide a definitive description of what the data in each variable represent, and of the spatial and temporal properties of the data.

This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.

The document describing the standard is maintained and enhanced through free contributions from users, via a mailing list and a Trac interface at

http://cf-pcmdi.llnl.gov

The Program for Climate Model Diagnosis and Intercomparison (PCMDI) has been hosted at the Lawrence Livermore National Laboratory in Livermore, California (US) since 1989, and was given the task of finding an easy way to compare the results of different climate models.

netCDF data format

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF was developed and is maintained at Unidata, part of the University Corporation for Atmospheric Research (UCAR) Community Programs (UCP).

http://www.unidata.ucar.edu/software/netcdf

Unidata is funded primarily by the National Science Foundation.

At a high level, the file is structured like a file system, with a root Group that can hierarchically contain other groups. Each group can contain three different entities:

  • Dimensions, which define the dimensions of any variable in all sub-groups
  • Variables, which contain data and can be of pre-defined types or of any user-defined type.
  • Attributes, which define metadata relevant at any group or variable level.
[Figure: the netCDF-4 data model]

The library provides C, C++ and Java language interfaces to access the data, on top of which all other programming language interfaces are implemented.

An easily extendable layer of filtering can be applied to the actual data before writing them to a file system, and the library gives the programmer a clear set of functions in a well engineered API to store and retrieve data and metadata.
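As a minimal sketch of this data model in practice, the following Python fragment uses the netCDF4 library (one of the interfaces built on top of the C library) to create a small file with dimensions, variables and attributes; the file name and the data values are placeholders.

    import numpy as np
    from netCDF4 import Dataset

    # Create a new netCDF-4 file: the Dataset object is the root group.
    ds = Dataset("example.nc", "w", format="NETCDF4")

    # Dimensions: an unlimited time axis and two fixed spatial axes.
    ds.createDimension("time", None)
    ds.createDimension("lat", 4)
    ds.createDimension("lon", 8)

    # Variables: coordinate variables plus a data variable on those dimensions.
    lat = ds.createVariable("lat", "f4", ("lat",))
    lon = ds.createVariable("lon", "f4", ("lon",))
    tas = ds.createVariable("tas", "f4", ("time", "lat", "lon"))

    # Attributes: metadata attached to the whole file or to a single variable.
    ds.title = "Toy example dataset"
    tas.units = "K"

    # Write some data and close the file.
    lat[:] = np.linspace(-45.0, 45.0, 4)
    lon[:] = np.linspace(0.0, 315.0, 8)
    tas[0, :, :] = np.full((4, 8), 288.0, dtype="f4")
    ds.close()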

CF Conventions

On top of the capabilities offered by the netCDF format, a Convention among different data production entities has been established to ease the sharing of data.

The convention mostly defines the type of information to be provided in the metadata, the names and recognized units of measure of climate-relevant variables, and the way to encode relations and ancillary information, allowing the user to write applications that operate on the data.

In particular, basic information is encoded to provide the following (a minimal usage sketch follows the list):

  • Identification of the Origin of the data, answering the questions what, where, when and how about the data in the file.
  • A Standard Name identifying which geophysical variable is in the file, regardless of the application-specific name given to it.
  • A reference Unit of Measure for each variable, along with a set of allowed conversion interfaces
  • Geolocation attributes which allow standard views or projections to be easily identified and allow regridding capabilities to be built on top.
  • Time calendars and units to allow the user to fix the point in time where the information is valid
  • Spatial and temporal bounds to define the spatio-temporal grid on which the information is valid
  • Packing and reduction of the original data
  • Climatological statistics which were applied to the original data
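The following Python sketch, again based on the netCDF4 library, shows how some of this information appears as CF attributes; the variable names, attribute values and file name are illustrative only.

    import numpy as np
    from netCDF4 import Dataset

    ds = Dataset("cf_example.nc", "w", format="NETCDF4")
    ds.createDimension("time", None)
    ds.createDimension("lat", 4)
    ds.createDimension("lon", 8)

    # Time coordinate: units and calendar locate each value in time.
    time = ds.createVariable("time", "f8", ("time",))
    time.units = "days since 1949-12-01 00:00:00"
    time.calendar = "standard"

    # Geophysical variable: standard_name identifies the quantity,
    # units gives the reference unit of measure.
    tas = ds.createVariable("tas", "i2", ("time", "lat", "lon"))
    tas.standard_name = "air_temperature"
    tas.units = "K"
    # Packing: readers unpack the stored short integers as
    # unpacked = scale_factor * packed + add_offset.
    tas.scale_factor = np.float32(0.01)
    tas.add_offset = np.float32(273.15)

    # Global attributes answering the what, where, when and how questions.
    ds.Conventions = "CF-1.7"
    ds.title = "Illustrative CF-compliant file"
    ds.institution = "Hypothetical institute"
    ds.source = "Toy example, not real data"

    ds.close()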

The advantages coming from the enhanced capabilities given by conformity to the standard at the application level must obviously be greater than the effort required to implement conformity to the standard at the data generation level.

Distributing Climate Data

On top of the netCDF data format, the netCDF library API allows an application to access a remote dataset using URLs instead of local file paths.

This is permitted by the OPeNDAP protocol, an acronym for “Open-source Project for a Network Data Access Protocol”.

OPeNDAP is a data transport architecture and protocol based on HTTP which includes standards for encapsulating structured data, annotating the data with attributes and adding semantics that describe the data.

The protocol is maintained by OPeNDAP.org, a publicly funded non-profit organization that also provides free reference implementations of OPeNDAP servers and clients.

The netCDF library, acting as a client through the cURL library, sends requests to an OPeNDAP server and receives various types of documents or binary data as a response.

One such document is the DDS (received when a DDS request is sent), which describes the structure of a data set. It is parsed by the library to present through the API the same structure offered by a netCDF on-disk file.
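Assuming a DAP2-style server, the DDS can also be inspected directly by appending the .dds suffix to the dataset URL; a minimal Python sketch with a placeholder address:

    import urllib.request

    # Hypothetical OPeNDAP endpoint: appending ".dds" asks the server
    # for the DDS document describing the data set structure.
    url = "http://example.org/thredds/dodsC/sample/tas_monthly.nc.dds"
    with urllib.request.urlopen(url) as response:
        print(response.read().decode())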

A data set, seen from the server side, may be a file, a collection of files or a database. Another document type that may be received is the DAS, which gives attribute values for the fields described in the DDS. Binary data is received when the client sends a DODS request, and the netCDF library allows the programmer to write exactly the same code to access local and remote data.
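Assuming a netCDF library built with OPeNDAP support, accessing a remote dataset is then just a matter of replacing the local path with a URL; the server address and variable name below are placeholders, and only the requested slice of the variable travels over the network.

    from netCDF4 import Dataset

    # Hypothetical OPeNDAP endpoint; the same code works with a local file path.
    url = "http://example.org/thredds/dodsC/sample/tas_monthly.nc"
    ds = Dataset(url)

    tas = ds.variables["tas"]
    print(tas.dimensions, tas.shape)

    # Slicing triggers a DODS request for just this subset of the data.
    first_field = tas[0, :, :]
    print(first_field.mean())

    ds.close()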

An OPeNDAP server can serve an arbitrarily large collection of data. Data on the server is often natively in HDF or NetCDF format, but can be in any format, including a user-defined one. Compared to ordinary file transfer protocols (e.g. FTP), a major advantage of using OPeNDAP is the ability to retrieve subsets of files, and also the ability to aggregate data from several files in one transfer operation.

A number of OPeNDAP servers are available; among them we will examine RAMADDA

http://ramadda.org/repository

and the THREDDS Data Server

http://www.unidata.ucar.edu/software/thredds/current/tds/TDS.html

They are not the OPeNDAP reference data server (which is called Hyrax), but they provide a better user experience in presenting the data available on the server.