Wednesday, August 1, 2012

Do you want to help with data discovery?

As was alluded to in an earlier post here, NOAA's National Climatic Data Center has recently embarked on an effort to discover and rescue a plethora of international hard-copy holdings in its basement and make them usable by the international science community. Images of the records from the first chunk of this effort have just been made available online. Sadly, it is not realistic at present to key these data, so they remain stuck in a halfway house: available, tantalizingly so, but not yet truly usable.

So, if you want to undertake some climate sleuthing, now is your moment to shine ...! The data have all been placed at . These consist of images at both daily and monthly resolution - don't be fooled by the 'daily' in the ftp site address. If you find a monthly resolution data source you could digitize years' worth of records in an evening.

Whether you wish to start with Angola ...

or Zanzibar ...

There are data for you to discover. As the Readme file says ...

The following are documents recently imaged through an effort at the 
National Climatic Data Center (NCDC). NCDC holds over 2000 cubic 
feet of foreign records in paper format of in-situ weather observations 
in the on-site physical archives. The data files included are digital 
photographs of records in this collection. NCDC federal and contract 
employees captured the images using NCDC-owned digital cameras. All 
photos are in JPEG format.

The images contain in-situ observations from all areas of the globe 
other than the United States. African observations were the initial 
geographic area of concentration in the imaging effort but have grown 
to include other continents and regions. The collection consists of 
weather observations that were taken almost exclusively between 1885 
and 1975. The text within the images are in various languages.

A worldwide community of scientists will benefit from this effort, 
which is part of a global effort to discover, scan and key missing 
in-situ data. The data from the images will eventually be added to 
integrated global datasets, including baseline datasets at NCDC.

This directory contains the following:
- 45 tar files containing images of the data
- An inventory file describing the files that were recently examined 
  at NCDC (the ones highlighted in yellow have been imaged)
- An example .csv file describing the preferable format of the data 
  if these images were to be digitized

So, if you want to do some discovery and recovery of data, the opportunity is now there. Any data submitted to the databank using the submission guidelines detailed here will be shared without restriction for use by anyone for any purpose. We would strongly encourage keying all recorded meteorological parameters, although clearly temperatures are essential.
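For anyone keying a monthly page by hand, a minimal sketch of how the typed values might be written out, one row per month, with the source image recorded alongside each value so the link back to the paper record is kept. The column names, station name and image filename here are purely hypothetical; the example .csv file in the directory and the submission guidelines define the preferred layout and should take precedence.

```python
import csv
import io

# Hypothetical column names for illustration only; the example .csv
# supplied with the images defines the layout actually preferred.
FIELDS = ["station", "year", "month", "element", "value", "units", "source_image"]


def keyed_rows(station, year, element, units, monthly_values, source_image):
    """Yield one row per month; None marks an illegible or missing entry."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    for month, value in enumerate(monthly_values, start=1):
        yield {
            "station": station,
            "year": year,
            "month": month,
            "element": element,
            # Leave the cell empty rather than guessing a value.
            "value": "" if value is None else value,
            "units": units,
            # Keep the image filename so every value traces back to its page.
            "source_image": source_image,
        }


def write_csv(rows, handle):
    """Write the keyed rows to an open text handle, header first."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up values standing in for one year read off a single imaged page.
values = [24.1, 24.3, 24.0, 23.2, 21.5, 19.8, 19.1, 19.9, 21.7, 23.0, None, 24.2]
rows = list(keyed_rows("EXAMPLE", 1923, "mean_temperature", "degC",
                       values, "example_page_001.jpg"))
buffer = io.StringIO()
write_csv(rows, buffer)
```

Leaving illegible entries empty, rather than interpolating them, keeps the keyed file faithful to what is actually on the page.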

Regardless of one's viewpoint, there cannot be a downside to improving data availability if one wants to be able to make informed analyses and decisions, particularly when those data have an unbroken chain of provenance back to the raw paper record. So, this really is an opportunity to provide something uniquely useful to scientists and the public around the world and to 'own' a chunk of the global climate record.

Any help gratefully received.


  2. I reposted this post at Variable Variability and Marion Delgado asked here: "How will the recipients check the data?"

  3. Well, once someone is interested in a subset of the data and is going to digitize it, they can let us know through any of the advertised email addresses or a comment here. We can append a readme to the ftp site and flag, both there and here on this blog, that somebody is processing that data. Once submitted, the data would go into a folder in stage 1 with an appropriate name (including the person or their institution), and from there be converted to the common stage 2 format and merged to form part of the global databank. Because the whole lot has provenance flags, we would know which part of the final product was 'theirs'. The databank info can be found from and links therefrom.