Friday, July 30, 2010

White paper 10 - Dataset algorithm performance assessment based upon all efforts

The dataset algorithm performance assessment based upon all efforts white paper is now available for discussion. When making posts please remember to follow the house rules. Please also take time to read the full pdf before commenting and, where possible, refer to section titles, page numbers and line numbers to make it easy to cross-reference your comment with the document.

The recommendations are reproduced below:
• Assessment criteria should be developed entirely independently of the dataset developers and should be pre-determined and documented in advance of any tests.

• It is crucial that the purpose to which a dataset could be put be identified and that a corresponding set of assessment criteria be derived that are suitable for that purpose.

• The output of an assessment should be to determine whether a dataset is fit for a particular purpose and to enable users to determine which datasets are most suitable for their needs. Outputs should be clearly documented in such a form as to enable a clear decision tree for users.

• Validation of an algorithm should always be carried out on a different dataset from that used to develop and tune the algorithm.

• A key issue is to determine how well uncertainty estimates in datasets represent a measure of the difference between the derived value and the “true” real-world value.

• It would be worthwhile to consider the future needs for the development of climate services by identifying an appropriate set of regions or stations that any assessment should include.

• New efforts resulting from this initiative should be coordinated with on-going regional and national activities to rescue and homogenize data.


  1. Among the assessment criteria, "the ability of the algorithm to identify and adjust for individual breaks" is listed. Although this is the traditional way of measuring the efficiency of homogenisation methods, such characteristics can easily produce misleading results. Problems: a) Even for a correct detection, the timing and size of the shift are usually not absolutely precise. And what is the optimal degree of tolerance for this imprecision? It requires arbitrary decisions. - b) The search for change-points is not the main objective of homogenisation: the main objective is to obtain time series whose climate characteristics are closest to the real world. In this concept the search for change-points is only a tool, and rather than the skill in individual change-point detection, the focus has to be on the good reproduction of climatic characteristics. To make my argument clearer, here is an extreme but possible example: let us suppose that a homogenisation method approaches its final solution by fitting step functions (which is frequent). If there is a trend-like inhomogeneity in the observed time series, the optimal step-fitting solution will be to detect a small change-point at each time step. In this way the homogenisation would be perfect, but the false alarm rate would be outstandingly high, since after all there is only one trend in the observed series and no breaks! - With RMSE and trend-slope calculations there are no such problems (therefore they are applied, e.g., in COST ES0601).

    Peter Domonkos
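    The step-fitting example in the comment above can be sketched numerically. The following is a hypothetical illustration, not code from the white paper or from COST ES0601; all numbers and variable names are invented. A series contains one trend-like inhomogeneity and no breaks; the best piecewise-constant (step-function) approximation of the trend places a small step at every time step, so a break-counting score reports many false alarms even though the fitted correction is essentially perfect by RMSE:

    ```python
    # A series with a gradual trend-like bias and zero true breaks.
    n = 20
    true_breaks = 0
    trend = [0.05 * t for t in range(n)]  # trend-like inhomogeneity

    # Best step-function fit: one small step per time step
    # (here it reproduces the trend values exactly).
    step_fit = [round(x / 0.05) * 0.05 for x in trend]

    # Break-counting metric: every step is counted as a detected break,
    # and all of them are false alarms.
    detected_breaks = sum(
        1 for t in range(1, n) if step_fit[t] != step_fit[t - 1]
    )

    # RMSE between the fit and the true inhomogeneity signal.
    rmse = (sum((a - b) ** 2 for a, b in zip(step_fit, trend)) / n) ** 0.5

    print(detected_breaks)  # 19 detected "breaks", versus 0 true breaks
    print(round(rmse, 6))   # near-zero RMSE: the fit itself is excellent
    ```

    The two metrics thus disagree completely on the same fit, which is the point of the comment: break-detection scores penalise a correction that RMSE and trend-slope measures would rate as near-perfect.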

  2. Just writing to offer to help with this project. Given my background, it would be something software-related, such as writing or reviewing code, writing tests, or writing documentation.

    Contact details are on my site.