Article Title [English]
In situ observations underlie a wide range of planning, applied studies, and modeling in many fields and sciences. Using these data in studies and planning without first ensuring their accuracy and homogeneity can introduce uncertainty into the results. The main problems researchers face are poor data quality, missing data, outliers, and inhomogeneities in the time series. In this paper, therefore, the daily minimum and maximum temperature series and the daily rainfall series of 37 weather stations in Iran were analyzed for outliers and homogeneity over the period 1959-2018. The World Meteorological Organization, in cooperation with the Commission for Climatology, has issued guidance on data homogenization (e.g. WMO/TD-No. 1186, Guidelines on Climate Metadata and Homogenization; WMO-No. 1203, WMO Guidelines on the Calculation of Climate Normals).
The main steps in data homogenization are:
Metadata analysis and data quality control;
Creating a reference series;
Detection of break points;
For the initial clustering, Iran was divided into five clusters based on climatic characteristics, following previous studies in this field, which have mostly used empirical and quantitative methods such as principal component analysis and cluster analysis. The daily maximum and minimum temperature and daily rainfall series were then analyzed statistically in SPSS, and the percentage of missing data was determined for each station. Next, the Climatol package in R was used to detect outliers and inhomogeneities and to homogenize the series. Within each cluster, the series were re-clustered according to the variability of the parameter of interest, and for each station, the other stations of that cluster with similar variability were used as reference stations.
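The reference-station selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Climatol's internal code: "similar variability" is assumed here to mean a high Pearson correlation between the day-to-day differences of two series, and the function name `select_reference_stations`, the cut-off `min_corr=0.7`, and the limit `k=3` are hypothetical choices.

```python
import numpy as np

def select_reference_stations(data, target, k=3, min_corr=0.7):
    """Pick up to k reference stations for `target` from the same cluster.

    data: dict mapping station name -> 1-D array of daily values (NaN = missing),
          all aligned on the same dates.
    Assumption (hypothetical criterion): "similar variability" is measured as the
    Pearson correlation of day-to-day differences over days valid in both series.
    """
    y = np.diff(data[target])
    scores = []
    for name, series in data.items():
        if name == target:
            continue
        x = np.diff(series)
        ok = ~np.isnan(x) & ~np.isnan(y)
        if ok.sum() < 30:  # too little overlap to judge the relationship
            continue
        r = np.corrcoef(x[ok], y[ok])[0, 1]
        if r >= min_corr:
            scores.append((r, name))
    scores.sort(reverse=True)  # most strongly correlated stations first
    return [name for _, name in scores[:k]]

# Synthetic cluster: A, B and C share a regional signal, D is unrelated.
rng = np.random.default_rng(42)
n = 365
common = np.cumsum(rng.normal(size=n))
stations = {
    "A": common + rng.normal(scale=0.2, size=n),
    "B": common + rng.normal(scale=0.2, size=n),
    "C": common + rng.normal(scale=0.2, size=n),
    "D": np.cumsum(rng.normal(size=n)),
}
stations["B"][100:110] = np.nan  # a short gap in one station
refs = select_reference_stations(stations, "A", k=3)
```

Correlating differences rather than raw values keeps a shared seasonal cycle or trend from making every station in the cluster look like a good reference.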
In this algorithm, each series is first standardized and estimated from its reference series using type II regression. The standardized anomaly series is then calculated from the differences between the observed and estimated values. Outliers were detected in two steps. First, original data whose standardized anomalies exceeded the prescribed threshold were flagged as outliers. Second, to confirm them, each flagged temperature value was compared with the values of the preceding and following days; if it differed significantly, it was accepted as an outlier and deleted. For the precipitation series, the atmospheric conditions on the dates concerned were checked instead.
To detect inhomogeneities, the standard normal homogeneity test (SNHT) was applied to the monthly series. If the maximum SNHT statistic exceeded the prescribed threshold, the series was split at the point of that maximum, and all data before the break point were transferred to a new series with the same geographic coordinates. This process was repeated until all series were homogeneous. Break points confirmed by metadata were accepted as non-climatic breaks. Finally, all missing data in every homogenized series were estimated with the same procedure, the only difference being that the series fragments were used as references.
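A compact Python sketch of the estimation, outlier, and SNHT steps follows. It is a simplified illustration, not Climatol's implementation, and all function names are hypothetical. Assumptions: the estimate is the unweighted mean of the reference z-scores (for standardized, positively correlated series the type II, or reduced major axis, slope is 1); the thresholds (5 standardized units for outliers, 25 for SNHT, an 8-degree day-to-day jump) are illustrative values; and the anomaly series is not re-estimated after each split.

```python
import numpy as np

def standardize(x):
    """Return (x - mean) / sd, ignoring NaNs."""
    return (x - np.nanmean(x)) / np.nanstd(x)

def standardized_anomalies(target, refs):
    """Observed minus estimated z-scores, rescaled to unit standard deviation.

    Assumption: with standardized, positively correlated series the type II
    (reduced major axis) slope is 1, so the estimate is simply the unweighted
    mean of the reference z-scores.
    """
    z_obs = standardize(target)
    z_est = np.nanmean([standardize(r) for r in refs], axis=0)
    anom = z_obs - z_est
    return anom / np.nanstd(anom)

def flag_outliers(anom, threshold):
    """Step 1: indices whose |standardized anomaly| exceeds the threshold."""
    return np.flatnonzero(np.abs(anom) > threshold)

def confirm_by_neighbours(values, idx, max_jump):
    """Step 2 (temperature only): keep flags that differ from both the previous
    and the next day by more than max_jump. First and last days are skipped."""
    confirmed = []
    for i in idx:
        if 0 < i < len(values) - 1:
            if (abs(values[i] - values[i - 1]) > max_jump
                    and abs(values[i] - values[i + 1]) > max_jump):
                confirmed.append(int(i))
    return confirmed

def snht(x):
    """Alexandersson's SNHT statistic T(k) for k = 1..n-1 (no NaNs allowed)."""
    z = (x - x.mean()) / x.std()
    n = len(z)
    return np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                     for k in range(1, n)])

def find_breaks(anom, threshold, min_len=12):
    """Split at the SNHT maximum while it exceeds the threshold.

    Simplification: fragments are retested on the same anomaly series instead
    of being re-estimated from their references after each split.
    """
    breaks, segments = [], [(0, len(anom))]
    while segments:
        start, end = segments.pop()
        if end - start < 2 * min_len:
            continue
        t = snht(anom[start:end])
        k = int(np.argmax(t)) + 1  # first index after the break
        if t[k - 1] > threshold:
            breaks.append(start + k)
            segments += [(start, start + k), (start + k, end)]
    return sorted(breaks)

# Synthetic daily temperatures: a shared seasonal signal plus station noise.
rng = np.random.default_rng(0)
days = np.arange(365)
regional = 15 + 10 * np.sin(2 * np.pi * days / 365)
target = regional + rng.normal(scale=0.5, size=365)
refs = [regional + rng.normal(scale=0.5, size=365) for _ in range(3)]
target[200] += 15.0  # injected erroneous spike

anom = standardized_anomalies(target, refs)
flagged = flag_outliers(anom, threshold=5.0)  # hypothetical threshold
confirmed = confirm_by_neighbours(target, flagged, max_jump=8.0)

# Synthetic monthly anomaly series with a shift of 3 units after month 120.
monthly = rng.normal(size=240)
monthly[120:] += 3.0
breaks = find_breaks(monthly, threshold=25.0)  # hypothetical threshold
```

Retesting each fragment mimics the repeat-until-homogeneous loop; the metadata check that decides whether a detected break is accepted as non-climatic has no code counterpart here.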
Because of the large number of missing and suspect values in 1959, the study period was taken to begin in 1960. Investigation showed that on some dates all stations in a cluster lacked data, possibly because of glitches in the MESSIR-CLIM system of the Meteorological Organization, through which the data are received. In such cases, the data for those dates were estimated at the affected stations as the average of the values on the preceding and following days. MESSIR-CLIM is the IRIMO database, which includes a climatic database management system based on PostgreSQL. Its main functions are to receive and store all kinds of weather and climate data; the system can collect and process large volumes of information and produce meteorological products such as charts, maps, tables, and reports.
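The gap-filling rule for dates on which a whole cluster is missing can be sketched as below. It assumes that the data of one cluster are held in a stations × days array with NaN for missing values and that such gaps are single days; `fill_cluster_wide_gaps` is a hypothetical helper name.

```python
import numpy as np

def fill_cluster_wide_gaps(cluster):
    """Fill days on which every station in the cluster is missing.

    cluster: 2-D array of shape (n_stations, n_days), NaN = missing.
    Each station's value on such a day becomes the mean of its own previous
    and next day. If either neighbour is also missing, the result stays NaN.
    Days missing at only some stations are left untouched.
    """
    filled = cluster.copy()
    all_missing = np.all(np.isnan(cluster), axis=0)
    for d in np.flatnonzero(all_missing):
        if 0 < d < cluster.shape[1] - 1:
            filled[:, d] = (cluster[:, d - 1] + cluster[:, d + 1]) / 2
    return filled

# Two hypothetical stations over five days; day 2 is missing at both.
cluster = np.array([
    [10.0, 12.0, np.nan, 16.0, 15.0],
    [20.0, np.nan, np.nan, 24.0, 22.0],
])
filled = fill_cluster_wide_gaps(cluster)
```

Only cluster-wide gaps are filled this way; gaps at individual stations are left for the reference-based estimation step.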
Data were missing at all stations of the cluster on 18 dates for maximum temperature in the Caspian region (Cluster 4) and on 13 dates for minimum temperature in the mountainous areas (Cluster 5).
The daily maximum and minimum temperature and daily precipitation series of the 37 Iranian weather stations have, on average, 5%, 7%, and 2% missing values, respectively.
After the 1959 data were removed from the 60-year series (1959-2018), the percentage of missing daily maximum temperature, minimum temperature, and precipitation data decreased by about 0.3% on average.
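The effect of dropping the first year on the missing-data percentage can be reproduced with pandas. The station "X", its gaps, and the shortened 1959-1961 period below are invented purely for illustration, and `missing_percentage` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def missing_percentage(df, start=None, end=None):
    """Percentage of missing daily values per station (column) from start to end."""
    sub = df.loc[start:end]
    return sub.isna().mean() * 100

# Hypothetical station with a poorly observed first year.
dates = pd.date_range("1959-01-01", "1961-12-31", freq="D")  # 1096 days
df = pd.DataFrame({"X": np.arange(len(dates), dtype=float)}, index=dates)
df.iloc[:100, 0] = np.nan                          # 100 missing days in 1959
df.loc["1960-03-01":"1960-03-10", "X"] = np.nan    # 10 missing days in 1960

full = missing_percentage(df)                      # whole period
trimmed = missing_percentage(df, "1960", "1961")   # first year dropped
reduction = full - trimmed
```

Because the percentage is taken over a shorter period, dropping a year with many gaps lowers it for that station; averaging such reductions over all stations gives a figure comparable to the one reported above.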
In these series, excluding the 1959 data, 7 outliers were detected in maximum temperature, 7 in minimum temperature, and 8 in precipitation. In 8 cases, the lack of atmospheric data for the dates concerned made it impossible to judge definitively whether the precipitation outliers were correct.
For the daily temperature series, of the 36 stations other than Tabas, 16 were homogeneous and 20 had one, two, or three break points. For precipitation, 5 inhomogeneous stations were identified.
Unfortunately, because no comprehensive metadata archive is available, no definitive cause could be established for many of these breaks at some stations.