Errors are an inseparable part of environmental data and arise for several reasons, either natural or artificial. The former are produced by natural phenomena such as animal activities, storms, floods, etc. The latter are generated by human activities during data collection, entry, and processing, and can be intentional or unintentional. Since errors can affect the results of any analysis, identifying them via quality control is a prerequisite for any use of the data. Because the true values are unknown, this seemingly simple task becomes challenging. Although many efforts have been devoted to developing tests and tools for detecting errors in data, none of them can guarantee that all errors will be found; orthogonal testing is therefore important for finding more of them. Here we used a tool named AutoQA4Env, which has been developed for automated quality control of environmental data. This tool consists of a series of statistical tests that have been used in various communities and organizations such as the World Meteorological Organization and the Environmental Protection Agency. The tests are classified into several groups based on their strictness. The tool has a settings menu through which users can add tests and modify thresholds. Two versions of the tool, namely the basic and advanced flagging systems, are open source and accessible via b2share. The tool was tested on the quality control of a set of surface ozone data series measured at pollution monitoring stations in the city of Tehran. These data are an important source of information about pollution levels and trends in Tehran; knowing their quality can therefore improve the results and reduce their uncertainties. The results indicate that gross errors exist in most of the stations' data, even though these data are published and publicly available. Applying the tool in its basic mode found most of the errors: about 0.02% of the data from three years at 15 stations were erroneous.
The binary flagging system of the tool labels these failing data as unacceptable, even though some of them were in fact acceptable. The advanced mode of the tool was more moderate than the basic one and corrected these labels: 57.7% of the data flagged unacceptable in the basic mode were reclassified as suspect values, and only 5.6% of them remained unacceptable. We therefore conclude that AutoQA4Env, even at this stage, can find and flag most data errors, at least gross errors. Moreover, the advanced flagging system reduces labeling errors.
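The difference between the two flagging schemes can be sketched as follows. This is a minimal illustration only: the function names, range thresholds, and the specific range test are hypothetical and are not taken from AutoQA4Env, which applies a larger suite of statistical tests.

```python
# Hypothetical sketch of binary vs. three-level flagging for a range test.
# All thresholds below are invented for illustration.

def binary_flag(value, lower=0.0, upper=150.0):
    """Binary scheme: any value outside the single valid range
    is labeled unacceptable."""
    return "acceptable" if lower <= value <= upper else "unacceptable"

def advanced_flag(value, valid=(0.0, 250.0), plausible=(0.0, 150.0)):
    """Three-level scheme: values outside the plausible range but
    still inside the physically valid range are only 'suspect';
    only values outside the valid range are 'unacceptable'."""
    if not (valid[0] <= value <= valid[1]):
        return "unacceptable"
    if not (plausible[0] <= value <= plausible[1]):
        return "suspect"
    return "acceptable"

# Example surface ozone values (ppb): typical, unusually high, impossible.
ozone_ppb = [35.0, 180.0, -5.0]
print([binary_flag(v) for v in ozone_ppb])
# → ['acceptable', 'unacceptable', 'unacceptable']
print([advanced_flag(v) for v in ozone_ppb])
# → ['acceptable', 'suspect', 'unacceptable']
```

The unusually high but physically possible value (180 ppb) is labeled unacceptable by the binary scheme but only suspect by the three-level scheme, which mirrors how the advanced mode reclassified much of the data flagged unacceptable in the basic mode.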