Quality control of surface ozone data measured at stations in the city of Tehran using a new statistical software

Article type: Review article

Authors

1 Postdoctoral Researcher, Space Physics Department, Institute of Geophysics, University of Tehran, Tehran, Iran

2 Professor, Space Physics Department, Institute of Geophysics, University of Tehran, Tehran, Iran

Abstract

Ignoring the many kinds of error that can occur in data, including gross errors, constant values and others, can lead to incorrect results in data analysis; quality control is therefore an essential step in ensuring that the data are valid. Because the truth is not available, detecting errors and carrying out quality control is complicated. Various statistical methods and tests exist for data quality control, but none of them guarantees that every error in the data will be found. Running as many tests as possible increases the relative confidence in data quality. In this study, given the importance of and need for studying the pollutant surface ozone, the quality of these data across the city of Tehran was examined. Quality control was carried out with the AutoQA4Env tool. The tool consists of a set of statistical tests grouped into two modes, basic and advanced. Its distinctive features are its user settings, reproducibility and extensibility. Running the tool in basic mode revealed gross errors in some of the data, which shows that data quality must be checked before the data are used. On the other hand, in some cases running the tool in basic mode was shown to be insufficient, and applying it in advanced mode is more appropriate.


Article title [English]

Quality control of measured surface ozone at the Tehran city stations using a new statistical software

Authors [English]

  • Najmeh Kaffashzadeh 1
  • Abbasali Aliakbari Bidokhti 2
1 Postdoctoral Fellow, Space Physics Department, Institute of Geophysics, University of Tehran, Tehran, Iran
2 Professor, Space Physics Department, Institute of Geophysics, University of Tehran, Tehran, Iran
Abstract [English]

Errors are an inseparable part of environmental data and arise for several reasons, either natural or artificial. Natural errors are produced by phenomena such as animal activity, storms and floods. Artificial errors are generated by human activity during data collection, entry and processing, and can be intentional or unintentional. Since errors can affect the results of any analysis, distinguishing them through quality control is a prerequisite for any use of the data. Because the truth is unknown, this seemingly simple task becomes challenging. Although much effort has been devoted to developing tests and tools for distinguishing errors in data, none of them can guarantee that all errors will be found. It is therefore important to apply as many orthogonal tests as possible in order to find more errors. Here we used a tool named AutoQA4Env, which has been developed for automated quality control of environmental data. The tool consists of a series of statistical tests that have been used in various communities and organizations, such as the World Meteorological Organization and the Environmental Protection Agency. The tests are classified into several groups based on their strictness. The tool has a settings menu through which users can add tests and modify the thresholds. Two versions of the tool, namely the basic and the advanced flagging systems, are open source and accessible via b2share. The tool was tested on a set of surface ozone data series measured at the pollution monitoring stations in the city of Tehran. These data are an important source of information about pollution levels and trends in Tehran; thus, knowing their quality can improve the results and reduce their uncertainties. The results indicate that gross errors exist in the data of most of the stations, even though these data are published and publicly available. Applying the tool in the basic state finds most of the errors. About 0.02% of the data were erroneous for three years of data at 15 stations. The tool's binary flagging system labels data that fail a test as unacceptable, even when they are in fact acceptable. The advanced state of the tool was more moderate than the basic one and corrected these labels. In this state, 57.7% of the data flagged as unacceptable in the basic state were classified as suspect, and only 5.6% of them remained unacceptable. We therefore conclude that AutoQA4Env, even at this stage, can find and flag most data errors, or at least the gross errors. In addition, the tool's advanced flagging system reduces labeling errors.
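
The contrast between a binary (basic) and a graded (advanced) flagging scheme can be illustrated with a minimal sketch. This is not the AutoQA4Env code or its API: the function names, the flag labels and all thresholds below are hypothetical and chosen only for illustration. It shows a gross error (range) test under both schemes, plus a simple constant-value test.

```python
from typing import List

# All thresholds are illustrative assumptions, not AutoQA4Env defaults.
PHYS_MIN = 0.0    # ppb; a mixing ratio cannot be negative
PHYS_MAX = 500.0  # ppb; assumed hard ceiling for surface ozone
SOFT_MAX = 150.0  # ppb; assumed "unusual but possible" level
MAX_RUN = 3       # assumed longest plausible run of identical readings


def basic_flags(values: List[float]) -> List[str]:
    """Binary flagging: any value failing the range test is unacceptable."""
    return ["acceptable" if PHYS_MIN <= v <= SOFT_MAX else "unacceptable"
            for v in values]


def advanced_flags(values: List[float]) -> List[str]:
    """Graded flagging: separate impossible values from merely unusual ones."""
    flags = []
    for v in values:
        if v < PHYS_MIN or v > PHYS_MAX:
            flags.append("unacceptable")
        elif v > SOFT_MAX:
            flags.append("suspect")
        else:
            flags.append("acceptable")
    return flags


def constant_run_flags(values: List[float]) -> List[bool]:
    """Mark every reading that belongs to a run of more than MAX_RUN equal values."""
    marked = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start > MAX_RUN:
                for j in range(start, i):
                    marked[j] = True
            start = i
    return marked


# Made-up hourly readings in ppb, for demonstration only.
hourly_ozone = [32.0, 41.5, 180.0, -5.0, 999.0, 27.0, 27.0, 27.0, 27.0]
print(basic_flags(hourly_ozone))
print(advanced_flags(hourly_ozone))
print(constant_run_flags(hourly_ozone))
```

In this sketch the reading of 180 ppb is unacceptable under the binary scheme but only suspect under the graded one, which mirrors the relabeling described in the abstract; the negative and the 999 ppb readings stay unacceptable in both.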
 

Keywords [English]

  • errors
  • Quality Control
  • AutoQA4Env tool
  • surface ozone data