Article type: Research article
Seismic events induced by human activities such as subsurface fluid extraction and injection can jeopardize the integrity of critical infrastructure. The multistage framework proposed here estimates the regional epicenter location accurately and reliably without requiring exhaustive geological models or dense seismic arrays. To derive region-based labels for the supervised classifiers, K-means clustering was first applied to the latitude–longitude coordinates of all recorded events. The resulting cluster assignments served as class labels, providing an objective, data-driven regional segmentation for subsequent training.
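The labeling step can be sketched as follows. This is a minimal sketch that assumes scikit-learn's `KMeans` and treats latitude and longitude as planar coordinates. That simplification is acceptable only for a compact study area; a projected or distance-aware alternative would be safer over large regions. The helper name `region_labels`, the number of regions `n_regions`, and the synthetic epicentres are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.cluster import KMeans


def region_labels(lat, lon, n_regions=3, seed=0):
    """Cluster event epicentres into n_regions groups.

    Returns one integer region label per event plus the cluster centres
    as (latitude, longitude) pairs. Coordinates are treated as planar,
    which is only reasonable for a geographically small study area.
    """
    coords = np.column_stack([np.asarray(lat, float), np.asarray(lon, float)])
    km = KMeans(n_clusters=n_regions, n_init=10, random_state=seed)
    labels = km.fit_predict(coords)
    return labels, km.cluster_centers_


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic epicentres scattered around three made-up centres (illustration only).
    centres = np.array([[30.0, 50.0], [31.0, 52.0], [29.5, 53.0]])
    pts = np.vstack([c + 0.1 * rng.standard_normal((50, 2)) for c in centres])
    labels, centers = region_labels(pts[:, 0], pts[:, 1], n_regions=3)
    print("events per region:", np.bincount(labels))
```

The labels returned here play the role of the class targets for the classifiers trained later; the choice of `n_regions` would in practice be guided by the spatial layout of the catalogue.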
In the initial processing stage, three-component seismic recordings were screened with the short-term-average to long-term-average ratio (STA/LTA) to identify and correct abrupt baseline offsets. The cleaned records were then paired to form cross-correlation matrices at four lag steps (0.5, 0.1, 0.05 and 0.01 s), capturing relative information across multiple temporal scales. Recursive feature elimination with cross-validation (RFECV) then selected the most informative subset of correlation coefficients, substantially reducing dimensionality while preserving discriminative power. The resulting feature vectors fed a probability-averaging (soft-voting) ensemble that couples a support vector machine (SVM) with an extreme gradient boosting (XGBoost) classifier. This combines the margin-maximizing strength of the SVM with the nonlinear learning capacity of boosted decision trees.
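The abstract does not specify how STA/LTA is used to correct baseline offsets. One plausible reading, sketched here under stated assumptions, has three steps:

1. Trigger on the ratio of trailing short-window and long-window mean absolute amplitudes.
2. Locate the step as the largest sample-to-sample jump just before the trigger.
3. Subtract the difference in median level across that step.

The window lengths, threshold, persistence test `min_offset_factor`, and the function names `sta_lta` and `correct_baseline_offset` are all hypothetical. A genuine P-wave arrival also triggers STA/LTA, so the sketch corrects only when the median level after the trigger shifts well beyond the robust noise level.

```python
import numpy as np


def sta_lta(x, n_sta, n_lta):
    """Classic trailing-window STA/LTA ratio of |x| (0 until the LTA window fills)."""
    a = np.abs(np.asarray(x, dtype=float))
    c = np.concatenate([[0.0], np.cumsum(a)])
    ratio = np.zeros(a.size)
    end = np.arange(n_lta, a.size + 1)          # exclusive end index of each window
    sta = (c[end] - c[end - n_sta]) / n_sta
    lta = (c[end] - c[end - n_lta]) / n_lta
    ratio[end - 1] = np.divide(sta, lta, out=np.zeros_like(sta), where=lta > 0)
    return ratio


def correct_baseline_offset(x, n_sta=10, n_lta=100, threshold=3.0, min_offset_factor=10.0):
    """Detect the first abrupt baseline step with STA/LTA and remove it.

    Returns (corrected trace, step index or None).
    """
    x = np.asarray(x, dtype=float)
    x = x - np.median(x[:n_lta])                # reference the leading baseline to zero (new array)
    hits = np.flatnonzero(sta_lta(x, n_sta, n_lta) > threshold)
    if hits.size == 0:
        return x, None
    t = int(hits[0])
    # Refine the onset: largest sample-to-sample jump in the STA window ending at the trigger.
    lo = max(t - n_sta, 1)
    k = lo + int(np.argmax(np.abs(np.diff(x[lo - 1:t + 1]))))
    pre, post = x[max(k - n_lta, 0):k], x[k:k + n_lta]
    offset = np.median(post) - np.median(pre)
    noise = 1.4826 * np.median(np.abs(pre - np.median(pre)))  # robust noise level (MAD)
    if abs(offset) <= min_offset_factor * noise:
        return x, None                          # trigger looks transient, not a persistent step
    x[k:] -= offset
    return x, k
```

In this reading the STA/LTA ratio only flags where something abrupt happens. The median comparison on either side decides whether it is a baseline step worth removing.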
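The cross-correlation features can be illustrated with a short sketch. It assumes each event record is an array of equal-length channels, for example the three components. It computes the normalized (Pearson) correlation for every channel pair at lags spaced by the chosen lag step, up to a hypothetical maximum lag, and flattens the coefficients into one feature vector. The sampling rate `fs`, `max_lag`, and the names `xcorr_at_lag` and `xcorr_features` are assumptions; the abstract does not give the study's exact pairing or matrix layout.

```python
from itertools import combinations

import numpy as np


def xcorr_at_lag(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag] over the overlapping samples."""
    if lag >= 0:
        x, y = a[:a.size - lag], b[lag:]
    else:
        x, y = a[-lag:], b[:b.size + lag]
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(np.dot(x, y) / denom) if denom > 0 else 0.0


def xcorr_features(traces, fs, lag_step, max_lag):
    """Correlation coefficients for every channel pair at lags spaced by lag_step.

    traces: array (n_channels, n_samples) sampled at fs Hz.
    Returns (feature vector, lag values in seconds); lags span -max_lag..+max_lag.
    """
    traces = np.asarray(traces, dtype=float)
    step = max(1, int(round(lag_step * fs)))    # lag step in samples
    max_k = int(round(max_lag * fs))            # maximum lag in samples
    lags = np.arange(-(max_k // step) * step, max_k + 1, step)
    feats = [
        xcorr_at_lag(traces[i], traces[j], int(k))
        for i, j in combinations(range(traces.shape[0]), 2)
        for k in lags
    ]
    return np.asarray(feats), lags / fs
```

With a fixed maximum lag, a finer lag step yields more lags and therefore a longer feature vector. That growth is one reason a selection stage such as RFECV becomes useful at the finest steps.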
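A compact sketch of the selection and ensemble stages follows, using scikit-learn. Two substitutions are assumptions, not the study's configuration:

- `RFECV` ranks features with a logistic-regression model, because the abstract does not name the ranking estimator.
- scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs no extra package. `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could replace it.

Soft voting averages the members' `predict_proba` outputs, which is why the SVM is built with `probability=True`. The function name `build_region_classifier` and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_region_classifier(seed=0):
    """Scaling -> RFECV feature selection -> soft-voting SVM + boosted trees."""
    selector = RFECV(
        estimator=LogisticRegression(max_iter=2000),  # assumed ranking model
        step=1,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="accuracy",
        min_features_to_select=2,
    )
    ensemble = VotingClassifier(
        estimators=[
            # probability=True makes SVC expose predict_proba, which soft voting needs.
            ("svm", SVC(kernel="rbf", C=1.0, probability=True, random_state=seed)),
            # Stand-in for XGBoost; xgboost.XGBClassifier could be swapped in here.
            ("gbt", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="soft",  # average the members' class probabilities
    )
    return make_pipeline(StandardScaler(), selector, ensemble)


if __name__ == "__main__":
    # Synthetic placeholder features (not seismic data): 3 classes, 30 features.
    X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                               n_redundant=0, n_classes=3, n_clusters_per_class=1,
                               class_sep=2.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    model = build_region_classifier().fit(X_tr, y_tr)
    print("selected features:", model.named_steps["rfecv"].n_features_)
    print("test accuracy:", np.mean(model.predict(X_te) == y_te))
```

Keeping scaling and RFECV inside one pipeline means that, when the whole pipeline is cross-validated, feature selection is refit on each training fold. This avoids leaking information from the evaluation folds.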
Models were developed twice, first on the raw, imbalanced data and then on data balanced with the Synthetic Minority Over-sampling Technique (SMOTE), to quantify the influence of class imbalance. Without SMOTE, decreasing the correlation lag step from 0.5 s to 0.1 s raised the accuracy of epicentral-region classification from 0.73 to 0.90. It also markedly reduced the standard deviation of epicentral errors, indicating greater solution stability. Still finer steps (0.05 s and 0.01 s) made the model increasingly sensitive to high-frequency noise, so accuracy gains saturated and variance increased slightly. The 0.1 s step therefore emerged as the optimal trade-off between resolution and robustness.
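Comparing lag steps amounts to cross-validating the same model on the feature set built at each step and recording the mean and spread of accuracy. The sketch assumes repeated stratified k-fold cross-validation and uses the fold-to-fold standard deviation of accuracy as a stand-in for stability. The abstract's stability measure is the standard deviation of epicentral errors, which this sketch does not reproduce. `pick_step` encodes one hypothetical trade-off rule: take the coarsest step whose mean accuracy is within a tolerance of the best. The demo datasets are synthetic placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC


def sweep_lag_steps(datasets, estimator, n_splits=5, n_repeats=3, seed=0):
    """Cross-validate one estimator on the features built for each lag step.

    datasets: {lag_step_seconds: (X, y)}. Returns {lag_step: (mean_acc, std_acc)}.
    """
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    results = {}
    for step, (X, y) in datasets.items():
        scores = cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
        results[step] = (float(np.mean(scores)), float(np.std(scores)))
    return results


def pick_step(results, tol=0.01):
    """Hypothetical rule: the coarsest (cheapest) step within tol of the best mean accuracy."""
    best = max(mean for mean, _ in results.values())
    return max(step for step, (mean, _) in results.items() if mean >= best - tol)


if __name__ == "__main__":
    # Placeholder feature sets standing in for two lag steps (not the study's data).
    demo = {
        0.5: make_classification(n_samples=200, n_features=10, class_sep=0.3, random_state=0),
        0.1: make_classification(n_samples=200, n_features=10, class_sep=3.0, random_state=0),
    }
    res = sweep_lag_steps(demo, SVC())
    for step, (m, s) in sorted(res.items(), reverse=True):
        print(f"step {step:>5} s: accuracy {m:.2f} +/- {s:.2f}")
    print("chosen step:", pick_step(res))
```

Preferring the coarsest acceptable step mirrors the abstract's point that a coarser configuration can be attractive when processing cost matters.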
With SMOTE, overall stability improved further and error dispersion contracted. However, accuracy dropped modestly at steps coarser than 0.01 s, a drop attributable to the limited representativeness of some synthetic samples. The best performance came from pairing SMOTE with the 0.01 s step, which reached an epicentral-region classification accuracy of 0.93, a gain of 5.7% over the corresponding non-SMOTE result.
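SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest neighbours of the same class. The minimal NumPy version below oversamples every class up to the majority count. The function name `smote`, the neighbour count `k`, and the seed are illustrative. In practice the `imbalanced-learn` package provides a maintained `SMOTE` class. Because each synthetic point lies on a segment between two real minority samples, it cannot cover parts of a class's distribution that the real samples never reach.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Oversample every minority class to the majority-class count (SMOTE-style).

    Each synthetic row is x_i + u * (x_nn - x_i), where x_nn is one of the k
    nearest same-class neighbours of x_i and u is uniform in [0, 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        need = int(target - n)
        if need == 0:
            continue
        if n < 2:
            raise ValueError(f"class {cls!r} needs at least 2 samples for SMOTE")
        Xc = X[y == cls]
        kk = min(k, n - 1)
        # Pairwise squared distances within the class; exclude self-matches.
        d2 = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        neigh = np.argsort(d2, axis=1)[:, :kk]            # (n, kk) neighbour indices
        base = rng.integers(0, n, size=need)              # seed samples
        pick = neigh[base, rng.integers(0, kk, size=need)]
        gap = rng.random((need, 1))                       # interpolation factors
        X_parts.append(Xc[base] + gap * (Xc[pick] - Xc[base]))
        y_parts.append(np.full(need, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(5.0, 1.0, (10, 3))])
    y = np.array([0] * 50 + [1] * 10)
    X_res, y_res = smote(X, y, k=3)
    print("class counts after SMOTE:", np.bincount(y_res))
```

To keep evaluation honest, resampling should be applied only to the training portion of each split, never to the full dataset before splitting.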
These findings demonstrate that the proposed workflow can deliver accurate, repeatable epicentral estimates in data-limited environments, supporting real-time decision-making without the need for comprehensive subsurface models. Furthermore, where computational resources are constrained, the 0.1 s configuration without SMOTE remains a well-balanced option that combines high classification accuracy with modest processing cost.