Estimating the location of induced seismic sources using seismic station records and artificial intelligence

Parsa, Hooman; Radan, Mohammad Yaser

doi:10.30499/ijg.2025.524794.1699

Document Type: Research Article

Authors

Hooman Parsa ¹

Mohammad Yaser Radan ²

¹ M.Sc. in Civil Engineering, Faculty of Engineering, K. N. Toosi University of Technology, Tehran, Iran

² Assistant Professor, Faculty of Passive Defense, Malek Ashtar University of Technology, Tehran, Iran

Abstract

Induced seismic events triggered by human activities such as subsurface fluid extraction and injection can jeopardize the integrity of critical infrastructure. The multistage framework proposed here does not require exhaustive geological models or dense seismic arrays, yet it estimates the epicentral region of each event accurately and reliably. To derive region-based labels for the supervised classifiers, K-means clustering was first applied to the latitude–longitude coordinates of all recorded events; the resulting cluster assignments were adopted as class labels, providing an objective, data-driven regional segmentation for subsequent training.
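
A minimal sketch of this labelling step follows. It assumes plain Lloyd's k-means with Euclidean distance on raw latitude and longitude, which is a simplification that is only reasonable for a compact study area. The helper names `kmeans_region_labels` and `_nearest`, the random initialisation and the toy coordinates are hypothetical. The abstract does not state the number of regions `k`. With scikit-learn, `sklearn.cluster.KMeans(n_clusters=k).fit_predict(coords)` would serve the same purpose.

```python
import numpy as np


def _nearest(coords, centroids):
    """Index of the closest centroid for every event (Euclidean, in degrees)."""
    d = np.linalg.norm(coords[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans_region_labels(coords, k, n_iter=100, seed=0):
    """Lloyd's k-means on (latitude, longitude) pairs -> (labels, centroids)."""
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct events picked at random
    centroids = coords[rng.choice(len(coords), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(coords, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        new = np.array([
            coords[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Recompute labels so they always refer to the returned centroids
    return _nearest(coords, centroids), centroids


# Hypothetical epicentres: two well-separated groups of three events
events = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
region, centres = kmeans_region_labels(events, k=2)
```

The returned `region` array then serves as the class label of each event in the supervised stage.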
   In the initial processing stage, three-component seismic recordings were pre-processed with the short-term-average to long-term-average (STA/LTA) ratio to identify and correct abrupt baseline offsets. The cleaned records were then paired and cross-correlated to form cross-correlation matrices at four lag steps (0.5, 0.1, 0.05 and 0.01 s), capturing relative information between records across multiple temporal scales. Recursive feature elimination with cross-validation (RFECV) selected the most informative subset of correlation coefficients, substantially reducing dimensionality while preserving discriminative power. These feature vectors were fed to a soft-voting (probability-averaging) ensemble that couples a support-vector machine (SVM) with an extreme-gradient-boosting (XGBoost) classifier, combining the margin-maximizing strength of the SVM with the nonlinear learning capacity of boosted decision trees.
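
The two signal-level steps can be sketched as follows, under stated assumptions. `sta_lta` is the classic energy-based ratio with trailing windows. ObsPy offers a comparable routine, `obspy.signal.trigger.classic_sta_lta`. The abstract does not say how detected offsets are corrected, so the sketch stops at the characteristic function. `xcorr_features` is a hypothetical reading of "cross-correlation at a given lag step": it computes the normalised correlation of every record pair, sampled at multiples of `step` seconds within ±`max_lag`. Single-component, equal-length traces, the sampling rate and `max_lag` are all assumptions.

```python
import itertools

import numpy as np


def sta_lta(x, n_sta, n_lta):
    """Classic STA/LTA on signal energy with trailing windows.

    ratio[i] = mean(x[i-n_sta+1 : i+1]**2) / mean(x[i-n_lta+1 : i+1]**2);
    the first n_lta - 1 samples stay 0 until the long window is full.
    """
    energy = np.asarray(x, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))  # csum[j] = sum(energy[:j])
    i = np.arange(n_lta - 1, len(energy))
    sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
    lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
    ratio = np.zeros(len(energy))
    ratio[i] = sta / np.maximum(lta, 1e-12)  # guard against all-zero windows
    return ratio


def xcorr_features(traces, fs, step, max_lag):
    """Hypothetical feature vector: normalised cross-correlation of every
    record pair, sampled at lags k*step (seconds) within [-max_lag, max_lag].
    Traces must share one length, longer than max_lag * fs samples."""
    # z-score each trace so the zero-lag value is the Pearson coefficient
    z = []
    for tr in traces:
        t = np.asarray(tr, dtype=float)
        z.append((t - t.mean()) / (t.std() + 1e-12))
    lags = np.round(np.arange(-max_lag, max_lag + step / 2, step) * fs).astype(int)
    feats = []
    for a, b in itertools.combinations(range(len(z)), 2):
        n = len(z[a])
        full = np.correlate(z[a], z[b], mode="full") / n  # index n - 1 is zero lag
        feats.extend(full[n - 1 + lags])
    return np.asarray(feats)


# Toy check with synthetic noise (not real records): fs = 100 Hz, 5 s traces
rng = np.random.default_rng(1)
sig = rng.standard_normal(500)
feats = xcorr_features([sig, sig, np.roll(sig, 10)], fs=100, step=0.1, max_lag=0.5)
```

Suppose `fs = 100` Hz and `max_lag = 0.5` s. Moving `step` from 0.5 s to 0.01 s then raises the number of lags per pair from 3 to 101. This is one way to see why the finer steps call for a feature-selection stage such as RFECV.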
   Model development was conducted twice, first on the raw, imbalanced data and then on data balanced with the Synthetic Minority Over-sampling Technique (SMOTE), to quantify the influence of class imbalance. Without SMOTE, decreasing the correlation lag step from 0.5 s to 0.1 s raised the classification accuracy for epicentral region assignment from 0.73 to 0.90 and markedly reduced the standard deviation of epicentral errors, indicating greater solution stability. Still finer steps (0.05 s and 0.01 s) made the model increasingly sensitive to high-frequency noise: accuracy gains saturated and the variance increased slightly. The 0.1 s step therefore emerged as the best trade-off between resolution and robustness.
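
Soft voting itself reduces to averaging the class-probability vectors of the two classifiers and taking the most probable region. The sketch below assumes equal weights and treats the probabilities as given. In scikit-learn they would typically come from `SVC(probability=True)` and `xgboost.XGBClassifier`. Alternatively, the whole ensemble could be built with `VotingClassifier(estimators=..., voting="soft")`. The helper names `soft_vote` and `accuracy` are hypothetical. The probability values are invented for illustration and are not taken from the study.

```python
import numpy as np


def soft_vote(prob_svm, prob_xgb, w_svm=0.5, w_xgb=0.5):
    """Soft voting: weighted mean of per-class probabilities, then argmax.

    prob_* are (n_events, n_regions) arrays whose rows sum to 1.
    Returns (predicted region index, averaged probabilities).
    """
    p = (w_svm * np.asarray(prob_svm, dtype=float)
         + w_xgb * np.asarray(prob_xgb, dtype=float)) / (w_svm + w_xgb)
    return p.argmax(axis=1), p


def accuracy(y_true, y_pred):
    """Fraction of events assigned to the correct region."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Illustrative probabilities for three events and two regions (not study data)
p_svm = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
p_xgb = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
y_true = [0, 1, 0]
pred, prob = soft_vote(p_svm, p_xgb)
```

In the third toy event, the SVM alone favours region 1 ([0.45, 0.55]). Averaging with the XGBoost output ([0.7, 0.3]) gives [0.575, 0.425] and selects region 0.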
   With SMOTE, overall stability improved further and error dispersion narrowed, although accuracy dropped modestly at steps coarser than 0.01 s, which is attributable to the limited representativeness of some synthetic samples. The best performance came from pairing SMOTE with the 0.01 s step, which achieved a classification accuracy of 0.93 in epicentral region assignment, a gain of 5.7% over the non-SMOTE result.
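
For reference, SMOTE builds each synthetic sample by interpolating between a minority-class sample and one of its k nearest same-class neighbours. The NumPy sketch below uses a hypothetical helper `smote` that oversamples every class to the majority count. It uses k = 5, mirroring the imbalanced-learn default. Every synthetic point lies on a straight segment between existing feature vectors, which hints at why some synthetic samples may represent real events poorly. In practice, `imblearn.over_sampling.SMOTE().fit_resample(X, y)` is the standard implementation.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE: oversample each minority class up to the majority count.

    Every synthetic sample lies on the segment between a minority sample
    and one of its k nearest neighbours from the same class.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    X_out, y_out = [X], [y]
    for c, n_c in zip(classes, counts):
        n_syn = counts.max() - n_c
        if n_syn == 0 or n_c < 2:  # nothing to add, or no neighbour to interpolate with
            continue
        Xc = X[y == c]
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        kk = min(k, n_c - 1)
        nn = np.argsort(d, axis=1)[:, :kk]  # k nearest same-class neighbours
        base = rng.integers(0, n_c, size=n_syn)
        neigh = nn[base, rng.integers(0, kk, size=n_syn)]
        gap = rng.random((n_syn, 1))  # interpolation factor in [0, 1)
        X_out.append(Xc[base] + gap * (Xc[neigh] - Xc[base]))
        y_out.append(np.full(n_syn, c))
    return np.vstack(X_out), np.concatenate(y_out)


# Toy imbalanced set (not study data): 8 events in region 0, 2 in region 1
X = np.vstack([5.0 + np.random.default_rng(0).random((8, 2)), [[0.0, 0.0], [1.0, 1.0]]])
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = smote(X, y)
```

Oversampling is normally applied to the training folds only, so that synthetic samples never enter the evaluation data.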
   These findings demonstrate that the proposed workflow can deliver accurate, repeatable epicentral-region estimates in data-limited environments, supporting real-time decision-making without comprehensive subsurface models. Where computational resources are constrained, the 0.1 s configuration without SMOTE remains a well-balanced option that combines high classification accuracy with modest processing cost.

Keywords

Seismic event source localization, cross-correlation, XGBoost, SVM
