Iranian Journal of Geophysics

Reduction of the data required for training deep learning models based on clustering of the data and its application in one-dimensional magnetotelluric inversion

Document Type: Research Article

Authors
1 Ph.D. Student, Institute of Geophysics, University of Tehran, Tehran, Iran
2 Assistant Professor, Institute of Geophysics, University of Tehran, Tehran, Iran
Abstract
Data-driven deep learning approaches must contend with the need to generate large amounts of high-quality data and with the heavy computational cost and long training times that such data volumes impose. Because they can approximate complex nonlinear mappings, deep networks can be used effectively in geophysical inverse problems, and in many applications deeper networks generalize better. In this research, the data are split by first clustering them and then assigning a set percentage of each cluster to the training, validation and test sets. The Kolmogorov-Smirnov (KS) test, applied to compare the distributions of the three sets obtained in this way, indicates that the training, validation and test data follow the same distribution. A deep learning model based on a modified U-Net architecture is trained for one-dimensional inversion of magnetotelluric (MT) data, which is a highly nonlinear regression problem. Training is supervised and uses error backpropagation, so the inputs are presented to the network together with their corresponding outputs as training samples. For this purpose, a five-layer geoelectric model is considered that simulates the conditions of a geothermal field. Using an MT forward modeling algorithm, the responses of this one-dimensional geoelectric model are calculated analytically at 13 frequencies uniformly spaced on a logarithmic scale between 0.01 and 100 Hz, and a total of 500,000 samples are generated. The layer thicknesses are variable and are treated as part of the output. The input and output variables are scaled in a pre-processing step before training, and the network outputs are post-processed to return them to their original range. The network is trained with the mean squared error (MSE) loss function and the Adam optimizer.
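As an illustration of the forward modeling step, the sketch below computes apparent resistivity and phase for a layered model with the standard recursive impedance formula for a one-dimensional earth. The abstract does not state which forward algorithm was used, so the function name `mt1d_forward`, the helper `log_spaced`, and the layer resistivities and thicknesses are assumptions chosen only for demonstration; the 13 log-spaced frequencies between 0.01 and 100 Hz follow the text.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)


def mt1d_forward(resistivities, thicknesses, frequencies):
    """Return a list of (apparent resistivity [ohm-m], phase [deg]) pairs.

    resistivities: N layer resistivities (ohm-m), top to bottom; the last
                   entry is the underlying half-space.
    thicknesses:   N-1 layer thicknesses (m), top to bottom.
    """
    responses = []
    for f in frequencies:
        omega = 2.0 * math.pi * f
        # Impedance at the top of the bottom half-space.
        z = cmath.sqrt(1j * omega * MU0 * resistivities[-1])
        # Propagate the impedance upward through each overlying layer.
        for rho, h in zip(reversed(resistivities[:-1]), reversed(thicknesses)):
            w = cmath.sqrt(1j * omega * MU0 * rho)  # intrinsic impedance of the layer
            k = cmath.sqrt(1j * omega * MU0 / rho)  # induction parameter of the layer
            e = cmath.exp(-2.0 * k * h)             # two-way attenuation across the layer
            r = (w - z) / (w + z)                   # reflection coefficient at the layer base
            z = w * (1.0 - r * e) / (1.0 + r * e)
        rho_a = abs(z) ** 2 / (omega * MU0)
        phase = math.degrees(cmath.phase(z))
        responses.append((rho_a, phase))
    return responses


def log_spaced(f_min, f_max, n):
    """n frequencies uniformly spaced on a log10 scale between f_min and f_max."""
    step = (math.log10(f_max) - math.log10(f_min)) / (n - 1)
    return [10.0 ** (math.log10(f_min) + i * step) for i in range(n)]


# Hypothetical five-layer model (values are illustrative, not from the paper).
freqs = log_spaced(0.01, 100.0, 13)
rho_layers = [100.0, 10.0, 300.0, 5.0, 1000.0]  # ohm-m
thk_layers = [200.0, 500.0, 1000.0, 1500.0]     # m

for f, (rho_a, phase) in zip(freqs, mt1d_forward(rho_layers, thk_layers, freqs)):
    print(f"{f:10.4f} Hz  rho_a = {rho_a:10.3f} ohm-m  phase = {phase:6.2f} deg")
```

For a uniform half-space the recursion returns the layer resistivity and a 45° phase at every frequency, which is a convenient sanity check before generating a large training set.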
The network is trained with different amounts of data split by this method, and its performance is evaluated with quantitative and qualitative criteria, including boxplots of the Euclidean distance between true and predicted outputs and Nash-Sutcliffe efficiency coefficients. The trained network predicts the electrical resistivity and thickness of the layers from new sets of apparent resistivity and phase values. The results show that splitting the data in this manner reduces the number of samples required to train the deep learning model by at least 50% without reducing the accuracy of the trained network. For noisy data and in more realistic scenarios, random splitting is not a suitable way to form the training, validation and test sets. Under these conditions, clustering is a suitable way to equalize the statistical distributions of the three sets and to reduce the amount of data required.
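A minimal sketch of the cluster-based split and the distribution check follows. It is not the authors' implementation: k-means is assumed as the clustering algorithm, the 70/15/15 split fractions and the synthetic two-feature data are hypothetical, and the KS statistic is compared with the usual large-sample critical value at the 5% level.

```python
import bisect
import math
import random
from collections import defaultdict


def kmeans_labels(points, k, iters=25, seed=0):
    """Minimal Lloyd's k-means (assumed algorithm); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def split_by_cluster(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Give every cluster the same train/val/test fractions; returns index lists."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    train, val, test = [], [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))


# Synthetic stand-in for the training samples (hypothetical, two features each).
data_rng = random.Random(1)
samples = [(data_rng.uniform(0.0, 3.0), data_rng.uniform(1.0, 4.0)) for _ in range(600)]

cluster_labels = kmeans_labels(samples, k=5)
train_idx, val_idx, test_idx = split_by_cluster(cluster_labels)

# Compare the first feature of the training set with the other two sets.
train_x = [samples[i][0] for i in train_idx]
for name, idx in (("validation", val_idx), ("test", test_idx)):
    other_x = [samples[i][0] for i in idx]
    d = ks_statistic(train_x, other_x)
    limit = ks_critical(len(train_x), len(other_x))
    print(f"train vs {name}: D = {d:.3f}, 5% critical value = {limit:.3f}")
```

A statistic below the critical value means the test does not reject the hypothesis that the two sets share a distribution. In practice the check would be repeated for every input and output variable.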
