The objective of this work is to reduce the cost of performing model-based sensitivity analysis for ultrasonic nondestructive testing systems by replacing the accurate physics-based model with machine learning (ML) algorithms and quickly compute Sobol’ indices. The ML algorithms considered in this work are neural networks (NNs), convolutional NN (CNNs), and deep Gaussian processes (DGPs). The performance of these algorithms is measured by the root mean-squared error on a fixed number of testing points and by the number of high-fidelity samples required to reach a target accuracy. The algorithms are compared on three ultrasonic testing benchmark cases with three uncertainty parameters, namely, spherically void defect under a focused and a planar transducer and spherical-inclusion defect under a focused transducer. The results show that NNs required 35, 100, and 35 samples for the three cases, respectively. CNNs required 35, 100, and 56, respectively, while DGPs required 84, 84, and 56, respectively.
Nondestructive testing (NDT) [1,2] is the process of evaluating, testing, or inspecting assemblies or components for discontinuities or damages without affecting the serviceability of the part. NDT can be performed either during manufacturing or while the component is in service. This is essential to ensure product integrity and reliability as well as lower production cost and maintaining a uniform level of product quality. NDT has been successfully used in various applications such as aircraft damage estimation [3,4], weld defect inspection [5,6], and flaw characterization [7,8].
Nondestructive testing measurements have traditionally relied on experimental methods. These methods, however, can be both time-consuming as well as costly to perform. In order to reduce this time and cost, various physics-based NDT models [9–11] have been developed and used to reduce the need for empirical data. These include numerical methods such as finite element methods [12,13] and boundary element methods [14,15], as well as approximation methods such as Gaussian beam superposition methods [16,17] and ray tracing methods [18,19]. The aforementioned physics-based models can be cheaper, both financially and computationally, when compared to experimental methods, and can also have high accuracy. This can prove essential to reducing the need for empirical data, by replacing experimental methods with physics-based models.
Sensitivity analysis [20,21] is an approach to quantify the effect of each individual input parameter or combinations of input parameters on the system response. It can be classified as either a local [22,23] or a global sensitivity analysis [24,25]. In local sensitivity analysis, small perturbations in the input parameter space are used to quantify their effects on the system response. For global sensitivity analysis, the variance of the system response due to the input parameter variability is quantified.
In this study, global variance-based sensitivity analysis with Sobol’ indices [26,27] is used to quantify the effects of input parameter variability of the output of physics-based ultrasonic testing (UT) simulations. The main elements of the model-based sensitivity analysis are: (1) identifying the important input variability parameters and their corresponding probability distributions, (2) propagating these input parameters through the physics-based model, called uncertainty propagation, and (3) performing sensitivity analysis using Sobol’ indices.
The key challenges of model-based sensitivity analysis include (1) each physics-based model can be computationally expensive to solve, (2) a large number of variability parameters can exist for the NDT systems, and (3) multiple and repetitive physics-based model evaluations are required for sensitivity analysis. This results in problems that are challenging to solve in a reasonable amount of time.
To alleviate this computational cost, various metamodeling methods [28,29] have been developed. Metamodeling methods replace the time-consuming and accurate physics-based models with a computationally efficient metamodel (also called surrogate model). Metamodeling methods can be broadly classified as either data-fit methods [28,30] or multifidelity methods [31,32]. Data-fit methods are constructed by fitting a response surface through evaluated model responses at sampled high-fidelity data points. Examples of data-fit methods include polynomial chaos expansions (PCE) , Kriging , and support vector machines . In multifidelity methods, on the other hand, low-fidelity data are used to enhance the prediction capabilities of a data-fit model constructed from a limited number of high-fidelity data points. Examples of such methods include Cokriging  and manifold mapping .
Metamodeling methods have been utilized for various NDT applications. Bilicz et al. [38,39] used Kriging  for both forward and inverse problems using eddy current NDT. Support vector regression  was used by Miorelli et al.  to perform both sensitivity analysis and probability of detection for eddy current NDT systems. Du et al.  performed sensitivity analysis and probability of detection for UT NDT systems using PCE  and Kriging . Du and Leifsson  developed the polynomial chaos-based Cokriging multifidelity method  and used it for model-based probability of detection in UT NDT systems.
This paper introduces and applies three different machine learning algorithms for sensitivity analysis of UT NDT systems. In particular, neural networks (NNs) [44,45], convolutional neural networks (CNNs) [46,47], and deep Gaussian processes (DGPs) [48,49] are introduced to the global variance-based sensitivity analysis of UT NDT systems. Note that for these applications, the machine learning algorithms used can be classified as data-fit metamodeling methods. The UT NDT cases considered in this study are three benchmark cases developed by the World Federation of Nondestructive Evaluation Centers (WFNDEC).2 The benchmark cases are (1) spherically void defect under a focused transducer, (2) spherically void defect under a planar transducer, and (3) spherical-inclusion defect under a focused transducer. The sensitivity analysis results from the machine learning algorithms are compared to those obtained from directly evaluating the high-fidelity physics-based model. In this work, the analytical UT model by Thompson and Gray [50,51] is used as the high-fidelity physics-based model.
The remainder of this paper is organized as follows. The following section describes the methods used in this paper to train the machine learning algorithms and perform sensitivity analysis. The next section presents the result of the application of these algorithms to the benchmark cases. This paper ends with the conclusion of this study and provides suggestions for future work.
The methods used in this work are described in this section. The workflow of the model-based sensitivity analysis is described first, followed by the sampling plan, a detailed description of the machine learning algorithms, their validation, and finally, the sensitivity analysis with Sobol’ indices.
A flowchart of the model-based sensitivity analysis is shown in Fig. 1. The process begins by sampling the input parameter space. Two separate sets of sampling plans are created: one for training and another for testing. The high-fidelity physics-based model simulations are then evaluated using those sampling plans to gather the output responses. The training data is then used to train the machine learning algorithms, and the accuracy of these algorithms are tested using the testing data. If this accuracy, measured in terms of the root-mean-squared error (RMSE), does not meet the threshold of standard deviation of the testing points (), a new training set with a higher number of sample points is created and the previous steps are repeated. Once the machine learning-based models are accurate enough, sensitivity analysis with Sobol’ indices  is performed.
2.2 Sampling Plan.
The process of selecting discrete samples in the input variability space is known as sampling. It is an iteration-based process in which the input variability parameters are randomly drawn from probability distributions assigned to the parameters. In this work, Latin Hypercube sampling (LHS)  is used to generate the training plan, while Monte Carlo sampling (MCS)  is used to generate the testing plan. MCS is also used to generate the sampling plan for the Sobol’ indices  calculation.
Monte Carlo sampling  starts by generating random numbers within the [0,1] interval with replacement. These random numbers are used as probabilities of the associated cumulative density functions of the variability parameters. The corresponding values can then be obtained using quantile functions. LHS [52,54] on the other hand aims at sampling the variability parameters more effectively than MCS. This is done by stratifying the probability distributions into equal intervals in the [0,1] range. Random samples are then selected from each interval. This prevents the clustering of the generated numbers. The remaining steps are the same as MCS.
2.3 Neural Networks.
NN  methods can be classified as a subclass of data-fit metamodeling methods. Any complex function can be approximated through an hierarchy of features. NNs  have multiple steps in the hierarchy, which starts with an input and ends with an output. Each step is known as a layer. The layers in-between the inputs and the outputs are called hidden layers . Figure 2 shows an example of a schematic of a NN architecture with two hidden layers. The input and output layer size depends on the dimensionality of the input and output for a given problem. For the three UT benchmark cases, the input has three features, while the output has a size of one. Each hidden layer in a NN consists of neurons, which are a fundamental unit of computation in a NN . Neurons calculate a weighted sum of the outputs from a previous hidden or input layer and outputs a nonlinear transformation of it. This nonlinear transformation is termed activation. In Fig. 2, each hidden layer has eight neurons. By changing the number of hidden layers as well as the number of neurons in each hidden layer, the NNs can approximate functions of arbitrary complexity .
Training the NN involves solving an optimization problem where the objective function is the loss function and is minimized to improve the prediction capabilities of the NN. The gradient of this objective function is calculated with respect to the parameters of the NN and is done efficiently using the backpropagation algorithm . The optimizer used is Adaptive Moments (ADAM) algorithm  and has the following steps:
- Update the biased first moment estimate:where Gk is the gradient of the loss function with respect to the parameters at a given iteration k, Vk is the exponential moving averages of the gradients (also called biased first moment estimate), and is the exponential decay rate for Vk. The recommended value for is 0.9  and is used in this study.(3)
- Update the biased second moment estimate:where Sk is the exponential moving averages of the squared gradients (also called biased second moment estimate) and is the exponential decay rate for Sk. 0.999 is the recommended value for  and is used in this study.(4)
- Correct the bias in the first moment:where the bias introduced by setting V0 to zero is corrected by .(5)
- Correct the bias in the second moment:where the bias introduced by setting S0 to zero is corrected by .(6)
- Update the parameters:where αk is the learning rate, are the NN parameter values, and is a small value that is specified in order to prevent the denominator from being zero.(7)
The hyperparameters of a NN include the number of hidden layers, number of neurons in each hidden layer, the mini-batch size, the maximum number of epochs, the learning rate, and the activation function. Rigorous rules for selecting the hyperparameter settings for an NN do not exist. In this work, various hyperparameter settings were selected by iterating over them. The hyperparameters chosen were the ones that resulted in the lowest RMSE as described in Sec. 2.6. The NN architecture used in this work includes one hidden layer with 50 neurons. The mini-batch size selected is 10. Maximum number of epochs is set to 10,000. One epoch refers to one iteration over an entire training data set . The learning rate selected is 0.01. The activation function chosen for this study is the hyperbolic tangent function . To construct the NN, the Keras3 wrapper with Tensorflow  is used in this study.
2.4 Convolutional Neural Networks.
CNNs  are a special type of NN that are used to process data with a grid-like topology , such as images. In images, each grid location contains pixel values, and when combined together results in the final image. For this study, the variability parameters are used in the place of the pixels. Since CNN employs the mathematical operation called convolution, it is named as such . Any NN with at least one convolutional layer is termed CNN. For image recognition tasks, the number of parameters in a NN can grow really fast, as it depends on the number of input features. For images, the size of the image defines the number of input features. In CNNs, the number of parameters is independent of the number of input features and is dependant on the size and number of filters of a convolutional kernel . This reduces the number of parameters to be tuned and prevents overfitting in the presence of limited data.
Figure 3 shows a schematic of a CNN architecture that contains one convolutional layer, followed by one max-pooling layer, two fully connected layers, and an output layer. The output of a convolutional layer or max-pooling layer is termed feature maps. Feature maps are similar to images and also have a grid-like topology with pixel values. The number of channels, also known as depth of a feature map, depends on the hyperparameter called number of filters. In Fig. 3, this value is four. To convert a feature map to a fully connect layer, the flatten operation is performed. Flatten converts the three dimensional feature map into a one-dimensional fully connected layer. Similar to a NN, the number of these layers can vary.
The input image in Fig. 3 has one channel (gray image) and has 32 pixels (grids) in the horizontal and vertical directions each. Note that colored images have three channels, namely, red, blue, and green, respectively. The kernel or filter, which contains parameters to be tuned, has five grids in the horizontal and vertical directions each, that is a 5 × 5 kernel. In a convolutional operation, an element-wise product between the kernel parameter value and image pixel value is performed. This process continues by moving the kernel over the image, from left to right, top to bottom. The hyperparameter stride is used to define how many pixels in both the horizontal and vertical directions to move after performing one convolutional operation.
Training the CNN is similar to training the NN. The loss function described in Eq. (2) is used as the objective function for training the CNN. The gradients are calculated using the backpropagation algorithm  and the ADAM  optimizer is used to minimize the loss function.
The hyperparameters used in the CNN are the number of convolutional and max-pooling layers and the convolutional and max-pooling kernel size. The number of filters of each convolutional kernel and the stride of each kernel is considered as well. Other hyperparameters include the number of fully connected layers, the number of neurons in each layer, the mini-batch size, the maximum number of epochs, the learning rate, and the activation function. The CNN used in this work includes one convolutional layer, one fully connected layer, and no max-pooling layer. Only one convolutional kernel of size 1 × 1 is selected and has a stride of value one. The mini-batch selected has a size of 10, while the maximum number of epochs is set to 10,000. The number of neurons in the fully connected layer is 100 and the learning rate of the ADAM  optimizer is 0.01. Hyperbolic tangent function  is selected as the activation function for this study. The process of selecting these hyperparameters is similar to those used for the NN. To construct the CNN, the Keras4 wrapper with Tensorflow  is used in this study.
One final thing to note is that, while images are not used, each variability parameter is assigned to each grid and the convolutional operation is performed on it. The input grid therefore has a size of 3 × 1. While this grid size is lower compared to most image sizes, CNNs can be easily expanded to work on NDT cases with higher number of variability parameters.
2.5 Deep Gaussian Processes.
DGPs  are multi-layer generalizations of Gaussian processes (GP), where the inputs to one GP is from the outputs of another. The architecture is similar to a NN (Fig. 2); however, the activation functions in the neurons are replaced by GP mappings. DGPs are shown to overcome the limitations of single-layer GP, while retaining its advantages . DGPs are shown to work well on limited amount of data , which is useful for NDT sensitivity analysis.
Damianou and Lawrence  developed a Bayesian training framework5 to train all the parameters of the DGP. This framework is used to construct the DGP in the present work. The details of this framework are beyond the scope of this paper and can be found in Ref. .
The hyperparameters used in the DGP include the number of hidden layers, the number of GP mappings in each hidden layer, the correlation used as the GP mapping, and the total number of training iterations. Similar to NNs and CNNs, these hyperparameters were chosen that resulted in the lowest RMSE. For this study, the DGP used had one hidden layer, with a Matern-5/2 correlation  function for the hidden and output layer. The total number of iterations used was set to 1500.
2.7 Model-Based Sensitivity Analysis.
3 Numerical Examples
In this section, the three machine learning algorithms are applied to three UT benchmark cases developed by the WFNDEC. The accuracy of the sensitivity analysis results are compared to those obtained from directly sampling the high-fidelity physics-based models.
3.1 Description of the Benchmark Cases.
The three UT benchmark cases considered are the spherically void defect under a focused transducer (Case 1), spherically void defect under a planar transducer (Case 2), and the spherical-inclusion defect under a focused transducer (Case 3). The setup for these cases are shown in Fig. 4. Each of these cases has three variability parameters as inputs. Cases 1 and 3 have the probe angle , the x-coordinates (x), and the frequency related F-number (F) as variability parameters. For Case 2, the F-number is replaced with the y-coordinates (y). Table 1 lists the variability parameters along with their input distributions. The output response is the voltage waveform at the receiver.
|Parameters||Case 1||Case 2||Case 3|
|(deg)||N(0, 0.52)||N(0, 0.52)||N(0, 0.52)|
|x (mm)||U(0, 1)||U(0, 1)||U(0, 1)|
|y (mm)||N/A||U(0, 1)||N/A|
|F||U(13, 15)||N/A||U(8, 10)|
|Parameters||Case 1||Case 2||Case 3|
|(deg)||N(0, 0.52)||N(0, 0.52)||N(0, 0.52)|
|x (mm)||U(0, 1)||U(0, 1)||U(0, 1)|
|y (mm)||N/A||U(0, 1)||N/A|
|F||U(13, 15)||N/A||U(8, 10)|
The Thompson–Grey analytical model  is used as the high-fidelity physics-based model in this study. The center frequency of the transducer is 5 MHz. The density of the fused quartz block is 2,000 kg/m3. The longitudinal wave speed is 5969.4 m/s, while the shear wave speed is 3774.1 m/s. A detailed description of the UT testing model can be found in Schmerr and Song , while the validation of this model with experimental data can be found in Du et al. . Figure 5 shows the validation of the time-domain waveform obtained from the high-fidelity physics-based model and is compared with experimental data for Case 2. The physics-based model matches the experimental results well.
In Sec. 2, as discussed, the machine learning algorithms are required to have a global accuracy, measured in terms of RMSE, of less than , before performing sensitivity analysis. For the three UT cases, the convergence of the RMSE with increasing number of high-fidelity training points is performed at a single defect radius size (a) of 0.5 mm and is shown in Figs. 6–8. For each machine learning algorithm and for each high-fidelity sample size, ten different LHS are generated to account for the variation in the input variability space. The RMSE plots in Figs. 6–8 show both the mean as well as the standard deviation in the RMSE arising due to the different input samples generated. In all these cases, the number of testing points is fixed and contains 1,000 high-fidelity MCS. Figures 6–8 also show the and values.
In Figs. 6–8, the RMSE decreases with increasing number of high-fidelity sample points. For Case 1, both NNs and CNNs perform similarly and require around 35 high-fidelity sample points to reach the target global accuracy. DGP for this case requires around 84 high-fidelity samples to reach this same threshold. For Case 2, DGP requires 84 high-fidelity samples and outperforms both NNs and CNNs, which require 100 high-fidelity samples each. Finally, in Case 3, NN outperforms both CNN and DGP. NN requires 35 high-fidelity samples, while the other two machine learning algorithms require 56 samples each. Using the same number of high-fidelity samples as those required to reach the global accuracy, the machine learning algorithms were trained for different defect sizes ranging from 0.1 mm to 0.5 mm and this error is now measured using NRMSE.
Figures 9–11 show the NRMSE as a variation in the defect size for the three UT cases. For all these cases, NRMSE is nearly constant and within the global accuracy, indicating that these machine learning algorithms are robust under varying defect sizes. Note that similar to the RMSE convergence plots, the NRMSE plots are generated using ten different samples at each defect size and for each machine learning algorithm. Figures 9–11 show both the mean and the standard deviation in the NRMSE.
To perform the sensitivity analysis with Sobol’ indices, ten different sets containing 75,000 MCS each were generated for each of the machine learning algorithms and for each UT case. This analysis was only performed at a defect size of a = 0.5 mm. The trained machine learning algorithms were used to provide the model response for all the generated MCS. These model responses were then used to calculate the Sobol’ indices. The same number of MCS were also generated to directly evaluate the physics-based models in order to perform sensitivity analysis. Figures 12–14 show the first-order Sobol’ indices for the three UT cases, while Figs. 15–17 show the total-order Sobol’ indices for these UT cases. The black lines in these plots show the standard deviations due to the ten different sample sets chosen. For Case 1, the F-number has negligible effect on the model response as seen in Figs. 12 and 15, respectively. The same is true for Case 3, shown in Figs. 14 and 17, respectively. In Case 2, the y-coordinates have a small effect on the model response as seen in Figs. 13 and 16, respectively. For all the different cases, the Sobol’ indices values from the machine learning algorithms match well with those from the physics-based models.
Three different machine learning algorithms, namely, neural networks, convolutional neural networks, and deep Gaussian processes, were used to perform model-based sensitivity analysis for ultrasonic testing systems using three benchmark cases developed by the WFNDEC. First, the global accuracy of these algorithms was measured and the number of high-fidelity samples required to reach the desired global accuracy was noted. These globally accurate algorithms were then used to generate model responses in order to perform sensitivity analysis using Sobol’ indices. The sensitivity analysis results also matched well with those obtained by directly using the physics-based model.
This study shows that NN, CNN, and DGP machine learning algorithms can be used to provide fast and accurate sensitivity results values. Performing sensitivity analysis can assist in deciding which variability parameters need to be considered while performing physical experiments, which can reduce both cost and time of the experiments. Future work will include cases with non-spherical defects as well as cases with a larger number of variability parameters.
See Note 3.
The authors are supported in part by NSF Award No. 1846862 and the Iowa State University Center for Nondestructive Evaluation Industry-University Research Program.
Conflict of Interest
There are no conflicts of interest.