
# Improving Machined Surface Shape Prediction by Integrating Multi-Task Learning With Cutting Force Variation Modeling

Author and Article Information
Chenhui Shao

Department of Mechanical
Science and Engineering,
University of Illinois at Urbana-Champaign,
Urbana, IL 61801
e-mail: chshao@illinois.edu

Jie Ren

Department of Industrial and
Manufacturing Engineering,
Florida State University,
Tallahassee, FL 32310
e-mail: jr14r@my.fsu.edu

Hui Wang

Department of Industrial and
Manufacturing Engineering,
Florida State University,
Tallahassee, FL 32310
e-mail: hwang10@fsu.edu

Jionghua (Judy) Jin

Department of Industrial and
Operations Engineering,
University of Michigan,
Ann Arbor, MI 48109
e-mail: jhjin@umich.edu

S. Jack Hu

Department of Mechanical Engineering,
University of Michigan,
Ann Arbor, MI 48109;
Department of Industrial and
Operations Engineering,
University of Michigan,
Ann Arbor, MI 48109
e-mail: jackhu@umich.edu

1Corresponding author.

Manuscript received April 23, 2016; final manuscript received August 5, 2016; published online September 29, 2016. Assoc. Editor: Radu Pavel.

J. Manuf. Sci. Eng. 139(1), 011014 (Sep 29, 2016) (11 pages). Paper No: MANU-16-1244; doi: 10.1115/1.4034592

## Abstract

The shapes of machined surfaces play a critical role in powertrain performance, and therefore, it is necessary to characterize these shapes with high resolution. State-of-the-art approaches to surface shape characterization are mostly data-driven, interpolating and extrapolating the spatial data, but their precision is limited by the density of measurements. This paper explores a new opportunity for improving surface shape prediction by considering the similarity among multiple similar manufacturing processes. A common scenario is one in which the process of interest lacks sufficient data whereas rich data are available from other similar-but-not-identical processes. It is then reasonable to transfer the insights gained from those relevant processes into the surface shape prediction. This paper develops an engineering-guided multitask learning (EG-MTL) surface model by fusing surface cutting physics with the spatial data from a number of similar-but-not-identical processes. An iterative multitask Gaussian process learning algorithm is developed to learn the model parameters. Compared with conventional multitask learning, the proposed method has the advantage of incorporating insights on cutting force variation during machining and is potentially able to improve the prediction performance given limited measurement data. The methodology is demonstrated using data from real-world machining processes in an engine plant.


## Introduction

The control of surface shape variation on a machined surface is of great importance in automotive powertrain manufacturing because such variation has a significant influence on the sealing performance and causes distortion during surface assembly, such as those between engine heads and blocks or upper and lower transmission valve bodies. Characterization of the surface shape with high precision has become critical in controlling surface variations to ensure the functional performance of surface assembly.

Various high-resolution surface measurement systems have been developed, including profilometers and laser holographic interferometers. However, the acquisition of high-resolution measurements using those systems is time-consuming and expensive. Moreover, such measurements may not be robust in response to environmental disturbance. For example, the measurement accuracy based on a laser holographic interferometer system can be affected by the heat dissipated from a machined part, measurement table vibration, and surface contamination, thus resulting in data loss and measurement errors, as shown in Fig. 1. More recent advancement in surface sensing technology is reviewed by Willomitzer et al. [1]. However, fast and high-resolution surface metrology systems have not been widely adopted in quality control for powertrain production.

Various approaches have been developed to make fine-resolution evaluations of the entire surface shape based on low- or multiresolution surface measurements. These approaches include interpolating and extrapolating surface shapes from measured points through least squares [2], B-spline methods [3,4], and grid fit through triangulation [5]. In spatial statistics, the surface shape is estimated by considering spatial correlations among sampled points. Such correlations reflect the spatial similarities between data in the vicinity on the part surface, and have been extensively utilized to interpolate and extrapolate surface data for form error estimation in machining processes [6–8], modeling of spatial wafer variations [9,10], and geographic understanding in remote sensing applications [11,12]. One limitation of these data-driven approaches is that the precision of the surface predictions is constrained by the density of measurements. To address this limitation, process knowledge can be integrated with surface measurements to cost-effectively model surface shapes [13,14]. The prediction improvement was achieved by fusing multiresolution measurements with cutting force dynamics during surface manufacturing. In summary, state-of-the-art surface modeling techniques mostly focus on the interpolation/extrapolation of data and process information from a single manufacturing process. Additionally, most of the data-driven methods lack engineering insights.

This paper will explore the new opportunities of improving surface shape modeling by considering the commonality between multiple manufacturing processes. The idea is motivated by a common manufacturing scenario when the process of interest lacks sufficient data, whereas rich data could be available from other similar-but-not-identical processes. For instance, an automaker plans to build a new engine plant where there is a lack of historical data to establish a baseline for surface quality control. In the meantime, other engine plants have been running for decades, and therefore, the relevant historical surface data are extensively available. Though the face milling processes in the existing plants were developed for different types of engines, the cutting mechanism for the same materials could be very similar, and thus, those process data can partially contribute to the prediction of the surface shapes in the new plant (Fig. 2(a)). A similar scenario applies to a parallel machining station where multiple machines produce the same type of parts in parallel (Fig. 2(b)). Surface data from the parallel machines can be utilized to jointly learn the surface shape models for all machines.

It should be noted that the current multitask learning for Gaussian processes is data-driven and does not incorporate engineering insight into the model prediction. For example, our prior research has uncovered a positive correlation between surface height and effective material removal rate [20,21] through cutting force dynamics and spatial data modeling. Incorporating cutting-force-variation-induced surface modeling can help improve surface shape prediction. This paper aims to improve surface shape prediction by fusing cutting force variation modeling with surface measurement data from a number of similar machining processes via iterative multitask Gaussian process learning. We first establish a surface model consisting of a global trend that is induced by cutting force dynamics and a local variation term that can be characterized by a zero-mean Gaussian process. The global trend captures the overall shape variation across the surface, and can be well explained by the process physics [13]. An iterative multitask Gaussian process learning algorithm is then developed to learn the parameters of the global trend and the local variation. The algorithm will be demonstrated based on engine head deck face machining processes.

The remainder of this paper is organized as follows. First, the surface model is presented in Sec. 2. Section 3 presents a case study to compare the proposed modeling approach with existing methods. Then, Sec. 4 discusses the selection of hyperparameters, effects of the sample size and the number of tasks, and the effect of the correlation strength between the machining process variable and surface height. Finally, Sec. 5 concludes the paper.

## Surface Shape Modeling by Integrating Cutting Force Variation With Multitask Learning

In this section, the engineering-guided multitask learning (EG-MTL) surface model which integrates engineering physics with multitask learning is first presented, and then, an iterative algorithm is developed to estimate the parameters of the surface model.

###### Engineering-Guided Multitask Learning Surface Model.

Traditional multitask learning approaches for Gaussian processes are data-driven, and do not incorporate engineering physics into the model. This paper develops a new EG-MTL surface model that integrates cutting force variation modeling with multitask learning, aiming to improve the surface prediction accuracy based on limited measurement data. Assume there are m similar-but-not-identical surfaces or surface manufacturing processes; for surface l, the surface model takes the following form:

(1) $Z_l(s) = f(U_l(s); \beta_l) + \eta_l(s) + \varepsilon$

where l is the surface index, with $l = 1, \dots, m$; $s = (x, y)$ is the coordinate vector; $Z_l(s)$ is the surface height at location s on surface l; $U_l(s)$ is a covariate vector of location-dependent, highly correlated process variables identified by expert knowledge; $f(\cdot)$ is a generic function capturing the correlation between the global trend of the surface and process-specific engineering factors, e.g., material removal rate; $\eta_l(s)$ is a zero-mean Gaussian process; and $\varepsilon \sim N(0, \psi^2)$.

In model (1), the surface variation is decomposed into a global trend that is induced by process settings and a local variation part that is modeled by a zero-mean Gaussian process. The global trend is modeled by a linear combination of multiple process variables. When there are both categorical and continuous process variables, a generalized linear model may be used instead [22,23]. The local variation is modeled as a zero-mean Gaussian process, and will be jointly learned across multiple similar-but-not-identical surfaces.

The rationale behind this decomposition is as follows. First, the model $f(U_l(s); \beta_l)$ is employed to capture the relationship between the process inputs and the surface height in face milling processes. It is reported that the surface height variation in a face milling process is strongly influenced by a number of engineering factors [13,20,21], including (1) the product/surface design that characterizes the design patterns of a surface, e.g., the size, shape, and spatial distribution of holes and slots, (2) physical attributes of part materials, such as defects and heterogeneous physical attributes caused by manufacturing flaws from suppliers, (3) manufacturing process conditions, such as feed rate, spindle tilt, spindle speed, depth of cut, cutter path, and clamping force, and (4) multistage interdependence that characterizes the effects of downstream stages on the surface shapes created in the upstream stages. For details, please refer to Ref. [13]. The functional form of $f(\cdot)$ should be determined based on a thorough understanding of the process physics. For instance, in a face milling process, the global trend can be approximated by a linear term [13,20], i.e., $f(U(s); \beta) = [I, \mathrm{MRR}(s)]\beta$, where I is a column vector of ones, $\mathrm{MRR}(s)$ is a column vector representing the material removal rate (MRR) at location s, and $\beta = [\beta_0, \beta_1]^T$ is a 2 × 1 vector. This linear representation reflects the fact that higher axial cutting force variations due to MRR variation may result in larger displacements between the surface and the cutter, causing significant surface height variations. Second, the Gaussian process has been proven to be an effective method for modeling the local random surface variation that exhibits spatial distribution similarity [13,24,25].
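The decomposition above can be sketched with a small simulation: a linear global trend $[1, \mathrm{MRR}(s)]\beta$ plus a zero-mean Gaussian process and measurement noise. The grid size, coefficient values, MRR field, kernel length-scale, and noise level below are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Locations s = (x, y) on a small 10 x 10 grid (hypothetical).
xs, ys = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
S = np.column_stack([xs.ravel(), ys.ravel()])            # 100 x 2

# Global trend f(U(s); beta) = [1, MRR(s)] beta, with a synthetic MRR field.
mrr = 1.0 + 0.5 * np.sin(2 * np.pi * S[:, 0])            # stand-in for MRR(s)
beta = np.array([2.0, 1.5])                              # [beta_0, beta_1], assumed
trend = np.column_stack([np.ones(len(S)), mrr]) @ beta

# Local variation eta(s): zero-mean GP with a squared-exponential covariance.
d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)                                    # delta^2 = 0.1 (assumed)
eta = rng.multivariate_normal(np.zeros(len(S)), K + 1e-8 * np.eye(len(S)))

# Measurement noise epsilon ~ N(0, psi^2).
Z = trend + eta + rng.normal(0, 0.05, len(S))
print(Z.shape)
```

Repeating this with task-specific `beta` and correlated `eta` draws would produce the similar-but-not-identical surfaces the model targets.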

Note: The task similarity/relatedness should be carefully evaluated prior to applying model (1). However, the definition of task relatedness is unclear [26]. In the applications where expert knowledge is available, task relatedness may be defined based on the feedback of experts [26,27]. In machining processes, model (1) may be applied to surfaces machined using the same machine or parallel machines in a manufacturing plant. When expert knowledge is limited or unavailable, a preliminary study can be conducted to evaluate the benefits brought by using MTL.

###### Iterative Algorithm for Parameter Estimation of the EG-MTL Model.

The parameters in model (1) that need to be estimated include: (1) the parameter vector in the global trend, $βl$, and (2) the parameters in the zero-mean Gaussian process, $ηl(s)$, where $l=1,…,m$. The global trend describes the overall spatial variation pattern on a surface, and is mostly determined by the process settings. Thus, $βl$ reflects the overall surface variation and is not location dependent, but it varies with different surfaces. The local variation part shares similarity across multiple surfaces, and multitask learning can potentially improve the estimation accuracy. Figure 4 illustrates the scheme for learning the parameters. As indicated by Fig. 4, three information sources are fused in the spatial model, namely, (1) surface measurement from the process of interest, (2) cutting force variation, and (3) measurement data from other similar-but-not-identical processes. A summary of the difference between the proposed model and existing approaches is given below.

• Traditional multitask learning approaches only employ information sources (1) and (3), and do not incorporate surface cutting physics;

• Simple kriging interpolation adopts data source (1) only; and

• More advanced kriging approaches, e.g., co-kriging and kriging with external drift (KED), jointly utilize information sources (1) and (2), but cannot transfer knowledge from other relevant processes.

In the remainder of this section, we first review the multitask learning algorithm for the joint learning of multiple zero-mean Gaussian processes, and then, present an iterative algorithm for estimating the parameters in model (1).

A number of multitask learning algorithms for Gaussian processes are available in the literature, such as Refs. [17,18]. The performance of different approaches depends on the data structure. Without loss of generality, this study reviews the approach developed by Yu et al. [18] to illustrate the concept.

The goal of multitask learning for Gaussian processes is to estimate m related functions $\eta_l$, l = 1,…, m, based on training data represented by the following equation:

(2) $D_l = (S_l, \eta_l)$

where $S_l \in \mathbb{R}^{n_l \times d}$ is the set of inputs for function l, $\eta_l \in \mathbb{R}^{n_l}$ is the set of function values at these inputs, and $n_l$ is the size of the training data for task l. In surface modeling, the location is generally determined by two coordinates (x and y), and thus d = 2. It is further assumed that there are in total n distinct data points in $\{D_l\}$ with $\min(\{n_l\}) \le n \le \sum_{l=1}^{m} n_l$.
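The data layout and the bound on the pooled input count can be made concrete with a toy example; the coordinates and task sizes below are arbitrary illustrations.

```python
import numpy as np

# Toy training inputs S_l for m = 3 tasks (function values omitted).
S1 = np.array([[0, 0], [1, 0], [2, 0]])
S2 = np.array([[1, 0], [2, 0], [3, 0], [4, 0]])
S3 = np.array([[0, 0], [4, 0]])
tasks = [S1, S2, S3]

n_l = [len(Sl) for Sl in tasks]               # per-task sample sizes: [3, 4, 2]
pool = np.unique(np.vstack(tasks), axis=0)    # distinct inputs across all tasks
n = len(pool)                                 # 5 distinct points here

# min(n_l) <= n <= sum(n_l) holds because tasks may share input locations.
assert min(n_l) <= n <= sum(n_l)
print(n_l, n)
```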

The commonality among different tasks is defined via the following inductive model [18].

EG-MTL Model: Let $\boldsymbol{\eta}_l$ be the values of the function $\eta_l$ on a set S satisfying $\cup S_l \subset S$. Given the hyperprior distribution of $\mu_\alpha$ and $C_\alpha$, described by the normal-inverse-Wishart distribution in the following equation, the Gaussian process for task l is generated by the following three steps:

(3) $p(\mu_\alpha, C_\alpha) = \mathcal{N}(\mu_\alpha \mid 0, \tfrac{1}{\pi} C_\alpha)\, \mathrm{IW}(C_\alpha \mid \tau, \kappa^{-1})$

1. $\mu_\alpha, C_\alpha$ are generated once using Eq. (3).
2. For each function $\eta_l$:
   (4) $\alpha_l \sim N(\mu_\alpha, C_\alpha)$
3. Given $s \in S_l$:
   (5) $\eta_l(s) = \sum_{i=1}^{n} \alpha_i^l \kappa(s_i, s) + \varepsilon$

where $\kappa(\cdot,\cdot)$ is the base kernel function, $s_i \in S$, and $\varepsilon \sim N(0, \sigma^2)$.

It should be noted that the model parameters follow different distributions (Eq. (4)) in different tasks, and the parameters of these distributions follow a common hyperprior distribution (Eq. (3)). The hyperprior distribution reflects the commonality of tasks.
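The three generative steps above can be sketched directly. For self-containment, the inverse-Wishart draw for $C_\alpha$ is approximated by inverting a Wishart sample built from Gaussian draws; the sizes, $\tau$, $\pi$, length-scale, and noise level are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 15, 3                                   # pooled inputs, number of tasks
S = rng.uniform(0, 1, size=(n, 2))             # shared input pool
d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 0.5)                          # base kernel kappa(s_i, s_j), assumed

# Step 1: draw (mu_alpha, C_alpha) once. Inverse-Wishart is approximated by
# inverting a Wishart sample with tau degrees of freedom (illustrative only).
tau, pi_ = n + 2, 1.0
G = rng.standard_normal((tau, n))
C_alpha = np.linalg.inv(G.T @ G / tau + 1e-6 * np.eye(n))
mu_alpha = rng.multivariate_normal(np.zeros(n), C_alpha / pi_)

# Steps 2-3: per task, alpha_l ~ N(mu_alpha, C_alpha), then
# eta_l(s) = sum_i alpha_i^l kappa(s_i, s) + noise.
sigma = 0.05
etas = []
for _ in range(m):
    alpha_l = rng.multivariate_normal(mu_alpha, C_alpha)
    etas.append(K @ alpha_l + rng.normal(0, sigma, n))
print(len(etas), etas[0].shape)
```

Because every task's weights are drawn around the same $\mu_\alpha$, the resulting functions are related but not identical, which is exactly the commonality the hyperprior encodes.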

The estimates of $\Theta = \{\mu_\alpha, C_\alpha, \sigma^2\}$ can be obtained using an expectation–maximization (EM) algorithm [18], and the details of the EM procedure are presented in the Appendix. After obtaining $\Theta$ and $\{\hat{\alpha}_l\}$ with the EM algorithm, the function value corresponding to an unknown input $s_u$ from task l is estimated using the following equation:

(6) $\hat{\eta}_l(s_u) = \sum_{i=1}^{n} \hat{\alpha}_i^l \kappa(s_i, s_u)$

where the prediction is calculated as a linear combination of $\kappa(s_i, s_u)$, and $s_i \in S$ includes both the training data from task l and those from other tasks. The kernel function $\kappa(\cdot,\cdot)$ is a measure of the “distance,” or similarity, between two inputs.
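The prediction step of Eq. (6) is a simple kernel-weighted sum; a minimal sketch follows, where the weights `alpha_hat` and the length-scale are placeholders rather than EM estimates.

```python
import numpy as np

def predict(alpha_hat, S_train, s_u, delta2=0.5):
    """eta_hat(s_u) = sum_i alpha_hat_i * kappa(s_i, s_u), with a
    squared-exponential base kernel (length-scale delta2 assumed)."""
    d2 = ((S_train - s_u) ** 2).sum(axis=1)
    return float(alpha_hat @ np.exp(-d2 / delta2))

# Hypothetical pooled training inputs and learned weights.
S_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha_hat = np.array([0.5, -0.2, 0.3])
print(predict(alpha_hat, S_train, np.array([0.0, 0.0])))
```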

One critical limitation of this approach is that the surface means must be accurately estimated and removed prior to conducting multitask learning, because it is based on the assumption that the Gaussian processes have zero means. However, estimation of the global means with limited training data is often not sufficiently accurate. The prediction results can be misleading when the zero-mean assumption does not hold. This paper utilizes a linear model to characterize the global trend with expert process knowledge. The estimation is expected to be more accurate, and the zero-mean assumption for residual local variation will be valid.

There are two types of model parameters in model (1), i.e., the coefficients in the linear model, $\beta_l$, and the Gaussian process parameters. Simultaneously estimating both types of parameters is challenging because changes in one part directly impact the estimation of the other. To address this issue, an iterative algorithm is developed for the model estimation. The algorithm is illustrated by Fig. 5. For m similar-but-not-identical surfaces, the global trend and the Gaussian processes are estimated in an iterative manner, and this process is terminated when the coefficients in the global trend converge. In this procedure, the superscript j (j = 0, 1,…) specifies the iteration index, and the subscript l (l = 1,…, m) indicates the task/part/surface number. The procedure is further explained as follows.

As an initialization step, the Gaussian process part and the coefficient vector in the global trend are both set to zero for all surfaces, as shown by Eq. (7). In general, the selection of the initial values does not affect the prediction performance of the algorithm but will affect the convergence speed.

(7) $\hat{\eta}_l^0(s) = 0, \quad \hat{\beta}_l^0 = 0$

where l = 1,…, m.

In the jth iteration, the Gaussian process component estimated in the (j − 1)th iteration, $\hat{\eta}_l^{j-1}(s)$, is first removed from the surface height for all surfaces, as shown by Eq. (8). Then, robust linear regression is applied to estimate the coefficient vector $\hat{\beta}_l$ for l = 1,…, m. The model of the global trend for surface l at the jth iteration is given by Eq. (9).

(8) $\hat{\mu}_l^j(s) = Z_l(s) - \hat{\eta}_l^{j-1}(s)$

where $Z_l(s)$ is the height of the lth surface, $\hat{\eta}_l^{j-1}(s)$ is the estimated Gaussian process for the lth surface at the (j − 1)th iteration, and l = 1,…, m.

(9) $\hat{\mu}_l^j(s) = U_l(s)\hat{\beta}_l^j$

where l = 1,…, m.

There are mainly three categories of robust regression methods for linear models: (1) least squares alternatives [28], (2) parametric alternatives [29], and (3) unit weights [30]. This study adopts the method presented in Ref. [28], which applies iteratively reweighted least squares for robust regression. This method has been implemented in popular computing software such as R and MATLAB. Reference [31] provides a comprehensive overview of popular robust methods.
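A minimal sketch of iteratively reweighted least squares with Huber weights illustrates the flavor of robust regression referenced here; the tuning constant, scale estimator, and iteration count are common defaults assumed for illustration, not prescriptions from the paper.

```python
import numpy as np

def irls_huber(X, z, c=1.345, iters=20):
    """Robust fit of z ~ X beta via IRLS with Huber weights (c assumed)."""
    beta = np.linalg.lstsq(X, z, rcond=None)[0]               # OLS start
    for _ in range(iters):
        r = z - X @ beta
        # Robust residual scale via normalized median absolute deviation.
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)                      # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)[0]
    return beta

# Usage: recover a global trend [beta_0, beta_1] despite one gross outlier.
rng = np.random.default_rng(3)
mrr = rng.uniform(0, 1, 40)
X = np.column_stack([np.ones(40), mrr])
z = X @ np.array([2.0, 1.5]) + rng.normal(0, 0.05, 40)
z[0] += 10.0                                                  # outlier
print(irls_huber(X, z))
```

The outlier is sharply downweighted after the first reweighting, so the fitted coefficients stay near the true trend, which is the behavior that motivates using robust regression for $\hat{\beta}_l$.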

After the robust linear regression, a convergence test is conducted to check whether the coefficients of the linear models converge. The change of the coefficient vector is defined by Eq. (10), and the convergence criterion is given by Eq. (11). Convergence is considered to be achieved if the changes of the coefficient vectors for all surfaces are within a predetermined threshold.

(10) $\Delta\hat{\beta}_l^j = \hat{\beta}_l^j - \hat{\beta}_l^{j-1}$

where $\hat{\beta}_l^j$ and $\hat{\beta}_l^{j-1}$ are the coefficient vectors from iterations j and j − 1, respectively; and l = 1,…, m.

(11) $\|\Delta\hat{\beta}_l^j\| < \varepsilon_0, \quad \forall\, l \in \{1, \dots, m\}$

where ε0 is a predetermined threshold and can be tuned based on applications and the accuracy requirement.

If convergence is achieved at the jth iteration, the parameter estimation is finished, and the prediction results at the (j − 1)th iteration are the final results of the algorithm. If convergence is not achieved, multitask Gaussian process learning (MTGPL) is then conducted to jointly estimate the $\eta_l(s)$'s.
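The alternating structure of Eqs. (7)–(11) can be sketched as a loop. To keep the sketch self-contained, the MTGPL step is stubbed with a trivial residual shrinkage and the robust regression with ordinary least squares; both are placeholders, and all data below are synthetic.

```python
import numpy as np

def fit_eg_mtl(U_list, Z_list, eps0=0.1, max_iter=50):
    """Structural sketch of the iterative EG-MTL estimation loop."""
    m = len(Z_list)
    etas = [np.zeros_like(Z) for Z in Z_list]              # eta_hat^0 = 0
    betas = [np.zeros(U.shape[1]) for U in U_list]         # beta_hat^0 = 0
    for _ in range(max_iter):
        new_betas = []
        for l in range(m):
            mu = Z_list[l] - etas[l]                       # Eq. (8): detrend
            b = np.linalg.lstsq(U_list[l], mu, rcond=None)[0]  # Eq. (9) fit
            new_betas.append(b)
        # Eq. (11): stop when every coefficient vector has settled.
        if all(np.linalg.norm(nb - b) < eps0
               for nb, b in zip(new_betas, betas)):
            return new_betas, etas
        betas = new_betas
        # Placeholder for the MTGPL step: shrink the detrended residuals.
        etas = [0.5 * (Z_list[l] - U_list[l] @ betas[l]) for l in range(m)]
    return betas, etas

# Two synthetic "surfaces" sharing the same true trend [2.0, 1.5].
rng = np.random.default_rng(4)
U = [np.column_stack([np.ones(30), rng.uniform(0, 1, 30)]) for _ in range(2)]
Z = [u @ np.array([2.0, 1.5]) + rng.normal(0, 0.05, 30) for u in U]
betas, _ = fit_eg_mtl(U, Z)
print(np.round(betas[0], 2))
```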

The detailed implementation of multitask learning for Gaussian processes is illustrated by Fig. 6. Specifically, the first step is to assign the hyperprior parameters, including τ and π in the normal-inverse-Wishart distribution described by Eq. (3), and to determine the base kernel function form $\kappa(\cdot,\cdot)$. A common practice is to apply cross-validation to select the best settings from a set of candidate parameters and function forms [18,32]. The second step is to estimate the model and perform predictions; this step follows the procedure reviewed in Sec. 2.2.1.

The computational complexity of the EG-MTL model is $O(k_1 k_2 m n^3)$, where $k_1$ is the number of iterations for EG-MTL parameter estimation, $k_2$ is the number of EM iterations for MTL, m is the number of tasks, and n is the training sample size. Here, the computational complexity of robust linear regression is neglected because the MTL estimation dominates. The proposed method is less efficient than Gaussian process multitask learning (GPMTL), the complexity of which is $O(k_2 m n^3)$ [18], and simple kriging. However, GPMTL and simple kriging generally yield larger prediction errors than the EG-MTL model, as will be shown by the case study in Sec. 3 and the discussion in Sec. 4. To overcome the computational challenge, we will discuss the economic sample size in Sec. 4.2. In practice, a tradeoff between computational efficiency and prediction performance can be obtained by conducting a prior study similar to that in Sec. 4.2.

It should be noted that inappropriate initial settings may lead to undesirable results, e.g., slow convergence or nonconvergence and unsatisfactory prediction accuracy. Several practical suggestions are provided as follows to improve the EM efficiency and guarantee the acquisition of the global maximum:

• random restart (starting with different randomly selected initial parameter values)

• simulated annealing [33,34]

• monitoring the log-likelihood trace, and changing the initial values when (1) fluctuation occurs or (2) convergence is not achieved within a certain number of iterations

## Case Study

This section compares the proposed engineering-guided multitask learning surface model (denoted as the “EG-MTL model” hereafter) with some representative and widely applied interpolation methods, including multitask learning for Gaussian processes (referred to as GPMTL), simple kriging, and kriging with external drift (KED). A summary of these methods is given in Table 1.

GPMTL applies the algorithm described in Sec. 2.2.1. However, this algorithm is only applicable to zero-mean Gaussian processes [18]. The global mean is estimated using the sample mean of the training data. For surface l, l = 1,…, m, the global mean is estimated using the following equation:

(12) $\hat{\mu}_{l,0} = \bar{Z}(s_{l,o}) = \frac{\sum_{s \in S_l} Z(s)}{n_l}$

where $s_{l,o}$ represents the observed locations in surface l. After removing the global mean, the residuals $\eta(s)$'s are jointly estimated using the MTGPL algorithm in Sec. 2.2.1.

KED is a generalized case of kriging where the overall trend is modeled as a function of some auxiliary predictors [35,36]. Linear models may be applied to capture the linear correlations between the target variable and the auxiliary predictors [36]. In the interpolation problem discussed in this study, a linear model is proposed because there is a strong correlation between the surface height Z and the process variables U. This interpolation technique incorporates cutting force variation but does not employ the similarity between the process of interest and other similar-but-not-identical processes.

A laser holographic interferometer was adopted at a Ford engine plant to measure engine block deck surfaces, and the surface data obtained from this plant will be used throughout this section. The root mean squared error (RMSE) will be used as a metric of the prediction performance, and its definition is given by Eq. (13). Another metric of interest is the improvement percentage of RMSE from the simple kriging method to the EG-MTL model or MTGPL, which is defined by Eq. (14).

(13) $\mathrm{RMSE}(l) = \sqrt{\dfrac{\sum_{s \in G_l} \left(\hat{Z}_l(s) - Z_l(s)\right)^2}{|G_l|}}$

where l = 1,…, m is the task/surface number, $G_l$ is the set of locations where predictions are made, and $|G_l|$ denotes the size of $G_l$, i.e., the test sample size for surface l.

(14) $\Delta^{+}\mathrm{RMSE}_{\mathrm{kriging}} = \dfrac{\mathrm{RMSE}_{\mathrm{kriging}} - \mathrm{RMSE}_{\text{EG-MTL}}}{\mathrm{RMSE}_{\mathrm{kriging}}} \times 100\%$

where $\Delta^{+}\mathrm{RMSE}_{\mathrm{kriging}}$ is the improvement percentage of RMSE from simple kriging to the EG-MTL model; and $\mathrm{RMSE}_{\mathrm{kriging}}$ and $\mathrm{RMSE}_{\text{EG-MTL}}$ are the RMSEs of simple kriging and the proposed EG-MTL model, respectively.

Similarly, a metric is defined for the RMSE improvement from KED interpolation to the EG-MTL model, which can be viewed as a quantification of the benefits brought by multitask learning when expert knowledge is adopted. The metric is given by the following equation:

(15) $\Delta^{+}\mathrm{RMSE}_{\mathrm{KED}} = \dfrac{\mathrm{RMSE}_{\mathrm{KED}} - \mathrm{RMSE}_{\text{EG-MTL}}}{\mathrm{RMSE}_{\mathrm{KED}}} \times 100\%$

where $\Delta^{+}\mathrm{RMSE}_{\mathrm{KED}}$ is the improvement percentage of RMSE from KED to the EG-MTL model; and $\mathrm{RMSE}_{\mathrm{KED}}$ and $\mathrm{RMSE}_{\text{EG-MTL}}$ are the prediction RMSEs of KED and the EG-MTL model, respectively.
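The evaluation metrics of Eqs. (13)–(15) reduce to two small helpers; the numbers in the usage line are the simple-kriging and EG-MTL averages reported later in this section.

```python
import numpy as np

def rmse(z_hat, z):
    """Eq. (13): root mean squared prediction error over the test set G_l."""
    z_hat, z = np.asarray(z_hat, float), np.asarray(z, float)
    return float(np.sqrt(np.mean((z_hat - z) ** 2)))

def improvement_pct(rmse_baseline, rmse_model):
    """Eqs. (14)-(15): Delta+RMSE, relative improvement over a baseline, in %."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

# Average RMSEs reported for simple kriging (4.71) and EG-MTL (3.30).
print(improvement_pct(4.71, 3.30))
```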

The measurement data of three engine head surfaces are utilized to validate the proposed modeling approach. The part information is summarized in Table 2. The test data of each part consist of approximately 32,600 points, and 50 points are randomly sampled as training data. Here, the random sampling aims to generate data mimicking measurements from a low-resolution metrology system. It is reasonable to sample data that characterize major variation features based on process knowledge (e.g., the surface height variation between cylinder bores). Our method is then applied to the downsampled data, i.e., the training data, for prediction and validation. Figure 7 shows an engine head surface example, where the height is represented by color: darker shading indicates smaller values, brighter shading corresponds to larger values, and dots represent sampling locations. The machined surfaces all exhibit a high-low-high-low trend along the horizontal direction; however, these patterns are not identical. For each part, a covariate, U, is simulated to have a correlation of around 0.80 with the surface height, Z. Additionally, in this case study and Sec. 4, the unit of the measurement data and the prediction error is μm. We will omit the unit from now on for the sake of simplicity. The value of ε0 in the convergence criterion (Eq. (11)) is chosen to be 0.1 and remains the same throughout the case study.

The comparative results are shown by Fig. 8. The average RMSEs of the EG-MTL model, GPMTL, simple kriging, and KED over the three parts are 3.30, 4.31, 4.71, and 4.14, respectively. It is seen that for all three parts, the EG-MTL model performs the best, and simple kriging yields the worst prediction accuracy. The KED model yields smaller RMSEs than MTGPL and simple kriging, and this result implies that the incorporation of cutting force variation modeling is beneficial. Hence, by incorporating a highly correlated covariate, which can be identified using engineering understanding of the process physics, the EG-MTL model is able to significantly improve the prediction accuracy over existing interpolation techniques. When such knowledge is not available, the data-driven model reviewed in Sec. 2.2.1 can be used, because GPMTL outperforms simple kriging, as indicated by Fig. 8.

## Discussion

This section discusses the following three issues: (1) hyperparameter selection, (2) effects of the sample size and the number of tasks, and (3) the effect of the correlation strength between the covariate and the surface height.

###### Hyperparameter Selection.

A preliminary study found that a quadratic exponential kernel function performed well in modeling surface height, and this kernel form is used throughout this study. Quadratic exponential kernel functions are widely used in spatial statistics, and the general form is given by the following equation:

(16) $\kappa(s, s') = \exp\left(-\dfrac{\|s - s'\|^2}{\delta^2}\right)$

where $\delta^2$ is an unknown parameter that will be determined later.
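Equation (16) translates directly into code; the evaluation points below are arbitrary, and the loop simply illustrates how larger $\delta^2$ makes distant locations look more similar.

```python
import numpy as np

def kernel(s, s_prime, delta2):
    """Quadratic (squared) exponential kernel of Eq. (16)."""
    s, s_prime = np.asarray(s, float), np.asarray(s_prime, float)
    return float(np.exp(-np.sum((s - s_prime) ** 2) / delta2))

# Two points at Euclidean distance 5 (so ||s - s'||^2 = 25).
for d2 in (1.0, 50.0, 100.0):
    print(d2, kernel([0.0, 0.0], [3.0, 4.0], d2))
```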

Three hyperprior parameters need to be determined prior to the modeling, i.e., τ and π from the normal-inverse-Wishart distribution, as well as $\delta^2$ from the base kernel function. An experiment is conducted to investigate the effects of the hyperparameters using a part of the engine surface. Figure 9 displays an example of the surface, where circles indicate sampling locations. A summary of the experimental settings is given in Table 3. In this study, the number of tasks and the training sample size are fixed at 5 and 20, respectively; the test sample size is about 11,000; the candidate parameter values vary from 0.0001 to 151 (36 values in total); and all possible combinations ($36^3$ = 46,656 in total) were tested.

The average RMSE for simple kriging is 6.91, and the best parameter combination found for multitask learning is $[\tau, \pi, \delta^2] = [0.0001, 1, 111]$ with a prediction error of 2.87, which is much smaller than that of simple kriging. Figure 10 shows the 1D RMSE trends versus τ, π, and $\delta^2$. Figure 11 displays the RMSE trends when changing the pairs $[\tau, \pi]$, $[\tau, \delta^2]$, and $[\pi, \delta^2]$.

Figure 10(a) shows that as τ increases, the RMSE first decreases sharply and then increases slowly, meaning that the prediction performance first improves significantly and then degrades. A similar pattern is observed for π, as displayed by Fig. 10(b). From Fig. 10(c), it is noted that as $\delta^2$ increases, the RMSE first decreases and then increases slowly, meaning that the prediction performance does not change significantly when $\delta^2$ is at a relatively high level, e.g., $\delta^2$ > 50. These patterns are confirmed by the 3D RMSE plots shown in Figs. 11(a)–11(c). It is concluded that

1. In general, smaller τ and π along with larger $\delta^2$ yield better prediction performance.
2. In practice, it is suggested to choose τ and π to be around 1, and $\delta^2$ to be between 50 and 100.

The selection of optimal hyperparameters is mostly determined by the spatial variation pattern. When implementing the proposed modeling approach in practice, a cross-validation study may be conducted using the training data to find appropriate values.
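The suggested cross-validation study can be sketched for a single hyperparameter. For brevity, only $\delta^2$ is searched here, and kernel ridge interpolation serves as a lightweight stand-in for the full EG-MTL fit; the grid, fold count, regularization, and synthetic surface are all illustrative assumptions.

```python
import numpy as np

def cv_select_delta2(S, z, grid, k=5, lam=1e-3):
    """Pick delta^2 from `grid` by k-fold CV with a kernel ridge stand-in."""
    n = len(z)
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    dist2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    best, best_err = None, np.inf
    for delta2 in grid:
        K = np.exp(-dist2 / delta2)                        # Eq. (16) kernel
        err = 0.0
        for test in folds:
            train = np.setdiff1d(idx, test)
            Kt = K[np.ix_(train, train)] + lam * np.eye(len(train))
            w = np.linalg.solve(Kt, z[train])              # ridge weights
            z_hat = K[np.ix_(test, train)] @ w             # held-out prediction
            err += float(np.sum((z_hat - z[test]) ** 2))
        if err < best_err:
            best, best_err = delta2, err
    return best

# Synthetic smooth "surface" sampled at 60 random locations.
rng = np.random.default_rng(5)
S = rng.uniform(0, 1, size=(60, 2))
z = np.sin(3 * S[:, 0]) + np.cos(3 * S[:, 1]) + rng.normal(0, 0.01, 60)
print(cv_select_delta2(S, z, grid=[0.001, 0.1, 1.0]))
```

A too-small $\delta^2$ makes the kernel nearly zero between distinct points and the held-out error explodes, which is why cross-validation reliably rejects it.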

###### Effects of the Sample Size and the Number of Tasks.

The learning performance of multitask learning can be sensitive to the sample size and the number of tasks. Therefore, it is important to analyze the effects of these factors in practice. In this section, three experiments are conducted to investigate the effects of the sample size and the number of tasks. A summary of the experimental settings is given by Table 4.

In the first experiment, the number of tasks is fixed at 5, and the sample size varies from 10 to 300. Ten runs are repeated for every sample size, and the sampling locations are randomly selected for each run. The experimental results are illustrated by Fig. 12. It is seen that as the sample size increases, the prediction performance of both multitask learning and kriging improves, and Δ+RMSE first increases, then remains stable, and shows a sign of decreasing in the end. Furthermore, increasing the sample size may not be economically necessary and may potentially cause overfitting. It is also noticed that the sampling of the data influences the prediction performance. When the experiments were repeated, the training data were randomly generated each time, and the performance variation is reflected by the boxplots in Fig. 12.

The second experiment fixes the sample size at 50 for all parts and varies the number of tasks from 3 to 10. The results are shown in Fig. 13. As the number of tasks increases, Δ+RMSE first increases quickly and then remains stable after the number of tasks reaches 7.

In the third experiment, both the number of tasks and the sample size are varied: the number of tasks takes the values {3, 4,…, 10}, and the sample size takes the values {10, 20,…, 150}. Figure 14(a) displays the 3D trend of the RMSE when varying the sample size and the number of tasks. Clearly, the RMSE decreases as the sample size and the number of tasks grow. Figure 14(b) shows the 3D plot of Δ+RMSE, which indicates that multitask learning outperforms simple kriging in all cases and that Δ+RMSE increases with both the sample size and the number of tasks.

The above experimental results lead to the following conclusions:

1. Multitask learning is able to improve the prediction accuracy over simple kriging.
2. The prediction accuracy improves as the training sample size increases, for both multitask learning and kriging.
3. For multitask learning, the prediction performance improves as the number of tasks increases.
4. Δ+RMSE becomes bounded when the sample size and the number of tasks are relatively large.

Intuitively, when the training sample size is sufficient, single-task learning, which is simple kriging in this application, will be able to learn the spatial model very well, and the benefits brought by transferring knowledge thus become limited. Moreover, when the number of tasks increases, duplicate information may exist across tasks, and the improvement becomes less significant.

###### Effect of the Correlation Strength.

To investigate the influence of $ρ(U,Z)$ on the estimation accuracy, we vary the strength of $ρ(U,Z)$ and compare the performance of the EG-MTL model, GPMTL, simple kriging, and the KED method. The comparative results are illustrated in Figs. 15(a) and 15(b), which show the trends of RMSE and Δ+RMSE, respectively, when changing the correlation strength. In Figs. 15(a) and 15(b), the results corresponding to each correlation strength are averaged over three parts. The following observations are made from these results.

1. The performance of KED is strongly influenced by the correlation strength. When the correlation between U and Z is not sufficiently strong, e.g., $ρ(U,Z)<0.75$, its prediction accuracy is worse than that of simple kriging. As $ρ(U,Z)$ becomes stronger, the prediction accuracy of KED improves and eventually outperforms simple kriging. Also, the RMSE improvement from KED to the EG-MTL model decreases as $ρ(U,Z)$ increases, indicating that the benefits introduced by multitask learning become more limited.
2. The prediction accuracy of the EG-MTL model is significantly affected by the correlation strength: as $ρ(U,Z)$ increases, the RMSE of the EG-MTL model decreases and Δ+RMSE increases. When the correlation is relatively weak, e.g., less than 0.5, introducing such a covariate degrades the modeling performance.
3. The proposed engineering-guided GPMTL outperforms simple kriging in all cases, which means that even when expert knowledge on the process is not available and highly correlated variables cannot be identified, adopting multitask learning is still beneficial compared with single-task learning.
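These observations suggest a simple screening rule before including an engineering covariate as an external drift. The sketch below is illustrative Python: the 0.5 and 0.75 cutoffs echo the thresholds reported above, while the function name, decision strings, and synthetic data are hypothetical.

```python
import numpy as np

def screen_covariate(u, z, weak=0.5, strong=0.75):
    # Pearson correlation between the engineering covariate U (e.g., a
    # cutting-force-based quantity) and the observed surface heights Z
    rho = abs(np.corrcoef(u, z)[0, 1])
    if rho < weak:
        return rho, "exclude covariate (may degrade the model)"
    if rho < strong:
        return rho, "include with caution (KED may underperform kriging)"
    return rho, "include covariate (external drift is informative)"

# Synthetic example: a covariate strongly correlated with the heights
rng = np.random.default_rng(0)
z = rng.normal(size=200)
u_strong = 0.9 * z + 0.2 * rng.normal(size=200)
rho, decision = screen_covariate(u_strong, z)
```

In practice `u` and `z` would be the co-located covariate and height measurements from the training parts; the screening then decides whether KED or the covariate-augmented model is worth fitting at all.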

## Conclusion

The accomplishments of this paper can be summarized as follows:

1. An improved surface shape modeling approach that incorporates engineering insights into multitask learning for Gaussian processes. The uniqueness of the developed model is that information from both cutting force variation modeling and spatial data from other similar-but-not-identical processes is fused in one framework. The model is advantageous over traditional approaches, such as simple kriging, kriging with external drift, and multitask learning for zero-mean Gaussian processes, in predicting surface heights at unobserved locations. The approach is validated using data from real-world machining processes in an engine plant.
2. An iterative algorithm for estimating the model parameters. A novel algorithm is developed to iteratively estimate the parameters in the global trend and in the local variation represented by zero-mean Gaussian processes. Parameters are updated in each iteration, and the procedure terminates once the convergence criterion is met. Moreover, practical guidelines are proposed to expedite the EM optimization process.
3. Systematic studies on hyperparameter selection and the effects of training data size. Response surfaces of the prediction performance are created for the model hyperparameters, the number of tasks, and the training sample size.

The proposed engineering integration model is expected to improve the surface prediction accuracy when limited measurement data are available, thus enabling cost-effective fine-resolution surface characterization.

Future research will focus on (1) variable selection for the iterative engineering-guided GPMTL model and (2) task selection for multitask learning. First, the variables that reflect multidisciplinary process information may come in various types and formats. These variables may not be equally important for improving the surface shape prediction, or they may contain duplicate information. Thus, a variable selection algorithm is needed to identify the most relevant variables to incorporate in the surface model. Second, a systematic approach is desirable for selecting the tasks used in multitask learning. Some algorithms have been presented in the literature to select tasks based on a quantification of the similarity among tasks. However, the selection may be more complicated in the context of the proposed integration model, and nontrivial efforts are needed to develop a systematic task selection method.

## Acknowledgements

This research was supported by NSF GOALI Grant No. CMMI-1434411.

## Appendices

###### Appendix

The estimates of $Θ={μα,Cα,σ2}$ can be obtained using the following EM algorithm [18]. Detailed derivation of the algorithm is available in Ref. [18].

• E-step: Estimate the expectation and covariance of $\alpha_l$, $l=1,\ldots,m$, given the current $\Theta$:

$$\hat{\alpha}_l = \left(\frac{1}{\sigma^2}\kappa_l^T\kappa_l + C_\alpha^{-1}\right)^{-1}\left(\frac{1}{\sigma^2}\kappa_l^T\eta_l + C_\alpha^{-1}\mu_\alpha\right)\tag{A1}$$

$$C_{\alpha_l} = \left(\frac{1}{\sigma^2}\kappa_l^T\kappa_l + C_\alpha^{-1}\right)^{-1}\tag{A2}$$

where $\kappa_l \in \mathbb{R}^{n_l\times n}$ is the base kernel $\kappa(\cdot,\cdot)$ evaluated between $S_l$ and $S$.

• M-step: Optimize $\Theta=\{\mu_\alpha, C_\alpha, \sigma^2\}$:

$$\mu_\alpha = \frac{1}{\pi+m}\sum_{l=1}^{m}\hat{\alpha}_l\tag{A3}$$

$$C_\alpha = \frac{1}{\tau+m}\left\{\pi\,\mu_\alpha\mu_\alpha^T + \tau\,\kappa^{-1} + \sum_{l=1}^{m}C_{\alpha_l} + \sum_{l=1}^{m}\left[\hat{\alpha}_l-\mu_\alpha\right]\left[\hat{\alpha}_l-\mu_\alpha\right]^T\right\}\tag{A4}$$

$$\sigma^2 = \frac{1}{\sum_{l=1}^{m}n_l}\sum_{l=1}^{m}\left\{\left\|\eta_l-\kappa_l\hat{\alpha}_l\right\|^2 + \mathrm{tr}\left[\kappa_l C_{\alpha_l}\kappa_l^T\right]\right\}\tag{A5}$$

where $\mathrm{tr}(\cdot)$ is the trace operator.
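The E- and M-steps map directly to code. The following is a minimal NumPy sketch of one EM iteration implementing Eqs. (A1)–(A5); the toy task setup (support set, per-task kernels, synthetic observations) and variable names are illustrative assumptions, not the paper's data.

```python
import numpy as np

def em_step(eta, kappa_l, kappa, mu_a, C_a, sig2, pi_, tau):
    # One EM iteration for Theta = {mu_alpha, C_alpha, sigma^2}, per
    # Eqs. (A1)-(A5). eta[l] holds task l's observations; kappa_l[l] is the
    # base kernel between S_l and the common support set S (shape n_l x n).
    m = len(eta)
    C_inv = np.linalg.inv(C_a)
    alpha_hat, C_al = [], []
    for e, k in zip(eta, kappa_l):                               # E-step
        A = np.linalg.inv(k.T @ k / sig2 + C_inv)                # (A2)
        alpha_hat.append(A @ (k.T @ e / sig2 + C_inv @ mu_a))    # (A1)
        C_al.append(A)
    mu_new = sum(alpha_hat) / (pi_ + m)                          # (A3)
    C_new = (pi_ * np.outer(mu_new, mu_new)
             + tau * np.linalg.inv(kappa)
             + sum(C_al)
             + sum(np.outer(a - mu_new, a - mu_new) for a in alpha_hat)
             ) / (tau + m)                                       # (A4)
    n_total = sum(e.size for e in eta)
    sig_new = sum(np.sum((e - k @ a) ** 2) + np.trace(k @ A @ k.T)
                  for e, k, a, A in zip(eta, kappa_l, alpha_hat, C_al)
                  ) / n_total                                    # (A5)
    return mu_new, C_new, float(sig_new)

# Toy setup: n support points, m = 3 tasks observing overlapping subsets
rng = np.random.default_rng(0)
n = 8
S = np.linspace(0.0, 1.0, n)
kappa = np.exp(-0.5 * ((S[:, None] - S[None, :]) / 0.3) ** 2) + 1e-3 * np.eye(n)
kappa_l = [kappa[sl] for sl in (slice(0, 5), slice(2, 8), slice(1, 7))]
eta = [k @ rng.normal(size=n) + 0.1 * rng.normal(size=k.shape[0])
       for k in kappa_l]

mu_a, C_a, sig2 = np.zeros(n), np.eye(n), 1.0
for _ in range(10):  # iterate E- and M-steps until (here) a fixed count
    mu_a, C_a, sig2 = em_step(eta, kappa_l, kappa, mu_a, C_a, sig2,
                              pi_=1.0, tau=1.0)
```

In a real implementation the loop would terminate on a convergence criterion (e.g., the change in the parameters or the marginal likelihood) rather than a fixed iteration count, as described in the paper's iterative algorithm.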

## References

Willomitzer, F. , Ettl, S. , Arold, O. , and Häusler, G. , 2013, “ Flying Triangulation—A Motion-Robust Optical 3D Sensor for the Real-Time Shape Acquisition of Complex Objects,” AIP Conference Proceedings, Vol. 1537, pp. 19–26.
Zhu, X. , Ding, H. , and Wang, M. Y. , 2004, “ Form Error Evaluation: An Iterative Reweighted Least Squares Algorithm,” ASME J. Manuf. Sci. Eng., 126(3), pp. 535–541.
Yang, B.-D. , and Menq, C.-H. , 1993, “ Compensation for Form Error of End-Milled Sculptured Surfaces Using Discrete Measurement Data,” Int. J. Mach. Tools Manuf., 33(5), pp. 725–740.
Grove, D. M. , Woods, D. C. , and Lewis, S. M. , 2004, “ Multifactor b-Spline Mixed Models in Designed Experiments for the Engine Mapping Problem,” J. Qual. Technol., 36(4), pp. 380–391.
D'Errico, J. R. , 2005, “ Surface Fitting Using Gridfit,” The MathWorks, Inc., Natick, MA.
Yang, T.-H. , and Jackman, J. , 2000, “ Form Error Estimation Using Spatial Statistics,” ASME J. Manuf. Sci. Eng., 122(1), pp. 262–272.
Xia, H. , Ding, Y. , and Wang, J. , 2008, “ Gaussian Process Method for Form Error Assessment Using Coordinate Measurements,” IIE Trans., 40(10), pp. 931–946.
Du, S. , and Fei, L. , 2016, “ Co-Kriging Method for Form Error Estimation Incorporating Condition Variable Measurements,” ASME J. Manuf. Sci. Eng., 138(4), p. 041003.
Bao, L. , Wang, K. , and Jin, R. , 2014, “ A Hierarchical Model for Characterising Spatial Wafer Variations,” Int. J. Prod. Res., 52(6), pp. 1827–1842.
Plumlee, M. , Jin, R. , Roshan Joseph, V. , and Shi, J. , 2013, “ Gaussian Process Modeling for Engineered Surfaces With Applications to SI Wafer Production,” Stat, 2(1), pp. 159–170.
Wang, G. , Gertner, G. , and Anderson, A. , 2005, “ Sampling Design and Uncertainty Based on Spatial Variability of Spectral Variables for Mapping Vegetation Cover,” Int. J. Remote Sens., 26(15), pp. 3255–3274.
Xiao, X. , Gertner, G. , Wang, G. , and Anderson, A. B. , 2005, “ Optimal Sampling Scheme for Estimation Landscape Mapping of Vegetation Cover,” Landscape Ecol., 20(4), pp. 375–387.
Suriano, S. , Wang, H. , Shao, C. , Hu, S. J. , and Sekhar, P. , 2015, “ Progressive Measurement and Monitoring for Multi-Resolution Data in Surface Manufacturing Considering Spatial and Cross Correlations,” IIE Trans., 47(10), pp. 1033–1052.
Zhao, H. , Jin, R. , Wu, S. , and Shi, J. , 2011, “ PDE-Constrained Gaussian Process Model on Material Removal Rate of Wire Saw Slicing Process,” ASME J. Manuf. Sci. Eng., 133(2), p. 021012.
Caruana, R. , 1997, “ Multitask Learning,” Mach. Learn., 28(1), pp. 41–75.
Lawrence, N. D. , and Platt, J. C. , 2004, “ Learning to Learn With the Informative Vector Machine,” 21st International Conference on Machine Learning, ACM, p. 65.
Bonilla, E. V. , Chai, K. M. , and Williams, C. , 2007, “ Multi-Task Gaussian Process Prediction,” Advances in Neural Information Processing Systems, pp. 153–160.
Yu, K. , Tresp, V. , and Schwaighofer, A. , 2005, “ Learning Gaussian Processes From Multiple Tasks,” 22nd International Conference on Machine Learning, ACM, pp. 1012–1019.
Schwaighofer, A. , Tresp, V. , and Yu, K. , 2004, “ Learning Gaussian Process Kernels via Hierarchical Bayes,” Advances in Neural Information Processing Systems, pp. 1209–1216.
Nguyen, H. T. , Wang, H. , and Hu, S. J. , 2013, “ Characterization of Cutting Force Induced Surface Shape Variation in Face Milling Using High-Definition Metrology,” ASME J. Manuf. Sci. Eng., 135(4), p. 041014.
Nguyen, H. T. , Wang, H. , Tai, B. L. , Ren, J. , Hu, S. J. , and Shih, A. , 2016, “ High-Definition Metrology Enabled Surface Variation Control by Cutting Load Balancing,” ASME J. Manuf. Sci. Eng, 138(2), p. 021010.
Nelder, J. A. , and Baker, R. , 1972, “ Generalized Linear Models,” Encyclopedia of Statistical Sciences, Wiley, New York, NY.
Fahrmeir, L. , Kneib, T. , Lang, S. , and Marx, B. , 2013, “ Generalized Linear Models,” Regression, Springer, Berlin, pp. 269–324.
Stein, A. , and Corsten, L. , 1991, “ Universal Kriging and Cokriging as a Regression Procedure,” Biometrics, pp. 575–587.
Atkinson, P. , Webster, R. , and Curran, P. , 1992, “ Cokriging With Ground-Based Radiometry,” Remote Sensing of Environment, 41(1), pp. 45–60.
Chai, K. M. , 2010, “ Multi-Task Learning With Gaussian Processes,” Ph.D. dissertation, University of Edinburgh, Edinburgh, Scotland.
Ghosn, J. , and Bengio, Y. , 2003, “ Bias Learning, Knowledge Sharing,” IEEE Trans. Neural Networks, 14(4), pp. 748–765.
Holland, P. W. , and Welsch, R. E. , 1977, “ Robust Regression Using Iteratively Reweighted Least-Squares,” Commun. Stat. Theory Methods, 6(9), pp. 813–827.
Lange, K. L. , Little, R. J. , and Taylor, J. M. , 1989, “ Robust Statistical Modeling Using the t Distribution,” J. Am. Stat. Assoc., 84(408), pp. 881–896.
Wainer, H. , and Thissen, D. , 1976, “ Three Steps Towards Robust Regression,” Psychometrika, 41(1), pp. 9–34.
Maronna, R. , Martin, D. , and Yohai, V. , 2006, Robust Statistics, Wiley, Chichester.
Huang, S. , Li, J. , Chen, K. , Wu, T. , Ye, J. , Wu, X. , and Yao, L. , 2012, “ A Transfer Learning Approach for Network Modeling,” IIE Trans., 44(11), pp. 915–931.
Kirkpatrick, S. , Gelatt, Jr., C. D. , and Vecchi, M. P. , 1983, “ Optimization by Simulated Annealing,” Science, 220(4598), pp. 671–680.
Brooks, S. P. , and Morgan, B. J. , 1995, “ Optimization Using Simulated Annealing,” J. R. Stat. Soc. (The Statistician), 44(2), pp. 241–257.
Hudson, G. , and Wackernagel, H. , 1994, “ Mapping Temperature Using Kriging With External Drift: Theory and an Example From Scotland,” Int. J. Climatol., 14(1), pp. 77–91.
Bourennane, H. , King, D. , and Couturier, A. , 2000, “ Comparison of Kriging With External Drift and Simple Linear Regression for Predicting Soil Horizon Thickness With Different Sample Densities,” Geoderma, 97(3), pp. 255–271.


## Figures

Fig. 1

Data loss in high-resolution measurement

Fig. 2

Potential knowledge transfer in manufacturing applications: (a) new plant and (b) low production station

Fig. 3

Fig. 4

Illustration of the EG-MTL scheme

Fig. 5

Flowchart for the iterative algorithm

Fig. 6

Fig. 7

Fig. 8

RMSE comparison for the EG-MTL model, GPMTL, and kriging

Fig. 9

Engine surface example for the hyperparameter study

Fig. 10

Effects of hyperparameters on RMSE: (a) RMSE versus τ, (b) RMSE versus π, and (c) RMSE versus δ2

Fig. 11

Effects of hyperparameter pairs on RMSE: (a) RMSE versus (τ, π), (b) RMSE versus (τ, δ2), and (c) RMSE versus (π, δ2)

Fig. 12

Effects of sample size on the prediction performance: (a) RMSE and (b) Δ+RMSE

Fig. 13

Effects of number of tasks on the prediction performance: (a) RMSE and (b) Δ+RMSE

Fig. 14

Effects of sample size and number of tasks on the prediction performance: (a) RMSE and (b) Δ+RMSE

Fig. 15

Effects of ρ(U, Z) on the prediction performance: (a) RMSE versus ρ(U, Z) and (b) Δ+RMSE versus ρ(U, Z)

## Tables

Table 1 Method summary for the case study
Table 2 Summary of part information
Table 3 Experimental setting for the hyperprior parameter study
Table 4 Summary of experimental settings for the effects of the sample size and the number of tasks
