## Abstract

Data-driven modeling is considered a promising approach for many conventional physical problems, including ocean applications. Among various machine learning techniques, the support vector machine (SVM) stands out as one of the most widely used algorithms for establishing models that connect pertinent features to physical quantities of interest. This paper takes experimental data for a fixed cylinder in shallow water as the baseline data set and explores the modeling of nonlinear wave loads by SVM regression. Different feature and target selections are studied to establish the nonlinear mapping from ambient wave elevations and kinematics to nonlinear wave loads. The performance of the SVM regression model is discussed and compared with nonlinear potential flow theory, focusing on the overall statistics (standard deviation and kurtosis), which are critical for fatigue and extreme statistics analysis.

## 1 Introduction

Many conventional problems in ocean engineering remain challenging because of the stochastic nature of ocean waves, viscous separated flow effects, nonlinear resonance, and the combination of these factors. Traditionally, these problems have been addressed by theoretical and numerical methods, complemented by experiments, to understand the physics and provide engineering solutions. While the traditional development of fluid-structure interaction models has focused on incorporating more physics to improve model accuracy and predictive capability, an alternative approach has recently gained attention, namely, to utilize data directly [1]. Data-driven modeling is considered a promising approach because data from experiments, field tests, or high-fidelity simulations are highly informative about the actual physical systems.

One class of physics-based problems of interest here can be characterized as the mapping of highly nonlinear physical relations. Machine learning algorithms, whether kernel-based methods or deep neural networks, combine good generalization capability with statistical inference, which makes them well suited to this type of problem. Applications of machine learning methods to physics-based problems may be found in the work of many researchers in various domains [2–4]. Artificial and deep convolutional neural networks have been applied to study flows involving marine structures with very promising results [5,6]. The literature on the application of machine learning in ocean engineering and naval architecture is growing rapidly and cannot be exhaustively reviewed in the present article.

Kernel-based methods enjoy the advantage of transparency: they come with theoretically proven performance guarantees, they lead to convex optimization problems with a unique solution which, in the least-squares formulation, reduces to the solution of a positive definite linear system, and they require the tuning of just two hyperparameters via cross-validation, as discussed below. One challenging problem in ocean engineering to which such methods may be applied is the modeling of nonlinear wave-excited loads on offshore structures. Hydrodynamic load models have been developed over decades to capture the relevant physics and predict wave-structure interactions. Linear frequency-domain analysis led to the development of panel methods for the flow solution, which are among the most popular tools because of their efficiency and reliability. However, in many scenarios the nonlinear effects are not negligible because of large-amplitude body motions, nonlinear resonance, flow separation, etc. Better prediction of nonlinear wave loads is critical for evaluating extreme statistics and for the structural fatigue analysis of offshore structures [7–9]. Many researchers have published studies on the impact of nonlinear wave loads on the design, safety, and maintenance of offshore platforms [10–12].

Over the past decades, a variety of numerical methods have been developed to understand and predict nonlinear wave-structure interactions. Potential flow theory remains the most popular tool for predicting the global responses of offshore structures, including extensions of linear theory that account for nonlinear free surface and body boundary conditions. Once the potential flow wave-structure interaction problem is solved, the hydrodynamic load can be calculated either by integrating over the body wetted surface the pressure obtained from Bernoulli's equation [13] or from the time rate of change of the impulse of the velocity potential, a methodology recently proposed in Ref. [14] and known as fluid impulse theory. These methods require the discretization of the body surface, and some also require the discretization of the free surface, which can be demanding, calling for delicate numerical treatment and significant computational cost. Moreover, depending on the free surface and body boundary conditions imposed, the numerical schemes can account only for certain nonlinear contributions, often with limited accuracy. Therefore, establishing a direct, generalized model for nonlinear wave loads from a data-driven perspective is promising in terms of both accuracy and efficiency, and would make such models attractive for offshore platform design and marine operations in practice.

Among the various data-driven techniques mentioned above, the support vector machine (SVM) stands out as one of the most widely used algorithms for establishing models that connect pertinent features to physical quantities of interest [15,16]. Reference [4] proposed using support vector machines for the parameter identification of ship maneuvering models. The essence of SVM algorithms is to use kernels instead of an explicit set of basis functions in order to establish a more compact and generalized model based on a set of explanatory variables, or features. The selected kernel, with optimized hyperparameters, encodes the covariance structure between the features and, from the perspective of statistical learning theory, forms the foundation of the learning algorithm that relates the quantity being modeled to the features.

In this paper, an SVM regression model is trained and validated using experimental data for a fixed cylinder in shallow water. Different feature and target selections are compared and discussed. Statistics of the nonlinear wave loads predicted by the SVM regression model are compared with the original measurements, demonstrating the accuracy of the model, which is critical for its use in structural analysis and design.

## 2 Least-Squares Support Vector Machines Regression

### 2.1 Mathematical Formulation.

Least-squares support vector machine (LS-SVM) [17] is the least-squares version of the standard SVM: it solves a set of linear equations instead of the convex quadratic programming problem that arises in classical SVM algorithms, which minimize more general objective functions.

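The LS-SVM regression model relates the quantity of interest $y$ to the feature vector $\mathbf{x} = (x_1, x_2, \dots, x_K)^T$ through

$$
y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + b \tag{1}
$$

with weight vector $\mathbf{w} = (w_1, w_2, \dots, w_M)^T$ and vector basis functions $\boldsymbol{\phi}(\mathbf{x}) = (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \dots, \phi_M(\mathbf{x}))^T$. The number of basis functions $M$ is a priori unknown and may be infinite. The constant $b$ is the bias term. Given a sample of training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^N$, LS-SVM determines the optimal weight vector and bias term by minimizing the cost function $R$:

$$
\min_{\mathbf{w}, b, \mathbf{e}} R(\mathbf{w}, \mathbf{e}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2}\sum_{i=1}^N e_i^2 \tag{2}
$$

subject to

$$
y_i = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_i) + b + e_i, \quad i = 1, \dots, N. \tag{3}
$$

Here $\gamma$ is the regularization parameter which controls the trade-off between the bias and variance of the LS-SVM model, and $\mathbf{e} = (e_1, e_2, \dots, e_N)^T$ is the error vector. Introducing the Lagrange multipliers $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N)^T$, the Lagrangian reads

$$
L(\mathbf{w}, b, \mathbf{e}, \boldsymbol{\alpha}) = R(\mathbf{w}, \mathbf{e}) - \sum_{i=1}^N \alpha_i \left[\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_i) + b + e_i - y_i\right] \tag{4}
$$

and its stationarity conditions are

$$
\mathbf{w} = \sum_{i=1}^N \alpha_i \boldsymbol{\phi}(\mathbf{x}_i), \qquad \sum_{i=1}^N \alpha_i = 0, \qquad \alpha_i = \gamma e_i, \qquad y_i = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_i) + b + e_i. \tag{5}
$$

Eliminating $\mathbf{w}$ and $\mathbf{e}$ yields the linear system

$$
\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \mathbf{K} + \gamma^{-1}\mathbf{I} \end{bmatrix}
\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix} \tag{6}
$$

where $\mathbf{1} = (1, 1, \dots, 1)^T$, $\mathbf{I}$ is the identity matrix, and $\mathbf{y} = (y_1, y_2, \dots, y_N)^T$. $\mathbf{K} = \left(k(\mathbf{x}_i, \mathbf{x}_j)\right)_{i,j=1}^N$ is called the kernel or Gram matrix, defined by the inner product of the feature basis functions, $k(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}^T(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_j)$. The length of the vector $\boldsymbol{\phi}(\mathbf{x}_i)$ is $M$, and the dimensions of the square kernel matrix $\mathbf{K}$ are $N \times N$. Substituting $\mathbf{w}$ from Eq. (5) into Eq. (1), the LS-SVM model $y(\mathbf{x})$ can be expressed in the form

$$
y(\mathbf{x}) = \sum_{i=1}^N \alpha_i\, k(\mathbf{x}, \mathbf{x}_i) + b. \tag{7}
$$

Neither the basis functions $\phi_j(\mathbf{x})$ nor their number $M$ need to be specified explicitly. All LS-SVM requires is the inner product $\boldsymbol{\phi}^T(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_j)$, i.e., the kernel function $k(\mathbf{x}_i, \mathbf{x}_j)$. This property is known as the "kernel trick" and is a key attribute of the SVM algorithm.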

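Commonly used kernels are the linear, polynomial, and Gaussian kernels:

$$
k(\mathbf{x}, \mathbf{x}_i) = \mathbf{x}^T\mathbf{x}_i \tag{8a}
$$

$$
k(\mathbf{x}, \mathbf{x}_i) = \left(\mathbf{x}^T\mathbf{x}_i + t\right)^d \tag{8b}
$$

$$
k(\mathbf{x}, \mathbf{x}_i) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{\sigma^2}\right) \tag{8c}
$$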

In Eq. (8), $\|\cdot\|$ denotes the Euclidean norm. $\sigma$ is the "scale" that determines the width or variance of the Gaussian kernel. $d$ is the degree of the polynomial kernel and $t$ is its bias term. More generally, $d$ may be positive or negative and need not be an integer, but its value and that of the bias must be such that the kernel (8) is positive definite [15].
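As a sketch of the kernel trick, assuming NumPy and the kernel forms $\mathbf{x}^T\mathbf{z}$, $(\mathbf{x}^T\mathbf{z} + t)^d$, and $\exp(-\|\mathbf{x}-\mathbf{z}\|^2/\sigma^2)$ (the $\sigma^2$ convention of the Gaussian is our assumption), the code below evaluates Gram matrices and checks that, for a quadratic polynomial kernel with two features, the inner product of an explicit six-term basis $\boldsymbol{\phi}(\mathbf{x})$ reproduces the kernel exactly. The function names are illustrative.

```python
import numpy as np


def linear_kernel(X, Z):
    """Gram matrix of the linear kernel x^T z for the rows of X and Z."""
    return X @ Z.T


def polynomial_kernel(X, Z, d=2, t=1.0):
    """Gram matrix of the polynomial kernel (x^T z + t)^d."""
    return (X @ Z.T + t) ** d


def gaussian_kernel(X, Z, sigma=1.0):
    """Gram matrix of the Gaussian kernel exp(-||x - z||^2 / sigma^2) (assumed convention)."""
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(Z**2, axis=1)[None, :]
               - 2.0 * X @ Z.T)
    # Clip tiny negative values caused by round-off before exponentiating.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


def explicit_basis_poly2(X, t=1.0):
    """Explicit basis functions phi(x) of (x^T z + t)^2 for K = 2 features (M = 6)."""
    x1, x2 = X[:, 0], X[:, 1]
    c = np.sqrt(2.0 * t)
    return np.column_stack([x1**2, x2**2, np.sqrt(2.0) * x1 * x2,
                            c * x1, c * x2, np.full_like(x1, t)])


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
Phi = explicit_basis_poly2(X, t=1.0)

# Kernel trick: phi(x_i)^T phi(x_j) equals k(x_i, x_j) without forming phi explicitly.
print(np.allclose(Phi @ Phi.T, polynomial_kernel(X, X, d=2, t=1.0)))  # True
# The Gaussian Gram matrix of distinct points has strictly positive eigenvalues.
print(np.linalg.eigvalsh(gaussian_kernel(X, X, sigma=1.0)).min() > 0.0)
```

For the Gaussian kernel no finite explicit basis exists ($M$ is infinite), so in practice only the kernel itself is ever evaluated.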

Expression (7) provides an explicit nonlinear model for the dependent quantity. The summation in Eq. (7) runs over the $N$ samples used to train the SVM algorithm, whose feature values appear in the second argument of the kernel. The Lagrange multipliers are obtained from the solution of the linear system (6); the training samples associated with nonzero multipliers are known in the SVM literature as the "support vectors." In LS-SVM, virtually all multipliers are nonzero, so every training sample acts as a support vector.
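A minimal NumPy sketch of Eqs. (6) and (7) follows: it assembles the bordered system whose first row is $[0, \mathbf{1}^T]$ and whose lower-right block is $\mathbf{K} + \gamma^{-1}\mathbf{I}$, solves it densely for $b$ and $\boldsymbol{\alpha}$, and evaluates the kernel expansion. The Gaussian form $\exp(-\|\mathbf{x}-\mathbf{z}\|^2/\sigma^2)$, the function names, and the synthetic sinc data are assumptions for illustration, not the authors' implementation; the conditioning issues discussed in Sec. 2.2 are not addressed here.

```python
import numpy as np


def gaussian_gram(X, Z, sigma):
    """k(x, z) = exp(-||x - z||^2 / sigma^2) for all row pairs (assumed kernel form)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = X.shape[0]
    A = np.empty((n + 1, n + 1))
    A[0, 0] = 0.0
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = gaussian_gram(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, Lagrange multipliers alpha


def lssvm_predict(X_new, X_train, alpha, b, sigma):
    """y(x) = sum_i alpha_i k(x, x_i) + b, training features in the second argument."""
    return gaussian_gram(X_new, X_train, sigma) @ alpha + b


# Synthetic illustration only: noisy samples of sinc(x).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.01 * rng.normal(size=200)

b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
err = np.max(np.abs(lssvm_predict(X_test, X, alpha, b, 0.5) - np.sinc(X_test[:, 0])))
print(f"max test error: {err:.3f}")
```

The training residuals satisfy $y_i - \hat{y}(\mathbf{x}_i) = \alpha_i/\gamma$, which is the condition $\alpha_i = \gamma e_i$ and a convenient check on any implementation.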

The regularization and kernel parameters are calibrated to optimal values via cross-validation during the training stage of the SVM nonlinear regression, using a sufficiently large number of samples. The nonlinear model (7), with coefficients obtained from the linear system (6), then reduces to a simple summation of a series of transcendental functions and may be used either to generate signal forecasts from time-lagged values or to represent complex physical quantities that depend on the selected set of features.
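A sketch of the calibration step, assuming 5-fold cross-validation with mean squared error as the criterion and a small logarithmic grid over $\gamma$ and $\sigma$; the paper does not state the folds, grid, or loss, so all three are assumptions. A compact LS-SVM solver is repeated so that the block runs on its own.

```python
import numpy as np
from itertools import product


def gaussian_gram(X, Z, sigma):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvm_fit(X, y, gamma, sigma):
    """Bias b and multipliers alpha from the bordered LS-SVM linear system."""
    n = X.shape[0]
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = gaussian_gram(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def kfold_mse(X, y, gamma, sigma, n_folds=5, seed=0):
    """Mean validation MSE over k folds for one (gamma, sigma) pair (assumed criterion)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    errors = []
    for k, val in enumerate(folds):
        trn = np.concatenate([f for j, f in enumerate(folds) if j != k])
        b, alpha = lssvm_fit(X[trn], y[trn], gamma, sigma)
        pred = gaussian_gram(X[val], X[trn], sigma) @ alpha + b
        errors.append(np.mean((pred - y[val]) ** 2))
    return float(np.mean(errors))


def grid_search(X, y, gammas, sigmas):
    """Return the (gamma, sigma) pair with the lowest cross-validated MSE, plus all scores."""
    scores = {(g, s): kfold_mse(X, y, g, s) for g, s in product(gammas, sigmas)}
    return min(scores, key=scores.get), scores


# Illustrative grid and synthetic data, not values used in the study.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.01 * rng.normal(size=200)
best, scores = grid_search(X, y, gammas=[0.1, 10.0, 1000.0], sigmas=[0.01, 0.5, 5.0])
print("best (gamma, sigma):", best)
```

A kernel that is far too narrow cannot interpolate between samples, and one that is far too wide cannot follow the signal's oscillations, so cross-validation settles on an intermediate scale.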

### 2.2 Further Discussion on General Kernel Selection.

The selection of the Gaussian kernel appears at first to be somewhat arbitrary. Moreover, its connection to a set of basis functions has not yet been made explicit. Assume that the physical quantity under study has a well-defined mean and is otherwise oscillatory around it, a common occurrence in ocean applications dealing with signals that are deterministic, or quasi-stationary and stochastic. In such cases, appropriate basis functions would be a set of orthonormal functions in a multi-dimensional space whose dimension equals the number of features.

The solution of the first-kind integral Eq. (8) is in principle not available in closed form, nor is the a priori selection of the kernel evident. A selection of basis functions capable of accurately describing the physical quantity under study according to (1) would be a reasonable starting point. For such a basis function set, the kernel would be the generating function, as indicated by the second equation in Eq. (9). This would also require knowledge of the eigenvalues. Moreover, the robust performance of the LS-SVM algorithm is a consequence of the positive definite kernel, which guarantees a unique solution of the optimization problem (2). Within the LS-SVM algorithm, the positive definiteness of the kernel matrix in Eq. (6) makes robust linear algebra algorithms available for the inversion of the large linear systems that arise when a large number of training samples is needed.

For the Gaussian kernel, the solution of Eq. (9) is available in closed form in any number of dimensions. The basis functions are generalized Hermite functions, which are orthogonal over the entire real axis and are known to be a robust basis set for representing a wide range of sufficiently smooth functions. This is the case for the ocean applications considered in the present study.

Here, $\sigma_k$ refers to the constant determining the scale or variance of the $k$th feature of the Gaussian kernel (as in Eq. (8)).

Here, *H*_{n}(·) is the classical Hermite polynomial of degree n. *α*_{j} are the integral weights which are related to the global scale of the problem. *ε*_{j} are the scale parameters which are related to the local scale of the problem. *δ*_{j}, *β*_{j}, *γ*_{j} are auxiliary parameters defined in terms of *α*_{j}, *ε*_{j}. Refer to Ref. [18] for details on the derivation of (11) and (12).

This formulation of (11) and (12) allows us to select different shape parameters *ε*_{j} and different integral weights *α*_{j} for different space dimensions (i.e., K may be an anisotropic kernel), or we may assume that they are all equal (i.e., K is spherically isotropic).

The eigenvalues of the Gaussian kernel are seen in Eq. (10) to be positive; therefore, the kernel matrix $\mathbf{K}$ in the linear system (6) is positive definite. The basis functions $\phi_k(\mathbf{x})$ in Eq. (12) are products of an exponential term and Hermite polynomials, both of which depend on the auxiliary parameters $\alpha_k$, which must be properly selected. While these parameters do not appear explicitly in the definition of the kernel, they affect the condition number of the matrix in Eq. (6). They must be selected properly in order to determine the rank of the matrix and to develop a robust inversion algorithm for the large linear systems (6), which may be ill-conditioned, particularly when the scale or variance of the Gaussian kernel is large. Details on the robust inversion of (6) are presented in Ref. [18], where use is made of the analytical form of the singular value decomposition of the Gaussian kernel discussed in the present section.

The set of Eqs. (10)–(12) underscores the popularity of the Gaussian kernel in LS-SVM applications: the orthonormal Hermite functions are known to be a robust basis set for approximating a wide range of functions on the entire real axis. These properties of the Gaussian kernel, along with the availability of an analytical singular value decomposition, have led to its use in LS-SVM algorithms for a wide range of problems.
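Reference [18] gives the Gaussian eigenpairs in closed form. The sketch below uses the one-dimensional version of that expansion for a kernel written as $\exp(-\epsilon^2 (x - z)^2)$; for a Gaussian of the form $\exp(-(x-z)^2/\sigma^2)$ this means $\epsilon^2 = 1/\sigma^2$. We assume the names `eps`, `alpha`, `beta`, `delta2`, `gamma_n` correspond to $\epsilon_j$, $\alpha_j$, $\beta_j$, $\delta_j^2$, $\gamma_j$ in the text. The check is that the truncated sum $\sum_n \lambda_n \phi_n(x)\phi_n(z)$ reproduces the kernel and that all $\lambda_n$ are positive.

```python
import math
import numpy as np


def hermite(n, u):
    """Physicists' Hermite polynomial H_n(u) from the three-term recurrence."""
    h_prev, h = np.ones_like(u), 2.0 * u
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, 2.0 * u * h - 2.0 * m * h_prev
    return h


def gaussian_eigenpairs(eps, alpha, n_terms):
    """Eigenvalues and eigenfunctions of exp(-eps^2 (x - z)^2).

    Assumed 1-D form of the expansion in Ref. [18]; alpha sets the global
    scale (weight ~ exp(-alpha^2 x^2)), eps the local scale of the kernel.
    """
    beta = (1.0 + (2.0 * eps / alpha) ** 2) ** 0.25
    delta2 = 0.5 * alpha**2 * (beta**2 - 1.0)
    denom = alpha**2 + delta2 + eps**2
    lam = np.array([math.sqrt(alpha**2 / denom) * (eps**2 / denom) ** (n - 1)
                    for n in range(1, n_terms + 1)])

    def phi(n, x):
        gamma_n = math.sqrt(beta / (2.0 ** (n - 1) * math.factorial(n - 1)))
        return gamma_n * np.exp(-delta2 * x**2) * hermite(n - 1, alpha * beta * x)

    return lam, phi


eps, alpha, n_terms = 1.0, 1.0, 50
lam, phi = gaussian_eigenpairs(eps, alpha, n_terms)
x = np.linspace(-2.0, 2.0, 41)
X, Z = np.meshgrid(x, x, indexing="ij")
k_series = sum(lam[n - 1] * phi(n, X) * phi(n, Z) for n in range(1, n_terms + 1))
k_exact = np.exp(-eps**2 * (X - Z) ** 2)
print(f"max truncation error: {np.max(np.abs(k_series - k_exact)):.1e}")
print(bool(np.all(lam > 0)))  # True
```

The eigenvalues decay geometrically with ratio $\epsilon^2/(\alpha^2+\delta^2+\epsilon^2)$; increasing $\epsilon$ relative to $\alpha$ pushes this ratio toward one, so more terms are needed for the same accuracy.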

In a number of LS-SVM applications, a polynomial kernel is used instead of the Gaussian. In the context of ocean applications, this is equivalent to replacing the Gaussian on the right-hand side of Eq. (7) by a polynomial of $(\mathbf{x}, \mathbf{x}_i)$, which may involve linear, quadratic, cubic, and higher-order terms. On closer inspection of Eq. (9), this is equivalent to expanding the Gaussian kernel in a Taylor series for small values of the inverse scales $\epsilon_k^2$.
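The Taylor-series argument can be written out for a single feature, taking the Gaussian factor in the form $\exp(-\epsilon_k^2 (x_k - z_k)^2)$ (an assumed notation consistent with the inverse scales $\epsilon_k^2$ used in the text):

$$
e^{-\epsilon_k^2 (x_k - z_k)^2}
= e^{-\epsilon_k^2 x_k^2}\, e^{-\epsilon_k^2 z_k^2}\, e^{2\epsilon_k^2 x_k z_k}
= e^{-\epsilon_k^2 x_k^2}\, e^{-\epsilon_k^2 z_k^2} \sum_{n=0}^{\infty} \frac{(2\epsilon_k^2)^n}{n!}\,(x_k z_k)^n .
$$

When $\epsilon_k^2 x_k^2 \ll 1$ over the range of the data, the two prefactors are close to one, and truncating the series after $n = d$ leaves

$$
e^{-\epsilon_k^2 (x_k - z_k)^2} \approx 1 + 2\epsilon_k^2\, x_k z_k + 2\epsilon_k^4\, (x_k z_k)^2 + \dots + \frac{(2\epsilon_k^2)^d}{d!}\,(x_k z_k)^d ,
$$

a polynomial of degree $d$ in $x_k z_k$, with the same structure as the expansion $(x_k z_k + t)^d = \sum_{n=0}^{d} \binom{d}{n} t^{d-n} (x_k z_k)^n$ of a one-feature polynomial kernel. Since the multi-feature Gaussian is the product of such factors over $k = 1, \dots, K$, the reduction applies feature by feature.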

A polynomial representation of the physical quantity $y(\mathbf{x})$ would be justified, for example, when developing an LS-SVM model for a viscous load in terms of the ambient flow kinematics, the Morison drag formula being an example. Another example is the representation of the hydrodynamic derivatives in the ship maneuvering problem by a high-order polynomial of the ship kinematics. It follows from the Taylor series expansion of Eq. (9) that the polynomial kernel with an integer power $d$ is related to the Gaussian kernel for small values of $\epsilon_k^2$ for some or all of the $K$ features. Therefore, the use of the polynomial kernel may be unnecessary, and emphasis must instead be placed on the proper calibration of the parameters $\epsilon_k^2$ for each of the $K$ features, depending on the physics of the flows under study. In a number of applications, the same value of $\epsilon^2$ is selected for all features, simplifying the calibration process, often with very satisfactory results. In ocean applications, small values of $\epsilon_k^2$ may be appropriate for some features but not for others, leading to a kernel that is a mixture of polynomial-like factors for some features and exponential factors for others. These choices can be determined by the cross-validation procedure during the training of the LS-SVM algorithm.
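A sketch, assuming NumPy, of the per-feature calibration idea: an anisotropic Gaussian with one inverse scale `eps2[k]` per feature, plus a check that for a small inverse scale a single-feature factor is nearly indistinguishable from its quadratic Taylor truncation, while for a large one it is not. The values 0.01 and 1.0 and the feature range $[-2, 2]$ are illustrative assumptions, not calibrated values from this study.

```python
import math
import numpy as np


def anisotropic_gaussian(X, Z, eps2):
    """k(x, z) = exp(-sum_k eps2[k] * (x_k - z_k)^2) with one inverse scale per feature."""
    diff2 = (X[:, None, :] - Z[None, :, :]) ** 2
    return np.exp(-np.einsum("ijk,k->ij", diff2, np.asarray(eps2, dtype=float)))


def taylor_factor(x, z, eps2, order):
    """Single-feature Gaussian factor with exp(2 eps2 x z) truncated after (x z)^order."""
    series = sum((2.0 * eps2) ** n / math.factorial(n) * (x * z) ** n
                 for n in range(order + 1))
    return np.exp(-eps2 * (x**2 + z**2)) * series


x = np.linspace(-2.0, 2.0, 21)
Xg, Zg = np.meshgrid(x, x, indexing="ij")
for eps2 in (0.01, 1.0):  # illustrative inverse scales, not calibrated values
    exact = np.exp(-eps2 * (Xg - Zg) ** 2)
    quad = taylor_factor(Xg, Zg, eps2, order=2)
    print(f"eps2 = {eps2}: max error of quadratic truncation = {np.max(np.abs(exact - quad)):.1e}")

# Mixed kernel: feature 1 nearly polynomial (small eps2), feature 2 local (large eps2).
rng = np.random.default_rng(2)
F = rng.uniform(-2.0, 2.0, size=(6, 2))
K_mixed = anisotropic_gaussian(F, F, eps2=[0.01, 1.0])
```

Cross-validating each `eps2[k]` separately lets the data decide which features behave polynomially and which need a local Gaussian factor.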

## 3 Baseline Dataset

In this study, we are interested in establishing SVM regression models for the prediction of nonlinear wave loads on a fixed cylinder. To establish such a model, experimental data from a model test were used as the baseline data set. In addition, nonlinear wave loads simulated by fluid impulse theory [14] are used in the model development in order to explore the merits of different feature and target selections.

### 3.1 Wave Tank Experiment Setup.

The experiment was carried out in the wave tank at the Marine Technology Center, NTNU, on a bottom-fixed cylinder in finite-depth water. The wave tank is 28 m long and 2.5 m wide, with a depth of 19 m. The scale ratio of the model is 1:48. The diameter of the cylinder is 6.912 m in full scale. A sketch of the model test facility is shown in Fig. 1. More details of the experiments can be found in Ref. [19].

Three irregular wave cases are used in this study. The irregular sea states were generated using the standard JONSWAP spectrum with peak enhancement parameter *γ*. The parameters of the sea states are summarized in Table 1. For each sea state, the experiments were repeated 20 times with independent random seeds. Each realization was measured for approximately 3.5 h in full scale, with a full-scale sampling interval of 0.0346 s. During the experiments, the total horizontal force *F*_{x} and the mudline bending moment *M*_{y}, the two physical quantities this paper aims to model, were measured. A sketch of the loading on the fixed cylinder is shown in Fig. 2.

### 3.2 Fluid Impulse Theory.

In fluid impulse theory, the horizontal force $F_x$ and the mudline bending moment $M_y$ (defined as in Fig. 2) on the fixed cylinder account for the linear component, the second-order contribution from the convective terms in the fluid acceleration, and a waterline quadratic contribution. They take the form of Eqs. (13) and (14), in which $\rho$ is the fluid density, $R$ is the radius of the cylinder, $H$ is the draft, $u_1$ and $u_3$ are the ambient wave velocities in the $x$ and $z$ directions, respectively, $\zeta_I$ is the ambient wave elevation, and $\dot{u}_1$ is the ambient wave acceleration.

## 4 Support Vector Machines Regression Model of the Nonlinear Wave Loads

In this study, we aim to establish an SVM regression model for the nonlinear wave loads using the ambient wave elevations and their kinematics as features. As mentioned in Sec. 3.1, three different sea states with 20 random realizations each were tested in the experiment (see Table 1).

For comparison, the nonlinear wave loads in the same three sea states were also simulated by fluid impulse theory. The ambient wave kinematics, i.e., velocity and acceleration, were calculated by linear wave theory using the shallow water dispersion relation. The simulated nonlinear wave loads were then obtained from Eqs. (13) and (14).
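The text states only that linear wave theory and the shallow water dispersion relation were used. The sketch below is one common way to do this under our own assumptions: long-crested waves, an elevation record taken at the cylinder axis, and the finite-depth relation $\omega^2 = gk\tanh(kh)$, which reduces to the shallow-water form $\omega = k\sqrt{gh}$ for $kh \ll 1$. It applies the linear transfer functions in the frequency domain with NumPy's real FFT, which treats the record as periodic and is noncausal in time. The function names and test parameters are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]


def wavenumber(omega, depth, n_iter=50):
    """Solve omega^2 = g k tanh(k h) for k by Newton iteration (omega > 0)."""
    k = omega**2 / G / np.sqrt(np.tanh(omega**2 * depth / G))  # initial guess
    for _ in range(n_iter):
        th = np.tanh(k * depth)
        f = G * k * th - omega**2
        df = G * th + G * k * depth * (1.0 - th**2)
        k = k - f / df
    return k


def kinematics_from_elevation(zeta, dt, depth, z):
    """Linear horizontal velocity u and acceleration du/dt at depth z (-h <= z <= 0).

    Assumptions: long-crested linear waves, finite-depth dispersion, and an
    elevation record zeta(t) measured at the same horizontal position.
    """
    n = zeta.size
    spec = np.fft.rfft(zeta)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
    transfer = np.zeros_like(omega)
    pos = omega > 0.0
    k = wavenumber(omega[pos], depth)
    # cosh(k(z+h))/sinh(kh) rewritten with decaying exponentials to avoid overflow.
    ratio = (np.exp(k * z) + np.exp(-k * (z + 2.0 * depth))) / -np.expm1(-2.0 * k * depth)
    transfer[pos] = omega[pos] * ratio
    u = np.fft.irfft(spec * transfer, n=n)
    du_dt = np.fft.irfft(1j * omega * spec * transfer, n=n)
    return u, du_dt


dt, depth, period = 0.5, 30.0, 10.0
t = np.arange(1280) * dt                       # exactly 64 wave periods
zeta = np.cos(2.0 * np.pi * t / period)        # unit-amplitude regular wave
u, du_dt = kinematics_from_elevation(zeta, dt, depth, z=-5.0)
```

Near the free surface the transfer function grows with frequency, so a cutoff above the wave spectrum would normally be applied to avoid amplifying measurement noise.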

In the following sections, $F_{x-\mathrm{exp}}$ and $M_{y-\mathrm{exp}}$ denote the horizontal force and bending moment measured in the experiment, and $F_{x-\mathrm{fit}}$ and $M_{y-\mathrm{fit}}$ denote the same quantities simulated by fluid impulse theory. Three different combinations of features were tested in the study (see Table 2). In order to account for memory effects and the noncausality of the wave excitation forces, a time window of the wave kinematics of finite duration, instead of an instantaneous value, is defined as a feature. Each feature covers the time window [*t* − 15 s, *t* + 10 s], where *t* is the current time. The sampling interval within the feature window is the same as in the experiment, 0.0346 s in full scale. Therefore, each feature vector has length 25/0.0346 ≈ 723.
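A sketch of the window-feature construction, assuming a half-open window that starts round(15/Δt) samples before *t* and spans round(25/Δt) samples in total; with Δt = 0.0346 s this gives 434 past samples, 289 samples from *t* onward, and a window length of 723. The exact alignment used in the study is not stated, so this convention is an assumption, and the placeholder record is synthetic.

```python
import numpy as np


def window_features(signal, dt, t_past=15.0, t_future=10.0):
    """Rows are windows of `signal` covering [t - t_past, t + t_future) for each admissible t.

    Assumed alignment: n_past samples before t, then n_future samples starting at t.
    """
    n_past = int(round(t_past / dt))
    n_window = int(round((t_past + t_future) / dt))
    n_future = n_window - n_past                      # samples at and after t
    centers = np.arange(n_past, signal.size - n_future + 1)
    offsets = np.arange(-n_past, n_future)
    return signal[centers[:, None] + offsets[None, :]], centers


dt = 0.0346                                                # full-scale sampling interval [s]
zeta = np.sin(2.0 * np.pi * np.arange(2000) * dt / 10.0)   # placeholder elevation record
F1, centers = window_features(zeta, dt)
print(F1.shape)  # (1278, 723)
```

Features built from several signals, for example velocity and acceleration windows, can be concatenated column-wise with `np.hstack`.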

The standard deviation and kurtosis are two important statistics of loads, and their magnitudes are critical for fatigue analysis and the evaluation of extreme statistics. The kurtosis of a Gaussian signal is 3, which is the case for all loads and responses within linear wave-structure interaction theory in Gaussian seas. Therefore, the kurtosis of the measured signal is considered a measure of nonlinearity in wave and wave-structure interaction problems. The kurtosis of the measured wave loads is listed in Table 3 as a measure of load nonlinearity.

Here, $y$ is the true value of the signal, $\hat{y}$ is the predicted value, $\sigma_y$ is its standard deviation, and $\kappa_y$ is its kurtosis.
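The exact error measures behind the tables are not reproduced in this text. A sketch, assuming relative errors in standard deviation and kurtosis plus an RMS error normalized by $\sigma_y$ (all three definitions are our assumptions), with kurtosis taken as the non-excess fourth standardized moment so that a Gaussian signal gives 3:

```python
import numpy as np


def kurtosis(y):
    """Non-excess kurtosis E[(y - mean)^4] / E[(y - mean)^2]^2 (3 for a Gaussian)."""
    d = y - y.mean()
    return float(np.mean(d**4) / np.mean(d**2) ** 2)


def load_error_stats(y_true, y_pred):
    """Assumed error definitions (not taken from the paper): relative std and
    kurtosis errors, and RMS error normalized by the true standard deviation."""
    s_true = y_true.std()
    k_true = kurtosis(y_true)
    return {
        "std_rel_error": float((y_pred.std() - s_true) / s_true),
        "kurtosis_rel_error": (kurtosis(y_pred) - k_true) / k_true,
        "nrmse": float(np.sqrt(np.mean((y_pred - y_true) ** 2)) / s_true),
    }


rng = np.random.default_rng(3)
print(round(kurtosis(rng.normal(size=200_000)), 2))    # close to 3 for Gaussian data
s = np.sin(2.0 * np.pi * np.arange(1000) / 100.0)       # 10 full periods of a sine
print(round(kurtosis(s), 6))                            # 1.5
print(load_error_stats(s, 1.1 * s))
```

A sine has kurtosis 1.5, below the Gaussian value, while nonlinear wave loads with sharp crests typically exceed 3, which is why kurtosis serves as a nonlinearity indicator.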

We first trained the model using the measured wave loads directly as the target. Data from realization 1 of sea state 1 are used as the training data set. The number of training samples is 3000; each sample has a duration of 25 s with a sampling interval of 0.0346 s, and the samples were selected randomly from the long record of a 3-h sea state. Drawing the samples from such a long record is intended to ensure that the training samples capture most of the physical information contained in the training sea state that is relevant to modeling the nonlinear load on the cylinder.
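One way to draw such a training set, as a sketch under assumptions the paper does not state: window centres are sampled uniformly without replacement from all positions whose full [*t* − 15 s, *t* + 10 s] window lies inside a 3-h record, using a hypothetical 434/289 split of the 723 window samples. With N = 3000 samples, the LS-SVM system (6) is a dense 3001 × 3001 solve, which is of modest size.

```python
import numpy as np


def draw_training_centers(n_record, n_past, n_future, n_samples, seed=0):
    """Sorted, distinct window centres whose full window fits inside the record.

    Assumed sampling scheme: uniform, without replacement.
    """
    rng = np.random.default_rng(seed)
    admissible = np.arange(n_past, n_record - n_future + 1)
    return np.sort(rng.choice(admissible, size=n_samples, replace=False))


dt = 0.0346
n_record = int(3 * 3600 / dt)            # samples in a 3-h full-scale record
centers = draw_training_centers(n_record, n_past=434, n_future=289, n_samples=3000)
print(n_record, centers.size)
```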

The remaining 19 realizations of sea state 1 and all 20 realizations of sea states 2 and 3 are used as the test data set to validate the accuracy and robustness of the model. For each sea state realization, a 3500-s section is used for testing. The large test data set ensures that the assessed performance of the model is statistically robust and consistent. The results of the training and testing stages are shown in Figs. 3 and 4. For each test sea state, results from one realization with one feature selection are shown as examples. The overall error statistics are listed in Tables 4 and 5.

These results show that the SVM regression model has successfully captured the nonlinear physical relation between the features and the targets. The model trained using data from one sea state performs very well in the two other sea states, which demonstrates its robustness and generalization capability. Furthermore, the choice of features among ambient wave elevations or derived velocities and accelerations does not lead to a significant difference in performance. This is expected, since the ambient wave elevations and the derived wave kinematics are related by a convolution-type operation, as linear wave propagation theory suggests. From a practical perspective, ambient wave elevations are more readily obtained by direct measurement in experiments or by simulation. In this sense, the SVM regression model effectively establishes a data-driven nonlinear transfer function between the ambient wave elevations and the nonlinear wave loads, derived from experimental measurements.

To further illustrate the performance of this model, Table 6 shows the error statistics between the wave loads simulated by fluid impulse theory (FIT), Eqs. (13) and (14), and the measured experimental data. A comparison of the SVM regression results with the FIT simulation in Tables 4–6 shows that the SVM regression model performs much better in predicting the overall statistics. The accurate prediction of the nonlinear wave load itself and of its overall statistics (standard deviation and kurtosis) suggests that the model is a valuable and efficient tool for further use in design or simulation practice. For certain design or simulation purposes, the SVM regression model can be trained using a single custom representative sea state. A well-trained SVM regression model can then be applied to more realizations of the same sea state, or to other sea states, for structural design or any other required analysis. The design of custom representative sea states for the purpose of training SVM nonlinear transfer functions will be the subject of future research.

Comparing Tables 7 and 8 with Tables 4 and 5, it can be seen that if the target is the residual component remaining after the fluid impulse theory prediction is subtracted, the SVM model predicts the overall statistics (both standard deviation and kurtosis) better than when the overall nonlinear hydrodynamic load is modeled directly by the SVM algorithm, without the intermediate simulation by fluid impulse theory. Since the load components accounted for by fluid impulse theory are the dominant linear and second-order contributions, which have a clear physical origin, subtracting them allows the SVM regression model to learn the more complex information in the remaining load component, which is dominated by strongly nonlinear free surface and separated flow physics. It is conjectured that this is the reason why fitting the residual component leads to better predictive performance. The same process may be followed when the SVM algorithm is used to model more complex flows around marine structures: well-understood linear and second-order physics may first be simulated using potential flow methods and subtracted from the overall measured nonlinear load or response being modeled, and the residual load would then be modeled by the SVM algorithm along the lines of the present section. This topic will be the subject of a future study.
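A toy sketch of the residual strategy, assuming a synthetic one-dimensional problem in which a hypothetical "physics" prediction $y_{\mathrm{phys}} = 3x$ stands in for the fluid impulse theory load and the "measurement" adds a smooth nonlinear term and noise. The LS-SVM is trained on the residual only, and the hybrid prediction adds the two parts back together. None of the numbers represent the cylinder data.

```python
import numpy as np


def gaussian_gram(X, Z, sigma):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvm_fit(X, y, gamma, sigma):
    """Bias b and multipliers alpha from the bordered LS-SVM linear system."""
    n = X.shape[0]
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = gaussian_gram(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def physics_model(x):
    """Hypothetical stand-in for a potential-flow (e.g. FIT) load prediction."""
    return 3.0 * x


def measured_load(x, rng):
    """Hypothetical 'measurement': physics part plus an unmodeled nonlinear term and noise."""
    return 3.0 * x + 0.2 * x**3 + 0.1 * np.sin(3.0 * x) + 0.005 * rng.normal(size=x.shape)


rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y_meas = measured_load(X[:, 0], rng)
residual = y_meas - physics_model(X[:, 0])        # target for the LS-SVM
b, alpha = lssvm_fit(X, residual, gamma=1000.0, sigma=0.5)

X_test = np.linspace(-1.8, 1.8, 37)[:, None]
hybrid = physics_model(X_test[:, 0]) + gaussian_gram(X_test, X, 0.5) @ alpha + b
truth = 3.0 * X_test[:, 0] + 0.2 * X_test[:, 0] ** 3 + 0.1 * np.sin(3.0 * X_test[:, 0])
print(f"max hybrid error: {np.max(np.abs(hybrid - truth)):.3f}")
```

The LS-SVM here only has to represent the smaller, purely nonlinear part of the signal, which mirrors the intuition offered in the paragraph above for the improved statistics.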

## 5 Conclusions and Discussion

This paper takes experimental data for a fixed cylinder in finite-depth water as the baseline data set and considers the modeling of nonlinear wave loads by a machine learning SVM regression algorithm. The SVM regression model successfully establishes a deterministic nonlinear mapping from the features (ambient wave kinematics in this study) to the targets (nonlinear wave loads).

In essence, the kernel maps the association between the features of the target samples and those of the training samples in a hyperspace spanned by a set of basis functions, which are the eigenfunctions of a positive definite kernel. The SVM algorithm predicts the value of the quantity being modeled as a function of the kernel arguments, which are the features of the target sample and those of the training samples. Therefore, by randomly selecting a set of training samples from a representative sea state, the model achieves very good generalization capability for predicting unseen targets, through a training process combined with an optimization scheme that determines the width parameter (variance) of the kernel and the regularization parameter, which controls the magnitude of the SVM model error.

In this paper, different combinations of features and targets are tested. The selection of the features, whether ambient wave elevations, derived wave velocities and accelerations, or all of them, does not affect the overall performance significantly. The purpose of testing the SVM performance using three sets of features is to evaluate the significance of nonlinear effects. In ambient waves of small steepness, where linear theory is expected to be valid, selecting just the wave elevation as Feature 1 should be sufficient, since Features 2 and 3 may be obtained by linear convolution. In steep nonlinear ambient waves, the linear convolution relating the ambient wave velocities and accelerations to the ambient wave elevation record is no longer valid; therefore, including Features 2 and 3 in addition to Feature 1 reveals the significance of nonlinear effects. The availability of reliable nonlinear potential flow models capable of predicting significant free surface nonlinear loads permits more accurate modeling of the remaining residual load by the SVM algorithm. This is evidenced by the very good performance of the SVM algorithm in the most nonlinear sea state 3, illustrated in Table 8 for the bending moments, when all nonlinear Features 1, 2, and 3 alluded to above are taken into account in training the algorithm.

This methodology may be extended to a wide range of nonlinear wave-structure interaction problems where experimental measurements are available and potential flow models are not sufficiently accurate. In such cases, residual loads may be derived and modeled by an SVM algorithm acting as a nonlinear transfer function for strongly nonlinear free surface and separated flow effects. This SVM methodology was recently applied in Ref. [21] to the modeling of roll viscous damping due to bilge keels on a barge. In that case, a computational fluid dynamics code was used to imitate nature, namely a tank experiment, and simulations of the total hydrodynamic moment on the barge were generated, including nonlinear free surface and viscous separated flow effects. Features representing the potential flow linear wave-body interactions were identified, and a residual moment dominated by viscous flow and nonlinear free surface effects was simulated and modeled by the SVM algorithm in the form of a nonlinear transfer function. The results were very satisfactory and may be used for modeling the seakeeping of FPSOs and other marine structures equipped with bilge keels.

The SVM regression model developed in the present paper uses all 3000 samples drawn from the experimental measurements. It is possible that the performance of the algorithm would not degrade if fewer than 3000 samples were used for training. When fewer samples are used, the magnitudes of the hyperparameters will change, and their dependence on the number of samples would need to be determined from a systematic sensitivity study, which will be the subject of future research. The sea states used for training the SVM algorithm were generated in a wave tank to be stationary. Training and evaluating the algorithm in nonstationary sea states requires a separate set of experimental measurements, which was not available for the development of the SVM algorithm presented here.

In this study, the representative sea states in the experiments were designed for ultimate limit state tests, and the SVM regression model was established on the basis of the available experimental data. However, the approach can be extended to more general cases if other specific targets are of interest and specific representative sea states are carefully selected to provide samples to the SVM regression model. The design of the most appropriate representative sea state for training an SVM algorithm for a particular application will be considered in a future study. It will enable the use of the algorithm as a nonlinear transfer function over a wide range of test sea states of interest in design practice, reducing the need for, and the cost of, systematic experiments.

## Acknowledgment

The authors acknowledge Equinor for providing the experiment data used in this study. This research has been supported by Equinor and the Office of Naval Research under Contract N00014-17-1-2985.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper. Data provided by a third party listed in Acknowledgment.