Abstract

Efficient human–robot collaboration during physical interaction requires estimating the human state for optimal role allocation and load sharing. Machine learning (ML) methods are gaining popularity for estimating interaction parameters from physiological signals. However, due to individual differences, ML models might not generalize well to new subjects. In this study, we present a convolution neural network (CNN) model that predicts motor control difficulty using surface electromyography (sEMG) from the human upper limb during a physical human–robot interaction (pHRI) task, along with a transfer learning approach that adapts a learned model to new subjects. Twenty-six individuals participated in a pHRI experiment in which each subject guided the robot's end-effector under different levels of motor control difficulty. The motor control difficulty is varied by changing the damping parameter of the robot from low to high and by constraining the motion to gross and fine movements. A CNN with raw sEMG as input is used to classify the motor control difficulty, and its transfer learning approach is compared against Riemann geometry-based Procrustes analysis (RPA). With very few labeled samples from new subjects, we demonstrate that the CNN-based transfer learning approach (avg. 69.77%) outperforms RPA transfer learning (avg. 59.20%). Moreover, we observe that the skill level of the subjects used to pre-train the model has no significant effect on transfer learning performance for new users.

1 Introduction

Physical human–robot interaction (pHRI) is becoming a crucial part of many industrial applications, such as assembly, welding, and painting, where dexterous human capabilities can be leveraged along with the precision of industrial robots [1]. Recent advances in machine learning are enabling traditional robots to adapt to human operators by considering the operator's intention and cognitive and physical state to ensure safe and efficient collaboration [2]. For such adaptive decision algorithms, physiological data such as electromyography, eye-tracking, and electroencephalography (EEG) form indispensable modalities [3,4]. However, learning a generalized decision algorithm from physiological signals remains challenging because of individual differences and temporal changes in physiological signals, even within the same subject. In such scenarios, transfer learning approaches are a viable choice [5,6]. This work explores such a transfer learning approach to adapt a learned motor difficulty classification model to new subjects in a physical human–robot interaction experiment.

Robotic compliance and human sensory feedback are the key aspects of pHRI necessary to adapt the robotic system to new users and tasks and to reduce the chances of physical fatigue induced by repetitive movements [7]. Popular control strategies for compliance control include impedance and admittance control [8,9]. These strategies enable a stiff actuator equipped with position or force sensors to exhibit compliant behavior by rendering virtual dynamics. In such controllers, the robot compliance is rendered through virtual inertia, damping, or stiffness parameters, and the desired behavior is achieved by tuning these parameters appropriately.

Most importantly, the desired values of these virtual parameters depend on the task type and the contact dynamics between the human and robot [10,11]. For example, increasing the virtual damping improves movement accuracy but requires more human effort, whereas decreasing the damping facilitates low-effort collaboration but deteriorates fine movement (FM) accuracy [10]. In addition to the task type, contact dynamics play a major role in the stability of the interaction; for instance, the robot becomes unstable while interacting with a stiff environment or even when the operator increases the grasp pressure [12,13]. Therefore, an adaptive control strategy is required to adjust to different users or tasks [11,14].

One of the primary modalities considered in adaptive robotic control strategies is physiological signals such as surface electromyography (sEMG) and EEG [10,15,16]. Among physiological signals, sEMG can help extract contact dynamics information as it is most directly related to human limb stiffness [10,12]. However, using physiological signals such as sEMG to adjust robot control parameters is not straightforward due to their low signal-to-noise ratio and task dependence [17]. Along with task dependency, physiological information is also subject dependent [18]: data recorded from the same subject at different times under identical experimental conditions exhibit non-negligible differences [19]. This inter-subject variability makes it difficult for classification algorithms to learn features that generalize well across different subjects.

Recently, deep learning approaches have gained considerable interest due to the vast amounts of data available and their success in movement detection, gesture classification, and intent detection [20–23]. For instance, a recent review by Faust et al. [20] demonstrates the superiority of convolution neural networks (CNNs) over conventional machine learning algorithms in gesture classification using sEMG data. However, even deep learning approaches fail to generalize across subjects due to individual differences. To overcome this issue, researchers use transfer learning techniques in which a model trained on a specific domain is adapted to a new domain by retraining only a few parameters of the network [5,24–27].

This paper explores a deep transfer learning approach to adapt a learned classification model to new subjects. For this purpose, we designed a physical human–robot interaction experiment in which the users perform fine and gross movements (GMs) by guiding an admittance-controlled robot. The damping parameter of the admittance controller is varied between predefined high and low values; thus, the experiment has two factors with two levels each, high/low for damping and fine/gross for task type. During the interaction, we collect the participant's sEMG from the forearm and use it offline to characterize the interaction into three categories. Based on the predicted category, the robotic system can choose to increase, decrease, or maintain the current level of virtual damping to ensure fluid interaction between the human and the robot. We compare two approaches: a feature-based approach and a deep CNN approach. The CNN architecture consists of two linear transformations and three convolution operations with a log-softmax output and no fully connected layers, inspired by Schirrmeister et al. [28] and Passalis et al. [29]. A total of 26 subjects participated in the experiment. We trained a base classifier on 10 subjects and performed inter-subject transfer learning using only 10% of each new subject's data. We compare the performance of the CNN-based transfer learning approach with Riemann geometry-based Procrustes analysis (RPA).

2 Experimental Setup

We used a six-degree-of-freedom robotic system (Schunk Powerball LWA4P) with a six-axis force/torque sensor (Weiss KMS40), as shown in Fig. 1. A handle is attached to the robot's end-effector, and its position is mapped onto a virtual environment created using the CoppeliaSim simulator. Visual feedback is provided via a screen placed in front of the participant. During the experiment, sEMG data from the individual's forearm are recorded using a Myo armband (Thalmic Labs) consisting of eight electrodes.

Fig. 1
Experimental setup and the block diagram of human–robot interaction for performing the visuo-motor task

A total of four tasks are designed with varying levels of effort and task difficulty, i.e., low or high levels of damping and gross or fine movements for the task type as shown in Fig. 2. The selection of four tasks with two factors results in a full factorial design, and the participants perform these four tasks in a randomized order.

Fig. 2
Robot tracked position plotted across all four tasks: (a) FM-HD, (b) FM-LD, (c) GM-HD, and (d) GM-LD

An admittance control strategy converts the human-applied force (measured using the force/torque sensor) into the respective Cartesian space velocities by simulating virtual dynamics at the end-effector. Two virtual parameters, mass and damping, govern the dynamics of the end-effector. Here, the virtual mass is a diagonal matrix Diag([3, 3, 3, 0.1, 0.1, 0.1]) and is kept constant throughout all four tasks, while the virtual damping is switched between two levels: high damping (HD, Diag([80, 80, 80, 10, 10, 10])) and low damping (LD, Diag([20, 20, 20, 6, 6, 6])). More detailed information on the admittance control implementation can be found in Ref. [15].
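As a rough sketch of how such a controller maps measured force to end-effector motion, the virtual dynamics M v̇ + D v = F can be integrated numerically. The explicit-Euler update, the time-step, and the constant test force below are illustrative assumptions, not the actual implementation of Ref. [15]:

```python
import numpy as np

# Virtual dynamics: M * dv/dt + D * v = F  =>  dv/dt = M^{-1} (F - D v)
# Diagonal virtual mass and the two damping levels used in the experiment.
M = np.diag([3, 3, 3, 0.1, 0.1, 0.1])
D_HD = np.diag([80, 80, 80, 10, 10, 10])   # high damping
D_LD = np.diag([20, 20, 20, 6, 6, 6])      # low damping

def admittance_step(v, F, D, dt=1.0 / 150):
    """One explicit-Euler update of the end-effector twist (illustrative scheme)."""
    dv = np.linalg.solve(M, F - D @ v)
    return v + dt * dv

# A constant 10 N push along x reaches a lower steady-state speed under high damping.
v_hd = v_ld = np.zeros(6)
F = np.array([10.0, 0, 0, 0, 0, 0])
for _ in range(3000):
    v_hd = admittance_step(v_hd, F, D_HD)
    v_ld = admittance_step(v_ld, F, D_LD)
# At steady state v = D^{-1} F: 10/80 m/s under HD versus 10/20 m/s under LD
```

This makes the effort trade-off concrete: the same applied force produces four times the velocity under LD, which is why LD feels lighter but is harder to control precisely.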

In addition to the damping levels, the task difficulty is varied by constraining the motion of the hand to fine (Figs. 2(a) and 2(b)) and gross movements (Figs. 2(c) and 2(d)). A GM task allows freehand motion around two points in the workspace (denoted by bold circles). In contrast, an FM task requires the motion to be confined between inner and outer boundaries (denoted by bold lines) separated by an average distance of 17.5 mm. As a result, GM and FM resemble a horizontal eight shape and a star shape, respectively (Fig. 2). Subjects therefore perform four tasks, as demonstrated in Fig. 2, by combining the task type (GM and FM) with the damping level (HD and LD): case 1: FM-HD, case 2: FM-LD, case 3: GM-HD, and case 4: GM-LD.

Note that the robot is more sensitive to the grasping pressure while performing fine movements in the LD setting [13], which causes the movement oscillations observed in the position data of case 2 in Fig. 2. Stability can be enhanced by increasing the damping to a level at which the controller attenuates the high-frequency force components. However, the HD setting demands more physical effort to manipulate the robot than the LD setting. Therefore, the controller should adjust the damping (HD or LD) to the appropriate level during physical human–robot interaction.

2.1 Human Subject Study.

Twenty-six subjects (age group 23–34 years, all right-handed) participated in the study (Fig. 3). The experiments were conducted after obtaining approval regarding the setup and procedure from the university's Institutional Review Board (IRB# 030-801361). All the participants were recruited from the University at Buffalo School of Engineering and Applied Sciences. Before the experiment, participants went through a trial run to familiarize themselves with the equipment and the experiment protocol. The total time for each task (a total of four tasks) was fixed to 3 min, during which the participants could traverse the shape multiple times. Each participant performed all four tasks in pseudo-randomized order with no external intervention during any of the tasks. All the subjects were asked to wear a Myo armband on the forearm of their dominant hand (Fig. 1) to record the sEMG data. The applied force and respective position data were time-synchronized with the sEMG, and all the data were recorded with a sampling rate of 150 Hz.

Fig. 3
Human subject study experiment procedure. Each task was performed for 3 min.

3 Methods

3.1 Class Labeling.

An assistive robot should select an appropriate damping level suitable for the task (FM or GM) to enable fluid interaction between humans and robots. For instance, Memar and Esfahani [15] showed that high damping is preferred while performing precise and stable movements whereas low damping is preferred for faster and less constrained movements. The robot would demand more effort and make the task difficult if an inappropriate damping level is selected during the interaction. Therefore, we can consider cases 1 and 4 as a single category during classification. This leaves the other two categories, cases 2 and 3, with undesired damping levels. For instance, the damping should be increased when case 2 is encountered and decreased when case 3 is encountered. Therefore, we categorize the four tasks into a three-class problem with cases 1 and 4 as a single category and cases 2 and 3 as two other categories. We use the recorded sEMG data to predict each category.

The three-class classification problem is solved using two main approaches, as shown in Fig. 4: (1) a support vector machine (SVM) classifier with Riemann features and (2) raw sEMG data with a convolutional neural network (CNN). The continuous time-series data are split into constant-length windows of 1 s, known as epochs. Each new 1-s epoch is obtained by sliding the window by 500 ms, giving a 50% overlap between windows. Riemann features are extracted from each 1-s window, and the resulting features are used to train an SVM classifier. In contrast, an sEMG epoch is used directly to train the CNN classifier, without feature extraction.
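The epoching step shared by both pipelines can be sketched as follows (a minimal numpy sketch; the function name `make_epochs` and the synthetic data are illustrative):

```python
import numpy as np

def make_epochs(emg, fs=150, epoch_len_s=1.0, overlap=0.5):
    """Split continuous sEMG (channels x samples) into fixed-length,
    overlapping windows (epochs), as used for both the feature-based
    and the CNN pipelines."""
    win = int(epoch_len_s * fs)           # 150 samples per 1-s epoch
    step = int(win * (1.0 - overlap))     # 75-sample stride = 50% overlap
    n_epochs = (emg.shape[1] - win) // step + 1
    return np.stack([emg[:, i * step : i * step + win] for i in range(n_epochs)])

# An 8-channel, 3-min recording at 150 Hz yields epochs of shape (n_epochs, 8, 150)
emg = np.random.randn(8, 3 * 60 * 150)
epochs = make_epochs(emg)
```

Changing `epoch_len_s` and `overlap` reproduces the parameter grid studied in Sec. 5.1.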

Fig. 4
sEMG data processing and motor difficulty classification pipeline in this study

3.2 Feature Extraction.

3.2.1 Riemannian Features.

Recently, Riemann geometry has drawn considerable attention in multivariate time-series classification [30,31]. Riemannian features are based on the covariance matrix extracted from the selected channels of a fixed window of time-series data. Barachant et al. [30] have shown that, for brain–computer interface applications, the mean covariance matrices of each class separate well on a Riemann manifold (RM), and they obtained good classification accuracy with a simple minimum-distance-to-mean classifier in the manifold space. A more comprehensive review of Riemann geometry and its applications to time-series data can be found in Ref. [32]. Spatial covariance matrices can be extracted from the sEMG data and projected onto the tangent space for classification using SVM or linear discriminant analysis [33,34]. Additionally, an RPA-based [35] transfer learning approach can be applied to these features to transform the incoming (target) data and match its statistics with those of the source data. Manjunatha et al. [34] extracted covariance matrices from eight channels of sEMG data and applied RPA to demonstrate better classification performance across new sessions.

A classifier developed on the Riemannian feature space works efficiently only if the data statistics remain the same. However, the statistics of physiological data can vary significantly across new subjects, decreasing the classifier's performance. To address this, the Riemannian features of the target dataset have to be transformed on the manifold to match the statistics of the source data. This is done using the transfer learning approach proposed by Rodrigues et al. [35], which performs an affine transformation of the target data to align it with the source data and learns a new classifier on the transformed target data.
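The covariance-and-tangent-space construction described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation: dedicated toolboxes such as pyriemann provide production versions of these operations, and the Euclidean mean is used here as a simple reference point where the Riemannian mean would normally be preferred:

```python
import numpy as np

def spd_sqrt_inv(C):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def spd_log(C):
    """Matrix logarithm of an SPD matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.log(w)) @ V.T

def tangent_features(covs, C_ref):
    """Project spatial covariance matrices onto the tangent space at C_ref
    and vectorize them -- the Riemannian features fed to the SVM."""
    P = spd_sqrt_inv(C_ref)
    iu = np.triu_indices(C_ref.shape[0])
    feats = []
    for C in covs:
        S = spd_log(P @ C @ P)   # symmetric tangent-space matrix
        feats.append(S[iu])      # upper-triangular vectorization
    return np.array(feats)

# Covariance of each 1-s epoch (8 channels x 150 samples)
rng = np.random.default_rng(0)
epochs = rng.standard_normal((20, 8, 150))
covs = np.array([e @ e.T / e.shape[1] for e in epochs])
C_ref = covs.mean(axis=0)                 # Euclidean mean as a simple reference
X = tangent_features(covs, C_ref)         # (20, 36) feature matrix for the SVM
```

The tangent-space image of the reference point itself is the zero matrix, which is why classes that separate around the reference become linearly separable in this space.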

3.3 Convolution Neural Network-Based Classification.

The CNN architecture shown in Fig. 5 is inspired by Refs. [28,29]. The network consists of two linear transformations and three convolution operations with a log-softmax output and no fully connected layers. The input to the network is the raw sEMG epoch. The first two operations are linear transformations. The first convolution operation is across time, capturing temporal information, and the second is across the sEMG channels, capturing spatial information. After the temporal and spatial convolutions, the output is squared, average pooled, and log-transformed. The temporal and spatial convolutions could be combined into a single three-dimensional (3D) convolution; however, splitting the 3D convolution into two 2D convolutions facilitates the study of time- and spatial-domain features. The architecture is inspired by the filter bank common spatial patterns (FBCSP) method, which has been very effective in EEG/sEMG classification studies [28]; the squaring and log transformation in this network are similar to the trial log-variance computation in FBCSP. Figure 6 shows the trainable parameters and convolution operations of the network.

Fig. 5
Convolution neural network architecture used in this study
Fig. 6
Shift-scale layers of the CNN architecture used for transfer learning

To understand the network architecture, consider a 1-s epoch of the raw sEMG signal E ∈ R^{n×d}, where n is the number of electrodes and d is the number of data samples; in this study, n = 8 and d = 150. For such an input, the first and second linear transformations are given in Eq. (1):

E′ = (E − W_sh ē) ⊙ (W_sc ẽ_sh),  E_sh = E − W_sh ē
(1)
where ē ∈ R^n is the mean value calculated across each electrode, ẽ_sh is the standard deviation of E_sh = E − W_sh ē, and ⊙ denotes element-wise multiplication (the Hadamard product). If W_sc is the inverse covariance (precision) matrix, Eq. (1) reduces to a standard z-score. After these two linear operations, the input is shifted and scaled but its shape is unchanged (E′ ∈ R^{n×d}). The transformed input is then fed into the convolution layers. The kernel of the first convolution operation (temporal convolution) is of size 1 × 10 with stride 1 and 32 output channels (consequently, 32 different 1 × 10 filters). The kernel of the second layer (spatial convolution) is of size 8 × 8 (across channels) with stride 1. After the second convolution operation, the output is element-wise squared, followed by batch normalization and an average pooling layer with a 1 × 50 kernel and a stride of 1 × 10. The pooled output is log-transformed, followed by the final convolution operation with a log-softmax output that yields the class label (C). The network is trained with the negative log-likelihood loss and the Adam optimizer (learning rate = 0.001) for 200 steps on an Nvidia Titan Xp GPU using the PyTorch library.
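A minimal PyTorch sketch of a network following this description is given below. Kernel sizes, channel counts, and the pooling configuration follow the text where stated; the final classification convolution, the per-channel form of the shift and scale parameters, and the spatial-kernel shape are simplifying assumptions, so this is an approximation of the described architecture rather than the exact model:

```python
import torch
import torch.nn as nn

class ShiftScaleCNN(nn.Module):
    """Sketch: learnable shift/scale layers, temporal then spatial convolution,
    square -> batch-norm -> average-pool -> log, and a final convolution with
    log-softmax output (no fully connected layers)."""
    def __init__(self, n_ch=8, n_samples=150, n_classes=3):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(n_ch, 1))          # learned shift
        self.scale = nn.Parameter(torch.ones(n_ch, 1))           # learned scale
        self.temporal = nn.Conv2d(1, 32, kernel_size=(1, 10))    # across time
        self.spatial = nn.Conv2d(32, 32, kernel_size=(n_ch, 1))  # across electrodes
        self.bn = nn.BatchNorm2d(32)
        self.pool = nn.AvgPool2d(kernel_size=(1, 50), stride=(1, 10))
        t = n_samples - 10 + 1            # width after temporal conv: 141
        t = (t - 50) // 10 + 1            # width after pooling: 10
        self.classify = nn.Conv2d(32, n_classes, kernel_size=(1, t))

    def forward(self, x):                 # x: (batch, n_ch, n_samples)
        x = (x - self.shift) * self.scale # shift-and-scale (linear layers)
        x = x.unsqueeze(1)                # -> (batch, 1, n_ch, n_samples)
        x = self.spatial(self.temporal(x))
        x = self.pool(self.bn(x ** 2))    # square, batch-norm, average-pool
        x = torch.log(x.clamp(min=1e-6))  # log transform (clamped for stability)
        x = self.classify(x).flatten(1)   # -> (batch, n_classes)
        return torch.log_softmax(x, dim=1)

model = ShiftScaleCNN()
out = model(torch.randn(4, 8, 150))       # per-class log-probabilities
```

Training such a model would use `nn.NLLLoss` with `torch.optim.Adam(model.parameters(), lr=0.001)`, matching the loss and optimizer named above.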

4 Transfer Learning Approach and Baselines

4.1 Nomenclature.

We consider an inductive transfer learning approach [36] in which we have access to source data S (a model trained on S is denoted SM) and a small subset of labeled target data (Tl). The transferred model is tested on unlabeled target data Tu. Here, Tl and Tu together form the target data, i.e., Tl ∪ Tu = T. We balance Tl with an equal number of samples from each category to avoid training bias, and we similarly balance Tu to obtain unbiased testing accuracy. The balanced Tl comes from the first 15 s of cases 2 and 3, and 7.5 s each of cases 1 and 4 for the third class (see Sec. 3.1).

4.2 Transfer Learning Procedure.

This section details the data splitting scheme used to generate the source dataset S and target dataset T for inter-subject transfer learning. We perform five-fold cross-validation to assess robustness. Figure 7 shows the scheme used for training the source model and splitting the target data. There are two ways to select the base model for inter-subject transfer learning. In the first, 10 randomly selected subjects are used to train the base network (source data S), which is then transferred to the remaining 16 subjects. In the second, subjects are selected by individual performance (i.e., high performers/low performers) to train the base model, which is then transferred to the rest of the subjects. The shift-scale base CNN model is trained on S using a 70%/15%/15% train/validation/test split of the 10 subjects' data.

Fig. 7
Data splitting scheme for CNN-based transfer learning. Out of 26 subjects, a base model is trained on randomly selected 10 subjects and transferred to the rest of the 16 subjects. Only 10% of the new subject data is used for transfer learning.

In inter-subject transfer learning, only the shift and scale layers of the source model (SM) are re-initialized and retrained using the labeled target data Tl, whereas the weights of the convolution layers after the shifting and scaling layers are frozen (Fig. 8). Note that Tl comes only from the first 15 s of the experiment (see Sec. 4.1). The rationale for choosing the shifting and scaling layers for transfer learning is as follows: a CNN tends to learn feature representations in a hierarchical fashion, where the extracted features become progressively task-specific from the first layers to the final layers [37,38]. Thus, in the CNN architecture (Fig. 5), the first layers learn shifting and scaling parameters that generalize across the subjects in the source data S, and the last layers learn task-specific features. Since the task is the same but the subjects are different, we choose to retrain only the shift and scale layers, which results in learning subject-specific Wsh and Wsc. An added advantage is that the shifting and scaling layers are linear, so relearning them is inexpensive and fast.
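The retraining scheme can be sketched as follows, assuming a model whose shift and scale tensors are registered as parameters named `shift` and `scale`. The `TinyShiftScaleNet` stand-in below is hypothetical; only the re-initialization/freezing logic mirrors the procedure described above:

```python
import torch
import torch.nn as nn

class TinyShiftScaleNet(nn.Module):
    """Stand-in for the source CNN: shift/scale front-end + frozen back-end."""
    def __init__(self):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(8, 1))
        self.scale = nn.Parameter(torch.ones(8, 1))
        self.backend = nn.Conv1d(8, 3, kernel_size=150)  # placeholder feature layers

    def forward(self, x):
        h = self.backend((x - self.shift) * self.scale)
        return torch.log_softmax(h.flatten(1), dim=1)

def prepare_for_transfer(model, lr=1e-3):
    """Re-initialize and unfreeze only the shift/scale parameters of a trained
    source model; freeze everything else. Returns an optimizer over the few
    remaining trainable values."""
    for name, p in model.named_parameters():
        if name in ("shift", "scale"):
            with torch.no_grad():
                p.copy_(torch.zeros_like(p) if name == "shift" else torch.ones_like(p))
            p.requires_grad = True
        else:
            p.requires_grad = False       # convolution weights stay frozen
    return torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)

model = TinyShiftScaleNet()
opt = prepare_for_transfer(model)
# One retraining step on a few labeled target epochs (negative log-likelihood loss)
x, y = torch.randn(6, 8, 150), torch.tensor([0, 1, 2, 0, 1, 2])
loss = nn.functional.nll_loss(model(x), y)
loss.backward()
opt.step()
```

After `prepare_for_transfer`, only 2 × n_channels values receive gradients, which is what makes the per-subject adaptation cheap.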

Fig. 8
Inter-subject transfer: the weights of the two linear layers (shift and scale) are randomly re-initialized and retrained on the new subject data. All other weights are frozen.

4.3 Baselines for Comparison.

Most existing transfer learning techniques are geared towards processing data as images [5,27,39]. To use them on EMG data, an additional pre-processing step is required to convert the raw EMG signals into spectrograms [40], which are then treated as images. In contrast, our approach eliminates this pre-processing step: it trains a CNN directly on raw EMG signals and then applies transfer learning by fine-tuning only a small set of parameters [28,29]. In this study, we compare the CNN transfer learning results with those of RM features transferred to new subjects. The transfer learning approach for covariance matrices on the Riemann manifold is known as RPA [35]. It requires a fully labeled source dataset S and a partially labeled target dataset Tl and involves three major steps for transforming the covariance matrices: re-center, re-scale, and rotate. First, the source and target datasets are re-centered using their respective mean covariance matrices; then the target data are scaled to match the dispersion of the source data; and finally, the target data are rotated. These operations match the distributions of the source and target datasets, allowing the previously trained model to be transferred to new subjects. In our previous work [34], we showed that RPA-based transfer learning yields a significant performance gain compared to re-calibration on classical time-domain features, where the model has to be retrained completely for each new subject.
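The re-centering step, for example, whitens each covariance matrix by the set's mean so that source and target are both centered at the identity. A minimal numpy sketch is shown below; it uses the Euclidean mean of the set for simplicity, whereas RPA proper re-centers with the Riemannian (geometric) mean and is followed by the re-scale and rotate steps:

```python
import numpy as np

def spd_power(C, p):
    """Matrix power of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** p) @ V.T

def recenter(covs, mean_cov):
    """Re-centering step: congruence-transform every covariance matrix by the
    inverse square root of the set's mean, moving the set's center to the identity."""
    W = spd_power(mean_cov, -0.5)
    return np.array([W @ C @ W for C in covs])

# Synthetic spatial covariances from 30 epochs of 8-channel data
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8, 150))
covs = np.array([a @ a.T / a.shape[1] for a in A])
covs_rct = recenter(covs, covs.mean(axis=0))
# After re-centering, the (Euclidean) mean of the set equals the identity matrix
```

Applying the same operation independently to S and Tl removes the gross location difference between the two sets of covariance matrices before scaling and rotation.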

5 Results and Discussion

In this section, we first compare model performance for different epoch lengths and overlap percentages of the sEMG data. We then select the best epoch length and overlap percentage to validate the inter-subject transfer learning performance of the CNN and RM features.

5.1 Analysis of Hyper-Parameters.

Data preparation for the feature-based or raw sEMG (CNN) approach has two parameters: epoch length and overlap between sEMG epochs. To select the best set of parameters, we conducted a parametric study in which the epoch length is studied at 1 s and 2 s, and the epoch overlap at 25%, 50%, and 75%. For the parametric study, we pooled the data of all subjects and calculated five-fold cross-validation accuracies; the procedure is the same for both the feature-based classifier and the CNN. The training time is ∼252.6 s for the CNN and ∼18.4 s for the SVM with Riemann features. Note that the training time for the CNN model depends on many factors, such as GPU memory, machine learning framework, and data-loading techniques.

Table 1 provides the classification accuracy of motor difficulty for different epoch lengths with a 50% overlap between epochs. The classification accuracy decreases as the epoch length increases for both the feature-based and the CNN approaches, with the CNN performing consistently better. The decrease might be due to the increase in noise within longer epochs and the reduced number of data points available for training.

Table 1

Classification accuracy for feature-based SVM and CNN with raw sEMG with different epoch lengths (50% overlap)

Method      1-s epoch       2-s epoch
SVM (RM)    84.62 ± 0.98    83.59 ± 2.29
CNN         87.44 ± 0.28    84.14 ± 0.64

Table 2 provides the classification accuracy for different overlap percentages with a fixed 1-s epoch length. The classification accuracy increases with the epoch overlap for both the feature-based and the CNN approaches, with the CNN again performing better. A 75% overlap yields higher classification accuracy because more data points are available for learning, at the cost of a longer training time; conversely, a 25% overlap reduces training time but lowers accuracy. We chose an epoch length of 1 s with a 50% overlap to balance training time and classification accuracy.

Table 2

Classification accuracy with different overlap (fixed 1-s epoch) using feature-based SVM and CNN classifiers

Method      25%             50%             75%
SVM (RM)    83.07 ± 2.04    84.62 ± 0.98    87.19 ± 0.97
CNN         85.10 ± 0.20    87.44 ± 0.28    89.69 ± 0.14

5.2 Convolution Neural Network Architecture Analysis.

This section analyzes the features at different stages of the CNN architecture (Fig. 5). Specifically, we study the features after the shift and scale layers to provide insight into the transfer learning procedure.

For inter-subject transfer learning with the CNN, we chose to retrain the shift and scale layers (Fig. 8). The hypothesis was that these layers act as normalizing layers for the new subject's data while the subsequent layers act as fixed feature extractors (Fig. 6). In other words, the shift and scale layers push Tl towards S (see Sec. 4.1 for the data splitting scheme), because the fixed feature weights (the convolution layer weights) are trained on S. To explore this further, we analyzed the features just after the shift and scale layers, i.e., E′ = (E − W_sh ē) ⊙ (W_sc ẽ_sh), for a test subject. An average cosine similarity is calculated between the shift-scaled features E′_s of S (10 subjects' data) and E′_t of Tl. As shown in Table 3, the cosine similarity increases with the number of training steps, suggesting that the similarity between Tl and S is increasing. Furthermore, we performed a t-distributed stochastic neighbor embedding (t-SNE) [41] of E′_s and E′_t (subject s2). As seen in Fig. 9, the overlap between the target and source data increases over subsequent training steps. Thus, the shift and scale layers act as normalizing layers, pushing Tl towards the S distribution.
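The average cosine similarity used in this analysis can be sketched as follows (an illustrative numpy version operating on flattened epochs; the synthetic data and function name are assumptions):

```python
import numpy as np

def avg_cosine_similarity(Es, Et):
    """Average pairwise cosine similarity between flattened source epochs Es
    and target epochs Et (e.g., features taken after the shift/scale layers)."""
    A = Es.reshape(len(Es), -1)
    B = Et.reshape(len(Et), -1)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-normalize rows
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float((A @ B.T).mean())                     # mean over all pairs

rng = np.random.default_rng(2)
Es = rng.standard_normal((50, 8, 150))
sim_self = avg_cosine_similarity(Es, Es)                       # identical sets
sim_rand = avg_cosine_similarity(Es, rng.standard_normal((50, 8, 150)))
# sim_self exceeds sim_rand, mirroring how similarity grows as Tl aligns with S
```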

Fig. 9
t-SNE analysis of shift and scaled features between the source data and target data. p1 and p2 represent the coordinates of the two-dimensional plane onto which the data are projected using t-SNE. The features of the target data move towards the features of the source data as the transfer learning proceeds (increasing training steps).
Table 3

Average cosine similarity (of a subject) between source and target dataset after shifting and scaling

Epoch                    0        15       30       60
Avg. cosine similarity   0.0148   0.0365   0.0432   0.0482

5.3 Transfer Learning Results.

Figure 10 shows the inter-subject transfer learning results on 16 subjects. CNN-based transfer learning consistently performs better than the RPA baseline (see Sec. 4.3), outperforming RPA in 13 of the 16 subjects; in the remaining three subjects, where RPA is better, the difference is not drastic. Only 10% of the target data (the first 15 s of each task) was used for the transfer learning procedure. To statistically establish the performance gain of the CNN approach, we conducted a repeated-measures analysis of variance (ANOVA) test; for all statistical tests, we used a significance level (α) of 0.05. The test revealed that CNN performs statistically better than the RPA method (p = 0.0002). The lower performance of RPA might be because the statistics of S and Tl differ drastically in ways the manifold-based transformations cannot capture. Since the CNN approach outperforms the RPA approach, the rest of the analysis uses only the CNN approach.

Fig. 10
Classification accuracy of 16 subjects after transfer learning. Transfer learning using CNN performs significantly better than RPA in 13 out of the 16 subjects.

We further explored whether the choice of base model significantly affects transfer learning performance. To study this, one can select high-performing or low-performing subjects from the pool of 26 to train a base model and then transfer it to the rest of the subjects. To identify high and low performers, we used a quantitative metric known as the instability index, IR (see Appendix A), computed from the interaction force data recorded by the force/torque sensor. The IR index has previously been shown to capture interaction instability and reflect the subject's motor control ability [15,42]. We calculated the instability index for the case 2 scenario (see Sec. 2), as it represents the most unstable interaction with the robot (Fig. 2(b)). Note that a higher instability index signifies lower motor control ability, and a lower instability index indicates higher motor control ability.

Out of the 26 subjects, the six with the highest instability index were labeled low performers and the five with the lowest instability index high performers. Two base models were trained, one on the high performers and one on the low performers, and each was transferred to the rest of the subjects. Again, we used a repeated-measures ANOVA (significance level 0.05) to test whether the base model choice affects transfer learning performance. The test revealed no significant gain in transfer learning performance (on the same subjects) when the base model is switched from high performers to low performers.

Note that the inter-subject transfer learning results hold only when the labeled target data Tl is sufficient to learn the data distribution. To analyze this further, we repeated the transfer learning with different labeled target data (Tl) sizes for new subjects. Table 4 shows the average classification accuracy across new subjects as the labeled target data size is varied from 10% to 30% (of T) in steps of 5%. The performance of both CNN-based transfer learning and RPA increases with the amount of training data, presumably because the classifiers can learn the data distribution better; CNN-based transfer learning nonetheless outperforms RPA-based transfer learning at every size. In addition, the structure of the CNN is what makes this transfer learning approach feasible: classical approaches such as SVM or a random forest classifier (RFC) would have to be retrained completely for a new subject, at greater computational cost, whereas the CNN approach used in this study retrains only a small fraction of the parameters while freezing the rest of the network.
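The freeze-and-retrain idea can be illustrated with a deliberately tiny numpy model (this is a toy stand-in, not the paper's CNN architecture; all names and sizes here are hypothetical). A frozen weight matrix `W1` plays the role of the pre-trained convolutional layers, and only the re-initialized head `W2` is updated on the new subject's small labeled set:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the pre-trained network: W1 mimics the frozen base
# layers, W2 the re-initialized 3-class classification head.
W1 = rng.normal(size=(8, 16)) / np.sqrt(8)   # frozen "base" weights
W2 = rng.normal(size=(16, 3)) * 0.01         # re-initialized head
W1_init = W1.copy()

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Synthetic stand-in for the new subject's small labeled set Tl.
X = rng.normal(size=(30, 8))
y = rng.integers(0, 3, size=30)
Y = np.eye(3)[y]

losses = []
for _ in range(300):
    H = np.tanh(X @ W1)                  # frozen layer: never updated
    P = softmax(H @ W2)
    losses.append(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))
    grad_W2 = H.T @ (P - Y) / len(X)     # cross-entropy gradient w.r.t. head only
    W2 -= 0.05 * grad_W2                 # only the head parameters are retrained
```

Only `W2`'s parameters change during retraining, which is why adapting to a new subject is far cheaper than refitting an SVM or RFC from scratch.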

Table 4

Average (over all subjects) classification accuracy with different Tl sizes as a fraction of T

Method | 10%  | 15%   | 20%   | 25%  | 30%
RPA    | 58.6 | 64.1  | 65.05 | 69.2 | 70.5
CNN    | 74.1 | 72.03 | 78.2  | 79.2 | 82.1

6 Conclusion

In this work, we classified motor control difficulty during physical human–robot interaction using sEMG. We designed a collaborative task in which a human subject guides an admittance-controlled robot. The motor difficulty of guiding the robot was varied by changing the admittance controller's damping parameter from LD to HD and the task type from GM to FM. Each subject therefore performed a total of four experiments, during which sEMG signals were recorded with a Myo armband placed on the subject's forearm.

Based on the damping level and task type, motor control difficulty was modeled as a three-class classification problem, addressed with two approaches: an SVM classifier with Riemannian features and a convolution neural network with raw sEMG. The results demonstrated that the CNN outperforms the SVM classifier with Riemannian features. However, both approaches perform poorly when classifying a new subject's data, mainly because of individual differences among subjects.

To overcome this issue, we used a transfer learning technique in which most of a pre-trained CNN's weights are frozen and only a few layers of the network are re-initialized and retrained. We used 10% of the new data (the first 15 s of the experiment) for training and the remaining 90% for testing. We demonstrated that inter-subject classification accuracy increased significantly with transfer learning, and that the CNN-based transfer learning outperformed the RPA-based transfer method. We also demonstrated that the skill level of the subjects used to train the base model has no significant effect on the transfer to new users. This is particularly useful for adapting the robot control strategy to new users involved in a workforce training program.

In the future, we will add a real-time robot adaptation strategy driven by the three-class classification model's output. Such direct use of the classification should, however, also consider the appropriate update rate of the control strategy, as overly frequent updates might result in unstable interaction.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

Appendix A: Analysis of Instability Index

In this study, we used a metric known as the instability index (IR) to quantify interaction stability. The IR index, given in Eq. (A1), is a dimensionless metric derived from the power spectral density (PSD) of the measured force signal. It has previously been used in Refs. [15,42] to characterize haptic stability during pHRI.
IR = \frac{\sum_{\omega=\omega_c}^{\omega_N} P(\omega)}{\sum_{\omega=\omega_0}^{\omega_N} P(\omega)}
(A1)

Here, ω0, ωc, and ωN denote the lowest, cutoff, and Nyquist frequencies, respectively. The cutoff frequency is set to 3 Hz based on the maximum frequency of voluntary upper-limb movements. P(ω) denotes the PSD of the signal at frequency ω. The IR index lies in the range (0, 1), as it is the ratio of the power above the cutoff frequency to the total power. A higher IR value corresponds to more power in the high-frequency region and is thus an indication of greater instability; the IR index can therefore be used to distinguish high and low performers in the experiment.
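Equation (A1) can be computed directly from a force recording by estimating the PSD and summing power above the cutoff. A minimal sketch (sampling rate, signal shapes, and amplitudes are illustrative, not the experimental data):

```python
import numpy as np
from scipy import signal

def instability_index(force, fs, f_cut=3.0):
    """IR per Eq. (A1): power above the 3 Hz cutoff divided by total power
    of the interaction force signal, with the PSD estimated via Welch's method."""
    f, p = signal.welch(force, fs=fs, nperseg=min(1024, len(force)))
    return p[f >= f_cut].sum() / p.sum()

# Synthetic check: a smooth 1 Hz guidance force vs. the same force with an
# added 8 Hz oscillatory component (a hypothetical unstable interaction).
fs = 200
t = np.arange(0, 10, 1 / fs)
smooth = np.sin(2 * np.pi * 1 * t)
unstable = smooth + 0.8 * np.sin(2 * np.pi * 8 * t)
```

By construction, the oscillatory signal concentrates more power above the cutoff, so its IR is higher, matching the interpretation that a larger IR indicates a less stable interaction.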

References

1. Maurtua, I., Ibarguren, A., Kildal, J., Susperregi, L., and Sierra, B., 2017, "Human–Robot Collaboration in Industrial Applications: Safety, Interaction and Trust," Int. J. Adv. Robot. Syst., 14(4), p. 1729881417716010.
2. Vaughan, N., Gabrys, B., and Dubey, V. N., 2016, "An Overview of Self-Adaptive Technologies Within Virtual Reality Training," Comput. Sci. Rev., 22, pp. 65–87.
3. Kim, J., Campbell, A. S., de Ávila, B. E.-F., and Wang, J., 2019, "Wearable Biosensors for Healthcare Monitoring," Nat. Biotechnol., 37(4), pp. 389–406.
4. Reilly, R. B., and Lee, T. C., 2010, "Electrograms (ECG, EEG, EMG, EOG)," Technol. Health Care, 18(6), pp. 443–458.
5. Côté-Allard, U., Fall, C. L., Drouin, A., Campeau-Lecours, A., Gosselin, C., Glette, K., Laviolette, F., and Gosselin, B., 2019, "Deep Learning for Electromyographic Hand Gesture Signal Classification Using Transfer Learning," IEEE Trans. Neural Syst. Rehabil. Eng., 27(4), pp. 760–771.
6. Ameri, A., Akhaee, M. A., Scheme, E., and Englehart, K., 2019, "A Deep Transfer Learning Approach to Reducing the Effect of Electrode Shift in EMG Pattern Recognition-Based Control," IEEE Trans. Neural Syst. Rehabil. Eng., 28(2), pp. 370–379.
7. Aune, T. K., Ingvaldsen, R., and Ettema, G., 2008, "Effect of Physical Fatigue on Motor Control at Different Skill Levels," Percept. Motor Skills, 106(2), pp. 371–386.
8. Hogan, N., 1985, "Impedance Control: An Approach to Manipulation: Part I—Theory," J. Dyn. Syst. Meas. Control, 107(1), pp. 1–7.
9. Calanca, A., Muradore, R., and Fiorini, P., 2015, "A Review of Algorithms for Compliant Control of Stiff and Fixed-Compliance Robots," IEEE/ASME Trans. Mechatron., 21(2), pp. 613–624.
10. Grafakos, S., Dimeas, F., and Aspragathos, N., 2016, "Variable Admittance Control in pHRI Using EMG-Based Arm Muscles Co-Activation," 2016 IEEE International Conference on Systems, Man, and Cybernetics, Budapest, Hungary, Oct. 9–12, IEEE, pp. 001900–001905.
11. Gopinathan, S., Ötting, S. K., and Steil, J. J., 2017, "A User Study on Personalized Stiffness Control and Task Specificity in Physical Human–Robot Interaction," Front. Robot. AI, 4, pp. 5–20.
12. Bian, F., Ren, D., Li, R., and Liang, P., 2018, "Improving Stability in Physical Human–Robot Interaction by Estimating Human Hand Stiffness and a Vibration Index," Ind. Robot., 46(4), pp. 529–540.
13. Jujjavarapu, S. S., and Esfahani, E. T., 2019, "Stiffness Based Stability Enhancement in Human–Robot Collaboration," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Anaheim, CA, Aug. 18–21, p. V05AT07A019.
14. Keemink, A. Q., van der Kooij, H., and Stienen, A. H., 2018, "Admittance Control for Physical Human–Robot Interaction," Int. J. Robot. Res., 37(11), pp. 1421–1444.
15. Memar, A. H., and Esfahani, E. T., 2018, "EEG Correlates of Motor Control Difficulty in Physical Human–Robot Interaction: A Frequency Domain Analysis," 2018 IEEE Haptics Symposium, San Francisco, CA, Mar. 25–28, IEEE, pp. 229–234.
16. Novak, D., Beyeler, B., Omlin, X., and Riener, R., 2014, "Workload Estimation in Physical Human–Robot Interaction Using Physiological Measurements," Interact. Comput., 27(6), pp. 616–629.
17. Nazmi, N., Abdul Rahman, M., Yamamoto, S.-I., Ahmad, S., Zamzuri, H., and Mazlan, S., 2016, "A Review of Classification Techniques of EMG Signals During Isotonic and Isometric Contractions," Sensors, 16(8), p. 1304.
18. Seashore, R. H., 1930, "Individual Differences in Motor Skills," J. Gen. Psychol., 3(1), pp. 38–66.
19. Taborri, J., Palermo, E., Masiello, D., and Rossi, S., 2017, "Factorization of EMG Via Muscle Synergies in Walking Task: Evaluation of Intra-Subject and Inter-Subject Variability," 2017 IEEE International Instrumentation and Measurement Technology Conference, Torino, Italy, IEEE, pp. 1–6.
20. Faust, O., Hagiwara, Y., Hong, T. J., Lih, O. S., and Acharya, U. R., 2018, "Deep Learning for Healthcare Applications Based on Physiological Signals: A Review," Comput. Methods Programs Biomed., 161, pp. 1–13.
21. Gao, Y., Hendricks, L. A., Kuchenbecker, K. J., and Darrell, T., 2016, "Deep Learning for Tactile Understanding From Visual and Haptic Data," 2016 IEEE International Conference on Robotics and Automation, Stockholm, Sweden, May 16–21, IEEE, pp. 536–543.
22. Hu, Y., Wong, Y., Wei, W., Du, Y., Kankanhalli, M., and Geng, W., 2018, "A Novel Attention-Based Hybrid CNN-RNN Architecture for sEMG-Based Gesture Recognition," PLoS One, 13(10), p. e0206049.
23. Manjunatha, H., 2021, "Addressing Stability, Transferability, and Interpretability Issues in Physical Human–Robot Interaction Using Physiological Data and Deep Learning," PhD thesis, State University of New York at Buffalo.
24. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C., 2018, "A Survey on Deep Transfer Learning," International Conference on Artificial Neural Networks, Rhodes, Greece, Oct. 4–7, Springer, pp. 270–279.
25. Du, Y., Jin, W., Wei, W., Hu, Y., and Geng, W., 2017, "Surface EMG-Based Inter-Session Gesture Recognition Enhanced by Deep Domain Adaptation," Sensors, 17(3), p. 458.
26. Li, Q., Zhang, A., Li, Z., and Wu, Y., 2021, "Improvement of EMG Pattern Recognition Model Performance in Repeated Uses by Combining Feature Selection and Incremental Transfer Learning," Front. Neurorobot., 15, pp. 699174.1–699174.15.
27. Xiong, D., Zhang, D., Zhao, X., and Zhao, Y., 2021, "Deep Learning for EMG-Based Human–Machine Interaction: A Review," IEEE/CAA J. Autom. Sin., 8(3), pp. 512–533.
28. Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., and Ball, T., 2017, "Deep Learning With Convolutional Neural Networks for EEG Decoding and Visualization," Hum. Brain Mapp., 38(11), pp. 5391–5420.
29. Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., and Iosifidis, A., 2020, "Deep Adaptive Input Normalization for Time Series Forecasting," IEEE Trans. Neural Netw. Learn. Syst., 31(9), pp. 3760–3765.
30. Barachant, A., Bonnet, S., Congedo, M., and Jutten, C., 2011, "Multiclass Brain–Computer Interface Classification by Riemannian Geometry," IEEE Trans. Biomed. Eng., 59(4), pp. 920–928.
31. Pan, L., Zhang, D., Jiang, N., Sheng, X., and Zhu, X., 2015, "Improving Robustness Against Electrode Shift of High Density EMG for Myoelectric Control Through Common Spatial Patterns," J. NeuroEng. Rehabil., 12(1), p. 110.
32. Congedo, M., Barachant, A., and Bhatia, R., 2017, "Riemannian Geometry for EEG-Based Brain–Computer Interfaces; a Primer and a Review," Brain Comput. Interface, 4(3), pp. 155–174.
33. Barachant, A., Carmel, J. B., Friel, K. M., and Gupta, D., 2016, "Extraction of Motor Patterns From Joint EEG/EMG Recording: A Riemannian Geometry Approach," Sixth International Brain-Computer Interface Meeting: BCI Past, Present, and Future, Monterey, CA, May 30–June 3, p. 181.
34. Manjunatha, H., Jujjavarapu, S. S., and Esfahani, E. T., 2020, "Classification of Motor Control Difficulty Using EMG in Physical Human–Robot Interaction," 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Virtual, pp. 2708–2713.
35. Rodrigues, P. L. C., Jutten, C., and Congedo, M., 2019, "Riemannian Procrustes Analysis: Transfer Learning for Brain–Computer Interfaces," IEEE Trans. Biomed. Eng., 66(8), pp. 2390–2401.
36. Pan, S. J., and Yang, Q., 2009, "A Survey on Transfer Learning," IEEE Trans. Knowl. Data Eng., 22(10), pp. 1345–1359.
37. Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., and Carlsson, S., 2015, "Factors of Transferability for a Generic Convnet Representation," IEEE Trans. Pattern Anal. Mach. Intell., 38(9), pp. 1790–1802.
38. Garcia-Gasulla, D., Parés, F., Vilalta, A., Moreno, J., Ayguadé, E., Labarta, J., Cortés, U., and Suzumura, T., 2018, "On the Behavior of Convolutional Nets for Feature Extraction," J. Artif. Intell. Res., 61(1), pp. 563–592.
39. Bird, J. J., Kobylarz, J., Faria, D. R., Ekárt, A., and Ribeiro, E. P., 2020, "Cross-Domain MLP and CNN Transfer Learning for Biological Signal Processing: EEG and EMG," IEEE Access, 8, pp. 54789–54801.
40. Sun, T., Hu, Q., Libby, J., and Atashzar, S. F., 2022, "Deep Heterogeneous Dilation of LSTM for Transient-Phase Gesture Prediction Through High-Density Electromyography: Towards Application in Neurorobotics," IEEE Robot. Autom. Lett., 7(2), pp. 2851–2858.
41. van der Maaten, L., and Hinton, G., 2008, "Visualizing Data Using t-SNE," J. Mach. Learn. Res., 9, pp. 2579–2605.
42. Dimeas, F., and Aspragathos, N., 2016, "Online Stability in Human–Robot Cooperation With Admittance Control," IEEE Trans. Haptics, 9(2), pp. 267–278.