Dataset description
The experiment used the public bearing dataset of Paderborn University (PU) in Germany, as depicted in Fig. 5.

The experimental setup comprised a test motor, a measurement shaft, a bearing module, a flywheel, and a load motor, and the dataset covers both artificially induced and real damage. Vibration acceleration signals were collected with piezoelectric accelerometers at a sampling frequency of 64 kHz. Bearing damage was categorized into five levels, with levels 1 to 5 indicating increasing severity. The data used in this paper comprise vibration signals from eight different states recorded at a speed of 900 rpm, a load torque of 0.7 Nm, and a radial force of 1000 N. Time-domain vibration acceleration signals from eight samples were selected for experimental analysis. Details of the samples and their tag numbers are presented in Table 2.
Selection rules for L and T
Before MCKD optimization, the search ranges of L and T must be determined. The search range of L is set experimentally to [2, 700] (ref. 34). The range of T can be calculated theoretically from the fault characteristic frequencies. The specific fault types are outer ring faults, inner ring faults, and ball faults, and the corresponding fault characteristic frequencies are \(f_{BPFO}\), \(f_{BPFI}\) and \(f_{BSF}\), as shown in Eqs. (14)–(16):
$$\begin{aligned} & f_{BPFO} = \frac{z}{2} \times \left( 1 - \frac{d}{D} \times \cos \beta \right) \times \frac{f_r}{60} \end{aligned}$$
(14)
$$\begin{aligned} & f_{BPFI} = \frac{z}{2} \times \left( 1 + \frac{d}{D} \times \cos \beta \right) \times \frac{f_r}{60} \end{aligned}$$
(15)
$$\begin{aligned} & f_{BSF} = \frac{D}{d} \times \left( 1 - \left( \frac{d}{D} \times \cos \beta \right) ^2\right) \times \frac{f_r}{60} \end{aligned}$$
(16)
where z denotes the number of rolling balls, \(f_{r}\) stands for the shaft rotational speed in rpm (so that \(f_{r}/60\) is the shaft frequency in Hz), and d, D, and \(\beta \) denote the ball diameter, pitch diameter, and bearing initial contact angle, respectively. From Eq. (4), the search range of T is as shown in Eq. (17):
$$\begin{aligned} \frac{f_{s}}{\max \left( f_{BPFO},f_{BPFI},f_{BSF} \right) } \le T \le \frac{f_{s}}{\min \left( f_{BPFO},f_{BPFI},f_{BSF} \right) } \end{aligned}$$
(17)
For the bearing dataset from Paderborn University, this paper uses data whose fault types consist solely of inner or outer ring faults. Ball faults are therefore not considered, and the search range of T calculated from Eq. (17) is [800, 1400]. Table 1 displays the experimental data for evaluating the AMOMCKD model’s performance. The NSGA-II algorithm optimized the key parameters of MCKD, which were subsequently used in the deconvolution of the fault vibration acceleration signal to produce a filtered signal with enhanced periodic impulse characteristics. Table 3 lists the optimal combinations of L and T for the different types of fault samples.
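As a quick check of Eqs. (14)–(17), the following minimal sketch evaluates the fault characteristic frequencies and the resulting bounds on T. The bearing geometry (8 balls, 6.75 mm ball diameter, 28.55 mm pitch diameter, 0° contact angle) is an assumption made for illustration and is not stated in this section; the function names are likewise hypothetical.

```python
import math

def fault_frequencies(z, d, D, beta_deg, rpm):
    """Return (f_BPFO, f_BPFI, f_BSF) in Hz following Eqs. (14)-(16)."""
    ratio = (d / D) * math.cos(math.radians(beta_deg))
    f_shaft = rpm / 60.0  # shaft speed in rpm converted to Hz
    f_bpfo = z / 2.0 * (1.0 - ratio) * f_shaft
    f_bpfi = z / 2.0 * (1.0 + ratio) * f_shaft
    f_bsf = D / d * (1.0 - ratio ** 2) * f_shaft
    return f_bpfo, f_bpfi, f_bsf

def period_search_range(fs, freqs):
    """Eq. (17): lower and upper bounds on T, in samples."""
    return fs / max(freqs), fs / min(freqs)

# ASSUMED geometry (not given in the text): z=8, d=6.75 mm, D=28.55 mm, beta=0 deg.
f_bpfo, f_bpfi, f_bsf = fault_frequencies(z=8, d=6.75, D=28.55, beta_deg=0.0, rpm=900)

# Only inner and outer ring faults are used, so f_BSF is left out of the range.
t_low, t_high = period_search_range(64_000, (f_bpfo, f_bpfi))
print(f"BPFO={f_bpfo:.2f} Hz, BPFI={f_bpfi:.2f} Hz, T in [{t_low:.0f}, {t_high:.0f}] samples")
```

Under these assumed dimensions the bounds fall inside the rounded interval [800, 1400] quoted above.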
AMOMCKD deconvolution analysis
Figure 6 compares the filtered and original signals under different fault conditions using the NSGA-II and PSO algorithms. To ensure a fair comparison, both algorithms used the same fitness functions. The results show that the original signals exhibit periodicity in the time domain but are strongly affected by noise, leaving the fault impulses unclear. AMOMCKD filtering successfully extracts the desired periodic transient pulse components while effectively decoupling compound faults. PSO-MCKD filtering can extract the pulse features of the original signal in some cases (KI18, KA22), but unexpected deformations occur in signals KB27 and KB24, such as inaccurate frequency characteristics, waveform distortion, or amplitude response offset.

Comparison of time-domain waveforms: (a) original signal; (b) filtered by AMOMCKD; (c) filtered by PSO-MCKD.
Additionally, the comparison of bi-objective optimization results in Fig. 7 reveals that the unexpected deformations in KB27 and KB24 are caused by excessively large kurtosis indices. This indicates that PSO-MCKD offers only weak convergence assurance in bi-objective optimization, which makes it difficult to escape local optima. Moreover, in terms of envelope entropy, the proposed method achieves the lowest values. AMOMCKD therefore strikes a better balance between the kurtosis and envelope entropy indices, especially for compound fault signals.

Comparison of fitness functions under different methods: (a) 1/kurtosis; (b) envelope entropy.
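The two objectives compared in Fig. 7 can be illustrated with a minimal sketch. It assumes the commonly used definitions (kurtosis as the normalized fourth central moment, envelope entropy as the Shannon entropy of the sum-normalized Hilbert envelope); the exact formulas used by the authors are defined elsewhere in the paper and may differ. The synthetic signals and function names are hypothetical.

```python
import numpy as np

def envelope(x):
    """Amplitude envelope from an FFT-based analytic signal (Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                  # keep DC
    weights[1:(n + 1) // 2] = 2.0     # double positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0         # keep Nyquist for even lengths
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def inverse_kurtosis(x):
    """1/kurtosis: smaller values indicate a more impulsive signal."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(centered ** 2) ** 2 / np.mean(centered ** 4))

def envelope_entropy(x):
    """Shannon entropy of the normalized envelope: smaller means sparser impulses."""
    env = envelope(x)
    p = env / env.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Synthetic comparison (illustrative only): Gaussian noise vs. a periodic impulse train.
rng = np.random.default_rng(0)
n = 4096
noise = rng.normal(size=n)
pulse = np.exp(-np.arange(64) / 8.0) * np.sin(2 * np.pi * 0.2 * np.arange(64))
impulsive = 0.05 * rng.normal(size=n)
for start in range(0, n - 64, 256):
    impulsive[start:start + 64] += pulse

print("1/kurtosis:", inverse_kurtosis(noise), inverse_kurtosis(impulsive))
print("envelope entropy:", envelope_entropy(noise), envelope_entropy(impulsive))
```

Both quantities are minimized, so a signal with clearer periodic impulses should score lower on each, which is the behaviour the bi-objective search rewards.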
To further validate the effectiveness of AMOMCKD in extracting acceleration signal features, the envelope spectra of the deconvolved signals were visualized. As shown in Fig. 8, the envelope spectra of the filtered signals from primary inner race fault KI21 and primary outer race fault KA22 reveal fault characteristic frequencies with clear amplitudes and distinct harmonics. In contrast, the envelope spectra of the original signals from secondary inner race fault KI18 and secondary outer race fault KA16 exhibited irregular fault frequency characteristics in the low-frequency range due to interference, which AMOMCKD filtering successfully overcame. For the primary compound fault KB27, clear fault frequency harmonics were observed in the envelope spectrum of the original signal. However, the coupling between inner and outer race faults affected the secondary compound fault KB23 and the tertiary compound fault KB24, significantly distorting the fault frequency characteristics in the envelope spectra of their original signals. The deconvolution capability of AMOMCKD significantly attenuated the non-fault-induced frequency components, making the fault frequencies easier to identify and laying a solid foundation for subsequent feature extraction by the model.

Comparison of envelope spectra before and after filtering with AMOMCKD.
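For readers unfamiliar with envelope spectra, the following sketch shows one standard way to compute them: take the magnitude of the analytic signal, remove its mean, and apply an FFT. It is an illustration under assumed settings (a hypothetical 3 kHz carrier modulated at 74 Hz), not the authors' processing chain.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Return (frequencies in Hz, amplitude spectrum of the mean-removed envelope)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Analytic signal: keep DC (and Nyquist), double positive bins, zero negative bins.
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    env = np.abs(analytic)
    env -= env.mean()  # drop the DC term so the modulation peak dominates
    amplitude = 2.0 * np.abs(np.fft.rfft(env)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude

# Hypothetical test signal: 1 s at 64 kHz, carrier 3 kHz, amplitude-modulated at 74 Hz.
fs = 64_000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 74.0 * t)) * np.sin(2 * np.pi * 3000.0 * t)

freqs, amplitude = envelope_spectrum(x, fs)
peak_frequency = freqs[np.argmax(amplitude)]
print(f"dominant envelope frequency: {peak_frequency:.1f} Hz")
```

In a bearing signal the modulation frequency plays the role of the fault characteristic frequency, so a clean peak (and its harmonics) at that frequency is what the filtered spectra in Fig. 8 are expected to show.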
Experimental results and analysis
This section focuses on the evaluation of the AMOMCKD-CNN model. The selected data were randomly divided into training, validation, and test samples in a 4:1:1 ratio, with each fault type containing 400 samples and each sample having 2048 data points. The experiments were conducted on a computer running Windows 10, equipped with a multi-core 3.2 GHz Intel Core i9-12900K CPU, 64 GB of system memory (RAM), and two NVIDIA GeForce RTX 3090 graphics cards.
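The 4:1:1 partition described above can be sketched as follows. This is a minimal illustration that assumes a simple shuffled split over all 8 × 400 = 3200 samples; whether the authors stratified by class or how they handled the non-integer remainder is not stated, and `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n_samples, ratios=(4, 1, 1), seed=0):
    """Shuffle sample indices and split them into train/validation/test by ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Eight bearing states with 400 samples each, as described in the text.
train_idx, val_idx, test_idx = split_indices(8 * 400)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because 3200 is not divisible by 6, the three subsets cannot match the 4:1:1 ratio exactly; here the leftover sample is assigned to the test set.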
A batch size of 32 was used for all methods, with the Adam optimizer and cross-entropy as the loss function. Throughout training, the learning rate was set to 0.001 and the number of epochs to 100. Based on K-fold cross-validation, each model underwent 10 experiments, and the average score of these 10 experiments was taken as the final score for comparative analysis. The experiments were implemented using the TensorFlow and Keras frameworks.
To determine the kernel width, this paper investigated how the threshold \(\gamma \) at which the kernel width is measured affects diagnostic accuracy. Values of \(\gamma \) from 0.1 to 0.9 were explored, each corresponding to a different kernel width and hence a different feature extraction scale. Table 4 lists the fault pulse lengths extracted for different fault types with varying \(\gamma \) values. Outer race, inner race, and rolling element faults in the PU dataset were then tested to compare the effects of different kernel widths on CNN classification accuracy (see Table 5), with a (3,3) square kernel used as a benchmark. Rectangular kernels showed a performance advantage of roughly 20% over square kernels, further confirming that classical kernel sizes may not always be suitable outside image processing. Figure 9 illustrates the kurtogram corresponding to each sample. The 2D kurtogram-based representation significantly improved classification performance compared with simple 2D reshaped data, yielding a mean accuracy of 85.621%. This demonstrates the kurtogram’s capability to preserve fault-related features across scales, making it a powerful tool for input preprocessing. However, the kurtogram's classification accuracy was slightly lower than that of the 1D filtered data, which may be due to the insufficient signal length and the level being set to 4, both of which limit the information captured within the kurtogram.

Kurtograms of each sample in the Paderborn dataset for labels 0–7 (a–h).
Once the rectangular kernel had been selected, its width was determined from the \(\gamma \) value. Performance improved as \(\gamma \) decreased from 0.9 to 0.5, with the best performance at \(\gamma \) = 0.5, which yielded the highest median and mean accuracies and indicates that the CNN captured fault pulse information more effectively within its receptive field. However, as \(\gamma \) decreased from 0.5 to 0.1, CNN performance gradually declined because more non-fault pulse components entered the receptive field. As a result, the optimal \(\gamma \) value for the design of the first kernel width was determined to be 0.5, corresponding to a kernel width of 122 for the PU dataset.
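The relationship between \(\gamma \) and kernel width can be illustrated with a small sketch. The procedure below is only one plausible reading, assumed for illustration: the pulse width is taken as the number of contiguous samples around the envelope peak whose amplitude stays at or above \(\gamma \) times the peak. The authors' actual measurement rule is defined elsewhere in the paper, and both the function name and the synthetic envelope are hypothetical.

```python
import math

def pulse_width_at_threshold(envelope, gamma):
    """Count contiguous samples around the peak with envelope >= gamma * peak value."""
    peak_index = max(range(len(envelope)), key=lambda i: envelope[i])
    level = gamma * envelope[peak_index]
    left = peak_index
    while left > 0 and envelope[left - 1] >= level:
        left -= 1
    right = peak_index
    while right < len(envelope) - 1 and envelope[right + 1] >= level:
        right += 1
    return right - left + 1

# Hypothetical fault-pulse envelope: exponential decay with a 50-sample time constant.
synthetic_envelope = [math.exp(-k / 50.0) for k in range(400)]

widths = {g: pulse_width_at_threshold(synthetic_envelope, g) for g in (0.9, 0.5, 0.1)}
print(widths)
```

Under this reading, lowering \(\gamma \) widens the measured pulse, so a small \(\gamma \) produces a wide kernel that also covers the low-amplitude tail, consistent with the accuracy drop reported between \(\gamma \) = 0.5 and \(\gamma \) = 0.1.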
After determining the kernel width in CNN, the AMOMCKD-CNN model was compared with CEEMDAN-CNN (ref. 35), VMD-CNN (ref. 9), and RAW-CNN. The impact of different data processing methods on the results was explored using the same data preprocessing approach. Figure 10 illustrates the comparison among the four models with different kernel widths. The designed kernel width in this study demonstrates notable advantages in various signal processing tasks, providing further evidence to guide CNN kernel design based on the pulse width of the vibration acceleration signal.

Accuracy comparison of four methods: (a) Kernel width = 3; (b) Kernel width = 122.
Specifically, at a kernel width of 3, CEEMDAN-CNN and VMD-CNN demonstrate excellent feature recognition among fault classes, achieving accuracies of 96.25% and 95.31%, respectively. This performance surpasses that of RAW-CNN, although it is poor in normal versus fault sample classification. Notably, AMOMCKD-CNN achieves 100% accuracy in normal versus fault sample classification and outperforms the other three methods overall, demonstrating its superiority in compound fault decoupling detection. At a kernel width of 122, all methods improve significantly. The single fault type recognition accuracies of the CEEMDAN-CNN and VMD-CNN models on the original vibration acceleration signals increase from 94.84% to 97.81% (an increase of approximately 3 percentage points). However, these models remain unstable when handling compound faults, with fluctuating accuracy. The AMOMCKD-CNN model effectively detects compound fault features by applying deconvolution to the original data before feeding it into the CNN. The model achieves recognition accuracies of approximately 99% for the KI18 inner race fault and the KA16 outer race fault, and accuracies exceeding 97% for compound faults KB27, KB24, and KB23, showing strong precision and generalization capabilities.

Visualization of the learned features using t-SNE: (a) RAW-CNN; (b) AMOMCKD-CNN; (c) CEEMDAN-CNN; (d) VMD-CNN.
The t-SNE algorithm-based feature visualization results on the Paderborn test dataset, as depicted in Fig. 11, reveal the robust feature extraction capability of the AMOMCKD-CNN model. It effectively discriminates between samples representing different fault states, particularly distinguishing between normal and faulty signals. However, CEEMDAN-CNN and VMD-CNN exhibit some misclassifications in distinguishing different faults, especially in compound fault classification. The t-SNE feature visualization demonstrates the practical value of the proposed AMOMCKD-CNN model.

Channel scores of the normal bearing signal and 7 different types of bearing fault signals: (a) Kernel width = 3; (b) Kernel width = 122.
An important benefit of the adaptive kernel-size selection model is that it can be visualized, which improves its interpretability. The experiment investigated different operating conditions using the Paderborn dataset, utilizing intrinsic information from each channel in the model, where each channel corresponds to a specific feature extractor. The analysis focused on how the feature extraction of the input pulse signal changes as the width of the first convolutional layer’s kernel is adjusted. To facilitate visualization, the feature information extracted by each channel along the data sequence was combined and normalized, highlighting the significance of optimizing kernel widths for different types of fault pulses. Figure 12 illustrates the channel score visualization of the normal bearing signal and seven different types of bearing fault signals from the Paderborn dataset, using both traditional and optimized kernel sizes, with brightness indicating the model’s focus on different signals. The results indicate that optimized kernel sizes better capture vibration impulse patterns in the signal. The method can focus on different patterns simultaneously and allocate more attention to significant ones, which supports accurate fault diagnosis. These visualization results partially reveal the internal operation of the model, aiding the understanding and interpretation of its behavior and performance.
The method’s capability to handle signals of arbitrary length was also validated through visualization. Fault signals of different lengths (1024, 2048, 4096, 8192) were used as inputs for the model, and the visualization of scores for signals of different lengths is presented in Fig. 13. The results indicate that the method pays similar attention to identical vibration patterns across signals of different lengths, demonstrating the model’s strong generalization ability and robustness.

Channel scores for signals of different lengths.
Further experiment
To further validate the performance of the AMOMCKD-CNN method, experiments were conducted using the publicly available dataset from Case Western Reserve University (CWRU), widely acknowledged as an authoritative benchmark for bearing fault diagnosis. Vibration acceleration signals from the drive-end bearings were used as the experimental dataset, encompassing faults in the rolling elements, inner race, and outer race. The experimental setup covered a load range of 0 hp to 3 hp and a speed range of 1730 rpm to 1797 rpm. Compound fault data were generated by directly summing the corresponding single fault data to simulate different faults. All signals were measured using acceleration sensors at a sampling frequency of 12 kHz under various load torques and speeds. Table 6 presents the specifics of the samples and their tag numbers.
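The compound fault construction described above (direct summation of single fault signals) can be sketched in a few lines. The helper name and the toy sequences are hypothetical; truncating to the shortest input is an assumption, since the text does not say how records of unequal length were aligned.

```python
def synthesize_compound(*signals):
    """Sum single-fault signals sample by sample, truncated to the shortest record."""
    n = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) for i in range(n)]

# Toy sequences standing in for inner race and outer race fault records.
inner_race = [0.1, -0.2, 0.9, -0.4, 0.0]
outer_race = [0.0, 0.5, -0.1, 0.2]

compound = synthesize_compound(inner_race, outer_race)
print(compound)
```

A signal built this way is a linear superposition, so it contains the impulses of each constituent fault but not any physical coupling between them.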
Compared to the Paderborn dataset, it is observed that the fault pulse components in the CWRU dataset are more distinct, and there is a significant difference between normal and fault signals, providing more favorable conditions for bearing fault diagnosis. Employing the same methodology as with the Paderborn dataset, Table 7 presents the extracted kernel widths with different thresholds, while Table 8 lists the corresponding CNN classification accuracies based on these kernel widths. The parameter \(\gamma \) for the CWRU dataset was determined to be 0.5, corresponding to a kernel width of 58.

Accuracy comparison of four methods (CWRU): (a) Kernel width = 3; (b) Kernel width = 58.

Visualization of the learned features using t-SNE (CWRU): (a) RAW-CNN; (b) AMOMCKD-CNN; (c) CEEMDAN-CNN; (d) VMD-CNN.
Figure 14 illustrates the identification accuracy for each type of fault under different conditions, with kernel widths set to 3 and 58, respectively. For example, at a kernel width of 3, the model shows an average accuracy improvement of approximately 4% for compound fault tasks. Although it falls short in handling compound faults involving the inner race, outer race, and rolling elements, this method still secures the second position with a slight deviation from the top-performing approach. By adjusting the kernel size, significant improvements in average precision are observed across all methods. Figure 15 compares eight types of rolling bearing faults using t-SNE clustering analysis, demonstrating that convolutional kernel receptive fields can effectively perceive the regularity of fault pulses and capture these patterns. The AMOMCKD-CNN, through its unique deconvolution approach and optimally designed kernel sizes, efficiently and accurately extracts pulse features, showcasing excellent classification accuracy across all labels, particularly in handling compound faults. In summary, the AMOMCKD-CNN emerges as a versatile model that consistently delivers competitive performance across diverse datasets.
Experiments on both datasets indicate that the optimal \(\gamma \) value is 0.5. For further analysis, the rotational speed in the Paderborn dataset was therefore changed from 900 to 1500 rpm, with all other conditions kept constant. Figure 16 shows the resulting classification performance. As \(\gamma \) is reduced from 0.9 to 0.1, the trend observed at 1500 rpm, where the data exhibit more pronounced periodic pulses, is similar to that at 900 rpm. The kernel width at 1500 rpm is generally smaller than at 900 rpm, as the signal pulses are more frequent and concentrated. In terms of classification performance, \(\gamma \) = 0.4 produced the best results, matching the pulse characteristics of the 1500 rpm data and demonstrating the correspondence between the CNN kernel width and the data pulse width.

Classification performance with different kernel widths at 1500 rpm on the Paderborn dataset.