15  Transition property regression models

In this section, the results of the three regression methods, Random Forest (RF), AdaBoost and Gaussian Process (GP), are compared. All three regressors are implemented in CNMc (control-oriented Cluster-based Network Modeling) and can be selected via settings.py. The utilized model configuration is SLS and the decomposition method is the Singular Value Decomposition (SVD).
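
For orientation, the following sketch shows how such a regressor switch could look in a scikit-learn based setup. It is only an illustration: the option name REGRESSOR and the helper make_regressor are placeholders, not the actual CNMc settings.py interface.

```python
# Minimal sketch (not the actual CNMc settings.py): selecting one of the
# three regressors compared in this section via a single config switch.
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical setting name; the real CNMc option may differ.
REGRESSOR = "RF"  # one of "RF", "AdaBoost", "GP"

def make_regressor(name: str):
    """Return a fresh scikit-learn regressor for the chosen method."""
    if name == "RF":
        return RandomForestRegressor(n_estimators=100, random_state=0)
    if name == "AdaBoost":
        return AdaBoostRegressor(n_estimators=100, random_state=0)
    if name == "GP":
        return GaussianProcessRegressor()
    raise ValueError(f"Unknown regressor: {name}")

model = make_regressor(REGRESSOR)
```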

First, it shall be noted that CNMc also offers the possibility to apply pySindy. However, pySindy struggled to represent the training data in the first place and therefore cannot be employed for predicting \(\beta_{unseen}\). This does not mean that pySindy is inapplicable for constructing a surrogate model of the decomposed \(\boldsymbol Q / \boldsymbol T\) modes, but rather that the selected candidate library was not expressive enough. Consequently, only the results of the three initially mentioned regressors are discussed.
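
As an illustration of what a richer candidate library could look like, the following hedged sketch combines polynomial and Fourier candidate functions in pySindy. The toy trajectory X is made up purely for demonstration and is unrelated to the CNMc data; whether this particular library would suffice for the \(\boldsymbol Q / \boldsymbol T\) modes is an open question.

```python
# Sketch: enriching the pySindy candidate library, since the originally
# selected library was reportedly not expressive enough.
import numpy as np
import pysindy as ps

# Combine polynomial and Fourier candidate functions into one library.
library = ps.PolynomialLibrary(degree=3) + ps.FourierLibrary(n_frequencies=2)

model = ps.SINDy(
    feature_library=library,
    optimizer=ps.STLSQ(threshold=0.1),  # sparsity-promoting threshold
)

# Toy trajectory for demonstration only.
t = np.linspace(0, 10, 1000)
X = np.column_stack([np.sin(t), np.cos(t)])
model.fit(X, t=t)
model.print()  # inspect the identified sparse dynamics
```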

In figures 15.1 to 15.6, the true (dashed) and approximated (solid) curves of the first four \(\boldsymbol Q / \boldsymbol T\) modes are shown for RF, AdaBoost and GP, respectively. To begin with, it can be noted that the mode behavior over different model parameter values, \(mod(\beta)\), is discontinuous, i.e., it exhibits spikes or sudden changes. Figures 15.1 and 15.2 show that RF reflects the actual behavior of \(mod(\beta)\) quite well, although it encounters difficulties in capturing some of the spikes. AdaBoost, on the other hand, proves in figures 15.3 and 15.4 to represent the spikes better. Overall, AdaBoost outperforms RF in mirroring the training data.
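
The qualitative difference between the two ensemble methods can be reproduced on synthetic data. The following minimal sketch, which assumes nothing about the actual CNMc modes, fits a deliberately spiky one-dimensional stand-in for \(mod(\beta)\) with both regressors:

```python
# Toy illustration (synthetic data, not the CNMc modes): fitting a spiky
# mode amplitude over the model parameter beta with RF and AdaBoost.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor

beta = np.linspace(25, 35, 200).reshape(-1, 1)
# Smooth trend plus sharp spikes, mimicking the discontinuous mode behavior.
mode = np.sin(beta).ravel()
mode[::25] += 2.0

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(beta, mode)
ada = AdaBoostRegressor(n_estimators=200, random_state=0).fit(beta, mode)

print("RF  train R^2:", rf.score(beta, mode))
print("Ada train R^2:", ada.score(beta, mode))
```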

Figures 15.1 (\(\boldsymbol Q\)) and 15.2 (\(\boldsymbol T\))— SLS, SVD, \(\boldsymbol Q / \boldsymbol T\) mode approximation with RF for \(L=1\)

Figures 15.3 (\(\boldsymbol Q\)) and 15.4 (\(\boldsymbol T\))— SLS, SVD, \(\boldsymbol Q / \boldsymbol T\) mode approximation with AdaBoost for \(L=1\)

Figures 15.5 (\(\boldsymbol Q\)) and 15.6 (\(\boldsymbol T\))— SLS, SVD, \(\boldsymbol Q / \boldsymbol T\) mode approximation with GP for \(L=1\)

Gaussian Process (GP) regression is generally a powerful method, and often this also holds when reproducing \(mod(\beta)\). However, there are cases where its performance is insufficient, as shown in figures 15.5 and 15.6. Applying GP results in severely incorrect predicted tensors \(\boldsymbol{\tilde{Q}}(\beta_{unseen}),\, \boldsymbol{\tilde{T}}(\beta_{unseen})\), where too many tensor entries are wrongly forced to zero. Such \(\boldsymbol{\tilde{Q}}(\beta_{unseen}),\, \boldsymbol{\tilde{T}}(\beta_{unseen})\) eventually lead to an unacceptably high deviation from the original trajectory. Consequently, GP regression is not applicable to the decomposed \(\boldsymbol Q / \boldsymbol T\) modes without further modification.
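
One plausible explanation, stated here as an assumption rather than a confirmed cause, is that a zero-mean GP reverts to its prior away from the training points, so predictions collapse toward zero. The following toy sketch reproduces this effect with scikit-learn:

```python
# Sketch of the suspected failure mode: far from the training points, the
# kernel correlation is ~0 and a zero-mean GP falls back to its prior mean,
# predicting values near zero. Synthetic data only, not the CNMc tensors.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

beta_train = np.array([[26.0], [27.0], [30.0], [31.0]])
y_train = np.array([1.2, 0.8, 1.1, 0.9])  # mode amplitudes of order one

# optimizer=None keeps the short length scale fixed for the demonstration;
# normalize_y=False (the default) means a zero prior mean.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), optimizer=None)
gp.fit(beta_train, y_train)

beta_unseen = np.array([[28.5], [32.5]])
print(gp.predict(beta_unseen))  # both close to 0, not the data trend of ~1
```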

The two remaining regressors are Random Forest (RF) and AdaBoost. Although AdaBoost is better at capturing the true modal behavior \(mod(\beta)\), there is no guarantee that it is equally superior at predicting the modal behavior for unseen model parameter values, \(mod(\beta_{unseen})\). Table 15.1 provides the MAE errors for different \(L\) and \(\beta_{unseen} = [\,28.5,\, 32.5\,]\). Since the table contains a large amount of information, the results can also be assessed qualitatively through figures 15.7 and 15.8 for \(\beta_{unseen} = 28.5\) and \(\beta_{unseen} = 32.5\), respectively. For the visual inspection, it is important to note the order of magnitude of the vertical axis scaling. The MAE errors themselves, as well as the deviations between the RF and AdaBoost MAE errors, are very low. Thus, RF as well as AdaBoost can both be considered well-suited regressors.
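
For reference, the MAE entries of table 15.1 correspond to an element-wise mean absolute error over the predicted tensors. A minimal sketch, with placeholder variable names rather than the actual CNMc data structures, could look as follows:

```python
# Hedged sketch: element-wise mean absolute error between a true and a
# predicted Q (or T) tensor, flattened so tensors of any shape compare.
import numpy as np
from sklearn.metrics import mean_absolute_error

def mode_mae(true_tensor: np.ndarray, pred_tensor: np.ndarray) -> float:
    """MAE over all entries of the flattened Q or T tensor."""
    return mean_absolute_error(true_tensor.ravel(), pred_tensor.ravel())

# e.g. one table entry per regressor, per tensor, per L and beta_unseen:
# mae_rf_Q = mode_mae(Q_true, Q_pred_rf)
```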

Table 15.1— SLS, mean absolute error comparing RF and AdaBoost for different \(L\) and two \(\beta_{unseen}\)
\(L\) \(\beta_{unseen}\) \(\boldsymbol{MAE}_{RF, \boldsymbol Q}\) \(\boldsymbol{MAE}_{AdaBoost, \boldsymbol Q}\) \(\boldsymbol{MAE}_{RF, \boldsymbol T}\) \(\boldsymbol{MAE}_{AdaBoost, \boldsymbol T}\)
\(1\) \(28.5\) \(0.002580628\) \(0.002351781\) \(0.002275379\) \(0.002814208\)
\(1\) \(32.5\) \(0.003544923\) \(0.004133114\) \(0.011152145\) \(0.013054876\)
\(2\) \(28.5\) \(0.001823848\) \(0.001871858\) \(0.000409955\) \(0.000503748\)
\(2\) \(32.5\) \(0.006381635\) \(0.007952153\) \(0.002417142\) \(0.002660403\)
\(3\) \(28.5\) \(0.000369228\) \(0.000386292\) \(0.000067680\) \(0.000082808\)
\(3\) \(32.5\) \(0.001462458\) \(0.001613434\) \(0.000346298\) \(0.000360097\)
\(4\) \(28.5\) \(0.000055002\) \(0.000059688\) \(0.000009420\) \(0.000011500\)
\(4\) \(32.5\) \(0.000215147\) \(0.000230404\) \(0.000044509\) \(0.000046467\)
\(5\) \(28.5\) \(0.000007276\) \(0.000007712\) \(0.000001312\) \(0.000001600\)
\(5\) \(32.5\) \(0.000028663\) \(0.000030371\) \(0.000005306\) \(0.000005623\)
\(6\) \(28.5\) \(0.000000993\) \(0.000052682\) \(0.000000171\) \(0.000000206\)
\(6\) \(32.5\) \(0.000003513\) \(0.000003740\) \(0.000000629\) \(0.000000668\)
\(7\) \(28.5\) \(0.000000136\) \(0.000000149\) \(0.000000023\) \(0.000000031\)
\(7\) \(32.5\) \(0.000000422\) \(0.000000454\) \(0.000000078\) \(0.000000082\)

Figure 15.7— SLS, mean absolute error comparing RF and AdaBoost for different \(L\) and \(\beta_{unseen} = 28.5\): (a) \(\boldsymbol Q\), (b) \(\boldsymbol T\)

Figure 15.8— SLS, mean absolute error comparing RF and AdaBoost for different \(L\) and \(\beta_{unseen} = 32.5\): (a) \(\boldsymbol Q\), (b) \(\boldsymbol T\)

In summary, RF and AdaBoost both perform well in regression, and no clear winner between the two can be identified. The third option, GP, is dismissed because it occasionally exhibits unacceptably poor regression performance. Finally, pySindy remains a possibility; however, an appropriate candidate library must be defined for it first.