7 Data generation

This section explains the first main step of the 5 steps. The idea of CNMc (control-oriented Cluster-based Network Modeling) is to create a surrogate model such that predictions for unseen \(\beta_{unseen}\) can be made. An unseen model parameter value \(\beta_{unseen}\) is defined as one that is not incorporated in the training data. Generally in machine learning, the more linearly independent data is available, the more trustworthy the surrogate model is assumed to be. Linearly independent data is data which provides new information. As an illustration, imagine a million by million data matrix \(\boldsymbol{A}_{n \times n}\), where \(n = 1 \mathrm{e}{+6}\). To this large data matrix \(\boldsymbol A\), a modal decomposition method, e.g., the Singular Value Decomposition (SVD) (Brunton and Kutz 2019; Gerbrands 1981), is applied.

To reconstruct the original matrix \(\boldsymbol A\) fully from the decomposed matrices, only the non-zero modes are required. The number of non-zero modes \(r\) is often much smaller than the dimension of the original matrix, i.e., \(r \ll n\). If \(r \ll n\), the measurement matrix \(\boldsymbol A\) contains a large amount of linearly dependent data. This has the advantage that the original size can be reduced. The disadvantage, however, is that \(\boldsymbol A\) contains redundant, e.g., duplicated, entries (rows or columns). For this reason, \(\boldsymbol A\) includes data parts which do not provide any new information. In the case of \(r = n\), \(\boldsymbol A\) comprises only meaningful observations and has full rank. Part of feature engineering is to supply the regression model with beneficial training data and to filter out redundant copies. The drawback of \(r = n\) becomes apparent when the number of representative modes is chosen to be smaller than the full dimension, i.e., \(r < n\). In that case, valuable measurements could be lost.
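The relation between redundant data and the number of non-zero modes can be illustrated with a small numerical sketch. It is not part of CNMc; the matrix size, the random seed and the tolerance are assumptions chosen only for demonstration. A \(6 \times 6\) matrix is built from 2 independent rows, so that \(r = 2 < n = 6\) is expected.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Build a 6 x 6 matrix from only 2 independent rows; the remaining rows
# are exact copies and therefore carry no new information.
base = rng.standard_normal((2, 6))
A = np.vstack([base[0], base[1], base[0], base[1], base[0], base[1]])

# Singular Value Decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Count the modes whose singular value is numerically non-zero.
# The relative tolerance 1e-10 is an illustrative assumption.
tol = 1e-10 * s.max()
r = int(np.sum(s > tol))

# Reconstruct A from the r non-zero modes only.
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
reconstruction_error = float(np.max(np.abs(A - A_r)))

print(f"n = {A.shape[0]}, r = {r}, max error = {reconstruction_error:.2e}")
```

Keeping only the \(r\) non-zero modes reproduces \(\boldsymbol A\) up to round-off, which is the size reduction mentioned above.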

Moreover, if the dimension \(n\) is very large, accuracy demands may make working with such large matrices unavoidable. As a result, more powerful computers are required and the computational time is expected to increase. For this work, an attempt is made to represent non-linear differential equations by a surrogate model. In addition, trajectories of many \(\vec{\beta }\) can be handled quite efficiently. Therefore, an attempt was made to provide sufficient trajectories as training data. With that said, the data and workflow of this step, i.e., data generation, are described next. The general overview is depicted in figure 7.1. The settings for data generation are passed to this step, which invokes the ODE (Ordinary Differential Equation) solver for the range of selected \(\vec{\beta}\). The trajectories are plotted, and both the obtained trajectories \(F(\vec{\beta})\) and their plots are saved. Note that \(\vec{\beta}\) indicates that one differential equation is solved for selected \(\beta\) values within a range of model parameter values \(\vec{\beta}\).

Figure 7.1— Data and workflow of the first step: Data generation

A detailed description is given in the following. First, in order to run this task, it must be activated in settings.py. Next, the user may change local output paths, define which kinds of plots shall be generated, choose which dynamical model is employed and provide the range \(\vec{\beta}\). As for the first point, the user can select the path where the output of this specific task shall be stored. Note that this is an optional attribute. Also, although the code was only tested on Linux, paths are handled with the library pathlib. Therefore, no errors are expected if the output is stored on a Windows or Mac-based operating system, which may use a different path convention.
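The following minimal sketch shows why pathlib helps with portability. The directory and file names are hypothetical and do not reflect the actual CNMc output layout; a temporary directory is used so that the example can be run anywhere.

```python
from pathlib import Path
import tempfile

# Hypothetical names: "Output" and "data_generation" are illustrative
# only and are not taken from the CNMc settings.
base_dir = Path(tempfile.mkdtemp())

# The "/" operator joins path components with the separator of the
# operating system the code runs on.
out_dir = base_dir / "Output" / "data_generation"
out_dir.mkdir(parents=True, exist_ok=True)

# Store a placeholder file in the output directory.
out_file = out_dir / "trajectory_example.txt"
out_file.write_text("x y z\n")

print(out_file)
```

Since no separator is hard-coded, the same lines work with both backslash-based and slash-based path conventions.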

Regarding the types of plots, the user can first define for each type of plot whether it is desired or not. Second, all the plots are saved as HTML files. Some reasons for this were provided at the beginning of this chapter; others, which are important for trajectories, are the following. With in-depth explorations in mind, the user might want to highlight specific regions in order to get detailed and correct information. For trajectories, this is the case when, e.g., coordinates of some points within a specified region shall be obtained. Here, zooming, panning, rotation and a panel that writes out additional information about the current location of the cursor can be helpful tools. The first type of plot is the trajectory itself with the initial condition as a dot in the state-space.

If desired, arrows pointing in the direction of motion of the trajectory can be included in the plots. The trajectory, the initial state sphere and the arrows can each be made invisible by one click on the legend. The second type of representation is an animated plot, i.e., each trajectory \(F(\beta)\) is available as an animated motion. The final type of display is one plot that contains all \(F(\vec{\beta})\) as sub-figures. The latter type of visualization is a very valuable way to see the impact of \(\beta\) across the available \(\vec{\beta }\) on the trajectories \(F(\vec{\beta})\). Also, it can be leveraged as a fast sanity check, i.e., if any \(F(\beta )\) deviates from expectation, this can be determined quickly by looking at the stacked trajectory plots.

If HTML files are not desired for presentation, clicking on a button provides a PNG image of the current view of the trajectory. Note that the button itself will not appear in the picture. Finally, modern software, especially coding environments, acknowledges that staring at white monitors is eye-straining. Consequently, dark working environments are commonly set as default. For this reason, all the mentioned types of plots have a toggle implemented. It allows switching between a dark default and a white representation mode.

For choosing a dynamical system, two possibilities are given. On the one hand, one of the 10 incorporated models can be selected by simply choosing a number, which corresponds to an integrated dynamical system. On the other hand, a new dynamical system can be implemented. This can be achieved without much effort by following the syntax of one of the 10 available models. The main adjustment is done by replacing the ODE (Ordinary Differential Equation). The differential equations of all 10 dynamical systems that can be selected by default are given in equations 7.1 to 7.7, and the 3 sets of equations A.1 to A.3 are found in the Appendix. The latter 3 sets of equations are provided in the Appendix because they are not used for validating the prediction performance of CNMc. Next to the model’s name, the reference to the dynamical system can be seen. The variables \(a\) and \(b\) are constants. Except for the Van der Pol system, which is given in Appendix A as equation A.3, all dynamical systems are 3-dimensional.
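To give an impression of what replacing the ODE could look like, the Rössler system of equation 7.2 is written below as a plain Python function. This is only a sketch: the function name, the argument order (chosen so that it would be compatible with SciPy-style solvers) and the values of the constants \(a\) and \(b\) are assumptions and do not reproduce the actual CNMc syntax.

```python
def roessler(t, state, beta, a=0.1, b=0.1):
    """Right-hand side of the Roessler system (equation 7.2).

    The values a = 0.1 and b = 0.1 are illustrative assumptions.
    The time t is unused because the system is autonomous.
    """
    x, y, z = state
    x_dot = -y - z
    y_dot = x + a * y
    z_dot = b + z * (x - beta)
    return [x_dot, y_dot, z_dot]


# Evaluate the derivative at one state for one model parameter value.
derivative = roessler(0.0, [1.0, 2.0, 3.0], beta=6.0)
print(derivative)
```

A new system would then be introduced by exchanging the three expressions for `x_dot`, `y_dot` and `z_dot`.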

Lorenz (Lorenz 1963): \[ \begin{equation} \label{eq_6_Lorenz} \begin{aligned} \dot x &= a\, (y - x) \\ \dot y &= x\, (\beta - z) -y \\ \dot z &= x y - b z \end{aligned} \end{equation} \tag{7.1}\]
Rössler (Rössler 1976): \[ \begin{equation} \label{eq_7_Ross} \begin{aligned} \dot x &= -y -z \\ \dot y &= x + ay \\ \dot z &= b +z \, (x-\beta)\\ \end{aligned} \end{equation} \tag{7.2}\]

Two Scroll (Vaidyanathan et al. 2019): \[ \begin{equation} \label{eq_9_2_Scroll} \begin{aligned} \dot x &= \beta \, (y-x) \\ \dot y &= x z \\ \dot z &= a - by^4 \end{aligned} \end{equation} \tag{7.3}\]

Four Wing (Li et al. 2015): \[ \begin{equation} \label{eq_10_4_Wing} \begin{aligned} \dot x &= \beta x +y +yz\\ \dot y &= yz - xz \\ \dot z &= a + bxy -z \end{aligned} \end{equation} \tag{7.4}\]

Sprott_V_1 (SPROTT 2020): \[ \begin{equation} \label{eq_11_Sprott_V_1} \begin{aligned} \dot x &= y \\ \dot y &= -x - \operatorname{sign}(z)\,y\\ \dot z &= y^2 - \exp(-x^2) \, \beta \end{aligned} \end{equation} \tag{7.5}\]

Tornado (SPROTT 2020): \[ \begin{equation} \label{eq_12_Tornado} \begin{aligned} \dot x &= y \, \beta \\ \dot y &= -x - \operatorname{sign}(z)\,y\\ \dot z &= y^2 - \exp(-x^2) \end{aligned} \end{equation} \tag{7.6}\]

Insect (SPROTT 2020): \[ \begin{equation} \label{eq_13_Insect} \begin{aligned} \dot x &= y \\ \dot y &= -x - \operatorname{sign}(z)\,y \, \beta\\ \dot z &= y^2 - \exp(-x^2) \end{aligned} \end{equation} \tag{7.7}\]

Sprott_V_1, Tornado and Insect in equations 7.5 to 7.7 are not present in the cited reference (SPROTT 2020) in this form. The reason is that the introduced equations are modifications of the chaotic attractor proposed in (SPROTT 2020). The curious reader is invited to read (SPROTT 2020) and to be convinced of its unique properties. The given names are made up and serve to distinguish the three systems. Upon closer inspection, it becomes clear that they differ only in the place where \(\beta\) is added. All 3 models are highly sensitive to \(\beta\), i.e., a small change in \(\beta\) results in bifurcations. For follow-up improvements of CNMc, these 3 systems can be applied as performance benchmarks for bifurcation prediction capability.
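That the three systems differ only in the position of \(\beta\) can be made explicit with a short sketch of equations 7.5 to 7.7. The function names and the helper `sign` are chosen for illustration and are not necessarily those used in CNMc. For \(\beta = 1\), all three right-hand sides coincide.

```python
import math


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def sprott_v_1(t, state, beta):
    # Equation 7.5: beta scales the exponential term of z_dot.
    x, y, z = state
    return [y, -x - sign(z) * y, y**2 - math.exp(-x**2) * beta]


def tornado(t, state, beta):
    # Equation 7.6: beta scales x_dot.
    x, y, z = state
    return [y * beta, -x - sign(z) * y, y**2 - math.exp(-x**2)]


def insect(t, state, beta):
    # Equation 7.7: beta scales the damping term of y_dot.
    x, y, z = state
    return [y, -x - sign(z) * y * beta, y**2 - math.exp(-x**2)]


state = [0.5, -1.0, 2.0]

# With beta = 1 the three systems are identical at any state.
same_for_beta_one = (
    sprott_v_1(0.0, state, 1.0)
    == tornado(0.0, state, 1.0)
    == insect(0.0, state, 1.0)
)

# With beta = 2 each system changes a different component.
d_sprott = sprott_v_1(0.0, state, 2.0)
d_tornado = tornado(0.0, state, 2.0)
d_insect = insect(0.0, state, 2.0)
print(same_for_beta_one, d_sprott, d_tornado, d_insect)
```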

Showing the trajectories of all 10 models with different \(\vec{\beta}\) would take up too many pages. Therefore, for demonstration purposes, the 3 above-mentioned models, i.e., Sprott_V_1, Tornado and Insect, are displayed in figures 7.2 to 7.8. Figure 7.2 depicts the dynamical system Sprott_V_1 (equation 7.5) with \(\beta =9\). Figures 7.3 to 7.5 present the Tornado (equation 7.6) with \(\beta =16.78\) from 3 different camera perspectives. Observing these figures, the reader might recognize why the name Tornado was chosen. The final 3 figures 7.6 to 7.8 display the Insect (equation 7.7) with \(\beta =7\) from 3 different perspectives. Other default models are displayed in subsection 16.0.2, as they were used for performing benchmarks.

Figure 7.2— Default model: Sprott_V_1 7.5 with \(\beta =9\)

Figure 7.3— Default model: Tornado 7.6 with \(\beta =16.78\), view: 1

Figure 7.4— Default model: Tornado 7.6 with \(\beta =16.78\), view: 2

Figure 7.5— Default model: Tornado 7.6 with \(\beta =16.78\), view: 3

Figure 7.6— Default model: Insect 7.7 with \(\beta =7\), view: 1

Figure 7.7— Default model: Insect 7.7 with \(\beta =7\), view: 2

Figure 7.8— Default model: Insect 7.7 with \(\beta =7\), view: 3

Having selected a dynamical system, the model parameter values for which the system shall be solved must be specified in settings.py. With the known range \(\vec{\beta}\), the problem can be described by equation 3.2, as already mentioned in subsection 3.0.1.

\[ \begin{equation} F_{CNMc} = \left(\dot{\vec{x}}(t), \, \vec{\beta} \right) = \left( \frac{d\vec{x}(t)}{dt}, \, \vec{\beta} \right) = f(\vec{x}(t), \, \vec{\beta} ) \tag{3.2} \end{equation}\]

The solution to equation 3.2 is obtained numerically by applying SciPy’s RK45 ODE (Ordinary Differential Equation) solver. If desired, CNMc allows completing this task in parallel. Additional notes on executing this task in parallel are given in section 2. The main reason for relying on RK45 is that it is commonly known to be a reliable option. Also, in (Butt 2021) RK45 was directly compared with LSODA. The outcome was that LSODA was slightly better; however, the deviation between the performance of RK45 and LSODA was found to be negligible. In other words, both solvers fulfilled the accuracy demands. Since chaotic systems are known for their Sensitive Dependence on Initial Conditions (SDIC), any deviation, even on the order of \(\mathcal{O} (1 \mathrm{e}{-15})\), will be amplified approximately exponentially and will finally become unacceptably high. Therefore, it was tested whether the RK45 solver introduces statistical variations during the solution process. For this purpose, the Lorenz system (Lorenz 1963) was solved multiple times with different time ranges. The outcome is that RK45 has no built-in statistical variation. Simply put, the trajectory of the Lorenz system for one constant \(\beta\) will not differ when solved multiple times on one computer.
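The repeatability check described above can be sketched with SciPy's `solve_ivp` as follows. This is not the CNMc implementation. The Lorenz right-hand side is written in the classic form of (Lorenz 1963) with \(\beta\) as the varied parameter, and the constants \(a = 10\) and \(b = 8/3\), the initial condition, the time range, the tolerances and the range of \(\beta\) values are all assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def lorenz(t, state, beta, a=10.0, b=8.0 / 3.0):
    # Classic Lorenz form; a and b are the customary constants (assumed).
    x, y, z = state
    return [a * (y - x), x * (beta - z) - y, x * y - b * z]


def solve_for_beta(beta, t_end=5.0, n_points=501):
    """Solve the Lorenz system for one beta with the RK45 method."""
    t_eval = np.linspace(0.0, t_end, n_points)
    sol = solve_ivp(
        lorenz,
        (0.0, t_end),
        [1.0, 1.0, 1.0],  # illustrative initial condition
        method="RK45",
        t_eval=t_eval,
        args=(beta,),
        rtol=1e-8,
        atol=1e-10,
    )
    return sol.y.T  # shape: (n_points, 3)


# One trajectory per selected model parameter value.
betas = np.arange(28.0, 30.0, 0.5)
trajectories = {float(beta): solve_for_beta(float(beta)) for beta in betas}

# Solving again for the same beta is expected to give the same numbers.
repeat = solve_for_beta(28.0)
is_deterministic = np.array_equal(trajectories[28.0], repeat)
print(is_deterministic)
```

Because the solutions for different \(\beta\) are independent of each other, the loop over `betas` is the part that lends itself to parallel execution.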

Comparing first CNMc and CNMc, the key takeaway is that CNMc has 10 built-in dynamical systems. Moreover, implementing a new model is considered relatively straightforward. Important settings, such as the model itself, the range \(\vec{\beta }\), plotting and storing of the outcome, can be managed in settings.py. The plots are generated and stored such that post-processing capabilities are supplied.