6  CNMc’s data and workflow

This section discusses the 5 main points that characterize CNMc. Before diving into CNMc’s workflow, some remarks are in order. First, CNMc is written from scratch; it is not simply an updated version of the first CNMc described in subsection 4.0.1. Therefore, the workflow described in this section will not match that of first CNMc, e.g., first CNMc had no concept of settings.py and did not use Plotly (Inc. 2015) to facilitate post-processing. The reasons for a fresh start were given in subsection 4.0.1. In short, running first CNMc and adjusting it such that a generic dynamical system could be used was judged to be more time-consuming than starting from zero.

Second, the reader should keep the following in mind. Although it is called a pipeline or workflow, CNMc is not obliged to run the whole workflow. With the settings.py file, which is explained below, it is possible to run only selected tasks. The very broad concept of CNMc was already provided at the beginning of chapter 1. However, instead of providing data of dynamical systems for different model parameter values, the user defines a so-called settings.py file and executes CNMc. The outcome of CNMc consists, very broadly, of the predicted trajectories and some accuracy measurements, as depicted in figure 1.1. In the following, a more in-depth view is given.

As its extension indicates, settings.py is a regular Python file. However, it essentially contains only dictionaries, so no specific knowledge of Python is required. The syntax of a Python dictionary is quite similar to that of a JSON object, in that the setting name is given in quotation marks and its value is stated after a colon. In order to understand the main points of CNMc, its main data and workflow are depicted in figure 6.1 as an XDSM diagram (Lambe and Martins 2012).
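To illustrate the dictionary style, a minimal sketch of what such a settings file could look like is given below. All dictionary names, keys and values are hypothetical and chosen only for illustration; they are not taken from the actual settings.py.

```python
import json

# Hypothetical excerpt of a settings file; every key and value is an assumption.
data_generation = {
    "model_name": "lorenz_demo",        # label of this run
    "beta_train": [28.0, 30.0, 32.0],   # model parameter values used for training
    "beta_unseen": [29.0, 31.0],        # model parameter values to predict
}

clustering = {
    "K": 10,  # number of centroids
}

# A dictionary of plain values maps directly onto JSON text,
# which shows how close the two notations are.
as_json = json.dumps(clustering)
```

A user only edits the values to the right of the colons; no further Python constructs are needed for this style of configuration.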

Figure 6.1: General workflow overview

The first action for executing CNMc is to define settings.py. It contains descriptive information about the entire pipeline, e.g., which dynamical system to use, which model parameters to select for training and which for testing, and which methods to use for modal decomposition and mode regression. To be precise, it contains all the configuration attributes of the 5 main CNMc steps and of some other handy extra functions. It is laid out such that the settings of the individual CNMc stages and of the extra features can be distinguished at first glance. First, there are separate dictionaries for each of the 5 steps, to ensure that the desired settings are made where they are needed. Second, instead of plain blank lines, multiline comment blocks with the stage names in the center separate the stages. Third, almost every settings.py attribute is explained with comments. Fourth, there are some cases where a specific attribute needs to be reused in other steps. The user is not required to adapt it manually at all its occurrences, but only at the first one, where it is defined. Python will automatically ensure that all remaining steps receive the change correctly. Other capabilities implemented in settings.py are mentioned when they are actively exploited.

In figure 6.1 it can be observed that after passing settings.py, a so-called Informer and a log file are obtained. The Informer is a file that saves all user-defined settings from settings.py for each execution of CNMc. Here, too, usability and readability are important, and the output has been formatted accordingly. The Informer proves to be particularly useful when a dynamical system is to be computed with different settings, e.g., to observe the influence of one or multiple parameters.
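The reuse of one attribute across several steps, and a readable summary of all settings, could be sketched as follows. This is only an illustration under stated assumptions: the names K, clustering, tracking, modeling and informer_text, as well as the layout of the summary, are hypothetical and do not reflect the actual CNMc implementation.

```python
import pprint

# Defined once; every later step refers to this single name (hypothetical).
K = 10

clustering = {"K": K, "n_init": 5}
tracking = {"K": K}
modeling = {"K": K, "decomposition": "svd"}


def informer_text(model_name, **step_settings):
    """Return a readable summary of all settings of one run."""
    lines = [f"Informer for model: {model_name}"]
    for step, settings in step_settings.items():
        lines.append(f"[{step}]")
        lines.append(pprint.pformat(settings))
    return "\n".join(lines)


summary = informer_text(
    "demo_model", clustering=clustering, tracking=tracking, modeling=modeling
)
```

Changing the single assignment of K is enough for all three dictionaries to pick up the new value the next time the file is read, and the summary string can be written to a file next to the results.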

One of the important attributes that can be freely defined by the user in settings.py, and is thus found again in the Informer, is the name of the model. In CNMc multiple dynamical systems are implemented, which can be chosen by simply changing one attribute in settings.py. Different models could be calculated with the same settings, so a clear and fast way to distinguish between multiple calculations is required. The name of the model is not only saved in the Informer, but is also used to generate a folder in which all CNMc output of this single CNMc workflow is stored. This ensures, on the one hand, that CNMc models can be easily distinguished from each other and, on the other hand, that all results of one model are stored in a structured way.

When executing CNMc, many terminal outputs are displayed. This keeps the user up to date on the current progress on the one hand and shows important results directly on the other. In case of unsatisfactory results, CNMc can be aborted immediately instead of computing the entire workflow. In other words, if a computationally expensive CNMc task is to be performed, learning about possible issues in the first steps saves time. The terminal outputs are formatted to include the date, time, type of message, the message itself and the place in the code where the message originates. The terminal outputs are colored depending on the type of message, e.g., green is used for successful computations. The coloring serves readability: more relevant outputs can easily be distinguished from others. The log file can be considered a memory, since the terminal outputs are saved in it.

The stored terminal outputs have the same format as the terminal output described above, except that no coloring is used. One instance where the log file can be very helpful is the following. Some implemented quality measurements give significant information about prediction reliability. Comparing different settings in terms of prediction capability would become very challenging if the terminal outputs were lost whenever the CNMc terminal is closed. The described Informer and the log file can be beneficial as explained; nevertheless, they are optional. That is, both are among the extra features mentioned above and can be turned off in settings.py.
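A colored terminal output paired with an uncolored log file can be sketched with Python's standard logging module. This is a minimal sketch and not the CNMc implementation: the logger name, the format string, the color mapping and the helper build_logger are assumptions, and since the standard module has no dedicated success level, INFO messages are shown in green here.

```python
import logging
import sys

ANSI_RESET = "\033[0m"
ANSI_COLORS = {
    logging.INFO: "\033[32m",     # green
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}
# Date and time, message type, message, and place in the code.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(message)s | %(filename)s:%(lineno)d"


class ColorFormatter(logging.Formatter):
    """Wrap the formatted record in an ANSI color chosen by its level."""

    def format(self, record):
        text = super().format(record)
        color = ANSI_COLORS.get(record.levelno, "")
        return f"{color}{text}{ANSI_RESET}" if color else text


def build_logger(log_path, stream=sys.stdout):
    """Create a logger that writes colored text to stream and plain text to log_path."""
    logger = logging.getLogger("cnmc_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    console = logging.StreamHandler(stream)
    console.setFormatter(ColorFormatter(LOG_FORMAT))
    logger.addHandler(console)

    log_file = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    log_file.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(log_file)
    return logger
```

Both handlers share one format string, so the file content matches the terminal output apart from the color codes.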

Once settings.py is defined, CNMc will filter the provided input, adapt the settings if required and send the corresponding parts to their respective steps. The sending of the correct settings is depicted in figure 6.1, where the abbreviation st stands for settings. The second abbreviation, SOP, is found for all 5 stages and denotes storing output and plots. All output is stored in a compressed form to save storage space. All plots are saved as HTML files. There are many reasons to do so; the most crucial ones are the following. First, an HTML file can be opened on any operating system, i.e., it does not matter whether Windows, Linux or Mac is used. Second, the big difference to an image is that HTML files can be extended with, e.g., CSS and JavaScript. Each generated HTML plot is equipped with some post-processing features, e.g., zooming, panning and taking screenshots of the modified view. When zooming in or out, the axis labels are adapted accordingly. Depending on the position of the cursor, a panel with the exact coordinates of a point and other information, such as the \(\beta\), is made visible.
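How results can be stored in compressed form is sketched below with the Python standard library. The text does not state which compression format CNMc uses, so gzip-compressed JSON and the helper names save_compressed and load_compressed are assumptions made purely for illustration.

```python
import gzip
import json


def save_compressed(obj, path):
    """Write a JSON-serializable object to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(obj, fh)


def load_compressed(path):
    """Read an object back from a gzip-compressed JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

For large numerical arrays a binary compressed format would usually be preferred; the round trip shown here only demonstrates the principle.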

In the same way that data is stored in a compressed format, all HTML files are generated such that additional resources are not written directly into the HTML file; instead, a link is used so that the required content is obtained via the Internet. Other features associated with the HTML plots, and which data are saved, will be explained in their respective sections in this chapter.

The purpose of CNMc is to generate a surrogate model with which predictions can be made for unknown model parameter values \(\beta\). For a review of important terminology such as the model parameter value \(\beta\), the reader is referred to subsection 3.0.1. Usually, in order to obtain a sound predictive model, machine learning methods require a considerable amount of data. Therefore, in the first step, the ODE is solved for a set of model parameter values \(\vec{\beta }\). An in-depth explanation of this first step is provided in section 7. The next step is to cluster all the obtained trajectories using k-means++ (Arthur and Vassilvitskii 2006). Once this has been done, tracking can be performed. Here the objective is to keep track of the positions of all the centroids when \(\beta\) is changed over the whole range of \(\vec{\beta }\). A more detailed description is given in section 9.
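The idea behind k-means++ is its seeding: the first centroid is drawn uniformly at random, and each further centroid is drawn with probability proportional to the squared distance to the nearest centroid chosen so far. The following pure-Python sketch shows only this seeding step, which is then followed by the usual k-means iterations. The function names are hypothetical, and the sketch assumes that the data contain at least k distinct points.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """Select k initial centroids from points using k-means++ seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid.
        weights = [
            min(squared_distance(p, c) for c in centroids) for p in points
        ]
        # Already chosen points have weight zero and cannot be drawn again.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids
```

Because distant points receive larger weights, the initial centroids tend to be spread over the trajectory data, which typically improves the final clustering compared to purely random seeding.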

The modeling step is divided into two subtasks, which are not displayed as such in figure 6.1. The first subtask aims to obtain a model that yields the positions of all \(K\) centroids for an unseen \(\beta_{unseen}\), where \(\beta_{unseen}\) is any \(\beta\) that was not used to train the model. The second subtask itself comprises multiple parts. First, the regular CNM (Fernex, Noack, and Semaan 2021) is applied to all the tracked clusters from the tracking step. For this purpose, the format of the tracked results is adapted such that CNM can be executed without having to modify CNM itself. By running CNM on the tracked data of all \(\vec{\beta }\), the transition property tensors \(\boldsymbol Q\) and \(\boldsymbol T\) are obtained for all \(\vec{\beta }\).

Second, all the \(\boldsymbol Q\) and \(\boldsymbol T\) tensors are stacked to form the matrices \(\boldsymbol {Q_{stacked}}\) and \(\boldsymbol {T_{stacked}}\). These stacked matrices are subsequently supplied to one of the two implemented modal decomposition methods. Third, a regression model for the obtained modes is constructed. Details on the modeling stage can be found in section 10.
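The stacking and a subsequent modal decomposition can be sketched with NumPy. Two assumptions are made here that the text does not confirm: each tensor is flattened into one column of the stacked matrix, and the singular value decomposition is used as the decomposition method. The random matrices merely stand in for real transition tensors, and stack_matrices is a hypothetical helper.

```python
import numpy as np


def stack_matrices(matrices):
    """Flatten each array and place it as one column of the stacked matrix."""
    return np.column_stack([m.ravel() for m in matrices])


rng = np.random.default_rng(0)
K = 3  # number of centroids in this toy example

# One placeholder transition matrix per model parameter value.
Q_list = []
for _ in range(4):
    Q = rng.random((K, K))
    Q /= Q.sum(axis=0, keepdims=True)  # normalize each column to sum to one
    Q_list.append(Q)

Q_stacked = stack_matrices(Q_list)  # shape (K * K, number of parameter values)

# Modes in U, singular values in s, coefficients per parameter value in Vt.
U, s, Vt = np.linalg.svd(Q_stacked, full_matrices=False)
Q_rebuilt = (U * s) @ Vt
```

In such a setup, each column of Vt scaled by the singular values gives the mode coefficients of one parameter value, and a regression model over these coefficients is what allows interpolation to an unseen parameter value.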

The final step is to make the actual predictions for all provided \(\beta_{unseen}\) and to allow the operator to draw conclusions about the trustworthiness of the predictions. For the trustworthiness, among others, the three quality measurement concepts explained in subsection 4.0.1 are leveraged. The first is to compare the CNMc and CNM predicted trajectories by overlaying them directly. The two remaining techniques, which were already applied in regular CNM (Fernex, Noack, and Semaan 2021), are the Cluster Probability Distribution (CPD) and the autocorrelation.
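As an illustration of the CPD, the sketch below estimates the probability of each cluster as the fraction of snapshots assigned to it and compares two such distributions. It assumes a uniformly sampled sequence of cluster labels; the function names and the maximum absolute difference used for the comparison are illustrative choices, not the measure used in CNMc.

```python
from collections import Counter


def cluster_probability_distribution(labels, n_clusters):
    """Fraction of snapshots that fall into each cluster 0 .. n_clusters - 1."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def max_cpd_difference(cpd_a, cpd_b):
    """Largest absolute difference between two distributions of equal length."""
    return max(abs(a - b) for a, b in zip(cpd_a, cpd_b))
```

For example, the label sequence [0, 0, 1, 2] with three clusters yields the distribution [0.5, 0.25, 0.25]. Two trajectories that visit the clusters equally often produce a difference of zero, regardless of the order of the visits, which is why the CPD is complemented by the autocorrelation.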

The data and workflow in figure 6.1 do not reveal one additional feature of the implementation of CNMc. That is, inside the folder Inputs, multiple subfolders, each containing a settings.py file, e.g., for different dynamical systems, can be inserted to allow a sequential run. In the case of an empty subfolder, CNMc will inform the user about it and continue its execution without an error. As explained above, each model will have its own folder where the entire output is stored. To switch between the multiple and the single settings.py mode, the settings.py file outside the Inputs folder needs to be modified. The attribute for that is multiple_Settings.
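The collection of settings files for such a sequential run could look as follows. This is a hedged sketch: the helper collect_settings_files is hypothetical, and only the folder name Inputs and the file name settings.py are taken from the text.

```python
from pathlib import Path


def collect_settings_files(inputs_dir):
    """Return (found, skipped): settings files to run and subfolders without one."""
    found, skipped = [], []
    subfolders = sorted(p for p in Path(inputs_dir).iterdir() if p.is_dir())
    for sub in subfolders:
        candidate = sub / "settings.py"
        if candidate.is_file():
            found.append(candidate)
        else:
            # Empty subfolders are reported to the caller instead of raising.
            skipped.append(sub)
    return found, skipped
```

A driver could then loop over the found files one after another and print a notice for every skipped subfolder, so that an empty folder does not interrupt the remaining runs.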

Finally, one more extra feature shall be mentioned. After having computed expensive models, it is undesirable to overwrite the log file or any other output. To prevent such unwanted events, the overwriting attribute in settings.py can be used. If overwriting is disabled, CNMc verifies whether a folder with the specified model name already exists. If so, CNMc initially only proposes an alternative model name. Only if the suggested model name would not overwrite any existing folder is the suggestion accepted as the new model name. Both the model name chosen in settings.py and the final replacement model name are printed in the terminal.
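The described protection against overwriting can be sketched as a small helper that proposes alternative names until one is free. The function name resolve_model_name and the numeric suffix scheme are assumptions for illustration; the text does not specify how CNMc builds its suggestion.

```python
from pathlib import Path


def resolve_model_name(output_root, model_name, overwriting):
    """Return a model name whose folder may be written without losing results."""
    root = Path(output_root)
    if overwriting or not (root / model_name).exists():
        return model_name
    # Propose numbered alternatives and accept the first one that is unused.
    counter = 1
    while (root / f"{model_name}_{counter}").exists():
        counter += 1
    return f"{model_name}_{counter}"
```

A caller would print both the requested and the returned name, so that the user can see which folder the results of the current run end up in.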

In summary, the data and workflow of CNMc are shown in figure 6.1 and are sufficient for a broad understanding of the main steps. Each of the 5 steps can be invoked individually, without having to run the full pipeline. Through the implementation of settings.py, CNMc is highly flexible. All settings for the steps and the extra features can be managed with settings.py. A log file containing all terminal outputs, as well as a summary of the chosen settings stored in a separate file called the Informer, are part of CNMc’s tools.