5  Methodology

In this chapter, the entire pipeline for designing the proposed control-oriented Cluster-based Network Modeling (CNMc) is elaborated. For this purpose, the ideas behind the individual processes are explained. Results from the tracking step onwards will be presented in chapter 11.

CNMc consists of several main process steps, or stages. First, a broad overview of the CNMc workflow is given, followed by a detailed explanation of each major operational step. The implemented process stages are presented in the same order in which CNMc executes them. However, CNMc is not forced to go through every stage: if the output of a step is already available, the execution of that step can be skipped.

The main idea behind such an implementation is to avoid computing the same task multiple times. Computational time is reduced whenever the output of some CNMc steps is already available, which in turn allows users to be flexible in their explorations. For instance, it may be desired to examine only one step with different settings, or even with newly implemented functions, without running the full CNMc pipeline. Let this CNMc step be denoted as C. Then steps A and B can be skipped if their output has already been calculated and is thus available. Likewise, the steps following C can be skipped or activated, depending on whether their outcomes are needed. Simply put, this flexibility makes it possible to load the data for A and B and execute only C. Whether follow-up steps are executed or their data is loaded is also selectable.

Since the tasks of this thesis required much coding, the programming language and its dependencies should be mentioned. Python 3 (Van Rossum and Drake 2009) was chosen as the programming language. Because the number of libraries used is high, only a few important ones are mentioned. Note that every module used is freely available online and no licenses need to be purchased.
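
The skip-or-load behavior described above for steps A, B and C can be illustrated with a minimal Python sketch. It is not the CNMc implementation: the function names (`run_step`, `run_pipeline`, `step_a`, `step_b`, `step_c`), the use of pickle files as storage, and the `execute` dictionary are all assumptions made for illustration only.

```python
import pickle
import tempfile
from pathlib import Path


def run_step(name, compute, inputs, execute, cache_dir):
    """Run one pipeline step or load its previously saved output.

    name    -- label of the step, also used as the file name (hypothetical)
    compute -- function that maps the inputs to the step's output
    execute -- True: compute and save; False: load the saved output
    """
    cache_file = Path(cache_dir) / f"{name}.pkl"
    if execute:
        output = compute(*inputs)
        with open(cache_file, "wb") as f:
            pickle.dump(output, f)
        return output
    if not cache_file.exists():
        raise FileNotFoundError(f"no saved output for step {name!r}")
    with open(cache_file, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-ins for three consecutive steps A, B and C.
def step_a():
    return [1.0, 2.0, 3.0]


def step_b(data):
    return [2.0 * x for x in data]


def step_c(data, scale):
    return [scale * x for x in data]


def run_pipeline(execute, cache_dir, scale=1.0):
    """execute is a dict such as {"A": True, "B": True, "C": True}."""
    a = run_step("A", step_a, (), execute["A"], cache_dir)
    b = run_step("B", step_b, (a,), execute["B"], cache_dir)
    return run_step("C", step_c, (b, scale), execute["C"], cache_dir)


cache_dir = tempfile.mkdtemp()
# First run: every step is executed and its output is stored.
first = run_pipeline({"A": True, "B": True, "C": True}, cache_dir)
# Second run: A and B are loaded, only C is executed with a new setting.
second = run_pipeline({"A": False, "B": False, "C": True}, cache_dir, scale=10.0)
```

In this sketch, the second call never recomputes A or B. It only reads their stored outputs, which corresponds to loading the data for A and B and executing only C.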

The important libraries in terms of performing actual calculations are NumPy (Harris et al. 2020), SciPy (Virtanen et al. 2020), Scikit-learn (Pedregosa et al. 2011), and PySINDy (Silva et al. 2020; Kaptanoglu et al. 2022). For multi-dimensional sparse matrix management, sparse was used, and for plotting, only plotly (Inc. 2015) was deployed. One of the reasons why plotly is preferred over Matplotlib (Hunter 2007) is its post-processing capabilities, which are now available. Note that the previous version used Matplotlib (Hunter 2007), which has been fully replaced by plotly (Inc. 2015) in this work. Further reasons why this modification is useful, as well as the newly implemented post-processing capabilities, will be given in the upcoming sections.

For local coding, the author's Linux Mint-based laptop with the following hardware was deployed: CPU: Intel Core i7-4702MQ CPU @ 2.20 GHz × 4, RAM: 16 GB. The Institute of Fluid Dynamics of the Technische Universität Braunschweig also supported this work by providing two more powerful computation resources. Their hardware specifications are not given, because all computations and results elaborated in this thesis can be obtained with the hardware described above (the author's laptop). However, the two provided resources shall be mentioned, and it shall be explained whether CNMc benefits from faster computers. The first, larger machine is called Buran. It is a powerful Linux-based workstation, and access to it is provided directly by the chair of fluid dynamics.

The second resource is Phoenix, the high-performance computing cluster available across the Technische Universität Braunschweig. The first step, in which the dynamical systems are solved with an Ordinary Differential Equation (ODE) solver, is written in a parallel manner. If specified in the settings.py file, this step is performed in parallel and thus benefits from multiple available cores. However, most of the implemented ODEs are solved within a few seconds. There are also some implemented dynamical systems whose ODE solution can take a few minutes. Applying CNMc to the latter means solving their ODEs for multiple different model parameter values. Thus, parallelization is advisable for these time-consuming ODEs.
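
Solving the same ODE for several model parameter values is a set of independent tasks, which is why it parallelizes well. The following Python sketch illustrates the idea under stated assumptions: the Lorenz system is only a stand-in example, a fixed-step Runge-Kutta scheme replaces the ODE solver actually used so that the sketch needs no external libraries, and the `parallel` argument merely mimics a flag that would be read from a settings file. None of the names are taken from the CNMc code.

```python
from concurrent.futures import ProcessPoolExecutor


def lorenz(state, beta, sigma=10.0, rho=28.0):
    """Right-hand side of the Lorenz system for model parameter beta."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def solve_one(beta, dt=0.001, n_steps=2000, start=(1.0, 1.0, 1.0)):
    """Integrate one trajectory with the classical fixed-step RK4 scheme."""
    state = start
    trajectory = [state]
    for _ in range(n_steps):
        k1 = lorenz(state, beta)
        k2 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), beta)
        k3 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), beta)
        k4 = lorenz(tuple(s + dt * k for s, k in zip(state, k3)), beta)
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        trajectory.append(state)
    return trajectory


def solve_all(betas, parallel=False, workers=None):
    """Solve the ODE once per model parameter value.

    parallel is a hypothetical flag standing in for a settings-file entry.
    """
    if not parallel:
        return [solve_one(beta) for beta in betas]
    # Each parameter value is independent, so the work is distributed
    # over separate processes; results keep the order of betas.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_one, betas))
```

A call such as `solve_all([2.0, 2.5, 3.0], parallel=True)` distributes the three trajectories over worker processes. On platforms that start workers by spawning a fresh interpreter, such a call should be placed under an `if __name__ == "__main__":` guard. For ODEs that solve within seconds, the overhead of starting processes can outweigh the gain, which is consistent with advising parallelization mainly for the time-consuming systems.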

By far the most time-intensive part of the improved CNMc is the clustering step. The main computation for this step is done with Scikit-learn (Pedregosa et al. 2011). It is heavily parallelized, and the computation time can be reduced drastically when multiple threads are available. Other than that, NumPy and SciPy are well-optimized libraries and are assumed to benefit from powerful computers. In summary, a powerful machine is advisable when multiple dynamical systems with a range of different settings are to be investigated, since parallelization is available. For executing CNMc on a single dynamical system, however, a regular laptop can be regarded as sufficient.