Data-driven identification of the spatio-temporal structure of turbulent flows by streaming Dynamic Mode Decomposition

Streaming Dynamic Mode Decomposition (sDMD) (Hemati et al., Phys. Fluids 26(2014)) is a low-storage version of Dynamic Mode Decomposition (DMD) (Schmid, J. Fluid Mech. 656 (2010)), a data-driven method to extract spatio-temporal flow patterns. Streaming DMD avoids storing the entire data sequence in memory by approximating the dynamic modes through incremental updates with new available data. In this paper, we use sDMD to identify and extract dominant spatio-temporal structures of different turbulent flows, requiring the analysis of large datasets. First, the efficiency and accuracy of sDMD are compared to the classical DMD, using a publicly available test dataset that consists of velocity field snapshots obtained by direct numerical simulation of a wake flow behind a cylinder. Streaming DMD not only reliably reproduces the most important dynamical features of the flow; our calculations also highlight its advantage in terms of the required computational resources. We subsequently use sDMD to analyse three different turbulent flows that all show some degree of large-scale coherence: rapidly rotating Rayleigh--B\'enard convection, horizontal convection and the asymptotic suction boundary layer. Structures of different frequencies and spatial extent can be clearly separated, and the prominent features of the dynamics are captured with just a few dynamic modes. In summary, we demonstrate that sDMD is a powerful tool for the identification of spatio-temporal structures in a wide range of turbulent flows.


INTRODUCTION
Coherent structures at different spatial and temporal scales are a prominent feature of many turbulent fluid flows occurring in nature and in engineering applications [62,20,11]. Examples include large-scale vortices, wakes, convection rolls and thermal plumes in Rayleigh-Bénard convection (RBC) [1,32], Taylor rolls in Taylor-Couette flow [18], jets, travelling waves, very-large scale motions [23,24] and low-momentum zones [35] that develop in wall-bounded turbulent boundary layers (BLs). These structures are known to have manifold significant effects in turbulent flows, for instance influence on heat and mass transport, the occurrence of extreme fluctuations or enhanced drag due their to interaction with near-wall dynamics in turbulent BLs [38,34,27]. Improving our knowledge of multi-scale spatio-temporal coherence and the underlying physics is of paramount importance as it would lead to a better fundamental understanding of turbulence, specifically in terms of model-building and turbulence control. However, the co-existence of several coherent structures makes the identification and the extraction of particular spatiotemporal features difficult, which led to a growing need for data-driven methods designed to identify and extract patterns.
Modal decomposition, as an umbrella term for a variety of structurally similar methods, identifies structures by decomposing a given dataset in a suitable set of basis functions, or modes. Fourier analysis constitutes perhaps the most well-known and widely used example of a modal decomposition technique. A more sophisticated example is Proper Orthogonal Decomposition (POD) [58,5,41,44,4], where each mode describes a flow structure according to its energy content. However, as the POD modes do not produce a separated and compact signal in frequency space, they usually contain more than one characteristic frequency and thus cannot yield information on frequential coherence.
Dynamic Mode Decomposition (DMD), by contrast, decomposes a dataset into spatio-temporal coherent structures [50] with dynamic modes obtained as eigenmodes of a high-dimensional linear approximation of the dynamics. More precisely, DMD has solid mathematical foundations in the context of nonlinear dynamical systems theory. Under certain conditions it represents a finite-dimensional approximation of the Koopman operator [47,61], a linear but infinite-dimensional representation of a nonlinear dynamical system [29,36,37]. DMD results have an intuitive physical interpretation as each dynamic mode corresponds to a single frequency and growth or decay rate. Therefore, it is a well-suited data-driven method for the analysis of complex datasets and model reduction. Since its introduction by Peter Schmid in 2010 [50], DMD has had a history of successful applications in fluid dynamics such as obtaining low-dimensional dynamic model of the cylinder wake flow [60,3], generating good initial guesses for unstable periodic orbits in turbulent channel flow [40], flow control [7,42,45], aerodynamics [13], and more general in pattern recognition [26,6]. An overview of the development of DMD extensions and applications thereof in the context of fluid dynamics is given in an upcoming review article by Peter Schmid [49].
Most DMD applications consist of post-processing a time series of experimental or computational data, where most implementations process the entire data sequence at once. However, the size of highly resolved turbulent flow data usually precludes saving or loading the entire dataset into memory. Therefore, only a few studies so far have applied DMD to highly turbulent flows. These constraints can be circumvented by a DMD implementation that allows for incremental data updates [19,2,63], such that the DMD calculation proceeds alongside the main data acquisition process such as Direct Numerical Simulations (DNS) or real-time Particle Image Velocimetry (PIV). Streaming DMD (sDMD) [19] is such a method, which requires only two data samples at a given instant in time and converges to the same results as classical DMD. In what follows we focus on sDMD as a promising method for the analysis of turbulent flows.
The present article is intended to serve three purposes: (a) to demonstrate the applicability of streaming DMD across different large datasets of highly turbulent flows relevant to fundamental science and engineering applications, (b) to analyse the large-scale spatio-temporal dynamics of the flow in sub-domains of particular interest, (c) to demonstrate the robustness of the algorithm with respect to different degrees of downsampling, which allows for analyses of very large datasets to be carried out efficiently on local desktop machines.
The streaming version of the DMD algorithm [19] is applied to three datasets consisting of time series obtained in DNS of three different turbulent flows: rapidly rotating RBC, horizontal convection (HC) and the asymptotic suction boundary layer (ASBL). Despite their physical differences, these three system share a few features that render them interesting and suitable as test cases. We demonstrate the advantages of sDMD for the analysis of turbulent flows, with a particular focus on large-scale spatio-temporal data features.
First, all three cases are paradigmatic examples of fluid-dynamic systems of interest in geophysical fluid dynamics and engineering applications. Rapidly rotating RBC is of relevance whenever rotation and thermal convection are the key physical processes [1,57], such as in the dynamics of planetary cores. Horizontal convection [59,51,22,54] occurs in the ocean which is mostly heated and cooled by its upper surface being in contact with the atmosphere. The ASBL [25,48] is a flat-plate BL with a constant BL thickness in the streamwise direction. The latter is achieved by removing fluid through the pores in the bottom plate, a well-known technique for BL stabilisation. Furthermore, due to the constant BL thickness the ASBL allows the application of techniques developed for parallel wall-bounded shear flows to an open flow.
Second, all three systems host spatio-temporally coherent structures. In rapidly rotating RBC, this is the boundary zonal flow, a large-scale travelling wave structure confined to the lateral near-wall region [65,10,53]. HC features two characteristic processes that operate on very different time scales, i. e., plume emission and slow oscillatory dynamics in the bulk [43], with the former one being an order of magnitude faster than the latter one. The ASBL shows coherent low momentum zones in the free stream, as do many wall bounded shear flows and freely evolving BLs [35], in the present dataset with a slow spanwise drift.
Third, the size of the datasets presents challenges in all three cases that can be mitigated by the incremental nature of sDMD. For rapidly rotating RBC and HC, the fine grids required to properly resolve the small-scale turbulent dynamics result in large datasets, as usual for DNS of turbulent flows. In case of ASBL, a further difficulty lies in the slow dynamics of the low-momentum zone, as an analysis thereof requires very long time series.
This article is organised as follows. Section 2 provides a summary of both, classical DMD [50] and streaming DMD [19], where we highlight few subtle differences concerning technical steps and compare our implementations of DMD and sDMD using a standard publicly available dataset -DNS of a developing von-Kármán vortex street. The main results of our analysis concerning turbulent flows are contained in Sec. 3, beginning with rapidly rotating Rayleigh-Bénard convection in Sec. 3.1, followed by horizontal convection in Sec. 3.2 and the asymptotic suction BL in Sec. 3.3. The paper ends with conclusions and an outlook in Sec. 4.

DYNAMIC MODE DECOMPOSITION
Before describing the specific features and advantages of streaming DMD (sDMD) [19], we briefly summarise the basic ideas and the classical singular value decomposition (SVD) based DMD algorithm [50]. For simplicity we restrict ourselves here to the case of equidistant data sequences, for a more general discussion see [31]. Consider a time series of spatially resolved measurement results recorded at a fixed sampling rate 1∕Δ resulting in, say, equidistant snapshots. Let us further assume that the possibly multidimensional data in each snapshot is flattened into a corresponding -dimensional real vector, such that the time series can be represented by an ordered sequence ( ) { =1,…, } of column vectors ∈ ℝ for ∈ {1, … , }. In the present context would represent the th velocity field in a series of measurements, hence in particular for highly resolved three-dimensional flow fields = 3 3 , where is the number of grid points, can quickly become very large. We will come back to this point in due course.
The assumption DMD relies upon is the existence of a linear operator ∈ ℝ × which approximates the nonlinear dynamics across the interval Δ , that is Here, crucially, does not depend on . Finally, denotes an error term that is assumed to be small. The validity of this assumption depends to some extent on the ratio of the characteristic time scale of the observed nonlinear dynamics and the sampling interval Δ , but most importantly on the potential to describe the dynamics by a linear surrogate model (i.e., the degree of nonlinearity). In practice, is chosen by regression over the available data by least-squares minimisation of the [31]. Since the operator describes the spatio-temporal dynamics of the system, its eigenvectors, known as dynamic modes or somewhat tautologically DMD modes, may be used to disentangle complex spatio-temporal dynamics and to construct low-dimensional models. In what follows we summarize how the dynamic modes may be determined from the data sequence ( ) { =1,…, } , following an SVD-based approach as this is what is mostly used in practice owing to numerical stability concerns with the more fundamental Krylov-subspace-type approach and for reasons of computational cost reduction. Further details can be found in the original work by [50] and the textbook by [31].

SVD-based DMD
For what follows it is convenient to combine data sequences that consist of − 1 samples and are shifted forwards in time by where ∈ 1, … , is the spatial index and the temporal index. Then Eq. (1) implies where = ( ) is the matrix of residuals. The best-fit solution for with respect to least-squares minimization of is given by = + , where + is the pseudo-inverse of . In practice, and in particular in fluid dynamics, ≫ as the dimension of the spatial samples usually exceeds the number of temporal samples by far. Hence, ∈ ℝ × is at most of rank − 1, which calls for a lower-dimensional approximation of , for instance, by restricting to act on a subspace spanned by, say, POD modes obtained by calculating the compact SVD of , where the superscript denotes the transpose. The truncation number is bounded from above by the rank of the data matrix , which is at most − 1. The columns of ∈ ℝ × and the rows of ∈ ℝ ( −1)× are orthogonal, and ∈ ℝ × is a diagonal matrix containing the nonzero singular values of . The matrix contains the spatial structures of the data sequence, that is, the POD modes are given by the columns of . Restricting to act on the subspace spanned by POD modes gives rise to the definition of an auxiliary matrix This equation is to be interpreted in a least-squares optimal sense (hence the absence of the residual), it is obtained by calculating through the pseudo-inverse of , which is calculated via SVD, and subsequently projecting onto the -dimensional subspace spanned by the POD modes, i.e. using the orthogonality of . The eigenvalues of correspond to a subset of the non-zero eigenvalues of . For practical purposes we summarize the SVD-based DMD algorithm [50] as follows: • Build a matrix ∈ ℝ ×( −1) out of the first ( − 1) snapshots, according to Eq. (2).
• Calculate the compact SVD of according to Eq. (5).
• Build a matrix ∈ ℝ ×( −1) out of the last ( − 1) snapshots, according to Eq. (3) and combine it with the matrices and to calculate the optimal representation of the linear mapping in the orthogonal basis given by the POD modes according to Eq. (6).
• Calculate the (projected) dynamic modes The data vector , or, in the present context the velocity field, at time = Δ , where is an integer, can then be approximated using ′ ⩽ dynamic modes and their corresponding DMD eigenvalues where are the components of the least-squares solution of this equation at = 0. The real and imaginary parts of the logarithm of the eigenvalues are the frequency and temporal growth or decay rate of the th dynamic mode for ∈ {1, … , }, respectively. The accuracy of the approximation does not only depend on the number of dynamic modes used to reconstruct the data, it also depends on the truncation number , which determines the accuracy with which the projected dynamic modes have been calculated. Several truncation criteria have been developed to determine a suitable value for , such as Optimal Singular Value Hard Threshold [12], or to choose a number of nonzero mode coefficients from a larger number of modes. The sparsity-promoting algorithm [26], does the latter. Kou and Zhang [30] also developed an improved rank selection criterion for the most dominant DMD modes, which are determined based on the temporal history of each DMD mode and has better modal convergence compared to the classical DMD. So far, the dynamic modes are ordered by amplitude, which may or may not result in a good reconstruction of the data, as the most energetic modes may not be the most dynamically relevant ones, and a mode selection criterion is required. Concerning low-dimensional data representation, sparsity-promotion [26] solves this impasse by minimising the least-squares error between the original and the reconstructed data over the available set of dynamic modes, and it includes an 1 -penalisation term to restrict the number of active modes used for reconstruction. As the focus here is on flow features on large spatio-temporal scales that represent statistically stationary dynamics, we only retain modes with eigenvalues that lie on or very close to the unit circle. The remaining modes are then ranked by frequency in ascending order, and we restrict our attention on the first few low-frequency modes. We point out that this criterion may not be adequate for flows that feature multiple important dynamical features with vastly different timescales, in which case either temporal filtering or a different mode selection crterion may be required. We will come back to this point in Sec. 3.2 on horizontal convection.

Streaming DMD
Classical DMD requires access to the entire data sequence at once, which precludes the analysis of large datasets due to memory constraints. This applies to data of either a high degree of spatial or temporal complexity, where the former results in high spatial dimensionality (large ) and the latter requires long time series (large ) to capture the temporal features of the dynamics. Streaming DMD is a method for the calculation of the POD-projected linear operator based on incremental data updates that addresses this challenge by only requiring two data samples to be held in memory at a given time [19]. In what follows we summarize this procedure; further details including processing steps that reduce the effects of data contamination by noise can be found in the original work by [19]. Streaming DMD consists of two conceptual parts, a low-storage calculation of , and a scheme to update using new data samples based on the iterative Gram-Schmidt orthogonalization.
Let us re-consider the data matrices and defined in Eqs. (2) and (3) and write Eq. (6) as wherẽ ∶= ∈ ℝ × −1 and̃ ∶= ∈ ℝ × −1 , are the projected data matrices, and ∶=̃ ̃ ∈ ℝ × and =̃ ̃ ∈ ℝ × . The identitỹ + =̃ (̃ ̃ ) + , which can be readily verified via SVD, was used in the penultimate step. That is, now both data matrices and are projected onto orthogonal bases consisting of their respective left singular vectors, the POD-modes, with truncation numbers ⩽ rank and ⩽ rank . The rearrangement carried out in the penultimate step has the advantage that ∈ ℝ × and ∈ ℝ × , which in itself is an improvement of classical DMD in terms of memory usage as long as < and < . Especially in fluid dynamics, this is often the case unless the data is very noisy. We will come back to this issue in due course. However, the main advantage of the formulation in Eq. (10) lies in the fact that all matrices on the right-hand side of Eq. (10) can be obtained incrementally from a data stream using only two samples at a time. The matrices and can be calculated incrementally from the data stream by iterative Gram-Schmidt orthogonalization. After each orthogonalization step the updated orthogonal matrices are then used to project the sample vectors onto the respective bases, and the matrices and are subsequently constructed from the projected sample vectors. More precisely, consider for instance the th pair of sample vectors and = +1 . The matrices and , which have been constructed incrementally from the previous data samples, are now updated using and the newly available . Then,̃ = and̃ = are calculated and we can update the remaining matrices according to Before proceeding to the calculations, a few comments are in order. First, the incremental nature of the method precludes the application of numerically more stable orthogonalization methods such as Householder reflections, and this may affect the convergence properties of the method. Second, experimental noise may result in a drastic decrease in computational efficiency as noise usually results in the data matrices being of high rank. In practice, this can be mitigated through an intermediate processing step, as explained in detail in [19]. Third, we note that = contains the squares of the nonzero singular values of , as can be verified via SVD.

Validation
Before applying sDMD to the three aforementioned datasets, we first compare classical and streaming DMD implementations in terms of their respective memory consumption for a publicly available dataset [31] that has been extensively used for testing and validation purposes in the literature [9,2]. Subsequently, we use this dataset to test DMD in conjunction with a coarsening interpolation scheme designed to reduce the computational effort when analysing data of high spatial dimension , as will be the case for turbulent flows.
Since the focus of the present work lies in the identification of the large-scale features which happen to be also the energetically dominant structures of the system, usually represented by one or two of the dynamic modes with the highest amplitudes, we do not apply any specific algorithm to order the dynamic modes or to determine the truncation number . Instead, different values of were tested to ensure convergence with respect to changes in the mode order and values of the lowest frequencies, resulting in = 30 as a sufficient truncation number. For data where dynamical relevance and energy content of the determined dynamic modes result in different mode ordering, more sophisticated methods such as sparsity promotion [26] are required to obtain good low-dimensional data representations.

Comparsion between sDMD and DMD
The dataset provided in Ref. [31] consists of a time series of two-dimensional vorticity fields obtained by computer simulation of the wake flow behind a cylinder for Reynolds number = ∕ = 100, where , and denote the free-stream velocity, the diameter of the cylinder and the kinematic viscosity of the fluid. The dominant dynamics is governed by periodic vortex shedding, therefore it is very well suited for DMD validation. The vortex-shedding frequency can be expressed in nondimensional form through the Strouhal number = ∕ . Here, the Strouhal number is around = 0. 16. For details on the numerical method used to generate the data we refer to the original reference [31]. In total, 150 vorticity-field samples, separated by a time interval Δ = 0.2, were analysed.
The memory consumption of both methods, that is, the RAM usage of the code at a single iteration in case of sDMD and for the full dataset for DMD, has been assessed by two comparisons. First we increased the number of samples for a fixed state dimension, and secondly increased the state dimension for a fixed number of samples. In order to ensure consistency, all tests were carried out on the same computer with an Intel i5-8250U CPU at 1.60GHz and 8GB RAM. We expect the memory usage to increase with the number of data samples for DMD, as each additional data sample requires the same additional amount of memory. However, the scaling should be nonlinear, as the efficiency of the SVD is not linearly related to the data matrix size when the state dimension is much larger than the number of samples and as the memory consumption due to matrix multiplications depends on the number of data samples. For sDMD, the memory consumption should remain constant, as only two data samples are held in memory at a given time. Concerning the memory consumption as a function of the state dimension, we expect a linear relation for both methods. The predictions are confirmed by the data to a good approximation, as can be seen in Fig. 1 (a) for memory consumption as a function of the number of data samples, and in Fig. 1 (b) as a function of the state dimension. In the former case, we indeed observe nonlinear scaling of memory consumption as a function of the number of data samples for DMD, while the memory consumption remains constant for sDMD. In the latter case, the memory consumption scales linearly with state dimension. Compared to DMD, it is lower by a nearly constant offset for sDMD, which results from the larger number of snapshots required by the DMD algorithm. For DMD all 150 snapshots are stored in memory, while sDMD requires only two. Information on computational time and memory usage for DMD and sDMD calculations for 150 snapshots on 90000 grid points is provided in table 1 .

Coarse interpolation for the analysis of high-dimensional data
Numerical simulations of highly turbulent flows require fine computational grids to accurately resolve the dynamics at the small scales. This is not necessarily always due to a need to precisely measure small-scale quantities such as dissipation or correlation functions of high order, it is also a requirement for numerical stability. The required large number of grid points results in a high memory load even for a single sample, which quickly becomes prohibitive even for sDMD. This calls for a reliable downsampling strategy to interpolate the data on coarser grids, in particular when the focus is on large-scale structures. In what follows we analyse the robustness of sDMD with respect to different degrees of spatial downsampling, using the same vortex shedding dataset of the previous subsection. The downsampling was carried out by successively decreasing the original number of grid points uniformly, that is by merging nearby grid points, beginning with 90000 grid points down to a minimum of 5 grid points. The effect of downsampling is assessed by considering two observables, the DMD eigenvalues and the time-averaged reconstruction error, defined as where is the downsampled vorticity field here, and the angled brackets denote a time average. The results are summarized in  Fig. 2 . The DMD eigenvalues, which need to lie on the unit circle as the dynamics are nonlinear and statistically stationary [21], are remarkably robust under the downsampling procedure. This can be expected as the Koopman operator can be approximated for any observable function. Thus, a coarsening of the grid represents a change in the observable, which should not have a large impact on the eigenfunctions. As can be seen from the data shown in Fig. 2 (a) and 2 (b), a reduction by three orders of magnitude in the state dimension results in almost the same values for the DMD eigenvalues. Significant qualitative differences in the eigenvalues occur only after drastic downsampling from 90000 to less than 10 data points. A more quantitative comparison is achieved by considering the difference 1 between the Strouhal number and the dimensionless frequency of the second dynamic mode as a function of the state dimension presented in Fig. 2 (c). As can be seen from the figure, the Strouhal number is reproduced very accurately using only 25 data points. This is particularly striking in view of the unsurprisingly large reconstruction error 2 of order 10 −3 to 10 −2 , for the corresponding downsampled data, as shown in Fig. 2 (d). According to the data presented in the figure, converged results for the reconstruction of the full vorticity field requires a state dimension of least 9000 points. The finite residual for higher resolved data is then due to truncation in the DMD algorithm. The visualisations of the reconstructed vorticity fields in Fig. 2 (e) give a visual impression of the effect the downsampling has on the reconstructed data. As expected, the large-scale spatial coherence is still present in the downsampled data. Since the focus is on the detection of large-scale coherent structures, like the vortex street in this case, and since the coarsening interpolation results in the removal of small-scale spatial structures, the downsampling has very little effect on the results, as expected.

RESULTS
Having validated our implementation of sDMD in conjunction with downsampling on publicly available data, we now apply the method to three different flows, rapidly rotating Rayleigh-Bénard convection (RBC), horizontal convection (HC), and the asymptotic suction boundary layer (ASBL). We chose these three examples in order to demonstrate sDMD to be a useful tool for the analysis of different turbulent flows in terms of their main spatio-temporal structure.
In rapidly rotating RBC, the anticyclonic circulation in the bulk is surrounded by a cyclonic layer close to the horizontal cell walls, and the aim is to identify this large-scale flow pattern. Horizontal convection lends itself well as a test case for the distinction of different spatio-temporal structures, as the dynamics is largely governed by two instabilities that operate on different time scales. The Rayleigh-Taylor instability leads to fast periodic plume generation close to the boundary while an oscillatory instability in the bulk results in much slower periodic dynamics in the bulk. The respective frequencies associated with these two processes differ by an order of magnitude. Similar to canonical wall-bounded parallel shear flows and spatially developing BLs, the ASBL features long-lived large-scale coherent motion. Here the aim is to identify the corresponding spatiotemporal structure. The slow dynamics requires very long time series, which makes this example particularly suitable for the application of sDMD.
All datasets were obtained by direct numerical simulation at parameter values corresponding to turbulent flow. Further details on the numerical methods and parameter values will be given in the following subsections. A summary of computational details such as wall time and memory consumption for the DMD or sDMD calculations is provided in table 1 for all datasets.

Fluid structures
In rotating Rayleigh-Bénard convection, a fluid is confined between a heated bottom plate and a cooled top plate and is rotated around a vertical axis. It is a paradigmatic problem to study many geophysical and astrophysical phenomena in the laboratory, e.g. convective motion occurring in the oceans, the atmosphere, in the outer layer of stars, or in the metallic core of planets.
In rotating RBC laboratory experiment, the fluid is laterally confined. The centrifugal force can be neglected, provided the Froude number is small, and then only the Coriolis force is considered. The interplay of the occurring buoyancy and Coriolis forces, however, may yield highly complex flows with very distinct flow structures whose nature strongly depends on the control parameters. Without rotation or with slow rotation, a distinct feature of turbulent RBC is the emergence of the Large-Scale Circulation (LSC) of fluid. For rapid rotation, however, a mean flow with cyclonic azimuthal velocity near the boundary, the Boundary Zonal Flow (BZF), develops close to the side walls, surrounding a core region of anticyclonic mean flow. The viscous Ekman BLs near the plates induce an anticyclonic circulation with radial outflow in horizontal planes, which is balanced by the vertical velocity in a thin annular region near the sidewall, where cyclonic vorticity is concentrated. The Taylor-Proudman effect induced by rapid rotation tends to homogenize the flow in the vertical direction, resulting in an anticyclonic mean flow in the core region throughout the height. The temperature pattern near the vertical wall, however, moves anticyclonically within the BZF and is likely connected to the thin anticyclonic Ekman layers at the top and bottom plates. The interesting part of the BZF flow structure is, that it has an organized and predominant pattern as mean flow, although the instantaneous flow is turbulent in the whole domain, and consists of active and complex vortex motion. This grants sDMD big potential, as it can find the dominant modes quickly and thus reconstruct the global statistics at very low cost. In addition, the BZF has special drift feature, which could test how well sDMD could capture the temporal evolution of a flow. Thus the aim here is to recover the BZF via sDMD.

Dynamic equations & control parameters
We consider RBC in a vertical cylinder rotating with uniform angular velocity Ω about the vertical axis. The governing equations of the problem are the incompressible Navier-Stokes equations in the Oberbeck-Boussinesq approximation, coupled with the temperature equation, given here in dimensionless form ∇ ⋅ = 0.
which are nondimensionalized by using the fluid layer height , the temperature differences between heated bottom and cooled top plates Δ = + − − , and the free-fall velocity = √ Δ , with denoting the isobaric expansion coefficient, the acceleration due to gravity. The Rayleigh number, Ra, describes the strength of the thermal buoyancy force, the Prandtl number, Pr, the ratio of viscosity and diffusivity, and the convective Rossby number, Ro, is a measure for the rotation rate. They are defined as where the thermal diffusivity, the kinematic viscosity, and Ω the angular rotation speed. Equations (13)-(15) were stepped forward in time using the finite-volume code GOLDFISH [56,28,65]. For the temperature we impose Dirichlet boundary conditions (isothermal) on the top and bottom plates and Neumann conditions (adiabatic) on the lateral walls. All boundaries are assumed to be impenetrable and no-slip, i. e. the velocity field vanishes at all boundaries.

Numerical details
We consider datasets for two different Rayleigh numbers, Ra = 10 8 and Ra = 10 9 , the remaining control parameters are Pr = 0.8, Ro = 0.1. The resolution of the original datasets is × × = 100 × 256 × 380 for Ra = 10 8 and × × = 192 × 512 × 820 for Ra = 10 9 , according to [55], where , and denote the number of grid points in radial, azimuthal and vertical direction, respectively. Grid nodes are non-equidistant in both the radial and vertical directions, being clustered near the boundaries to resolve thermal and velocity boundary layers [64]. The velocity fields from both datasets are sampled for a time period of 200 free-fall time units with a sampling interval of of Δ = 1.25, resulting in 160 samples in total. Both datasets are spatially downsampled for the sDMD analysis, by a four-fold and a 16-fold reduction in the number of data points, respectively, resulting in a spatial resolution of × × = 100 × 128 × 190 for Ra = 10 8 and × × = 96 × 128 × 410 for Ra = 10 9 . The truncation number is = 40 in both cases. In what follows we first describe eigenvalue spectra and the generic spatial features that can be extracted with the first few dynamic mode for the case Ra = 10 8 . Subsequently, we consider the temporal features for both Ra = 10 8 and Ra = 10 9 , providing a quantitative comparison of the time scales associated with the global structure obtained directly from the DNS data and calculated from the lowest DMD frequencies.

Streaming DMD
The DMD eigenvalues obtained for the two cases Ra = 10 8 and Ra = 10 9 are presented in Fig. 3 (a) and (b), respectively. As can be seen from the data shown in the figure, the eigenvalues lie on or close to the unit circle for a truncation number = 40.
We only consider converged modes corresponding to eigenvalues on the unit circle, that is, the mean flow and the dynamic mode corresponding to the next lowest frequency indicated by the blue dots. The first two dynamic modes obtained from the Ra = 10 8dataset are visualised in Fig. 4 in terms of temperature and azimuthal velocity. The temperature field of the first dynamic mode shown in Fig. 4 (a) resembles the mean temperature profile, the corresponding azimuthal velocity field (Fig. 4 b) consists of anticyclonic motion in the bulk and cyclonic motion close to the sidewall. As expected, the first dominant mode corresponds to a base or mean flow. However, this mode is evidently dynamically not important without temporal change. The principal mode is the second one, presented in Fig. 4 (c), and the BZF [65] is clearly visible, both in the temperature (left) and azimuthal velocity (right).  Even though the flow is turbulent, its large-scale spatial structure can be reconstructed nicely with only a few modes as demonstrated by the visualisation of the azimuthal velocity in the lower half of the RBC-cell presented in Fig. 5 . We point out that much of the small-scale dynamics and thereby accuracy in the representation is lost through the downsampling procedure, and applying streaming DMD without coarse interpolation but with larger memory consumption may be advisable if the focus is on a more detailed reconstruction of the flow. Here, we focus only on the large-scale structure. The comparison is carried out for two velocity fields which have been sampled about 30 free-fall times apart in order to guarantee sufficiently decorrelated samples. The originals are shown in Fig. 5 (a) and 5 (d), respectively. Figure 5 (b) and 5 (e) contain the reconstructions using the first two modes, and Fig. 5 (c) and 5 (f) the reconstructions from the first five modes for the two samples, respectively. These examples demonstrate consistently that even though the main features of the flow can be captured by the mean flow and the BZF, a fair amount of detail is missing and its inclusion requires a few more modes. A much better reconstruction can be achieved with as little as five modes.
Having discussed the identification of the dominant spatial feature of the flow, the BZF, we now focus on its temporal structure. Figure 6 presents spatio-temporally resolved diagrams of the dynamics in a ring located at half-height = ∕2 and at radial location = max , where the maximum azimuthal velocity is observed, as indicated by the red circle in the schematic drawing shown in Fig. 6 (a). The time evolution of the temperature and vertical velocity fields of the Ra = 10 8 -dataset are presented in Fig. 6 (b) and 6 (c), respectively, while Fig. 6 (d) corresponds to the time-evolution of the temperature field at Ra = 10 9 . The original data is shown in the left panels of the respective visualisations and the data reconstructed from the first two dynamic modes is shown in the right panels. Visual comparison of the left and right panels confirms again the zonal flow pattern can be clearly captured with only the first two dynamic modes. Furthermore, the visualisations clearly identify the BZF as a travelling wave with strongly correlated temperature and vertical velocity fields as can be seen by comparison of Fig. 6 (b) and 6 (c). The travelling wave structure of the BZF is also present at higher Ra, as can be seen in Fig. 6 (d). As such, it seems to be a robust feature of the BZF in the Rayleigh-number range considered here. However, according to the visualisation the dynamics appear to be slightly more complex at Ra = 10 9 than at Ra = 10 8 , hence it remains to be seen to what extent the travelling wave dynamics persist with increasing Ra.
We now provide a qualitative comparison of time scales between the full data and a reconstruction using only the mean flow and the dynamic modes representing statistically stationary dynamics on the longest time scales presented qualitatively in Fig. 6 . For the case Ra = 10 8 , the BZF period obtained from the data is 74 free-fall time units, which compares well with the period of 72.96 free-fall time units corresponding to the second DMD mode. The relative error between the two amounts to 1.4 %. The comparison at higher Rayleigh number for the case Ra = 10 9 results in a larger relative error, the period obtained from the data of 53 free-fall time units compares with a relative error of 8.5 % to the DMD-period of 57.5 free-fall time units. The latter has been calculated using the expression for DMD frequencies in Eq. (9).
In summary, the most prominent spatio-temporal features of rapidly rotating RBC can be identified through sDMD, with the BZF emerging as the dominant dynamic mode. The cyclonic motion of the fluid reflected in the azimuthal velocity and the anticyclonic motion of the flow pattern reflected in the temperature and vertical velocity as well as their frequencies are fully reproduced by only the first two dynamic modes. These results firmly establish sDMD as a powerful tool for the extraction of dominant coherent structures in turbulent rapidly rotating RBC.  Large-scale structures in RBC at Ra = 10 7 and Pr = 0.7 without rotation and in a cubic domain have recently been identified through Koopman analysis [14]. In cubic geometry, eight LSC states occur in RBC, four long-lived diagonal configurations with sojourn times of (1000) free-fall time units and four wall-aligned configurations, which are visited shortly during transitions between the diagonal configurations. All LSC states have been reconstructed using three Koopman modes, a pair of complex conjugate modes whose frequency should approximate the time scale of one cycle through all four diagonal configurations, and a real mode representing the mean flow. In this context, the present results suggest that a representation of the BZF through Koopman eigenfunctions should also be possible.

Horizontal Convection 3.2.1 Fluid structures
Horizontal convection (HC), similarly to RBC, is driven by thermal buoyancy. However, in HC heating and cooling are applied to different parts of the same horizontal surface. In our case, the heated plate is located in the center and the cooled plates are placed at both ends, as shown in Fig. 7 (a). This setup is relevant for many geophysical and astrophysical flows [51,59] and engineering applications [17], in particular concerning the large-scale overturning circulation of the ocean as heat is supplied to and removed from the ocean predominantly through its upper surface, where the ocean contacts the atmosphere. The dimensionless control parameters are similar to RBC, that is the Rayleigh number, the Prandtl number and the aspect ratio Γ, where the characteristic length scale is the half-cell length. The governing equations are again the incompressible Navier-Stokes equations in the Oberbeck-Boussinesq approximation, and a temperature equation stated in Eqs. (13)- (15), but without the Coriolis term in the momentum equation.
For the parameters Ra = 10 11 and Pr = 10 it was observed that sheared plumes, originated by a Rayleigh-Taylor instability, periodically arise above the heated plate and travel towards the center [43]. However, another time-dependent feature that emerges is the oscillatory instability that breaks symmetry inside the bulk region, see Fig. 7 (b). So there is a fast periodic  emission of thermal plumes close to the boundary and a slow periodic oscillation in the bulk region. That is, the horizontal convection has coexisting dynamics on very different time scales. Streaming DMD is used in conjunction with temporal filtering to provide separate low-dimensional reconstructions of these coexisting dynamics. Temporal filtering is required here for reasons of computational efficiency. In principle, it is possible to extract both time scales from a single analysis. However, the dataset would become very large and the calculations slow. This is because the snapshot spacing must be small enough to detect the small period and we need to process a large number of snapshots along a trajectory long enough to capture the large period. To obtain converged results in particular for the larger period, the truncation number must be increased accordingly, resulting in much larger matrices to be processed at each iteration step.

Dynamic equations & numerical details
The dataset consists of velocity fields obtained in the DNS for a rectangular geometry, as shown in the schematic drawing in Fig. 7 (a). The temperature boundary conditions at the bottom plate are = 0.5 for 0 ≤ ≤ 0.1 and = −0.5 for 0.9 ≤ ≤ 1, all the other walls are adiabatic. No-slip boundary conditions are imposed at all walls for the velocity field. The calculations were carried out using the GOLDFISH code, as in the previous section. Further details can be found in [43]. The original grid is × × = 1026 × 66 × 98, where , , and , denote the number of grid points in the mean-flow -direction, the spanwise -direction and the -direction, which is normal to the heated and cooled bottom plates, respectively. Though, since plumes and oscillations are concentrated above the heated plate, we extract only the dynamically most important data inside the shaded domain, shown in Fig. 7 (b), with × × = 200 × 66 × 98. The truncation number is set to 80 to ensure the dominant modes can be captured properly. Since the plume emission motion is more than ten times faster than the oscillatory flow, a small time interval is needed to capture the fast plume emission while a large number of velocity-field samples is required to simultaneously identify the slow oscillations. To save computational resources, we decouple the two tasks and use two datasets comprised of 200 snapshots each, sampled at different time intervals: 0.1 free-fall time units for the fast plume emission and 0.5 free-fall time units for the slow oscillatory flow. As discussed in the previous section, it is in principle possible to extract both phenomena from a single dataset. Figure 8 presents the DMD spectra for fast plume emission in panel (a) and the slow oscillatory flow in panel (b). Similar to the RRBC DMD spectra shown in Fig. 3 , the eigenvalues lie on or close to the unit circle. In both cases the focus is on the slow dynamics in the respective datasets obtained by the two sampling procedure outlined in the previous section, we focus on the lowest obtained frequencies. The first five low-frequency modes lie in fact on the unit circle for both cases, with the data sampled at larger intervals shown in 8 (b) being converged at higher frequencies as well. The eigenvalues shown in blue (three) corresponds to the modes used in the reconstruction shown in Fig. 9 The temporal structure of the original temperature field and the temperature field reconstructed from the first two dynamic modes is shown in Fig. 9 using horizontal slices located that the spanwise middle of the domain, = ∕2, and at different heights. Figure 9 (a) and 9 (b) contain visualisations of the original field at = 0.1 , to capture fast plume emission, and at = 0.8 , to capture slow oscillations, respectively, and Fig. 9 (c) and 9 (d) present the corresponding reconstructions. A visual comparison of the original and the reconstructed fields qualitatively shows that sDMD can clearly distinguish the two dominant spatio-temporal structures, with the first two dynamic modes identifying the fast motion of the plume emission for the dataset sampled at 0.1 free-fall time units (Fig. 9 (a) and 9 (c)), and the first two dynamic modes capturing the slow oscillatory mode for the dataset sampled at 0.5 free-fall time units (Fig. 9 (b) and 9 (d)). The frequencies obtained from the DNS data and the sDMD calculations are compared with the DMD frequencies calculated according to Eq. (9). The period of the first dynamic mode is 16.98 free-fall time units according to Eq. (9), which matches very well the period of the slow oscillation observed in the original dataset measured to be 16.8 free-fall time units. The second dominant mode has a period of 1.58 free-fall time units according to Eq. (9), which fits the period of 1.6 free-fall time units of the fast plume emission determined from the original DNS data. In both cases, the relative error between the time scales obtained from the full data and via DMD is about 1 %. The agreement between the sDMD results and the DNS data, and the distinct identification and separation of the two dominant spatio-temporal structures with frequencies that differ by an order of magnitude, gives further confidence in the capability of DMD to capture the relevant processes, be it in the temporal or spatial framework.

Asymptotic Suction Boundary Layer (ASBL) 3.3.1 Fluid flow
The ASBL is an open flow that develops over a flat bottom plate in the presence of suction through that plate. In consequence, the BL thickness remains constant in the streamwise direction, and the ASBL shares certain properties with parallel shear flows and spatially developing BLs. In the DNS, the ASBL is emulated by a plane Couette setup using a high simulation domain. That is, we consider a fluid located in a wide gap between two parallel plates as shown schematically in Fig. 10 . The bottom plate is stationary and the fluid is set in motion through the top plate moving in the -direction with velocity ∞ . The latter corresponds to the free-stream velocity of the open flow. The flow is assumed to be incompressible and the conditions isothermal such that the density can be regarded as constant.
The occurrence of large-scale persistent coherent flow structures of long streamwise extent is one of the striking features in turbulent BLs, and ASBL is no exception. We attempt to describe the dynamics of such a large-scale structure in a long time series using a small number of dynamic modes. In order to alleviate the computational effort, the simulations were carried out at moderate Reynolds number using a short computational domain in the streamwise direction and the sampled flow fields were averaged in streamwise direction. As such, the analysed two-dimensional fields obtained by streamwise averaging adequately represent the three-dimensional fields at least concerning the large-scale dynamics with streamwise coherence that is of interest here.

Governing equations & numerical details
Expressed in units of the free-stream velocity, the laminar flow is given by where is the suction velocity and is the kinematic viscosity. The deviations of the laminar flow are then described by the dimensionless equations where is the pressure divided by the constant density and Re = ∞ ∕ the Reynolds number based on the free-stream velocity, the laminar displacement thickness = ∕ and the kinematic viscosity of the fluid. The DNS data was generated with the open-source code channelflow2.0 [15,16]. Equations (18)     , and the number of grid points in , and -directions, respectively, Δ + and Δ + the grid resolution in wall units taking into account 2/3rds dealiasing in stream-and spanwise directions, Δ the sampling interval and the number of samples. Figure 11 (a) shows the deviations from the laminar flow averaged in the streamwise direction of a typical data sample. A largescale coherent region that is localised in the spanwise direction and extends from about 2 to 7 in wall-normal direction is clearly visible. This structure moves slower than the laminar flow and is accompanied by near-wall small-scale regions where the flow is faster than the laminar flow. The slow large-scale structure drifts through the simulation domain in spanwise direction. It takes the large-scale coherent structure = (2070 ∕ ∞ ) time units to traverse the simulation area once. The shift time scale and its relative error of about 6% have been determined by minimising the 2 -distance between two velocity-field samples at time and + ′ over ′ . This was repeated for several pairs of data snapshots separated by ′ . The time scale of the spanwise shift is also clearly discernible through the periodic pattern in the spatio-temporal evolution of the flow at a fixed distance ∕ = 3 from the bottom plate shown in Fig. 11 (b). During that time it varies in intensity, as can be seen when considering the diagonal structure visible in the spatio-temporal evolution, it does not disappear completely and it is difficult to discern other patterns in its dynamics.

Streaming DMD
The aim is to reconstruct the large spatio-temporal scales of the dynamics, i.e. the spatial extent of the slow large-scale structure, its slow spanwise drift and its dynamics, with a few dynamic modes. Capturing the latter two requires a very long time series and as such the application of streaming DMD as opposed to classical DMD, as not all data can be stored in memory at the same time. Following a convergence study for the first few lowest frequencies, the truncation number was set to = 150, and the sampling interval to 20 ∕ ∞ . This sampling interval results in about 200 velocity-field snapshots to be analysed (see table 2 ). The lowest nonzero DMD-frequency obtained from Eq. (9) results in a time scale sDMD = 2156 ∕ ∞ , which falls within the error margins of the time scale obtained by the aformentioned minimisation procedure, = (2070 ∕ ∞ ). We find that three dynamic modes, that is, the mean flow and the two complex conjugate modes corresponding to the lowest frequency, reproduce the slow spanwise drift with time scale sDMD ≈ , here demonstrated by comparison of the spatio-temporal evolution of the reconstructed flow shown in Fig. 11 (c) with that of the original data shown in Fig. 11 (b). (c) Time evolution of the reconstructed flow from the mean flow and the lowest frequency, that is, the first two dynamic modes at * ≈ 3 .

Streaming DMD in a co-moving frame
The detected spanwise drift, however, is not dynamically relevant as it is merely a continuous shift symmetry allowed by the periodic boundary conditions in spanwise direction. In fact, DMD is known to perform poorly in presence of continuous symmetries [31], the drift can lead to spurious modes [52], for instance. Therefore, and in order to obtain further information on the large-scale dynamics of the ASBL, the spanwise drift was removed from the data sequence by spatially shifting each data sample in the sequence an appropriate distance in spanwise direction. That is, a re-analysis of the data was carried out in a comoving reference frame, similar to the approach taken by Rowley and Marsden [46]. Here, the constant shift velocity has been determined by the aforementioned 2 -minimisation, while Rowley and Marsden used a reconstruction equation to calculate a time-dependent shift velocity. Very recently, by combination of the method of slices for symmetry reduction [8] with DMD, a new approach to remove time-dependent continuous symmetries has been devised [33].
In order to analyse the large-scale dynamics of the flow, we focus on flow reconstruction using low-frequency modes. Figure 12 (a) presents a reconstruction of the flow at = 3920 ∕ ∞ , the time at which the full data sample shown in Fig. 11 (a) was taken, using the mean flow and the complex conjugate pair of dynamic modes with the lowest frequency. The DMD spectrum resulting from the calculation in the co-moving frame is provided in Fig. 12 (d), where the eigenvalues corresponding to the modes used in the aforementioned reconstruction are highlighted in green. As can be seen from the data reconstruction in Fig. 12 (a) three dynamic modes are sufficient to reproduce the large-scale coherent structure. However, this reconstruction does not reproduce the flow in the neighbouring regions of the coherent structure very well. For instance, a secondary lowmomentum zone that extends further into the free stream and the vortex pattern of the original cross-flow (see Fig. 11 (a)) are not captured. We found that adding more modes, up to the highest frequency with corresponding eigenvalues still on the unit circle as indicated in red and blue in Fig. 12 (d), has very little effect (not shown). A more adequate reconstruction could only be achieved at = 200, where five dynamic modes were sufficient to describe the aforementioned features (not shown). However, as the dataset only comprises of 203 snapshots, = 200 is likely to result in overfitting.
The distinctive maxima and minima that are present in the spatio-temporal pattern of the flow evolution shown in Fig. 11 (b) suggest the presence of slow periodicity in the background and faster dynamics within the structure's core located at ∕ ≈ 1.5 in the co-moving frame. We find that the former can already be captured with the first two dynamic modes as can be seen form the corresponding spatiotemporal representation of the flow shown in Fig. 11 (c). Adding two pairs of complex conjugate modes -those with eigenvalues shown in red in Fig. 12 (d) -results in a representation of the background flow that changes little when reconstructed with higher frequency modes, see Fig. 12 (b) for a spatiotemporal representation of the flow reconstructed with five modes. The faster dynamics of the structure's core requires a higher-dimensional description. In this context we recapitulate that the structure never completely disappears, hence the fast dynamics encoded in the higher-frequency modes represent fluctuations in intensity on top of a persistent flow feature.

CONCLUSION & OUTLOOK
In this paper we demonstrated the applicability of streaming DMD [19], an efficient low-storage version of the classical SVDbased DMD [50], for the analysis of turbulent flows that show a certain degree of spatio-temporal coherence. As turbulent flow dataset can be substantial in size, we propose a to couple the DMD calculation with prior downsampling, which is appropriate only if the focus is on the large-scale spatiotemporal dynamics, which we focussed on here. We first validate the proposed combination of downsampling with streaming DMD by comparing it to the classical SVD-based DMD [50], based on the example of the flow past a cylinder for = 100. The comparison shows that the obtained streaming dynamic modes and eigenvalues match well with those computed from a post-processing implementation of the SVD-based DMD given enough truncation modes. However, streaming DMD can handle considerably larger datasets with less computational costs compared to the SVD-based DMD, thanks to the feature of incremental data updating, which only requires two data samples to be held in memory at a given time.
The objective of this study was to extract the main dynamic features with an efficient data-driven method and use the resulting information for a low-dimensional reconstruction of the flow. We considered three examples, namely rapidly rotating turbulent Rayleigh-Bénard convection, horizontal convection, and asymptotic suction BL. For rapidly rotating turbulent RBC, a dominant zonal flow pattern, the boundary zonal flow, was identified through the first two dynamic modes. Similarly, for horizontal convection two processes that operate on different time scales could be clearly classified in terms of dynamic modes: The second dynamic mode captures the slow oscillatory dynamics in the bulk while the third dynamic mode describes the much faster process of thermal plume emission. Finally, for ASBL a distinctive coherent low-momentum zone that travels through the simulation domain in spanwise direction can be well described by the only first two dynamic modes once a streamwise drift is removed from the data by calculating dynamic modes in a co-moving frame of reference. To describe dynamical features of the coherent low-momentum zone, a few more frequencies must be included. These examples show that the incrementally updated DMD algorithm can successfully decompose the dominant structures with corresponding frequencies and modes. This establishes sDMD as an accurate and efficient method to identify and capture dominant spatio-temporal features from large datasets of highly-turbulent flows.
As DMD decomposes datasets into coherent structures based on characteristic frequencies, it is especially useful for the analysis of flows featuring large-scale coherent structures and periodic motion. The advantages of the sDMD algorithm, both in terms of low-storage and potential real-time implementation, will make DMD available in numerous contexts where it would have been infeasible previously. This includes in particular the analysis of massive datasets that cannot completely reside in memory. One such application, for instance, concerns the search for unstable periodic orbits in turbulent flows, where a classical DMD-based approach has been successfully applied at moderate Reynolds number [40]. Streaming DMD may constitute a step forward in extending the applicability of this method to higher Reynolds numbers.
In further steps, streaming DMD can be applied to different turbulent flow datasets, to investigate in detail the ability to decompose coherent flow structures. One disadvantage of streaming DMD is that the truncation number of snapshots to achieve a similar reconstruction accuracy as the SVD-based DMD is typically larger. Here, as we focussed on statistically stationary large spatio-temporal scales only, apart from convergence tests we have not considered the effect of the truncation number but have restricted our attention on the analysis of the first few modes corresponding to statistically stationary dynamics ranked by frequency in ascending order. The truncation number is, however, important and should be quantitatively considered when applying streaming DMD to flow field reconstruction or decomposition of complex turbulent flows with multi-frequency temporal structures. Similarly, the mode ordering by frequency used here may not be the optimal choice for all datasets. An efficient and robust criterion needs to be introduced for rank selection.
Finally we wish to mention that not only DMD but also some other approaches, based on or related to DMD, might be very efficient in extraction and analysis of the dynamics of the turbulent superstructures. While in the DMD we apply linear transformations to obtain modes out of snapshots and vice versa, a natural extension of the DMD would be to employ nonlinear transformations instead. This can be realized either via application of hand-picked nonlinear functions (e.g. to use socalled extended DMD -eDMD [61]), by calculation of the full Koopman modes (see [14] in the context of convection), or by training a deep neural network (e.g. to use deep Koopman models [39]). A multilayer convolutional neural network appears to be a good candidate for such a task. In general, (un)supervised deep learning seems to be very promising technique for extraction and analysis of the global dynamics of the turbulent flow superstructures. Further to this, kernel methods or tensorbased reformulations of DMD exist that are also well suited for high-dimensional data sets. A more detailed consideration of these alternate approaches is beyond the scope of this article and application of these advanced methods for the turbulent superstructure analysis remains a challenge for future studies.