doc/en/theory.rst

   1 ..
   2    Copyright (C) 2008-2022 EDF R&D
   3
   4    This file is part of SALOME ADAO module.
   5
   6    This library is free software; you can redistribute it and/or
   7    modify it under the terms of the GNU Lesser General Public
   8    License as published by the Free Software Foundation; either
   9    version 2.1 of the License, or (at your option) any later version.
  10
  11    This library is distributed in the hope that it will be useful,
  12    but WITHOUT ANY WARRANTY; without even the implied warranty of
  13    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  14    Lesser General Public License for more details.
  15
  16    You should have received a copy of the GNU Lesser General Public
  17    License along with this library; if not, write to the Free Software
  18    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  19
  20    See http://www.salome-platform.org/ or email : webmaster.salome@opencascade.com
  21
  22    Author: Jean-Philippe Argaud, jean-philippe.argaud@edf.fr, EDF R&D
  23
  24 .. _section_theory:
  25
  26 =================================================================================
  27 **[DocT]** A brief introduction to Data Assimilation and Optimization
  28 =================================================================================
  29
  30 .. index:: single: Data Assimilation
  31 .. index:: single: true state
  32 .. index:: single: observation
  33 .. index:: single: a priori
  34 .. index:: single: EstimationOf
  35 .. index:: single: analysis
  36
  37 **Data Assimilation** is a general well established framework for computing the
  38 optimal estimate of the true state of a system, over time if necessary. It uses
  39 values obtained by combining both observations and *a priori* models, including
  40 information about their errors while simultaneously respecting constraints.
  41 This takes into account the laws of behavior or motion of the system through
  42 the equations of the model, and the way the measurements are physically related
  43 to the variables of the system.
  44
  45 In other words, data assimilation merges measurement data of a system, that are
  46 the observations, with *a priori* system physical and mathematical knowledge,
  47 embedded in numerical models. The goal is to obtain the best possible estimate,
  48 called "*analysis*", of the system real state and of its stochastic properties.
  49 Note that this real state (or "*true state*") cannot usually be reached, but
  50 can only be estimated. Moreover, despite the fact that the used information are
  51 stochastic by nature, data assimilation provides deterministic techniques in
  52 order to perform very efficiently the estimation.
  53
  54 Because data assimilation looks for the **best possible** estimate, its
  55 underlying procedure always integrates optimization in order to find this
  56 estimate: particular optimization methods are always embedded in data
  57 assimilation algorithms. Optimization methods can be seen in ADAO as a way to
  58 extend data assimilation applications. They will be introduced this way in the
  59 section `Going further in the state estimation by optimization methods`_, but
  60 they are far more general and can be used without data assimilation concepts.
  61
  62 Two main types of applications exist in data assimilation, which are covered by
  63 the same formalism: **fields reconstruction** (see `Fields reconstruction or
  64 measures interpolation`_) and **parameters identification** (see `Parameters
  65 identification, models adjustment, or calibration`_). These are also referred
  66 to as **state estimation** and **parameters estimation** respectively, and one
  67 can if necessary elaborate joint estimation of both (see `Joint state and
  68 parameter estimation in dynamics`_). In ADAO, some algorithms can be used
  69 either in state estimation or in parameter estimation. This is done simply by
  70 changing the required option "*EstimationOf*" in the algorithm parameters.
  71 Before introducing the `Simple description of the data assimilation
  72 methodological framework`_ in a next section, these two types of applications
  73 are briefly described. At the end, some detailed information allow `Going
  74 further in the data assimilation framework`_ and `Going further in the state
  75 estimation by optimization methods`_, as well as `Going further in data
  76 assimilation for dynamics`_  and having `An overview of reduction methods and
  77 of reduced optimization`_.
  78
  79 Fields reconstruction or measures interpolation
  80 -----------------------------------------------
  81
  82 .. index:: single: fields reconstruction
  83 .. index:: single: measures interpolation
  84 .. index:: single: fields interpolation
  85 .. index:: single: state estimation
  86 .. index:: single: background
  87
  88 **Fields reconstruction (or interpolation)** consists in finding, from a
  89 restricted set of real measures, the physical field which is the most
  90 *consistent* with these measures.
  91
  92 This *consistency* is to understand in terms of interpolation, that is to say
  93 that the field we want to reconstruct, using data assimilation on measures, has
  94 to fit at best the measures, while remaining constrained by the overall field
  95 calculation. The calculation is thus an *a priori* estimation of the field that
  96 we seek to identify. One also speaks of **state estimation** in this case.
  97
  98 If the system evolves over time, the reconstruction of the whole field has to
  99 be established at each time step, taking into account the information over a
 100 time window. The interpolation process is more complicated in this case because
 101 it is temporal, and not only in terms of instantaneous field values.
 102
 103 A simple example of fields reconstruction comes from meteorology, in which one
 104 look for value of variables such as temperature or pressure in all points of the
 105 spatial domain. One have instantaneous measurements of these quantities at
 106 certain points, but also a history set of these measures. Moreover, these
 107 variables are constrained by evolution equations for the state of the
 108 atmosphere, which indicates for example that the pressure at a point can not
 109 take any value independently of the value at this same point in previous time.
 110 One must therefore make the reconstruction of a field at any point in space, in
 111 a "consistent" manner with the evolution equations and with the measures of the
 112 previous time steps.
 113
 114 Parameters identification, models adjustment, or calibration
 115 ------------------------------------------------------------
 116
 117 .. index:: single: parameters identification
 118 .. index:: single: parameters adjustment
 119 .. index:: single: models adjustment
 120 .. index:: single: calibration
 121 .. index:: single: background
 122 .. index:: single: regularization
 123 .. index:: single: inverse problems
 124 .. index:: single: parameters estimation
 125
 126 The **identification (or adjustment) of parameters** by data assimilation is a
 127 form of state calibration which uses both the physical measurement and an *a
 128 priori* parameters estimation (called the "*background*") of the state that one
 129 seeks to identify, as well as a characterization of their errors. From this
 130 point of view, it uses all available information on the physical system, with
 131 restrictive yet realistic assumptions about errors, to find the "*optimal
 132 estimation*" from the true state. We note, in terms of optimization, that the
 133 background realizes a "*regularization*", in the mathematical meaning of
 134 Tikhonov [[Tikhonov77]_ [WikipediaTI]_, of the main problem of parameters
 135 identification. One can also use the term "*inverse problem*" to refer to this
 136 process.
 137
 138 In practice, the two observed gaps "*calculation-measures*" and
 139 "*calculation-background*" are combined to build the calibration correction of
 140 parameters or initial conditions. The addition of these two gaps requires a
 141 relative weight, which is chosen to reflect the trust we give to each piece of
 142 information. This confidence is depicted by the covariance of the errors on the
 143 background and on the observations. Thus the stochastic aspect of information is
 144 essential for building the calibration error function.
 145
 146 A simple example of parameters identification comes from any kind of physical
 147 simulation process involving a parametrized model. For example, a static
 148 mechanical simulation of a beam constrained by some forces is described by beam
 149 parameters, such as a Young coefficient, or by the intensity of the force. The
 150 parameters estimation problem consists in finding for example the right Young
 151 coefficient value in order that the simulation of the beam corresponds to
 152 measurements, including the knowledge of errors.
 153
 154 All quantities representing the description of physics in a model are likely to
 155 be calibrated in a data assimilation process, whether they are model
 156 parameters, initial conditions or boundary conditions. Their simultaneous
 157 consideration is greatly facilitated by the data assimilation framework, which
 158 makes it possible to objectively process a heterogeneous set of available
 159 information.
 160
 161 Joint estimation of states and parameters
 162 -----------------------------------------
 163
 164 .. index:: single: joint estimation of states and parameters
 165
 166 It is sometimes necessary, when considering the two previous types of
 167 applications, to need to simultaneously estimate states (fields) and parameters
 168 characterizing a physical phenomenon. This is known as **joint estimation of
 169 states and parameters**.
 170
 171 Without going into the advanced methods to solve this problem, we can mention
 172 the conceptually very simple approach of considering the vector of states to be
 173 interpolated as *augmented* by the vector of parameters to be calibrated. It
 174 can be noted that we are in *state estimation* or *reconstruction of fields*,
 175 and that in the temporal case of parameters identification, the evolution of
 176 the parameters to estimate is simply the identity. The assimilation or
 177 optimization algorithms can then be applied to the augmented vector. Valid for
 178 moderate nonlinearities in the simulation, this simple method extends the
 179 optimization space, and thus leads to larger problems, but it is often possible
 180 to reduce the representation to numerically computable cases. Without
 181 exhaustiveness, the separated variables optimization, the reduced rank
 182 filtering, or the specific treatment of covariance matrices, are common
 183 techniques to avoid this dimension problem. In the temporal case, we will see
 184 below indications for a `Joint state and parameter estimation in dynamics`_.
 185
 186 To go further, we refer to the mathematical methods of optimization and
 187 augmentation developed in many books or specialized articles, finding their
 188 origin for example in [Lions68]_, [Jazwinski70]_ or [Dautray85]_. In particular
 189 in the case of more marked nonlinearities during the numerical simulation of
 190 the states, it is advisable to treat in a more complete but also more complex
 191 way the problem of joint estimation of states and parameters.
 192
 193 Simple description of the data assimilation methodological framework
 194 --------------------------------------------------------------------
 195
 196 .. index:: single: analysis
 197 .. index:: single: background
 198 .. index:: single: background error covariances
 199 .. index:: single: observation error covariances
 200 .. index:: single: covariances
 201 .. index:: single: 3DVAR
 202 .. index:: single: Blue
 203
 204 We can write these features in a simple manner. By default, all variables are
 205 vectors, as there are several parameters to readjust, or a discrete field to
 206 reconstruct.
 207
 208 According to standard notations in data assimilation, we note
 209 :math:`\mathbf{x}^a` the optimal parameters that is to be determined by
 210 calibration, :math:`\mathbf{y}^o` the observations (or experimental
 211 measurements) that we must compare to the simulation outputs,
 212 :math:`\mathbf{x}^b` the background (*a priori* values, or regularization
 213 values) of searched parameters, :math:`\mathbf{x}^t` the unknown ideals
 214 parameters that would give exactly the observations (assuming that the errors
 215 are zero and the model is exact) as output.
 216
 217 In the simplest case, which is static, the steps of simulation and of
 218 observation can be combined into a single observation operator noted
 219 :math:`\mathcal{H}` (linear or nonlinear). It transforms the input parameters
 220 :math:`\mathbf{x}` to results :math:`\mathbf{y}`, to be directly compared to
 221 observations :math:`\mathbf{y}^o`:
 222
 223 .. math:: \mathbf{y} = \mathcal{H}(\mathbf{x})
 224
 225 Moreover, we use the linearized operator :math:`\mathbf{H}` to represent the
 226 effect of the full operator :math:`\mathcal{H}` around a linearization point
 227 (and we will usually omit thereafter to mention :math:`\mathcal{H}`, even if it
 228 is possible to keep it, to mention only :math:`\mathbf{H}`). In reality, we
 229 have already indicated that the stochastic nature of variables is essential,
 230 coming from the fact that model, background and observations are all incorrect.
 231 We therefore introduce errors of observations additively, in the form of a
 232 random vector :math:`\mathbf{\epsilon}^o` such that:
 233
 234 .. math:: \mathbf{y}^o = \mathbf{H} \mathbf{x}^t + \mathbf{\epsilon}^o
 235
 236 The errors represented here are not only those from observation, but also from
 237 the simulation. We can always consider that these errors are of zero mean.
 238 Noting :math:`E[.]` the classical mathematical expectation, we can then define a
 239 matrix :math:`\mathbf{R}` of the observation error covariances by the
 240 expression:
 241
 242 .. math:: \mathbf{R} = E[\mathbf{\epsilon}^o.{\mathbf{\epsilon}^o}^T]
 243
 244 The background can be written formally as a function of the true value, by
 245 introducing the errors vector :math:`\mathbf{\epsilon}^b` such that:
 246
 247 .. math:: \mathbf{x}^b = \mathbf{x}^t + \mathbf{\epsilon}^b
 248
 249 The background errors :math:`\mathbf{\epsilon}^b` are also assumed of zero
 250 mean, in the same manner as for observations. We define the :math:`\mathbf{B}`
 251 matrix of background error covariances by:
 252
 253 .. math:: \mathbf{B} = E[\mathbf{\epsilon}^b.{\mathbf{\epsilon}^b}^T]
 254
 255 The optimal estimation of the true parameters :math:`\mathbf{x}^t`, given the
 256 background :math:`\mathbf{x}^b` and the observations :math:`\mathbf{y}^o`, is
 257 then called an "*analysis*", noted as :math:`\mathbf{x}^a`, and comes from the
 258 minimisation of an error function, explicit in variational assimilation, or
 259 from the filtering correction in assimilation by filtering.
 260
 261 In **variational assimilation**, in a static case, one classically attempts to
 262 minimize the following function :math:`J`:
 263
 264 .. math:: J(\mathbf{x})=\frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+\frac{1}{2}(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 265
 266 :math:`J` is classically designed as the "*3D-Var*" functional in data
 267 assimlation (see for example [Talagrand97]_) or as the generalized Tikhonov
 268 regularization functional in optimization (see for example [WikipediaTI]_).
 269 Since :math:`\mathbf{B}` and :math:`\mathbf{R}` covariance matrices are
 270 proportional to the variances of errors, their presence in both terms of the
 271 function :math:`J` can effectively weight the gap terms by the confidence in the
 272 background or observations errors. The parameters vector :math:`\mathbf{x}`
 273 realizing the minimum of this function therefore constitute the analysis
 274 :math:`\mathbf{x}^a`. It is at this level that we have to use the full panoply
 275 of function minimization methods otherwise known in optimization (see also
 276 section `Going further in the state estimation by optimization methods`_).
 277 Depending on the size of the parameters vector :math:`\mathbf{x}` to identify,
 278 and of the availability of gradient or Hessian of :math:`J`, it is appropriate
 279 to adapt the chosen optimization method (gradient, Newton, quasi-Newton...).
 280
 281 In **assimilation by filtering**, in this simple case usually referred to as
 282 "*BLUE*" (for "*Best Linear Unbiased Estimator*"), the :math:`\mathbf{x}^a`
 283 analysis is given as a correction of the background :math:`\mathbf{x}^b` by a
 284 term proportional to the difference between observations :math:`\mathbf{y}^o`
 285 and calculations :math:`\mathbf{H}\mathbf{x}^b`:
 286
 287 .. math:: \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)
 288
 289 where :math:`\mathbf{K}` is the Kalman gain matrix, which is expressed using
 290 covariance matrices in the following form:
 291
 292 .. math:: \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}
 293
 294 The advantage of filtering is to explicitly calculate the gain, to produce then
 295 the *a posteriori* covariance analysis matrix.
 296
 297 In this simple static case, we can show, under an assumption of Gaussian error
 298 distributions (very little restrictive in practice) and of :math:`\mathcal{H}`
 299 linearity, that the two *variational* and *filtering* approaches give the same
 300 solution.
 301
 302 It is indicated here that these methods of "*3D-Var*" and "*BLUE*" may be
 303 extended to dynamic or time-related problems, called respectively "*4D-Var*"
 304 and "*Kalman filter (KF)*" and their derivatives. They have to take into
 305 account an evolution operator to establish an analysis at the right time steps
 306 of the gap between observations and simulations, and to have, at every moment,
 307 the propagation of the background through the evolution model. The next section
 308 provides information on `Going further in data assimilation for dynamics`_. In
 309 the same way, these methods can be used in case of non linear observation or
 310 evolution operators. Many other variants have been developed to improve the
 311 numerical quality of the methods or to take into account computer requirements
 312 such as calculation size and time.
 313
 314 A schematic view of Data Assimilation and Optimization approaches
 315 -----------------------------------------------------------------
 316
 317 To help the reader get an idea of the approaches that can be used with ADAO in
 318 Data Assimilation and Optimization, we propose here a simplified scheme
 319 describing an arbitrary classification of methods. It is partially and freely
 320 inspired by [Asch16]_ (Figure 1.5).
 321
 322   .. _meth_steps_in_study:
 323   .. image:: images/meth_ad_and_opt.png
 324     :align: center
 325     :width: 75%
 326   .. centered::
 327     **A simplified classification of methods that can be used with ADAO in Data Assimilation and Optimization (acronyms and internal descriptive links are listed below)**
 328
 329 It is deliberately simple to remain readable, the dashed lines showing some of
 330 the simplifications or extensions. For example, it does not specifically
 331 mention the methods with reductions (of which it is given hereafter `An
 332 overview of reduction methods and of reduced optimization`_), some of which
 333 were variations of the basic methods shown here, nor does it mention the more
 334 detailed extensions. It also omits the test methods available in ADAO and
 335 useful for the study.
 336
 337 Each method mentioned in this diagram is the subject of a specific descriptive
 338 section in the chapter on :ref:`section_reference_assimilation`. The acronyms
 339 mentioned in the diagram have the meaning indicated in the associated internal
 340 links:
 341
 342 - 3D-Var: :ref:`section_ref_algorithm_3DVAR`,
 343 - 4D-Var: :ref:`section_ref_algorithm_4DVAR`,
 344 - Blue: :ref:`section_ref_algorithm_Blue`,
 345 - DiffEvol : :ref:`section_ref_algorithm_DifferentialEvolution`,
 346 - EKF: :ref:`section_ref_algorithm_ExtendedKalmanFilter`,
 347 - EnKF: :ref:`section_ref_algorithm_EnsembleKalmanFilter`,
 348 - DFO: :ref:`section_ref_algorithm_DerivativeFreeOptimization`,
 349 - Incr-Var: Incremental version Variational optimisation,
 350 - KF: :ref:`section_ref_algorithm_KalmanFilter`,
 351 - LLS: :ref:`section_ref_algorithm_LinearLeastSquares`,
 352 - NLLS: :ref:`section_ref_algorithm_NonLinearLeastSquares`,
 353 - QR: :ref:`section_ref_algorithm_QuantileRegression`,
 354 - Swarm: :ref:`section_ref_algorithm_ParticleSwarmOptimization`,
 355 - Tabu: :ref:`section_ref_algorithm_TabuSearch`,
 356 - UKF: :ref:`section_ref_algorithm_UnscentedKalmanFilter`.
 357
 358 An overview of reduction methods and of reduced optimization
 359 ------------------------------------------------------------
 360
 361 .. index:: single: reduction
 362 .. index:: single: reduction methods
 363 .. index:: single: reduced methods
 364 .. index:: single: reduced space
 365 .. index:: single: neutral sub-space
 366 .. index:: single: SVD
 367 .. index:: single: POD
 368 .. index:: single: PCA
 369 .. index:: single: Kahrunen-Loeve
 370 .. index:: single: RBM
 371 .. index:: single: EIM
 372 .. index:: single: Fourier
 373 .. index:: single: wavelets
 374 .. index:: single: EOF
 375 .. index:: single: sparse
 376
 377 Data assimilation and optimization approaches always imply a certain amount of
 378 reiteration of a unitary numerical simulation representing the physics that is
 379 to be treated. In order to handle this physics as well as possible, this
 380 elementary numerical simulation is often of large size, even huge, and leads to
 381 an extremely high computational cost when it is repeated. The complete physical
 382 simulation is often called "*high fidelity simulation*" (or "*full scale
 383 simulation*").
 384
 385 To avoid this practical challenge, **different strategies to reduce the cost of
 386 the optimization calculation exist, and some of them also allow to control the
 387 numerical error implied by this reduction**. These strategies are seamlessly
 388 integrated into some of the ADAO methods or are the purpose of special
 389 algorithms.
 390
 391 To establish such an approach, one seeks to reduce at least one of the
 392 ingredients that make up the data assimilation or optimization problem. One can
 393 thus classify the reduction methods according to the ingredient on which they
 394 operate, knowing that some methods deal with several of them. A rough
 395 classification is provided here, which the reader can complete by reading
 396 general mathematical books or articles, or those specialized in his physics.
 397
 398 Reduction of data assimilation or optimization algorithms:
 399     the optimization algorithms themselves can generate significant
 400     computational costs to process numerical information. Various methods can
 401     be used to reduce their algorithmic cost, for example by working in the
 402     most suitable reduced space for optimization, or by using multi-level
 403     optimization techniques. ADAO has such techniques that are included in
 404     variants of classical algorithms, leading to exact or approximate but
 405     numerically more efficient resolutions. By default, the algorithmic options
 406     chosen in ADAO are always the most efficient when they do not impact the
 407     quality of the optimization.
 408
 409 Reduction of the representation of covariances:
 410     in data assimilation algorithms, covariances are the most expensive
 411     quantities to handle or to store, often becoming the limiting quantities
 412     from the point of view of the computational cost. Many methods try to use a
 413     reduced representation of these matrices (leading sometimes but not
 414     necessarily to reduce the dimension of the optimization space).
 415     Classically, factorization, decomposition (spectral, Fourier, wavelets...)
 416     or ensemble estimation (EOF...) techniques, or combinations, are used to
 417     reduce the numerical load of these covariances in the computations. ADAO
 418     uses some of these techniques, in combination with sparse computation
 419     techniques, to make the handling of covariance matrices more efficient.
 420
 421 Reduction of the physical model:
 422     the simplest way to reduce the cost of the unit calculation consists in
 423     reducing the simulation model itself, by representing it in a more economic
 424     way. Numerous methods allow this reduction of models by ensuring a more or
 425     less rigorous control of the approximation error generated by the
 426     reduction. The use of simplified models of the physics allows a reduction
 427     but without always producing an error control. On the contrary, all
 428     decomposition methods (Fourier, wavelets, SVD, POD, PCA, Kahrunen-Loeve,
 429     RBM, EIM, etc.) aim at a reduction of the representation space with an
 430     explicit error control. Although they are very frequently used, they must
 431     nevertheless be completed by a fine analysis of the interaction with the
 432     optimization algorithm in which the reduced computation is inserted, in
 433     order to avoid instabilities, discrepancies or inconsistencies that are
 434     notoriously harmful. ADAO fully supports the use of this type of reduction
 435     method, even if it is often necessary to establish this generic independent
 436     reduction prior to the optimization.
 437
 438 Reduction of the data assimilation or optimization space:
 439     the size of the optimization space depends greatly on the type of problem
 440     treated (estimation of states or parameters) but also on the number of
 441     observations available to conduct the data assimilation. It is therefore
 442     sometimes possible to conduct the optimization in the smallest space by
 443     adapting the internal formulation of the optimization algorithms. When it
 444     is possible and judicious, ADAO integrates this kind of reduced formulation
 445     to improve the numerical performance without reducing the quality of the
 446     optimization.
 447
 448 Combining multiple reductions:
 449     many advanced algorithms seek to combine multiple reduction techniques
 450     simultaneously. However, it is difficult to have both generic and robust
 451     methods, and to use several very efficient reduction techniques at the same
 452     time. ADAO integrates some of the most robust methods, but this aspect is
 453     still largely the subject of research and development.
 454
 455 One can end this quick overview of reduction methods highlighting that their
 456 use is ubiquitous in real applications and in numerical tools, and that ADAO
 457 allows to use proven methods without even knowing it.
 458
 459 Going further in the data assimilation framework
 460 ------------------------------------------------
 461
 462 .. index:: single: adjustment
 463 .. index:: single: artificial intelligence
 464 .. index:: single: Bayesian estimation
 465 .. index:: single: calibration
 466 .. index:: single: data smoothing
 467 .. index:: single: data-driven
 468 .. index:: single: field interpolation
 469 .. index:: single: inverse problems
 470 .. index:: single: inversion
 471 .. index:: single: machine learning
 472 .. index:: single: mathematical regularization
 473 .. index:: single: meta-heuristics
 474 .. index:: single: model reduction
 475 .. index:: single: optimal interpolation
 476 .. index:: single: parameter adjustment
 477 .. index:: single: parameter estimation
 478 .. index:: single: quadratic optimization
 479 .. index:: single: regularization methods
 480 .. index:: single: state estimation
 481 .. index:: single: variational optimization
 482
 483 To get more information about the data assimilation techniques, the reader can
 484 consult introductory documents like [Talagrand97]_ or [Argaud09]_, on-line
 485 training courses or lectures like [Bouttier99]_ and [Bocquet04]_ (along with
 486 other materials coming from geosciences applications), or general documents
 487 like [Talagrand97]_, [Tarantola87]_, [Asch16]_, [Kalnay03]_, [Ide97]_,
 488 [Tikhonov77]_ and [WikipediaDA]_. In a more mathematical way, one can also
 489 consult [Lions68]_, [Jazwinski70]_.
 490
 491 Note that data assimilation is not restricted to meteorology or geo-sciences,
 492 but is widely used in other scientific domains. There are several fields in
 493 science and technology where the effective use of observed but incomplete data
 494 is crucial.
 495
 496 Some aspects of data assimilation are also known by other names. Without being
 497 exhaustive, we can mention the names of *calibration*, *adjustment*, *state
 498 estimation*, *parameter estimation*, *parameter adjustment*, *inverse problems*
 499 or *inversion*, *Bayesian estimation*, *field interpolation* or *optimal
 500 interpolation*, *variational optimization*, *quadratic optimization*,
 501 *mathematical regularization*, *meta-heuristics for optimization*, *model
 502 reduction*, *data smoothing*, *data-driven* modeling, model and data learning
 503 (*Machine Learning* and *Artificial Intelligence*), etc. These terms can be
 504 used in bibliographic searches.
 505
 506 Going further in the state estimation by optimization methods
 507 -------------------------------------------------------------
 508
 509 .. index:: single: state estimation
 510 .. index:: single: optimization methods
 511 .. index:: single: DerivativeFreeOptimization
 512 .. index:: single: ParticleSwarmOptimization
 513 .. index:: single: DifferentialEvolution
 514 .. index:: single: QuantileRegression
 515 .. index:: single: QualityCriterion
 516
 517 As seen before, in a static simulation case, the variational data assimilation
 518 requires to minimize the goal function :math:`J`:
 519
 520 .. math:: J(\mathbf{x})=\frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+\frac{1}{2}(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 521
 522 which is named the "*3D-Var*" objective function. It can be seen as a *least
 523 squares minimization* extented form, obtained by adding a regularizing term
 524 using :math:`\mathbf{x}-\mathbf{x}^b`, and by weighting the differences using
 525 :math:`\mathbf{B}` and :math:`\mathbf{R}` the two covariance matrices. The
 526 minimization of the :math:`J` function leads to the *best* :math:`\mathbf{x}`
 527 state estimation. To get more information about these notions, one can consult
 528 reference general documents like [Tarantola87]_.
 529
 530 State estimation possibilities extension, by using more explicitly optimization
 531 methods and their properties, can be imagined in two ways.
 532
 533 First, classical optimization methods often involves using various
 534 gradient-based minimizing procedures. They are extremely efficient to look for
 535 a single local minimum. But they require the goal function :math:`J` to be
 536 sufficiently regular and differentiable, and are not able to capture global
 537 properties of the minimization problem, for example: global minimum, set of
 538 equivalent solutions due to over-parametrization, multiple local minima, etc.
 539 **An approach to extend estimation possibilities is then to use a whole range of
 540 optimizers, allowing global minimization, various robust search properties,
 541 etc**. There is a lot of minimizing methods, such as stochastic ones,
 542 evolutionary ones, heuristics and meta-heuristics for real-valued problems,
 543 etc. They can treat partially irregular or noisy function :math:`J`, can
 544 characterize local minima, etc. The main drawbacks are a greater numerical cost
 545 to find state estimates, and often a lack of guarantee of convergence in finite
 546 time. Here, we only point the following topics, as the methods are available in
 547 ADAO:
 548
 549 - *Derivative Free Optimization (or DFO)* (see :ref:`section_ref_algorithm_DerivativeFreeOptimization`),
 550 - *Particle Swarm Optimization (or PSO)* (see :ref:`section_ref_algorithm_ParticleSwarmOptimization`),
 551 - *Differential Evolution (or DE)* (see :ref:`section_ref_algorithm_DifferentialEvolution`),
 552 - *Quantile Regression (or QR)* (see :ref:`section_ref_algorithm_QuantileRegression`).
 553
 554 Secondly, optimization methods try usually to minimize quadratic measures of
 555 errors, as the natural properties of such goal functions are well suited for
 556 classical gradient optimization. But other measures of errors can be more
 557 adapted to real physical simulation problems. Then, **an another way to extend
 558 estimation possibilities is to use other measures of errors to be reduced**.
 559 For example, we can cite *absolute error value*, *maximum error value*, etc.
 560 The most classical instances of error measurements are recalled or specified
 561 below, indicating their identifiers in ADAO for the possible selection of a
 562 quality criterion:
 563
 564 - the objective function for the augmented weighted least squares error measurement (which is the basic default functional in all data assimilation algorithms, often named "*3D-Var*" objective function, and which is known for the quality criteria in ADAO as "*AugmentedWeightedLeastSquares*", "*AWLS*" or "*DA*") is:
 565
 566     .. index:: single: AugmentedWeightedLeastSquares (QualityCriterion)
 567     .. index:: single: AWLS (QualityCriterion)
 568     .. math:: J(\mathbf{x})=\frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+\frac{1}{2}(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 569
 570 - the objective function for the weighted least squares error measurement (which is the squared :math:`L^2` weighted norm of the innovation, with a :math:`1/2` coefficient to be homogeneous with the previous one, and which is known for the quality criteria in ADAO as "*WeightedLeastSquares*" or "*WLS*") is:
 571
 572     .. index:: single: WeightedLeastSquares (QualityCriterion)
 573     .. index:: single: WLS (QualityCriterion)
 574     .. math:: J(\mathbf{x})=\frac{1}{2}(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 575
 576 - the objective function for the least squares error measurement (which is the squared :math:`L^2` norm of the innovation, with a :math:`1/2` coefficient to be homogeneous with the previous ones, and which is known for the quality criteria in ADAO as "*LeastSquares*", "*LS*" or "*L2*") is:
 577
 578     .. index:: single: LeastSquares (QualityCriterion)
 579     .. index:: single: LS (QualityCriterion)
 580     .. index:: single: L2 (QualityCriterion)
 581     .. math:: J(\mathbf{x})=\frac{1}{2}(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})=\frac{1}{2}||\mathbf{y}^o-\mathbf{H}.\mathbf{x}||_{L^2}^2
 582
 583 - the objective function for the absolute error value measurement (which is the :math:`L^1` norm of the innovation, and which is known for the quality criteria in ADAO as "*AbsoluteValue*" or "*L1*") is:
 584
 585     .. index:: single: AbsoluteValue (QualityCriterion)
 586     .. index:: single: L1 (QualityCriterion)
 587     .. math:: J(\mathbf{x})=||\mathbf{y}^o-\mathbf{H}.\mathbf{x}||_{L^1}
 588
 589 - the objective function for the maximum error value measurement (which is the :math:`L^{\infty}` norm, and which is known for the quality criteria in ADAO as "*MaximumError*" or "*ME*") is:
 590
 591     .. index:: single: MaximumError (QualityCriterion)
 592     .. index:: single: ME (QualityCriterion)
 593     .. math:: J(\mathbf{x})=||\mathbf{y}^o-\mathbf{H}.\mathbf{x}||_{L^{\infty}}
 594
 595 These error measures may be not differentiable for the last two, but some
 596 optimization methods can still handle them:  heuristics and meta-heuristics for
 597 real-valued problem, etc. As previously, the main drawback remain a greater
 598 numerical cost to find state estimates, and often a lack of guarantee of
 599 convergence in finite time. Here again, we only point the following methods as
 600 it is available in the ADAO module:
 601
 602 - *Derivative Free Optimization (or DFO)* (see :ref:`section_ref_algorithm_DerivativeFreeOptimization`),
 603 - *Particle Swarm Optimization (or PSO)* (see :ref:`section_ref_algorithm_ParticleSwarmOptimization`),
 604 - *Differential Evolution (or DE)* (see :ref:`section_ref_algorithm_DifferentialEvolution`).
 605
 606 The reader interested in the subject of optimization can look at [WikipediaMO]_
 607 as a general entry point.
 608
 609 .. _section_theory_dynamique:
 610
 611 Going further in data assimilation for dynamics
 612 -----------------------------------------------
 613
 614 .. index:: single: dynamic (system)
 615 .. index:: single: system dynamic
 616 .. index:: single: temporal evolution
 617 .. index:: single: ODE (Ordinary Differential Equation)
 618 .. index:: single: EstimationOf
 619
 620 We can analyze a system in temporal evolution (dynamics) with the help of data
 621 assimilation, in order to explicitly take into account the flow of time in the
 622 estimation of states or parameters. We briefly introduce here the problematic,
 623 and some theoretical or practical tools, to facilitate the user treatment of
 624 such situations. It is nevertheless indicated that the variety of physical and
 625 user problems is large, and that it is therefore recommended to adapt the
 626 treatment to the constraints, whether they are physical, numerical or
 627 computational.
 628
 629 General form of dynamic systems
 630 +++++++++++++++++++++++++++++++
 631
 632 Systems in temporal evolution can be studied or represented using dynamic
 633 systems. In this case, it is easy to conceive the analysis of their behavior
 634 with the help of data assimilation (it is even in this precise case that the
 635 data assimilation approach was initially widely developed).
 636
 637 We formalize the numerical simulation framework in a simple way. A simple
 638 dynamic system dynamic system on the state :math:`\mathbf{x}` can be described
 639 in continuous time in the form:
 640
 641 .. math:: \forall t \in \mathbb{R}^{+}, \frac{d\mathbf{x}}{dt} = \mathcal{D}(\mathbf{x},\mathbf{u},t)
 642
 643 where :math:`\mathbf{x}` is the unknown state vector, :math:`\mathbf{u}` is a
 644 known external control vector, and :math:`\mathcal{D}` is the (possibly
 645 non-linear) operator of the system dynamics. It is an Ordinary Differential
 646 Equation (ODE), of the first order, on the state. In discrete time, this
 647 dynamical system can be written in the following form:
 648
 649 .. math:: \forall n \in \mathbb{N}, \mathbf{x}_{n+1} = M(\mathbf{x}_{n},\mathbf{u}_{n},t_n\rightarrow t_{n+1})
 650
 651 for an indexing :math:`t_n` of discrete times with :math:`n\in\mathbf{N}`.
 652 :math:`M` is the discrete evolution operator, symbolically obtained from
 653 :math:`\mathcal{D}` by the discretization scheme. Usually, we omit the time
 654 notation in the evolution operator :math:`M`. Approximating the
 655 :math:`\mathcal{D}` operator by :math:`M` introduces (or adds, if it already
 656 exists) a :math:`\epsilon` model error.
 657
 658 We can then characterize two types of estimates in dynamics, which we describe
 659 hereafter on the discrete time dynamical system: `State estimation in
 660 dynamics`_ and `Parameter estimation in dynamics`_. Combined, the two types can
 661 be used to make a `Joint state and parameter estimation in dynamics`_. In ADAO,
 662 some algorithms can be used either in state estimation or in parameter
 663 estimation. This is done simply by changing the required option
 664 "*EstimationOf*" in the algorithm parameters.
 665
 666 State estimation in dynamics
 667 ++++++++++++++++++++++++++++
 668
 669 The state estimation can be conducted by data assimilation on the discrete time
 670 version of the dynamical system, written in the following form:
 671
 672 .. math:: \mathbf{x}_{n+1} = M(\mathbf{x}_{n},\mathbf{u}_{n}) + \mathbf{\epsilon}_{n}
 673
 674 .. math:: \mathbf{y}_{n} = H(\mathbf{x}_{n}) + \mathbf{\nu}_{n}
 675
 676 where :math:`\mathbf{x}` is the system state to be estimated,
 677 :math:`\mathbf{x}_{n}` and :math:`\mathbf{y}_{n}` are respectively the
 678 computed (unobserved) and measured (observed) state of the system, :math:`M`
 679 and :math:`H` are the incremental evolution and observation operators,
 680 respectively, :math:`\mathbf{\epsilon}_{n}` and :math:`\mathbf{\nu}_{n}` are
 681 the evolution and observation noise or error, respectively, and
 682 :math:`\mathbf{u}_{n}` is a known external control. The two operators :math:`M`
 683 and :math:`H` are directly usable in data assimilation with ADAO.
 684
 685 Parameter estimation in dynamics
 686 ++++++++++++++++++++++++++++++++
 687
 688 The parameter estimation can be written a differently to be solved by data
 689 assimilation. Still on the discrete time version of the dynamical system, we
 690 look for a nonlinear :math:`G` mapping, parameterized by :math:`\mathbf{a}`,
 691 between inputs :math:`\mathbf{x}_{n}` and measurements :math:`\mathbf{y}_{n}`
 692 at each step :math:`t_n`, the error to be controlled as a function of
 693 parameters :math:`\mathbf{y}_{n}` being
 694 :math:`\mathbf{y}_{n}-G(\mathbf{x}_{n},\mathbf{a})`. We can proceed by
 695 optimization on this error, with regularization, or by filtering by writing the
 696 problem represented in state estimation:
 697
 698 .. math:: \mathbf{a}_{n+1} = \mathbf{a}_{n} + \mathbf{\epsilon}_{n}
 699
 700 .. math:: \mathbf{y}_{n} = G(\mathbf{x}_{n},\mathbf{a}_{n}) + \mathbf{\nu}_{n}
 701
 702 where, this time, the choice of the evolution and observation error models
 703 :math:`\mathbf{\epsilon}_{n}` and :math:`\mathbf{\nu}_{n}` condition the
 704 performance of convergence and observation tracking (while the error
 705 representations come from the behavior of the physics in the case of state
 706 estimation). The estimation of the parameters :math:`\mathbf{a}` is done by
 707 using pairs :math:`(\mathbf{x}_{n},\mathbf{y}_{n})` of corresponding inputs and
 708 outputs.
 709
 710 In this case of parameter estimation, in order to apply data assimilation
 711 methods, we therefore impose the hypothesis that the evolution operator is the
 712 identity (*Note: it is therefore not used, but must be declared in ADAO, for
 713 example as a 1 matrix*), and the observation operator is :math:`G`.
 714
 715 Joint state and parameter estimation in dynamics
 716 ++++++++++++++++++++++++++++++++++++++++++++++++
 717
 718 A special case concerns the joint estimation of state and parameters used in a
 719 dynamic system. One seeks to jointly estimate the state :math:`\mathbf{x}`
 720 (which depends on time) and the parameters :math:`\mathbf{a}` (which here does
 721 not depend on time). There are several ways to deal with this problem, but the
 722 most general one is to use a state vector augmented by the parameters, and to
 723 extend the operators accordingly.
 724
 725 To do this, using the notations of the previous two subsections, we define the
 726 auxiliary variable :math:`\mathbf{w}` such that:
 727
 728 .. math:: \mathbf{w} = \left[
 729     \begin{array}{c}
 730     \mathbf{x} \\
 731     \mathbf{a}
 732     \end{array}
 733     \right]
 734     = \left[
 735     \begin{array}{c}
 736     \mathbf{w}_{|x} \\
 737     \mathbf{w}_{|a}
 738     \end{array}
 739     \right]
 740
 741 and the operators of evolution :math:`\tilde{M}` and observation
 742 :math:`\tilde{H}` associated to the augmented problem:
 743
 744 .. math:: \tilde{M}(\mathbf{w},\mathbf{u}) = \left[
 745     \begin{array}{c}
 746     M(\mathbf{w}_{|x},\mathbf{u}) \\
 747     \mathbf{w}_{|a}
 748     \end{array}
 749     \right]
 750     = \left[
 751     \begin{array}{c}
 752     M(\mathbf{x},\mathbf{u}) \\
 753     \mathbf{a}
 754     \end{array}
 755     \right]
 756
 757 .. math:: \tilde{H}(\mathbf{w}) = \left[
 758     \begin{array}{c}
 759     H(\mathbf{w}_{|x}) \\
 760     G(\mathbf{w}_{|x},\mathbf{w}_{|a})
 761     \end{array}
 762     \right]
 763     = \left[
 764     \begin{array}{c}
 765     H(\mathbf{x}) \\
 766     G(\mathbf{x},\mathbf{a})
 767     \end{array}
 768     \right]
 769
 770 With these notations, by extending the noise variables
 771 :math:`\mathbf{\epsilon}` and :math:`\mathbf{\nu}` appropriately, the joint
 772 state :math:`\mathbf{x}` and parameters :math:`\mathbf{a}` discrete-time
 773 estimation problem, using the joint variable :math:`\mathbf{w}`, is then
 774 written:
 775
 776 .. math:: \mathbf{w}_{n+1} = \tilde{M}(\mathbf{w}_{n},\mathbf{u}_{n}) + \mathbf{\epsilon}_{n}
 777
 778 .. math:: \mathbf{y}_{n} = \tilde{H}(\mathbf{w}_{n}) + \mathbf{\nu}_{n}
 779
 780 where :math:`\mathbf{w}_{n}=[\mathbf{x}_n~~\mathbf{a}_n]^T`. The incremental
 781 evolution and observation operators are therefore respectively the augmented
 782 operators :math:`\tilde{M}` and :math:`\tilde{H}`, and are directly suitable
 783 for study cases with ADAO.
 784
 785 Conceptual scheme for data assimilation in dynamics
 786 +++++++++++++++++++++++++++++++++++++++++++++++++++
 787
 788 To complete the description, we can represent the data assimilation process in
 789 a dynamics specific way using a temporal scheme, which describes the action of
 790 the evolution (:math:`M` or :math:`\tilde{M}`) and observation (:math:`H` or
 791 :math:`\tilde{H}`) operators during the discrete simulation and the recursive
 792 estimation of the state (:math:`\mathbf{x}`). A possible representation is as
 793 follows, particularly appropriate for iterative Kalman filtering algorithms:
 794
 795   .. _schema_d_AD_temporel:
 796   .. image:: images/schema_temporel_KF.png
 797     :align: center
 798     :width: 100%
 799   .. centered::
 800     **Timeline of steps for data assimilation operators in dynamics**
 801
 802 with **P** the state error covariance and *t* the discrete iterative time. In
 803 this scheme, the analysis **(x,P)** is obtained by means of the "*correction*"
 804 by observing the "*prediction*" of the previous state. The concepts described
 805 in this diagram can be directly and simply used in ADAO to elaborate study
 806 cases, and are included in the description of some algorithms.