doc/en/theory.rst

   1 ..
   2    Copyright (C) 2008-2014 EDF R&D
   3
   4    This file is part of SALOME ADAO module.
   5
   6    This library is free software; you can redistribute it and/or
   7    modify it under the terms of the GNU Lesser General Public
   8    License as published by the Free Software Foundation; either
   9    version 2.1 of the License, or (at your option) any later version.
  10
  11    This library is distributed in the hope that it will be useful,
  12    but WITHOUT ANY WARRANTY; without even the implied warranty of
  13    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  14    Lesser General Public License for more details.
  15
  16    You should have received a copy of the GNU Lesser General Public
  17    License along with this library; if not, write to the Free Software
  18    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  19
  20    See http://www.salome-platform.org/ or email : webmaster.salome@opencascade.com
  21
  22    Author: Jean-Philippe Argaud, jean-philippe.argaud@edf.fr, EDF R&D
  23
  24 .. _section_theory:
  25
  26 =================================================================================
  27 **[DocT]** A brief introduction to Data Assimilation and Optimization
  28 =================================================================================
  29
  30 .. index:: single: Data Assimilation
  31 .. index:: single: true state
  32 .. index:: single: observation
  33 .. index:: single: a priori
  34
  35 **Data Assimilation** is a general framework for computing the optimal estimate
  36 of the true state of a system, over time if necessary. It uses values obtained
  37 by combining both observations and *a priori* models, including information
  38 about their errors.
  39
   40 In other words, data assimilation merges measurement data of a system, that is,
   41 the observations, with *a priori* physical and mathematical knowledge of the
   42 system, embedded in numerical models, to obtain the best possible estimate of the
   43 true state of the system and of its stochastic properties. Note that this true
   44 state cannot be reached, but can only be estimated. Moreover, although the
   45 information used is stochastic by nature, data assimilation provides
   46 deterministic techniques to perform the estimation very efficiently.
  47
   48 Because data assimilation looks for the **best possible** estimate, its
  49 underlying procedure always integrates optimization in order to find this
  50 estimate: particular optimization methods are always embedded in data
  51 assimilation algorithms. Optimization methods can be seen in ADAO as a way to
  52 extend data assimilation applications. They will be introduced this way in the
  53 section `Going further in the state estimation by optimization methods`_, but
  54 they are far more general and can be used without data assimilation concepts.
  55
  56 Two main types of applications exist in data assimilation, being covered by the
  57 same formalism: **parameters identification** and **fields reconstruction**.
  58 Before introducing the `Simple description of the data assimilation
   59 methodological framework`_ in a later section, we briefly describe these two
  60 types. At the end, some references allow `Going further in the data assimilation
  61 framework`_.
  62
  63 Fields reconstruction or measures interpolation
  64 -----------------------------------------------
  65
  66 .. index:: single: fields reconstruction
  67 .. index:: single: measures interpolation
  68 .. index:: single: fields interpolation
  69
  70 **Fields reconstruction (or interpolation)** consists in finding, from a
  71 restricted set of real measures, the physical field which is the most
  72 *consistent* with these measures.
  73
   74 This *consistency* is to be understood in terms of interpolation, that is to say
  75 that the field we want to reconstruct, using data assimilation on measures, has
  76 to fit at best the measures, while remaining constrained by the overall field
  77 calculation. The calculation is thus an *a priori* estimation of the field that
  78 we seek to identify.
  79
   80 If the system evolves in time, the reconstruction has to be established at every
   81 time step, for the field as a whole. The interpolation process in this case is
  82 more complicated since it is temporal, and not only in terms of instantaneous
  83 values of the field.
  84
   85 A simple example of fields reconstruction comes from meteorology, in which one
   86 looks for the values of variables such as temperature or pressure at all points
   87 of the spatial domain. One has instantaneous measurements of these quantities at
   88 certain points, but also a history of these measurements. Moreover, these
   89 variables are constrained by evolution equations for the state of the
   90 atmosphere, which indicate for example that the pressure at a point cannot
   91 take any value independently of the value at this same point at the previous
   92 time. One must therefore reconstruct the field at any point in space, in a
   93 "consistent" manner with the evolution equations and with the measurements of
   94 the previous time steps.
  95
  96 Parameters identification, models adjustment, calibration
  97 ---------------------------------------------------------
  98
  99 .. index:: single: parameters identification
 100 .. index:: single: parameters adjustment
 101 .. index:: single: models adjustment
 102 .. index:: single: calibration
 103 .. index:: single: background
 104 .. index:: single: regularization
 105 .. index:: single: inverse problems
 106
  107 The **identification (or adjustment) of parameters** by data assimilation is a
  108 form of state calibration which uses both the physical measurements and an *a
  109 priori* estimation of the parameters (called the "*background*") of the state
  110 that one seeks to identify, as well as a characterization of their errors. From
  111 this point of view, it uses all available information on the physical system
  112 (even if assumptions about errors are relatively restrictive) to find the
  113 "*optimal estimation*" of the true state. We note, in terms of optimization,
  114 that the background provides a "*regularization*", in the mathematical sense, of
  115 the main problem of parameters identification. One can also use the term
  116 "*inverse problems*" to refer to this process.
 117
 118 In practice, the two observed gaps "*calculation-background*" and
 119 "*calculation-measures*" are combined to build the calibration correction of
 120 parameters or initial conditions. The addition of these two gaps requires a
 121 relative weight, which is chosen to reflect the trust we give to each piece of
 122 information. This confidence is depicted by the covariance of the errors on the
 123 background and on the observations. Thus the stochastic aspect of information,
 124 measured or *a priori*, is essential for building the calibration error
 125 function.
 126
  127 A simple example of parameters identification comes from any kind of physical
  128 simulation process involving a parametrized model. For example, a static
  129 mechanical simulation of a beam constrained by some forces is described by beam
  130 parameters, such as the Young's modulus, or by the intensity of the force. The
  131 parameters estimation problem consists in finding, for example, the value of the
  132 Young's modulus for which the simulation of the beam corresponds to the
  133 measurements, taking into account the knowledge of errors.
 134
 135 Simple description of the data assimilation methodological framework
 136 --------------------------------------------------------------------
 137
 138 .. index:: single: background
 139 .. index:: single: background error covariances
 140 .. index:: single: observation error covariances
 141 .. index:: single: covariances
 142 .. index:: single: 3DVAR
 143 .. index:: single: Blue
 144
 145 We can write these features in a simple manner. By default, all variables are
 146 vectors, as there are several parameters to readjust, or a discrete field to
 147 reconstruct.
 148
  149 According to standard notations in data assimilation, we denote by
  150 :math:`\mathbf{x}^a` the optimal parameters that are to be determined by
  151 calibration, :math:`\mathbf{y}^o` the observations (or experimental
  152 measurements) that we must compare to the simulation outputs,
  153 :math:`\mathbf{x}^b` the background (*a priori* values, or regularization
  154 values) of the searched parameters, and :math:`\mathbf{x}^t` the unknown ideal
  155 parameters that would give exactly the observations (assuming that the errors
  156 are zero and the model is exact) as output.
 157
 158 In the simplest case, which is static, the steps of simulation and of
  159 observation can be combined into a single observation operator denoted :math:`H`
 160 (linear or nonlinear). It transforms the input parameters :math:`\mathbf{x}` to
 161 results :math:`\mathbf{y}`, to be directly compared to observations
 162 :math:`\mathbf{y}^o`:
 163
 164 .. math:: \mathbf{y} = H(\mathbf{x})
 165
  166 Moreover, we use the linearized operator :math:`\mathbf{H}` to represent the
  167 effect of the full operator :math:`H` around a linearization point (and we
  168 hereafter omit mention of :math:`H`, even though it is possible to keep it). In
  169 reality, we have already indicated that the stochastic nature of variables is
  170 essential, coming from the fact that model, background and observations are all
  171 imperfect. We therefore introduce errors of observations additively, in the form
  172 of a random vector :math:`\mathbf{\epsilon}^o` such that:
 173
 174 .. math:: \mathbf{y}^o = \mathbf{H} \mathbf{x}^t + \mathbf{\epsilon}^o
 175
 176 The errors represented here are not only those from observation, but also from
 177 the simulation. We can always consider that these errors are of zero mean.
  178 Denoting by :math:`E[.]` the classical mathematical expectation, we can then
  179 define a matrix :math:`\mathbf{R}` of the observation error covariances by:
 180
 181 .. math:: \mathbf{R} = E[\mathbf{\epsilon}^o.{\mathbf{\epsilon}^o}^T]
 182
 183 The background can also be written as a function of the true value, by
 184 introducing the error vector :math:`\mathbf{\epsilon}^b` such that:
 185
 186 .. math:: \mathbf{x}^b = \mathbf{x}^t + \mathbf{\epsilon}^b
 187
  188 where errors are also assumed to be of zero mean, in the same manner as for
 189 observations. We define the :math:`\mathbf{B}` matrix of background error
 190 covariances by:
 191
 192 .. math:: \mathbf{B} = E[\mathbf{\epsilon}^b.{\mathbf{\epsilon}^b}^T]
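
As a hedged numerical illustration of these two definitions, the sketch below
draws synthetic zero-mean errors and estimates :math:`\mathbf{R}` and
:math:`\mathbf{B}` by averaging outer products. The covariance values, the
sample size, the Gaussian sampling and all variable names are assumptions chosen
for the example; the definitions themselves only require zero-mean errors.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so that the sketch is repeatable

# Hypothetical "true" covariance matrices (illustrative values only).
R_true = np.array([[1.0, 0.3],
                   [0.3, 0.5]])   # observation error covariances
B_true = np.array([[2.0, 0.0],
                   [0.0, 1.0]])   # background error covariances

n_samples = 200_000

# Zero-mean error vectors, one per row. Gaussian sampling is an assumption
# made only to generate data.
eps_o = rng.multivariate_normal(np.zeros(2), R_true, size=n_samples)
eps_b = rng.multivariate_normal(np.zeros(2), B_true, size=n_samples)

# Empirical counterpart of E[eps . eps^T]: the average of the outer products.
R_est = eps_o.T @ eps_o / n_samples
B_est = eps_b.T @ eps_b / n_samples
```

With enough samples, ``R_est`` and ``B_est`` approach the matrices used to
generate the errors. In real applications the true errors are not available, so
these matrices have to be modelled or estimated by other means.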
 193
 194 The optimal estimation of the true parameters :math:`\mathbf{x}^t`, given the
 195 background :math:`\mathbf{x}^b` and the observations :math:`\mathbf{y}^o`, is
  196 then the "*analysis*" :math:`\mathbf{x}^a` and comes from the minimization of an
 197 error function (in variational assimilation) or from the filtering correction (in
 198 assimilation by filtering).
 199
 200 In **variational assimilation**, in a static case, one classically attempts to
 201 minimize the following function :math:`J`:
 202
 203 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 204
  205 which is usually referred to as the "*3D-VAR*" function (see for example
  206 [Talagrand97]_). Since the :math:`\mathbf{B}` and :math:`\mathbf{R}` covariance
  207 matrices are proportional to the variances of errors, the presence of their
  208 inverses in the two terms of the function :math:`J` effectively weights the
  209 differences by the confidence in the background or in the observations. The
  210 parameters vector :math:`\mathbf{x}` realizing the minimum of this function
  211 therefore constitutes the analysis :math:`\mathbf{x}^a`. It is at this level
  212 that we have to use the full range of function minimization methods known in
  213 optimization (see also section `Going further in the state estimation by
  214 optimization methods`_). Depending on the size of the parameters vector
  215 :math:`\mathbf{x}` to identify, and on the availability of the gradient or
  216 Hessian of :math:`J`, it is appropriate to adapt the chosen optimization method
  217 (gradient, Newton, quasi-Newton...).
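
The following is a minimal sketch of such a minimization, assuming a small
linear problem whose values are purely illustrative. It uses a single Newton
step, which reaches the minimum here because :math:`J` is quadratic when
:math:`\mathbf{H}` is linear. The function names ``J`` and ``grad_J`` and all
numerical values are assumptions of this example, not part of ADAO.

```python
import numpy as np

# Hypothetical small linear problem (all values are illustrative).
xb = np.array([0.0, 0.0])      # background x^b
yo = np.array([2.0])           # observation y^o
H = np.array([[1.0, 0.0]])     # linear observation operator: observes x[0]
B = np.eye(2)                  # background error covariances
R = np.array([[1.0]])          # observation error covariances

B_inv = np.linalg.inv(B)
R_inv = np.linalg.inv(R)


def J(x):
    """3D-VAR function, written as in the text (without a 1/2 factor)."""
    db = x - xb
    do = yo - H @ x
    return db @ B_inv @ db + do @ R_inv @ do


def grad_J(x):
    """Gradient of J with respect to x."""
    return 2.0 * B_inv @ (x - xb) - 2.0 * H.T @ R_inv @ (yo - H @ x)


# The Hessian of J is constant because J is quadratic in x.
hessian = 2.0 * (B_inv + H.T @ R_inv @ H)

# One Newton step, started from the background.
xa = xb - np.linalg.solve(hessian, grad_J(xb))
```

For a nonlinear operator :math:`H`, the function is no longer quadratic and an
iterative method would be needed instead of this single step.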
 218
 219 In **assimilation by filtering**, in this simple case usually referred to as
 220 "*BLUE*" (for "*Best Linear Unbiased Estimator*"), the :math:`\mathbf{x}^a`
 221 analysis is given as a correction of the background :math:`\mathbf{x}^b` by a
 222 term proportional to the difference between observations :math:`\mathbf{y}^o`
 223 and calculations :math:`\mathbf{H}\mathbf{x}^b`:
 224
 225 .. math:: \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)
 226
 227 where :math:`\mathbf{K}` is the Kalman gain matrix, which is expressed using
 228 covariance matrices in the following form:
 229
 230 .. math:: \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}
 231
  232 The advantage of filtering is that the gain is calculated explicitly, which then
  233 makes it possible to produce the *a posteriori* analysis covariance matrix.
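
As a sketch under stated assumptions, the BLUE formulas above can be evaluated
directly with NumPy on a small problem. All numerical values are illustrative.
The expression used for the analysis covariance ``A`` is the standard form
:math:`(\mathbf{I}-\mathbf{K}\mathbf{H})\mathbf{B}`, valid for the optimal
gain; it is not given in this text and is added here as an assumption.

```python
import numpy as np

# Hypothetical small problem (all values are illustrative).
xb = np.array([0.0, 0.0])      # background x^b
yo = np.array([2.0])           # observation y^o
H = np.array([[1.0, 0.0]])     # linear observation operator: observes x[0]
B = np.eye(2)                  # background error covariances
R = np.array([[1.0]])          # observation error covariances

# Kalman gain K = B H^T (H B H^T + R)^{-1}
innovation_cov = H @ B @ H.T + R
K = B @ H.T @ np.linalg.inv(innovation_cov)

# Analysis: correction of the background by the weighted innovation.
xa = xb + K @ (yo - H @ xb)

# A posteriori analysis covariance (standard form for the optimal gain,
# stated here as an assumption since the text does not give it).
A = (np.eye(len(xb)) - K @ H) @ B
```

In this example only the first component is observed, so only that component
of the background is corrected and only its variance is reduced.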
 234
  235 In this simple static case, we can show, under an assumption of Gaussian error
  236 distributions (not very restrictive in practice), that the *variational* and
  237 *filtering* approaches give the same solution.
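
A short derivation, given as a sketch for a linear operator
:math:`\mathbf{H}` and symmetric invertible :math:`\mathbf{B}` and
:math:`\mathbf{R}`, makes this equivalence explicit. Setting the gradient of
:math:`J` to zero gives:

.. math:: \nabla J(\mathbf{x}) = 2\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}^b) - 2\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}^o-\mathbf{H}\mathbf{x}) = 0

Writing :math:`\mathbf{y}^o-\mathbf{H}\mathbf{x} = (\mathbf{y}^o-\mathbf{H}\mathbf{x}^b) - \mathbf{H}(\mathbf{x}-\mathbf{x}^b)`
and solving for :math:`\mathbf{x}` leads to:

.. math:: \mathbf{x}^a = \mathbf{x}^b + (\mathbf{B}^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}^o-\mathbf{H}\mathbf{x}^b)

This is the same correction as the one of the filtering approach, because of the
matrix identity:

.. math:: (\mathbf{B}^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}

which can be checked by multiplying both sides on the left by
:math:`\mathbf{B}^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}` and on the right
by :math:`\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R}`: both sides then become
:math:`\mathbf{H}^T+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}\mathbf{B}\mathbf{H}^T`.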
 238
  239 Note that the "*3D-VAR*" and "*BLUE*" methods may be
 240 extended to dynamic problems, called respectively "*4D-VAR*" and "*Kalman
 241 filter*". They can take into account the evolution operator to establish an
 242 analysis at the right time steps of the gap between observations and
 243 simulations, and to have, at every moment, the propagation of the background
 244 through the evolution model. Many other variants have been developed to improve
 245 the numerical quality of the methods or to take into account computer
 246 requirements such as calculation size and time.
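
To make the dynamic extension more concrete, here is a hedged sketch of one
forecast and analysis cycle of a linear Kalman filter in its usual textbook
form. The evolution operator ``M``, the model error covariance ``Q``, the
function name ``kalman_step`` and all numerical values are assumptions
introduced for this example; they are not defined in this text and this is not
the ADAO implementation.

```python
import numpy as np


def kalman_step(xa_prev, A_prev, yo, M, Q, H, R):
    """One forecast/analysis cycle of a linear Kalman filter (textbook form)."""
    # Forecast: propagate the state and its error covariance through the
    # evolution model M, adding the model error covariance Q.
    xf = M @ xa_prev
    Pf = M @ A_prev @ M.T + Q
    # Analysis: same correction as in the static BLUE case, with the
    # forecast covariance Pf playing the role of B.
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa = xf + K @ (yo - H @ xf)
    A = (np.eye(len(xf)) - K @ H) @ Pf
    return xa, A


# Hypothetical scalar system: constant state, perfect model, direct observation.
M = np.array([[1.0]])
Q = np.array([[0.0]])
H = np.array([[1.0]])
R = np.array([[1.0]])

x = np.array([0.0])        # initial background
P = np.array([[1.0]])      # initial background error covariance

history = []
for yo in (np.array([2.0]), np.array([2.0])):
    x, P = kalman_step(x, P, yo, M, Q, H, R)
    history.append((x.copy(), P.copy()))
```

Each cycle uses the previous analysis as the new background, so the estimate
moves progressively toward the observations while its variance decreases.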
 247
 248 Going further in the data assimilation framework
 249 ------------------------------------------------
 250
 251 .. index:: single: state estimation
 252 .. index:: single: parameter estimation
 253 .. index:: single: inverse problems
 254 .. index:: single: Bayesian estimation
 255 .. index:: single: optimal interpolation
 256 .. index:: single: mathematical regularization
 257 .. index:: single: regularization methods
 258 .. index:: single: data smoothing
 259
 260 To get more information about the data assimilation techniques, the reader can
 261 consult introductory documents like [Talagrand97]_ or [Argaud09]_, on-line
 262 training courses or lectures like [Bouttier99]_ and [Bocquet04]_ (along with
 263 other materials coming from geosciences applications), or general documents like
 264 [Talagrand97]_, [Tarantola87]_, [Kalnay03]_, [Ide97]_ and [WikipediaDA]_.
 265
  266 Note that data assimilation is not restricted to meteorology or geosciences,
 267 but is widely used in other scientific domains. There are several fields in
 268 science and technology where the effective use of observed but incomplete data
 269 is crucial.
 270
 271 Some aspects of data assimilation are also known as *state estimation*,
 272 *parameter estimation*, *inverse problems*, *Bayesian estimation*, *optimal
 273 interpolation*, *mathematical regularization*, *data smoothing*, etc. These
 274 terms can be used in bibliographical searches.
 275
 276 Going further in the state estimation by optimization methods
 277 -------------------------------------------------------------
 278
 279 .. index:: single: state estimation
 280 .. index:: single: optimization methods
 281
 282 As seen before, in a static simulation case, the variational data assimilation
  283 requires minimizing the goal function :math:`J`:
 284
 285 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 286
  287 which is named the "*3D-VAR*" function. It can be seen as an extended form of
  288 *least squares minimization*, obtained by adding a regularizing term using
  289 :math:`\mathbf{x}-\mathbf{x}^b`, and by weighting the differences using the two
  290 covariance matrices :math:`\mathbf{B}` and :math:`\mathbf{R}`. The
  291 minimization of the :math:`J` function leads to the *best* :math:`\mathbf{x}`
  292 state estimation. To get more information about these notions, one can consult
  293 general reference documents like [Tarantola87]_.
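
The link with regularized least squares can be made explicit in a special case,
given here as an illustration. Assume (this is an assumption of the example, not
a requirement of the method) that both covariance matrices are proportional to
the identity, :math:`\mathbf{B}=\sigma_b^2\mathbf{I}` and
:math:`\mathbf{R}=\sigma_o^2\mathbf{I}`. Then:

.. math:: J(\mathbf{x}) = \frac{1}{\sigma_b^2}\|\mathbf{x}-\mathbf{x}^b\|^2 + \frac{1}{\sigma_o^2}\|\mathbf{y}^o-\mathbf{H}\mathbf{x}\|^2

and, after multiplication by :math:`\sigma_o^2`, which does not change the
minimizer:

.. math:: \sigma_o^2 J(\mathbf{x}) = \|\mathbf{y}^o-\mathbf{H}\mathbf{x}\|^2 + \lambda\|\mathbf{x}-\mathbf{x}^b\|^2 \quad \mbox{with} \quad \lambda = \frac{\sigma_o^2}{\sigma_b^2}

The first term is the ordinary least squares misfit and the second one is a
regularization term whose weight :math:`\lambda` grows when the background is
trusted more than the observations.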
 294
  295 The possibilities of state estimation can be extended in two ways, by making
  296 more explicit use of optimization methods and their properties.
 297
  298 First, classical optimization methods involve using various gradient-based
  299 minimizing procedures. They are extremely efficient at finding a single local
  300 minimum. But they require the goal function :math:`J` to be sufficiently regular
  301 and differentiable, and are not able to capture global properties of the
  302 minimization problem, for example: global minimum, set of equivalent solutions
  303 due to over-parametrization, multiple local minima, etc. **A way to extend
  304 estimation possibilities is then to use a whole range of optimizers, allowing
  305 global minimization, various robust search properties, etc**. There are many
  306 minimizing methods, such as stochastic ones, evolutionary ones, heuristics and
  307 meta-heuristics for real-valued problems, etc. They can treat a partially
  308 irregular or noisy function :math:`J`, can characterize local minima, etc. The
  309 main drawback is a greater numerical cost to find state estimates, and no
  310 guarantee of convergence in finite time. Here, we only point out the following
  311 topics, as the methods are available in the ADAO module: *Quantile Regression*
  312 [WikipediaQR]_ and *Particle Swarm Optimization* [WikipediaPSO]_.
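
To illustrate the idea, the sketch below applies a basic particle swarm
optimization to a scalar function :math:`J` with two local minima, obtained
with a hypothetical nonlinear observation operator :math:`H(x)=x^2`. The
function ``particle_swarm``, its parameters and all numerical values are
assumptions of this example. It is a generic textbook variant, not the
algorithm implemented in ADAO, and it offers no guarantee of convergence.

```python
import numpy as np


def J(x):
    """Scalar 3D-VAR-like function with the nonlinear operator H(x) = x**2."""
    xb, B = 0.5, 4.0     # hypothetical background and its error variance
    yo, R = 4.0, 1.0     # hypothetical observation and its error variance
    return (x[0] - xb) ** 2 / B + (yo - x[0] ** 2) ** 2 / R


def particle_swarm(cost, low, high, n_particles=40, n_iter=300,
                   inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic particle swarm minimization of `cost` inside the box [low, high]."""
    rng = np.random.default_rng(seed)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    pos = rng.uniform(low, high, size=(n_particles, low.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                              # best point per particle
    best_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(best_cost))
    swarm_pos, swarm_cost = best_pos[g].copy(), best_cost[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity mixes inertia, attraction to each particle's own best
        # point, and attraction to the best point found by the whole swarm.
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (swarm_pos - pos))
        pos = np.clip(pos + vel, low, high)
        costs = np.array([cost(p) for p in pos])
        improved = costs < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = costs[improved]
        g = int(np.argmin(best_cost))
        if best_cost[g] < swarm_cost:
            swarm_pos, swarm_cost = best_pos[g].copy(), best_cost[g]
    return swarm_pos, float(swarm_cost)


best_x, best_J = particle_swarm(J, low=[-5.0], high=[5.0])
```

The method only needs evaluations of :math:`J`, not its gradient, which is why
it can be applied to irregular functions, at the price of many more function
evaluations than a gradient-based method.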
 313
  314 Secondly, optimization methods usually try to minimize quadratic measures of
  315 errors, as the natural properties of such goal functions are well suited for
  316 classical gradient optimization. But other measures of errors can be better
  317 suited to real physical simulation problems. Then, **another way to extend
  318 estimation possibilities is to use other measures of errors to be reduced**. For
  319 example, we can cite *absolute error value*, *maximum error value*, etc. These
  320 error measures are not differentiable, but some optimization methods can deal
  321 with them: heuristics and meta-heuristics for real-valued problems, etc. As
  322 previously, the main drawback remains a greater numerical cost to find state
  323 estimates, and no guarantee of convergence in finite time. Here again, we only
  324 point out the following method, as it is available in the ADAO module: *Particle
  325 Swarm Optimization* [WikipediaPSO]_.
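
The effect of the error measure can be seen on a very small example, given here
as a sketch. A single scalar state is estimated from five hypothetical direct
observations, one of which is an outlier, using three different error measures.
The data, the function names and the exhaustive scan over candidate values are
assumptions chosen for simplicity; the scan merely stands in for a
derivative-free optimizer and is not a method of the ADAO module.

```python
# Hypothetical direct observations of one scalar; the last value is an outlier.
observations = [0.9, 1.0, 1.0, 1.1, 5.0]


def quadratic_error(x):
    """Sum of squared differences (classical least squares measure)."""
    return sum((y - x) ** 2 for y in observations)


def absolute_error(x):
    """Sum of absolute differences."""
    return sum(abs(y - x) for y in observations)


def maximum_error(x):
    """Largest absolute difference."""
    return max(abs(y - x) for y in observations)


# Candidate states on a regular grid from 0.00 to 6.00, scanned exhaustively.
candidates = [i / 100.0 for i in range(0, 601)]


def argmin(cost):
    """Return the candidate state with the smallest cost."""
    return min(candidates, key=cost)


x_quadratic = argmin(quadratic_error)   # coincides with the mean of the data
x_absolute = argmin(absolute_error)     # coincides with the median of the data
x_maximum = argmin(maximum_error)       # coincides with the mid-range of the data
```

The quadratic measure is pulled toward the outlier, the absolute measure is
almost insensitive to it, and the maximum measure is entirely driven by the
extreme values. The choice of the error measure is therefore part of the
modelling of the estimation problem.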
 326
 327 The reader interested in the subject of optimization can look at [WikipediaMO]_
 328 as a general entry point.