doc/en/theory.rst

   1 .. _section_theory:
   2
   3 ================================================================================
   4 A brief introduction to Data Assimilation and Optimization
   5 ================================================================================
   6
   7 .. index:: single: Data Assimilation
   8 .. index:: single: true state
   9 .. index:: single: observation
  10 .. index:: single: a priori
  11
  12 **Data Assimilation** is a general framework for computing the optimal estimate
  13 of the true state of a system, over time if necessary. It uses values obtained
  14 by combining both observations and *a priori* models, including information
  15 about their errors.
  16
  17 In other words, data assimilation merges measurement data of a system, that are
  18 the observations, with *a priori* system physical and mathematical knowledge,
  19 embedded in numerical models, to obtain the best possible estimate of the system
  20 true state and of its stochastic properties. Note that this true state can not
  21 be reached, but can only be estimated. Moreover, despite the fact that the used
  22 information are stochastic by nature, data assimilation provides deterministic
  23 techniques in order to perform very efficiently the estimation.
  24
  25 Because data assimilation look for the **best possible** estimate, its
  26 underlying procedure always integrates optimization in order to find this
  27 estimate: particular optimization methods are always embedded in data
  28 assimilation algorithms. Optimization methods can be seen in ADAO as a way to
  29 extend data assimilation applications. They will be introduced this way in the
  30 section `Going further in the state estimation by optimization methods`_, but
  31 they are far more general and can be used without data assimilation concepts.
  32
  33 Two main types of applications exist in data assimilation, being covered by the
  34 same formalism: **parameters identification** and **fields reconstruction**.
  35 Before introducing the `Simple description of the data assimilation
  36 methodological framework`_ in a next section, we describe briefly these two
  37 types. At the end, some references allow `Going further in the data assimilation
  38 framework`_.
  39
  40 Fields reconstruction or measures interpolation
  41 -----------------------------------------------
  42
  43 .. index:: single: fields reconstruction
  44 .. index:: single: measures interpolation
  45 .. index:: single: fields interpolation
  46
  47 **Fields reconstruction (or interpolation)** consists in finding, from a
  48 restricted set of real measures, the physical field which is the most
  49 *consistent* with these measures.
  50
  51 This *consistency* is to understand in terms of interpolation, that is to say
  52 that the field we want to reconstruct, using data assimilation on measures, has
  53 to fit at best the measures, while remaining constrained by the overall field
  54 calculation. The calculation is thus an *a priori* estimation of the field that
  55 we seek to identify.
  56
  57 If the system evolves in time, the reconstruction has to be established on every
  58 time step, of the field as a whole. The interpolation process in this case is
  59 more complicated since it is temporal, and not only in terms of instantaneous
  60 values of the field.
  61
  62 A simple example of fields reconstruction comes from meteorology, in which one
  63 look for value of variables such as temperature or pressure in all points of the
  64 spatial domain. One have instantaneous measurements of these quantities at
  65 certain points, but also a history set of these measures. Moreover, these
  66 variables are constrained by evolution equations for the state of the
  67 atmosphere, which indicates for example that the pressure at a point can not
  68 take any value independently of the value at this same point in previous time.
  69 One must therefore make the reconstruction of a field at any point in space, in
  70 a "consistent" manner with the evolution equations and with the measures of the
  71 previous time steps.
  72
  73 Parameters identification, models adjustment, calibration
  74 ---------------------------------------------------------
  75
  76 .. index:: single: parameters identification
  77 .. index:: single: parameters adjustment
  78 .. index:: single: models adjustment
  79 .. index:: single: calibration
  80 .. index:: single: background
  81 .. index:: single: regularization
  82 .. index:: single: inverse problems
  83
  84 The **identification (or adjustment) of parameters** by data assimilation is a
  85 form of state calibration which uses both the physical measurement and an *a
  86 priori* parameters estimation (called the "*background*") of the state that one
  87 seeks to identify, as well as a characterization of their errors. From this
  88 point of view, it uses all available information on the physical system (even if
  89 assumptions about errors are relatively restrictive) to find the "*optimal
  90 estimation*" from the true state. We note, in terms of optimization, that the
  91 background realizes a "*regularization*", in a mathematical meaning, of the main
  92 problem of parameters identification. One can also use the terms "*inverse
  93 problems*" to refer to this process.
  94
  95 In practice, the two observed gaps "*calculation-background*" and
  96 "*calculation-measures*" are combined to build the calibration correction of
  97 parameters or initial conditions. The addition of these two gaps requires a
  98 relative weight, which is chosen to reflect the trust we give to each piece of
  99 information. This confidence is depicted by the covariance of the errors on the
 100 background and on the observations. Thus the stochastic aspect of information,
 101 measured or *a priori*, is essential for building the calibration error
 102 function.
 103
 104 A simple example of parameters identification comes from any kind of physical
 105 simulation process involving a parametrized model. For example, a static
 106 mechanical simulation of a beam constrained by some forces is described by beam
 107 parameters, such as a Young coefficient, or by the intensity of the force. The
 108 parameters estimation problem consists in finding for example the right Young
 109 coefficient value in order that the simulation of the beam corresponds to
 110 measurements, including the knowledge of errors.
 111
 112 Simple description of the data assimilation methodological framework
 113 --------------------------------------------------------------------
 114
 115 .. index:: single: background
 116 .. index:: single: background error covariances
 117 .. index:: single: observation error covariances
 118 .. index:: single: covariances
 119
 120 We can write these features in a simple manner. By default, all variables are
 121 vectors, as there are several parameters to readjust.
 122
 123 According to standard notations in data assimilation, we note
 124 :math:`\mathbf{x}^a` the optimal parameters that is to be determined by
 125 calibration, :math:`\mathbf{y}^o` the observations (or experimental
 126 measurements) that we must compare to the simulation outputs,
 127 :math:`\mathbf{x}^b` the background (*a priori* values, or regularization
 128 values) of searched parameters, :math:`\mathbf{x}^t` the unknown ideals
 129 parameters that would give exactly the observations (assuming that the errors
 130 are zero and the model is exact) as output.
 131
 132 In the simplest case, which is static, the steps of simulation and of
 133 observation can be combined into a single observation operator noted :math:`H`
 134 (linear or nonlinear). It transforms the input parameters :math:`\mathbf{x}` to
 135 results :math:`\mathbf{y}`, to be directly compared to observations
 136 :math:`\mathbf{y}^o`:
 137
 138 .. math:: \mathbf{y} = H(\mathbf{x})
 139
 140 Moreover, we use the linearized operator :math:`\mathbf{H}` to represent the
 141 effect of the full operator :math:`H` around a linearization point (and we omit
 142 thereafter to mention :math:`H` even if it is possible to keep it). In reality,
 143 we have already indicated that the stochastic nature of variables is essential,
 144 coming from the fact that model, background and observations are all incorrect.
 145 We therefore introduce errors of observations additively, in the form of a
 146 random vector :math:`\mathbf{\epsilon}^o` such that:
 147
 148 .. math:: \mathbf{y}^o = \mathbf{H} \mathbf{x}^t + \mathbf{\epsilon}^o
 149
 150 The errors represented here are not only those from observation, but also from
 151 the simulation. We can always consider that these errors are of zero mean.
 152 Noting :math:`E[.]` the classical mathematical expectation, we can then define a
 153 matrix :math:`\mathbf{R}` of the observation error covariances by:
 154
 155 .. math:: \mathbf{R} = E[\mathbf{\epsilon}^o.{\mathbf{\epsilon}^o}^T]
 156
 157 The background can also be written as a function of the true value, by
 158 introducing the error vector :math:`\mathbf{\epsilon}^b` such that:
 159
 160 .. math:: \mathbf{x}^b = \mathbf{x}^t + \mathbf{\epsilon}^b
 161
 162 where errors are also assumed of zero mean, in the same manner as for
 163 observations. We define the :math:`\mathbf{B}` matrix of background error
 164 covariances by:
 165
 166 .. math:: \mathbf{B} = E[\mathbf{\epsilon}^b.{\mathbf{\epsilon}^b}^T]
 167
 168 The optimal estimation of the true parameters :math:`\mathbf{x}^t`, given the
 169 background :math:`\mathbf{x}^b` and the observations :math:`\mathbf{y}^o`, is
 170 then the "*analysis*" :math:`\mathbf{x}^a` and comes from the minimisation of an
 171 error function (in variational assimilation) or from the filtering correction (in
 172 assimilation by filtering).
 173
 174 In **variational assimilation**, in a static case, one classically attempts to
 175 minimize the following function :math:`J`:
 176
 177 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 178
 179 which is usually designed as the "*3D-VAR*" function (see for example
 180 [Talagrand97]_). Since :math:`\mathbf{B}` and :math:`\mathbf{R}` covariance
 181 matrices are proportional to the variances of errors, their presence in both
 182 terms of the function :math:`J` can effectively weight the differences by
 183 confidence in the background or observations errors. The parameters vector
 184 :math:`\mathbf{x}` realizing the minimum of this function therefore constitute
 185 the analysis :math:`\mathbf{x}^a`. It is at this level that we have to use the
 186 full panoply of function minimization methods otherwise known in optimization
 187 (see also section `Going further in the state estimation by optimization
 188 methods`_). Depending on the size of the parameters vector :math:`\mathbf{x}` to
 189 identify, and of the availability of gradient or Hessian of :math:`J`, it is
 190 appropriate to adapt the chosen optimization method (gradient, Newton,
 191 quasi-Newton...).
 192
 193 In **assimilation by filtering**, in this simple case usually referred to as
 194 "*BLUE*" (for "*Best Linear Unbiased Estimator*"), the :math:`\mathbf{x}^a`
 195 analysis is given as a correction of the background :math:`\mathbf{x}^b` by a
 196 term proportional to the difference between observations :math:`\mathbf{y}^o`
 197 and calculations :math:`\mathbf{H}\mathbf{x}^b`:
 198
 199 .. math:: \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)
 200
 201 where :math:`\mathbf{K}` is the Kalman gain matrix, which is expressed using
 202 covariance matrices in the following form:
 203
 204 .. math:: \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}
 205
 206 The advantage of filtering is to explicitly calculate the gain, to produce then
 207 the *a posteriori* covariance analysis matrix.
 208
 209 In this simple static case, we can show, under an assumption of Gaussian error
 210 distributions (very little restrictive in practice), that the two *variational*
 211 and *filtering* approaches give the same solution.
 212
 213 It is indicated here that these methods of "*3D-VAR*" and "*BLUE*" may be
 214 extended to dynamic problems, called respectively "*4D-VAR*" and "*Kalman
 215 filter*". They can take into account the evolution operator to establish an
 216 analysis at the right time steps of the gap between observations and
 217 simulations, and to have, at every moment, the propagation of the background
 218 through the evolution model. Many other variants have been developed to improve
 219 the numerical quality of the methods or to take into account computer
 220 requirements such as calculation size and time.
 221
 222 Going further in the data assimilation framework
 223 ------------------------------------------------
 224
 225 .. index:: single: state estimation
 226 .. index:: single: parameter estimation
 227 .. index:: single: inverse problems
 228 .. index:: single: Bayesian estimation
 229 .. index:: single: optimal interpolation
 230 .. index:: single: mathematical regularization
 231 .. index:: single: regularization methods
 232 .. index:: single: data smoothing
 233
 234 To get more information about the data assimilation techniques, the reader can
 235 consult introductory documents like [Talagrand97]_ or [Argaud09]_, on-line
 236 training courses or lectures like [Bouttier99]_ and [Bocquet04]_ (along with
 237 other materials coming from geosciences applications), or general documents like
 238 [Talagrand97]_, [Tarantola87]_, [Kalnay03]_, [Ide97]_ and [WikipediaDA]_.
 239
 240 Note that data assimilation is not restricted to meteorology or geo-sciences,
 241 but is widely used in other scientific domains. There are several fields in
 242 science and technology where the effective use of observed but incomplete data
 243 is crucial.
 244
 245 Some aspects of data assimilation are also known as *state estimation*,
 246 *parameter estimation*, *inverse problems*, *Bayesian estimation*, *optimal
 247 interpolation*, *mathematical regularization*, *data smoothing*, etc. These
 248 terms can be used in bibliographical searches.
 249
 250 Going further in the state estimation by optimization methods
 251 -------------------------------------------------------------
 252
 253 .. index:: single: state estimation
 254 .. index:: single: optimization methods
 255
 256 As seen before, in a static simulation case, the variational data assimilation
 257 requires to minimize the goal function :math:`J`:
 258
 259 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 260
 261 which is named the "*3D-VAR*" function. It can be seen as a *least squares
 262 minimization* extented form, obtained by adding a regularizing term using
 263 :math:`\mathbf{x}-\mathbf{x}^b`, and by weighting the differences using
 264 :math:`\mathbf{B}` and :math:`\mathbf{R}` the two covariance matrices. The
 265 minimization of the :math:`J` function leads to the *best* :math:`\mathbf{x}`
 266 state estimation.
 267
 268 State estimation possibilities extension, by using more explicitly optimization
 269 methods and their properties, can be imagined in two ways.
 270
 271 First, classical optimization methods involves using various gradient-based
 272 minimizing procedures. They are extremely efficient to look for a single local
 273 minimum. But they require the goal function :math:`J` to be sufficiently regular
 274 and differentiable, and are not able to capture global properties of the
 275 minimization problem, for example: global minimum, set of equivalent solutions
 276 due to over-parametrization, multiple local minima, etc. **A way to extend
 277 estimation possibilities is then to use a whole range of optimizers, allowing
 278 global minimization, various robust search properties, etc**. There is a lot of
 279 minimizing methods, such as stochastic ones, evolutionary ones, heuristics and
 280 meta-heuristics for real-valued problems, etc. They can treat partially
 281 irregular or noisy function :math:`J`, can characterize local minima, etc. The
 282 main drawback is a greater numerical cost to find state estimates, and no
 283 guarantee of convergence in finite time. Here, we only point the following
 284 topics, as the methods are available in the ADAO module: *Quantile Regression*
 285 [WikipediaQR]_ and *Particle Swarm Optimization* [WikipediaPSO]_.
 286
 287 Secondly, optimization methods try usually to minimize quadratic measures of
 288 errors, as the natural properties of such goal functions are well suited for
 289 classical gradient optimization. But other measures of errors can be more
 290 adapted to real physical simulation problems. Then, **an another way to extend
 291 estimation possibilities is to use other measures of errors to be reduced**. For
 292 example, we can cite *absolute error value*, *maximum error value*, etc. These
 293 error measures are not differentiables, but some optimization methods can deal
 294 with:  heuristics and meta-heuristics for real-valued problem, etc. As
 295 previously, the main drawback remain a greater numerical cost to find state
 296 estimates, and no guarantee of convergence in finite time. Here, we point also
 297 the following methods as it is available in the ADAO module: *Particle swarm
 298 optimization* [WikipediaPSO]_.
 299
 300 The reader interested in the subject of optimization can look at [WikipediaMO]_
 301 as a general entry point.