doc/en/theory.rst

   1 .. _section_theory:
   2
   3 ================================================================================
   4 A brief introduction to Data Assimilation and Optimization
   5 ================================================================================
   6
   7 .. index:: single: Data Assimilation
   8 .. index:: single: true state
   9 .. index:: single: observation
  10 .. index:: single: a priori
  11
  12 **Data Assimilation** is a general framework for computing the optimal estimate
  13 of the true state of a system, over time if necessary. It uses values obtained
  14 by combining both observations and *a priori* models, including information
  15 about their errors.
  16
  17 In other words, data assimilation merges measurement data of a system, that are
  18 the observations, with *a priori* system physical and mathematical knowledge,
  19 embedded in numerical models, to obtain the best possible estimate of the system
  20 true state and of its stochastic properties. Note that this true state can not
  21 be reached, but can only be estimated. Moreover, despite the fact that the used
  22 information are stochastic by nature, data assimilation provides deterministic
  23 techniques in order to perform very efficiently the estimation.
  24
  25 Because data assimilation look for the **best possible** estimate, its
  26 underlying procedure always integrates optimization in order to find this
  27 estimate: particular optimization methods are always embedded in data
  28 assimilation algorithms. Optimization methods can be seen in ADAO as a way to
  29 extend data assimilation applications. They will be introduced this way in the
  30 section `Going further in the state estimation by optimization methods`_, but
  31 they are far more general and can be used without data assimilation concepts.
  32
  33 Two main types of applications exist in data assimilation, being covered by the
  34 same formalism: **parameters identification** and **fields reconstruction**.
  35 Before introducing the `Simple description of the data assimilation
  36 methodological framework`_ in a next section, we describe briefly these two
  37 types. At the end, some references allow `Going further in the data assimilation
  38 framework`_.
  39
  40 Fields reconstruction or measures interpolation
  41 -----------------------------------------------
  42
  43 .. index:: single: fields reconstruction
  44 .. index:: single: measures interpolation
  45
  46 Fields reconstruction consists in finding, from a restricted set of real
  47 measures, the physical field which is the most *consistent* with these measures.
  48
  49 This consistency is to understand in terms of interpolation, that is to say that
  50 the field we want to reconstruct, using data assimilation on measures, has to
  51 fit at best the measures, while remaining constrained by the overall field
  52 calculation. The calculation is thus an *a priori* estimation of the field that
  53 we seek to identify.
  54
  55 If the system evolves in time, the reconstruction has to be established on every
  56 time step, of the field as a whole. The interpolation process in this case is
  57 more complicated since it is temporal, and not only in terms of instantaneous
  58 values of the field.
  59
  60 A simple example of fields reconstruction comes from meteorology, in which one
  61 look for value of variables such as temperature or pressure in all points of the
  62 spatial domain. One have instantaneous measurements of these quantities at
  63 certain points, but also a history set of these measures. Moreover, these
  64 variables are constrained by evolution equations for the state of the
  65 atmosphere, which indicates for example that the pressure at a point can not
  66 take any value independently of the value at this same point in previous time.
  67 One must therefore make the reconstruction of a field at any point in space, in
  68 a "consistent" manner with the evolution equations and with the measures of the
  69 previous time steps.
  70
  71 Parameters identification, models adjustment, calibration
  72 ---------------------------------------------------------
  73
  74 .. index:: single: parameters identification
  75 .. index:: single: models adjustment
  76 .. index:: single: calibration
  77 .. index:: single: background
  78 .. index:: single: regularization
  79
  80 The identification of parameters by data assimilation is a form of state
  81 calibration which uses both the physical measurement and an *a priori*
  82 parameters estimation (called the "*background*") of the state that one seeks to
  83 identify, as well as a characterization of their errors. From this point of
  84 view, it uses all available information on the physical system (even if
  85 assumptions about errors are relatively restrictive) to find the "*optimal
  86 estimation*" from the true state. We note, in terms of optimization, that the
  87 background realizes a mathematical regularization of the main problem of
  88 parameters identification.
  89
  90 In practice, the two observed gaps "*calculation-background*" and
  91 "*calculation-measures*" are combined to build the calibration correction of
  92 parameters or initial conditions. The addition of these two gaps requires a
  93 relative weight, which is chosen to reflect the trust we give to each piece of
  94 information. This confidence is depicted by the covariance of the errors on the
  95 background and on the observations. Thus the stochastic aspect of information,
  96 measured or *a priori*, is essential for building the calibration error
  97 function.
  98
  99 A simple example of parameters identification comes from any kind of physical
 100 simulation process involving a parametrized model. For example, a static
 101 mechanical simulation of a beam constrained by some forces is described by beam
 102 parameters, such as a Young coefficient, or by the intensity of the force. The
 103 parameters estimation problem consists in finding for example the right Young
 104 coefficient value in order that the simulation of the beam corresponds to
 105 measurements, including the knowledge of errors.
 106
 107 Simple description of the data assimilation methodological framework
 108 --------------------------------------------------------------------
 109
 110 .. index:: single: background
 111 .. index:: single: background error covariances
 112 .. index:: single: observation error covariances
 113 .. index:: single: covariances
 114
 115 We can write these features in a simple manner. By default, all variables are
 116 vectors, as there are several parameters to readjust.
 117
 118 According to standard notations in data assimilation, we note
 119 :math:`\mathbf{x}^a` the optimal parameters that is to be determined by
 120 calibration, :math:`\mathbf{y}^o` the observations (or experimental
 121 measurements) that we must compare to the simulation outputs,
 122 :math:`\mathbf{x}^b` the background (*a priori* values, or regularization
 123 values) of searched parameters, :math:`\mathbf{x}^t` the unknown ideals
 124 parameters that would give exactly the observations (assuming that the errors
 125 are zero and the model is exact) as output.
 126
 127 In the simplest case, which is static, the steps of simulation and of
 128 observation can be combined into a single observation operator noted :math:`H`
 129 (linear or nonlinear). It transforms the input parameters :math:`\mathbf{x}` to
 130 results :math:`\mathbf{y}` to be directly compared to observations
 131 :math:`\mathbf{y}^o`. Moreover, we use the linearized operator
 132 :math:`\mathbf{H}` to represent the effect of the full operator :math:`H` around
 133 a linearization point (and we omit thereafter to mention :math:`H` even if it is
 134 possible to keep it). In reality, we have already indicated that the stochastic
 135 nature of variables is essential, coming from the fact that model, background
 136 and observations are all incorrect. We therefore introduce errors of
 137 observations additively, in the form of a random vector
 138 :math:`\mathbf{\epsilon}^o` such that:
 139
 140 .. math:: \mathbf{y}^o = \mathbf{H} \mathbf{x}^t + \mathbf{\epsilon}^o
 141
 142 The errors represented here are not only those from observation, but also from
 143 the simulation. We can always consider that these errors are of zero mean. We
 144 can then define a matrix :math:`\mathbf{R}` of the observation error covariances
 145 by:
 146
 147 .. math:: \mathbf{R} = E[\mathbf{\epsilon}^o.{\mathbf{\epsilon}^o}^T]
 148
 149 The background can also be written as a function of the true value, by
 150 introducing the error vector :math:`\mathbf{\epsilon}^b` such that:
 151
 152 .. math:: \mathbf{x}^b = \mathbf{x}^t + \mathbf{\epsilon}^b
 153
 154 where errors are also assumed of zero mean, in the same manner as for
 155 observations. We define the :math:`\mathbf{B}` matrix of background error
 156 covariances by:
 157
 158 .. math:: \mathbf{B} = E[\mathbf{\epsilon}^b.{\mathbf{\epsilon}^b}^T]
 159
 160 The optimal estimation of the true parameters :math:`\mathbf{x}^t`, given the
 161 background :math:`\mathbf{x}^b` and the observations :math:`\mathbf{y}^o`, is
 162 then the "*analysis*" :math:`\mathbf{x}^a` and comes from the minimisation of an
 163 error function (in variational assimilation) or from the filtering correction (in
 164 assimilation by filtering).
 165
 166 In **variational assimilation**, in a static case, one classically attempts to
 167 minimize the following function :math:`J`:
 168
 169 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 170
 171 which is usually designed as the "*3D-VAR*" function. Since :math:`\mathbf{B}`
 172 and :math:`\mathbf{R}` covariance matrices are proportional to the variances of
 173 errors, their presence in both terms of the function :math:`J` can effectively
 174 weight the differences by confidence in the background or observations errors.
 175 The parameters vector :math:`\mathbf{x}` realizing the minimum of this function
 176 therefore constitute the analysis :math:`\mathbf{x}^a`. It is at this level that
 177 we have to use the full panoply of function minimization methods otherwise known
 178 in optimization (see also section `Going further in the state estimation by
 179 optimization methods`_). Depending on the size of the parameters vector
 180 :math:`\mathbf{x}` to identify, and of the availability of gradient or Hessian
 181 of :math:`J`, it is appropriate to adapt the chosen optimization method
 182 (gradient, Newton, quasi-Newton...).
 183
 184 In **assimilation by filtering**, in this simple case usually referred to as
 185 "*BLUE*" (for "*Best Linear Unbiased Estimator*"), the :math:`\mathbf{x}^a`
 186 analysis is given as a correction of the background :math:`\mathbf{x}^b` by a
 187 term proportional to the difference between observations :math:`\mathbf{y}^o`
 188 and calculations :math:`\mathbf{H}\mathbf{x}^b`:
 189
 190 .. math:: \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)
 191
 192 where :math:`\mathbf{K}` is the Kalman gain matrix, which is expressed using
 193 covariance matrices in the following form:
 194
 195 .. math:: \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}
 196
 197 The advantage of filtering is to explicitly calculate the gain, to produce then
 198 the *a posteriori* covariance analysis matrix.
 199
 200 In this simple static case, we can show, under the assumption of Gaussian error
 201 distributions, that the two *variational* and *filtering* approaches are
 202 equivalent.
 203
 204 It is indicated here that these methods of "*3D-VAR*" and "*BLUE*" may be
 205 extended to dynamic problems, called respectively "*4D-VAR*" and "*Kalman
 206 filter*". They can take into account the evolution operator to establish an
 207 analysis at the right time steps of the gap between observations and
 208 simulations, and to have, at every moment, the propagation of the background
 209 through the evolution model. Many other variants have been developed to improve
 210 the numerical quality of the methods or to take into account computer
 211 requirements such as calculation size and time.
 212
 213 Going further in the data assimilation framework
 214 ------------------------------------------------
 215
 216 .. index:: single: state estimation
 217 .. index:: single: parameter estimation
 218 .. index:: single: inverse problems
 219 .. index:: single: Bayesian estimation
 220 .. index:: single: optimal interpolation
 221 .. index:: single: mathematical regularization
 222 .. index:: single: regularization methods
 223 .. index:: single: data smoothing
 224
 225 To get more information about the data assimilation techniques, the reader can
 226 consult introductory documents like [Argaud09]_, on-line training courses or
 227 lectures like [Bouttier99]_ and [Bocquet04]_ (along with other materials coming
 228 from geosciences applications), or general documents like [Talagrand97]_,
 229 [Tarantola87]_, [Kalnay03]_, [Ide97]_ and [WikipediaDA]_.
 230
 231 Note that data assimilation is not restricted to meteorology or geo-sciences,
 232 but is widely used in other scientific domains. There are several fields in
 233 science and technology where the effective use of observed but incomplete data
 234 is crucial.
 235
 236 Some aspects of data assimilation are also known as *state estimation*,
 237 *parameter estimation*, *inverse problems*, *Bayesian estimation*, *optimal
 238 interpolation*, *mathematical regularization*, *data smoothing*, etc. These
 239 terms can be used in bibliographical searches.
 240
 241 Going further in the state estimation by optimization methods
 242 -------------------------------------------------------------
 243
 244 .. index:: single: state estimation
 245 .. index:: single: optimization methods
 246
 247 As seen before, in a static simulation case, the variational data assimilation
 248 requires to minimize the goal function :math:`J`:
 249
 250 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 251
 252 which is named the "*3D-VAR*" function. It can be seen as a *least squares
 253 minimization* extented form, obtained by adding a regularizing term using
 254 :math:`\mathbf{x}-\mathbf{x}^b`, and by weighting the differences using
 255 :math:`\mathbf{B}` and :math:`\mathbf{R}` the two covariance matrices. The
 256 minimization of the :math:`J` function leads to the *best* `\mathbf{x}` state
 257 estimation.
 258
 259 State estimation possibilities extension, by using more explicitly optimization
 260 methods and their properties, can be imagined in two ways.
 261
 262 First, classical optimization methods involves using various gradient-based
 263 minimizing procedures. They are extremely efficient to look for a single local
 264 minimum. But they require the goal function :math:`J` to be sufficiently regular
 265 and differentiable, and are not able to capture global properties of the
 266 minimization problem, for example: global minimum, set of equivalent solutions
 267 due to over-parametrization, multiple local minima, etc. **A way to extend
 268 estimation possibilities is then to use a whole range of optimizers, allowing
 269 global minimization, various robust search properties, etc**. There is a lot of
 270 minimizing methods, such as stochastic ones, evolutionary ones, heuristics and
 271 meta-heuristics for real-valued problems, etc. They can treat partially
 272 irregular or noisy function :math:`J`, can characterize local minima, etc. The
 273 main drawback is a greater numerical cost to find state estimates, and no
 274 guarantee of convergence in finite time. Here, we only point the following
 275 topics, as the methods are available in the ADAO module: *Quantile Regression*
 276 [WikipediaQR]_ and *Particle Swarm Optimization* [WikipediaPSO]_.
 277
 278 Secondly, optimization methods try usually to minimize quadratic measures of
 279 errors, as the natural properties of such goal functions are well suited for
 280 classical gradient optimization. But other measures of errors can be more
 281 adapted to real physical simulation problems. Then, **an another way to extend
 282 estimation possibilities is to use other measures of errors to be reduced**. For
 283 example, we can cite *absolute error value*, *maximum error value*, etc. These
 284 error measures are not differentiables, but some optimization methods can deal
 285 with:  heuristics and meta-heuristics for real-valued problem, etc. As
 286 previously, the main drawback remain a greater numerical cost to find state
 287 estimates, and no guarantee of convergence in finite time. Here, we point also
 288 the following methods as it is available in the ADAO module: *Particle swarm
 289 optimization* [WikipediaPSO]_.
 290
 291 The reader interested in the subject of optimization can look at [WikipediaMO]_
 292 as a general entry point.