doc/theory.rst

   1 .. _section_theory:
   2
   3 ================================================================================
   4 A brief introduction to Data Assimilation and Optimization
   5 ================================================================================
   6
   7 .. index:: single: Data Assimilation
   8 .. index:: single: true state
   9 .. index:: single: observation
  10 .. index:: single: a priori
  11
  12 **Data Assimilation** is a general framework for computing the optimal estimate
  13 of the true state of a system, over time if necessary. It uses values obtained
  14 by combining both observations and *a priori* models, including information
  15 about their errors.
  16
  17 In other words, data assimilation merges measurement data of a system, that are
  18 the observations, with *a priori* system physical and mathematical knowledge,
  19 embedded in numerical models, to obtain the best possible estimate of the system
  20 true state and of its stochastic properties. Note that this true state can not
  21 be reached, but can only be estimated. Moreover, despite the fact that the used
  22 information are stochastic by nature, data assimilation provides deterministic
  23 techniques in order to realize the estimation.
  24
  25 Because data assimilation look for the **best possible** estimate, its
  26 underlying procedure always integrates optimization in order to find this
  27 estimate: particular optimization methods are always embedded in data
  28 assimilation algorithms. Optimization methods can be seen here as a way to
  29 extend data assimilation applications. They will be introduced this way in the
  30 section `Going further in the state estimation by optimization methods`_, but
  31 they are far more general and can be used without data assimilation concepts.
  32
  33 Two main types of applications exist in data assimilation, being covered by the
  34 same formalism: **parameters identification** and **fields reconstruction**.
  35 Before introducing the `Simple description of the data assimilation framework`_
  36 in a next section, we describe briefly these two types. At the end, some
  37 references allow `Going further in the data assimilation framework`_.
  38
  39 Fields reconstruction or measures interpolation
  40 -----------------------------------------------
  41
  42 .. index:: single: fields reconstruction
  43
  44 Fields reconstruction consists in finding, from a restricted set of real
  45 measures, the physical field which is the most *consistent* with these measures.
  46
  47 This consistency is to understand in terms of interpolation, that is to say that
  48 the field we want to reconstruct, using data assimilation on measures, has to
  49 fit at best the measures, while remaining constrained by the overall
  50 calculation. The calculation is thus an *a priori* estimation of the field that
  51 we seek to identify.
  52
  53 If the system evolves in time, the reconstruction has to be established on every
  54 time step, as a whole. The interpolation process in this case is more
  55 complicated since it is temporal, not only in terms of instantaneous values of
  56 the field.
  57
  58 A simple example of fields reconstruction comes from meteorology, in which one
  59 look for value of variables such as temperature or pressure in all points of the
  60 spatial domain. One have instantaneous measurements of these quantities at
  61 certain points, but also a history set of these measures. Moreover, these
  62 variables are constrained by evolution equations for the state of the
  63 atmosphere, which indicates for example that the pressure at a point can not
  64 take any value independently of the value at this same point in previous time.
  65 One must therefore make the reconstruction of a field at any point in space, in
  66 a "consistent" manner with the evolution equations and with the measures of the
  67 previous time steps.
  68
  69 Parameters identification or calibration
  70 ----------------------------------------
  71
  72 .. index:: single: parameters identification
  73
  74 The identification of parameters by data assimilation is a form of calibration
  75 which uses both the measurement and an *a priori* estimation (called the
  76 "*background*") of the state that one seeks to identify, as well as a
  77 characterization of their errors. From this point of view, it uses all available
  78 information on the physical system (even if assumptions about errors are
  79 relatively restrictive) to find the "*optimal*" estimation from the true state.
  80 We note, in terms of optimization, that the background realizes a mathematical
  81 regularization of the main problem of parameters identification.
  82
  83 In practice, the two observed gaps "*calculation-background*" and
  84 "*calculation-measures*" are added to build the calibration correction of
  85 parameters or initial conditions. The addition of these two gaps requires a
  86 relative weight, which is chosen to reflect the trust we give to each piece of
  87 information. This confidence is measured by the covariance of the errors on the
  88 background and on the observations. Thus the stochastic aspect of information,
  89 measured or *a priori*, is essential for building the calibration error
  90 function.
  91
  92 A simple example of parameters identification comes from any kind of physical
  93 simulation process involving a parametrized model. For example, a static
  94 mechanical simulation of a beam constrained by some forces is described by beam
  95 parameters, such as a Young coefficient, or by the intensity of the force. The
  96 parameter estimation problem consists in finding for example the right Young
  97 coefficient in order that the simulation of the beam corresponds to
  98 measurements, including the knowledge of errors.
  99
 100 Simple description of the data assimilation framework
 101 -----------------------------------------------------
 102
 103 .. index:: single: background
 104 .. index:: single: background error covariances
 105 .. index:: single: observation error covariances
 106 .. index:: single: covariances
 107
 108 We can write these features in a simple manner. By default, all variables are
 109 vectors, as there are several parameters to readjust.
 110
 111 According to standard notations in data assimilation, we note
 112 :math:`\mathbf{x}^a` the optimal parameters that is to be determined by
 113 calibration, :math:`\mathbf{y}^o` the observations (or experimental
 114 measurements) that we must compare to the simulation outputs,
 115 :math:`\mathbf{x}^b` the background (*a priori* values, or regularization
 116 values) of searched parameters, :math:`\mathbf{x}^t` the unknown ideals
 117 parameters that would give exactly the observations (assuming that the errors
 118 are zero and the model is exact) as output.
 119
 120 In the simplest case, which is static, the steps of simulation and of
 121 observation can be combined into a single observation operator noted :math:`H`
 122 (linear or nonlinear), which transforms the input parameters :math:`\mathbf{x}`
 123 to results :math:`\mathbf{y}` to be compared to observations
 124 :math:`\mathbf{y}^o`. Moreover, we use the linearized operator
 125 :math:`\mathbf{H}` to represent the effect of the full operator :math:`H` around
 126 a linearization point (and we omit thereafter to mention :math:`H` even if it is
 127 possible to keep it). In reality, we have already indicated that the stochastic
 128 nature of variables is essential, coming from the fact that model, background
 129 and observations are incorrect. We therefore introduce errors of observations
 130 additively, in the form of a random vector :math:`\mathbf{\epsilon}^o` such
 131 that:
 132
 133 .. math:: \mathbf{y}^o = \mathbf{H} \mathbf{x}^t + \mathbf{\epsilon}^o
 134
 135 The errors represented here are not only those from observation, but also from
 136 the simulation. We can always consider that these errors are of zero mean. We
 137 can then define a matrix :math:`\mathbf{R}` of the observation error covariances
 138 by:
 139
 140 .. math:: \mathbf{R} = E[\mathbf{\epsilon}^o.{\mathbf{\epsilon}^o}^T]
 141
 142 The background can also be written as a function of the true value, by
 143 introducing the error vector :math:`\mathbf{\epsilon}^b`:
 144
 145 .. math:: \mathbf{x}^b = \mathbf{x}^t + \mathbf{\epsilon}^b
 146
 147 where errors are also assumed of zero mean, in the same manner as for
 148 observations. We define the :math:`\mathbf{B}` matrix of background error
 149 covariances by:
 150
 151 .. math:: \mathbf{B} = E[\mathbf{\epsilon}^b.{\mathbf{\epsilon}^b}^T]
 152
 153 The optimal estimation of the true parameters :math:`\mathbf{x}^t`, given the
 154 background :math:`\mathbf{x}^b` and the observations :math:`\mathbf{y}^o`, is
 155 then the "*analysis*" :math:`\mathbf{x}^a` and comes from the minimisation of an
 156 error function (in variational assimilation) or from the filtering correction (in
 157 assimilation by filtering).
 158
 159 In **variational assimilation**, in a static case, one classically attempts to
 160 minimize the following function :math:`J`:
 161
 162 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 163
 164 which is usually designed as the "*3D-VAR*" function. Since :math:`\mathbf{B}`
 165 and :math:`\mathbf{R}` covariance matrices are proportional to the variances of
 166 errors, their presence in both terms of the function :math:`J` can effectively
 167 weight the differences by confidence in the background or observations. The
 168 parameters vector :math:`\mathbf{x}` realizing the minimum of this function
 169 therefore constitute the analysis :math:`\mathbf{x}^a`. It is at this level that
 170 we have to use the full panoply of function minimization methods otherwise known
 171 in optimization (see also section `Going further in the state estimation by
 172 optimization methods`_). Depending on the size of the parameters vector
 173 :math:`\mathbf{x}` to identify and of the availability of gradient and Hessian
 174 of :math:`J`, it is appropriate to adapt the chosen optimization method
 175 (gradient, Newton, quasi-Newton...).
 176
 177 In **assimilation by filtering**, in this simple case usually referred to as
 178 "*BLUE*" (for "*Best Linear Unbiased Estimator*"), the :math:`\mathbf{x}^a`
 179 analysis is given as a correction of the background :math:`\mathbf{x}^b` by a
 180 term proportional to the difference between observations :math:`\mathbf{y}^o`
 181 and calculations :math:`\mathbf{H}\mathbf{x}^b`:
 182
 183 .. math:: \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)
 184
 185 where :math:`\mathbf{K}` is the Kalman gain matrix, which is expressed using
 186 covariance matrices in the following form:
 187
 188 .. math:: \mathbf{K} = \mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T+\mathbf{R})^{-1}
 189
 190 The advantage of filtering is to explicitly calculate the gain, to produce then
 191 the *a posteriori* covariance analysis matrix.
 192
 193 In this simple static case, we can show, under the assumption of Gaussian error
 194 distributions, that the two *variational* and *filtering* approaches are
 195 equivalent.
 196
 197 It is indicated here that these methods of "*3D-VAR*" and "*BLUE*" may be
 198 extended to dynamic problems, called respectively "*4D-VAR*" and "*Kalman
 199 filter*". They can take into account the evolution operator to establish an
 200 analysis at the right time steps of the gap between observations and simulations,
 201 and to have, at every moment, the propagation of the background through the
 202 evolution model. Many other variants have been developed to improve the
 203 numerical quality or to take into account computer requirements such as
 204 calculation size and time.
 205
 206 Going further in the data assimilation framework
 207 ------------------------------------------------
 208
 209 .. index:: single: state estimation
 210 .. index:: single: parameter estimation
 211 .. index:: single: inverse problems
 212 .. index:: single: Bayesian estimation
 213 .. index:: single: optimal interpolation
 214 .. index:: single: mathematical regularization
 215 .. index:: single: data smoothing
 216
 217 To get more information about all the data assimilation techniques, the reader
 218 can consult introductory documents like [Argaud09]_, on-line training courses or
 219 lectures like [Bouttier99]_ and [Bocquet04]_ (along with other materials coming
 220 from geosciences applications), or general documents like [Talagrand97]_,
 221 [Tarantola87]_, [Kalnay03]_, [Ide97]_ and [WikipediaDA]_.
 222
 223 Note that data assimilation is not restricted to meteorology or geo-sciences, but
 224 is widely used in other scientific domains. There are several fields in science
 225 and technology where the effective use of observed but incomplete data is
 226 crucial.
 227
 228 Some aspects of data assimilation are also known as *state estimation*,
 229 *parameter estimation*, *inverse problems*, *Bayesian estimation*, *optimal
 230 interpolation*, *mathematical regularization*, *data smoothing*, etc. These
 231 terms can be used in bibliographical searches.
 232
 233 Going further in the state estimation by optimization methods
 234 -------------------------------------------------------------
 235
 236 .. index:: single: state estimation
 237 .. index:: single: optimization methods
 238
 239 As seen before, in a static simulation case, the variational data assimilation
 240 requires to minimize the goal function :math:`J`:
 241
 242 .. math:: J(\mathbf{x})=(\mathbf{x}-\mathbf{x}^b)^T.\mathbf{B}^{-1}.(\mathbf{x}-\mathbf{x}^b)+(\mathbf{y}^o-\mathbf{H}.\mathbf{x})^T.\mathbf{R}^{-1}.(\mathbf{y}^o-\mathbf{H}.\mathbf{x})
 243
 244 which is named the "*3D-VAR*" function. It can be seen as a *least squares
 245 minimization* extented form, obtained by adding a regularizing term using
 246 :math:`\mathbf{x}-\mathbf{x}^b`, and by weighting the differences using
 247 :math:`\mathbf{B}` and :math:`\mathbf{R}` the two covariance matrices. The
 248 minimization of the :math:`J` function leads to the *best* state estimation.
 249
 250 State estimation possibilities extension, by using more explicitly optimization
 251 methods and their properties, can be imagined in two ways.
 252
 253 First, classical optimization methods involves using various gradient-based
 254 minimizing procedures. They are extremely efficient to look for a single local
 255 minimum. But they require the goal function :math:`J` to be sufficiently regular
 256 and differentiable, and are not able to capture global properties of the
 257 minimization problem, for example: global minimum, set of equivalent solutions
 258 due to over-parametrization, multiple local minima, etc. **A way to extend
 259 estimation possibilities is then to use a whole range of optimizers, allowing
 260 global minimization, various robust search properties, etc**. There is a lot of
 261 minimizing methods, such as stochastic ones, evolutionary ones, heuristics and
 262 meta-heuristics for real-valued problems, etc. They can treat partially irregular
 263 or noisy function :math:`J`, can characterize local minima, etc. The main
 264 drawback is a greater numerical cost to find state estimates, and no guarantee
 265 of convergence in finite time. Here, we only point the following
 266 topics, as the methods are available in the ADAO module: *Quantile regression*
 267 [WikipediaQR]_ and *Particle swarm optimization* [WikipediaPSO]_.
 268
 269 Secondly, optimization methods try usually to minimize quadratic measures of
 270 errors, as the natural properties of such goal functions are well suited for
 271 classical gradient optimization. But other measures of errors can be more
 272 adapted to real physical simulation problems. Then, **an another way to extend
 273 estimation possibilities is to use other measures of errors to be reduced**. For
 274 example, we can cite *absolute error value*, *maximum error value*, etc. These
 275 error measures are not differentiables, but some optimization methods can deal
 276 with:  heuristics and meta-heuristics for real-valued problem, etc. As
 277 previously, the main drawback remain a greater numerical cost to find state
 278 estimates, and no guarantee of convergence in finite time. Here, we point also
 279 the following methods as it is available in the ADAO module: *Particle swarm
 280 optimization* [WikipediaPSO]_.
 281
 282 The reader interested in the subject of optimization can look at [WikipediaMO]_
 283 as a general entry point.