..
   Copyright (C) 2008-2017 EDF R&D

   This file is part of SALOME ADAO module.

   This library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   This library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this library; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA

   See http://www.salome-platform.org/ or email : webmaster.salome@opencascade.com

   Author: Jean-Philippe Argaud, jean-philippe.argaud@edf.fr, EDF R&D

.. _section_methodology:

================================================================================
**[DocT]** Methodology to elaborate a Data Assimilation or Optimization study
================================================================================

This section presents the methodology for building a Data Assimilation or
Optimization study. It describes the conceptual steps needed to build such a
study autonomously. It does not depend on any tool, but the ADAO module makes it
possible to set up such a study efficiently, following :ref:`section_using`.
Notations are the same as the ones used in :ref:`section_theory`.

Logical procedure for a study
-----------------------------

For a generic Data Assimilation or Optimization study, the main methodological
steps can be the following:

    - :ref:`section_m_step1`
    - :ref:`section_m_step2`
    - :ref:`section_m_step3`
    - :ref:`section_m_step4`
    - :ref:`section_m_step5`
    - :ref:`section_m_step6`
    - :ref:`section_m_step7`

Each step will be detailed in the next sections.

Detailed procedure for a study
------------------------------

.. _section_m_step1:

STEP 1: Specifying the resolution of the physical problem and the parameters to adjust
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

An essential source of knowledge about the physical system to be studied is the
numerical simulation, often available through one or more calculation cases and
symbolized as a **simulation operator** (previously included in :math:`H`). A
standard calculation case gathers model hypotheses, numerical implementation,
computing capacities, etc. in order to represent the behavior of the physical
system. Moreover, a calculation case is characterized by its computing time and
memory requirements, by the size of its data and results, etc. Knowing all these
elements is of primary importance when setting up the data assimilation or
optimization study.

To state the study correctly, one also has to state or choose the unknowns of
the simulation. For example, these can be expressed through physical models
whose parameters can be adjusted. Moreover, it is always useful to add some
knowledge of sensitivity, for example the sensitivity of the numerical
simulation to the parameters that can be adjusted. More general elements, like
stability or regularity of the simulation with respect to the unknown inputs,
are also of great interest.

Technically, optimization methods can require gradient information of the
simulation with respect to the unknowns. In this case, an explicit gradient code
has to be provided, or a numerical gradient has to be tuned. The quality of the
gradient is related to the stability or regularity of the code, and it has to be
checked carefully before starting optimization calculations.
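
As an illustration of such a check, the following minimal sketch compares an
explicit gradient with a centered finite-difference approximation, for
decreasing increments. It is only a sketch under stated assumptions: the
functions ``simulation`` and ``explicit_jacobian``, the test point and the list
of increments are hypothetical stand-ins chosen for the example, they are not
part of ADAO, and NumPy is assumed to be available.

.. code-block:: python

    import numpy as np

    INCREMENTS = (1e-1, 1e-2, 1e-3, 1e-4)  # hypothetical increments to test

    def simulation(x):
        """Hypothetical simulation operator: 2 parameters -> 3 outputs."""
        a, b = x
        return np.array([a * b, a ** 2 + b, np.sin(a) + 3.0 * b])

    def explicit_jacobian(x):
        """Hand-coded Jacobian of the hypothetical simulation above."""
        a, b = x
        return np.array([[b, a], [2.0 * a, 1.0], [np.cos(a), 3.0]])

    def finite_difference_jacobian(func, x, increment):
        """Centered finite-difference approximation of the Jacobian at x."""
        x = np.asarray(x, dtype=float)
        columns = []
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx[i] = increment
            columns.append((func(x + dx) - func(x - dx)) / (2.0 * increment))
        return np.stack(columns, axis=1)

    def gradient_check(func, jacobian, x, increments=INCREMENTS):
        """Relative gap between explicit and numerical Jacobians, per increment."""
        reference = jacobian(x)
        gaps = []
        for increment in increments:
            approx = finite_difference_jacobian(func, x, increment)
            gap = np.linalg.norm(approx - reference) / np.linalg.norm(reference)
            gaps.append(float(gap))
        return gaps

    x_test = np.array([0.7, -1.2])
    gaps = gradient_check(simulation, explicit_jacobian, x_test)
    for increment, gap in zip(INCREMENTS, gaps):
        print(f"increment={increment:.0e}  relative gap={gap:.3e}")

For a correct explicit gradient and a regular simulation, the gap is expected to
decrease with the increment until round-off errors dominate. A gap that
stagnates at a large value suggests an error in the gradient code or a lack of
regularity of the simulation.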

An **observation operator** is always required, as a complement to the
simulation operator. This observation operator, denoted as :math:`H` or included
in it, has to convert the numerical simulation outputs into something that is
directly comparable to the observations. It is an essential operator, as it is
the way to compare simulations and observations. The conversion is usually done
by sampling, projection or integration of the numerical outputs, but it can be
more complicated. In simple data assimilation schemes, the observation operator
often directly follows the simulation one.
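
When the observation operator follows the simulation one, the chaining can be
sketched as a composition. The symbols :math:`S`, for the simulation operator,
and :math:`O`, for the conversion of the simulation outputs into observable
quantities, are hypothetical notations introduced only for this illustration:

.. math:: H(\mathbf{x}) = O\left(S(\mathbf{x})\right)

The simulated counterpart :math:`H(\mathbf{x})` can then be directly compared to
the observations, for example through the difference between the observations
and :math:`H(\mathbf{x})`.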

.. _section_m_step2:

STEP 2: Specifying the criteria for physical results qualification
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Because the studied systems are real physical ones, it is of great importance to
express the **physical information that can help to qualify a simulated state**.
There are two main types of such information, which lead to criteria allowing
qualification and quantification of future results.

First, coming from numerical or mathematical knowledge, many standard criteria
make it possible to qualify, in relative or absolute terms, the quality of a
state. For example, balance equations or equation closing conditions are good
complementary measures of the quality of an optimized state. Criteria like RMS,
RMSE, field extrema, integrals, etc. are also of great interest to assess this
quality.

Second, coming from physical or experimental knowledge, valuable information can
be obtained on the meaning of optimized states or results. In particular,
physical validity or technical interest can be used to assess the mathematical
results of the optimization.

In order to get helpful information from these two main types of knowledge, it
is recommended, if possible, to build numerical criteria to ease the assessment
of the quality of physical results.
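
As a hedged illustration of such numerical criteria, the following minimal
sketch computes a few of the indicators mentioned above (RMSE, field extrema and
the residual of a balance equation) for a simulated state. All the function
names, the sample values and the balance relation itself are hypothetical and
chosen only for the example; NumPy is assumed to be available.

.. code-block:: python

    import numpy as np

    def rmse(simulated, observed):
        """Root mean square error between simulated values and observations."""
        simulated = np.asarray(simulated, dtype=float)
        observed = np.asarray(observed, dtype=float)
        return float(np.sqrt(np.mean((simulated - observed) ** 2)))

    def field_extrema(field):
        """Minimum and maximum of a simulated field."""
        field = np.asarray(field, dtype=float)
        return float(field.min()), float(field.max())

    def balance_residual(inflow, outflow, storage_change):
        """Residual of a hypothetical balance: inflow - outflow - storage change."""
        return float(inflow - outflow - storage_change)

    def quality_report(simulated, observed, inflow, outflow, storage_change):
        """Gather the criteria in a dictionary, to compare candidate states."""
        lowest, highest = field_extrema(simulated)
        return {
            "rmse": rmse(simulated, observed),
            "min": lowest,
            "max": highest,
            "balance_residual": balance_residual(inflow, outflow, storage_change),
        }

    # Hypothetical values, for illustration only
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [1.5, 2.0, 2.5, 4.0]
    report = quality_report(simulated, observed,
                            inflow=10.0, outflow=7.5, storage_change=2.5)
    for name, value in report.items():
        print(f"{name}: {value:.4f}")

Gathering the criteria in a single report is only one possible design choice. It
makes it easier to compare the *a priori* state and the optimized one with the
same indicators.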

.. _section_m_step3:

STEP 3: Identifying and describing the available observations
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

As the second main source of knowledge of the physical system to be studied, the
**observations, or measures,** denoted as :math:`\mathbf{y}^o`, have to be
properly described. The quality of the measures, their intrinsic errors and
their special features are worth knowing, in order to introduce this information
into the data assimilation or optimization calculations.

The observations not only have to be available, but also have to be easy to
introduce into the numerical framework of calculation or optimization. So the
computing environment giving access to the observations is of great importance
to ease the effective use of various measures and sources of measures, and to
promote extensive tests using measures. The computing environment covers the
availability of the data in a database or not, the data formats, the computing
interfaces, etc.

.. _section_m_step4:

STEP 4: Specifying the DA/Optimization modeling elements (covariance, background...)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Additional Data Assimilation and Optimization modeling elements make it possible
to improve the information about the fine physical representation of the studied
system.

The *a priori* knowledge of the system state can be modeled using the
**background**, denoted as :math:`\mathbf{x}^b`, and the **background error
covariance matrix**, denoted as :math:`\mathbf{B}`. This information is
extremely important to provide, in particular in order to obtain meaningful
results from Data Assimilation.

On the other hand, information on observation errors can be used to fill the
**observation error covariance matrix**, denoted as :math:`\mathbf{R}`. As for
:math:`\mathbf{B}`, it is recommended to use carefully checked data to fill
these covariance matrices.
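
As a simple illustration, and only under the assumption (not stated in this
section) that the errors are uncorrelated, both matrices reduce to diagonal
matrices built from error standard deviations. The symbols :math:`\sigma^b_i`
and :math:`\sigma^o_j`, for the standard deviations of the background and
observation errors, and the sizes :math:`n` and :math:`p` are hypothetical
notations introduced for this sketch:

.. math:: \mathbf{B} = \mathrm{diag}\left((\sigma^b_1)^2, \ldots, (\sigma^b_n)^2\right) \quad \mathrm{and} \quad \mathbf{R} = \mathrm{diag}\left((\sigma^o_1)^2, \ldots, (\sigma^o_p)^2\right)

In such a setting, observation standard deviations that are small compared to
the background ones give more weight to the observations than to the background,
and conversely.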

In the case of dynamic simulation, one also has to define an **evolution
operator** and the associated error covariance matrix.

.. _section_m_step5:

STEP 5: Choosing the algorithms and their parameters
++++++++++++++++++++++++++++++++++++++++++++++++++++

Data Assimilation or Optimization requires solving an optimization problem, most
often modeled as a minimization problem. Depending on the availability of the
gradient of the cost function with respect to the optimization parameters, the
recommended class of optimization methods differs. Variational or locally
linearized minimization methods require this gradient. On the contrary,
derivative-free optimization methods do not require this gradient, but usually
come at a higher computational cost.
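
To make this concrete, a classical example is the quadratic cost function of
variational data assimilation, written here with the notations of this section.
It is given as a hedged illustration for the particular case where the
observation operator is linear and represented by a matrix :math:`\mathbf{H}`
(an assumption made for this sketch); the actual cost function depends on the
chosen algorithm:

.. math:: J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}^b)^T\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}^b) + \frac{1}{2}(\mathbf{y}^o-\mathbf{H}\mathbf{x})^T\mathbf{R}^{-1}(\mathbf{y}^o-\mathbf{H}\mathbf{x})

Under the same assumption, and because :math:`\mathbf{B}` and :math:`\mathbf{R}`
are symmetric, its gradient with respect to :math:`\mathbf{x}` is:

.. math:: \nabla J(\mathbf{x}) = \mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}^b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}^o-\mathbf{H}\mathbf{x})

A gradient-based method uses both expressions, whereas a derivative-free method
only evaluates :math:`J`.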

Within a class of optimization methods, there is usually a trade-off between the
*"generic capacity of a method"* and the *"particular performance on a specific
problem"*. Generic methods, such as variational minimization using the
:ref:`section_ref_algorithm_3DVAR`, present remarkable properties of efficiency,
robustness and reliability, which leads to recommending them independently of
the problem. Moreover, it is generally difficult to tune the parameters of an
optimization method, so the most robust one is often the one with the fewest
parameters. Finally, at least at the beginning, it is recommended to use the
most generic method and to change the known default parameters as little as
possible.

.. _section_m_step6:

STEP 6: Conducting the calculations and getting the results
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

After setting up the Data Assimilation or Optimization study, the calculation
has to be done in an efficient way.

Because optimizing usually involves many elementary physical simulations of the
system, the calculations are often done in a High Performance Computing (HPC)
environment to reduce the overall user time. Even if the optimization problem is
small, the simulation time can be long, requiring efficient computing resources.
These requirements have to be taken into account early enough in the study
procedure to be satisfied without needing too much effort.

For the same reason of high computing requirements, it is important to carefully
prepare the outputs of the optimization procedure. The optimal state is the main
required information, but a lot of other special information can be obtained
during or at the end of the optimization process: error evaluations,
intermediary states, quality indicators, etc. All this information, sometimes
requiring additional processing, has to be requested at the beginning of the
optimization process.

.. _section_m_step7:

STEP 7: Exploiting the results and qualifying their physical properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Once the results are obtained, they have to be interpreted in terms of physical
and numerical meaning. Even if the optimization calculation always gives a new
optimal state at least as good as the *a priori* one, and hopefully better, this
optimal state has, for example, to be checked with respect to the quality
criteria identified in :ref:`section_m_step2`. This can lead to physical,
statistical or numerical studies in order to assess the interest of the optimal
state to represent the physical system.

Besides this analysis, which has to be done for each Data Assimilation or
Optimization study, it can be worthwhile to exploit the optimization results as
part of a more complete study of the physical system.