..
   Copyright (C) 2008-2018 EDF R&D

   This file is part of SALOME ADAO module.

   This library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   This library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with this library; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA

   See http://www.salome-platform.org/ or email : webmaster.salome@opencascade.com

   Author: Jean-Philippe Argaud, jean-philippe.argaud@edf.fr, EDF R&D

.. _section_methodology:

================================================================================
**[DocT]** Methodology to elaborate a Data Assimilation or Optimization study
================================================================================

This section presents a generic methodology to build a Data Assimilation or
Optimization study. It describes the conceptual steps needed to build such a
study autonomously. It does not depend on any tool, but the ADAO module allows
such a study to be set up efficiently, following :ref:`section_using`. Notations
are the same as the ones used in :ref:`section_theory`.

Logical procedure for a study
-----------------------------

For a generic Data Assimilation or Optimization study, the main methodological
steps can be the following:

    - :ref:`section_m_step1`
    - :ref:`section_m_step2`
    - :ref:`section_m_step3`
    - :ref:`section_m_step4`
    - :ref:`section_m_step5`
    - :ref:`section_m_step6`
    - :ref:`section_m_step7`

Each step will be detailed in the next sections.

Detailed procedure for a study
------------------------------

.. _section_m_step1:

STEP 1: Specifying the resolution of the physical problem and the parameters to adjust
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

An essential piece of knowledge about the studied physical system is its
numerical simulation. It is often available through one or more calculation
cases, and is symbolized as a **simulation operator** (previously included in
:math:`H`). A standard calculation case gathers model hypotheses, numerical
implementation, computing capacities, etc. in order to represent the behavior of
the physical system. Moreover, a calculation case is characterized, for example,
by its computing time and memory requirements, its data and results sizes, etc.
Knowing all these elements is of primary importance when setting up the data
assimilation or optimization study.

To state the study correctly, one also has to choose the optimization unknowns
in the simulation. Frequently, these are parameters of physical models that can
be adjusted. Moreover, it is always useful to add some knowledge of sensitivity,
for example of the numerical simulation to the adjustable parameters. More
general elements, like stability or regularity of the simulation with respect to
the unknown inputs, are also of great interest.

Technically, optimization methods can require gradient information of the
simulation with respect to the unknowns. In this case, explicit gradient code
has to be provided, or the numerical gradient has to be tuned. Its quality is
related to the stability or regularity of the simulation code, and it has to be
checked carefully before starting optimization calculations. Specific conditions
have to be used for these checks.
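
As an illustration of such a gradient check, the following minimal sketch
compares a hand-written gradient with a centered finite-difference
approximation for several increments. The functions ``simulation`` and
``explicit_gradient`` are hypothetical stand-ins for a real simulation code and
its gradient code; they are not part of ADAO.

.. code-block:: python

    import math

    def simulation(x):
        """Hypothetical stand-in for a simulation code with a scalar output."""
        return math.sin(x[0]) * math.exp(x[1]) + x[0] * x[1]

    def explicit_gradient(x):
        """Hand-written gradient of the toy simulation above."""
        return [
            math.cos(x[0]) * math.exp(x[1]) + x[1],
            math.sin(x[0]) * math.exp(x[1]) + x[0],
        ]

    def centered_gradient(function, x, step):
        """Approximate the gradient of function at x by centered differences."""
        gradient = []
        for i in range(len(x)):
            x_plus = list(x)
            x_minus = list(x)
            x_plus[i] += step
            x_minus[i] -= step
            gradient.append((function(x_plus) - function(x_minus)) / (2.0 * step))
        return gradient

    def max_gradient_gap(function, gradient, x, step):
        """Largest componentwise gap between explicit and numerical gradients."""
        numerical = centered_gradient(function, x, step)
        return max(abs(a - b) for a, b in zip(gradient(x), numerical))

    # The gap should shrink with the increment until round-off errors dominate.
    x_check = [1.0, -2.0]
    for step in (1e-2, 1e-4, 1e-6):
        print(step, max_gradient_gap(simulation, explicit_gradient, x_check, step))

A gap that does not decrease when the increment is reduced usually points
either to an error in the explicit gradient or to a lack of regularity of the
simulation around the checking point.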

An **observation operator** is always required, in addition to the simulation
operator. This observation operator, denoted as :math:`H` or included in it, has
to convert the numerical simulation outputs into something that is directly
comparable to the observations. It is an essential operator, as it is the
practical way to compare simulations and observations. It usually consists of
sampling, projecting or integrating the simulation outputs, but it can be more
complicated. Often, because the observation operator directly follows the
simulation one in simple data assimilation schemes, it relies heavily on the
postprocessing and extraction capacities of the simulation code.
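
The composition of a simulation with a sampling and an integration can be
sketched as follows. This is only an illustrative toy: the names
``simulate_field``, ``sample``, ``mean_value`` and ``observation_operator``, the
1D grid and the sampled indices are all assumptions made for the example.

.. code-block:: python

    def simulate_field(parameters, n_points=11):
        """Hypothetical simulation: a linear profile on a regular grid of [0, 1]."""
        slope, offset = parameters
        return [offset + slope * i / (n_points - 1) for i in range(n_points)]

    def sample(field, indices):
        """Extract the field values at the (assumed) sensor locations."""
        return [field[i] for i in indices]

    def mean_value(field):
        """Average of the field over the grid, using the trapezoidal rule."""
        n_intervals = len(field) - 1
        total = 0.5 * field[0] + sum(field[1:-1]) + 0.5 * field[-1]
        return total / n_intervals

    def observation_operator(parameters):
        """Map the unknown parameters to quantities comparable to observations."""
        field = simulate_field(parameters)
        return sample(field, [0, 5, 10]) + [mean_value(field)]

    y_simulated = observation_operator([2.0, 1.0])
    print(y_simulated)

Here the result gathers three point values and one integrated value, which can
then be compared component by component with a vector of measures of the same
kind.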

.. _section_m_step2:

STEP 2: Specifying the criteria for physical results qualification
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Because the studied systems are real physical ones, it is of great importance to
express the **physical information that can help to qualify a simulated system
state**. There are two main types of such information, which lead to criteria
allowing qualification and quantification of optimization results.

First, coming from mathematical or numerical knowledge, many standard criteria
make it possible to qualify, in relative or absolute terms, the interest of an
optimized state. For example, balance equations or equation closing conditions
are good complementary measures of system state quality. Well-chosen criteria
like RMS, RMSE, field extrema, integrals, etc. are also of great interest to
assess optimized state quality.
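
A few of these numerical criteria can be computed as in the following minimal
sketch. The function names and the choice of indicators (RMSE, largest absolute
residual, mean residual) are assumptions for illustration, not a prescribed set.

.. code-block:: python

    import math

    def rmse(simulated, observed):
        """Root mean square error between simulated and observed values."""
        if len(simulated) != len(observed):
            raise ValueError("simulated and observed must have the same length")
        squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
        return math.sqrt(squared / len(observed))

    def quality_report(simulated, observed):
        """Gather a few scalar indicators on the simulation-observation gap."""
        residuals = [s - o for s, o in zip(simulated, observed)]
        return {
            "rmse": rmse(simulated, observed),
            "max_abs_residual": max(abs(r) for r in residuals),
            "bias": sum(residuals) / len(residuals),
        }

    print(quality_report([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))

Computing the same indicators for the background state and for the optimized
state gives a simple relative measure of the improvement.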

Second, coming from physical or experimental knowledge, valuable information can
be obtained from the meaning of the optimized results. In particular, physical
validity or technical interest can be used to assess the numerical results of
the optimization.

In order to get helpful information from these two main types of knowledge, it
is recommended, if possible, to build numerical criteria to ease the assessment
of the global quality of numerical results.

.. _section_m_step3:

STEP 3: Identifying and describing the available observations
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

As the second main source of knowledge of the physical system to be studied, the
**observations, or measures,** denoted as :math:`\mathbf{y}^o`, have to be
properly described. The quality of the measures, their intrinsic errors and
their special features are worth knowing, in order to introduce this information
in the data assimilation or optimization calculations.

The observations not only have to be available, but also have to be efficiently
introduced in the numerical framework of calculation or optimization. So the
computing environment giving access to the observations is of great importance
to ease the effective use of various measures and sources of measures, and to
promote extensive tests using measures. The computing environment covers
availability in a database or not, data formats, application interfaces, etc.

.. _section_m_step4:

STEP 4: Specifying the DA/Optimization modeling elements (covariances, background...)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Additional Data Assimilation or Optimization modeling elements make it possible
to improve information about the fine physical representation of the studied
system.

The *a priori* knowledge of the system state can be modeled using the
**background**, denoted as :math:`\mathbf{x}^b`, and the **background error
covariance matrix**, denoted as :math:`\mathbf{B}`. This information is
extremely important to complete, in particular in order to obtain meaningful
results from Data Assimilation.

On the other hand, information on observation errors can be used to fill the
**observation error covariance matrix**, denoted as :math:`\mathbf{R}`. As for
:math:`\mathbf{B}`, it is recommended to use carefully checked data to fill
these covariance matrices.
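
To show how these elements enter the calculation, the following minimal sketch
evaluates a classical quadratic cost function built from :math:`\mathbf{x}^b`,
:math:`\mathbf{B}`, :math:`\mathbf{y}^o` and :math:`\mathbf{R}`. It assumes
diagonal covariance matrices given by standard deviations and a factor 1/2 in
front of each term; the exact form used in :ref:`section_theory` may differ, and
all names below are hypothetical.

.. code-block:: python

    def cost_function(x, xb, sigma_b, yo, sigma_o, observation_operator):
        """Return (Jb, Jo, J) assuming diagonal B and R.

        Jb = 1/2 (x - xb)^T B^-1 (x - xb), with B = diag(sigma_b**2)
        Jo = 1/2 (yo - H(x))^T R^-1 (yo - H(x)), with R = diag(sigma_o**2)
        """
        hx = observation_operator(x)
        jb = 0.5 * sum(((xi - bi) / sb) ** 2 for xi, bi, sb in zip(x, xb, sigma_b))
        jo = 0.5 * sum(((yi - hi) / so) ** 2 for yi, hi, so in zip(yo, hx, sigma_o))
        return jb, jo, jb + jo

    def identity(x):
        """Trivial observation operator: the state is observed directly."""
        return list(x)

    xb = [0.0, 0.0]          # background
    sigma_b = [1.0, 1.0]     # background error standard deviations
    yo = [1.0, 2.0]          # observations
    sigma_o = [0.5, 0.5]     # observation error standard deviations

    print(cost_function(xb, xb, sigma_b, yo, sigma_o, identity))
    print(cost_function(yo, xb, sigma_b, yo, sigma_o, identity))

The relative sizes of the standard deviations determine whether the optimal
state stays close to the background or close to the observations, which is why
carefully checked values are needed.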

In case of dynamic simulation, one also has to define an **evolution operator**
and the associated **evolution error covariance matrix**.

.. _section_m_step5:

STEP 5: Choosing the optimization algorithm and its parameters
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Data Assimilation or Optimization requires solving an optimization problem, most
often modeled as a minimization problem. Depending on the availability of the
gradient of the cost function with respect to the optimization parameters, the
recommended class of methods differs. Variational or locally linearized
minimization methods require this gradient. Conversely, derivative-free
optimization methods do not require this gradient, but usually come at a much
higher computational cost.
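
The difference between the two classes can be sketched on a toy quadratic cost
function. The fixed-step gradient descent and the compass search below are
deliberately simple stand-ins chosen for this example, not the algorithms
available in ADAO, and the evaluation counts obtained on such a small problem
are not representative of real studies.

.. code-block:: python

    TARGET = (1.3, -2.7)  # hypothetical location of the minimum

    def cost(x):
        """Toy quadratic cost function with its minimum at TARGET."""
        return 0.5 * ((x[0] - TARGET[0]) ** 2 + 4.0 * (x[1] - TARGET[1]) ** 2)

    def cost_gradient(x):
        """Explicit gradient of the toy cost function."""
        return [x[0] - TARGET[0], 4.0 * (x[1] - TARGET[1])]

    def gradient_descent(gradient, x0, step=0.2, tolerance=1e-6, max_iterations=10000):
        """Gradient-based method: returns the state and the gradient call count."""
        x = list(x0)
        for iteration in range(1, max_iterations + 1):
            g = gradient(x)
            if max(abs(gi) for gi in g) < tolerance:
                return x, iteration
            x = [xi - step * gi for xi, gi in zip(x, g)]
        return x, max_iterations

    def compass_search(function, x0, step=1.0, tolerance=1e-6):
        """Derivative-free method: returns the state and the cost call count."""
        x = list(x0)
        best = function(x)
        evaluations = 1
        while step >= tolerance:
            moved = False
            for i in range(len(x)):
                for direction in (step, -step):
                    candidate = list(x)
                    candidate[i] += direction
                    value = function(candidate)
                    evaluations += 1
                    if value < best:
                        x, best, moved = candidate, value, True
            if not moved:
                step *= 0.5  # no improving move: refine the search pattern
        return x, evaluations

    x_gd, n_gd = gradient_descent(cost_gradient, [0.0, 0.0])
    x_cs, n_cs = compass_search(cost, [0.0, 0.0])
    print("gradient descent:", x_gd, n_gd, "gradient evaluations")
    print("compass search:  ", x_cs, n_cs, "cost evaluations")

The derivative-free search probes two points per unknown at each sweep, so its
number of cost evaluations grows with the number of unknowns. When each
evaluation is a full physical simulation, this is where the higher
computational cost comes from.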

Within a class of optimization methods, for each method, there is usually a
trade-off between the *"generic capacity of the method"* and its *"particular
performance on a specific problem"*. The most generic methods, such as
variational minimization using the :ref:`section_ref_algorithm_3DVAR`, present
remarkable numerical properties of efficiency, robustness and reliability, which
leads to recommending them independently of the problem to solve. Moreover, it
is generally difficult to tune the parameters of an optimization method, so the
most robust one is often the one with the fewest parameters. Finally, at least
at the beginning, it is recommended to use the most generic methods and to
change the known default parameters as little as possible.

.. _section_m_step6:

STEP 6: Conducting the optimization calculations and getting the results
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

After setting up the Data Assimilation or Optimization study, the calculation
has to be done in an efficient way.

Because optimizing usually involves many elementary physical simulations of the
system, the calculations are often done in a High Performance Computing (HPC)
environment to reduce the overall user time. Even if the optimization problem is
small, the physical system simulation time can be long, requiring efficient
computing resources. These requirements have to be taken into account early
enough in the study procedure to be satisfied without needing too much effort.

For the same reason of high computing requirements, it is important to carefully
prepare the outputs of the optimization procedure. The optimal state is the main
required information, but a lot of other special information can be obtained
during or at the end of the optimization process: error evaluations,
intermediate states, quality indicators, etc. All this information, sometimes
requiring additional processing, has to be identified and requested at the
beginning of the optimization process.

.. _section_m_step7:

STEP 7: Exploiting the results and qualifying their physical properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Once the results are obtained, they have to be interpreted in terms of physical
and numerical meaning. Even if the optimization calculation always gives a new
optimal state at least as good as the *a priori* one, and hopefully better, this
optimal state has, for example, to be checked with respect to the quality
criteria identified in :ref:`section_m_step2`. This can lead to physical,
statistical or numerical studies in order to assess the interest of the optimal
state to represent the physical system.

Besides this analysis, which has to be done for each Data Assimilation or
Optimization study, it can be worthwhile to exploit the optimization results as
part of a more complete study of the physical system of interest.