doc/en/reference.rst

.. _section_reference:

================================================================================
Reference description of the ADAO commands and keywords
================================================================================

This section presents the reference description of the ADAO commands and
keywords available through the GUI or through scripts.

Each command or keyword to be defined through the ADAO GUI has some properties.
The first property is to be *required*, *optional* or only factual, describing a
type of input. The second property is to be an "open" variable with a fixed type
but with any value allowed by the type, or a "restricted" variable, limited to
some specified values. Because the EFICAS editor GUI has built-in validation
capabilities, the properties of the commands or keywords given through this GUI
are automatically correct.

The mathematical notations used afterwards are explained in the section
:ref:`section_theory`.

Examples of using these commands are available in the section
:ref:`section_examples` and in the example files installed with the ADAO module.

List of possible input types
----------------------------

.. index:: single: Dict
.. index:: single: Function
.. index:: single: Matrix
.. index:: single: ScalarSparseMatrix
.. index:: single: DiagonalSparseMatrix
.. index:: single: String
.. index:: single: Script
.. index:: single: Vector
.. index:: single: VectorSerie

Each ADAO variable has a pseudo-type to help with filling it in and validating
it. The different pseudo-types are:

**Dict**
    This indicates a variable that has to be filled by a dictionary, usually
    given as a script.

**Function**
    This indicates a variable that has to be filled by a function, usually given
    as a script or a component method.

**Matrix**
    This indicates a variable that has to be filled by a matrix, usually given
    either as a string or as a script.

**ScalarSparseMatrix**
    This indicates a variable that has to be filled by a unique number, which
    will be used to multiply an identity matrix, usually given either as a
    string or as a script.

**DiagonalSparseMatrix**
    This indicates a variable that has to be filled by a vector, which will be
    used to replace the diagonal of an identity matrix, usually given either as
    a string or as a script.

**Script**
    This indicates a script given as an external file. It can be described by a
    full absolute path name or only by the file name without path.

**String**
    This indicates a string giving a literal representation of a matrix, a
    vector or a vector series, such as "1 2 ; 3 4" for a square 2x2 matrix.

**Vector**
    This indicates a variable that has to be filled by a vector, usually given
    either as a string or as a script.

**VectorSerie**
    This indicates a variable that has to be filled by a list of vectors,
    usually given either as a string or as a script.

When a command or keyword can be filled by a script file name, the script has to
contain a variable or a method that has the same name as the one to be filled.
In other words, when importing the script in a YACS Python node, it must create
a variable of the correct name in the current namespace.
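
As an illustration, a script used to fill the "*Background*" and
"*BackgroundError*" keywords could look like the following minimal sketch. The
file name, the numerical values and the use of ``numpy`` are assumptions made
for the example; the only requirement stated above is that the variable names
match the keywords to be filled. The last lines only illustrate how a literal
string such as "1 2 ; 3 4" maps to a 2x2 matrix, and do not reproduce the parser
actually used by ADAO::

    # Hypothetical script file, e.g. "my_background.py" (name chosen for
    # illustration only)
    import numpy

    # The variable names must match the keywords to be filled
    Background = numpy.array([0., 1., 2.])
    BackgroundError = numpy.identity(3)

    # Illustration only: reading the literal string "1 2 ; 3 4" as a 2x2 matrix
    literal = "1 2 ; 3 4"
    rows = [[float(v) for v in row.split()] for row in literal.split(";")]
    matrix_from_string = numpy.array(rows)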

Reference description for ADAO calculation cases
------------------------------------------------

List of commands and keywords for an ADAO calculation case
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. index:: single: ASSIMILATION_STUDY
.. index:: single: Algorithm
.. index:: single: AlgorithmParameters
.. index:: single: Background
.. index:: single: BackgroundError
.. index:: single: ControlInput
.. index:: single: Debug
.. index:: single: EvolutionError
.. index:: single: EvolutionModel
.. index:: single: InputVariables
.. index:: single: Observation
.. index:: single: ObservationError
.. index:: single: ObservationOperator
.. index:: single: Observers
.. index:: single: OutputVariables
.. index:: single: Study_name
.. index:: single: Study_repertory
.. index:: single: UserDataInit
.. index:: single: UserPostAnalysis

The first set of commands is related to the description of a calculation case,
that is a *Data Assimilation* procedure or an *Optimization* procedure. The
terms are ordered in alphabetical order, except the first, which describes the
choice between calculation or checking. The different commands are the
following:

**ASSIMILATION_STUDY**
    *Required command*. This is the general command describing the data
    assimilation or optimization case. It hierarchically contains all the other
    commands.

**Algorithm**
    *Required command*. This is a string indicating the chosen data assimilation
    or optimization algorithm. The choices are limited and available through
    the GUI. Examples include "3DVAR" and "Blue". The list of algorithms and
    associated parameters is given in the following subsection `Options
    and required commands for calculation algorithms`_.

**AlgorithmParameters**
    *Optional command*. This command allows adding some optional parameters to
    control the data assimilation or optimization algorithm. It is defined as a
    "*Dict*" type object, that is, given as a script. The list of algorithms
    and associated parameters is given in the following subsection `Options
    and required commands for calculation algorithms`_.

**Background**
    *Required command*. This indicates the background or initial vector used,
    previously noted as :math:`\mathbf{x}^b`. It is defined as a "*Vector*" type
    object, that is, given either as a string or as a script.

**BackgroundError**
    *Required command*. This indicates the background error covariance matrix,
    previously noted as :math:`\mathbf{B}`. It is defined as a "*Matrix*" type
    object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
    type object, that is, given either as a string or as a script.

**ControlInput**
    *Optional command*. This indicates the control vector used to force the
    evolution model at each step, usually noted as :math:`\mathbf{U}`. It is
    defined as a "*Vector*" or a "*VectorSerie*" type object, that is, given
    either as a string or as a script. When there is no control, it has to be
    an empty string ''.

**Debug**
    *Required command*. This defines the level of trace and intermediary debug
    information. The choices are limited to 0 (for False) and 1 (for True).

**EvolutionError**
    *Optional command*. This indicates the evolution error covariance matrix,
    usually noted as :math:`\mathbf{Q}`. It is defined as a "*Matrix*" type
    object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
    type object, that is, given either as a string or as a script.

**EvolutionModel**
    *Optional command*. This indicates the evolution model operator, usually
    noted :math:`M`, which describes a step of evolution. It is defined as a
    "*Function*" type object, that is, given as a script. Different functional
    forms can be used, as described in the following subsection `Requirements
    for functions describing an operator`_. If there is some control :math:`U`
    included in the evolution model, the operator has to be applied to a pair
    :math:`(X,U)`.

**InputVariables**
    *Optional command*. This command allows indicating the names and sizes of
    the physical variables that are bundled together in the control vector.
    This information is dedicated to data processed inside an algorithm.

**Observation**
    *Required command*. This indicates the observation vector used for data
    assimilation or optimization, previously noted as :math:`\mathbf{y}^o`. It
    is defined as a "*Vector*" or a "*VectorSerie*" type object, that is, given
    either as a string or as a script.

**ObservationError**
    *Required command*. This indicates the observation error covariance matrix,
    previously noted as :math:`\mathbf{R}`. It is defined as a "*Matrix*" type
    object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
    type object, that is, given either as a string or as a script.

**ObservationOperator**
    *Required command*. This indicates the observation operator, previously
    noted :math:`H`, which transforms the input parameters :math:`\mathbf{x}` to
    results :math:`\mathbf{y}` to be compared to observations
    :math:`\mathbf{y}^o`. It is defined as a "*Function*" type object, that is,
    given as a script. Different functional forms can be used, as described in
    the following subsection `Requirements for functions describing an
    operator`_. If there is some control :math:`U` included in the observation,
    the operator has to be applied to a pair :math:`(X,U)`.

**Observers**
    *Optional command*. This command allows setting internal observers, which
    are functions linked with a particular variable and executed each time this
    variable is modified. It is a convenient way to monitor variables of
    interest during the data assimilation or optimization process, by printing
    or plotting them, etc. Common templates are provided to help users get
    started or quickly build their case.

**OutputVariables**
    *Optional command*. This command allows indicating the names and sizes of
    the physical variables that are bundled together in the output observation
    vector. This information is dedicated to data processed inside an algorithm.

**Study_name**
    *Required command*. This is an open string to describe the study by a name
    or a sentence.

**Study_repertory**
    *Optional command*. If available, this directory is used to find all the
    script files that can be used to define some other commands by scripts.

**UserDataInit**
    *Optional command*. This command allows initializing some parameters or
    data automatically before the data assimilation algorithm is processed.

**UserPostAnalysis**
    *Optional command*. This command allows processing some parameters or data
    automatically after the data assimilation algorithm has been processed. It
    is defined as a script or a string, which allows putting post-processing
    code directly inside the ADAO case. Common templates are provided to help
    users get started or quickly build their case.

Options and required commands for calculation algorithms
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. index:: single: 3DVAR
.. index:: single: Blue
.. index:: single: EnsembleBlue
.. index:: single: KalmanFilter
.. index:: single: ExtendedKalmanFilter
.. index:: single: LinearLeastSquares
.. index:: single: NonLinearLeastSquares
.. index:: single: ParticleSwarmOptimization
.. index:: single: QuantileRegression

.. index:: single: AlgorithmParameters
.. index:: single: Bounds
.. index:: single: CostDecrementTolerance
.. index:: single: GradientNormTolerance
.. index:: single: GroupRecallRate
.. index:: single: MaximumNumberOfSteps
.. index:: single: Minimizer
.. index:: single: NumberOfInsects
.. index:: single: ProjectedGradientTolerance
.. index:: single: QualityCriterion
.. index:: single: Quantile
.. index:: single: SetSeed
.. index:: single: StoreInternalVariables
.. index:: single: StoreSupplementaryCalculations
.. index:: single: SwarmVelocity

Each algorithm can be controlled using some generic or specific options given
through the "*AlgorithmParameters*" optional command, as follows for example::

    AlgorithmParameters = {
        "Minimizer" : "LBFGSB",
        "MaximumNumberOfSteps" : 25,
        "StoreSupplementaryCalculations" : ["APosterioriCovariance","OMA"],
        }

This section describes the available options algorithm by algorithm. If an
option is specified for an algorithm that doesn't support it, the option is
simply left unused. The meaning of the acronyms or particular names can be found
in the :ref:`genindex` or the :ref:`section_glossary`. In addition, the required
commands/keywords are given for each algorithm; they are described in `List of
commands and keywords for an ADAO calculation case`_.

**"Blue"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator"*

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["APosterioriCovariance", "BMA", "OMA", "OMB", "Innovation",
    "SigmaBck2", "SigmaObs2", "MahalanobisConsistency"].
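
To make the names of some supplementary variables more concrete, the following
minimal sketch computes a BLUE analysis with ``numpy`` on a toy problem. It
assumes the usual data assimilation reading of the names (innovation and "OMB"
as observation minus background, "OMA" as observation minus analysis, "BMA" as
background minus analysis). The numerical values are arbitrary, and the code is
an illustration, not the ADAO implementation::

    import numpy

    xb = numpy.array([0., 0.])   # background x^b
    B = numpy.identity(2)        # background error covariance B
    y = numpy.array([2., 4.])    # observation y^o
    R = numpy.identity(2)        # observation error covariance R
    H = numpy.identity(2)        # linear observation operator H

    # Gain K = B H^T (H B H^T + R)^-1 and analysis xa = xb + K (y - H xb)
    K = B @ H.T @ numpy.linalg.inv(H @ B @ H.T + R)
    innovation = y - H @ xb
    xa = xb + K @ innovation

    OMB = y - H @ xb             # observation minus background
    OMA = y - H @ xa             # observation minus analysis
    BMA = xb - xa                # background minus analysis
    A = (numpy.identity(2) - K @ H) @ B   # a posteriori covariance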

**"LinearLeastSquares"**

  *Required commands*
    *"Observation", "ObservationError",
    "ObservationOperator"*

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["OMA"].

**"3DVAR"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator"*

  Minimizer
    This key allows choosing the optimization minimizer. The default choice
    is "LBFGSB", and the possible ones are "LBFGSB" (nonlinear constrained
    minimizer, see [Byrd95]_ and [Zhu97]_), "TNC" (nonlinear constrained
    minimizer), "CG" (nonlinear unconstrained minimizer), "BFGS" (nonlinear
    unconstrained minimizer), "NCG" (Newton CG minimizer).

  Bounds
    This key allows defining upper and lower bounds for every control
    variable being optimized. Bounds can be given by a list of lists, each
    being a pair of lower/upper bounds for one variable, with ``None`` used
    wherever there is no bound. The bounds can always be specified, but they
    are taken into account only by the constrained minimizers.

  MaximumNumberOfSteps
    This key indicates the maximum number of iterations allowed for iterative
    optimization. The default is 15000, which is very similar to no limit on
    iterations. It is therefore recommended to adapt this parameter to the
    needs of real problems. For some minimizers, the effective stopping step
    can be slightly different due to internal control requirements of the
    algorithm.

  CostDecrementTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when the cost function decreases by less than this
    tolerance at the last step. The default is 1.e-7, and it is recommended to
    adapt it to the needs of real problems.

  ProjectedGradientTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when all the components of the projected gradient are
    under this limit. It is only used for constrained minimizers. The default is
    -1, that is the internal default of each minimizer (generally 1.e-5), and it
    is not recommended to change it.

  GradientNormTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when the norm of the gradient is under this limit. It
    is only used for non-constrained minimizers. The default is 1.e-5 and it is
    not recommended to change it.

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["APosterioriCovariance", "BMA", "OMA", "OMB", "Innovation",
    "SigmaObs2", "MahalanobisConsistency"].
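
For instance, a "*3DVAR*" case could be tuned with a dictionary such as the
following one. The three-variable state and all the numerical values are
assumptions made for the illustration; only the key names come from the list
above::

    AlgorithmParameters = {
        # Constrained minimizer, so the bounds are taken into account
        "Minimizer" : "LBFGSB",
        # One [lower, upper] pair per control variable, None meaning no bound
        "Bounds" : [[0., 10.], [None, 5.], [None, None]],
        "MaximumNumberOfSteps" : 100,
        "CostDecrementTolerance" : 1.e-7,
        "StoreSupplementaryCalculations" : ["APosterioriCovariance", "OMB"],
        }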

**"NonLinearLeastSquares"**

  *Required commands*
    *"Background",
    "Observation", "ObservationError",
    "ObservationOperator"*

  Minimizer
    This key allows choosing the optimization minimizer. The default choice
    is "LBFGSB", and the possible ones are "LBFGSB" (nonlinear constrained
    minimizer, see [Byrd95]_ and [Zhu97]_), "TNC" (nonlinear constrained
    minimizer), "CG" (nonlinear unconstrained minimizer), "BFGS" (nonlinear
    unconstrained minimizer), "NCG" (Newton CG minimizer).

  Bounds
    This key allows defining upper and lower bounds for every control
    variable being optimized. Bounds can be given by a list of lists, each
    being a pair of lower/upper bounds for one variable, with ``None`` used
    wherever there is no bound. The bounds can always be specified, but they
    are taken into account only by the constrained minimizers.

  MaximumNumberOfSteps
    This key indicates the maximum number of iterations allowed for iterative
    optimization. The default is 15000, which is very similar to no limit on
    iterations. It is therefore recommended to adapt this parameter to the
    needs of real problems. For some minimizers, the effective stopping step
    can be slightly different due to internal control requirements of the
    algorithm.

  CostDecrementTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when the cost function decreases by less than this
    tolerance at the last step. The default is 1.e-7, and it is recommended to
    adapt it to the needs of real problems.

  ProjectedGradientTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when all the components of the projected gradient are
    under this limit. It is only used for constrained minimizers. The default is
    -1, that is the internal default of each minimizer (generally 1.e-5), and it
    is not recommended to change it.

  GradientNormTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when the norm of the gradient is under this limit. It
    is only used for non-constrained minimizers. The default is 1.e-5 and it is
    not recommended to change it.

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["BMA", "OMA", "OMB", "Innovation"].

**"EnsembleBlue"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator"*

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is for example
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.

**"KalmanFilter"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator",
    "EvolutionModel", "EvolutionError",
    "ControlInput"*

  EstimationOf
    This key allows choosing the type of estimation to be performed. It can be
    either state estimation, named "State", or parameter estimation, named
    "Parameters". The default choice is "State".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["APosterioriCovariance", "BMA", "Innovation"].

**"ExtendedKalmanFilter"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator",
    "EvolutionModel", "EvolutionError",
    "ControlInput"*

  Bounds
    This key allows defining upper and lower bounds for every control variable
    being optimized. Bounds can be given by a list of lists, each being a pair
    of lower/upper bounds for one variable, with extreme values used wherever
    there is no bound. The bounds can always be specified, but they are taken
    into account only by the constrained minimizers.

  ConstrainedBy
    This key allows defining the method used to take bounds into account. The
    possible methods are in the following list: ["EstimateProjection"].

  EstimationOf
    This key allows choosing the type of estimation to be performed. It can be
    either state estimation, named "State", or parameter estimation, named
    "Parameters". The default choice is "State".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["APosterioriCovariance", "BMA", "Innovation"].

**"ParticleSwarmOptimization"**

  *Required commands*
    *"Background", "BackgroundError",
    "Observation", "ObservationError",
    "ObservationOperator"*

  MaximumNumberOfSteps
    This key indicates the maximum number of iterations allowed for iterative
    optimization. The default is 50, which is an arbitrary limit. It is
    therefore recommended to adapt this parameter to the needs of real problems.

  NumberOfInsects
    This key indicates the number of insects or particles in the swarm. The
    default is 100, which is a usual default for this algorithm.

  SwarmVelocity
    This key indicates the part of the insect velocity which is imposed by the
    swarm. It is a positive floating point value. The default value is 1.

  GroupRecallRate
    This key indicates the recall rate at the best swarm insect. It is a
    floating point value between 0 and 1. The default value is 0.5.

  QualityCriterion
    This key indicates the quality criterion, minimized to find the optimal
    state estimate. The default is the usual data assimilation criterion named
    "DA", the augmented weighted least squares. The possible criteria must be
    in the following list, where equivalent names are indicated by "=":
    ["AugmentedPonderatedLeastSquares"="APLS"="DA",
    "PonderatedLeastSquares"="PLS", "LeastSquares"="LS"="L2",
    "AbsoluteValue"="L1", "MaximumError"="ME"]

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is for example
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["BMA", "OMA", "OMB", "Innovation"].
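
The quality criteria can be illustrated by the following minimal sketch, written
with ``numpy``. The formulas are plausible readings of the criterion names
(including the 1/2 scaling factors), stated here as assumptions rather than as
the exact ADAO implementation. The function name ``quality`` and its arguments
are hypothetical::

    import numpy

    def quality(criterion, x, xb, y, H, BI, RI):
        "Evaluate a criterion at state x (BI, RI: inverses of B and R)"
        dx = x - xb          # departure from the background
        dy = y - H @ x       # departure from the observation
        if criterion in ("AugmentedPonderatedLeastSquares", "APLS", "DA"):
            return 0.5 * dx @ BI @ dx + 0.5 * dy @ RI @ dy
        if criterion in ("PonderatedLeastSquares", "PLS"):
            return 0.5 * dy @ RI @ dy
        if criterion in ("LeastSquares", "LS", "L2"):
            return 0.5 * dy @ dy
        if criterion in ("AbsoluteValue", "L1"):
            return numpy.sum(numpy.abs(dy))
        if criterion in ("MaximumError", "ME"):
            return numpy.max(numpy.abs(dy))
        raise ValueError("Unknown criterion: %s" % criterion)

    # Toy values, chosen arbitrarily for the illustration
    eye = numpy.identity(2)
    x, xb, y = numpy.array([1., 2.]), numpy.zeros(2), numpy.array([2., 4.])
    J_da = quality("DA", x, xb, y, eye, eye, eye)
    J_l1 = quality("L1", x, xb, y, eye, eye, eye)
    J_me = quality("ME", x, xb, y, eye, eye, eye)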

**"QuantileRegression"**

  *Required commands*
    *"Background",
    "Observation",
    "ObservationOperator"*

  Quantile
    This key allows defining the real value of the desired quantile, between
    0 and 1. The default is 0.5, corresponding to the median.

  Minimizer
    This key allows choosing the optimization minimizer. The default choice
    and only available choice is "MMQR" (Majorize-Minimize for Quantile
    Regression).

  MaximumNumberOfSteps
    This key indicates the maximum number of iterations allowed for iterative
    optimization. The default is 15000, which is very similar to no limit on
    iterations. It is therefore recommended to adapt this parameter to the
    needs of real problems.

  CostDecrementTolerance
    This key indicates a limit value that stops the iterative optimization
    process successfully when the cost function or the surrogate decreases by
    less than this tolerance at the last step. The default is 1.e-6, and it is
    recommended to adapt it to the needs of real problems.

  StoreInternalVariables
    This boolean key allows storing default internal variables, mainly the
    current state during the iterative optimization process. Be careful, this
    can be a numerically costly choice in certain calculation cases. The default
    is "False".

  StoreSupplementaryCalculations
    This list indicates the names of the supplementary variables that can be
    made available at the end of the algorithm. It involves potentially costly
    calculations. The default is an empty list, none of these variables being
    calculated and stored by default. The possible names are in the following
    list: ["BMA", "OMA", "OMB", "Innovation"].
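
The role of the "*Quantile*" key can be illustrated with the check (or
"pinball") loss commonly used to define quantile regression. This is a generic
sketch of that standard loss, not the "MMQR" minimizer itself; the function name
``check_loss`` and the data are made up for the example. With a quantile of 0.5,
the constant that minimizes the loss is the median of the data::

    import numpy

    def check_loss(residuals, quantile):
        "Sum of the quantile check loss over all residuals"
        r = numpy.asarray(residuals, dtype=float)
        return numpy.sum(numpy.maximum(quantile * r, (quantile - 1.) * r))

    data = numpy.array([1., 2., 10.])
    candidates = numpy.linspace(0., 10., 101)   # candidate constant estimates
    losses = [check_loss(data - c, 0.5) for c in candidates]
    best = candidates[int(numpy.argmin(losses))]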
 593
 594 Reference description for ADAO checking cases
 595 ---------------------------------------------
 596
 597 List of commands and keywords for an ADAO checking case
 598 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 599
 600 .. index:: single: CHECKING_STUDY
 601 .. index:: single: Algorithm
 602 .. index:: single: AlgorithmParameters
 603 .. index:: single: CheckingPoint
 604 .. index:: single: Debug
 605 .. index:: single: ObservationOperator
 606 .. index:: single: Study_name
 607 .. index:: single: Study_repertory
 608 .. index:: single: UserDataInit
 609
 610 The second set of commands is related to the description of a checking case,
 611 that is a procedure to check required properties on information somewhere else
 612 by a calculation case. The terms are ordered in alphabetical order, except the
 613 first, which describes choice between calculation or checking. The different
 614 commands are the following:
 615
 616 **CHECKING_STUDY**
 617     *Required command*. This is the general command describing the checking
 618     case. It hierarchically contains all the other commands.
 619
 620 **Algorithm**
 621     *Required command*. This is a string to indicate the data assimilation or
 622     optimization algorithm chosen. The choices are limited and available through
 623     the GUI. There exists for example "FunctionTest", "AdjointTest"... See below
 624     the list of algorithms and associated parameters in the following subsection
 625     `Options and required commands for checking algorithms`_.
 626
 627 **AlgorithmParameters**
 628     *Optional command*. This command allows to add some optional parameters to
 629     control the data assimilation or optimization algorithm. It is defined as a
 630     "*Dict*" type object, that is, given as a script. See below the list of
 631     algorithms and associated parameters in the following subsection `Options
 632     and required commands for checking algorithms`_.
 633
 634 **CheckingPoint**
 635     *Required command*. This indicates the vector used,
 636     previously noted as :math:`\mathbf{x}^b`. It is defined as a "*Vector*" type
 637     object, that is, given either as a string or as a script.
 638
 639 **Debug**
 640     *Required command*. This define the level of trace and intermediary debug
 641     information. The choices are limited between 0 (for False) and 1 (for
 642     True).
 643
 644 **ObservationOperator**
 645     *Required command*. This indicates the observation operator, previously
 646     noted :math:`H`, which transforms the input parameters :math:`\mathbf{x}` to
 647     results :math:`\mathbf{y}` to be compared to observations
 648     :math:`\mathbf{y}^o`. It is defined as a "*Function*" type object, that is,
 649     given as a script. Different functional forms can be used, as described in
 650     the following subsection `Requirements for functions describing an
 651     operator`_.
 652
 653 **Study_name**
 654     *Required command*. This is an open string to describe the study by a name
 655     or a sentence.
 656
 657 **Study_repertory**
    *Optional command*. If available, this directory is used to find all the
    script files that can be used to define some other commands by scripts.
 660
 661 **UserDataInit**
    *Optional command*. This command allows initializing some parameters or
    data automatically before the checking algorithm is processed.
 664
 665 Options and required commands for checking algorithms
 666 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 667
 668 .. index:: single: AdjointTest
 669 .. index:: single: FunctionTest
 670 .. index:: single: GradientTest
 671 .. index:: single: LinearityTest
 672
 673 .. index:: single: AlgorithmParameters
 674 .. index:: single: AmplitudeOfInitialDirection
 675 .. index:: single: EpsilonMinimumExponent
 676 .. index:: single: InitialDirection
 677 .. index:: single: ResiduFormula
 678 .. index:: single: SetSeed
 679
 680 We recall that each algorithm can be controlled using some generic or specific
 681 options given through the "*AlgorithmParameters*" optional command, as follows
 682 for example::
 683
 684     AlgorithmParameters = {
 685         "AmplitudeOfInitialDirection" : 1,
 686         "EpsilonMinimumExponent" : -8,
 687         }
 688
 689 If an option is specified for an algorithm that doesn't support it, the option
 690 is simply left unused. The meaning of the acronyms or particular names can be
 691 found in the :ref:`genindex` or the :ref:`section_glossary`. In addition, for
 692 each algorithm, the required commands/keywords are given, being described in
 693 `List of commands and keywords for an ADAO checking case`_.
 694
 695 **"AdjointTest"**
 696
 697   *Required commands*
 698     *"CheckingPoint",
 699     "ObservationOperator"*
 700
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation built as a vector
    used for the directional derivative around the nominal checking point. The
    default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent value of the power of 10 coefficient
    to be used to decrease the increment multiplier. The default is -8, and it
    has to be between -20 and 0. For example, its default value leads to
    calculating the residue of the scalar product formula with a fixed increment
    multiplied by factors from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero of the same
    vector size as the checking point.

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is, for example,
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
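As an illustration of the scalar product formula mentioned above, here is a
minimal sketch, not the ADAO implementation. It assumes a hypothetical linear
operator given by a matrix ``M``, and hypothetical helper names
(``adjoint_residues``, ``tangent``, ``adjoint``). It evaluates the residue
between the two scalar products for increment multipliers from 1.e0 down to
1.e-8, which should stay close to zero when the adjoint is consistent with the
tangent:

```python
import numpy

def adjoint_residues(tangent, adjoint, x, dx, y, min_exponent=-8):
    """Scalar product residues for multipliers 1.e0 down to 10**min_exponent."""
    residues = []
    for exponent in range(0, min_exponent - 1, -1):
        multiplier = 10.0 ** exponent
        step = multiplier * dx
        # Identity to check: < T(x).step , y > == < step , A(x).y >
        left = float(numpy.dot(tangent((x, step)), y))
        right = float(numpy.dot(step, adjoint((x, y))))
        residues.append((multiplier, abs(left - right)))
    return residues

# Hypothetical linear operator O(x) = M x: its tangent is M, its adjoint is M^T
M = numpy.array([[1.0, 2.0], [0.0, 3.0], [4.0, 0.0]])
tangent = lambda pair: M @ pair[1]
adjoint = lambda pair: M.T @ pair[1]

rng = numpy.random.default_rng(1000)  # fixed seed, in the spirit of "SetSeed"
x = numpy.array([1.0, 2.0])           # nominal checking point
dx = rng.normal(size=2)               # random direction around zero
y = rng.normal(size=3)
residues = adjoint_residues(tangent, adjoint, x, dx, y)
```

Each entry of ``residues`` pairs a multiplier with its residue. A residue that
does not stay near machine precision would suggest an inconsistent adjoint.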
 724
 725 **"FunctionTest"**
 726
 727   *Required commands*
 728     *"CheckingPoint",
 729     "ObservationOperator"*
 730
 731   NumberOfPrintedDigits
 732     This key indicates the number of digits of precision for floating point
 733     printed output. The default is 5, with a minimum of 0.
 734
  NumberOfRepetition
    This key indicates the number of times to repeat the function evaluation.
    The default is 1.

  SetDebug
    This key activates, or not, the debug mode during the function evaluation.
    The default is True, the choices are True or False.
 742
 743 **"GradientTest"**
 744
 745   *Required commands*
 746     *"CheckingPoint",
 747     "ObservationOperator"*
 748
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation built as a vector
    used for the directional derivative around the nominal checking point. The
    default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent value of the power of 10 coefficient
    to be used to decrease the increment multiplier. The default is -8, and it
    has to be between -20 and 0. For example, its default value leads to
    calculating the residue of the formula with a fixed increment multiplied by
    factors from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero of the same
    vector size as the checking point.

  ResiduFormula
    This key indicates the residue formula that has to be used for the test. The
    default choice is "Taylor", and the possible ones are "Taylor" (residue of
    the Taylor expansion of the operator, which has to decrease as the square of
    the perturbation) and "Norm" (residue obtained by taking the norm of the
    Taylor expansion at zero order approximation, which approximates the
    gradient, and which has to remain constant).

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is, for example,
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
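The two residue formulas described above can be sketched as follows. This is
only an illustration under stated assumptions, not the ADAO code: the helper
``gradient_residues`` and the example operator (elementwise square) are
hypothetical, and the exact normalizations used by ADAO may differ. The
"Taylor" residue is expected to shrink as the square of the multiplier, while
the "Norm" residue is expected to stay roughly constant:

```python
import numpy

def gradient_residues(direct, tangent, x, dx, formula="Taylor", min_exponent=-8):
    """Residues for multipliers 1.e0 down to 10**min_exponent."""
    fx = direct(x)
    residues = []
    for exponent in range(0, min_exponent - 1, -1):
        a = 10.0 ** exponent
        delta = direct(x + a * dx) - fx
        if formula == "Taylor":
            # || F(x + a.dx) - F(x) - a.T(x).dx ||, decreasing like a**2
            r = numpy.linalg.norm(delta - a * tangent((x, dx)))
        elif formula == "Norm":
            # || F(x + a.dx) - F(x) || / a, remaining roughly constant
            r = numpy.linalg.norm(delta) / a
        else:
            raise ValueError("Unknown formula: %s" % formula)
        residues.append((a, float(r)))
    return residues

# Hypothetical operator: elementwise square, with its exact tangent
direct = lambda x: x ** 2
tangent = lambda pair: 2.0 * pair[0] * pair[1]

x = numpy.array([1.0, 2.0, 3.0])
dx = numpy.array([1.0, -1.0, 0.5])
taylor = gradient_residues(direct, tangent, x, dx, "Taylor")
norm = gradient_residues(direct, tangent, x, dx, "Norm")
```

For very small multipliers, rounding errors dominate the "Taylor" residue, so
the quadratic decrease is only visible on the first few multipliers.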
 780
 781 **"LinearityTest"**
 782
 783   *Required commands*
 784     *"CheckingPoint",
 785     "ObservationOperator"*
 786
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation built as a vector
    used for the directional derivative around the nominal checking point. The
    default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent value of the power of 10 coefficient
    to be used to decrease the increment multiplier. The default is -8, and it
    has to be between -20 and 0. For example, its default value leads to
    calculating the residue of the formula with a fixed increment multiplied by
    factors from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero of the same
    vector size as the checking point.

  ResiduFormula
    This key indicates the residue formula that has to be used for the test. The
    default choice is "CenteredDL", and the possible ones are "CenteredDL"
    (residue of the difference between the function at nominal point and the
    values with positive and negative increments, which has to stay very small),
    "Taylor" (residue of the Taylor expansion of the operator normalized by
    the nominal value, which has to stay very small), "NominalTaylor" (residue
    of the order 1 approximations of the operator, normalized to the nominal
    point, which has to stay close to 1), and "NominalTaylorRMS" (residue of the
    order 1 approximations of the operator, normalized by RMS to the nominal
    point, which has to stay close to 0).

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is, for example,
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
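To make the "CenteredDL" idea more concrete, here is a small sketch of one
plausible reading of that residue. It is an assumption, not the formula taken
from the ADAO sources: the mean of the values at the positive and negative
increments is compared to the value at the nominal point, relative to the norm
of the nominal value. The helper name ``centered_residue`` and both example
operators are hypothetical:

```python
import numpy

def centered_residue(direct, x, dx, a):
    """Relative gap between F(x) and the mean of F(x + a.dx) and F(x - a.dx)."""
    fx = direct(x)
    mean = 0.5 * (direct(x + a * dx) + direct(x - a * dx))
    return float(numpy.linalg.norm(mean - fx) / numpy.linalg.norm(fx))

x = numpy.array([1.0, 2.0, 3.0])    # nominal checking point
dx = numpy.array([1.0, -1.0, 0.5])  # direction of the increment

# A linear operator gives a residue close to zero...
linear = centered_residue(lambda v: 3.0 * v, x, dx, 1.0)
# ...while a quadratic one gives a clearly non-zero residue
quadratic = centered_residue(lambda v: v ** 2, x, dx, 1.0)
```

With this reading, a residue that stays very small for all the increment
multipliers is the signature of a linear operator around the checking point.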
 822
 823 Requirements for functions describing an operator
 824 -------------------------------------------------
 825
The operators for observation and evolution are required to implement the data
assimilation or optimization procedures. They include the numerical simulation
of the physics, but also the filtering and restriction needed to compare the
simulation to the observations. The evolution operator is considered here in its
incremental form, representing the transition between two successive states, and
is then similar to the observation operator.
 832
Schematically, an operator has to give an output solution given the input
parameters. Part of the input parameters can be modified during the optimization
procedure. So the mathematical representation of such a process is a function.
It was briefly described in the section :ref:`section_theory` and is generalized
here by the relation:
 838
 839 .. math:: \mathbf{y} = O( \mathbf{x} )
 840
 841 between the pseudo-observations :math:`\mathbf{y}` and the parameters
 842 :math:`\mathbf{x}` using the observation or evolution operator :math:`O`. The
 843 same functional representation can be used for the linear tangent model
 844 :math:`\mathbf{O}` of :math:`O` and its adjoint :math:`\mathbf{O}^*`, also
 845 required by some data assimilation or optimization algorithms.
 846
Then, **to describe an operator completely, the user only has to provide a
function that fully and only performs the functional operation**.
 849
This function is usually given as a script that can be executed in a YACS node.
This script can equally launch external codes or use internal SALOME calls and
methods. If the algorithm requires the 3 aspects of the operator (direct form,
tangent form and adjoint form), the user has to give the 3 functions or to
approximate them.
 855
 856 There are 3 practical methods for the user to provide the operator functional
 857 representation.
 858
 859 First functional form: using "*ScriptWithOneFunction*"
 860 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 861
 862 .. index:: single: ScriptWithOneFunction
 863 .. index:: single: DirectOperator
 864 .. index:: single: DifferentialIncrement
 865 .. index:: single: CenteredFiniteDifference
 866
The first one consists in providing only one potentially non-linear function,
and in approximating the tangent and the adjoint operators. This is done by
using the keyword "*ScriptWithOneFunction*" for the description of the chosen
operator in the ADAO GUI. The user has to provide the function in a script, with
a mandatory name "*DirectOperator*". For example, the script can follow the
template::
 873
    def DirectOperator( X ):
        """ Direct non-linear simulation operator """
        ...
        ...
        ...
        return Y  # something like Y = O(X), computed above
 880
In this case, the user can also provide a value for the differential increment,
using the keyword "*DifferentialIncrement*" through the GUI, which has a default
value of 1%. This coefficient will be used in the finite difference
approximation to build the tangent and adjoint operators. The order of the
finite difference approximation can also be chosen through the GUI, using the
keyword "*CenteredFiniteDifference*", with 0 for an uncentered scheme of first
order, and with 1 for a centered scheme of second order (at twice the
computational cost of the first order one). The keyword has a default value of
0.
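The difference between the uncentered and centered schemes can be illustrated
by the following sketch. It is not the ADAO implementation: the helper
``approximate_tangent`` is hypothetical, and the increment is used here as a
plain scalar step along the direction ``dx``, which is an assumption about how
the 1% value is applied:

```python
import numpy

def approximate_tangent(direct, x, dx, increment=0.01, centered=False):
    """Finite difference approximation of the tangent operator applied to dx."""
    x = numpy.asarray(x, dtype=float).ravel()
    dx = numpy.asarray(dx, dtype=float).ravel()
    if centered:
        # Second order scheme, using points on both sides of x
        upper = numpy.asarray(direct(x + increment * dx), dtype=float).ravel()
        lower = numpy.asarray(direct(x - increment * dx), dtype=float).ravel()
        return (upper - lower) / (2.0 * increment)
    # First order scheme, using the nominal point and one shifted point
    upper = numpy.asarray(direct(x + increment * dx), dtype=float).ravel()
    nominal = numpy.asarray(direct(x), dtype=float).ravel()
    return (upper - nominal) / increment

# Hypothetical operator: elementwise cube, exact tangent is 3 x**2 dx
direct = lambda x: x ** 3
x = numpy.array([1.0, 2.0])
dx = numpy.array([1.0, 1.0])
exact = 3.0 * x ** 2 * dx

forward = approximate_tangent(direct, x, dx, centered=False)
centered = approximate_tangent(direct, x, dx, centered=True)
err_forward = float(numpy.max(numpy.abs(forward - exact)))
err_centered = float(numpy.max(numpy.abs(centered - exact)))
```

On this example the centered scheme is markedly more accurate than the
uncentered one for the same increment.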
 889
This first operator definition makes it easy to test the functional form before
its use in an ADAO case, greatly reducing the complexity of implementation.
 892
**Important warning:** the name "*DirectOperator*" is mandatory, and the type of
the X argument can be either a Python list, a numpy array or a numpy 1D-matrix.
The user has to handle these cases in the script.
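One simple way to handle the three possible input types is to convert the
argument to a flat numpy array at the start of the function. The following
sketch shows this; the operator body (a square plus one) is only a hypothetical
placeholder for a real simulation:

```python
import numpy

def DirectOperator(X):
    """ Direct non-linear simulation operator (hypothetical example) """
    # Accept a Python list, a numpy array or a numpy 1D-matrix alike,
    # and always work on a flat array of floats
    x = numpy.asarray(X, dtype=float).ravel()
    # Placeholder for the real simulation
    return x ** 2 + 1.0
```

With this conversion, the rest of the function does not depend on the type
that was used to pass the argument.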
 896
 897 Second functional form: using "*ScriptWithFunctions*"
 898 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 899
 900 .. index:: single: ScriptWithFunctions
 901 .. index:: single: DirectOperator
 902 .. index:: single: TangentOperator
 903 .. index:: single: AdjointOperator
 904
The second one consists in providing directly the three associated operators
:math:`O`, :math:`\mathbf{O}` and :math:`\mathbf{O}^*`. This is done by using
the keyword "*ScriptWithFunctions*" for the description of the chosen operator
in the ADAO GUI. The user has to provide three functions in one script, with
three mandatory names "*DirectOperator*", "*TangentOperator*" and
"*AdjointOperator*". For example, the script can follow the template::
 911
    def DirectOperator( X ):
        """ Direct non-linear simulation operator """
        ...
        ...
        ...
        return Y  # something like Y

    def TangentOperator( pair ):
        """ Tangent linear operator, around X, applied to dX """
        X, dX = pair  # the single argument is the pair (X, dX)
        ...
        ...
        ...
        return Y  # something like Y

    def AdjointOperator( pair ):
        """ Adjoint operator, around X, applied to Y """
        X, Y = pair  # the single argument is the pair (X, Y)
        ...
        ...
        ...
        return X  # something like X
 932
Again, this second operator definition makes it easy to test the functional
forms before their use in an ADAO case, reducing the complexity of
implementation.
 936
**Important warning:** the names "*DirectOperator*", "*TangentOperator*" and
"*AdjointOperator*" are mandatory, and the type of the X, Y, dX arguments can be
either a Python list, a numpy array or a numpy 1D-matrix. The user has to handle
these cases in the script.
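As a concrete illustration of this second form, here is a minimal sketch with
a small hypothetical non-linear operator, its tangent and its adjoint. The
operator itself and the helper ``_jacobian`` are assumptions made for the
example. Each of the tangent and adjoint functions receives a single pair
argument, which is unpacked inside the function:

```python
import numpy

def _jacobian(X):
    """ Jacobian of the hypothetical operator at X (helper for the example) """
    x = numpy.asarray(X, dtype=float).ravel()
    return numpy.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])

def DirectOperator(X):
    """ Direct non-linear simulation operator """
    x = numpy.asarray(X, dtype=float).ravel()
    return numpy.array([x[0] * x[1], x[0] + x[1] ** 2])

def TangentOperator(pair):
    """ Tangent linear operator, around X, applied to dX """
    X, dX = pair
    return _jacobian(X) @ numpy.asarray(dX, dtype=float).ravel()

def AdjointOperator(pair):
    """ Adjoint operator, around X, applied to Y """
    X, Y = pair
    return _jacobian(X).T @ numpy.asarray(Y, dtype=float).ravel()

# Consistency check of the tangent and adjoint by the scalar product identity
X, dX, Y = [1.0, 2.0], [0.5, -1.0], [2.0, 3.0]
left = float(numpy.dot(TangentOperator((X, dX)), Y))
right = float(numpy.dot(dX, AdjointOperator((X, Y))))
```

Here ``left`` and ``right`` are expected to agree up to rounding errors, which
is a quick check to do before using the functions in an ADAO case.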
 941
 942 Third functional form: using "*ScriptWithSwitch*"
 943 +++++++++++++++++++++++++++++++++++++++++++++++++
 944
 945 .. index:: single: ScriptWithSwitch
 946 .. index:: single: DirectOperator
 947 .. index:: single: TangentOperator
 948 .. index:: single: AdjointOperator
 949
This third form gives more possibilities to control the execution of the three
functions representing the operator, allowing advanced usage and control over
each execution of the simulation code. This is done by using the keyword
"*ScriptWithSwitch*" for the description of the chosen operator in the ADAO GUI.
The user has to provide a switch in one script to control the execution of the
direct, tangent and adjoint forms of the simulation code. The user can then, for
example, use other approximations for the tangent and adjoint codes, or
introduce more complexity in the argument treatment of the functions. But it
will be far more complicated to implement and debug.
 959
 960 **It is recommended not to use this third functional form without a solid
 961 numerical or physical reason.**
 962
 963 If, however, you want to use this third form, we recommend using the following
 964 template for the switch. It requires an external script or code named
 965 "*Physical_simulation_functions.py*", containing three functions named
 966 "*DirectOperator*", "*TangentOperator*" and "*AdjointOperator*" as previously.
 967 Here is the switch template::
 968
 969     import Physical_simulation_functions
 970     import numpy, logging
 971     #
 972     method = ""
 973     for param in computation["specificParameters"]:
 974         if param["name"] == "method":
 975             method = param["value"]
 976     if method not in ["Direct", "Tangent", "Adjoint"]:
 977         raise ValueError("No valid computation method is given")
 978     logging.info("Found method is \'%s\'"%method)
 979     #
 980     logging.info("Loading operator functions")
 981     Function = Physical_simulation_functions.DirectOperator
 982     Tangent  = Physical_simulation_functions.TangentOperator
 983     Adjoint  = Physical_simulation_functions.AdjointOperator
 984     #
 985     logging.info("Executing the possible computations")
 986     data = []
 987     if method == "Direct":
 988         logging.info("Direct computation")
 989         Xcurrent = computation["inputValues"][0][0][0]
 990         data = Function(numpy.matrix( Xcurrent ).T)
 991     if method == "Tangent":
 992         logging.info("Tangent computation")
 993         Xcurrent  = computation["inputValues"][0][0][0]
 994         dXcurrent = computation["inputValues"][0][0][1]
        data = Tangent((numpy.matrix(Xcurrent).T, numpy.matrix(dXcurrent).T))
 996     if method == "Adjoint":
 997         logging.info("Adjoint computation")
 998         Xcurrent = computation["inputValues"][0][0][0]
 999         Ycurrent = computation["inputValues"][0][0][1]
1000         data = Adjoint((numpy.matrix(Xcurrent).T, numpy.matrix(Ycurrent).T))
1001     #
1002     logging.info("Formatting the output")
1003     it = numpy.ravel(data)
1004     outputValues = [[[[]]]]
1005     for val in it:
1006       outputValues[0][0][0].append(val)
1007     #
1008     result = {}
1009     result["outputValues"]        = outputValues
1010     result["specificOutputInfos"] = []
1011     result["returnCode"]          = 0
1012     result["errorMessage"]        = ""
1013
Any modification can then be made starting from this template.
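To test the logic of such a switch outside of the execution environment, the
request can be imitated by a plain dictionary. The sketch below is only an
assumption deduced from the accesses made in the template: the helper names
``read_request`` and ``format_output`` and the content of the ``computation``
dictionary are hypothetical:

```python
import numpy

def read_request(computation):
    """ Return the requested method and the list of its input vectors """
    method = ""
    for param in computation["specificParameters"]:
        if param["name"] == "method":
            method = param["value"]
    if method not in ["Direct", "Tangent", "Adjoint"]:
        raise ValueError("No valid computation method is given")
    return method, computation["inputValues"][0][0]

def format_output(data):
    """ Flatten the result into the nested list layout used by the template """
    return [[[[float(val) for val in numpy.ravel(data)]]]]

# Hypothetical request, imitating the structure read by the template
computation = {
    "specificParameters": [{"name": "method", "value": "Tangent"}],
    "inputValues": [[[[1.0, 2.0], [0.1, 0.2]]]],
}
method, arguments = read_request(computation)
outputValues = format_output(numpy.array([[1.0], [2.0]]))
```

For the "Tangent" and "Adjoint" methods, ``arguments`` holds two vectors, and
only one for the "Direct" method.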
1015
Special case of controlled evolution operator
+++++++++++++++++++++++++++++++++++++++++++++
1018
In some cases, the evolution or the observation operators are required to be
controlled by an external input control, given a priori. In this case, the
generic form of the incremental evolution model is slightly modified as follows:
1022
1023 .. math:: \mathbf{y} = O( \mathbf{x}, \mathbf{u})
1024
1025 where :math:`\mathbf{u}` is the control over one state increment. In this case,
1026 the direct operator has to be applied to a pair of variables :math:`(X,U)`.
1027 Schematically, the operator has to be set as::
1028
    def DirectOperator( pair ):
        """ Direct non-linear simulation operator """
        X, U = pair  # the single argument is the pair (X, U)
        ...
        ...
        ...
        return Y  # something like X(n+1) or Y(n+1)
1035
The tangent and adjoint operators have the same signature as previously, noting
that the derivatives have to be taken only partially, with respect to
:math:`\mathbf{x}`. In such a case with explicit control, only the second
functional form (using "*ScriptWithFunctions*") and third functional form (using
"*ScriptWithSwitch*") can be used.
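A minimal sketch of such a controlled operator, written in the second
functional form, could look as follows. The evolution model used here (a
linear damping of the state plus the control) is a hypothetical example. The
tangent and adjoint only involve the derivative with respect to the state, and
not the control:

```python
import numpy

def DirectOperator(pair):
    """ Direct simulation operator, applied to the pair (X, U) """
    X, U = pair
    x = numpy.asarray(X, dtype=float).ravel()
    u = numpy.asarray(U, dtype=float).ravel()
    # Hypothetical incremental evolution: X(n+1) = 0.9 X(n) + U(n)
    return 0.9 * x + u

def TangentOperator(pair):
    """ Tangent linear operator, around X, applied to dX """
    X, dX = pair
    # Partial derivative with respect to X only
    return 0.9 * numpy.asarray(dX, dtype=float).ravel()

def AdjointOperator(pair):
    """ Adjoint operator, around X, applied to Y """
    X, Y = pair
    # Transpose of the partial derivative with respect to X
    return 0.9 * numpy.asarray(Y, dtype=float).ravel()

next_state = DirectOperator(([1.0, 2.0], [0.5, 0.5]))
```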