doc/reference.rst

   1 .. _section_reference:
   2
   3 ================================================================================
   4 Reference description of the ADAO commands and keywords
   5 ================================================================================
   6
   7 This section presents the reference description of the ADAO commands and
   8 keywords available through the GUI or through scripts.
   9
  10 Each command or keyword to be defined through the ADAO GUI has some properties.
  11 The first property is to be *required*, *optional* or only factual, describing a
  12 type of input. The second property is to be an "open" variable with a fixed type
  13 but with any value allowed by the type, or a "restricted" variable, limited to
  14 some specified values. Because the EFICAS editor GUI has built-in validation
  15 capabilities, the properties of the commands or keywords given through this GUI
  16 are automatically correct.
  17
  18 The mathematical notations used afterward are explained in the section
  19 :ref:`section_theory`.
  20
  21 Examples of using these commands are available in the section
  22 :ref:`section_examples` and in example files installed with the ADAO module.
  23
  24 List of possible input types
  25 ----------------------------
  26
  27 .. index:: single: Dict
  28 .. index:: single: Function
  29 .. index:: single: Matrix
  30 .. index:: single: ScalarSparseMatrix
  31 .. index:: single: DiagonalSparseMatrix
  32 .. index:: single: String
  33 .. index:: single: Script
  34 .. index:: single: Vector
  35
  36 Each ADAO variable has a pseudo-type to help with filling it in and validating
  37 it. The different pseudo-types are:
  38
  39 **Dict**
  40     This indicates a variable that has to be filled by a dictionary, usually
  41     given as a script.
  42
  43 **Function**
  44     This indicates a variable that has to be filled by a function, usually given
  45     as a script or a component method.
  46
  47 **Matrix**
  48     This indicates a variable that has to be filled by a matrix, usually given
  49     either as a string or as a script.
  50
  51 **ScalarSparseMatrix**
  52     This indicates a variable that has to be filled by a unique number, which
  53     will be used to multiply an identity matrix, usually given either as a
  54     string or as a script.
  55
  56 **DiagonalSparseMatrix**
  57     This indicates a variable that has to be filled by a vector, which will be
  58     used to replace the diagonal of an identity matrix, usually given either as
  59     a string or as a script.
  60
  61 **Script**
  62     This indicates a script given as an external file. It can be described by a
  63     full absolute path name or only by the file name without path.
  64
  65 **String**
  66     This indicates a string giving a literal representation of a matrix, a
  67     vector or a vector series, such as "1 2 ; 3 4" for a square 2x2 matrix.
  68
  69 **Vector**
  70     This indicates a variable that has to be filled by a vector, usually given
  71     either as a string or as a script.
  72
  73 **VectorSerie**
  74     This indicates a variable that has to be filled by a list of vectors, usually given either as a string or as a script.
  75
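As a rough illustration of the "*String*", "*ScalarSparseMatrix*" and "*DiagonalSparseMatrix*" pseudo-types above, the following Python sketch shows how such inputs could be expanded into full matrices. The helper names (``parse_matrix_string``, ``scalar_sparse_to_full``, ``diagonal_sparse_to_full``) are hypothetical and are not part of ADAO, whose internal handling may differ.

```python
def parse_matrix_string(text):
    """Parse a literal such as "1 2 ; 3 4" into a list of rows of floats."""
    rows = [row.split() for row in text.split(";")]
    return [[float(value) for value in row] for row in rows if row]


def scalar_sparse_to_full(scalar, size):
    """Expand a single number into scalar * identity(size)."""
    return [[scalar if i == j else 0.0 for j in range(size)] for i in range(size)]


def diagonal_sparse_to_full(diagonal):
    """Expand a vector into a matrix carrying that vector on its diagonal."""
    size = len(diagonal)
    return [[diagonal[i] if i == j else 0.0 for j in range(size)] for i in range(size)]


matrix = parse_matrix_string("1 2 ; 3 4")
scaled_identity = scalar_sparse_to_full(2.0, 3)
diagonal_matrix = diagonal_sparse_to_full([1.0, 4.0, 9.0])
print(matrix)
print(scaled_identity)
print(diagonal_matrix)
```

The two sparse forms only need one number or one vector from the user, which is why they are convenient for large covariance matrices.
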
  76 When a command or keyword can be filled by a script file name, the script has to
  77 contain a variable or a method that has the same name as the one to be filled.
  78 In other words, when importing the script in a YACS Python node, it must create
  79 a variable with the correct name in the current namespace.
  80
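The naming rule for script files can be illustrated with a small Python sketch. The script content and the value assigned are hypothetical; the only point is that executing the script must leave a variable named after the command (here ``Background``) in the namespace. The use of ``exec`` only mimics an import and is not a description of how ADAO or YACS actually loads scripts.

```python
# Hypothetical content of a user script given for the "Background" command.
script_source = """
# The variable name must match the command being filled.
Background = [0.0, 1.0, 2.0]
"""

# Mimic loading the script: run it in a fresh namespace and look for the name.
namespace = {}
exec(script_source, namespace)

command_name = "Background"
if command_name not in namespace:
    raise NameError("script does not define " + command_name)
print(command_name, "=", namespace[command_name])
```

A script that defined, say, ``background`` or ``Xb`` instead would fail this check, because the name would not match the command.
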
  81 Reference description for ADAO calculation cases
  82 ------------------------------------------------
  83
  84 List of commands and keywords for an ADAO calculation case
  85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  86
  87 .. index:: single: ASSIMILATION_STUDY
  88 .. index:: single: Algorithm
  89 .. index:: single: AlgorithmParameters
  90 .. index:: single: Background
  91 .. index:: single: BackgroundError
  92 .. index:: single: ControlInput
  93 .. index:: single: Debug
  94 .. index:: single: EvolutionError
  95 .. index:: single: EvolutionModel
  96 .. index:: single: InputVariables
  97 .. index:: single: Observation
  98 .. index:: single: ObservationError
  99 .. index:: single: ObservationOperator
 100 .. index:: single: Observers
 101 .. index:: single: OutputVariables
 102 .. index:: single: Study_name
 103 .. index:: single: Study_repertory
 104 .. index:: single: UserDataInit
 105 .. index:: single: UserPostAnalysis
 106
 107 The first set of commands is related to the description of a calculation case,
 108 that is a *Data Assimilation* procedure or an *Optimization* procedure. The
 109 terms are ordered in alphabetical order, except the first, which describes the
 110 choice between calculation and checking. The different commands are the
 111 following:
 112
 113 **ASSIMILATION_STUDY**
 114     *Required command*. This is the general command describing the data
 115     assimilation or optimization case. It hierarchically contains all the other
 116     commands.
 117
 118 **Algorithm**
 119     *Required command*. This is a string to indicate the data assimilation or
 120     optimization algorithm chosen. The choices are limited and available through
 121     the GUI. Examples include "3DVAR" and "Blue". See below the list of
 122     algorithms and associated parameters in the following subsection `Options
 123     and required commands for calculation algorithms`_.
 124
 125 **AlgorithmParameters**
 126     *Optional command*. This command allows adding some optional parameters to
 127     control the data assimilation or optimization algorithm. It is defined as a
 128     "*Dict*" type object, that is, given as a script. See below the list of
 129     algorithms and associated parameters in the following subsection `Options
 130     and required commands for calculation algorithms`_.
 131
 132 **Background**
 133     *Required command*. This indicates the background or initial vector used,
 134     previously noted as :math:`\mathbf{x}^b`. It is defined as a "*Vector*" type
 135     object, that is, given either as a string or as a script.
 136
 137 **BackgroundError**
 138     *Required command*. This indicates the background error covariance matrix,
 139     previously noted as :math:`\mathbf{B}`. It is defined as a "*Matrix*" type
 140     object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
 141     type object, that is, given either as a string or as a script.
 142
 143 **ControlInput**
 144     *Optional command*. This indicates the control vector used to force the
 145     evolution model at each step, usually noted as :math:`\mathbf{U}`. It is
 146     defined as a "*Vector*" or a "*VectorSerie*" type object, that is, given
 147     either as a string or as a script. When there is no control, it has to be an
 148     empty string ''.
 149
 150 **Debug**
 151     *Required command*. This defines the level of trace and intermediary debug
 152     information. The choices are limited to 0 (for False) and 1 (for
 153     True).
 154
 155 **EvolutionError**
 156     *Optional command*. This indicates the evolution error covariance matrix,
 157     usually noted as :math:`\mathbf{Q}`. It is defined as a "*Matrix*" type
 158     object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
 159     type object, that is, given either as a string or as a script.
 160
 161 **EvolutionModel**
 162     *Optional command*. This indicates the evolution model operator, usually
 163     noted :math:`M`, which describes a step of evolution. It is defined as a
 164     "*Function*" type object, that is, given as a script. Different functional
 165     forms can be used, as described in the following subsection `Requirements
 166     for functions describing an operator`_. If there is some control :math:`U`
 167     included in the evolution model, the operator has to be applied to a pair
 168     :math:`(X,U)`.
 169
 170 **InputVariables**
 171     *Optional command*. This command allows indicating the name and size of
 172     physical variables that are bundled together in the control vector. This
 173     information is dedicated to data processed inside an algorithm.
 174
 175 **Observation**
 176     *Required command*. This indicates the observation vector used for data
 177     assimilation or optimization, previously noted as :math:`\mathbf{y}^o`. It
 178     is defined as a "*Vector*" or a "*VectorSerie*" type object, that is, given
 179     either as a string or as a script.
 180
 181 **ObservationError**
 182     *Required command*. This indicates the observation error covariance matrix,
 183     previously noted as :math:`\mathbf{R}`. It is defined as a "*Matrix*" type
 184     object, a "*ScalarSparseMatrix*" type object, or a "*DiagonalSparseMatrix*"
 185     type object, that is, given either as a string or as a script.
 186
 187 **ObservationOperator**
 188     *Required command*. This indicates the observation operator, previously
 189     noted :math:`H`, which transforms the input parameters :math:`\mathbf{x}` to
 190     results :math:`\mathbf{y}` to be compared to observations
 191     :math:`\mathbf{y}^o`. It is defined as a "*Function*" type object, that is,
 192     given as a script. Different functional forms can be used, as described in
 193     the following subsection `Requirements for functions describing an
 194     operator`_. If there is some control :math:`U` included in the observation,
 195     the operator has to be applied to a pair :math:`(X,U)`.
 196
 197 **Observers**
 198     *Optional command*. This command allows setting internal observers, which are
 199     functions linked with a particular variable, which will be executed each
 200     time this variable is modified. It is a convenient way to monitor variables
 201     of interest during the data assimilation or optimization process, by
 202     printing or plotting them, etc. Common templates are provided to help the
 203     user get started or quickly build a case.
 204
 205 **OutputVariables**
 206     *Optional command*. This command allows indicating the name and size of
 207     physical variables that are bundled together in the output observation
 208     vector. This information is dedicated to data processed inside an algorithm.
 209
 210 **Study_name**
 211     *Required command*. This is an open string to describe the study by a name
 212     or a sentence.
 213
 214 **Study_repertory**
 215     *Optional command*. If available, this directory is used to find all the
 216     script files that can be used to define some other commands by scripts.
 217
 218 **UserDataInit**
 219     *Optional command*. This command allows initializing some parameters or
 220     data automatically before data assimilation algorithm processing.
 221
 222 **UserPostAnalysis**
 223     *Optional command*. This command allows processing some parameters or data
 224     automatically after data assimilation algorithm processing. It is defined as
 225     a script or a string, allowing post-processing code to be put directly inside
 226     the ADAO case. Common templates are provided to help the user get started or
 227     quickly build a case.
 228
 229 Options and required commands for calculation algorithms
 230 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 231
 232 .. index:: single: 3DVAR
 233 .. index:: single: Blue
 234 .. index:: single: EnsembleBlue
 235 .. index:: single: KalmanFilter
 236 .. index:: single: ExtendedKalmanFilter
 237 .. index:: single: LinearLeastSquares
 238 .. index:: single: NonLinearLeastSquares
 239 .. index:: single: ParticleSwarmOptimization
 240 .. index:: single: QuantileRegression
 241
 242 .. index:: single: AlgorithmParameters
 243 .. index:: single: Bounds
 244 .. index:: single: CostDecrementTolerance
 245 .. index:: single: GradientNormTolerance
 246 .. index:: single: GroupRecallRate
 247 .. index:: single: MaximumNumberOfSteps
 248 .. index:: single: Minimizer
 249 .. index:: single: NumberOfInsects
 250 .. index:: single: ProjectedGradientTolerance
 251 .. index:: single: QualityCriterion
 252 .. index:: single: Quantile
 253 .. index:: single: SetSeed
 254 .. index:: single: StoreInternalVariables
 255 .. index:: single: StoreSupplementaryCalculations
 256 .. index:: single: SwarmVelocity
 257
 258 Each algorithm can be controlled using some generic or specific options given
 259 through the "*AlgorithmParameters*" optional command, as follows for example::
 260
 261     AlgorithmParameters = {
 262         "Minimizer" : "LBFGSB",
 263         "MaximumNumberOfSteps" : 25,
 264         "StoreSupplementaryCalculations" : ["APosterioriCovariance","OMA"],
 265         }
 266
 267 This section describes the available options algorithm by algorithm. If an
 268 option is specified for an algorithm that doesn't support it, the option is
 269 simply left unused. The meaning of the acronyms or particular names can be found
 270 in the :ref:`genindex` or the :ref:`section_glossary`. In addition, for each
 271 algorithm, the required commands/keywords are given, being described in `List of
 272 commands and keywords for an ADAO calculation case`_.
 273
 274 **"Blue"**
 275
 276   *Required commands*
 277     *"Background", "BackgroundError",
 278     "Observation", "ObservationError",
 279     "ObservationOperator"*
 280
 281   StoreInternalVariables
 282     This boolean key allows storing default internal variables, mainly the
 283     current state during the iterative optimization process. Be careful, this can
 284     be a numerically costly choice in certain calculation cases. The default is
 285     "False".
 286
 287   StoreSupplementaryCalculations
 288     This list indicates the names of the supplementary variables that can be
 289     available at the end of the algorithm. It involves potentially costly
 290     calculations. The default is an empty list, none of these variables being
 291     calculated and stored by default. The possible names are in the following
 292     list: ["APosterioriCovariance", "BMA", "OMA", "OMB", "Innovation",
 293     "SigmaBck2", "SigmaObs2", "MahalanobisConsistency"].
 294
 295 **"LinearLeastSquares"**
 296
 297   *Required commands*
 298     *"Observation", "ObservationError",
 299     "ObservationOperator"*
 300
 301   StoreInternalVariables
 302     This boolean key allows storing default internal variables, mainly the
 303     current state during the iterative optimization process. Be careful, this can
 304     be a numerically costly choice in certain calculation cases. The default is
 305     "False".
 306
 307   StoreSupplementaryCalculations
 308     This list indicates the names of the supplementary variables that can be
 309     available at the end of the algorithm. It involves potentially costly
 310     calculations. The default is an empty list, none of these variables being
 311     calculated and stored by default. The possible names are in the following
 312     list: ["OMA"].
 313
 314 **"3DVAR"**
 315
 316   *Required commands*
 317     *"Background", "BackgroundError",
 318     "Observation", "ObservationError",
 319     "ObservationOperator"*
 320
 321   Minimizer
 322     This key allows choosing the optimization minimizer. The default choice
 323     is "LBFGSB", and the possible ones are "LBFGSB" (nonlinear constrained
 324     minimizer, see [Byrd95]_ and [Zhu97]_), "TNC" (nonlinear constrained
 325     minimizer), "CG" (nonlinear unconstrained minimizer), "BFGS" (nonlinear
 326     unconstrained minimizer), "NCG" (Newton CG minimizer).
 327
 328   Bounds
 329     This key allows defining upper and lower bounds for every control
 330     variable being optimized. Bounds can be given by a list of pairs of
 331     lower/upper bounds, one pair for each variable, with ``None`` wherever
 332     there is no bound. The bounds can always be specified, but they are taken
 333     into account only by the constrained minimizers.
 334
 335   MaximumNumberOfSteps
 336     This key indicates the maximum number of iterations allowed for iterative
 337     optimization. The default is 15000, which is very similar to no limit on
 338     iterations. It is therefore recommended to adapt this parameter to the needs
 339     of real problems. For some minimizers, the effective stopping step can be
 340     slightly different due to algorithm internal control requirements.
 341
 342   CostDecrementTolerance
 343     This key indicates a limit value, causing the iterative optimization
 344     process to stop successfully when the cost function decreases by less than
 345     this tolerance at the last step. The default is 1.e-7, and it is
 346     recommended to adapt it to the needs of real problems.
 347
 348   ProjectedGradientTolerance
 349     This key indicates a limit value, causing the iterative optimization process
 350     to stop successfully when all the components of the projected gradient are
 351     under this limit. It is only used for constrained minimizers. The default is
 352     -1, that is the internal default of each minimizer (generally 1.e-5), and it
 353     is not recommended to change it.
 354
 355   GradientNormTolerance
 356     This key indicates a limit value, causing the iterative optimization
 357     process to stop successfully when the norm of the gradient is under this
 358     limit. It is only used for unconstrained minimizers. The default is
 359     1.e-5 and it is not recommended to change it.
 360
 361   StoreInternalVariables
 362     This boolean key allows storing default internal variables, mainly the
 363     current state during the iterative optimization process. Be careful, this can
 364     be a numerically costly choice in certain calculation cases. The default is
 365     "False".
 366
 367   StoreSupplementaryCalculations
 368     This list indicates the names of the supplementary variables that can be
 369     available at the end of the algorithm. It involves potentially costly
 370     calculations. The default is an empty list, none of these variables being
 371     calculated and stored by default. The possible names are in the following
 372     list: ["APosterioriCovariance", "BMA", "OMA", "OMB", "Innovation",
 373     "SigmaObs2", "MahalanobisConsistency"].
 374
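A possible "*AlgorithmParameters*" dictionary for "3DVAR" is sketched below, using only keys described above. All numerical values are illustrative assumptions for a hypothetical problem with three control variables, not recommended settings.

```python
# Illustrative values only, for a hypothetical problem with three control variables.
AlgorithmParameters = {
    "Minimizer": "LBFGSB",              # constrained minimizer, so Bounds are used
    "Bounds": [
        [0.0, 10.0],                    # lower and upper bound
        [None, 5.0],                    # no lower bound
        [None, None],                   # unbounded variable
    ],
    "MaximumNumberOfSteps": 100,
    "CostDecrementTolerance": 1.0e-7,
    "StoreInternalVariables": False,
    "StoreSupplementaryCalculations": ["APosterioriCovariance", "OMA", "OMB"],
}

# Simple consistency check: a lower bound must not exceed its upper bound.
for lower, upper in AlgorithmParameters["Bounds"]:
    if lower is not None and upper is not None:
        assert lower <= upper
```

With an unconstrained minimizer such as "CG" or "BFGS", the same "Bounds" entry could be left in place but would not be taken into account.
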
 375 **"NonLinearLeastSquares"**
 376
 377   *Required commands*
 378     *"Background",
 379     "Observation", "ObservationError",
 380     "ObservationOperator"*
 381
 382   Minimizer
 383     This key allows choosing the optimization minimizer. The default choice
 384     is "LBFGSB", and the possible ones are "LBFGSB" (nonlinear constrained
 385     minimizer, see [Byrd95]_ and [Zhu97]_), "TNC" (nonlinear constrained
 386     minimizer), "CG" (nonlinear unconstrained minimizer), "BFGS" (nonlinear
 387     unconstrained minimizer), "NCG" (Newton CG minimizer).
 388
 389   Bounds
 390     This key allows defining upper and lower bounds for every control
 391     variable being optimized. Bounds can be given by a list of pairs of
 392     lower/upper bounds, one pair for each variable, with ``None`` wherever
 393     there is no bound. The bounds can always be specified, but they are taken
 394     into account only by the constrained minimizers.
 395
 396   MaximumNumberOfSteps
 397     This key indicates the maximum number of iterations allowed for iterative
 398     optimization. The default is 15000, which is very similar to no limit on
 399     iterations. It is therefore recommended to adapt this parameter to the needs
 400     of real problems. For some minimizers, the effective stopping step can be
 401     slightly different due to algorithm internal control requirements.
 402
 403   CostDecrementTolerance
 404     This key indicates a limit value, causing the iterative optimization
 405     process to stop successfully when the cost function decreases by less than
 406     this tolerance at the last step. The default is 1.e-7, and it is
 407     recommended to adapt it to the needs of real problems.
 408
 409   ProjectedGradientTolerance
 410     This key indicates a limit value, causing the iterative optimization process
 411     to stop successfully when all the components of the projected gradient are
 412     under this limit. It is only used for constrained minimizers. The default is
 413     -1, that is the internal default of each minimizer (generally 1.e-5), and it
 414     is not recommended to change it.
 415
 416   GradientNormTolerance
 417     This key indicates a limit value, causing the iterative optimization
 418     process to stop successfully when the norm of the gradient is under this
 419     limit. It is only used for unconstrained minimizers. The default is
 420     1.e-5 and it is not recommended to change it.
 421
 422   StoreInternalVariables
 423     This boolean key allows storing default internal variables, mainly the
 424     current state during the iterative optimization process. Be careful, this can
 425     be a numerically costly choice in certain calculation cases. The default is
 426     "False".
 427
 428   StoreSupplementaryCalculations
 429     This list indicates the names of the supplementary variables that can be
 430     available at the end of the algorithm. It involves potentially costly
 431     calculations. The default is an empty list, none of these variables being
 432     calculated and stored by default. The possible names are in the following
 433     list: ["BMA", "OMA", "OMB", "Innovation"].
 434
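The interplay between "MaximumNumberOfSteps" and "CostDecrementTolerance" can be pictured with a toy gradient descent. This is a simplified sketch of the stopping logic only, under the assumption that the decrement is measured between two successive steps; the actual minimizers (LBFGSB, TNC, and so on) apply their own, more elaborate tests. The cost function, the step size and the function name ``minimize`` are hypothetical.

```python
def minimize(cost, gradient, x0, step=0.1,
             maximum_number_of_steps=15000, cost_decrement_tolerance=1.0e-7):
    """Toy 1-D gradient descent that stops on a small cost decrement."""
    x = x0
    previous_cost = cost(x)
    for iteration in range(1, maximum_number_of_steps + 1):
        x = x - step * gradient(x)
        current_cost = cost(x)
        # Successful stop: the cost decreased by less than the tolerance.
        if previous_cost - current_cost < cost_decrement_tolerance:
            return x, iteration, "tolerance"
        previous_cost = current_cost
    # Otherwise the iteration budget was exhausted.
    return x, maximum_number_of_steps, "maximum"


# Quadratic cost with its minimum at x = 3.
x_opt, n_steps, reason = minimize(lambda x: (x - 3.0) ** 2,
                                  lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_opt, n_steps, reason)
```

On this toy problem the tolerance test ends the loop long before the default budget of 15000 steps is reached, which is the situation the default is designed for.
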
 435 **"EnsembleBlue"**
 436
 437   *Required commands*
 438     *"Background", "BackgroundError",
 439     "Observation", "ObservationError",
 440     "ObservationOperator"*
 441
 442   SetSeed
 443     This key allows giving an integer in order to fix the seed of the random
 444     generator used to generate the ensemble. A convenient value is for example
 445     1000. By default, the seed is left uninitialized, and so uses the default
 446     initialization from the computer.
 447
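The effect of fixing a seed can be illustrated with Python's standard ``random`` module. This is only an analogy: the generator actually used by ADAO is not specified here, and the helper ``draw_ensemble`` is hypothetical.

```python
import random


def draw_ensemble(size, seed=None):
    """Draw `size` Gaussian perturbations; a fixed seed makes them repeatable."""
    # With seed=None, Python seeds the generator from the system itself.
    generator = random.Random(seed)
    return [generator.gauss(0.0, 1.0) for _ in range(size)]


first = draw_ensemble(5, seed=1000)
second = draw_ensemble(5, seed=1000)
print(first == second)  # True: same seed, same ensemble
```

Fixing the seed in this way is mainly useful to make two runs of the same case comparable.
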
 448 **"KalmanFilter"**
 449
 450   *Required commands*
 451     *"Background", "BackgroundError",
 452     "Observation", "ObservationError",
 453     "ObservationOperator",
 454     "EvolutionModel", "EvolutionError",
 455     "ControlInput"*
 456
 457   EstimationOf
 458     This key allows choosing the type of estimation to be performed. It can be
 459     either state-estimation, named "State", or parameter-estimation, named
 460     "Parameters". The default choice is "State".
 461
 462   StoreSupplementaryCalculations
 463     This list indicates the names of the supplementary variables that can be
 464     available at the end of the algorithm. It involves potentially costly
 465     calculations. The default is an empty list, none of these variables being
 466     calculated and stored by default. The possible names are in the following
 467     list: ["APosterioriCovariance", "BMA", "Innovation"].
 468
 469 **"ExtendedKalmanFilter"**
 470
 471   *Required commands*
 472     *"Background", "BackgroundError",
 473     "Observation", "ObservationError",
 474     "ObservationOperator",
 475     "EvolutionModel", "EvolutionError",
 476     "ControlInput"*
 477
 478   Bounds
 479     This key allows defining upper and lower bounds for every control variable
 480     being optimized. Bounds can be given by a list of pairs of lower/upper
 481     bounds, one pair for each variable, with extreme values wherever there
 482     is no bound. The bounds can always be specified, but they are taken into
 483     account only by the constrained minimizers.
 484
 485   ConstrainedBy
 486     This key allows defining the method to take bounds into account. The
 487     possible methods are in the following list: ["EstimateProjection"].
 488
 489   EstimationOf
 490     This key allows choosing the type of estimation to be performed. It can be
 491     either state-estimation, named "State", or parameter-estimation, named
 492     "Parameters". The default choice is "State".
 493
 494   StoreSupplementaryCalculations
 495     This list indicates the names of the supplementary variables that can be
 496     available at the end of the algorithm. It involves potentially costly
 497     calculations. The default is an empty list, none of these variables being
 498     calculated and stored by default. The possible names are in the following
 499     list: ["APosterioriCovariance", "BMA", "Innovation"].
 500
 501 **"ParticleSwarmOptimization"**
 502
 503   *Required commands*
 504     *"Background", "BackgroundError",
 505     "Observation", "ObservationError",
 506     "ObservationOperator"*
 507
 508   MaximumNumberOfSteps
 509     This key indicates the maximum number of iterations allowed for iterative
 510     optimization. The default is 50, which is an arbitrary limit. It is therefore
 511     recommended to adapt this parameter to the needs of real problems.
 512
 513   NumberOfInsects
 514     This key indicates the number of insects or particles in the swarm. The
 515     default is 100, which is a usual default for this algorithm.
 516
 517   SwarmVelocity
 518     This key indicates the part of the insect velocity which is imposed by the
 519     swarm. It is a positive floating point value. The default value is 1.
 520
 521   GroupRecallRate
 522     This key indicates the recall rate at the best swarm insect. It is a
 523     floating point value between 0 and 1. The default value is 0.5.
 524
 525   QualityCriterion
 526     This key indicates the quality criterion, minimized to find the optimal
 527     state estimate. The default is the usual data assimilation criterion named
 528     "DA", the augmented ponderated least squares. The possible criteria have to
 529     be in the following list, where the equivalent names are indicated by "=":
 530     ["AugmentedPonderatedLeastSquares"="APLS"="DA",
 531     "PonderatedLeastSquares"="PLS", "LeastSquares"="LS"="L2",
 532     "AbsoluteValue"="L1", "MaximumError"="ME"]
 533
 534   SetSeed
 535     This key allows giving an integer in order to fix the seed of the random
 536     generator used to generate the ensemble. A convenient value is for example
 537     1000. By default, the seed is left uninitialized, and so uses the default
 538     initialization from the computer.
 539
 540   StoreInternalVariables
 541     This boolean key allows storing default internal variables, mainly the
 542     current state during the iterative optimization process. Be careful, this can
 543     be a numerically costly choice in certain calculation cases. The default is
 544     "False".
 545
 546   StoreSupplementaryCalculations
 547     This list indicates the names of the supplementary variables that can be
 548     available at the end of the algorithm. It involves potentially costly
 549     calculations. The default is an empty list, none of these variables being
 550     calculated and stored by default. The possible names are in the following
 551     list: ["BMA", "OMA", "OMB", "Innovation"].
 552
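The "QualityCriterion" names and their aliases can be made concrete with the following sketch. It assumes the conventional definitions that the names suggest (weighted and unweighted least squares with a factor 0.5, sum of absolute departures, maximum absolute departure); the exact scaling used inside ADAO is not stated in this section. Covariances are taken as diagonal and given as lists of variances, and the names ``ALIASES`` and ``quality_criterion`` are hypothetical.

```python
# Map every accepted name to one canonical label.
ALIASES = {
    "AugmentedPonderatedLeastSquares": "DA", "APLS": "DA", "DA": "DA",
    "PonderatedLeastSquares": "PLS", "PLS": "PLS",
    "LeastSquares": "LS", "LS": "LS", "L2": "LS",
    "AbsoluteValue": "L1", "L1": "L1",
    "MaximumError": "ME", "ME": "ME",
}


def quality_criterion(name, x, xb, b_var, y, hx, r_var):
    """Evaluate one criterion, with diagonal B and R given as variance lists."""
    kind = ALIASES[name]
    departures = [yi - hi for yi, hi in zip(y, hx)]
    if kind == "DA":
        jb = 0.5 * sum((xi - xbi) ** 2 / bi for xi, xbi, bi in zip(x, xb, b_var))
        jo = 0.5 * sum(d ** 2 / ri for d, ri in zip(departures, r_var))
        return jb + jo
    if kind == "PLS":
        return 0.5 * sum(d ** 2 / ri for d, ri in zip(departures, r_var))
    if kind == "LS":
        return 0.5 * sum(d ** 2 for d in departures)
    if kind == "L1":
        return sum(abs(d) for d in departures)
    return max(abs(d) for d in departures)  # "ME"


state, background, b_var = [1.0, 2.0], [0.0, 2.0], [1.0, 1.0]
observation, simulated, r_var = [3.0, 1.0], [1.0, 2.0], [4.0, 1.0]
for name in ("DA", "PLS", "L2", "L1", "ME"):
    print(name, quality_criterion(name, state, background, b_var,
                                  observation, simulated, r_var))
```

Only the "DA" form uses the background and its error covariance; the other criteria depend on the observation departures alone.
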
 553 **"QuantileRegression"**
 554
 555   *Required commands*
 556     *"Background",
 557     "Observation",
 558     "ObservationOperator"*
 559
 560   Quantile
 561     This key allows defining the real value of the desired quantile, between
 562     0 and 1. The default is 0.5, corresponding to the median.
 563
 564   Minimizer
 565     This key allows choosing the optimization minimizer. The default choice
 566     and only available choice is "MMQR" (Majorize-Minimize for Quantile
 567     Regression).
 568
 569   MaximumNumberOfSteps
 570     This key indicates the maximum number of iterations allowed for iterative
 571     optimization. The default is 15000, which is very similar to no limit on
 572     iterations. It is therefore recommended to adapt this parameter to the needs
 573     of real problems.
 574
 575   CostDecrementTolerance
 576     This key indicates a limit value, causing the iterative optimization
 577     process to stop successfully when the cost function or the surrogate
 578     decreases by less than this tolerance at the last step. The default is 1.e-6,
 579     and it is recommended to adapt it to the needs of real problems.
 580
 581   StoreInternalVariables
 582     This boolean key allows storing default internal variables, mainly the
 583     current state during the iterative optimization process. Be careful, this can
 584     be a numerically costly choice in certain calculation cases. The default is
 585     "False".
 586
 587   StoreSupplementaryCalculations
 588     This list indicates the names of the supplementary variables that can be
 589     available at the end of the algorithm. It involves potentially costly
 590     calculations. The default is an empty list, none of these variables being
 591     calculated and stored by default. The possible names are in the following
 592     list: ["BMA", "OMA", "OMB", "Innovation"].
 593
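The statement that a "Quantile" of 0.5 corresponds to the median can be checked numerically with the standard quantile ("pinball") loss. This sketch is independent of ADAO and of the MMQR minimizer; the data and the helper name ``pinball_loss`` are illustrative.

```python
import statistics


def pinball_loss(quantile, observations, candidate):
    """Sum of the quantile-regression check function over all residuals."""
    total = 0.0
    for value in observations:
        residual = value - candidate
        # Positive residuals are weighted by q, negative ones by (1 - q).
        total += quantile * residual if residual >= 0 else (quantile - 1.0) * residual
    return total


observations = [1.0, 2.0, 4.0, 7.0, 20.0]
# Search the minimizer among the observations themselves.
best = min(observations, key=lambda c: pinball_loss(0.5, observations, c))
print(best, statistics.median(observations))
```

Raising the quantile towards 1 penalizes underestimation more heavily, so the minimizer moves towards the largest observations.
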
 594 Reference description for ADAO checking cases
 595 ---------------------------------------------
 596
 597 List of commands and keywords for an ADAO checking case
 598 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 599
 600 .. index:: single: CHECKING_STUDY
 601 .. index:: single: Algorithm
 602 .. index:: single: AlgorithmParameters
 603 .. index:: single: CheckingPoint
 604 .. index:: single: Debug
 605 .. index:: single: ObservationOperator
 606 .. index:: single: Study_name
 607 .. index:: single: Study_repertory
 608 .. index:: single: UserDataInit
 609
 610 The second set of commands is related to the description of a checking case,
 611 that is a procedure to check required properties on information used elsewhere
 612 by a calculation case. The terms are ordered in alphabetical order, except the
 613 first, which describes the choice between calculation and checking. The
 614 different commands are the following:
 615
 616 **CHECKING_STUDY**
 617     *Required command*. This is the general command describing the checking
 618     case. It hierarchically contains all the other commands.
 619
 620 **Algorithm**
 621     *Required command*. This is a string to indicate the data assimilation or
 622     optimization algorithm chosen. The choices are limited and available through
 623     the GUI. Examples include "FunctionTest" and "AdjointTest". See below
 624     the list of algorithms and associated parameters in the following subsection
 625     `Options and required commands for checking algorithms`_.
 626
 627 **AlgorithmParameters**
 628     *Optional command*. This command allows adding some optional parameters to
 629     control the data assimilation or optimization algorithm. It is defined as a
 630     "*Dict*" type object, that is, given as a script. See below the list of
 631     algorithms and associated parameters in the following subsection `Options
 632     and required commands for checking algorithms`_.
 633
 634 **CheckingPoint**
 635     *Required command*. This indicates the vector used,
 636     previously noted as :math:`\mathbf{x}^b`. It is defined as a "*Vector*" type
 637     object, that is, given either as a string or as a script.
 638
 639 **Debug**
 640     *Required command*. This defines the level of trace and intermediary debug
 641     information. The choices are limited to 0 (for False) and 1 (for
 642     True).
 643
 644 **ObservationOperator**
 645     *Required command*. This indicates the observation operator, previously
 646     noted :math:`H`, which transforms the input parameters :math:`\mathbf{x}` to
 647     results :math:`\mathbf{y}` to be compared to observations
 648     :math:`\mathbf{y}^o`. It is defined as a "*Function*" type object, that is,
 649     given as a script. Different functional forms can be used, as described in
 650     the following subsection `Requirements for functions describing an
 651     operator`_.
 652
 653 **Study_name**
 654     *Required command*. This is an open string to describe the study by a name
 655     or a sentence.
 656
**Study_repertory**
    *Optional command*. If available, this directory is used to find all the
    script files that can be used to define some other commands by scripts.
 660
**UserDataInit**
    *Optional command*. This command allows initializing some parameters or
    data automatically before the algorithm processing.
 664
 665 Options and required commands for checking algorithms
 666 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 667
 668 .. index:: single: AdjointTest
 669 .. index:: single: FunctionTest
 670 .. index:: single: GradientTest
 671 .. index:: single: LinearityTest
 672
 673 .. index:: single: AlgorithmParameters
 674 .. index:: single: AmplitudeOfInitialDirection
 675 .. index:: single: EpsilonMinimumExponent
 676 .. index:: single: InitialDirection
 677 .. index:: single: ResiduFormula
 678 .. index:: single: SetSeed
 679
 680 We recall that each algorithm can be controlled using some generic or specific
 681 options given through the "*AlgorithmParameters*" optional command, as follows
 682 for example::
 683
 684     AlgorithmParameters = {
 685         "AmplitudeOfInitialDirection" : 1,
 686         "EpsilonMinimumExponent" : -8,
 687         }
 688
If an option is specified for an algorithm that doesn't support it, the option
is simply left unused. The meaning of the acronyms or particular names can be
found in the :ref:`genindex` or the :ref:`section_glossary`. In addition, for
each algorithm, the required commands/keywords are given; they are described in
`List of commands and keywords for an ADAO checking case`_.
 694
 695 **"AdjointTest"**
 696
 697   *Required commands*
 698     *"CheckingPoint",
 699     "ObservationOperator"*
 700
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation, built as a
    vector used for the directional derivative around the nominal checking
    point. The default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent of the power-of-10 coefficient used
    to decrease the increment multiplier. The default is -8, and it has to be
    between 0 and -20. For example, the default value leads to calculating the
    residue of the scalar product formula with a fixed increment multiplied by
    factors going from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero, of the same
    vector size as the checking point.

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is for example
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
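
As an illustration of the quantity monitored by such a test, here is a minimal
sketch of a scalar product residue computed for increment multipliers going from
1.e0 to 1.e-8. It is not the ADAO implementation: the matrix ``A``, the
functions ``tangent``, ``adjoint`` and ``adjoint_residues``, and the vector
``Y`` are assumptions made only for this example::

    import numpy

    # Hypothetical linear operator H(x) = A x, chosen only for illustration
    A = numpy.array([[1., 2., 0.], [0., 1., 3.]])

    def tangent(X, dX):
        "Tangent operator around X applied to dX (A dX for a linear H)"
        return numpy.dot(A, dX)

    def adjoint(X, Y):
        "Adjoint operator around X applied to Y (transpose of A times Y)"
        return numpy.dot(A.T, Y)

    def adjoint_residues(X, dX, Y, EpsilonMinimumExponent=-8):
        "Return the (multiplier, residue) pairs for 10**0 ... 10**exponent"
        results = []
        for exponent in range(0, EpsilonMinimumExponent - 1, -1):
            alpha = 10.0 ** exponent
            lhs = numpy.dot(tangent(X, alpha * dX), Y)   # < H'(x) dx , y >
            rhs = numpy.dot(alpha * dX, adjoint(X, Y))   # < dx , H'*(x) y >
            results.append((alpha, abs(lhs - rhs)))
        return results

    numpy.random.seed(1000)                        # role of "SetSeed"
    X  = numpy.array([1., 2., 3.])                 # role of "CheckingPoint"
    dX = numpy.random.normal(0., 1., size=X.size)  # random "InitialDirection"
    Y  = numpy.array([1., -1.])
    residues = adjoint_residues(X, dX, Y)

With a consistent tangent and adjoint pair, as in this sketch, all the residues
stay at the level of the floating point rounding errors.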
 724
 725 **"FunctionTest"**
 726
 727   *Required commands*
 728     *"CheckingPoint",
 729     "ObservationOperator"*
 730
 731   No option
 732
 733 **"GradientTest"**
 734
 735   *Required commands*
 736     *"CheckingPoint",
 737     "ObservationOperator"*
 738
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation, built as a
    vector used for the directional derivative around the nominal checking
    point. The default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent of the power-of-10 coefficient used
    to decrease the increment multiplier. The default is -8, and it has to be
    between 0 and -20. For example, the default value leads to calculating the
    residue of the chosen formula with a fixed increment multiplied by factors
    going from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero, of the same
    vector size as the checking point.

  ResiduFormula
    This key indicates the residue formula that has to be used for the test. The
    default choice is "Taylor", and the possible ones are "Taylor" (residue of
    the Taylor development of the operator, which has to decrease as the square
    of the perturbation) and "Norm" (residue obtained by taking the norm of the
    Taylor development at zero order approximation, which approximates the
    gradient, and which has to remain constant).

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is for example
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
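
To make the expected behaviour of the two residues more concrete, here is a
minimal sketch, assuming a hypothetical operator ``F`` with its exact tangent
``GradF``, and one plausible reading of the "Taylor" and "Norm" formulas
described above. The names and the formulas are assumptions of this example and
not the ADAO implementation::

    import numpy

    def F(X):
        "Hypothetical non-linear operator, for illustration only"
        X = numpy.ravel(X)
        return numpy.array([X[0]**2, X[0]*X[1]])

    def GradF(X, dX):
        "Exact tangent of F around X, applied to dX"
        X, dX = numpy.ravel(X), numpy.ravel(dX)
        return numpy.array([2.*X[0]*dX[0], X[1]*dX[0] + X[0]*dX[1]])

    X  = numpy.array([1., 2.])   # nominal checking point
    dX = numpy.array([1., 1.])   # direction of the perturbation
    FX = F(X)
    taylor, norm = [], []
    for exponent in range(0, -5, -1):
        alpha = 10.0 ** exponent
        # "Taylor": first order remainder, expected to decrease like alpha**2
        taylor.append(
            numpy.linalg.norm(F(X + alpha*dX) - FX - alpha*GradF(X, dX)))
        # "Norm": finite difference ratio, expected to become nearly constant
        norm.append(numpy.linalg.norm(F(X + alpha*dX) - FX) / alpha)

In this sketch, each division of the multiplier by 10 divides the "Taylor"
residue by about 100, while the "Norm" residue tends to the norm of the
directional derivative.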
 770
 771 **"LinearityTest"**
 772
 773   *Required commands*
 774     *"CheckingPoint",
 775     "ObservationOperator"*
 776
  AmplitudeOfInitialDirection
    This key indicates the scaling of the initial perturbation, built as a
    vector used for the directional derivative around the nominal checking
    point. The default is 1, which means no scaling.

  EpsilonMinimumExponent
    This key indicates the minimal exponent of the power-of-10 coefficient used
    to decrease the increment multiplier. The default is -8, and it has to be
    between 0 and -20. For example, the default value leads to calculating the
    residue of the chosen formula with a fixed increment multiplied by factors
    going from 1.e0 down to 1.e-8.

  InitialDirection
    This key indicates the vector direction used for the directional derivative
    around the nominal checking point. It has to be a vector. If not specified,
    this direction defaults to a random perturbation around zero, of the same
    vector size as the checking point.

  ResiduFormula
    This key indicates the residue formula that has to be used for the test. The
    default choice is "CenteredDL", and the possible ones are "CenteredDL"
    (residue of the difference between the function at the nominal point and
    its values with positive and negative increments, which has to stay very
    small), "Taylor" (residue of the Taylor development of the operator
    normalized by the nominal value, which has to stay very small),
    "NominalTaylor" (residue of the order 1 approximations of the operator,
    normalized to the nominal point, which has to stay close to 1), and
    "NominalTaylorRMS" (residue of the order 1 approximations of the operator,
    normalized by RMS to the nominal point, which has to stay close to 0).

  SetSeed
    This key allows giving an integer in order to fix the seed of the random
    generator used to generate the ensemble. A convenient value is for example
    1000. By default, the seed is left uninitialized, and so uses the default
    initialization from the computer.
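
As an illustration of a centered residue of the "CenteredDL" kind, here is a
minimal sketch comparing a linear and a non-linear operator. The formula used
(sum of the values at positive and negative increments minus twice the nominal
value, normalized by the nominal value) is an assumption derived from the
description above, and the names ``centered_residue``, ``linear_op`` and
``quadratic_op`` are hypothetical::

    import numpy

    def centered_residue(F, X, dX, alpha):
        "|| F(X+a.dX) + F(X-a.dX) - 2.F(X) || / || F(X) ||"
        FX = numpy.ravel(F(X))
        Fp = numpy.ravel(F(X + alpha*dX))
        Fm = numpy.ravel(F(X - alpha*dX))
        return numpy.linalg.norm(Fp + Fm - 2.*FX) / numpy.linalg.norm(FX)

    def linear_op(X):
        "Hypothetical linear operator"
        return numpy.array([2.*X[0] + X[1], X[0] - 3.*X[1]])

    def quadratic_op(X):
        "Hypothetical non-linear operator"
        return numpy.array([X[0]**2, X[1]**2])

    X  = numpy.array([1., 2.])   # nominal checking point
    dX = numpy.array([1., 1.])   # direction of the perturbation
    r_lin  = centered_residue(linear_op,    X, dX, 1.0)
    r_quad = centered_residue(quadratic_op, X, dX, 1.0)

The residue vanishes for the linear operator, and is clearly non-zero for the
quadratic one.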
 812
 813 Requirements for functions describing an operator
 814 -------------------------------------------------
 815
The operators for observation and evolution are required to implement the data
assimilation or optimization procedures. They include the physical simulation by
numerical calculations, but also the filtering and restriction needed to compare
the simulation to the observation. The evolution operator is considered here in
its incremental form, representing the transition between two successive states,
and is then similar to the observation operator.

Schematically, an operator has to give an output solution given the input
parameters. Part of the input parameters can be modified during the optimization
procedure. So the mathematical representation of such a process is a function.
It was briefly described in the section :ref:`section_theory` and is generalized
here by the relation:
 828
 829 .. math:: \mathbf{y} = O( \mathbf{x} )
 830
 831 between the pseudo-observations :math:`\mathbf{y}` and the parameters
 832 :math:`\mathbf{x}` using the observation or evolution operator :math:`O`. The
 833 same functional representation can be used for the linear tangent model
 834 :math:`\mathbf{O}` of :math:`O` and its adjoint :math:`\mathbf{O}^*`, also
 835 required by some data assimilation or optimization algorithms.
 836
Then, **to describe completely an operator, the user only has to provide a
function that fully and only realizes the functional operation**.

This function is usually given as a script that can be executed in a YACS node.
This script can equally launch external codes or use internal SALOME calls and
methods. If the algorithm requires the 3 aspects of the operator (direct form,
tangent form and adjoint form), the user has to give the 3 functions or to
approximate them.
 845
 846 There are 3 practical methods for the user to provide the operator functional
 847 representation.
 848
 849 First functional form: using "*ScriptWithOneFunction*"
 850 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 851
 852 .. index:: single: ScriptWithOneFunction
 853 .. index:: single: DirectOperator
 854 .. index:: single: DifferentialIncrement
 855 .. index:: single: CenteredFiniteDifference
 856
The first one consists in providing only one potentially non-linear function,
and in approximating the tangent and the adjoint operators. This is done by
using the keyword "*ScriptWithOneFunction*" for the description of the chosen
operator in the ADAO GUI. The user has to provide the function in a script, with
the mandatory name "*DirectOperator*". For example, the script can follow the
template::

    def DirectOperator( X ):
        """ Direct non-linear simulation operator """
        ...
        ...
        ...
        return Y  # with Y = O(X)
 870
In this case, the user can also provide a value for the differential increment,
using the keyword "*DifferentialIncrement*" through the GUI, which has a default
value of 1%. This coefficient will be used in the finite difference
approximation to build the tangent and adjoint operators. The order of the
finite difference approximation can also be chosen through the GUI, using the
keyword "*CenteredFiniteDifference*", with 0 for an uncentered scheme of first
order, and with 1 for a centered scheme of second order (at twice the
computational cost of the first order one). The keyword has a default value of
0.
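
To illustrate the principle of such an approximation, here is a minimal sketch
that builds a finite difference Jacobian from a "*DirectOperator*" function, and
then applies it as a tangent or as an adjoint. It is only a sketch under
assumptions: the operator is hypothetical, the increment is taken here as
relative to each component of X, and the helper ``approximate_jacobian`` is not
part of ADAO::

    import numpy

    def DirectOperator(X):
        "Hypothetical non-linear operator, for illustration only"
        X = numpy.ravel(X)
        return numpy.array([X[0]**2, X[0]*X[1]])

    def approximate_jacobian(X, DifferentialIncrement=0.01,
                             CenteredFiniteDifference=0):
        "Finite difference Jacobian of DirectOperator around X"
        X = numpy.ravel(X).astype(float)
        columns = []
        for i in range(X.size):
            step = numpy.zeros(X.size)
            # Assumption: increment relative to the component, absolute if null
            if X[i] != 0.:
                step[i] = DifferentialIncrement * X[i]
            else:
                step[i] = DifferentialIncrement
            if CenteredFiniteDifference:
                diff = DirectOperator(X + step) - DirectOperator(X - step)
                columns.append(diff / (2. * step[i]))
            else:
                diff = DirectOperator(X + step) - DirectOperator(X)
                columns.append(diff / step[i])
        return numpy.array(columns).T

    X  = numpy.array([1., 2.])
    J0 = approximate_jacobian(X, CenteredFiniteDifference=0)
    J1 = approximate_jacobian(X, CenteredFiniteDifference=1)
    tangent_dX = numpy.dot(J1,   numpy.array([1., 1.]))  # tangent applied to dX
    adjoint_Y  = numpy.dot(J1.T, numpy.array([1., 0.]))  # adjoint applied to Y

For this quadratic operator, the centered scheme recovers the exact Jacobian up
to rounding errors, while the uncentered one keeps an error proportional to the
increment, at half the number of operator evaluations per column.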
 879
This first operator definition makes it easy to test the functional form before
its use in an ADAO case, greatly reducing the complexity of implementation.

**Important warning:** the name "*DirectOperator*" is mandatory, and the type of
the X argument can be either a python list, a numpy array or a numpy 1D-matrix.
The user has to treat these cases in their script.
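
One possible way to treat these three cases, given here only as a sketch, is to
convert the argument to a flat numpy array at the beginning of the function,
using ``numpy.ravel``. The computation itself (a multiplication by 2) is a
placeholder chosen for this example::

    import numpy

    def DirectOperator( X ):
        """ Direct operator accepting a list, an array or a 1D-matrix """
        # numpy.ravel returns a flat array for each of the three input types
        x = numpy.ravel( X )
        return 2. * x  # placeholder for the real simulation

    Y_from_list   = DirectOperator( [1., 2., 3.] )
    Y_from_array  = DirectOperator( numpy.array([1., 2., 3.]) )
    Y_from_matrix = DirectOperator( numpy.matrix([1., 2., 3.]).T )

The three calls then return the same one-dimensional array.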
 886
 887 Second functional form: using "*ScriptWithFunctions*"
 888 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 889
 890 .. index:: single: ScriptWithFunctions
 891 .. index:: single: DirectOperator
 892 .. index:: single: TangentOperator
 893 .. index:: single: AdjointOperator
 894
The second one consists in providing directly the three associated operators
:math:`O`, :math:`\mathbf{O}` and :math:`\mathbf{O}^*`. This is done by using
the keyword "*ScriptWithFunctions*" for the description of the chosen operator
in the ADAO GUI. The user has to provide three functions in one script, with
the three mandatory names "*DirectOperator*", "*TangentOperator*" and
"*AdjointOperator*". For example, the script can follow the template::

    def DirectOperator( X ):
        """ Direct non-linear simulation operator """
        ...
        ...
        ...
        return Y  # something like Y

    def TangentOperator( pair ):
        """ Tangent linear operator, around X, applied to dX """
        X, dX = pair
        ...
        ...
        ...
        return Y  # something like Y

    def AdjointOperator( pair ):
        """ Adjoint operator, around X, applied to Y """
        X, Y = pair
        ...
        ...
        ...
        return X  # something like X
 922
Once again, this second operator definition makes it easy to test the
functional forms before their use in an ADAO case, reducing the complexity of
implementation.

**Important warning:** the names "*DirectOperator*", "*TangentOperator*" and
"*AdjointOperator*" are mandatory, and the type of the X, Y, dX arguments can be
either a python list, a numpy array or a numpy 1D-matrix. The user has to treat
these cases in their script.
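
As an illustration, here is a minimal sketch of such a script for a small
hypothetical operator, followed by a quick consistency check of the tangent and
adjoint forms by a scalar product. The operator, the helper ``_jacobian`` and
the numerical values are assumptions of this example. The tangent and adjoint
functions receive their two variables as one pair argument, unpacked in the
function body::

    import numpy

    def _jacobian( X ):
        """ Hypothetical helper: exact Jacobian of the operator around X """
        x = numpy.ravel( X )
        return numpy.array([[2.*x[0], 0.], [x[1], x[0]]])

    def DirectOperator( X ):
        """ Direct non-linear simulation operator """
        x = numpy.ravel( X )
        return numpy.array([x[0]**2, x[0]*x[1]])

    def TangentOperator( pair ):
        """ Tangent linear operator, around X, applied to dX """
        X, dX = pair
        return numpy.dot( _jacobian( X ), numpy.ravel( dX ) )

    def AdjointOperator( pair ):
        """ Adjoint operator, around X, applied to Y """
        X, Y = pair
        return numpy.dot( _jacobian( X ).T, numpy.ravel( Y ) )

    # Quick check outside ADAO: both scalar products have to be equal
    X, dX, Y = [1., 2.], [0.5, -1.], [3., 4.]
    lhs = numpy.dot( TangentOperator( (X, dX) ), Y )
    rhs = numpy.dot( dX, AdjointOperator( (X, Y) ) )

Such a check can be run in a plain Python interpreter before the script is used
in an ADAO case.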
 931
 932 Third functional form: using "*ScriptWithSwitch*"
 933 +++++++++++++++++++++++++++++++++++++++++++++++++
 934
 935 .. index:: single: ScriptWithSwitch
 936 .. index:: single: DirectOperator
 937 .. index:: single: TangentOperator
 938 .. index:: single: AdjointOperator
 939
This third form gives more possibilities to control the execution of the three
functions representing the operator, allowing advanced usage and control over
each execution of the simulation code. This is done by using the keyword
"*ScriptWithSwitch*" for the description of the chosen operator in the ADAO GUI.
The user has to provide a switch in one script to control the execution of the
direct, tangent and adjoint forms of their simulation code. The user can then,
for example, use other approximations for the tangent and adjoint codes, or
introduce more complexity in the argument treatment of the functions. But it
will be far more complicated to implement and debug.
 949
 950 **It is recommended not to use this third functional form without a solid
 951 numerical or physical reason.**
 952
 953 If, however, you want to use this third form, we recommend using the following
 954 template for the switch. It requires an external script or code named
 955 "*Physical_simulation_functions.py*", containing three functions named
 956 "*DirectOperator*", "*TangentOperator*" and "*AdjointOperator*" as previously.
 957 Here is the switch template::
 958
    import Physical_simulation_functions
    import numpy, logging
    #
    # "computation" is expected to be provided by the calling context
    method = ""
    for param in computation["specificParameters"]:
        if param["name"] == "method":
            method = param["value"]
    if method not in ["Direct", "Tangent", "Adjoint"]:
        raise ValueError("No valid computation method is given")
    logging.info("Found method is '%s'"%method)
    #
    logging.info("Loading operator functions")
    Function = Physical_simulation_functions.DirectOperator
    Tangent  = Physical_simulation_functions.TangentOperator
    Adjoint  = Physical_simulation_functions.AdjointOperator
    #
    logging.info("Executing the possible computations")
    data = []
    if method == "Direct":
        logging.info("Direct computation")
        Xcurrent = computation["inputValues"][0][0][0]
        data = Function(numpy.matrix( Xcurrent ).T)
    if method == "Tangent":
        logging.info("Tangent computation")
        Xcurrent  = computation["inputValues"][0][0][0]
        dXcurrent = computation["inputValues"][0][0][1]
        # The tangent operator takes the pair (X, dX) as a single argument
        data = Tangent((numpy.matrix(Xcurrent).T, numpy.matrix(dXcurrent).T))
    if method == "Adjoint":
        logging.info("Adjoint computation")
        Xcurrent = computation["inputValues"][0][0][0]
        Ycurrent = computation["inputValues"][0][0][1]
        # The adjoint operator takes the pair (X, Y) as a single argument
        data = Adjoint((numpy.matrix(Xcurrent).T, numpy.matrix(Ycurrent).T))
    #
    logging.info("Formatting the output")
    it = numpy.ravel(data)
    outputValues = [[[[]]]]
    for val in it:
        outputValues[0][0][0].append(val)
    #
    result = {}
    result["outputValues"]        = outputValues
    result["specificOutputInfos"] = []
    result["returnCode"]          = 0
    result["errorMessage"]        = ""
1003
Various modifications can be made starting from this template.
1005
Special case of controlled evolution operator
+++++++++++++++++++++++++++++++++++++++++++++

In some cases, the evolution or the observation operators are required to be
controlled by an external input control, given a priori. In this case, the
generic form of the incremental evolution model is slightly modified as follows:
1012
1013 .. math:: \mathbf{y} = O( \mathbf{x}, \mathbf{u})
1014
1015 where :math:`\mathbf{u}` is the control over one state increment. In this case,
1016 the direct operator has to be applied to a pair of variables :math:`(X,U)`.
1017 Schematically, the operator has to be set as::
1018
    def DirectOperator( pair ):
        """ Direct non-linear simulation operator """
        X, U = pair
        ...
        ...
        ...
        return Y  # something like X(n+1) or Y(n+1)

The tangent and adjoint operators have the same signature as previously, noting
that the derivatives have to be taken only with respect to :math:`\mathbf{x}`,
that is, as partial derivatives. In such a case with explicit control, only the
second functional form (using "*ScriptWithFunctions*") and the third functional
form (using "*ScriptWithSwitch*") can be used.
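
As an illustration, here is a minimal sketch of the three functions for a
hypothetical linear controlled evolution, in which the next state is 0.9 times
the current state plus the control. The model and its coefficient are
assumptions of this example. The tangent and adjoint forms only involve the
derivative with respect to the state, not to the control::

    import numpy

    def DirectOperator( pair ):
        """ Direct operator applied to the pair (X, U) """
        X, U = pair
        return 0.9 * numpy.ravel( X ) + numpy.ravel( U )

    def TangentOperator( pair ):
        """ Tangent linear operator, around X, applied to dX """
        X, dX = pair
        return 0.9 * numpy.ravel( dX )

    def AdjointOperator( pair ):
        """ Adjoint operator, around X, applied to Y """
        X, Y = pair
        return 0.9 * numpy.ravel( Y )

    Xnext = DirectOperator( ([1., 2.], [0.1, 0.1]) )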