From: ribes
Date: Fri, 1 Oct 2010 12:29:32 +0000 (+0000)
Subject: Documentation work - pre realease for 6.2
X-Git-Tag: V6_2_0~3
X-Git-Url: http://git.salome-platform.org/gitweb/?a=commitdiff_plain;h=2b7eb3491457d08d359328c1660259c45e18b6b3;p=modules%2Fjobmanager.git

Documentation work - pre realease for 6.2
---

diff --git a/doc/advanced.rst b/doc/advanced.rst
index 75416a1..6f87c16 100644
--- a/doc/advanced.rst
+++ b/doc/advanced.rst
@@ -1,30 +1,30 @@
 Advanced part
 =============
 
-This chapter a melting pot of informations about what the JOBMANAGER do.
+This chapter is a melting pot of information about what the JOBMANAGER do.
 
 Logs files
 ++++++++++
 
 Whatever the type of job, the JOBMANAGER provides many files for logging
 what happens during the job. These files are located in the work directory
-of the job in a directory named log.
+of the job in a directory named **logs**.
 
 For **command** type of job, one log file is created. This file contains the normal
 and error output of the job. The file name contains the type of job and the date, e.g.
 **command_Thu_Sep_30_15_04_51_2010.log**.
 
-For **SALOME** type of job, two log files are create. The common file contains the
-normal and error output of the SALOME services. The file name contains is like this:
-**salome_Wed_Feb_10_13_54_00_2010.log**. The other depends of the type of SALOME jobs:
+For **SALOME** type of job, two log files are created. The common file contains the
+normal and error output of the SALOME services. The file name is like this:
+**salome_Wed_Feb_10_13_54_00_2010.log**. The other file depends of the type of SALOME jobs:
 
 - For **Python** type of job, the file contains all the output of the scripts. The file name
   is like this: **python_Wed_Feb_10_13_54_00_2010.log**.
 - For **YACS** type of job, the file contains all the output of the schema. The file name
   is like this: **yacs_Wed_Feb_10_13_54_00_2010.log**.
 
-For jobs that are launch in a batch resource like PBS or LSF, two more files are provided
-that contains the normal and output messages of the PBS or LSF job. The files are like this:
+For jobs that are launched in a batch resource like PBS or LSF, two more files are provided
+that contains the normal and error output messages of the PBS or LSF job. These files are like this:
 **error.log.runCommand_test_command_Wed_Sep__8_17_02_44_2010**
 and **output.log.runCommand_test_command_Wed_Sep__8_17_02_44_2010**
 
@@ -32,21 +32,21 @@ How the JOBMANAGER launch the job
 +++++++++++++++++++++++++++++++++
 
 For each type of job, the JOBMANAGER creates a shell that permits to launch in the resource
-the job file. It's in this file that the environnement file is used. For a command job
-this file is like this: **runCommand_test_command_Wed_Sep__8_16_59_08_2010.sh**.
+the job file. It's in this file that the environment file is used. For a command job,
+the file name is like this: **runCommand_test_command_Wed_Sep__8_16_59_08_2010.sh**.
 
 If a job has to be launched in a resource with a batch manager like PBS or LSF an another
-file is created that contains batch directives. For a command job this file is like
+file is created that contains batch directives. For a command job, the file name is like
 this: **runCommand_test_command_Wed_Sep__8_16_59_08_2010_Batch.sh**.
 
 Current limitations
 +++++++++++++++++++
 
-Currently, for SALOME type of jobs, the scope of the environnement file is
+Currently, for SALOME type of jobs, the scope of the environment file is
 restricted to the main SALOME session. Distributed containers launched in remote
-computers are not in the scope. If you want to give an environnement file to all
-your containers, use a SALOME application and copy this environnement file into the **env.d**
-directory. You also could generates into your environnement a new file into the **env.d** directory.
+computers are not in the scope of the environment file. If you want to give an environment file to all
+your containers, use a SALOME application and copy this environment file into the **env.d**
+directory.
 
-Currently, logs files does not contain remote containers outputs.
+Currently, logs files do not contain remote containers outputs.
diff --git a/doc/conf.py b/doc/conf.py
index 2168393..54870ec 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -46,9 +46,9 @@ copyright = '2010 CEA/DEN, EDF R&D, A. Ribes'
 # built documents.
 #
 # The short X.Y version.
-version = '6.0'
+version = '6.2'
 # The full version, including alpha/beta/rc tags.
-release = '6.0'
+release = '6.2'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
diff --git a/doc/intro.rst b/doc/intro.rst
index 1074d38..4b42133 100644
--- a/doc/intro.rst
+++ b/doc/intro.rst
@@ -16,32 +16,32 @@ Main functionalities
 
 The JOBMANAGER module permits to define three types of jobs:
 
-- User scripts
-- Python scripts launched in a SALOME session
-- YACS schemas
+- User scripts.
+- Python scripts launched in a SALOME session.
+- YACS schemas.
 
-The module handle different types of computational resources:
+The module handles different types of computational resources:
 
-- Interactive computers (rsh, ssh)
-- Clusters managed by batch systems like PBS, LSF or SGE
+- Interactive computers (rsh, ssh).
+- Clusters managed by batch systems like PBS, LSF or SGE.
 
 User's job list could be saved and loaded. Finally, the module provides
-an editor for managing user's SALOME resource.
+an editor for managing user's SALOME resources.
 
 General description of the GUI
 ++++++++++++++++++++++++++++++
 
-JOBMANAGER's GUI is divided in four parts :
+JOBMANAGER's GUI is divided in four parts:
 
-- Part 1 shows your job list. It also provides many buttons
+- Part 1 shows the user job list. It also provides many buttons
   to manage a job (create, delete, start, ...).
 - Part 2 shows the parameters of the job selected in the list of part 1.
-  It's divided in two tabs. The first is for the general informations, the second
+  It's divided in two tabs. The first is for the general information, the second
   is dedicated to the files.
-- Part 3 provides a text widget dedicated of SALOME KERNEL messages. It provides
-  the buttons for loading or saving the job list.
-- Part 4 is divided in two tabs. The first provides a summary of jobs status.
-  The second provides the SALOME resource editor.
+- Part 3 provides a text widget dedicated of SALOME messages. It provides
+  the buttons for loading or saving the job list. It also provides an auto refresh button.
+- Part 4 is divided in two tabs. The first tab provides a summary of user's jobs status.
+  The second tab provides the SALOME resource editor.
 
 .. figure:: images/jobmanager_overview_part.png
    :scale: 75 %
diff --git a/doc/job.rst b/doc/job.rst
index 90eb5d3..7a50139 100644
--- a/doc/job.rst
+++ b/doc/job.rst
@@ -3,44 +3,44 @@ Main JOBMANAGER concept: Job
 
 This chapter explains the main JOBMANAGER concept: a job.
 
-What is a job ?
-+++++++++++++++
+What is a job?
+++++++++++++++
 
-A job is a work that a user want to perform on a computation resource (single computer or a cluster).
-The JOBMANAGER different types of job depending of what is the user want to do.
+A job is a work that a user wants to perform on a computation resource (single computer or a cluster).
+The JOBMANAGER provides different types of job depending of what a user wants to do.
 
-There is three types of described in the table below.
+There are three types of described in the table below.
 
 ======================== ==============================================================================
 **Type of job**          **Description**
 ======================== ==============================================================================
-**Command script**       It's a shell script containing the users commands. This kind of job is not
-                         related. It could be used to launch any codes.
-**SALOME Python script** It's Python script that will be launched into a SALOME session dedicated to
+**Command script**       It's a shell script containing the user's commands. This kind of job is not
+                         related to SALOME. It could be used to launch any codes.
+**SALOME Python script** It's a Python script that will be launched into a SALOME session dedicated to
                          this script.
-**YACS schema**          It's YACS schema that will be launched into a SALOME session dedicated to this
+**YACS schema**          It's a YACS schema that will be launched into a SALOME session dedicated to this
                          schema.
 ======================== ==============================================================================
 
 Job content description
 +++++++++++++++++++++++
 
-All types of job share some attributes. There could be specific attribute for some types of job. It will
-be indicated in this documentation if there is some specific attributes. A job has to kinds of attributes:
+All types of job share some attributes. There could be specific attributes for some types of jobs. These exceptions
+will be indicated in the future in this documentation. A job has two kinds of attributes:
 attributes that describes the job himself, and attributes that describes the computation requirements.
 
-The first table below describes the attributes of the job.
+The first table below describes the attributes of a job.
 
 ======================== ================ ==============================================================================
 **Attribute**            **Mandatory**    **Description**
 ======================== ================ ==============================================================================
 **Name**                 Yes              This is the name of the job. It's unique into a SALOME session.
-**Type**                 Yes              This is the type of the job. Currently there is three types: *command*,
+**Type**                 Yes              This is the type of the job. Currently, there are three types: *command*,
                                           *python_salome* and *yacs_file*.
-**Job file**             Yes              This is the name with the location of the file containing the job's data.
+**Job file**             Yes              This is the name, with the location, of the file containing the job's data.
                                           Depending of the type it could a *shell* script, a *Python script* or
-                                          a *YACS schema*.
-**Env file**             No               An environnement file could be attached to the job. It will be executed before
+                                          a *YACS schema*, e.g. **/home/user/work.sh**.
+**Env file**             No               An environment file could be attached to the job. It will be executed before
                                           the job.
 **Input files**          No               A list of files or directories in the user computer that have to copied into
                                           the job's *work directory*.
@@ -58,10 +58,10 @@ The second table below describes the attributes of computation requirements.
 ======================== ==============================================================================
 **Attribute**            **Description**
 ======================== ==============================================================================
-**Maximum duration**     It's maximum expected duration of the job. When a batch manager is used, this
-                         time is interpreted as a **walltime** and not a **cputime**.
-**Number of cpu**        It's the number of cpus/cores resquested.
-**Memory**               It's the amoun of memory per cpu/core expected.
+**Maximum duration**     It's the maximum expected duration of the job. When a batch manager is used, this
+                         time is interpreted as a **walltime** and not as a **cputime**.
+**Number of cpu**        It's the number of cpus/cores requested.
+**Memory**               It's the amount of memory per cpu/core expected.
 ======================== ==============================================================================
 
 Job's states
@@ -74,14 +74,14 @@ A job could have many states in the JOBMANAGER. The table below describes the no
 ======================== ==============================================================================
 **Created**              The job is correctly created and could be launched.
 **In_Process**           It's a transient state between *Created* and *Queued*.
-**Queued**               The job is queued into the resource batch manager.
+**Queued**               The job is queued into the resource's batch manager.
 **Paused**               The job is paused. Currently the JOBMANAGER GUI does not allow to paused a
                          job.
 **Running**              The job is running on the resource.
-**Finished**             The job has run an it's finished.
+**Finished**             The job has run and it's finished.
 ======================== ==============================================================================
 
-The table below describes the errors states.
+The table below describes the error states.
 
 ======================== ==============================================================================
 **State**                **Description**
@@ -90,7 +90,7 @@ The table below describes the errors states.
                          It's often a problem with the selected resource.
 **Failed**               This state means that the execution of the job in the resource failed.
 **Error**                This state is used when a job is loaded and that it cannot be followed. It
-                         mainly happens when a job into a *ssh* resource is running. if the list is
-                         saved, it will an error when the list will be loaded. *ssh* resource cannot
-                         be followed.
+                         mainly happens when a job was launched into a *ssh* resource. If the list is
+                         saved, an error will happen when the list is loaded (*ssh* resource cannot
+                         be followed).
 ======================== ==============================================================================
diff --git a/doc/jobmanager_gui.rst b/doc/jobmanager_gui.rst
index a17b32a..7b8c9ad 100644
--- a/doc/jobmanager_gui.rst
+++ b/doc/jobmanager_gui.rst
@@ -4,7 +4,7 @@ Using the JOBMANAGER GUI
 Job management with the GUI
 ---------------------------
 
-This section describes the parts (part 1 and 2 described in the introduction) of the JOBMANAGER GUI
+This section describes the parts 1 and 2 (described in the introduction section) of the JOBMANAGER GUI
 dedicated to the management of jobs.
 
 JOBMANAGER provides some buttons to manage the user job list (see figure :ref:`figure_jobmanager_main_buttons`).
@@ -22,8 +22,8 @@ The description of each button (framed in blue in the figure) is given in the ta
 **Button**               **Condition of activation**  **Description**
 ======================== ============================ ================================================================
 **Create a job**         Always activated             Launch the job wizard to create a job.
-**Edit/Clone a job**     Job selected                 Edit a job in *created* or *Error* or *Failed* or *Not Created*.
-                                                      Clone a job in other state.
+**Edit/Clone a job**     Job selected                 Edit a job in *created* or *Error* or *Failed* or *Not Created*
+                                                      state. Clone a job in other state.
 **Start a job**          Job selected and job state   Start a job.
                          equals to *Created*
 **Restart a job**        Job selected and job state   Restart a job.
@@ -33,28 +33,28 @@ The description of each button (framed in blue in the figure) is given in the ta
 **Get job results**      Job selected and job state   Get job results in the result directory.
                          equals to *Finish* or
                          *Failed*
-**Refresh jobs**         Always activated             Update jobs state.
+**Refresh jobs**         Always activated             Update jobs states.
 ======================== ============================ ================================================================
 
-**Tip:** You could use Auto Refresh button in part 3 to enable an automatic refresh.
+**Tip:** You could use the **Auto Refresh** button in GUI part 3 to enable an automatic refresh.
 
-When a job selected, the part 2 is filled with all the informations that the JOBMANAGER has on the job. The figure
-:ref:`figure_jobmanager_job_focus` shows this part of the GUI. It contains to tabs, the first tab provides the main informations and the run informations.
-The second shows the input and output file list of the job.
+When a job is selected, the part 2 is filled with all the information that the JOBMANAGER has on the job. The figure
+:ref:`figure_jobmanager_job_focus` shows this part of the GUI. It contains two tabs. The first tab provides the main
+information and the run information. The second tab shows the input and output file list of the job.
 
 .. _figure_jobmanager_job_focus:
 
 .. figure:: images/jobmanager_job_focus.png
    :align: center
 
-   **Job widget informations**
+   **Job widget information**
 
 Job creation workflow
 ---------------------
 
 This section describes the workflow when a job is created. The jobmanager uses a wizard to create a job.
-The first page (see :ref:`figure_jobmanager_job_workflow_1`) of the wizard is to define the **job name**
+The first page (see :ref:`figure_jobmanager_job_workflow_1`) of the wizard permits to define the **job name**
 and the **job type**. For each job type, the page provides an explanation of what job type refers.
 
 .. _figure_jobmanager_job_workflow_1:
 
@@ -62,53 +62,53 @@ and the **job type**. For each job type, the page provides an explanation of wha
 .. figure:: images/jobmanager_job_workflow_1.png
    :align: center
 
-   **Page 1**
+   **Create wizard page 1**
 
 The second page (see :ref:`figure_jobmanager_job_workflow_2`) permits to add two files, the main job file
-(in this example a command file). You could also add an environment file, it is not mandatory.
+(in this example a command file). You could also add an environment file that is not mandatory.
 
 .. _figure_jobmanager_job_workflow_2:
 
 .. figure:: images/jobmanager_job_workflow_2.png
    :align: center
 
-   **Page 2**
+   **Create wizard page 2**
 
 The third page (see :ref:`figure_jobmanager_job_workflow_3`) permits to define the batch parameters related
-to your job. You have to define the *Remote work directory*. It's the directory where the job will be executed.
-Input files defined in page 4 will be copied in this directory. You have define how many times the job will be running,
-the amount of memory needed and the number of processors/cores the job need.
+to the job. It's mandatory to define the *Remote work directory* that is the directory where the job will be executed.
+Input files defined in page 4 will be copied in this directory. You also have to define the maximum duration,
+the amount of memory needed and the number of processors/cores of the job.
 
 .. _figure_jobmanager_job_workflow_3:
 
 .. figure:: images/jobmanager_job_workflow_3.png
    :align: center
 
-   **Page 3**
+   **Create wizard page 3**
 
 The fourth page (see :ref:`figure_jobmanager_job_workflow_4`) permits to add the input and output files. Input files are files located into the user
-computer that have to be transfered into the execution resource. In this page, you also the result
-directory where job results and logs will be copied.
+computer that have to be transferred into the execution resource. In this page, you could also define the result
+directory where job's results and logs will be copied.
 
 .. _figure_jobmanager_job_workflow_4:
 
 .. figure:: images/jobmanager_job_workflow_4.png
    :align: center
 
-   **Page 4**
+   **Create wizard page 4**
 
-The fith page (see :ref:`figure_jobmanager_job_workflow_5`) permits to choose the resource in which the job
-will executed. You can also define the batch queue that you want to use.
+The fifth page (see :ref:`figure_jobmanager_job_workflow_5`) permits to choose the resource where the job
+will be executed. You can also define the batch queue that you want to use.
 
 .. _figure_jobmanager_job_workflow_5:
 
 .. figure:: images/jobmanager_job_workflow_5.png
    :align: center
 
-   **Page 5**
+   **Create wizard page 5**
 
-The last page (see :ref:`figure_jobmanager_job_workflow_6`) finalize the job creation. You could choose
+The last page (see :ref:`figure_jobmanager_job_workflow_6`) finalizes the job creation. You could choose
 if you want or not start the job at the end of the wizard.
 
 .. _figure_jobmanager_job_workflow_6:
 
@@ -116,14 +116,14 @@ if you want or not start the job at the end of the wizard.
 .. figure:: images/jobmanager_job_workflow_6.png
    :align: center
 
-   **Page 6**
+   **Create wizard page 6**
 
 Loading and saving job list
 ---------------------------
 
-The JOBMANAGER permits to save and load a list of jobs.
+The JOBMANAGER permits to save and load the job list.
 For some jobs, this feature permits to follow the execution of a job
-into different SALOME session.
+into a different SALOME session by loading the list.
 
 The figure :ref:`figure_jobmanager_load_save_buttons` shows where are located
 the load and save buttons in the JOBMANAGER GUI.
@@ -133,10 +133,10 @@ the load and save buttons in the JOBMANAGER GUI.
 .. figure:: images/jobmanager_load_save_buttons.png
    :align: center
 
-   **Location of load and save job list**
+   **Location of load and save job list buttons**
 
 
-All jobs cannot be followed between to SALOME session. Indeed, It's the *batch* type
+All jobs cannot be followed between two SALOME sessions. Indeed, It's the *batch* type
 of the resource that allows to know if you can or not follow a job. Currently, resources
 that use **ssh** for batch configuration cannot be followed. In this case, when the JOBMANAGER
-load the job, it will set this kind job in the **Error** state.
+load the job, it will set this kind of job in the **Error** state.
diff --git a/doc/resource.rst b/doc/resource.rst
index 60ea567..fda6422 100644
--- a/doc/resource.rst
+++ b/doc/resource.rst
@@ -7,22 +7,22 @@ Later in the chapter, we use resource for SALOME resource.
 Definition of a SALOME resource
 +++++++++++++++++++++++++++++++
 
-A resource is the SALOME abstraction for managing computers.
-A resource contains three different kinds of informations:
+A resource is the SALOME abstraction for computer.
+A resource contains three different kinds of information:
 
 - A name.
 - A physical description of the computer.
-- A description of SALOME installation in this computer.
+- A description of a SALOME installation in the computer.
 
-A resource's name could be different from the computer name since you could
-have different SALOME installation in the computer.
+A resource's name could be different from the computer name since
+different SALOME installation could coexist in the computer.
 
 Physical description of the computer
 ------------------------------------
 
 A resource contains a physical description of the computer.
-These informations are used by the resource manager (service provided
-by the KERNEL) to choose and to user a resource when a container (in YACS)
+These information are used by the resource manager (service provided
+by the KERNEL) to choose and use a resource when a container (in YACS)
 or a job (in JOBMANAGER) has to be launched.
 
 The description of each attribute is given in the table below.
@@ -33,16 +33,16 @@ The description of each attribute is given in the table below.
 **Attribute**              **Mandatory**    **Description**
 ========================== ================ =============================================================
 **hostname**               Yes              It's the network name of the computer. If the computer is a
-                                            cluster, you have to give the frontal computer name.
+                                            cluster, you have to give the frontal node name.
 **protocol**               Yes              Network protocol to use for creating connections (ssh or
                                             rsh).
 **username**               Yes              User name to use for creating connections.
 **batch**                  Yes              Type of batch system installed in the resource. Use *ssh*
                                             if the resource is a single computer.
-**iprotocol**              Yes              Internal protocol to use on a cluster.
+**iprotocol**              Yes              Internal protocol to use on a cluster (ssh or rsh).
 **mpiImpl**                No               MPI implementation to use.
-**OS**                     No               It's the operating system name, ex: Linux, Windows, Debian.
+**OS**                     No               It's the operating system name, e.g.: Linux, Windows.
 **nb_node**                No               It's the amount of node of the computer.
 **nb_proc_per_node**       No               It's the amount or processor or core of your computer.
 **mem_mb**                 No               It's the amount of memory in megabytes per node.
@@ -55,7 +55,7 @@ SALOME installation description
 A resource could contain a SALOME installation description.
 The description of each attribute is given in the table below.
 
-**Warning:** Attribute **applipath** is *mandatory* with JOBMANAGER SALOME related jobs.
+**Warning:** Attribute **applipath** is *mandatory* with JOBMANAGER SALOME related type of job.
 
 ========================== =============================================================
 **Attribute**              **Description**
@@ -66,15 +66,15 @@ The description of each attribute is given in the table below.
                            application.
 ========================== =============================================================
 
-Where is the resource file ?
-----------------------------
+Where is the resource file?
+---------------------------
 
 Resources are located into a XML resource file. SALOME tries to find this file in
 three different locations:
 
 1. If **USER_CATALOG_RESOURCES_FILE** env file is defined, SALOME uses this file.
-2. If not in the SALOME application directory: $APPLIPATH/CatalogResources.xml.
-3. If not in the directory of the installation of SALOME KERNEL:
+2. If not, in the SALOME application directory: $APPLIPATH/CatalogResources.xml.
+3. If not, in the directory of the installation of SALOME KERNEL:
    $KERNEL_ROOT_DIR/share/salome/resources/kernel/CatalogResources.xml.
 
 By default, the resource manager creates a resource with the name and the hostname of the user computer.
@@ -82,9 +82,9 @@ By default, the resource manager creates a resource with the name and the hostna
 JOBMANAGER resource management GUI
 ++++++++++++++++++++++++++++++++++
 
-The JOBMANAGER provides a panel to manage the resources. This panel is showned in the
+The JOBMANAGER provides a panel to manage user's resources. This panel is shown in the
 figure :ref:`figure_jobmanager_resource_1`. The panel provides some buttons and a list
-that shows the aviable resources. You can select one resource to enable buttons.
+that shows the available resources. You have to select one resource to enable some buttons.
 
 .. _figure_jobmanager_resource_1:
 
@@ -108,7 +108,7 @@ The description of each button is given in the table below.
 ========================== =============================================================
 
 The figure :ref:`figure_jobmanager_resource_2` shows the panel of a resource. This panel
-shows all the informations of a resource.
+shows all the information of a resource.
 
 .. _figure_jobmanager_resource_2:
 
@@ -136,15 +136,15 @@ To launch a **command** job you need to fill the following attributes:
 - **username**
 - **batch** = *ssh*
 
-**Warning:** You have configure your ssh for allowing ssh commands without asking
-interactives password (RSA or DSA keys).
+**Warning:** You have to configure ssh for allowing ssh commands without asking
+interactive password (RSA or DSA keys).
 
-To launch a **SALOME** command job you also need to fill the following attributes:
+To launch a **SALOME** type of job, you also need to fill the following attributes:
 
 - **applipath**
 
-Using a cluster managed by batch system
----------------------------------------
+Using a cluster managed by a batch system
+-----------------------------------------
 
 In this scenario, you need to launch a job into a cluster managed by a batch system.
 
@@ -157,8 +157,8 @@ To launch a **command** job you need to fill the following attributes:
 - **iprotocol**
 - **nb_proc_per_node**
 
-**Warning:** You have configure your ssh for allowing ssh commands without asking
-interactives password (RSA or DSA keys) between your computer and the cluster and between
+**Warning:** You have to configure ssh for allowing ssh commands without asking
+interactive password (RSA or DSA keys) between your computer and the cluster and between
 the cluster's nodes.
 
 To launch a **SALOME** command job you also need to fill the following attributes: