Environment Configuration
TurboWorkflows settings are defined in configuration files under .turbofilemanager_config in your home directory.
Configuration File Location and Structure
Configuration files are managed in the following directory structure.
~/.turbofilemanager_config/
├── machine_data.yaml # Server machine settings
├── localhost/ # Settings for localhost
│ ├── package.yaml
│ ├── queue_data.toml
│ ├── submit_mpi.sh
│ └── submit_nompi.sh
├── remotesrv/ # Settings for remotesrv (example)
│ ├── package.yaml
│ ├── queue_data.toml
│ ├── submit_mpi.sh
│ └── submit_nompi.sh
└── ...
machine_data.yaml
Describes settings for server machines, in YAML format. It is a dictionary with server names as keys, so settings for multiple servers can be accommodated.
Directory for each server
Create a directory for each server (localhost, remotesrv, etc.) and place the following configuration files in it.
package.yaml
Describes settings for execution modules.
queue_data.toml
Describes job scheduler settings in TOML format. Entries are distinguished by queue labels and specify the job queue name, number of nodes, degree of parallelism, and so on.
submit_mpi.sh, submit_nompi.sh
Script templates for executing programs. TurboWorkflows creates job scripts by replacing keywords in these templates and then submits the jobs. Prepare submit_mpi.sh for MPI jobs and submit_nompi.sh for serial jobs.
For details on configuration, refer to Configuration Files. Some typical configuration examples are shown below.
Server Machine Settings (machine_data.yaml)
The machine_data.yaml file describes settings for localhost where workflows are executed, and remote servers as needed.
Case 1: Local Workstation
Manage workflows and execute computations on a workstation. This is the configuration for directly executing programs without a job scheduler. Job execution is performed through a shell, and execution status is obtained by monitoring processes.
localhost:
machine_type: local
queuing: false
computation: true
file_manager_root: /mnt/data/workflow
jobsubmit: bash
jobcheck: ps
jobnum_index: 1
Case 2: Frontend (Login) Node of a Supercomputer System
Install TurboWorkflows on a supercomputer and manage workflows on the frontend node. Computations are executed as batch jobs.
localhost:
machine_type: local
queuing: true
computation: true
jobsubmit: /opt/slurm/bin/sbatch
jobcheck: /opt/slurm/bin/squeue
jobdel: /opt/slurm/bin/scancel
jobnum_index: 3
machine_type is local, and jobs are submitted from the same host where workflows are executed. Since this is a system for executing computations, set computation to true.
jobsubmit is the job submission command, jobcheck is the command to get job execution status, and jobdel is the command to delete jobs. Please rewrite appropriately according to the job scheduler system and system settings.
For Slurm, usually specify as follows.
jobsubmit: sbatch
jobcheck: squeue
jobdel: scancel
For PBS, usually specify as follows.
jobsubmit: qsub
jobcheck: qstat -u username # replace username with your user name
jobdel: qdel
jobnum_index is an integer value specifying which whitespace-separated field (0-based) of the output printed when a job is submitted contains the JOBID.
For Slurm, the output usually looks like this.
$ sbatch job.sh
Submitted batch job 42
The JOBID (“42” in this case) is the field at index 3 (0-based), so specify 3 for jobnum_index.
For PBS, the output usually looks like this.
$ qsub job.sh
42.server-pbs
The JOBID (“42.server-pbs” in this case) is the field at index 0, so specify 0 for jobnum_index.
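This field lookup can be reproduced in the shell. A minimal sketch, using the submission outputs shown above (the variable names slurm_output and pbs_output are illustrative; TurboWorkflows performs this parsing internally):

```shell
# Slurm-style submission output: the JOBID ("42") is the
# whitespace-separated field at index 3 (0-based).
slurm_output="Submitted batch job 42"
set -- $slurm_output        # split the output on whitespace
shift 3                     # jobnum_index: 3
slurm_jobid=$1
echo "$slurm_jobid"         # prints 42

# PBS-style submission output: the JOBID is the field at index 0.
pbs_output="42.server-pbs"
set -- $pbs_output
pbs_jobid=$1
echo "$pbs_jobid"           # prints 42.server-pbs
```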
Note
Commands and displays may vary depending on the system. Please refer to the system’s user manual.
Case 3: Remote Server
Execute workflows on a local workstation and execute computations on a remote server or supercomputer. This is the configuration for cases where a job scheduler is running on the remote system and computations are executed as batch jobs.
remotesrv:
machine_type: remote
queuing: true
computation: true
file_manager_root: /work/xxxx/xxxx/xxxx
jobsubmit: /opt/pbs/bin/qsub
jobcheck: /opt/pbs/bin/qstat
jobdel: /opt/pbs/bin/qdel
jobnum_index: 0
Create an entry with the server name (remotesrv here) as the key.
machine_type is remote, queuing specifies whether a job scheduler is used, and computation specifies whether it is used for computation execution.
jobsubmit, jobcheck, jobdel, and jobnum_index are similar to the supercomputer frontend settings described above. Please specify execution commands and paths on the remote server.
file_manager_root specifies the directory that serves as the starting point on the server side when transferring files. See below for details.
Also check the SSH configuration details below.
Case 4: File Server
This is a configuration example for a remote server used as a file server and not used for computation.
filesrv:
machine_type: remote
queuing: false
computation: false
file_manager_root: /mnt/xxxx/xxxx
Set machine_type to remote and computation to false. Specify file_manager_root as appropriate.
SSH Configuration Details
Connections to remote servers are made via SSH. SSH settings are usually described in ~/.ssh/config; for details, refer to the SSH documentation. A configuration example is shown below.
Host remotesrv
HostName remotesrv.example.com
User myname
IdentityFile ~/.ssh/remotesrv/id_rsa
Create an entry with the server name (remotesrv) in the Host line.
HostName specifies the actual server hostname or IP address, and User specifies the account.
IdentityFile specifies the private key file. This can be omitted when using ssh-agent-forwarding, etc.
SSH Connection Test
To verify that SSH settings are working correctly, perform a manual connection test.
ssh remotesrv
If the connection succeeds, the SSH settings are working correctly.
Connection via Proxy
If you need to connect to a remote machine via a proxy server, add ProxyCommand to ~/.ssh/config.
Host remotesrv
HostName remotesrv.example.com
User myname
ProxyCommand ssh -W %h:%p proxy_host
IdentityFile ~/.ssh/id_rsa
Required SSH Settings
Host: Must match the server name specified in machine_data.yaml.
HostName: Specify the actual server hostname or IP address.
User: Specify the remote server user name.
About file_manager_root
When using a remote server, file_manager_root must be specified for both the remote server and localhost. When transferring files, file paths are treated as relative paths from file_manager_root. For example,
localhost:
...
file_manager_root: /mnt/data/workflow
...
remotesrv:
...
file_manager_root: /work/myname/workflow_data
...
With this configuration, remotesrv’s /work/myname/workflow_data/results/lrdmc-workflow/pip0_fn.d is treated as the relative path ./results/lrdmc-workflow/pip0_fn.d from file_manager_root, and is transferred to localhost’s /mnt/data/workflow/results/lrdmc-workflow/pip0_fn.d.
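The path translation can be sketched with plain shell string operations. A minimal illustration of the mapping above (the actual file transfer is performed by TurboWorkflows, not by this snippet):

```shell
remote_root="/work/myname/workflow_data"   # remotesrv file_manager_root
local_root="/mnt/data/workflow"            # localhost file_manager_root

remote_path="/work/myname/workflow_data/results/lrdmc-workflow/pip0_fn.d"

# Strip the remote root to obtain the path relative to file_manager_root,
rel_path="${remote_path#"$remote_root"/}"
# then re-anchor the relative path under the local root.
local_path="$local_root/$rel_path"

echo "./$rel_path"      # prints ./results/lrdmc-workflow/pip0_fn.d
echo "$local_path"      # prints /mnt/data/workflow/results/lrdmc-workflow/pip0_fn.d
```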
Note
Path relativity: File paths are treated as relative paths from file_manager_root. Make sure that absolute paths are under file_manager_root.
Symbolic links: Note that symbolic links are resolved.
Directory permissions: Make sure that the destination directory has write permissions.
Set for both local and remote: When using a remote server, you need to set file_manager_root for both localhost and the remote server.
Package Settings (package.yaml)
The package.yaml file configures program packages for each server, in YAML format. It manages the execution modules used for turborvb and python. Installation directories can be separated by version, and a version can be selected at runtime. Job script templates for job submission are also specified here; a different template can be used for each package, which is useful when package-specific settings such as loading external modules are required.
A sample package.yaml is shown below.
turborvb:
name: turborvb
binary_path:
stable:
binary_list:
- turborvb-serial.x
- turborvb-mpi.x
- prep-serial.x
- prep-mpi.x
- makefort10.x
- convertfort10mol.x
- convertfort10.x
- readforward-serial.x
- readforward-mpi.x
job_template:
mpi: submit_mpi.sh
nompi: submit_nompi.sh
python:
name: python
binary_path:
stable:
binary_list:
- python3
job_template:
mpi: submit_mpi.sh
nompi: submit_nompi.sh
Version Management for binary_path
You can manage multiple versions in binary_path. Specify the binary path for each version with the version name as the key.
turborvb:
name: turborvb
binary_path:
stable: /opt/turborvb/stable/bin
latest: /opt/turborvb/latest/bin
v1.0: /opt/turborvb/v1.0/bin
binary_list:
- turborvb-serial.x
- turborvb-mpi.x
...
Specify the version to use in workflows with the version parameter (e.g., version="stable").
Path Specification Method
It is recommended to specify absolute paths for binary_path. When using relative paths, be careful as they depend on the current directory at runtime.
Batch Queue Settings (queue_data.toml)
Batch queues are configured for each server, in TOML format, as dictionary data with queue labels as keys. A queue label is referred to by the queue_label parameter of Workflow class instances. The main items are as follows.
mpi
Specifies whether to use MPI parallelization. If true, the job is executed as an MPI parallel job, using the mpi template from job_template in the package settings. If false, it is executed as a serial job, using the nompi template.
max_job_submit
Specifies the maximum number of jobs that can be submitted to the job scheduler. The limit varies by system.
Adding Custom Variables
In queue_data.toml, you can define arbitrary key-value pairs. These are used as parameters in job templates, and _KEY_ is replaced with the value corresponding to key (case-insensitive).
[default]
mpi=false
max_job_submit=1
# Example of custom variables
num_cores=1
omp_num_threads=1
nodes=1
account="myaccount"
partition="normal"
memory="32GB"
This allows you to embed different node counts, MPI process counts, and thread counts in job scripts for each queue_label.
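The _KEY_ substitution can be mimicked with sed. A minimal sketch, assuming a hypothetical one-line template that uses the num_cores and omp_num_threads variables above (TurboWorkflows performs this replacement internally when generating job scripts):

```shell
# A hypothetical template fragment with two _KEY_ placeholders.
template='mpirun -np _NUM_CORES_ -x OMP_NUM_THREADS=_OMP_NUM_THREADS_ ./a.out'

# Replace each placeholder with its value from queue_data.toml ([default]).
rendered=$(printf '%s\n' "$template" \
  | sed -e 's/_NUM_CORES_/1/g' -e 's/_OMP_NUM_THREADS_/1/g')

echo "$rendered"    # prints: mpirun -np 1 -x OMP_NUM_THREADS=1 ./a.out
```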
TOML Format Notes
Care must be taken with TOML data types. For details, please check the TOML specification.
Time: When setting a maximum execution time, the value must be enclosed in quotes so that it is treated as a string.
max_time="24:00:00" # Correct
max_time=24:00:00 # Error (interpreted as a TOML time value)
Boolean values: Write true or false (lowercase). yes, no, etc. are treated as strings.
Numbers: Both integers and floating-point numbers can be used.
Job Script Templates (submit_mpi.sh, submit_nompi.sh)
Prepare job script templates. You can separate templates by package and whether MPI parallelization is used. Write embedded parameters in templates in the _KEY_ format.
Job script formats vary by system. Examples for systems using Slurm and PBS are shown in Configuration File Examples. Some systems may require additional information such as account group specifications.
Using Variables Defined in queue_data.toml
Variables defined in queue_data.toml can also be used in job script templates. Write the variable name in uppercase, enclosed in underscores (e.g., _NUM_CORES_, _OMP_NUM_THREADS_).
Variable names are case-insensitive: num_cores and NUM_CORES are treated as the same variable.
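Putting the pieces together, a template might look like the following. This is a minimal Slurm-style sketch, not an official template: the placeholder names are taken from the custom-variable example above (_PARTITION_, _ACCOUNT_, _NODES_, _NUM_CORES_, _MEMORY_, _OMP_NUM_THREADS_), while the mpirun line and binary name are illustrative assumptions to be adapted to your system.

```shell
#!/bin/bash
#SBATCH --partition=_PARTITION_
#SBATCH --account=_ACCOUNT_
#SBATCH --nodes=_NODES_
#SBATCH --ntasks=_NUM_CORES_
#SBATCH --mem=_MEMORY_

export OMP_NUM_THREADS=_OMP_NUM_THREADS_

# Illustrative command line; adapt to the package being run.
mpirun -np _NUM_CORES_ turborvb-mpi.x
```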