8.8 How do I create a job submission script for calculations?

To create a job submission script, we expect the user to know some basic Python programming and to be familiar with your job submission system.

There are two job-script submission modes, local and remote, depending on whether you submit the ab initio calculations to the local machine on which you run USPEX and MATLAB, or to a remote supercomputer.

Step 1: Configure the files in the Submission/ folder

Case 1: Local submission.

Please edit the following entry in the INPUT.txt file:

1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

Then, an SSH server needs to be running on your local machine; USPEX will connect to it and run the ab initio code via ssh. Next, go to the Submission/ folder, where you need to edit the two files submitJob_local.py and checkStatus_local.py.

You can find detailed instructions inside these files. In general, you only need to tell USPEX how to submit a job and how to check whether the job has finished; both scripts are run as standalone programs and report their result back to USPEX through the 'CALLBACK ...' line they print, so that line must be kept.

The file submitJob_local.py reads as follows:


from subprocess import check_output
import re

def submitJob_local(index: int, commandExecutable: str) -> int:
    """
    This routine submits a job on the local machine.
    One needs to edit it slightly for your own case.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as sbatch, qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message
    :return: job ID
    """

    # Step 1
    myrun_content = ''
    myrun_content += '#!/bin/sh\n'
    myrun_content += '#SBATCH -o out\n'
    myrun_content += '#SBATCH -p cpu\n'
    myrun_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    myrun_content += '#SBATCH -t 06:00:00\n'
    myrun_content += '#SBATCH -N 1\n'
    myrun_content += '#SBATCH -n 8\n'
    # myrun_content += 'cd ${PBS_O_WORKDIR}\n'  # PBS-style; on the SLURM cluster used
    # in this example the working directory instead needs a '/cephfs' prefix
    myrun_content += 'mpirun vasp_std > log\n'
    with open('myrun', 'w') as fp:
        fp.write(myrun_content)

    # Step 2
    # Submission prints a message on the screen: sbatch reports something like
    # 'Submitted batch job 2350873', qsub something like '2350873.nano.cfn.bnl.local'
    output = str(check_output('sbatch myrun', shell=True))
    
    # Step 3
    # Here we parse job ID from the output of previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    args = parser.parse_args()

    jobNumber = submitJob_local(index=args.index,
                                commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
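
The #SBATCH directives and the sbatch command above are only an example for one particular SLURM cluster. As a rough illustration of how Steps 1-3 change for a different scheduler, the sketch below shows a PBS/Torque flavour of the same routine; the queue name, resource line and VASP command are placeholders that must be adapted to your machine, and this is not part of the stock USPEX scripts.

from subprocess import check_output
import re

def submitJob_local_pbs(index: int) -> int:
    # Step 1: write a PBS/Torque job script instead of a SLURM one
    myrun_content = '#!/bin/sh\n'
    myrun_content += '#PBS -N USPEX-' + str(index) + '\n'
    myrun_content += '#PBS -q cpu\n'                 # hypothetical queue name
    myrun_content += '#PBS -l nodes=1:ppn=8\n'
    myrun_content += '#PBS -l walltime=06:00:00\n'
    myrun_content += 'cd ${PBS_O_WORKDIR}\n'         # PBS jobs start in $HOME
    myrun_content += 'mpirun vasp_std > log\n'
    with open('myrun', 'w') as fp:
        fp.write(myrun_content)

    # Step 2: qsub prints a message like '2350873.nano.cfn.bnl.local'
    output = str(check_output('qsub myrun', shell=True))

    # Step 3: the first integer in that message is the job ID
    return int(re.findall(r'\d+', output)[0])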

The file checkStatus_local.py reads as follows:


import argparse
import glob
import os

from subprocess import check_output

__author__ = 'etikhonov'


def checkStatus_local(jobID: int) -> bool:
    """
    This function checks whether the submitted job is done.
    One needs to edit it slightly for your own case.
    Step 1: the command to check a job by its ID.
    Step 2: find the keywords in the screen message that tell whether the job is done.
    Below is just a sample:
    -------------------------------------------------------------------------------
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
    -------------------------------------------------------------------------------
    If the job is still running, it will show as above.

    If there are no keywords such as ' R '/' Q ' (running/queued), the job is done.
    :param jobID: ID of the job to check
    :return: doneOr (True if the job has finished)
    """

    # Step 1
    output = str(check_output('qstat {}'.format(jobID), shell=True))
    # Step 2
    doneOr = True
    if ' R ' in output or ' Q ' in output:
        doneOr = False
    if doneOr:
        for file in glob.glob('USPEX*'):
            os.remove(file)  # to remove the log file
    return doneOr

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    args = parser.parse_args()

    isDone = checkStatus_local(jobID=args.jobID)
    print('CALLBACK ' + str(int(isDone)))
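
The check above parses the output of qstat for the R/Q status letters. On a cluster where jobs are submitted with sbatch, as in submitJob_local.py, a roughly equivalent check can be done with squeue; the hedged sketch below assumes that squeue exits with an error once the job has left the queue, which check_output reports as CalledProcessError and which can simply be treated as "done". This is an alternative, not the stock script.

from subprocess import check_output, CalledProcessError

def checkStatus_local_slurm(jobID: int) -> bool:
    # '-h' suppresses the header line, '-j' selects the job; empty output
    # (or an error for an unknown job ID) means the job is no longer queued or running
    try:
        output = check_output('squeue -h -j {}'.format(jobID), shell=True).decode()
    except CalledProcessError:
        return True   # squeue no longer knows the job: treat it as done
    return output.strip() == ''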

Case 2: Remote submission.

Edit the following entry in the INPUT.txt file:

2       : whichCluster (default 0, 1: local submission; 2: remote submission)

Then go to the Submission/ folder and modify the following two files:
submitJob_remote.py and checkStatus_remote.py

The file submitJob_remote.py reads as follows:


import argparse
import os
import re

from subprocess import check_output


def submitJob_remote(workingDir: str, index: int, commandExecutable: str) -> int:
    """
    This routine submits a job to a remote cluster.
    One needs to edit it slightly for your own case.
    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as sbatch, qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message

    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """

    # Step 1
    # Specify the PATH to put your calculation folder
    Home = '/home/etikhonov' # 'pwd' of your home directory of your remote machine
    Address = 'rurik'  # your target server: ssh alias or username@address
    Path = Home + '/' + workingDir + '/CalcFold' + str(index) # Just keep it
    run_content = ''
    run_content += '#!/bin/sh\n'
    run_content += '#SBATCH -o out\n'
    run_content += '#SBATCH -p cpu\n'
    run_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    run_content += '#SBATCH -t 06:00:00\n'
    run_content += '#SBATCH -N 1\n'
    run_content += '#SBATCH -n 8\n'
    run_content += 'cd /cephfs'+ Path + '\n'
    run_content += commandExecutable + '\n'

    with open('myrun', 'w') as fp:
        fp.write(run_content)

    # Create the remote directory
    # Please change the ssh/scp command if necessary.
    try:
        os.system('ssh -i ~/.ssh/id_rsa ' + Address + ' mkdir -p ' + Path)
    except:
        pass

    # Copy calculation files
    # add private key -i ~/.ssh/id_rsa if necessary
    os.system('scp POSCAR   ' + Address + ':' + Path)
    os.system('scp INCAR    ' + Address + ':' + Path)
    os.system('scp POTCAR   ' + Address + ':' + Path)
    os.system('scp KPOINTS  ' + Address + ':' + Path)
    os.system('scp myrun ' + Address + ':' + Path)

    # Step 2
    # Run the submission command on the remote machine
    # (use the command of your scheduler: sbatch for SLURM, qsub for PBS, etc.)
    output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address + ' qsub ' \
    + Path + '/myrun', shell=True))

    # Step 3
    # Here we parse job ID from the output of previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()

    jobNumber = submitJob_remote(workingDir=args.workingDir, index=args.index,
                                 commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
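
The file-copying step in submitJob_remote.py issues one scp command per file via os.system() and silently ignores failures. A slightly more defensive variant, sketched below under the same assumptions (ssh alias or address in Address, private key in ~/.ssh/id_rsa), copies all input files in a single scp call and raises an exception if the transfer fails; treat it as an optional refinement rather than part of the stock script.

import os
import subprocess

def copy_inputs(address: str, path: str) -> None:
    """Copy the VASP input files and the job script to the remote CalcFold."""
    files = ['POSCAR', 'INCAR', 'POTCAR', 'KPOINTS', 'myrun']
    key = os.path.expanduser('~/.ssh/id_rsa')
    # scp accepts several local files followed by a single remote destination;
    # check=True turns a failed transfer into an exception instead of silence
    subprocess.run(['scp', '-i', key] + files + [address + ':' + path], check=True)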


The file checkStatus_remote.py reads as follows:


import argparse
import os

from subprocess import check_output

def checkStatus_remote(jobID: int, workingDir: str, index: int) -> bool:
    """
    This routine checks whether the submitted job is done.
    One needs to edit it slightly for your own case.
    Step 1: specify the PATH that holds your calculation folder
    Step 2: check the job by its jobID with the exact command of your scheduler
    :param jobID: ID of the job on the remote cluster
    :param index: index of the structure
    :param workingDir: working directory on the remote machine
    :return: True if the job has finished
    """
    # Step 1
    Home = '/home/etikhonov'  # 'pwd' of your home directory of your remote machine
    Address = 'rurik'  # Your target supercomputer: username@address or ssh alias
    # example of address: user@somedomain.edu -p 2222
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it

    # Step 2
    output = str(check_output('ssh ' + Address + ' qstat ' + str(jobID), shell=True))
    # If you are using a full address without an ssh alias,
    # you must provide a valid ssh private key, e.g.:
    # output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address + \
    #  ' /usr/bin/qstat ' + str(jobID), shell=True))

    if ' R ' not in output and ' Q ' not in output:
        doneOr = True
        # OUTCAR is not copied back by default; add a similar scp line if you need it
        # For reading enthalpy/energy
        os.system('scp ' + Address + ':' + Path + '/OSZICAR ./')
        # For reading structural info
        os.system('scp ' + Address + ':' + Path + '/CONTCAR ./')
        # Edit the scp command (ssh alias / private key) as above if necessary!
    else:
        doneOr = False
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()

    isDone = checkStatus_remote(jobID=args.jobID, \
       workingDir=args.workingDir, index=args.index)
    print('CALLBACK ' + str(int(isDone)))
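
One practical caveat for the remote check: once a job has left the queue, qstat on many PBS systems exits with a non-zero status ("Unknown Job Id"), in which case check_output raises CalledProcessError and checkStatus_remote would crash instead of reporting that the job is done. A hedged way to guard Step 2, leaving the rest of the routine unchanged, is sketched below.

from subprocess import check_output, CalledProcessError

def job_still_in_queue(address: str, jobID: int) -> bool:
    # True while qstat still lists the job as running (R) or queued (Q);
    # a non-zero qstat exit status (unknown or expired job ID) counts as finished
    try:
        output = str(check_output('ssh ' + address + ' qstat ' + str(jobID), shell=True))
    except CalledProcessError:
        return False
    return ' R ' in output or ' Q ' in output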