How do I create a job submission script for the calculations?
To create a job submission script, the user is expected to know some basic Python programming and to be familiar with the job submission system of your cluster.
There are two job submission modes, local and remote, depending on whether the ab initio calculations are submitted to the local machine on which USPEX and MATLAB are running, or to a remote supercomputer.
Step 1: configure the files in the Submission/ folder
Case 1: local submission.
Edit the following entry in the INPUT.txt file:
1 : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)
Then an ssh server must be running on your local machine; USPEX connects to it and runs the ab initio code through ssh. Next, go to the Submission/ folder, where you need to edit the two files submitJob_local.py and checkStatus_local.py.
Detailed instructions can be found inside these files. In general, you only need to tell USPEX how to submit a job and how to check whether it has finished.
The file submitJob_local.py is as follows:
from subprocess import check_output
import re


def submitJob_local(index: int, commandExecutable: str) -> int:
    """
    This routine submits a job locally.
    One needs to edit it slightly for one's own case.
    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as qsub, bsub, llsubmit, etc.
    Step 3: parse the job ID from the screen message
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """
    # Step 1
    myrun_content = ''
    myrun_content += '#!/bin/sh\n'
    myrun_content += '#SBATCH -o out\n'
    myrun_content += '#SBATCH -p cpu\n'
    myrun_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    myrun_content += '#SBATCH -t 06:00:00\n'
    myrun_content += '#SBATCH -N 1\n'
    myrun_content += '#SBATCH -n 8\n'
    # myrun_content += 'cd ${PBS_O_WORKDIR}\n'
    # Check this line: on the author's cluster the working directory needs a
    # '/cephfs' prefix when SBATCH is used.
    # The ab initio command for this step; commandExecutable could be used here
    # instead of the hard-coded call.
    myrun_content += 'mpirun vasp_std > log\n'
    with open('myrun', 'w') as fp:
        fp.write(myrun_content)
    # Step 2
    # The submission command prints a message on the screen, e.g.
    # 'Submitted batch job 2350873' for sbatch or '2350873.nano.cfn.bnl.local' for qsub
    output = str(check_output('sbatch myrun', shell=True))
    # Step 3
    # Parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    args = parser.parse_args()
    jobNumber = submitJob_local(index=args.index,
                                commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
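Only Steps 1-3 need to change when a different scheduler is used. As a rough illustration, the fragment below is a minimal sketch of the same three steps for a PBS/Torque queue, where qsub prints a job identifier such as '2350873.nano.cfn.bnl.local'; the queue name, walltime and the mpirun line are placeholders that must be adapted to your own cluster, and the fragment is meant to replace the body of Steps 1-3 inside submitJob_local (so index, check_output and re come from that file).

# A sketch of Steps 1-3 for a PBS/Torque queue (queue name and walltime are placeholders)
myrun_content = '#!/bin/sh\n'
myrun_content += '#PBS -N USPEX-' + str(index) + '\n'
myrun_content += '#PBS -q cpu\n'
myrun_content += '#PBS -l walltime=06:00:00,nodes=1:ppn=8\n'
myrun_content += 'cd ${PBS_O_WORKDIR}\n'
myrun_content += 'mpirun vasp_std > log\n'
with open('myrun', 'w') as fp:
    fp.write(myrun_content)
# qsub prints something like '2350873.nano.cfn.bnl.local'
output = str(check_output('qsub myrun', shell=True))
jobNumber = int(re.findall(r'\d+', output)[0])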
The file checkStatus_local.py is as follows:
import argparse
import glob
import os
from subprocess import check_output

__author__ = 'etikhonov'


def checkStatus_local(jobID: int) -> bool:
    """
    This function checks whether the submitted job is done or not.
    One needs to edit it slightly for one's own case.
    Step 1: the command to check the job by its ID
    Step 2: find the keywords in the screen message that tell whether the job is done
    Below is just a sample:
    -------------------------------------------------------------------------------
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
    -------------------------------------------------------------------------------
    If the job is still running, it shows up as above.
    If keywords such as 'R'/'Q' are absent from the listing, the job is done.
    :param jobID: job ID returned by submitJob_local
    :return: doneOr
    """
    # Step 1
    output = str(check_output('qstat {}'.format(jobID), shell=True))
    # Step 2
    doneOr = True
    if ' R ' in output or ' Q ' in output:
        doneOr = False
    if doneOr:
        for file in glob.glob('USPEX*'):
            os.remove(file)  # remove the log files of the finished job
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    args = parser.parse_args()
    isDone = checkStatus_local(jobID=args.jobID)
    print('CALLBACK ' + str(int(isDone)))
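Note that the submission example above uses sbatch (SLURM) while this status check uses qstat; both commands must of course match your actual scheduler. A minimal sketch of Steps 1-2 for SLURM is shown below, assuming squeue is available; on some systems squeue exits with an error once the job has left the queue, which the try/except interprets as "done".

# A sketch of Steps 1-2 for a SLURM queue ('R' = running, 'PD' = pending)
try:
    output = str(check_output('squeue -j {}'.format(jobID), shell=True))
    doneOr = ' R ' not in output and ' PD ' not in output
except Exception:
    # squeue failed: the job is no longer known to the scheduler
    doneOr = True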
Case 2: remote submission.
Edit the following entry in the INPUT.txt file:
2 : whichCluster (default 0, 1: local submission; 2: remote submission)
Then go to the Submission/ folder and modify the following two files:
submitJob_remote.py and checkStatus_remote.py
The file submitJob_remote.py is as follows:
import argparse
import os
import re
from subprocess import check_output


def submitJob_remote(workingDir: str, index: int, commandExecutable: str) -> int:
    """
    This routine submits a job to a remote cluster.
    One needs to edit it slightly for one's own case.
    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as qsub, bsub, llsubmit, etc.
    Step 3: parse the job ID from the screen message
    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """
    # Step 1
    # Specify the PATH where your calculation folder will be placed
    Home = '/home/etikhonov'  # 'pwd' of your home directory on the remote machine
    Address = 'rurik'         # your target server: ssh alias or username@address
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it
    run_content = ''
    run_content += '#!/bin/sh\n'
    run_content += '#SBATCH -o out\n'
    run_content += '#SBATCH -p cpu\n'
    run_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    run_content += '#SBATCH -t 06:00:00\n'
    run_content += '#SBATCH -N 1\n'
    run_content += '#SBATCH -n 8\n'
    # the '/cephfs' prefix is specific to this example cluster; adjust it to your filesystem
    run_content += 'cd /cephfs' + Path + '\n'
    run_content += commandExecutable + '\n'
    with open('myrun', 'w') as fp:
        fp.write(run_content)
    # Create the remote directory ('mkdir -p' succeeds even if it already exists)
    # Please change the ssh/scp commands if necessary.
    os.system('ssh -i ~/.ssh/id_rsa ' + Address + ' mkdir -p ' + Path)
    # Copy the calculation files
    # Add the private key (-i ~/.ssh/id_rsa) if necessary
    os.system('scp POSCAR ' + Address + ':' + Path)
    os.system('scp INCAR ' + Address + ':' + Path)
    os.system('scp POTCAR ' + Address + ':' + Path)
    os.system('scp KPOINTS ' + Address + ':' + Path)
    os.system('scp myrun ' + Address + ':' + Path)
    # Step 2
    # Run the submission command; use the command of your own scheduler (qsub, sbatch, ...)
    output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address + ' qsub ' +
                              Path + '/myrun', shell=True))
    # Step 3
    # Parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()
    jobNumber = submitJob_remote(workingDir=args.workingDir, index=args.index,
                                 commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
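As a small optional refinement (a sketch, not part of the stock script): scp accepts several source files in one call, so the five copy commands above can be bundled into a single transfer, saving a few ssh round trips per structure; input_files below is just an illustrative variable name.

# Sketch: copy all input files and the job script in one scp call
input_files = 'POSCAR INCAR POTCAR KPOINTS myrun'
os.system('scp -i ~/.ssh/id_rsa ' + input_files + ' ' + Address + ':' + Path)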
The file checkStatus_remote.py is as follows:
import argparse
import os
from subprocess import check_output


def checkStatus_remote(jobID: int, workingDir: str, index: int) -> bool:
    """
    This routine checks whether the submitted job is done or not.
    One needs to edit it slightly for one's own case.
    Step 1: specify the PATH of your calculation folder
    Step 2: check the job by its jobID with the exact command of your scheduler
    :param jobID: job ID returned by submitJob_remote
    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :return: doneOr
    """
    # Step 1
    Home = '/home/etikhonov'  # 'pwd' of your home directory on the remote machine
    Address = 'rurik'         # your target supercomputer: username@address or ssh alias
    # example of an address: user@somedomain.edu -p 2222
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it
    # Step 2
    output = str(check_output('ssh ' + Address + ' qstat ' + str(jobID), shell=True))
    # If you use the full address without an ssh alias,
    # you must provide a valid ssh private key, like this:
    # output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address +
    #                           ' /usr/bin/qstat ' + str(jobID), shell=True))
    if ' R ' not in output and ' Q ' not in output:
        doneOr = True
        # os.system('scp -i ~/.ssh/id_rsa ' + Address + ':' + Path + '/OUTCAR ./')
        # OUTCAR is not copied back by default
        # For reading the enthalpy/energy
        os.system('scp ' + Address + ':' + Path + '/OSZICAR ./')
        # For reading the structural information
        os.system('scp ' + Address + ':' + Path + '/CONTCAR ./')
        # Edit the scp commands (ssh key, port) in the same way as above!
    else:
        doneOr = False
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()
    isDone = checkStatus_remote(jobID=args.jobID,
                                workingDir=args.workingDir, index=args.index)
    print('CALLBACK ' + str(int(isDone)))
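Both remote scripts assume passwordless ssh access to the remote machine. Before starting a long USPEX run it is worth verifying this once; the helper below is a hypothetical convenience function (not part of USPEX) that uses ssh's BatchMode option, so a missing or rejected key fails immediately instead of prompting for a password.

from subprocess import call

def ssh_ok(address: str = 'rurik') -> bool:
    # Returns True if passwordless ssh to the given alias/address works
    return call('ssh -o BatchMode=yes ' + address + ' true', shell=True) == 0

if __name__ == '__main__':
    print('ssh connection OK' if ssh_ok() else 'ssh connection failed')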