How to create a job submission script for the calculations?
To create a job submission script, the user is expected to have some basic knowledge of Python programming and of your job submission system.
There are two job-script submission modes, local and remote, depending on whether you submit the ab initio calculations to the local machine on which USPEX and MATLAB are running or to a remote supercomputer.
Step 1: Configure the files in the Submission/ folder
Case 1: local submission.
Please edit the following entry in the INPUT.txt file:
1 : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)
An ssh server must then be running on your local machine; USPEX connects to it and runs the ab initio code through ssh. Next, go to the Submission/ folder and edit the two files submitJob_local.py and checkStatus_local.py.
Detailed instructions can be found inside these files. In general, you only need to tell USPEX how to submit a job and how to check whether the job has finished.
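As the listings below show, USPEX runs these scripts as standalone programs and reads back the integer printed after the keyword CALLBACK: the job ID from the submit script, and 1 or 0 from the status script. Whatever scheduler your machine uses, an edited script only has to preserve this output. A minimal sketch of the submission side, extracted from the full script below (the sbatch command is just the example used there; replace it with qsub, bsub, etc. as needed):

    # submit the job script, parse the scheduler's reply for the job ID,
    # and print 'CALLBACK <jobID>' so that USPEX can pick it up
    import re
    from subprocess import check_output

    output = str(check_output('sbatch myrun', shell=True))
    jobNumber = int(re.findall(r'\d+', output)[0])  # the first number in the reply is taken as the job ID
    print('CALLBACK ' + str(jobNumber))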
The file submitJob_local.py reads as follows:
from subprocess import check_output
import re


def submitJob_local(index: int, commandExecutable: str) -> int:
    """
    This routine is to submit a job locally.
    One needs to do a little editing based on your own case.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command like qsub, bsub, llsubmit, etc.
    Step 3: get the jobID from the screen message

    :return: job ID
    """
    # Step 1
    myrun_content = ''
    myrun_content += '#!/bin/sh\n'
    myrun_content += '#SBATCH -o out\n'
    myrun_content += '#SBATCH -p cpu\n'
    myrun_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    myrun_content += '#SBATCH -t 06:00:00\n'
    myrun_content += '#SBATCH -N 1\n'
    myrun_content += '#SBATCH -n 8\n'
    # myrun_content += 'cd ${PBS_O_WORKDIR}\n'  # check this; with SBATCH the path must carry the /cephfs prefix in my case
    myrun_content += 'mpirun vasp_std > log\n'

    with open('myrun', 'w') as fp:
        fp.write(myrun_content)

    # Step 2
    # The submission command prints a message on the screen like
    # '2350873.nano.cfn.bnl.local'
    output = str(check_output('sbatch myrun', shell=True))

    # Step 3
    # Parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    args = parser.parse_args()
    jobNumber = submitJob_local(index=args.index, commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
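For a quick standalone test (the index and command below are only illustrations), the script can be run by hand as python submitJob_local.py -i 0 -c 'mpirun vasp_std > log'; it should write the myrun job script into the current directory and print CALLBACK followed by the job number reported by sbatch.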
The file checkStatus_local.py reads as follows:
import argparse
import glob
import os
from subprocess import check_output

__author__ = 'etikhonov'


def checkStatus_local(jobID: int) -> bool:
    """
    This function checks whether the submitted job is done or not.
    One needs to do a little editing based on your own case.

    Step 1: the command to check the job by its ID.
    Step 2: look for keywords in the screen message to determine whether the job is done.

    Below is just a sample:
    -------------------------------------------------------------------------------
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
    -------------------------------------------------------------------------------
    If the job is still running, it shows up as above. If keywords like ' R ' or
    ' Q ' are absent from the status column, the job is done.

    :param jobID:
    :return: doneOr
    """
    # Step 1
    output = str(check_output('qstat {}'.format(jobID), shell=True))

    # Step 2
    doneOr = True
    if ' R ' in output or ' Q ' in output:
        doneOr = False
    if doneOr:
        for file in glob.glob('USPEX*'):
            os.remove(file)  # remove the log file
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    args = parser.parse_args()
    isDone = checkStatus_local(jobID=args.jobID)
    print('CALLBACK ' + str(int(isDone)))
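Note that, as written, submitJob_local.py submits with sbatch (SLURM) while checkStatus_local.py queries the queue with qstat (PBS/Torque); on some machines qstat is provided as a compatibility wrapper, but on a pure SLURM system you may prefer to rebuild the check around squeue. A sketch of such a variant, assuming squeue is available (verify its exact behaviour on your cluster):

    from subprocess import check_output, CalledProcessError

    def checkStatus_slurm(jobID: int) -> bool:
        """Return True once the job no longer appears in the SLURM queue."""
        try:
            # '-h' drops the header line, '-j' restricts the listing to this job ID
            output = str(check_output('squeue -h -j {}'.format(jobID), shell=True))
        except CalledProcessError:
            # some SLURM versions report an error for job IDs that have already left the queue
            return True
        return str(jobID) not in output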
Case 2: remote submission.
In the file INPUT.txt, edit:
2 : whichCluster (default 0, 1: local submission; 2: remote submission)
Then go to the Submission/ folder and modify the following two files:
submitJob_remote.py and checkStatus_remote.py
The file submitJob_remote.py reads as follows:
import argparse
import os
import re
from subprocess import check_output


def submitJob_remote(workingDir: str, index: int, commandExecutable: str) -> int:
    """
    This routine submits a job to a remote cluster.
    One needs to do a little editing based on your own case.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command like qsub, bsub, llsubmit, etc.
    Step 3: get the jobID from the screen message

    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """
    # Step 1
    # Specify the PATH to put your calculation folder
    Home = '/home/etikhonov'  # 'pwd' of your home directory on your remote machine
    Address = 'rurik'  # your target server: ssh alias or username@address
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it

    run_content = ''
    run_content += '#!/bin/sh\n'
    run_content += '#SBATCH -o out\n'
    run_content += '#SBATCH -p cpu\n'
    run_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    run_content += '#SBATCH -t 06:00:00\n'
    run_content += '#SBATCH -N 1\n'
    run_content += '#SBATCH -n 8\n'
    run_content += 'cd /cephfs' + Path + '\n'
    run_content += commandExecutable + '\n'

    with open('myrun', 'w') as fp:
        fp.write(run_content)

    # Create the remote directory
    # Please change the ssh/scp command if necessary.
    try:
        os.system('ssh -i ~/.ssh/id_rsa ' + Address + ' mkdir -p ' + Path)
    except:
        pass

    # Copy calculation files
    # add the private key with -i ~/.ssh/id_rsa if necessary
    os.system('scp POSCAR ' + Address + ':' + Path)
    os.system('scp INCAR ' + Address + ':' + Path)
    os.system('scp POTCAR ' + Address + ':' + Path)
    os.system('scp KPOINTS ' + Address + ':' + Path)
    os.system('scp myrun ' + Address + ':' + Path)

    # Step 2
    # Run the submission command on the remote machine
    output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address + ' qsub ' +
                              Path + '/myrun', shell=True))

    # Step 3
    # Parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()
    jobNumber = submitJob_remote(workingDir=args.workingDir, index=args.index,
                                 commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))
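USPEX calls this script with three flags, for example python submitJob_remote.py -f results1 -i 3 -c 'mpirun vasp_std > log' (the values are only illustrations), and again reads the job ID printed after CALLBACK. Before the first run, adapt Home, Address and the /cephfs prefix to your own cluster, and make sure that key-based ssh to Address works without a password prompt, since every ssh/scp call above would otherwise hang waiting for input. Also note that this example writes SLURM-style #SBATCH directives into myrun but submits it with qsub; keep the header directives and the submission command consistent with the scheduler actually running on your remote machine (e.g. sbatch for SLURM).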
The file checkStatus_remote.py reads as follows:
import argparse
import os
from subprocess import check_output


def checkStatus_remote(jobID: int, workingDir: str, index: int) -> bool:
    """
    This routine checks whether the submitted job is done or not.
    One needs to do a little editing based on your own case.

    Step 1: specify the PATH of your calculation folder
    Step 2: check the job by its jobID with the exact command of your scheduler

    :param jobID:
    :param index:
    :param workingDir:
    :return: doneOr
    """
    # Step 1
    Home = '/home/etikhonov'  # 'pwd' of your home directory on your remote machine
    Address = 'rurik'  # your target supercomputer: username@address or ssh alias
    # example of an address: user@somedomain.edu -p 2222
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it

    # Step 2
    output = str(check_output('ssh ' + Address + ' qstat ' + str(jobID), shell=True))
    # If you are using the full address without an ssh alias,
    # you must provide a valid ssh private key, like this:
    # output = str(check_output('ssh -i ~/.ssh/id_rsa ' + Address +
    #                           ' /usr/bin/qstat ' + str(jobID), shell=True))

    if ' R ' not in output and ' Q ' not in output:
        doneOr = True
        # os.system('scp -i ~/.ssh/id_rsa ' + Address + ':' + Path + '/OUTCAR ./')
        # OUTCAR is not necessary by default
        # For reading enthalpy/energy
        os.system('scp ' + Address + ':' + Path + '/OSZICAR ./')
        # For reading structural info
        os.system('scp ' + Address + ':' + Path + '/CONTCAR ./')
        # Edit the scp commands as above if a private key is required!
    else:
        doneOr = False
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()
    isDone = checkStatus_remote(jobID=args.jobID,
                                workingDir=args.workingDir, index=args.index)
    print('CALLBACK ' + str(int(isDone)))
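This script is invoked as python checkStatus_remote.py -j <jobID> -i <index> -f <workingDir> and prints CALLBACK 1 once the job has left the queue, or CALLBACK 0 while qstat still shows it as running (R) or queued (Q). When the job is finished, it copies OSZICAR and CONTCAR back so that USPEX can read the energy and the relaxed structure; if you relax with a code other than VASP, replace these two files with the corresponding output files of your ab initio code, and add the -i ~/.ssh/id_rsa option to scp if your setup requires it.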