8.8 How do I set up a calculation using a job submission script?

To set up a job submission script, you need basic knowledge of Python programming and of your job submission system.

There are two modes for job submission: local submission or remote submission, depending on whether you submit ab initio calculations to the local machine where you run USPEX, or to a remote supercomputer.

Step 1: Configuring files in the Submission/ folder

Case I: Local submission.

In the INPUT.txt file, set the following tag:

1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

Next, make sure an SSH server is running on your local machine. USPEX connects to it and runs the ab initio code via SSH.

Then, go to the directory Submission/, where you need to edit two files: submitJob_local.py and checkStatus_local.py.

Detailed instructions are given in these files. In general, you only need to tell USPEX how to submit a job and how to check whether it has completed.

In submitJob_local.py:

<div class="highlight"><pre><span></span>from subprocess import check_output
import re


def submitJob_local(index: int, commandExecutable: str) -&gt; int:
    &quot;&quot;&quot;
    This routine submits a job locally.
    Edit it to match your own system.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command like qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message
    :return: job ID
    &quot;&quot;&quot;

    # Step 1
    myrun_content = &#39;&#39;
    myrun_content += &#39;#!/bin/sh\n&#39;
    myrun_content += &#39;#SBATCH -o out\n&#39;
    myrun_content += &#39;#SBATCH -p cpu\n&#39;
    myrun_content += &#39;#SBATCH -J USPEX-&#39; + str(index) + &#39;\n&#39;
    myrun_content += &#39;#SBATCH -t 06:00:00\n&#39;
    myrun_content += &#39;#SBATCH -N 1\n&#39;
    myrun_content += &#39;#SBATCH -n 8\n&#39;
    # myrun_content += &#39;cd ${PBS_O_WORKDIR}\n&#39;  # check this, must have /cephfs suffix with SBATCH in my case
    # The command is hard-coded here; commandExecutable (the command for the
    # current optimization step) can be written to the script instead.
    myrun_content += &#39;mpirun vasp_std &gt; log\n&#39;
    with open(&#39;myrun&#39;, &#39;w&#39;) as fp:
        fp.write(myrun_content)

    # Step 2
    # It will output some message on the screen like &#39;2350873.nano.cfn.bnl.local&#39;
    output = str(check_output(&#39;sbatch myrun&#39;, shell=True))

    # Step 3
    # Here we parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r&#39;\d+&#39;, output)[0])
    return jobNumber


if __name__ == &#39;__main__&#39;:
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(&#39;-i&#39;, dest=&#39;index&#39;, type=int)
    parser.add_argument(&#39;-c&#39;, dest=&#39;commandExecutable&#39;, type=str)
    args = parser.parse_args()

    jobNumber = submitJob_local(index=args.index, commandExecutable=args.commandExecutable)
    print(&#39;CALLBACK &#39; + str(jobNumber))
</pre></div>
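Step 3 of the script takes the first run of digits in the scheduler's message as the job ID. The sketch below isolates that step in a hypothetical helper, `parse_job_id` (not part of USPEX), so you can try it on the message your own scheduler prints. The two sample messages are typical SLURM-style and PBS-style formats and are assumptions; if your scheduler prints other numbers before the job ID, the pattern needs adjusting.

```python
import re


def parse_job_id(output: str) -> int:
    """Return the first run of digits in the scheduler's submission message."""
    match = re.search(r'\d+', output)
    if match is None:
        # Fail loudly instead of raising an IndexError on an empty match list
        raise ValueError('no job ID found in: ' + repr(output))
    return int(match.group())


# Sample messages are illustrative; check what your own scheduler prints.
print(parse_job_id('Submitted batch job 123456'))    # SLURM-style message
print(parse_job_id('2350873.nano.cfn.bnl.local'))    # PBS-style message
```

Raising an explicit error when no digits are found makes a failed submission easier to diagnose than an index error on an empty list.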

In checkStatus_local.py:

<div class="highlight"><pre><span></span>import argparse
import glob
import os
from subprocess import check_output

__author__ = &#39;etikhonov&#39;


def checkStatus_local(jobID: int) -&gt; bool:
    &quot;&quot;&quot;
    This function checks whether the submitted job is done or not.
    Edit it to match your own system.
    1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

    Step 1: the command to check the job by ID.
    Step 2: find the keywords in the screen message to determine if the job is done
    Below is just a sample:
    -------------------------------------------------------------------------------
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
    -------------------------------------------------------------------------------
    If the job is still running, it will show as above.
    If there are no keywords like &#39;R/Q cfn_gen04&#39;, the job is done.
    :param jobID: job ID returned at submission
    :return: doneOr
    &quot;&quot;&quot;

    # Step 1
    # Use the status command of the scheduler you submit with
    # (for example, squeue -j for SLURM instead of qstat).
    # Note: check_output raises CalledProcessError if the command exits with a non-zero status.
    output = str(check_output(&#39;qstat {}&#39;.format(jobID), shell=True))

    # Step 2
    doneOr = True
    if &#39; R &#39; in output or &#39; Q &#39; in output:
        doneOr = False
    if doneOr:
        for file in glob.glob(&#39;USPEX*&#39;):
            os.remove(file)  # to remove the log file
    return doneOr


if __name__ == &#39;__main__&#39;:
    parser = argparse.ArgumentParser()
    parser.add_argument(&#39;-j&#39;, dest=&#39;jobID&#39;, type=int)
    args = parser.parse_args()

    isDone = checkStatus_local(jobID=args.jobID)
    print(&#39;CALLBACK &#39; + str(int(isDone)))
</pre></div>
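The check above searches the whole listing for the substrings ' R ' and ' Q '. A stricter variant, sketched below, reads the state column of the line belonging to the job. The helper name `job_is_active` is hypothetical, and the column layout is assumed to match the sample listing in the docstring above; other schedulers or qstat versions may order the columns differently.

```python
def job_is_active(status_output: str, job_id: int) -> bool:
    """Return True if the qstat-style listing shows the job as running or queued."""
    for line in status_output.splitlines():
        fields = line.split()
        # Assumed columns: Job id, Name, User, Time Use, S (state), Queue
        if len(fields) >= 6 and fields[0].split('.')[0] == str(job_id):
            return fields[4] in ('R', 'Q')
    return False  # job not listed: treat it as finished


sample = """Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
"""
print(job_is_active(sample, 2455453))
print(job_is_active(sample, 1111111))
```

Matching on the job's own line avoids a false "still running" result when the letters R or Q happen to appear elsewhere in the output.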

Case II: Remote submission.

In the INPUT.txt file, set the following tag:

2       : whichCluster (default 0, 1: local submission; 2: remote submission)

Then, go to the directory Submission/, where you need to edit two files: submitJob_remote.py and checkStatus_remote.py.

In submitJob_remote.py:

<div class="highlight"><pre><span></span>import argparse
import os
import re
from subprocess import check_output


def submitJob_remote(workingDir: str, index: int, commandExecutable: str) -&gt; int:
    &quot;&quot;&quot;
    This routine submits a job to a remote cluster.
    Edit it to match your own system.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command like qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message

    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    &quot;&quot;&quot;

    # Step 1
    # Specify the PATH to put your calculation folder
    Home = &#39;/home/etikhonov&#39;  # &#39;pwd&#39; of your home directory on the remote machine
    Address = &#39;rurik&#39;  # your target server: ssh alias or username@address
    Path = Home + &#39;/&#39; + workingDir + &#39;/CalcFold&#39; + str(index)  # just keep it

    run_content = &#39;&#39;
    run_content += &#39;#!/bin/sh\n&#39;
    run_content += &#39;#SBATCH -o out\n&#39;
    run_content += &#39;#SBATCH -p cpu\n&#39;
    run_content += &#39;#SBATCH -J USPEX-&#39; + str(index) + &#39;\n&#39;
    run_content += &#39;#SBATCH -t 06:00:00\n&#39;
    run_content += &#39;#SBATCH -N 1\n&#39;
    run_content += &#39;#SBATCH -n 8\n&#39;
    run_content += &#39;cd /cephfs&#39; + Path + &#39;\n&#39;
    run_content += commandExecutable + &#39;\n&#39;

    with open(&#39;myrun&#39;, &#39;w&#39;) as fp:
        fp.write(run_content)

    # Create the remote directory
    # Please change the ssh/scp command if necessary.
    # os.system does not raise on failure, so no try/except is needed here.
    os.system(&#39;ssh -i ~/.ssh/id_rsa &#39; + Address + &#39; mkdir -p &#39; + Path)

    # Copy calculation files
    # add private key -i ~/.ssh/id_rsa if necessary
    os.system(&#39;scp POSCAR &#39; + Address + &#39;:&#39; + Path)
    os.system(&#39;scp INCAR &#39; + Address + &#39;:&#39; + Path)
    os.system(&#39;scp POTCAR &#39; + Address + &#39;:&#39; + Path)
    os.system(&#39;scp KPOINTS &#39; + Address + &#39;:&#39; + Path)
    os.system(&#39;scp myrun &#39; + Address + &#39;:&#39; + Path)

    # Step 2
    # Run command. Use the submit command that matches the directives in the
    # job script (for example, sbatch for #SBATCH lines).
    output = str(check_output(&#39;ssh -i ~/.ssh/id_rsa &#39; + Address + &#39; qsub &#39; + Path + &#39;/myrun&#39;, shell=True))

    # Step 3
    # Here we parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r&#39;\d+&#39;, output)[0])
    return jobNumber


if __name__ == &#39;__main__&#39;:
    parser = argparse.ArgumentParser()
    parser.add_argument(&#39;-i&#39;, dest=&#39;index&#39;, type=int)
    parser.add_argument(&#39;-c&#39;, dest=&#39;commandExecutable&#39;, type=str)
    parser.add_argument(&#39;-f&#39;, dest=&#39;workingDir&#39;, type=str)
    args = parser.parse_args()

    jobNumber = submitJob_remote(workingDir=args.workingDir, index=args.index,
                                 commandExecutable=args.commandExecutable)
    print(&#39;CALLBACK &#39; + str(jobNumber))
</pre></div>
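Before running the remote script for real, it can help to print the ssh and scp commands it would issue and try them by hand. The sketch below collects them in a list without executing anything. The helper name `staging_commands` and the working directory `run1` are hypothetical; the address, home directory and file names are the sample values used in the script above.

```python
def staging_commands(address: str, path: str, files: list) -> list:
    """Build, without running them, the commands that stage one calculation."""
    commands = ['ssh ' + address + ' mkdir -p ' + path]
    for name in files:
        commands.append('scp ' + name + ' ' + address + ':' + path)
    return commands


# 'run1' stands in for whatever working directory your run uses.
for cmd in staging_commands('rurik', '/home/etikhonov/run1/CalcFold3',
                            ['POSCAR', 'INCAR', 'POTCAR', 'KPOINTS', 'myrun']):
    print(cmd)
```

If one of the printed commands asks for a password or fails when pasted into a terminal, the script will fail in the same way, so this is a quick way to verify key-based login and paths.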


In checkStatus_remote.py:

<div class="highlight"><pre><span></span>import argparse
import os
from subprocess import check_output


def checkStatus_remote(jobID: int, workingDir: str, index: int) -&gt; bool:
    &quot;&quot;&quot;
    This routine checks whether the submitted job is done or not.
    Edit it to match your own system.

    Step 1: specify the PATH to put your calculation folder
    Step 2: check the job ID, using the exact command to check a job by job ID
    :param jobID: job ID returned at submission
    :param index: index of the structure
    :param workingDir: working directory on the remote machine
    :return: True if the job is done, False otherwise
    &quot;&quot;&quot;

    # Step 1
    Home = &#39;/home/etikhonov&#39;  # &#39;pwd&#39; of your home directory on the remote machine
    Address = &#39;rurik&#39;  # your target supercomputer: username@address or ssh alias
    # example of address: user@somedomain.edu -p 2222
    Path = Home + &#39;/&#39; + workingDir + &#39;/CalcFold&#39; + str(index)  # just keep it

    # Step 2
    output = str(check_output(&#39;ssh &#39; + Address + &#39; qstat &#39; + str(jobID), shell=True))
    # If you use a full address without an ssh alias, you must provide a valid ssh private key, like this:
    # output = str(check_output(&#39;ssh -i ~/.ssh/id_rsa &#39; + Address + &#39; /usr/bin/qstat &#39; + str(jobID), shell=True))

    # The job is done only if it is neither running (R) nor queued (Q)
    if &#39; R &#39; not in output and &#39; Q &#39; not in output:
        doneOr = True
        # OUTCAR is not necessary by default:
        # os.system(&#39;scp -i ~/.ssh/id_rsa &#39; + Address + &#39;:&#39; + Path + &#39;/OUTCAR ./&#39;)
        os.system(&#39;scp &#39; + Address + &#39;:&#39; + Path + &#39;/OSZICAR ./&#39;)  # for reading enthalpy/energy
        os.system(&#39;scp &#39; + Address + &#39;:&#39; + Path + &#39;/CONTCAR ./&#39;)  # for reading structural info
        # Edit ssh command as above!
    else:
        doneOr = False
    return doneOr


if __name__ == &#39;__main__&#39;:
    parser = argparse.ArgumentParser()
    parser.add_argument(&#39;-j&#39;, dest=&#39;jobID&#39;, type=int)
    parser.add_argument(&#39;-i&#39;, dest=&#39;index&#39;, type=int)
    parser.add_argument(&#39;-f&#39;, dest=&#39;workingDir&#39;, type=str)
    args = parser.parse_args()

    isDone = checkStatus_remote(jobID=args.jobID, workingDir=args.workingDir, index=args.index)
    print(&#39;CALLBACK &#39; + str(int(isDone)))
</pre></div>
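All four scripts report their result by printing a line of the form `CALLBACK <integer>`: the job ID for the submit scripts, and 1 or 0 for the status scripts. How USPEX itself reads this line is not described here. The sketch below is only a hypothetical helper, `read_callback`, for checking by hand that your edited script still prints a well-formed line even when the scheduler adds its own messages.

```python
def read_callback(stdout_text: str) -> int:
    """Return the integer following 'CALLBACK' on the last such line."""
    value = None
    for line in stdout_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == 'CALLBACK':
            value = int(parts[1])
    if value is None:
        raise ValueError('no CALLBACK line found')
    return value


# Illustrative captured outputs of a submit script and a status script
print(read_callback('some scheduler warning\nCALLBACK 2455453\n'))
print(read_callback('CALLBACK 0\n'))
```

A quick manual test is to run the script from a terminal with the same `-i`, `-c`, `-j` or `-f` arguments it defines and confirm that the final line of output has this form.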