8.9 How do I set up a calculation using a job submission script?

To set up a job submission script, you need basic knowledge of MATLAB programming and of your job submission system: at a minimum, how to work with strings in MATLAB and how to query job information from the scheduler.

There are two modes for job submission: local submission or remote submission, depending on whether you submit ab initio calculations to the local machine where you run USPEX and MATLAB, or to a remote supercomputer.

Step 1: Configuring files in Submission/ folder

Case I: Local submission.

In the INPUT.txt file, set the following tag:

1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

Then, go to the directory Submission/, where you need to edit two files: submitJob_local.m and checkStatus_local.m.

One can find the detailed instructions in these files. In general, one just needs to tell USPEX how to submit the job and check if the job has completed or not.

In submitJob_local.m:

function jobNumber = submitJob_local()
%-------------------------------------------------------------
%This routine submits the job and returns its job ID.
%You need to edit it slightly to match your own system.
%1   : whichCluster (default 0, 1: local submission, 2: remote submission)
%-------------------------------------------------------------

%Step 1: to prepare the job script that is required by your supercomputer
fp = fopen('myrun', 'w');    
fprintf(fp, '#!/bin/sh\n');
fprintf(fp, '#PBS -l nodes=1:ppn=8,walltime=1:30:00 -q cfn_short\n');
fprintf(fp, '#PBS -N USPEX\n');
fprintf(fp, '#PBS -j oe\n');
fprintf(fp, '#PBS -V \n');
fprintf(fp, 'cd ${PBS_O_WORKDIR}\n');
fprintf(fp, 'mpirun -np 4 vasp1 > vasp.out\n');
fclose(fp);

%Step 2: to submit the job with a command like qsub, bsub, llsubmit, etc.

[a,b]=unix(['qsub myrun'])

%Step 3: to get the jobID from the screen message
%It will output some message on the screen like '2350873.nano.cfn.bnl.local'

end_marker = strfind(b,'.');   % position(s) of '.' in the qsub reply
jobNumber = b(1:end_marker(1)-1); 
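
Before editing Step 3, it can help to look at the scheduler's reply on the command line. The following is a minimal shell sketch, assuming a reply of the form shown above (job ID, then a dot, then the server name). The name extract_job_id is a hypothetical helper chosen for this illustration and is not part of USPEX. It mirrors the two MATLAB lines above by keeping everything before the first dot.

```shell
# Hypothetical helper (not part of USPEX): print the part of a scheduler
# reply that precedes the first dot, e.g. the numeric ID in
# '2350873.nano.cfn.bnl.local'.
extract_job_id() {
    reply=$1
    # ${reply%%.*} strips the longest suffix that starts with a dot,
    # i.e. everything from the first dot onwards.
    printf '%s\n' "${reply%%.*}"
}

# Example call with the sample reply quoted in the MATLAB comment:
#   extract_job_id '2350873.nano.cfn.bnl.local'
```

If your scheduler prints extra words before or after the ID, the reply no longer matches this assumption, and both this sketch and the MATLAB indices need adjusting.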

In checkStatus_local.m:

function doneOr = checkStatus_local(jobID)
%--------------------------------------------------------------------
%This routine checks whether the submitted job is complete or not.
%You need to edit it slightly to match your own case.
%1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)
%--------------------------------------------------------------------

%Step1: the command to check job by ID. 
    [a,b] = unix(['qstat ' jobID ''])

%Step2: to find the keywords from the screen message to determine if the job is complete
%Below is just a sample:
%-------------------------------------------------------------------------------
%Job id                    Name             User            Time Use S Queue
%------------------------- ---------------- --------------- -------- - ---------
%2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
%-------------------------------------------------------------------------------
%If the job is still running, it will show as above.
%If there are no keywords like 'Q cfn_gen04' or 'R cfn_gen04', the job is complete.
%Therefore, we can use the MATLAB function strfind to apply this test.
    doneOr = 0;    % default: the job is still queued or running
    if isempty(strfind(b,'R cfn_')) && isempty(strfind(b,'Q cfn_'))
       doneOr = 1;
       unix('rm USPEX*');    % to remove the log file
    end
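
The keyword test can be tried on the command line before it is written into checkStatus_local.m. Below is a minimal shell sketch, assuming the qstat layout and the 'cfn_' queue prefix from the sample above. The function name job_is_done is hypothetical and is not part of USPEX. It reads qstat text on standard input and succeeds only when no line shows state R or Q followed by a 'cfn_' queue name.

```shell
# Hypothetical helper (not part of USPEX): exit status 0 means "done",
# i.e. the qstat text on stdin contains neither 'R cfn_' nor 'Q cfn_'.
job_is_done() {
    if grep -q -e 'R cfn_' -e 'Q cfn_'; then
        return 1    # job is still running or queued
    fi
    return 0        # no matching keyword: treat the job as complete
}

# Example call (replace the ID with one of your own jobs):
#   qstat 2455453 2>&1 | job_is_done && echo "job finished"
```

Replace the two patterns with the state and queue names that your own qstat prints, and use the same strings in the MATLAB file.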

Case II: Remote submission.

In the INPUT.txt file, set the following tags:

2       : whichCluster (default 0, 1: local submission; 2: remote submission)
C-20GPa : remoteFolder

Then, go to the directory Submission/, where you need to edit two files: submitJob_remote.m and checkStatus_remote.m.


In submitJob_remote.m:

function jobNumber = submitJob_remote(USPEX, Index)
%-------------------------------------------------------------
%This routine copies the input files to the remote machine, submits the job, and returns its job ID.
%2   : whichCluster (default 0, 1: local submission; 2: remote submission)
%C-20GPa : remoteFolder
%-------------------------------------------------------------

%-------------------------------------------------------------
%Step1: To prepare the job script, runvasp.sh
  fp = fopen('runvasp.sh', 'w');
  fprintf(fp, '#!/bin/sh\n');
  fprintf(fp, '#PBS -l nodes=2:ppn=2,walltime=1:30:00\n');
  fprintf(fp, '#PBS -N USPEX\n');
  fprintf(fp, '#PBS -j oe\n');
  fprintf(fp, '#PBS -V \n');
  fprintf(fp, 'cd ${PBS_O_WORKDIR}\n');
  fprintf(fp, '/usr/local/pkg/openmpi-1.4.5/bin/mpirun -np 4 vasp1 > vasp.out\n');
  fclose(fp);
%-------------------------------------------------------------------------------
%Step 2: Copy the files to the remote machine

%Step2-1: Specify the PATH to put your calculation folder
Home = ['/nfs/user08/qiazhu']; %'pwd' of your home directory on remote machine
Address = 'qiazhu@seawulf.stonybrook.edu'; %your target server: username@address
Path = [Home '/' USPEX '/CalcFold' num2str(Index)];  %Just keep it

%Step2-2: Create the remote directory 
% Please change the ssh/scp command if necessary! 
% Sometimes you don't need the -i option
try
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' mkdir ' USPEX ]);  
catch
end

try
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' mkdir ' Path ]);
catch
end

%Step2-3: Copy the necessary files (for VASP calculations, we need POSCAR, INCAR, POTCAR,
% KPOINTS and job script)
unix(['scp -i ~/.ssh/seawulf POSCAR   ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf INCAR    ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf POTCAR   ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf KPOINTS  ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf runvasp.sh ' Address ':' Path]);

%------------------------------------------------------------------------------
%Step 3: to submit the job and get JobID, i.e., the exact command to submit the job.
[a,v]=unix(['ssh -i ~/.ssh/seawulf ' Address ' /usr/local/pkg/torque/bin/qsub ' ...
           Path '/runvasp.sh'])

% format: Job 1587349.nagling is submitted to default queue <mono>
end_marker = strfind(v,'.');
if ~isempty(strfind(v,'error'))
   jobNumber=0;
else
   jobNumber = v(1:end_marker(1)-1);
end


In checkStatus_remote.m:

function doneOr = checkStatus_remote(jobID, USPEX, Folder)
%--------------------------------------------------------------------
%This routine is to check if the submitted job is complete or not
%One needs to do a little edit based on your own situation.
%--------------------------------------------------------------------

%Step1: Specify the PATH to put your calculation folder
Home = ['/nfs/user08/qiazhu']; %'pwd' of your home directory of your remote machine
Address = 'qiazhu@seawulf.stonybrook.edu';  %Your target: username@address.
Path = [Home '/' USPEX '/CalcFold' num2str(Folder)]; %just keep it
%Step2: Check JobID, the exact command to check job by jobID
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' /path/to/qstat ' num2str(jobID)])
    doneOr = 0;    % default: the job is still queued or running
    tempOr1 = strfind(b, 'R batch');
    tempOr2 = strfind(b, 'Q batch');
    if isempty(tempOr1) && isempty(tempOr2)
      doneOr = 1;
% For VASP, we usually need OSZICAR for reading the energy and CONTCAR for reading
% the structure. OUTCAR, EIGENVAL and DOSCAR might be needed for reading other properties.
%   unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/OUTCAR ./']);
% OUTCAR is not necessary by default
      unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/OSZICAR ./']);
% For reading enthalpy/energy
      unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/CONTCAR ./']);
% For reading structural info
    end

It might take some time to correctly configure these files. To test if it works or not, you can type “USPEX -r” twice and then track the screen information. The first attempt is to check if the jobs are submitted, while the second attempt is to check if USPEX can correctly check the status of the submitted jobs. All of the related information can be found in the screen output message. If MATLAB exits without any errors, you are almost ready to go.

Step 2: Running USPEX periodically

The real calculation starts with the command “USPEX -r > log”. Each time it is called, the MATLAB process checks the status of the running ab initio calculations. If a job is complete, MATLAB goes to the calculation folder to read the results and then submits new calculations. After that, MATLAB exits. Therefore, one needs to call the command periodically (for example, every 5 minutes). This can be done with either crontab or a shell script.

Crontab

This can be done with the cron daemon on your Linux machine. In your home directory, there should be the following files:

~/call_job
~/CronTab

Here is an example of a 1-line CronTab file from one of our clusters:

*/5 * * * * sh call_job

It sets the interval between job submissions to 5 minutes and points to the file call_job, which should contain the path of the directory where USPEX will be executed. The file call_job looks like this:

#!/bin/sh
. "$HOME/.bashrc"
cd /ExecutionDirectory
date >> log
USPEX -r >> log

To activate crontab, type

crontab ~/CronTab

If you want to terminate this run, either edit call_job or remove this crontab by typing

crontab -r

To check that crontab works, keep an eye on the updates of the log file at the beginning of the calculation.
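
One way to do this check from a terminal is sketched below, assuming the log file lives in the execution directory used by call_job. The function name log_is_fresh is hypothetical and is not part of USPEX. It relies on the -mmin test of find, which GNU and BSD find provide but which is not part of POSIX. With a 5-minute cron interval, a log that has not changed for 10 minutes suggests that the cron entry is not running.

```shell
# Hypothetical helper (not part of USPEX): succeed if FILE was modified
# less than MINUTES minutes ago (default 10).
# Relies on 'find -mmin', available in GNU and BSD find.
log_is_fresh() {
    file=$1
    minutes=${2:-10}
    [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]
}

# Typical manual checks, run in the execution directory:
#   crontab -l                        # list the installed cron entries
#   tail -n 5 log                     # look at the latest log lines
#   log_is_fresh log 10 || echo "log has not been updated recently"
```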

Shell script

Alternatively, you can write a script that uses the sleep command of the Linux shell. Below is a rather simple script, run-uspex.sh:

#!/bin/sh
while [ ! -f ./USPEX_IS_DONE ]; do
   date >> log 
   USPEX -r >> log
   sleep 300
done

Note: keep in mind that, until the file USPEX_IS_DONE appears, this loop can only be terminated by killing the process ID of this script.
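
If killing the process is inconvenient, the loop can be given a second exit condition. The following is a minimal sketch of such a variant, not the script shipped with USPEX. The function name run_uspex_loop and the stop-file name STOP_USPEX are assumptions made for this illustration. The command and the sleep interval are passed as arguments so that the loop can be tried with a harmless command first.

```shell
# Hypothetical variant of run-uspex.sh (not part of USPEX).
# Usage: run_uspex_loop COMMAND [INTERVAL_SECONDS]
# Calls COMMAND repeatedly until USPEX_IS_DONE or STOP_USPEX exists in
# the current directory. STOP_USPEX is a name chosen for this sketch.
run_uspex_loop() {
    cmd=$1
    interval=${2:-300}
    while [ ! -f ./USPEX_IS_DONE ] && [ ! -f ./STOP_USPEX ]; do
        date >> log
        $cmd >> log          # unquoted on purpose so 'USPEX -r' splits into words
        sleep "$interval"
    done
}

# Example call, in the execution directory:
#   run_uspex_loop "USPEX -r" 300
# To stop it from another terminal:
#   touch STOP_USPEX
```

The stop file is only checked between calls, so the loop ends after the current call and sleep have finished, at most one interval later.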