8.10 How do I set up a job submission script for calculations?

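To set up a job submission script, we expect users to have some basic knowledge of MATLAB programming and of their job submission system. At a minimum, you should know how to handle strings in MATLAB and understand the basic idea of how to obtain job information.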

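There are two job submission modes: local submission and remote submission. Which one applies depends on whether you submit ab initio calculations to the local machine where you run USPEX and MATLAB, or to a remote supercomputer.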

Step 1: Configure the files in the Submission/ folder

Case I: Local submission.

Please edit the following tag in the INPUT.txt file:

1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

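Then go to the Submission/ folder, where you need to edit two files: submitJob_local.m and checkStatus_local.m.

Detailed instructions can be found in these files. In general, you only need to tell USPEX how to submit a job and how to check whether the job has finished.

In submitJob_local.m: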

function jobNumber = submitJob_local()
%-------------------------------------------------------------
%This routine is to submit the job and return its job ID
%One needs to do a little edit based on your own situation.
%1   : whichCluster (default 0, 1: local submission, 2: remote submission)
%-------------------------------------------------------------

%Step 1: to prepare the job script that is required by your supercomputer
fp = fopen('myrun', 'w');
fprintf(fp, '#!/bin/sh\n');
fprintf(fp, '#PBS -l nodes=1:ppn=8,walltime=1:30:00 -q cfn_short\n');
fprintf(fp, '#PBS -N USPEX\n');
fprintf(fp, '#PBS -j oe\n');
fprintf(fp, '#PBS -V \n');
fprintf(fp, 'cd ${PBS_O_WORKDIR}\n');
fprintf(fp, 'mpirun -np 4 vasp1 > vasp.out\n');
fclose(fp);

%Step 2: to submit the job with a command like qsub, bsub, llsubmit, etc.

[a,b]=unix(['qsub myrun'])

%Step 3: to get the jobID from the screen message
%It will output some message on the screen like '2350873.nano.cfn.bnl.local'

end_marker = findstr(b,'.');
jobNumber = b(1:end_marker(1)-1);
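
Before editing submitJob_local.m, it can help to check the job ID parsing by hand. The sketch below repeats Step 3 in plain shell rather than MATLAB, assuming the scheduler prints a message of the form quoted in the listing. The function name extract_job_id is made up for illustration and is not part of USPEX.

```shell
#!/bin/sh
# Keep only the text before the first dot of the submission message.
# This mirrors b(1:end_marker(1)-1) in submitJob_local.m.
extract_job_id() {
    printf '%s\n' "${1%%.*}"
}

# Sample message taken from the comment in submitJob_local.m.
extract_job_id '2350873.nano.cfn.bnl.local'   # prints 2350873
```

If your scheduler prints a differently shaped message, both this pattern and the MATLAB lines in Step 3 need to be adapted.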

Case II: Remote submission.

Please edit the following tags in the INPUT.txt file:

2       : whichCluster (default 0, 1: local submission; 2: remote submission)
C-20GPa : remoteFolder

Finally, go to the Submission/ folder, where you need to edit two files: submitJob_remote.m and checkStatus_remote.m.


In submitJob_remote.m:

function jobNumber = submitJob_remote(USPEX, Index)
%-------------------------------------------------------------
%This routine is to submit the job to the remote machine and return its job ID
%2   : whichCluster (default 0, 1: local submission; 2: remote submission)
%C-20GPa : remoteFolder
%-------------------------------------------------------------

%-------------------------------------------------------------
%Step1: To prepare the job script, runvasp.sh
  fp = fopen('runvasp.sh', 'w');
  fprintf(fp, '#!/bin/sh\n');
  fprintf(fp, '#PBS -l nodes=2:ppn=2,walltime=1:30:00\n');
  fprintf(fp, '#PBS -N USPEX\n');
  fprintf(fp, '#PBS -j oe\n');
  fprintf(fp, '#PBS -V \n');
  fprintf(fp, 'cd ${PBS_O_WORKDIR}\n');
  fprintf(fp, '/usr/local/pkg/openmpi-1.4.5/bin/mpirun -np 4 vasp1 > vasp.out\n');
  fclose(fp);
%-------------------------------------------------------------------------------
%Step 2: Copy the files to the remote machine

%Step2-1: Specify the PATH to put your calculation folder
Home = ['/nfs/user08/qiazhu']; %'pwd' of your home directory on remote machine
Address = 'qiazhu@seawulf.stonybrook.edu'; %your target server: username@address
Path = [Home '/' USPEX '/CalcFold' num2str(Index)];  %Just keep it

%Step2-2: Create the remote directory
% Please change the ssh/scp command if necessary!
% Sometimes you don't need the -i option
try
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' mkdir ' USPEX ]);
catch
end

try
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' mkdir ' Path ]);
catch
end

%Step2-3: Copy the necessary files (for VASP calculations, we need POSCAR, INCAR, POTCAR,
% KPOINTS and job script)
unix(['scp -i ~/.ssh/seawulf POSCAR   ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf INCAR    ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf POTCAR   ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf KPOINTS  ' Address ':' Path]);
unix(['scp -i ~/.ssh/seawulf runvasp.sh ' Address ':' Path]);

%------------------------------------------------------------------------------
%Step 3: to submit the job and get JobID, i.e., the exact command to submit the job.
[a,v]=unix(['ssh -i ~/.ssh/seawulf ' Address ' /usr/local/pkg/torque/bin/qsub ' ...
           Path '/runvasp.sh'])

% format: Job 1587349.nagling is submitted to default queue <mono>
end_marker = findstr(v,'.');
if strfind(v,'error')
   jobNumber=0;
else
   jobNumber = v(1:end_marker(1)-1);
end
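
Since most problems with remote submission come from the ssh/scp commands themselves, it can be useful to try them by hand before running USPEX. The following sketch only prints the commands that the listing above would issue, without executing them. The host, key file, home directory and qsub path are the placeholders from the listing, the first argument is assumed to be the remoteFolder value, and the function name remote_commands is hypothetical.

```shell
#!/bin/sh
# Placeholders copied from submitJob_remote.m; replace them with your own values.
HOME_DIR='/nfs/user08/qiazhu'
ADDRESS='qiazhu@seawulf.stonybrook.edu'
KEY='~/.ssh/seawulf'

# Print (do not run) the commands for one calculation folder.
# $1: remote folder name (assumed to be remoteFolder), $2: folder index
remote_commands() {
    remote_path="$HOME_DIR/$1/CalcFold$2"
    echo "ssh -i $KEY $ADDRESS mkdir $1"
    echo "ssh -i $KEY $ADDRESS mkdir $remote_path"
    for f in POSCAR INCAR POTCAR KPOINTS runvasp.sh; do
        echo "scp -i $KEY $f $ADDRESS:$remote_path"
    done
    echo "ssh -i $KEY $ADDRESS /usr/local/pkg/torque/bin/qsub $remote_path/runvasp.sh"
}

remote_commands C-20GPa 1
```

Each printed line can be pasted into a terminal. If one of them asks for a password or fails, the MATLAB routine will most likely fail at the same step.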


In checkStatus_remote.m:

function doneOr = checkStatus_remote(jobID, USPEX, Folder)
%--------------------------------------------------------------------
%This routine is to check if the submitted job is complete or not
%One needs to do a little edit based on your own situation.
%--------------------------------------------------------------------

%Step1: Specify the PATH to put your calculation folder
Home = ['/nfs/user08/qiazhu']; %'pwd' of your home directory of your remote machine
Address = 'qiazhu@seawulf.stonybrook.edu';  %Your target: username@address.
Path = [Home '/' USPEX '/CalcFold' num2str(Folder)]; %just keep it
%Step2: Check JobID, the exact command to check job by jobID
[a,b]=unix(['ssh -i ~/.ssh/seawulf ' Address ' /path/to/qstat ' num2str(jobID)])
doneOr = 0; %default: the job is still running or queued
tempOr1 = strfind(b, 'R batch'); %job is running
tempOr2 = strfind(b, 'Q batch'); %job is queued
if isempty(tempOr1) && isempty(tempOr2)
   doneOr = 1;
   %For VASP, we usually need OSZICAR for reading the energy and CONTCAR for
   %reading the structure. OUTCAR, EIGENVAL and DOSCAR might be needed for
   %reading other properties; OUTCAR is not necessary by default.
   %unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/OUTCAR ./'])
   unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/OSZICAR ./']) %for reading enthalpy/energy
   unix(['scp -i ~/.ssh/seawulf ' Address ':' Path '/CONTCAR ./']) %for reading structural info
end
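
The status test above can also be tried outside MATLAB. The sketch below applies the same rule in shell to a piece of qstat output: the job counts as unfinished while the text contains 'R batch' or 'Q batch'. The function name job_is_done and the sample line are hypothetical, and 'batch' is simply the queue name used in the listing; yours may differ.

```shell
#!/bin/sh
# Return success (0) if the qstat text shows the job neither running nor queued.
job_is_done() {
    case "$1" in
        *'R batch'*|*'Q batch'*) return 1 ;;  # still running or queued
        *) return 0 ;;                        # not listed as R or Q: treat as done
    esac
}

# Hypothetical qstat line, for illustration only.
sample='1587349.nagling  USPEX  qiazhu  00:10:02 R batch'
if job_is_done "$sample"; then
    echo "done"
else
    echo "still in queue"
fi
```

Note that under this rule any output without the two patterns counts as finished. This includes the output of a failed ssh call, so it is worth checking the screen messages during the first test runs.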

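It may take some time to configure these files correctly. To test whether everything works, you can run the "USPEX -r" command twice and monitor the screen output. The first run checks whether the job is submitted, and the second checks whether USPEX can correctly detect the status of the submitted job. All relevant information can be found in the screen output. If MATLAB reports no errors, you are ready to go.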

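Step 2: Run USPEX periodically

The actual calculation is started with the command "USPEX -r > log". Each time it is called, MATLAB checks the status of the running ab initio calculations. If a job is complete, MATLAB goes to the calculation folder to read the results and then submits new calculations. After that, MATLAB exits. The command therefore needs to be invoked periodically (for example, every 5 minutes). This periodic execution can be set up using either crontab or a shell script.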

Crontab

This can be done using the crontab daemon on your Linux machine. Your home directory should now contain the following files:

~/call_job
~/CronTab

Here is an example of a 1-line CronTab file from our cluster:

*/5 * * * * sh call_job

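It states that the interval between successive invocations is 5 minutes and points to the file call_job, which should contain the address of the directory where USPEX will be executed. The call_job file looks roughly like this: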

#!/bin/sh
source $HOME/.bashrc
cd /ExecutionDirectory
date >> log
USPEX -r >> log

To activate crontab, type

crontab ~/CronTab

If you want to terminate the run, either edit the call_job file or remove the crontab by typing

crontab -r

To check that crontab works correctly, you should monitor the updates of the log file at the start of the calculation.
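
With a fixed 5-minute interval, a new "USPEX -r" call may start while the previous one is still running. Whether USPEX guards against this itself is not stated here, so the following is only an optional sketch of a call_job variant that skips a cycle in that case. The lock directory name and the function run_locked are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock location; mkdir is used because creating a directory is atomic.
LOCKDIR=${LOCKDIR:-./uspex.lock}

# Run the given command only if no other instance holds the lock.
run_locked() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        "$@"                  # e.g. USPEX -r
        status=$?
        rmdir "$LOCKDIR"      # release the lock when the command returns
        return $status
    else
        echo "previous run still active, skipping"
        return 0
    fi
}
```

In call_job, the line "USPEX -r >> log" would then become "run_locked USPEX -r >> log". If a run is killed abnormally, the lock directory has to be removed by hand.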

Shell script

You can also prepare the script using the sleep command in the Linux shell. Below is a very simple script, run-uspex.sh:

#!/bin/sh
while [ ! -f ./USPEX_IS_DONE ]; do
   date >> log
   USPEX -r >> log
   sleep 300
done

Note: keep in mind that this calculation can be stopped by killing the process ID of this script.
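
To make the process ID easy to find later, it can be recorded when the script is started. The sketch below is one possible way to do that, assuming the loop is launched in the background with nohup. The file name run-uspex.pid and the two helper functions are hypothetical.

```shell
#!/bin/sh
# Hypothetical file in which the process ID is stored.
PIDFILE=${PIDFILE:-run-uspex.pid}

# Start a command in the background and remember its process ID.
start_loop() {
    nohup "$@" > /dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# Stop the command started by start_loop and remove the stored ID.
stop_loop() {
    kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
}

# Usage:
#   start_loop sh run-uspex.sh
#   stop_loop
```

A "USPEX -r" call that is in progress when the loop is killed may keep running as a separate process, and jobs already submitted to the queue are not affected.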