CosmoMC set up on a cluster BUT systematically runs on 1 node only instead of the N requested
-
- Posts: 19
- Joined: November 05 2009
- Affiliation: University of Cape Town
Hi,
I managed to compile CosmoMC (2009 version) on a fairly powerful SUN cluster.
The WMAP likelihood and CAMB (OpenMP) are compiled with ifort.
CosmoMC is compiled with mpif90 (mpif90 pointing to ifort).
When the job is submitted with e.g. mpirun -np 4 ... in the submission script,
I get my 4 chains generated, BUT all 4 chains systematically run on a single node instead of 4 (1 per node), even though my submission script requests 4 nodes (each with 8 CPUs).
In params.ini I set num_threads = 8. The threading of CAMB seems OK (as seen when logging into the single host node). The cluster uses Moab/Torque for job submission.
Could anyone suggest a way out?
Thanks,
Patrice
-
- Posts: 19
- Joined: November 05 2009
- Affiliation: University of Cape Town
I made a mistake in my submission script: the
mpirun -np $nproc -hostfile $PBS_NODEFILE
portion was improperly written; my flags were badly positioned. In short, if you run into this problem, first make sure your submission script is correct.
Patrice
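For reference, the logic behind that fixed launch line can be sketched as below. This is only a sketch: outside a real Torque/Moab job there is no $PBS_NODEFILE, so it is simulated here with a temporary file, and the node names and 4-node count are made up.

```shell
#!/bin/sh
# Simulate the nodefile Torque would provide: one hostname per MPI slot
# (here: 4 nodes, 1 slot each -- hypothetical values).
PBS_NODEFILE=$(mktemp)
printf 'node01\nnode02\nnode03\nnode04\n' > "$PBS_NODEFILE"

# One MPI process (one chain) per line in the nodefile:
nproc=$(grep -c . "$PBS_NODEFILE")
echo "nproc=$nproc"

# The corrected launch line, printed rather than executed here,
# since the mpirun and cosmomc paths are site-specific:
echo "mpirun -np $nproc -hostfile $PBS_NODEFILE ./cosmomc params.ini"

rm -f "$PBS_NODEFILE"
```

The key point is that -np and -hostfile travel together: -np says how many processes to start, and -hostfile tells mpirun which hosts may receive them.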
-
- Posts: 14
- Joined: October 28 2008
- Affiliation: tt
Hi,
I have the same problem.
This is the job script I submit:
#!/bin/bash
#PBS -j eo
#PBS -l select=8:ncpus=8:mpiprocs=8
#PBS -N test
cd $PBS_O_WORKDIR/cosmomc_p
pwd
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/local/lib
mpirun -np 8 ./cosmomc params_ep.ini
Indeed, I get 8 chains, but they run on 1 CPU and not on 8.
Has anyone found a solution?
Thanks in advance.
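A likely mismatch worth checking: with select=8:ncpus=8:mpiprocs=8, PBS reserves 8 MPI slots on each of 8 nodes (64 slots), while mpirun -np 8 with no hostfile may leave an MPI build that is not scheduler-aware unable to see the remote nodes, so all ranks land on the first one. The sketch below only illustrates the slot arithmetic, simulating the nodefile that such a request would produce:

```shell
#!/bin/sh
# Simulate the nodefile for select=8:ncpus=8:mpiprocs=8:
# each of 8 nodes appears once per MPI slot (8 times).
PBS_NODEFILE=$(mktemp)
for n in 1 2 3 4 5 6 7 8; do
  for s in 1 2 3 4 5 6 7 8; do
    echo "node0$n" >> "$PBS_NODEFILE"
  done
done

slots=$(grep -c . "$PBS_NODEFILE")
echo "slots=$slots"   # 64 slots reserved by the scheduler

# Passing the nodefile explicitly lets mpirun place ranks on all nodes
# (printed, not executed -- paths are site-specific):
echo "mpirun -np 8 -hostfile $PBS_NODEFILE ./cosmomc params_ep.ini"

rm -f "$PBS_NODEFILE"
```

Whether mpirun picks up the PBS allocation automatically depends on how the MPI library was built (e.g. Open MPI with Torque support does, a plain build does not), so passing -hostfile explicitly is the safe option.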
-
- Posts: 19
- Joined: November 05 2009
- Affiliation: University of Cape Town
Hi,
An important detail: on our cluster I run mine over several nodes (2 to 4).
Each node has 8 CPUs, which does not seem to be exactly the amount of resources you are requesting. Anyway,
here are the complete lines that fixed my problem.
Below,
/export/home/pokouma/openmpi/bin/mpirun
is just where my mpirun is located, and
/export/home/pokouma/scratch/mcmc_after_brighton_hack_2p/cosmomc params.ini
is my complete call to ./cosmomc params.ini, explicitly telling the cluster where each one is located.
NOTE the importance of these lines in my submission (Moab) script:
nproc=`cat $PBS_NODEFILE | wc -l`
cat $PBS_NODEFILE
AND the
-np $nproc -hostfile $PBS_NODEFILE
flags.
##### The part below is the one appearing in my Moab submission script ######
nproc=`cat $PBS_NODEFILE | wc -l`
cat $PBS_NODEFILE
/export/home/pokouma/openmpi/bin/mpirun -np $nproc -hostfile $PBS_NODEFILE /export/home/pokouma/scratch/mcmc_after_brighton_hack_2p/cosmomc params.ini
#############
For me, the cluster automatically takes nproc = (number of nodes) x (processes per node requested) as the number of chains to generate.
The number of threads for (OpenMP) CAMB is set in the params.ini file.
Hope this helps.
Peace,
Lumumba
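The resource arithmetic described above can be sketched as follows. The numbers are hypothetical, not taken from any poster's actual job; the point is that the chain count comes from the scheduler request, while num_threads in params.ini decides how many CPUs each chain uses on its node.

```shell
#!/bin/sh
# Hypothetical request: 4 nodes, 1 MPI process (chain) per node,
# with 8 OpenMP threads per chain on 8-CPU nodes.
nodes=4
procs_per_node=1
num_threads=8        # the num_threads value in params.ini
cpus_per_node=8

# Number of chains = nodes x processes per node:
chains=$((nodes * procs_per_node))
echo "chains=$chains"

# Sanity check: threads per node should not exceed the node's CPUs.
used=$((procs_per_node * num_threads))
echo "cpus_used_per_node=$used of $cpus_per_node"
```

If procs_per_node were raised to 8 while keeping num_threads = 8, each node would be asked for 64 threads on 8 CPUs, which would again make every chain crawl even though the placement is correct.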