CosmoMC set up on a cluster BUT systematically runs on 1 node only instead of the N requested

Patrice Okouma
Posts: 19
Joined: November 05 2009
Affiliation: University of Cape Town

Post by Patrice Okouma » July 04 2010

Hi,
I managed to compile the 2009 version of CosmoMC on a fairly powerful Sun cluster.
The WMAP likelihood and CAMB (OpenMP) were compiled with ifort.
CosmoMC was compiled with mpif90 (with mpif90 "pointing" to ifort).

When the job is submitted with e.g. mpirun -np 4 ... in the submission script,
I get my 4 chains, BUT all 4 chains systematically run on a single node instead of 4 (1 per node), even though my submission script requests 4 nodes (each with 8 CPUs).

In params.ini, I set num_threads = 8. The threading of CAMB seems OK (as seen when logging into the only host node). The cluster uses Moab/Torque for job submission.

Could anyone suggest a way out?

Thanks,
Patrice
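
A quick way to confirm where the chains actually land, without logging into each node, is to tally the hostnames reported by the ranks. The sketch below assumes the launcher can run an ordinary command such as hostname on every rank; count_ranks_per_host and the node names are made up for illustration, and the sample input only mimics the symptom described above.

```shell
#!/bin/bash
# Tally how many MPI ranks ran on each host.
# Intended input: one hostname per line, e.g. the output of
#   mpirun -np 4 hostname
# (count_ranks_per_host is a made-up helper name for this sketch.)
count_ranks_per_host() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Stand-in for the symptom described above: all 4 ranks on one node.
printf 'node01\nnode01\nnode01\nnode01\n' | count_ranks_per_host
# prints: node01 4
```

If the ranks were spread as intended, the tally would instead list 4 different hosts with a count of 1 each.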

Post by Patrice Okouma » July 08 2010

I made a mistake in my submission script. The "mpirun -np $nproc -hostfile $PBS_NODEFILE"
portion was improperly written in my script: my flags were badly positioned. To be clear, if you run into such a problem, make sure that your submission script is OK.

Patrice
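
One possible way to check a submission script before launching anything is a small pre-flight test on the node file. The sketch below assumes a Torque-style node file with one hostname per allocated slot; check_allocation and the node names are hypothetical and are not part of Torque, Moab or CosmoMC.

```shell
#!/bin/bash
# Pre-flight check for a Torque-style node file (one hostname per allocated slot).
# check_allocation is a hypothetical helper, not part of Torque, Moab or CosmoMC.
check_allocation() {
    nodefile=$1          # e.g. "$PBS_NODEFILE" inside a job
    expected_nodes=$2    # number of distinct nodes the job asked for
    if [ ! -s "$nodefile" ]; then
        echo "node file missing or empty: $nodefile" >&2
        return 1
    fi
    slots=$(grep -c '' "$nodefile")              # total lines = total slots
    nodes=$(sort -u "$nodefile" | grep -c '')    # distinct hostnames
    if [ "$nodes" -ne "$expected_nodes" ]; then
        echo "expected $expected_nodes nodes, got $nodes" >&2
        return 1
    fi
    echo "ok: $nodes nodes, $slots slots"
}

# Demo with a fake node file standing in for $PBS_NODEFILE (2 nodes x 2 slots).
fake_nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$fake_nodefile"
check_allocation "$fake_nodefile" 2
# prints: ok: 2 nodes, 4 slots
rm -f "$fake_nodefile"
```

Inside a real job the call would be something like check_allocation "$PBS_NODEFILE" 4, placed before the mpirun line so that a bad allocation stops the script early.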

Pierre Delsart
Posts: 14
Joined: October 28 2008
Affiliation: tt

Post by Pierre Delsart » July 26 2010

Hi,
I have the same problem.
When I run my job I submit:

#!/bin/bash
#PBS -j eo
#PBS -l select=8:ncpus=8:mpiprocs=8
#PBS -N test

cd $PBS_O_WORKDIR/cosmomc_p
pwd

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/local/lib

mpirun -np 8 ./cosmomc params_ep.ini


Indeed, I get 8 chains, but running on 1 CPU and not on 8.

Does anyone have a solution?

Thanks in advance.
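
The resource request in the script above can be checked with simple arithmetic. Assuming PBS Pro semantics, where select=N:...:mpiprocs=M gives N chunks with M node-file entries each, the job receives N*M MPI slots. A launcher that fills one host's slots before moving to the next (the default in many Open MPI versions, which may or may not be the MPI used here) could then fit all 8 ranks of "mpirun -np 8" on the first host. select_slots is a hypothetical helper written only for this illustration.

```shell
#!/bin/bash
# Arithmetic sketch: MPI slots implied by a PBS Pro style "select" request,
# assuming slots = chunks * mpiprocs (mpiprocs taken as 1 when absent).
# select_slots is a hypothetical helper for illustration only.
select_slots() {
    spec=$1
    chunks=$(printf '%s\n' "$spec" | sed -n 's/^select=\([0-9][0-9]*\).*/\1/p')
    mpiprocs=$(printf '%s\n' "$spec" | sed -n 's/.*mpiprocs=\([0-9][0-9]*\).*/\1/p')
    echo $(( ${chunks:-0} * ${mpiprocs:-1} ))
}

select_slots "select=8:ncpus=8:mpiprocs=8"   # prints 64: 8 slots in each of 8 chunks
select_slots "select=8:ncpus=8:mpiprocs=1"   # prints 8: 1 slot in each of 8 chunks
```

With the second form each chunk contributes a single slot, so 8 ranks would need 8 chunks. Whether those chunks land on 8 separate hosts depends on the node size and the site's placement policy, which are not known here.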

Post by Patrice Okouma » July 27 2010

Hi,
An important detail: on our cluster, I run my jobs over several nodes (2 to 4).
Each node has 8 CPUs, which does not seem to be exactly the same amount of resources you are requesting. Anyway,
here are the complete lines that fixed my problem.
Below,

/export/home/pokouma/openmpi/bin/mpirun
is just where my mpirun is located, and

/export/home/pokouma/scratch/mcmc_after_brighton_hack_2p/cosmomc params.ini

is my complete call for ./cosmomc params.ini, explicitly telling the cluster where each one is located.

NOTE the importance of these calls in my submission (Moab) script:
nproc=`cat $PBS_NODEFILE | wc -l`

cat $PBS_NODEFILE

AND the

-np $nproc -hostfile $PBS_NODEFILE

flags

##### The part below is the one appearing in my Moab submission script ######

nproc=`cat $PBS_NODEFILE | wc -l`
cat $PBS_NODEFILE

/export/home/pokouma/openmpi/bin/mpirun -np $nproc -hostfile $PBS_NODEFILE /export/home/pokouma/scratch/mcmc_after_brighton_hack_2p/cosmomc params.ini

#############

For me, nproc (the number of lines in $PBS_NODEFILE, i.e. number of nodes * number of processes per node requested) is automatically taken as the number of chains to generate.

The number of threads for CAMB (OpenMP) is set in the params.ini file.
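
If the aim is the one from the first post, one chain per node with CAMB's OpenMP threads using that node's CPUs, a possible variation is to launch one rank per distinct host rather than one per node-file line. The sketch below assumes Open MPI's hostfile syntax ("hostname slots=1") and a Torque-style node file. make_one_per_node_hostfile, the file names and the node names are hypothetical, and the mpirun line is only echoed, not run.

```shell
#!/bin/bash
# Sketch: build a host file with one slot per distinct node from a
# Torque-style node file (one line per allocated slot).
# make_one_per_node_hostfile is a hypothetical helper name.
make_one_per_node_hostfile() {
    nodefile=$1
    hostfile=$2
    # "slots=1" is Open MPI hostfile syntax: at most one rank per host
    sort -u "$nodefile" | sed 's/$/ slots=1/' > "$hostfile"
    # number of distinct nodes = number of chains to launch
    grep -c '' "$hostfile"
}

# Demo with a fake node file standing in for $PBS_NODEFILE (2 nodes x 8 slots).
fake_nodefile=$(mktemp)
for i in 1 2 3 4 5 6 7 8; do echo node01; echo node02; done > "$fake_nodefile"
nchains=$(make_one_per_node_hostfile "$fake_nodefile" "$fake_nodefile.hosts")
# Dry run: show the launch line instead of executing it.
echo "mpirun -np $nchains -hostfile $fake_nodefile.hosts ./cosmomc params.ini"
rm -f "$fake_nodefile" "$fake_nodefile.hosts"
```

With num_threads = 8 in params.ini, each chain would then in principle have its node's 8 CPUs available for CAMB, instead of sharing them with other chains on the same node.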

Hope this helps
Peace,
Lumumba
