CAMB: optimizing threads / xlf95

Use of Healpix, camb, CLASS, cosmomc, compilers, etc.
Richard Easther
Posts: 16
Joined: April 14 2005
Affiliation: Yale

CAMB: optimizing threads / xlf95

Post by Richard Easther » April 14 2005

I am trying out CAMB on a dual G5 Power Mac and on an Opteron-based cluster. At the same time I am getting to grips with a new compiler (xlf95 on the Mac) and working out its optimization flags.

In particular, since both the cluster and the desktop have two processors per node, does anyone have any guidance as to how well CAMB makes use of threading? Simply setting the thread number to 2 in params.ini makes no appreciable difference to the run time -- are there flags I should also be setting at compile time?

Also, has anyone tried to use MPI to parallelize this more aggressively in a clustered environment?

Many thanks, and sorry for the naive questions!

Richard

Antony Lewis
Posts: 1369
Joined: September 23 2004
Affiliation: University of Sussex

Re: CAMB: optimizing threads / xlf95

Post by Antony Lewis » April 14 2005

Most compilers that support OpenMP have a compiler option like -openmp to turn it on at compile time, though I don't know anything about that particular compiler.

The thread-number setting may not change the speed if the code is already using both processors automatically. Using OpenMP on 2 processors should give very close to a factor of 2 speed-up compared with using just one CPU.
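One way to see why a well-parallelized code gets close to, but not exactly, a factor of 2 on 2 CPUs is Amdahl's law: if a fraction p of the run time parallelizes perfectly, the ideal speed-up on n threads is 1/((1 - p) + p/n). The sketch below only evaluates that formula; the parallel fractions it uses are illustrative assumptions, not measured values for CAMB.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Ideal speed-up on n_threads when parallel_fraction of the serial run
    time parallelizes perfectly and the rest stays serial (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Illustrative parallel fractions only; these are not measured CAMB numbers.
    for p in (0.90, 0.98, 0.999):
        print(f"p = {p}: speed-up on 2 CPUs = {amdahl_speedup(p, 2):.3f}")
```

With p = 0.98 the 2-CPU speed-up is about 1.96, so "very close to a factor of 2" corresponds to only a few per cent of the run time being serial.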

CAMB doesn't support MPI internally (CMBFAST does, I think). But usually you want to use MPI to run multiple CAMB instances rather than to run one instance faster: for example, in cosmomc each node is OpenMP-parallelized over the CPUs in that node, while the chains are parallelized with MPI, so each node runs a separate CAMB instance in parallel.
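A rough way to see why several independent instances usually beat one faster instance: under the same Amdahl's-law assumption (a hypothetical parallel fraction p for one CAMB call), compare the total number of evaluations per unit time when a fixed pool of CPUs is split into instances of t threads each. This sketch ignores communication between chains, memory limits and MCMC burn-in, so it is only a throughput argument; the function names and numbers are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's-law speed-up of one instance running on n_threads."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


def evaluations_per_unit_time(total_cpus: int, threads_per_instance: int,
                              parallel_fraction: float,
                              serial_eval_time: float = 1.0) -> float:
    """Total throughput when total_cpus are divided into independent
    instances (e.g. one per MPI rank), each using threads_per_instance threads."""
    if threads_per_instance < 1 or total_cpus % threads_per_instance != 0:
        raise ValueError("threads_per_instance must divide total_cpus")
    n_instances = total_cpus // threads_per_instance
    # Each instance finishes one evaluation in the serial time divided by its speed-up.
    time_per_eval = serial_eval_time / amdahl_speedup(parallel_fraction,
                                                      threads_per_instance)
    return n_instances / time_per_eval


if __name__ == "__main__":
    # Hypothetical: 8 CPUs and a 95% parallel fraction per evaluation.
    for threads in (1, 2, 4, 8):
        rate = evaluations_per_unit_time(8, threads, 0.95)
        print(f"{8 // threads} instance(s) x {threads} thread(s): "
              f"{rate:.2f} evals per unit time")
```

With these made-up numbers, 8 single-threaded instances deliver about 8 evaluations per unit time versus about 5.9 for one 8-thread instance, and the gap widens as p drops.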

Richard Easther
Posts: 16
Joined: April 14 2005
Affiliation: Yale

CAMB: optimizing threads / xlf95

Post by Richard Easther » April 14 2005

I will fiddle around and see what happens. However, I could imagine running M chains on N CPUs, with each chain spread over N/M CPUs.

Running the chains is embarrassingly parallel, since each one runs almost independently of the others. However, there is still considerable scope for speeding things up by running each instance of CAMB across multiple CPUs, although at some point you will hit diminishing returns, when messaging overhead and the synchronization needed to avoid race conditions in the code start to slow you down. Still, I can certainly arrange matters so that N >> M :-)
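The diminishing returns described above can be illustrated with a toy cost model, which is an assumption and not something measured from CAMB: the time for one evaluation on n CPUs is a serial part, plus a parallel part divided by n, plus an overhead that grows linearly with n to stand in for messaging and synchronization. The sketch picks the per-chain CPU count that minimizes that time when N CPUs are shared among M chains; all cost parameters and function names are hypothetical.

```python
def eval_time(n_cpus: int, serial: float = 0.05, parallel: float = 0.95,
              overhead_per_cpu: float = 0.01) -> float:
    """Toy model of one evaluation's wall time on n_cpus: perfectly
    parallel work shrinks as 1/n, while overhead grows linearly in n."""
    return serial + parallel / n_cpus + overhead_per_cpu * (n_cpus - 1)


def cpus_per_chain(total_cpus: int, n_chains: int, **cost) -> int:
    """Choose how many CPUs each of n_chains should use, allowing at most
    total_cpus // n_chains each (the N/M split), by minimizing eval_time."""
    if n_chains < 1 or total_cpus < n_chains:
        raise ValueError("need at least one CPU per chain")
    max_per_chain = total_cpus // n_chains
    return min(range(1, max_per_chain + 1), key=lambda n: eval_time(n, **cost))


if __name__ == "__main__":
    # Hypothetical cluster: 64 CPUs shared by 4 chains (N >> M).
    best = cpus_per_chain(64, 4)
    print(f"use {best} of the {64 // 4} CPUs available per chain")
```

With these made-up costs the continuous optimum is sqrt(parallel / overhead_per_cpu), about 9.7, so the search returns 10 of the 16 CPUs available per chain; past that point, extra CPUs would be better spent on additional chains.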

Colin Hill
Posts: 1
Joined: May 13 2010
Affiliation: Princeton University

CAMB: optimizing threads / xlf95

Post by Colin Hill » May 13 2010

Sorry to dredge up such an old thread, but has anyone attempted to parallelize CAMB using MPI since this was posted? I've been thinking about trying it, but thought I'd check whether others have tried already. Thanks!
