python loop over cells for DG0 elements hangs in parallel

Bug #921472 reported by Marco Morandini
Affects: DOLFIN
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

This is a follow-up of https://answers.launchpad.net/dolfin/+question/185196

The following code works fine in parallel (4 processors, mpirun -np 4) for nx = 13, but on my PC it hangs as soon as I increase it to nx = 14.

As noted by Johan Hake in his first answer, the code does not hang if the cells are distributed equally among the processes, but it does hang when they are not.
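
A quick way to check how the partitioner actually distributes the cells, built from the same calls that appear in the snippets below (this is just a diagnostic, not part of the failing script); run under mpirun it prints one line per process:

#-----------------
from dolfin import *

mesh = UnitCube(14, 10, 10)
# num_cells() returns the number of cells owned by this process, so an
# uneven partition shows up directly in the per-process output.
print "rank", cpp.MPI_process_number(), "local cells", mesh.num_cells()
#-----------------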

I understand that what I'm doing below should be discouraged; however, I see no reason why it should not work.

#-------------
from dolfin import *

nx = 13
mesh = UnitCube(nx, 10, 10)

DG = FunctionSpace(mesh, "DG", 0)
E = Function(DG)
#print cells(mesh)
i = 0
global_idx = mesh.parallel_data().global_entity_indices(3)
#print mesh
#print global_idx
print "Number of cells", mesh.num_cells()
for c in cells(mesh):
    i = i + 1
    #E.vector()[c.index()] = 1./c.volume()
    if c.index() > mesh.num_cells():
        print "XXXXX", c.index()
        break
    E.vector()[global_idx.array()[c.index()]] = 1./c.volume()
print i
V = VectorFunctionSpace(mesh, "CG", 1)
print "Done VF"
#-------------

Marco Morandini (morandini) wrote:

I fear that nothing guarantees that the global vector index corresponds to the global cell index, even for a scalar function and constant DG elements, as here. For this reason the "correct" test (which still hangs) should be the following (please correct me if I'm wrong):

#-------------------
from dolfin import *
import numpy

nx = 14
mesh = UnitCube(nx, 10, 10)

DG = FunctionSpace(mesh, "DG", 0)
E = Function(DG)
v = E.vector()
global_idx = mesh.parallel_data().global_entity_indices(3)
dof_map = E.function_space().dofmap()
print "Number of cells", mesh.num_cells()
indices = numpy.array([0], dtype=numpy.uintc)
i = 0
for c in cells(mesh):
    dof_map.tabulate_dofs(indices, c)
    if not v.owns_index(indices[0]):
        print "WTF", c.index(), indices[0]
        break
    v[indices[0]] = 1./c.volume()
    i = i + 1
    print cpp.MPI_process_number(), i
print "Done VF"
#-------------------

I think that the problem with the above code is that every time

v[indices[0]] = 1./c.volume()

is called, VecAssemblyBegin is called as well. So, if the assignment is not executed the same number of times by all processes, some processes end up waiting to exit while others are blocked in a PMPI_Allreduce() call that can never return, because the remaining processes are already past the loop.
If I add a v.apply("replace") at the end, then all the processes do exit the loop but, for the same reason as before, the program still hangs; this seems to support the explanation.
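
For illustration, the same mismatch can be reproduced outside DOLFIN. The sketch below is only a sketch and assumes mpi4py is available (mpi4py is not used anywhere else in this report); rank 0 issues one extra collective reduction, standing in for the extra VecAssemblyBegin, and the job hangs when run on two or more processes:

#-----------------
from mpi4py import MPI     # assumption: mpi4py installed; not part of the original scripts

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Pretend rank 0 owns one more cell than the other ranks, so it issues one
# extra collective call that the others never match.
n_local = 3 if rank == 0 else 2

for i in range(n_local):
    comm.allreduce(i)      # collective: must be reached the same number of times on every rank

# Ranks other than 0 reach this print; rank 0 stays blocked in its extra
# allreduce, so the parallel job as a whole never finishes.
print "rank", rank, "done"
#-----------------
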
The following code seems to work for me; I'll let you decide whether this bug should simply be closed as invalid, or whether it suggests a redesign of the Python interface.

#-----------------
from dolfin import *
import numpy

nx = 14
mesh = UnitCube(nx, 10, 10)

DG = FunctionSpace(mesh, "DG", 0)
E = Function(DG)
v = E.vector()
global_idx = mesh.parallel_data().global_entity_indices(3)
dof_map = E.function_space().dofmap()
print "Number of cells", mesh.num_cells()
indices = numpy.array([0], dtype=numpy.uintc)
i = 0
a = numpy.zeros(v.local_size())
(n0, n1) = v.local_range()
print n0, n1
for c in cells(mesh):
    dof_map.tabulate_dofs(indices, c)
    if not v.owns_index(indices[0]):
        print "WTF", c.index(), indices[0]
        break
    # indices[0] is a global dof index; subtracting the start of the
    # owned range n0 turns it into a local index into the buffer.
    a[indices[0] - n0] = 1./c.volume()
# Copy the whole locally owned block into the vector in one call,
# instead of one communicating assignment per cell.
v.set_local(a)
print "Done VF"
#-----------------

Garth Wells (garth-wells) wrote:

I think that the right solution to this is to allow indexing of local entries only (using local indices), and to insist on using 'set' for possibly off-process terms. This would mirror what the PETSc and Epetra backends do.
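
For reference, a rough sketch of the usage pattern this would encourage, using only dolfin calls that already appear above; the restricted local-only indexing itself is not shown, and the "insert" mode string passed to apply() is an assumption:

#-----------------
from dolfin import *
import numpy

mesh = UnitCube(14, 10, 10)
E = Function(FunctionSpace(mesh, "DG", 0))
v = E.vector()

# Owned entries: work against the owned range [n0, n1) with local indices
# and push the whole block in one set_local() call.
(n0, n1) = v.local_range()
values = numpy.zeros(n1 - n0)
# ... fill values[i] for local indices i in [0, n1 - n0) ...
v.set_local(values)

# Finishing the assembly is collective, so every process must reach this
# call regardless of how many entries it wrote (mode string assumed).
v.apply("insert")
#-----------------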

Changed in dolfin:
status: New → Confirmed
importance: Undecided → High
Anders Logg (logg) wrote:

Agree.

--
Anders

