JIT can fail silently - seems to happen when there is a problem in dolfin.pc

Bug #892714 reported by Garth Wells
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
DOLFIN | Invalid | Critical | Unassigned |

Bug Description

I have seen on a number of occasions that JIT can fail silently and the program just ends. Since there is no output, it's hard to say what is happening. It could be when there is a 'bug' in dolfin.pc.

Is JIT compilation checked for success? It should be.
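
A minimal sketch of the kind of check being asked for, assuming the JIT build step is launched as an external command. The helper name `run_build_step` and the stand-in command are hypothetical and are not DOLFIN's actual JIT code; the sketch only shows a non-zero exit status being turned into an exception that carries the captured error output. It is written for Python 3.7 or later.

```python
import subprocess
import sys


def run_build_step(cmd):
    """Run an external build command and raise instead of failing silently.

    `run_build_step` is a hypothetical helper, not part of DOLFIN.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the exit status, the command, and whatever the tool wrote to stderr.
        raise RuntimeError(
            "Build step failed (exit status %d): %s\n%s"
            % (result.returncode, " ".join(cmd), result.stderr)
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a compiler invocation: a child Python that exits with status 3.
    try:
        run_build_step([sys.executable, "-c", "import sys; sys.exit(3)"])
    except RuntimeError as exc:
        print(exc)
```

A check like this only catches failures of a separate process. A crash inside the Python process itself would still need to be detected from outside.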

Changed in dolfin:
milestone: none → 1.0-rc1
Changed in dolfin:
importance: Undecided → Critical
Anders Logg (logg) wrote :

Can you provide an example? It's hard to debug otherwise and it's blocking 1.0-rc1.

Garth Wells (garth-wells) wrote :

On 20 November 2011 18:03, Anders Logg <email address hidden> wrote:
> Can you provide an example?

No, which is the problem. I get it on a particular machine, and it
seems to be related to when I link libraries, like PETSc and Trilinos,
to shared libs which are not in default locations. C++ is fine.

Do we have a test for JIT that does not involve FFC? I suspect that
our rubbish dolfin.pc is part of the problem.

Garth

--
Garth N. Wells
Department of Engineering, University of Cambridge
http://www.eng.cam.ac.uk/~gnw20

Anders Logg (logg) wrote :

> Do we have a test for JIT that does not involve FFC? I suspect that
> our rubbish dolfin.pc is part of the problem.

You can either JIT-compile an Expression which does not involve FFC,
or you can set the form compiler to SFC and see if that changes
anything.

--
Anders

Garth Wells (garth-wells) wrote :

> You can either JIT-compile an Expression which does not involve FFC,
> or you can set the form compiler to SFC and see if that changes
> anything.
>

Expressions seem to be fine. It's going to require some deep digging.

Garth

Garth Wells (garth-wells) wrote :

After some digging, it's numpy that fails silently. It was failing
when FIAT tried to solve a linear system. I can reproduce the failure
with

    # Failing case: dolfin is imported before numpy.
    from dolfin import *
    import numpy
    A = numpy.array([[3,1], [1,2]])
    b = numpy.array([9,8])
    x = numpy.linalg.solve(A, b)  # the program ends here with no output
    print(x)

but it works if I switch the import order,

    # Working case: numpy is imported before dolfin.
    import numpy
    from dolfin import *
    A = numpy.array([[3,1], [1,2]])
    b = numpy.array([9,8])
    x = numpy.linalg.solve(A, b)
    print(x)

I suspect that it's related to numpy and DOLFIN linking to different
BLAS libraries.

Garth
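
One way to test the suspicion about different BLAS libraries is to list which BLAS/LAPACK shared objects are actually mapped into the running process. The sketch below is Linux-only and reads `/proc/self/maps`. The helper name `linear_algebra_libs` and the name pattern are assumptions made for illustration, and the pattern is certainly incomplete. Seeing two different BLAS or LAPACK files in the output after both imports would be consistent with a library clash, although it would not prove one.

```python
import os
import re

# Substrings that commonly appear in BLAS/LAPACK file names (assumed, incomplete list).
PATTERN = re.compile(r"blas|lapack|atlas|mkl", re.IGNORECASE)


def linear_algebra_libs(maps_text):
    """Return the sorted mapped files whose base name looks like BLAS/LAPACK."""
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # File-backed mappings carry a pathname as the sixth field.
        if len(fields) == 6 and PATTERN.search(os.path.basename(fields[5])):
            libs.add(fields[5])
    return sorted(libs)


if __name__ == "__main__":
    # Import the modules under suspicion before this point, in the order being tested.
    maps_path = "/proc/self/maps"  # exists on Linux only
    if os.path.exists(maps_path):
        with open(maps_path) as f:
            for lib in linear_algebra_libs(f.read()):
                print(lib)
```

Running it once for each import order and comparing the two lists shows whether the order changes which library gets loaded.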

Anders Logg (logg) wrote :

Both work for me. If you were a regular user, someone would say that
you have messed up your installation and mark this bug as invalid. ;-)

--
Anders

Martin Sandve Alnæs (martinal) wrote :

Both work for me as well.

Garth Wells (garth-wells) wrote :

On 21 November 2011 11:43, Martin Sandve Alnæs
<email address hidden> wrote:
> Both work for me as well.
>

It's related to using optimised BLAS rather than the slow versions
that are part of Ubuntu, which is why it's unlikely to surface unless
linking to libraries in non-default locations.

I think that resolving this will require a good understanding of how
Python modules link to underlying libraries.

Garth

Anders Logg (logg)
Changed in dolfin:
milestone: 1.0-rc1 → trunk
Garth Wells (garth-wells) wrote :

It turns out that this is a bug in ATLAS 3.9.x when ATLAS builds LAPACK: the dgels function segfaults, which leads to the silent failure.
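
A segfault in a native library can end the process without any Python traceback. A rough sketch of how such a crash could be made visible is to run the suspect snippet in a child interpreter and inspect how it ended. The helper name `run_isolated` is hypothetical, the example assumes a POSIX system and Python 3.5 or later, and the crashing snippet is a stand-in that sends SIGSEGV to itself rather than a real LAPACK call.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run `code` in a child interpreter and describe how it ended.

    `run_isolated` is a hypothetical helper for illustration only.
    """
    # -X faulthandler makes the child dump a Python traceback on fatal signals.
    result = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    rc = result.returncode
    if rc < 0:
        # On POSIX, a negative return code means the child was killed by signal -rc.
        return "killed by " + signal.Signals(-rc).name
    return "exited with status %d" % rc


if __name__ == "__main__":
    # Stand-in for a crashing native call: the child sends SIGSEGV to itself.
    snippet = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_isolated(snippet))
```

Replacing the stand-in snippet with the failing reproducer would show whether the process is being killed by a signal rather than exiting normally.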

Changed in dolfin:
status: New → Invalid