
lwimager major loop predicts incorrect model visibilities when using large model cubes #9

@o-smirnov

Description


This bug has been driving me mental for a month and caused me to tear out the last of my hair, but now I've localized it. Posting a report here for posterity. The bug comes from the CASA synthesis libs. I don't know if it's been fixed in CASA (and thus in the new https://github.com/radio-astro/casasynthesis package); we need to follow this up. (If it's not fixed in CASA then CHILES is basically fucked, since this affects the major loop of CLEAN.)

The bug manifested itself as follows. When using lwimager to predict visibilities into MODEL_DATA from a large image cube, for example:

lwimager fillmodel=1 operation=csclean wprojplanes=0 field=0 spwid=0 stokes=I padding=1 cachesize=4096 niter=0 nchan=116 chanstart=16 chanstep=1 npix=6144 cellsize=0.06arcsec img_chanstart=16 mode=channel ms=CYG-C-1scan-testpaste.MS model=model3-0-14.img fixed=1

I was getting different results on different machines (MODEL_DATA visibilities with a 10-20% difference!). Funnily enough, visibilities in the higher channels were always identical; it was only the lower-numbered channels that diverged.

I haven't pinpointed the bug but through experiments I've localized its behaviour. Somewhere in https://github.com/casacore/casarest/blob/master/synthesis/MeasurementEquations/CubeSkyEquation.cc is some funky code for deciding if an image cube is "too big" w.r.t. available RAM. If it is "too big", the predict() loop iterates over chunks of channels, picking a chunk size based on RAM size. If it's not too big, predict() processes the whole cube at once.

It looks like only the last chunk is predicted properly; preceding chunks manifest the above-mentioned error. The magnitude of the error depends on the chunk size. So on a big-RAM machine, where the whole cube is processed in one go, the visibilities are correct. On smaller-RAM machines, as soon as chunking kicks in, the lower channels get predicted incorrectly. The exact numbers depend on RAM size :))) so I was getting crazy inconsistencies between different machines.
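To make the chunking behaviour concrete, here is an illustrative sketch, NOT the actual CubeSkyEquation code: it just shows the kind of RAM-based decision described above, with a hypothetical memory budget and cube sizes taken from the command above. All names and numbers in it are assumptions for illustration.

```shell
# Illustrative sketch only -- NOT the actual CubeSkyEquation logic.
# If the cube is "too big" for the RAM budget, the channel range gets
# split into chunks sized to fit memory; otherwise it's one chunk.
nchan=116                                 # channels in the cube (from the command above)
bytes_per_chan=$(( 6144 * 6144 * 4 ))     # one 6144x6144 plane of 4-byte floats
ram_budget=$(( 2 * 1024 * 1024 * 1024 ))  # hypothetical 2 GB budget

chunk=$(( ram_budget / bytes_per_chan ))   # channels per chunk
[ "$chunk" -gt "$nchan" ] && chunk=$nchan  # whole cube fits: one chunk, no bug

# With these numbers: eight chunks of 14 channels plus a final chunk of 4.
start=0
while [ "$start" -lt "$nchan" ]; do
    end=$(( start + chunk - 1 ))
    [ "$end" -ge "$nchan" ] && end=$(( nchan - 1 ))
    echo "predict chunk: channels $start..$end"
    start=$(( end + 1 ))
done
```

The point is that the chunk boundaries depend purely on available RAM, which is why the same command produced different MODEL_DATA on different machines.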

Since lwimager doesn't actually report chunking explicitly, you need to look at the following console output:

2015-10-02 17:40:28     INFO    CSCleanImageSkyModel::solve     niter=0, assuming predict-only mode, converting model image to MODEL_DATA visibilities
0%....10....20....30....40....50....60....70....80....90....100%

It prints a 0-to-100% meter for each chunk. If you only see one 0...100% meter, you're good: it did the whole cube in one go. If you see two or more meters, the bug has kicked in.
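Counting the meters can be scripted. Below is a minimal sketch, assuming you captured the console output to a file (the lwimager.log name is hypothetical, and the sample log here is a stand-in showing the buggy two-chunk case):

```shell
# Stand-in for a captured console log (hypothetical filename; in real use
# you would capture the actual output: lwimager ... > lwimager.log 2>&1).
cat > lwimager.log <<'EOF'
2015-10-02 17:40:28     INFO    CSCleanImageSkyModel::solve     niter=0, assuming predict-only mode, converting model image to MODEL_DATA visibilities
0%....10....20....30....40....50....60....70....80....90....100%
0%....10....20....30....40....50....60....70....80....90....100%
EOF

# One meter per chunk: more than one means chunking (and the bug) kicked in.
meters=$(grep -c '^0%\.\.\.\.10' lwimager.log)
if [ "$meters" -gt 1 ]; then
    echo "cube predicted in $meters chunks -- MODEL_DATA suspect"
fi
```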

The obvious workaround for this use case is to split the cube in frequency to ensure single-chunk processing, and invoke lwimager multiple times. Will post more on this elsewhere.
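A sketch of what that workaround looks like for the command above, assuming the model cube has already been split into per-chunk images (the model-chunk-*.img names, the chunk size of 4 channels, and the per-chunk img_chanstart mapping are all assumptions for illustration). Channels 16..131 (nchan=116 starting at chanstart=16) are covered in 29 runs of 4 channels each, each small enough to be processed as a single chunk. The echo makes this a dry run; drop it to actually execute.

```shell
# Dry-run sketch of the split-and-invoke workaround (hypothetical filenames).
for start in $(seq 16 4 128); do
    echo lwimager fillmodel=1 operation=csclean wprojplanes=0 field=0 spwid=0 \
        stokes=I padding=1 cachesize=4096 niter=0 nchan=4 chanstart=$start \
        chanstep=1 npix=6144 cellsize=0.06arcsec img_chanstart=$start \
        mode=channel ms=CYG-C-1scan-testpaste.MS \
        model=model-chunk-$start.img fixed=1
done
```

Each run should then print exactly one 0...100% meter, confirming single-chunk processing.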
