Skip to content

[Feat]: Add single-slice indexing to AstroData objects/extensions #83

@teald

Description

@teald

Contact Details

No response

Describe the feature in words

(From a discussion between @chris-simpson and myself)

Summary

It's useful to examine slices of extension data (a single-slice AstroData object). However, doing so involves accessing the internal AstroData<Instrument>.nddata object explicitly and abandoning the AstroData<instrument>'s descriptors by returning an AstroNDData object. For example, if ext is an AstroData<Instrument> object, to keep descriptors around we have to access ext.__class__ and create a new object explicitly:

nddata_row = ext.nddata[row]
ad_row_with_descriptors = ext.__class__(nddata_row, phu=ext.phu, is_single=True)

Since descriptors are broadly useful, it could be useful to instead allow for the AstroData<instrument> object to be directly indexed:

ad_row_with_descriptors = ext[row]

This simplifies accessing the data for single slices.

Details

There is some logic that would be obscured and eventually confusing to developers and the end user if this is not properly implemented.

Current situation.

In the current implementation of AstroData, there are significant differences between AstroData objects tied directly to the AstroData.is_single attribute set during initialization. This is intentional, so that AstroData instances with multiple extensions can return extensions that behave like single slices. For example:

# Create AstroDataMyInstrument instance, assume it's implemented
ad = astrodata.from_file('my_instrument_data.fits') 
print(ad.decorator())
# >>> [<value1>, <value2>, ...]
ext = ad[0]
print(ext.decorator())
# >>> <value1>

Internally, this feature requires extra work to accomplish (usually an if-statement checking .is_single), and to specifically handle the stored data. The internally stored data is not itself any different (see the AstroData.nddata property, for an example).

This is slightly confusing for objects that have a single extension, but are not qualified as is_single until they are accessed as a slice:

ad = AstroDataMyInstrument(nddata=<some NDData object>)
ext = ad[0]

The difference between ad and ext above, in that case, is only the .is_single attribute, which must be manually set and never changed. All other data holds the same format, so this is changing a variety of descriptors' return values quietly. If someone was unaware of this behavior, there's a number of mistakes that could be made, some of which could pass without any logging/intervention until something down the line fails.

Adding further indexing features

Implementing the proposed feature as-is, we now have 3 different meanings for AstroData()[<slice, int>] for the same AstroData class:

  1. Slicing an is_single = False (default) AstroData object with a slice, which will return an AstroData object with is_single = False
  2. Indexing an is_single = False AstroData object, which will return an AstroData object with is_single = True
  3. (this feature) Indexing/slicing an is_single = True AstroData object, which would return an AstroData object with a slice/row of the data and is_single = True

Since these share the same syntax and the only meaningful differences are functionally invisible to the average user, it seems overwhelming.

is_single

TL:DR; I think is_single may be more of a maintenance liability than it's worth, which was highlighted by thinking about this feature request.

Right now, is_single is an implementation detail aimed at improving the user experience by returning single values so users do not have to index the result to get what they'd like. If it were completely omitted and all descriptors returned lists, then getting single values would always look like:

ad = astrodata.from_file('my_data.fits')
my_value = ad.my_descriptor()[0]

Instead, a user must know to do:

ad = astrodata.from_file('my_data.fits')
my_value = ad.ext.my_descriptor()

To be more consistent, is_single could instead be a property that checks if _all_nddata is length 1. There's historical reasons behind why that wasn't the case before, so a change like this would require a major version bump and discussion within the team.

This would require projects using AstroData to adjust for instances where they were indexing objects with the old is_single.

On the package side, the property is very simple:

@property
def is_single(self) -> bool:
    return len(self._all_nddata) == 1

This does not directly address the feature described above, but it would prevent hiding extra logic to accomplish this feature in the initialization/getter for the class, and failure to check/maintain a manually assigned value, when we consider how to implement something that does perform slicing. It also clears up a point of confusion for new users adopting astrodata into their packages by explicitly defining the conditions for single slices, and therefore why single values are being returned.

I am not personally convinced this is the best way to solve this confusion, my gut tells me that features/expectations/assumptions like this are significant enough to consider changing the design around is_single. It may be better to do away with is_single entirely and adopt a consistent returned type to prevent any surprises, since using is_single doesn't functionally save much (and adds an extra character to type, and an extra concept to be aware of). But, that's getting outside the scope of this feature request.

code-description

# ext.is_single = True
row_in_data = ext[row]
assert isinstance(row_in_data, astrodata.AstroData)

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Labels

DRAGONSRelated to DRAGONS developmentenhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions