Skip to content

Avoid duplicating arrays #515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
May 15, 2025
Merged

Conversation

majosm
Copy link
Collaborator

@majosm majosm commented Jul 12, 2024

Depends on #503 (merged), #512 (merged), #531 (merged), #549 (merged), #585 (merged), #583 (merged), and #550 (merged).

@inducer note to self: Reviewed up to (but not including) "tweak Inliner/PlaceholderSubstitutor implementations".

Draft because:

  • ⚠️ Squash "corresponding" commits before merging.

Closes #590.

@majosm majosm force-pushed the avoid-array-duplication branch from e813eb2 to 1d49631 Compare July 15, 2024 20:20
@majosm majosm mentioned this pull request Jul 25, 2024
@majosm majosm force-pushed the avoid-array-duplication branch 2 times, most recently from a5dbc5a to c55dee2 Compare August 27, 2024 21:33
@majosm majosm force-pushed the avoid-array-duplication branch from c55dee2 to f6a3014 Compare September 24, 2024 20:36
@majosm majosm force-pushed the avoid-array-duplication branch 3 times, most recently from dd6cd37 to 8ac0b8d Compare February 7, 2025 04:13
@majosm majosm force-pushed the avoid-array-duplication branch 3 times, most recently from ae9c939 to 2ca7a65 Compare February 28, 2025 02:42
@majosm majosm marked this pull request as ready for review February 28, 2025 13:33
@majosm
Copy link
Collaborator Author

majosm commented Feb 28, 2025

Ready for a look after #585, #583, and #550. I've left the duplication checks turned off by default for now, because some things in arraycontext, grudge, etc. still duplicate.

@majosm majosm requested a review from inducer February 28, 2025 13:35
@majosm majosm force-pushed the avoid-array-duplication branch 4 times, most recently from 8bb1a1e to 8789290 Compare March 18, 2025 14:34
@majosm majosm force-pushed the avoid-array-duplication branch from 8789290 to e00662b Compare March 31, 2025 20:24
@inducer inducer marked this pull request as draft March 31, 2025 21:41
Copy link
Owner

@inducer inducer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +105 to +113
# Must disable collision/duplication checking because we're combining
# expressions that were previously in two different call stack frames
# (and were thus cached separately)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add to doc: Inliner may produce DAGs with duplicates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -97,6 +169,6 @@ def rewrite_einsums_with_no_broadcasts(expr: MappedT) -> MappedT:
alter its value.
"""
mapper = EinsumWithNoBroadcastsRewriter()
return cast("MappedT", mapper(expr))
return cast("MappedT", mapper(expr, None))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return cast("MappedT", mapper(expr, ())

and save the None checking above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

expr.redn_axis_to_redn_descr,
tags=expr.tags,
axes=expr.axes,)
if (
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe tackle #590? (OK if it doesn't pan out.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a stab at it. Seemed to work out OK.

@majosm majosm force-pushed the avoid-array-duplication branch from e00662b to d21de9d Compare April 15, 2025 04:36
@majosm majosm requested a review from inducer April 15, 2025 15:07
return all(b_i is a_i for a_i, b_i in zip(a, b, strict=True))


class IdentityMappingHelperMixin:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It struck me that it would likely be better to bake these into the node type. (I.e. Placeholder._if_modified(self, other).) We can probably generate them automatically, too. As it stands, dispatch is name-based, even though it's only "single", i.e. based on the type of the node.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does 69c4e3c look?

I'm not sure how to make it work for DictOfNamedArrays, though. See here:

# FIXME: Doesn't work because __init__ argument is "data" but member
# is "_data"
# return expr.replace_if_different(_data=new_data)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Special-casing DictOfNamedArrays is OK, for the reason you mention.

I'm OK with the approach, though I think I'd prefer generated code for this, since it's bound to be a fairly hot code path. There's already some code generation in @array_dataclass, it should be straightforward to add there.

Copy link
Collaborator Author

@majosm majosm May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to implement this in a687544, but in doing so I've seemingly triggered a bunch of unrelated(?) mypy errors. I fixed some in d27da88, but some of the errors that remain don't make a lot of sense. I've also triggered some new pylint errors with my fixes. Any idea what's going on here?

Edit: I guess this is similar to the issues encountered in #598?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I guess this is similar to the issues encountered in #598?

I don't think the errors you're seeing here are all that related. Some just outright don't make any sense, IMO. For example:

pytato/transform/__init__.py:1115:25: error: Invalid index type "str" for "Call"; expected type "Call"  [index]

I can't make heads or tails of that.

The Pylint issues are similarly weird, maybe it's time to drop Pylint, too?

I think the best proposal I've got is to switch to pyright (the "based" version, to avoid drowning in noise). I've pushed changes that do that. At least most of what pyright has to say seems to make some sense.

Copy link
Owner

@inducer inducer May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the "baseline" feature of basedpyright seems a bit imperfect, too. It seems to complain about errors that are already contained in the baseline. I don't see this locally, but it does seem to happen in CI. 🤔

@majosm majosm force-pushed the avoid-array-duplication branch 4 times, most recently from 4af1241 to d27da88 Compare May 7, 2025 20:32
majosm and others added 24 commits May 14, 2025 17:14
add note about Inliner duplication

deduplicate in inline_calls
now inherits from CachedMapper

add comments explaining result deduplication to MPMSMaterializerCache
simplify EinsumWithNoBroadcastsRewriter by using () instead of None
…lication avoidance logic

move duplication-preventing boilerplate into array types

move replace_if_different code into array_dataclass
@inducer inducer force-pushed the avoid-array-duplication branch from a36694e to 20bd9f0 Compare May 15, 2025 14:56
@inducer inducer enabled auto-merge (rebase) May 15, 2025 14:59
@inducer inducer merged commit e3b9065 into inducer:main May 15, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Factor out if-same-return-old code into functions
2 participants