Skip to content

CSHARP-3222: Add LINQ support for median and percentile accumulators/window functions #1743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

adelinowona
Copy link
Contributor

This PR introduces the capability to calculate the median and percentile of numeric values in the MongoDB aggregation pipeline for $group and $setWindowFields stages.

@adelinowona adelinowona requested review from rstam and sanych-sun July 29, 2025 11:03
@adelinowona adelinowona requested a review from a team as a code owner July 29, 2025 11:03
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial quick review. Will review more thoroughly after next commit.

var method = expression.Method;
var arguments = expression.Arguments;

if (IsMedianMethod(method))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally we do if (method.IsOneOf(__medianMethods)).

Do we want to do it differently here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that pattern but I just felt that would be too much boilerplate kind of code and there's an easier way to do it. Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some boilerplate in setting up the __medianMethods static field.

There isn't any boilerplate in the if statement. It's roughly the same.

One advantage of using __medianMethods is that it is VERY precise. There is no danger of false hits like there is with the IsMedianMethod approach.

Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.

Yes. That's older code that predates the newer practice of being more precise.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested

var method = expression.Method;
var arguments = expression.Arguments;

if (IsMedianMethod(method))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some boilerplate in setting up the __medianMethods static field.

There isn't any boilerplate in the if statement. It's roughly the same.

One advantage of using __medianMethods is that it is VERY precise. There is no danger of false hits like there is with the IsMedianMethod approach.

Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.

Yes. That's older code that predates the newer practice of being more precise.

@adelinowona adelinowona requested review from rstam and sanych-sun July 29, 2025 23:38

namespace MongoDB.Driver.Linq.Linq3Implementation.Ast.Expressions
{
internal sealed class AstComplexAccumulatorExpression : AstAccumulatorExpression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever but I don't like the idea of a Dictionary<string, AstExpression>. A class that can represent anything in this way is not easy to work with.

I would prefer you just create two new classes AstMedianAccumulatorExpression and AstPercentileAccumulatorExpression.

That would mirror AstMedianExpression and AstPercentileExpression.

It would also avoid the messiness in the VisitComplexAccumulatorExpression method. Instead we could just have two simple methods VisitMedianAccumulatorExpression and VisitPercentileAccumulatorExpression.

The whole point of the Ast classes is to have type-safe representations of MQL, and the use of a Dictionary<string, AstExpression> throws type-safety out the window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I'll revert this then

/// </summary>
/// <param name="source">The sequence of values.</param>
/// <returns>The median value.</returns>
public static double Median(this IEnumerable<decimal> source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned that the return type is double instead of decimal.

Same below.

/// </summary>
/// <param name="source">The sequence of values.</param>
/// <returns>The median value.</returns>
public static double Median(this IEnumerable<float> source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enumerable.Average(IEnumerable<float>) returns float.

Should we also?

/// <param name="source">A sequence of values to calculate the percentiles of.</param>
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param>
/// <returns>The percentiles of the sequence of values.</returns>
public static double[] Percentile(this IEnumerable<decimal> source, IEnumerable<double> percentiles)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return decimal[]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the percentiles be decimal also to match?

/// <param name="source">A sequence of values to calculate the percentiles of.</param>
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param>
/// <returns>The percentiles of the sequence of values.</returns>
public static double[] Percentile(this IEnumerable<float> source, IEnumerable<double> percentiles)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return float[]?

SHould the percentiles web IEnumerable<flost> to match?

/// <param name="selector">The selector that selects a value from the input document.</param>
/// <param name="window">The window boundaries.</param>
/// <returns>The median of the selected values.</returns>
public static double Median<TInput>(this ISetWindowFieldsPartition<TInput> partition, Func<TInput, decimal> selector, SetWindowFieldsWindow window = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return decimal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants