-
Notifications
You must be signed in to change notification settings - Fork 1.3k
CSHARP-3222: Add LINQ support for median and percentile accumulators/window functions #1743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial quick review. Will review more thoroughly after next commit.
var method = expression.Method; | ||
var arguments = expression.Arguments; | ||
|
||
if (IsMedianMethod(method)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally we do if (method.IsOneOf(__medianMethods))
.
Do we want to do it differently here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that pattern but I just felt that would be too much boilerplate kind of code and there's an easier way to do it. Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator
follows a similar pattern already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some boilerplate in setting up the __medianMethods
static field.
There isn't any boilerplate in the if
statement. It's roughly the same.
One advantage of using __medianMethods
is that it is VERY precise. There is no danger of false hits like there is with the IsMedianMethod
approach.
Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.
Yes. That's older code that predates the newer practice of being more precise.
...tionExpressionTranslators/MethodTranslators/MedianMethodToAggregationExpressionTranslator.cs
Outdated
Show resolved
Hide resolved
...tionExpressionTranslators/MethodTranslators/MedianMethodToAggregationExpressionTranslator.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Expressions/AstComplexAccumulatorExpression.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Visitors/AstNodeVisitor.cs
Outdated
Show resolved
Hide resolved
...xpressionTranslators/MethodTranslators/MedianMethodToAggregationExpressionTranslatorTests.cs
Outdated
Show resolved
Hide resolved
...xpressionTranslators/MethodTranslators/WindowMethodToAggregationExpressionTranslatorTests.cs
Show resolved
Hide resolved
...er.Tests/Linq/Linq3ImplementationWithLinq2Tests/Translators/AggregateGroupTranslatorTests.cs
Outdated
Show resolved
Hide resolved
var method = expression.Method; | ||
var arguments = expression.Arguments; | ||
|
||
if (IsMedianMethod(method)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some boilerplate in setting up the __medianMethods
static field.
There isn't any boilerplate in the if
statement. It's roughly the same.
One advantage of using __medianMethods
is that it is VERY precise. There is no danger of false hits like there is with the IsMedianMethod
approach.
Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.
Yes. That's older code that predates the newer practice of being more precise.
...xpressionTranslators/MethodTranslators/MedianMethodToAggregationExpressionTranslatorTests.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Visitors/AstNodeVisitor.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Visitors/AstNodeVisitor.cs
Outdated
Show resolved
Hide resolved
|
||
namespace MongoDB.Driver.Linq.Linq3Implementation.Ast.Expressions | ||
{ | ||
internal sealed class AstComplexAccumulatorExpression : AstAccumulatorExpression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clever but I don't like the idea of a Dictionary<string, AstExpression>
. A class that can represent anything in this way is not easy to work with.
I would prefer you just create two new classes AstMedianAccumulatorExpression
and AstPercentileAccumulatorExpression
.
That would mirror AstMedianExpression
and AstPercentileExpression
.
It would also avoid the messiness in the VisitComplexAccumulatorExpression
method. Instead we could just have two simple methods VisitMedianAccumulatorExpression
and VisitPercentileAccumulatorExpression
.
The whole point of the Ast
classes is to have type-safe representations of MQL, and the use of a Dictionary<string, AstExpression>
throws type-safety out the window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I'll revert this then
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Optimizers/AstGroupingPipelineOptimizer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Optimizers/AstGroupingPipelineOptimizer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Optimizers/AstGroupingPipelineOptimizer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Optimizers/AstGroupingPipelineOptimizer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Driver/Linq/Linq3Implementation/Ast/Optimizers/AstGroupingPipelineOptimizer.cs
Show resolved
Hide resolved
...tionExpressionTranslators/MethodTranslators/MedianMethodToAggregationExpressionTranslator.cs
Outdated
Show resolved
Hide resolved
...ExpressionTranslators/MethodTranslators/PercentileMethodToAggregationExpressionTranslator.cs
Outdated
Show resolved
Hide resolved
...tionExpressionTranslators/MethodTranslators/WindowMethodToAggregationExpressionTranslator.cs
Outdated
Show resolved
Hide resolved
/// </summary> | ||
/// <param name="source">The sequence of values.</param> | ||
/// <returns>The median value.</returns> | ||
public static double Median(this IEnumerable<decimal> source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned that the return type is double
instead of decimal
.
Same below.
/// </summary> | ||
/// <param name="source">The sequence of values.</param> | ||
/// <returns>The median value.</returns> | ||
public static double Median(this IEnumerable<float> source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enumerable.Average(IEnumerable<float>)
returns float
.
Should we also?
/// <param name="source">A sequence of values to calculate the percentiles of.</param> | ||
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param> | ||
/// <returns>The percentiles of the sequence of values.</returns> | ||
public static double[] Percentile(this IEnumerable<decimal> source, IEnumerable<double> percentiles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return decimal[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the percentiles
be decimal
also to match?
/// <param name="source">A sequence of values to calculate the percentiles of.</param> | ||
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param> | ||
/// <returns>The percentiles of the sequence of values.</returns> | ||
public static double[] Percentile(this IEnumerable<float> source, IEnumerable<double> percentiles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return float[]
?
SHould the percentiles web IEnumerable<flost>
to match?
/// <param name="selector">The selector that selects a value from the input document.</param> | ||
/// <param name="window">The window boundaries.</param> | ||
/// <returns>The median of the selected values.</returns> | ||
public static double Median<TInput>(this ISetWindowFieldsPartition<TInput> partition, Func<TInput, decimal> selector, SetWindowFieldsWindow window = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return decimal
?
This PR introduces the capability to calculate the median and percentile of numeric values in the MongoDB aggregation pipeline for $group and $setWindowFields stages.