PPL Alerting: Adding PPL Related Models#940
Conversation
87cfc19 to
9d91933
Compare
| pplQuery: String? = null, | ||
| pplQueryResults: List<Map<String, Any?>> = listOf() |
There was a problem hiding this comment.
Do we want to generalize these fields so it could be the any query for also normal query level monitors and what api for cluster metrics? Also the same for the results.
There was a problem hiding this comment.
The List<Map<String, Any?>> typing would work for both DSL and PPL query results but we'd need to be mindful that the format of the contents are not at all the same. For Query-Level Monitor runs, the List<Map<String, Any?>> would contain exactly 1 element of the DSL query results in its usual format. For PPL Monitor runs, the List<Map<String, Any?>> would be of this format:
List<Map<String, Any?>> = listOf(
mapOf(
"region" to "us-east-1",
"max_price" to 450.0,
"avg_latency" to 120.5
),
mapOf(
"region" to "us-west-2",
"max_price" to 380.0,
"avg_latency" to 95.3
),
mapOf(
"region" to "eu-west-1",
"max_price" to 520.0,
"avg_latency" to 145.8
)
)
Seeing that the query results in an Alert are only ever meant to be readable by a user that calls Get Alerts, and not accessed with an alert.field call, the generalization should be fine.
eirsep
left a comment
There was a problem hiding this comment.
There is a general mistake in how we are doing readFrom() and writeTo() - basically the writeable serialization.
old classes being deserialized or serialized will fail over the wire since they expect these new PPL query results or other PPL specific changes. you need to add some version checks before expecting these to be avaiablle and serialized or deserialized.
| val MONITOR_TYPE_PATTERN = Pattern.compile("[a-zA-Z0-9_]{5,25}") | ||
|
|
||
| // hard, nonadjustable limits for PPL Alerting | ||
| const val ALERTING_MAX_NAME_LENGTH = 30 // max length of any name for monitors, triggers, notif actions, etc |
There was a problem hiding this comment.
why are we adding this as part of PPL monitor type addition??
how will this be backward compatible?
There was a problem hiding this comment.
This length check is from a previous Security review of PPL Alerting behind V2 APIs. The name length check is only done on PPL Monitors, it's not going to be done on existing Monitor types.
| * @property customCondition A custom condition expression. Required if using CUSTOM conditions, | ||
| * required to be null otherwise. | ||
| * | ||
| * @opensearch.experimental |
There was a problem hiding this comment.
experimental is present intermittently? either remove everywhere or add where missing
| return builder | ||
| .field("triggered", triggered) | ||
| .field("action_results", actionResults as Map<String, ActionRunResult>) | ||
| .field("ppl_query_results", pplCustomQueryResults) |
There was a problem hiding this comment.
why are the serialized field name not "ppl_sql" prefixed similar to class and variable names.
let's remove sql substring everywhere if possible or make the serialized names consistent.
@lezzago thoughts?
There was a problem hiding this comment.
Fair callout that there is inconsistency between calling it a "PPL" thing or a "PPLSQL" thing. The only trouble with calling everything "PPLSQL" is that the name of the PPL Trigger type has to be ppl_trigger and not ppl_sql_trigger since that would mean users creating PPL Monitors through Dev Tools or programmatically would need to specify that name ppl_sql_trigger, even though we are seeking to hide potential SQL support for now. Our options are:
- Remove SQL substring from everything, and then live with the fact that if PPL Monitors support SQL queries in the future, they will continue to be called just PPL Monitors and PPL Triggers
- Add SQL substring to everything, and then let the construct names hint at future SQL support
I'm leaning toward option 1
There was a problem hiding this comment.
Lets remove SQL since if we add SQL support, that would be a new language and we dont need to show that SQL and PPL are tied together in one plugin.
| for (map in results) { | ||
| out.writeMap(map) | ||
| } | ||
| out.writeVInt(pplBaseQueryResults.size) |
There was a problem hiding this comment.
the readFrom and writeTo() methods don't seem to be consistent... did you add this serde roundtrip in test coverage with new fields and without new fields? if yes, plz respond to this comment with the test methods
There was a problem hiding this comment.
Only a roundtrip test with new fields, and even then one of the new fields is just null as you pointed out in another comment: WriteableTests.test inputrunresult as stream. Will improve the test coverage to not use a null PPL query results field, and to test with + without new PPL fields separately.
|
|
||
| fun randomInputRunResults(): InputRunResults { | ||
| return InputRunResults(listOf(), null) | ||
| return InputRunResults(listOf(), null, null, listOf(), 5L) |
There was a problem hiding this comment.
plz dont add nullable stuff alone. kindly add proper test coverage for inputrunresults serde
There was a problem hiding this comment.
Will change to use an actual list of maps
| constructor( | ||
| monitor: Monitor, | ||
| trigger: QueryLevelTrigger, | ||
| trigger: Trigger, |
There was a problem hiding this comment.
why are we changing this? how or why is this in scope of PPL models related changes?
There was a problem hiding this comment.
Because PPL Monitors will not have their own dedicated PPLMonitorRunner. They will run on QueryLevelMonitorRunner, much like AD and Cluster Metrics Monitors do. The thing is PPL Triggers need to be their own type because they are fundamentally different from QueryLevelTriggers. As such, the type of trigger had to be generalized to Trigger interface so that it can handle either QueryLevelTrigger or PPLTrigger.
This constructor is called by composeQueryLevelAlert in Alerting. There are validations in Alerting that the trigger passed into this constructor is only ever type QueryLevelTrigger or PPLTrigger.
| override var error: Exception?, | ||
| open var actionResults: MutableMap<String, ActionRunResult> = mutableMapOf() | ||
| open var actionResults: MutableMap<String, ActionRunResult> = mutableMapOf(), | ||
| open var pplCustomQueryResults: List<Map<String, Any?>> = listOf() |
There was a problem hiding this comment.
I actually believe keeping it PPL is the better approach for this field specifically. Query-Level Monitors run their query once, then apply painless trigger execution on those results. PPL Monitors with a custom trigger conditions append the condition to the base PPL query. If a PPL Monitor has multiple custom triggers, each Trigger will be running a different query.
In other words, Query-Level Monitors run the query per Monitor execution. PPL Monitors run the query per Trigger execution. Having query results specifically in the Trigger Run Result is unique to PPL Monitors.
There was a problem hiding this comment.
Query-Level Monitors run their query once, then apply painless trigger execution on those results. PPL Monitors with a custom trigger conditions append the condition to the base PPL query. If a PPL Monitor has multiple custom triggers, each Trigger will be running a different query.
Should we change this so its similar to query level monitors? First run the base query and then run the trigger condition query ontop of the PPL query? This is to help search costs. This may be more complex or if we can have this is a temp tables for the results and then run the different trigger queries against it, that would be best. Though this would depend on the feasibility with PPL features.
There was a problem hiding this comment.
This option has been explored and deemed not possible. PPL must support the makeresults key word to make PPL Monitors run in the 2-step process you describe above (1. run base query 2. apply trigger condition to results). Even if this key word was supported, custom condition triggers each run their own query to evaluate trigger condition (base query + custom condition).
That said, optimizations were already done post load testing to run just the base query once, so that number of results triggers need not rerun the base query redundantly. This optimization cannot be done for custom triggers though.
There was a problem hiding this comment.
Lets leave this as an open github issue with PPL plugin for the feature request for performance optimizations. Make sure that pplCustomQueryResults is not customer facing so we could change this variable name down the road.
There was a problem hiding this comment.
The PPL side feature that would eliminate the need for this extra field already exists: opensearch-project/sql#3629
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
… fields in Alert, add extra serde tests Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
PR Reviewer Guide 🔍(Review updated until commit 092a85d)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 092a85d Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 04c0466
Suggestions up to commit d44d5ce
Suggestions up to commit bbebb2a
Suggestions up to commit 6917880
|
|
|
||
| // this is a new check for PPL Alerting specifically, other Monitor types allow | ||
| // themselves to be created without any Triggers | ||
| require(this.triggers.isNotEmpty()) { "PPL Monitor must include at least 1 trigger." } |
There was a problem hiding this comment.
shouldn't there be another check that PPL can have exactly 1 trigger?
There was a problem hiding this comment.
PPL Monitors can have multiple Triggers. They are, however, only allowed to have 1 Input.
| val error: Exception? = null, | ||
| val aggTriggersAfterKey: MutableMap<String, TriggerAfterKey>? = null | ||
| val aggTriggersAfterKey: MutableMap<String, TriggerAfterKey>? = null, | ||
| val pplBaseQueryResults: List<Map<String, Any?>> = listOf(), |
There was a problem hiding this comment.
The base PPL query is what the PPL Monitor is configured with. Custom condition triggers append to this base query to run their own queries.
There was a problem hiding this comment.
should we add comments to make this clear?
There was a problem hiding this comment.
Will add clarifying comment
| @JvmStatic | ||
| @Throws(IOException::class) | ||
| fun parseInner(xcp: XContentParser): PPLInput { | ||
| lateinit var query: String |
There was a problem hiding this comment.
No specific reason for lateinit, can change to initialize with null
|
|
||
| fun randomInputRunResultsWithPPLFields(): InputRunResults { | ||
| return InputRunResults( | ||
| listOf(), |
There was a problem hiding this comment.
what is this field? plz don't pass empty. this won't test serde roundtrip extensively
There was a problem hiding this comment.
It's the results field for the existing Monitor types. The idea is that DSL Monitors would only ever use the results field, and PPL Monitors will only ever use the pplBaseQueryResults field. For testing purposes, though, there's no harm in actually populating this list.
There was a problem hiding this comment.
Upcoming change will only populate this results list, but not touch existing randomInputRunResults() helpers that return a null aggTriggersAfterKey and exception. aggTriggersAfterKey is intentionally left out of serde (reference). Including an exception would cause unit tests outside the scope of this PR to fail because Exceptions' .equals() checks reference equality, not value equality. Fixing this would require a more general effort across the unit tests of all existing Monitor types.
|
|
||
| // regular expression for validating that a string contains | ||
| // only valid chars (letters, numbers, -, _) | ||
| private val validCharsRegex = """^[a-zA-Z0-9_-]+$""".toRegex() |
There was a problem hiding this comment.
is this really needed? Its for the destination id which would be the notification channel id. We dont create or generate an id from this plugin. Does it make sense for this validation?
There was a problem hiding this comment.
This change was requested during a security review of PPL Alerting. The reasoning is that Channel IDs are user input (e.g. in CreateMonitor API) that must be sanitized.
There was a problem hiding this comment.
Ack, have we confirmed with notification plugin that this is how he notification channel id is formatted?
There was a problem hiding this comment.
After offline discussion, will be removing this ID check. Channel creation happens in Notifications plugin, and there are no rules/constraints there on the contents of Notification channel IDs. As such, Alerting can't block Monitor creation based on its own constraints for what a Notifications channel ID should follow.
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
|
Persistent review updated to latest commit bbebb2a |
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 092a85d.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit d44d5ce |
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
|
Persistent review updated to latest commit 04c0466 |
Signed-off-by: Dennis Toepker <toepkerd@amazon.com>
|
Persistent review updated to latest commit 092a85d |
Description
Changes to support PPL Alerting behind existing (V1) Alerting APIs, and to support PPL Alerting using stateful Alerts instead of stateless Alerts.
PPL Monitors will be run by the QueryLevelMonitorRunner (much like AD and Cluster Metrics Monitors), so these changes prepare for that as well.
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.