@Miqueasher Miqueasher commented Oct 1, 2025

This PR adds Mustache files and a new CWA configuration for custom metrics e2e testing, along with an update to the Terraform variables to recognize those changes.

Changes made:
  • Added a CWA configuration with an OTLP specification for custom metrics.
  • Updated the Python sample app to collect span/OTel custom metrics, for example:
      span = trace.get_current_span()
      span.set_attribute("operation.type", "downstream_service")
      request_counter.add(1, {"operation.type": "downstream_service"})
  • Updated the variables file to include "custom_metrics_enabled" and "custom_metrics_config".
  • Added 5 Mustache files to validate against the new metrics collected from the sample app.

References:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AppSignals-CustomMetrics.html#AppSignals-CustomMetrics-OpenTelemetry
https://opentelemetry.io/docs/specs/otel/trace/sdk/
https://opentelemetry.io/docs/specs/otel/glossary/#instrumented-library
https://opentelemetry.io/docs/specs/otel/trace/api/#tracerprovider

If this is merged and fails, a git revert to SHA b58dd61 will be performed to roll back all changes made.

Ensure you've run the following tests on your changes and include the link below:

To do so, create a test.yml file with name: Test and a workflow description that tests your changes, then remove the file from your PR. Link your test run in your PR description. This process is a short-term solution while we work on creating a staging environment for testing.

NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL. See the needs keyword to run tests in succession.

  • Run Java EKS on e2e-playground in us-east-1 and eu-central-2
  • Run Python EKS on e2e-playground in us-east-1 and eu-central-2
  • Run metric limiter on EKS cluster e2e-playground in us-east-1 and eu-central-2
  • Run EC2 tests in all regions
  • Run K8s on a separate K8s cluster (check IAD test account for master node endpoints; these will change as we create and destroy clusters for OS patching)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines 19 to 23
# Initialize OTEL metrics for span metrics
meter = metrics.get_meter(__name__)
request_counter = meter.create_counter("custom_requests_total", description="Total requests")
response_time_histogram = meter.create_histogram("custom_response_time", description="Response time")

Contributor

IMO we should create two meters - one using metrics.get_meter(__name__) (Agent-based export) like you've done here, and another using Custom export pipeline from public docs

Contributor Author

A second meter has been created.

# Initialize OTEL metrics for span metrics
meter = metrics.get_meter(__name__)
request_counter = meter.create_counter("custom_requests_total", description="Total requests")
response_time_histogram = meter.create_histogram("custom_response_time", description="Response time")

Contributor

response_time_histogram unused?

Contributor Author

The histogram code is commented out so that it can be enabled easily after initial test validation.

Comment on lines 62 to 66
# Set up span attributes and initialize the counter to receive custom metrics
span = trace.get_current_span()
span.set_attribute("operation.type", "aws_sdk_call")
request_counter.add(1, {"operation.type": "aws_sdk_call"})

Contributor

TBH I don't think we need custom metrics on every API. Probably one is fine.

Contributor Author

Removed from all APIs except the SDK call.

Contributor

Are we setting OTEL_RESOURCE_ATTRIBUTES="service.name=$YOUR_SVC_NAME,deployment.environment.name=$YOUR_ENV_NAME" anywhere?

Contributor Author

Yes, it is being set in 'main.tf'.
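
For reference, a minimal sketch of how a value in the format asked about above can be assembled and checked. The helper names and the sample values are hypothetical; only the comma-separated key=value format and the two attribute keys come from the thread.

```python
REQUIRED_KEYS = ("service.name", "deployment.environment.name")


def build_resource_attributes(service_name, environment_name):
    """Build an OTEL_RESOURCE_ATTRIBUTES value (comma-separated key=value pairs)."""
    return (
        f"service.name={service_name},"
        f"deployment.environment.name={environment_name}"
    )


def parse_resource_attributes(raw):
    """Split an OTEL_RESOURCE_ATTRIBUTES value into a dict of attributes."""
    attributes = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


def missing_keys(raw):
    """Return the required keys that are absent or empty in the given value."""
    attributes = parse_resource_attributes(raw)
    return [key for key in REQUIRED_KEYS if not attributes.get(key)]


# Hypothetical values; in the PR the real ones are set in main.tf.
value = build_resource_attributes("python-sample-app", "e2e-test")
```

A check such as missing_keys(value) could run before the sample app starts, so that a missing deployment.environment.name is caught before metric validation rather than during it.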

variable "custom_metrics_config" {
  description = "JSON configuration for custom metrics"
  type        = string
  default     = "{}"

Contributor

amazon-cloudwatch-custom-agent.json?

Contributor Author

It now replaces the empty curly braces as the default.
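
Since custom_metrics_config is declared as a plain string, Terraform will not check that it holds valid JSON unless a validation block is added. A minimal sketch of a pre-flight check follows; the function name load_custom_metrics_config and the sample payload are hypothetical and are not taken from the PR's agent configuration file.

```python
import json


def load_custom_metrics_config(raw):
    """Parse a JSON config string and require a JSON object at the top level."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"custom_metrics_config is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("custom_metrics_config must be a JSON object")
    return config


# The previous default: an empty object, which parses to an empty dict.
empty_config = load_custom_metrics_config("{}")

# Hypothetical payload, for illustration only.
sample_config = load_custom_metrics_config('{"example_key": {"enabled": true}}')
```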

Comment on lines 10 to 11
name: Service
value: {{serviceName}}

Contributor

Should also see deployment.environment.name

Contributor Author

It has been added.
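
To illustrate how a template like this resolves at validation time, here is a minimal sketch that substitutes {{name}} placeholders with plain string replacement. It is not the validator's actual rendering code: the render helper, the environmentName placeholder, the template layout, and the sample values are all hypothetical, and the dimension name used for the environment is an assumption.

```python
import re

# Hypothetical expected-metric template; only the Service entry mirrors the PR.
TEMPLATE = """\
dimensions:
  -
    name: Service
    value: {{serviceName}}
  -
    name: deployment.environment.name
    value: {{environmentName}}
"""

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Replace each {{key}} with context[key]; fail loudly on unknown keys."""

    def substitute(match):
        key = match.group(1)
        if key not in context:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return str(context[key])

    return PLACEHOLDER.sub(substitute, template)


rendered = render(
    TEMPLATE,
    {"serviceName": "python-sample-app", "environmentName": "e2e-test"},
)
```

Raising on an unknown placeholder, rather than leaving it blank, makes a template that references an unset variable fail at render time instead of producing an expected value that can never match.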

Contributor

I don't think you will see anything in remote service metrics

Contributor Author

It has been removed.
