@@ -11,7 +11,7 @@ In addition, Amazon EKS now provides network monitoring visualizations in the {a
11
11
12
12
These capabilities are enabled by https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor.html[Amazon CloudWatch Network Flow Monitor].
13
13
14
-
== Use Cases
14
+
== Use cases
15
15
16
16
=== Measure network performance to detect anomalies
17
17
Several teams standardize on an observability stack that lets them measure system performance, visualize system metrics, and receive alerts when a specific threshold is breached. Container network observability in EKS aligns with this by exposing key system metrics that you can scrape to broaden observability of your system’s network performance at the pod and worker node level.
@@ -24,23 +24,137 @@ A lot of teams run EKS as the foundation for their platforms, making it the foca
24
24
25
25
== Features
26
26
27
-
. Performance metrics - This feature allows you to scrape network-related system metrics for pods and worker nodes directly from the Network Flow Monitor Agent running in your EKS cluster.
28
-
. Service map - This feature dynamically visualizes intercommunication between workloads in the cluster, allowing you to quickly disclose key metrics (RT, RTO, and DT) associated with network flows between communicating pods.
27
+
. Performance metrics - This feature allows you to scrape network-related system metrics for pods and worker nodes directly from the Network Flow Monitor (NFM) Agent running in your EKS cluster.
28
+
. Service map - This feature dynamically visualizes intercommunication between workloads in the cluster, allowing you to quickly view key metrics (retransmissions (RT), retransmission timeouts (RTO), and data transferred (DT)) associated with network flows between communicating pods.
29
29
. Flow table - With this table, you can monitor the top talkers across the Kubernetes workloads in your cluster from three different angles: {aws} service view, cluster view, and external view. For each view, you can see the retransmissions, retransmission timeouts, and data transferred between the source pod and its destination.
30
30
* {aws} service view: Shows top talkers to {aws} services (DynamoDB and S3)
31
31
* Cluster view: Shows top talkers within the cluster (east ← to → west)
32
32
* External view: Shows top talkers to cluster-external destinations outside {aws}
33
33
34
-
== Get Started
35
-
To get started, enable Container Network Observability in the EKS console for a new or existing cluster. This will automate the creation of Network Flow Monitor dependencies (https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateScope.html[Scope] and https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateMonitor.html[Monitor] resources). In addition, you will have to install the https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-agents-kubernetes-eks.html[Network Flow Monitor Agent add-on]. Alternatively, you can install these dependencies using the `{aws} CLI`, https://docs.aws.amazon.com/eks/latest/APIReference/API_Operations_Amazon_Elastic_Kubernetes_Service.html[EKS APIs] (for the add-on), https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-API-operations.html[NFM APIs] or Infrastructure as Code (like https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/networkflowmonitor_monitor[Terraform]). Once these dependencies are in place, you can configure your preferred monitoring tool to scrape network performance metrics for pods and worker nodes from the NFM agent. To visualize the network activity and performance of your workloads, you can navigate to the EKS console under the “Network” tab of the cluster’s observability dashboard.
34
+
== Get started
35
+
To get started, enable Container Network Observability in the EKS console for a new or existing cluster. This will automate the creation of Network Flow Monitor (NFM) dependencies (https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateScope.html[Scope] and https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateMonitor.html[Monitor] resources). In addition, you will have to install the https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-agents-kubernetes-eks.html[Network Flow Monitor Agent add-on]. Alternatively, you can install these dependencies using the `{aws} CLI`, https://docs.aws.amazon.com/eks/latest/APIReference/API_Operations_Amazon_Elastic_Kubernetes_Service.html[EKS APIs] (for the add-on), https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-API-operations.html[NFM APIs] or Infrastructure as Code (like https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/networkflowmonitor_monitor[Terraform]). Once these dependencies are in place, you can configure your preferred monitoring tool to scrape network performance metrics for pods and worker nodes from the NFM agent. To visualize the network activity and performance of your workloads, you can navigate to the EKS console under the “Network” tab of the cluster’s observability dashboard.
36
36
37
37
When using Network Flow Monitor in EKS, you can maintain your existing observability workflow and technology stack while leveraging a set of additional features which further enable you to understand and optimize the network layer of your EKS environment. You can learn more about the https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor.pricing.html[Network Flow Monitor pricing here].
38
38
39
+
== Prerequisites and important notes
40
+
41
+
. As mentioned above, if you enable Container Network Observability from the EKS console, the underlying NFM resource dependencies (Scope and Monitor) will be automatically created on your behalf, and you will be guided through the installation process of the EKS add-on for NFM.
42
+
. If you want to enable this feature using Infrastructure as Code (IaC) like Terraform, you will have to define the following dependencies in your IaC: NFM Scope, NFM Monitor, EKS add-on for NFM. In addition, you'll have to grant the https://docs.aws.amazon.com/aws-managed-policy/latest/reference/CloudWatchNetworkFlowMonitorAgentPublishPolicy.html[relevant permissions] to the EKS add-on using https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html[Pod Identity].
43
+
. The NFM agent's EKS add-on must be version 1.1.0 or later.
44
+
45
+
=== Required IAM permissions
46
+
47
+
==== EKS add-on for NFM agent
48
+
You can use the `CloudWatchNetworkFlowMonitorAgentPublishPolicy` https://docs.aws.amazon.com/aws-managed-policy/latest/reference/CloudWatchNetworkFlowMonitorAgentPublishPolicy.html[{aws} managed policy] with Pod Identity. This policy contains permissions for the NFM agent to send telemetry reports (metrics) to a Network Flow Monitor endpoint.
49
+
[source,json,subs="verbatim,attributes"]
50
+
----
51
+
{
52
+
"Version" : "2012-10-17",
53
+
"Statement" : [
54
+
{
55
+
"Effect" : "Allow",
56
+
"Action" : [
57
+
"networkflowmonitor:Publish"
58
+
],
59
+
"Resource" : "*"
60
+
}
61
+
]
62
+
}
63
+
----
64
+
65
+
==== Container Network Observability in the EKS console
66
+
The following permissions are required to enable the feature and visualize the service map and flow table in the console.
If you are using Terraform to manage your {aws} cloud infrastructure, you can include the following resource configurations to enable Container Network Observability for your cluster.
identifier = "us-east-1" # this must be the same region that the cluster is in
136
+
}
137
+
138
+
tags = {
139
+
Name = "example"
140
+
}
141
+
}
142
+
```
143
+
144
+
===== EKS add-on for NFM
145
+
146
+
```
147
+
resource "aws_eks_addon" "example" {
148
+
cluster_name = aws_eks_cluster.example.name
149
+
addon_name = "aws-network-flow-monitoring-agent"
150
+
}
151
+
```
152
+
39
153
== How does it work?
40
154
41
-
=== Performance Metrics
155
+
=== Performance metrics
42
156
43
-
==== System Metrics
157
+
==== System metrics
44
158
If you are running third party (3P) tooling to monitor your EKS environment (such as Prometheus and Grafana), you can scrape the supported system metrics directly from the Network Flow Monitor agent. These metrics can be sent to your monitoring stack to expand measurement of your system’s network performance at the pod and worker node level. The available metrics are listed in the table, under Supported system metrics.
45
159
46
160
image::images/nfm-eks-metrics-workflow.png[Illustration of scraping system metrics]
@@ -62,7 +176,7 @@ OPEN_METRICS_PORT:
62
176
Range: [0..65535]
63
177
----
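As a rough illustration of how a scrape target could be assembled from the agent's `OPEN_METRICS_PORT` setting, the sketch below validates the port against the documented range before building a URL. The helper name `scrape_url`, the example host, and the `/metrics` path are assumptions made for illustration; they are not taken from the NFM agent documentation.

```python
def scrape_url(host: str, port: int, path: str = "/metrics") -> str:
    """Build a scrape URL, rejecting ports outside the documented range.

    The "/metrics" default path is a common convention and an assumption
    here, not something confirmed for the NFM agent.
    """
    if not 0 <= port <= 65535:
        raise ValueError(f"port {port} is outside the range [0..65535]")
    return f"http://{host}:{port}{path}"


if __name__ == "__main__":
    # Hypothetical node address and port, for illustration only.
    print(scrape_url("10.0.0.5", 9090))
```

Validating the port up front keeps a misconfigured value from surfacing later as an opaque connection error in the monitoring tool.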
64
178
65
-
==== Flow Level Metrics
179
+
==== Flow level metrics
66
180
In addition, Network Flow Monitor captures network flow data along with flow level metrics: retransmissions, retransmission timeouts, and data transferred. This data is processed by Network Flow Monitor and visualized in the EKS console to surface traffic in your cluster’s environment, and how it’s performing based on these flow level metrics.
67
181
68
182
The diagram below depicts a workflow in which both types of metrics (system and flow level) can be leveraged to gain more operational intelligence.
@@ -74,60 +188,71 @@ image::images/nfm-eks-metrics-types-workflow.png[Illustration of workflow with d
74
188
75
189
Important note: The scraping of system metrics from the NFM agent and the process of the NFM agent pushing flow-level metrics to the NFM backend are independent processes.
76
190
77
-
===== Supported System Metrics
191
+
===== Supported system metrics
78
192
79
193
Important note: system metrics are exported in https://openmetrics.io/[OpenMetrics] format.
80
194
81
-
[%header,cols="3"]
82
-
|===
195
+
[%header,cols="4"]
196
+
|====
83
197
84
198
|Metric name
85
199
|Type
200
+
|Dimensions
86
201
|Description
87
202
88
203
|ingress_flow_count
89
204
|Counter
205
+
|podName, podNamespace, nodeName
90
206
|Number of flows to a pod
91
207
92
208
|egress_flow_count
93
209
|Counter
210
+
|podName, podNamespace, nodeName
94
211
|Number of flows from a pod to anywhere
95
212
96
213
|ingress_pkt_count
97
214
|Counter
215
+
|podName, podNamespace, nodeName
98
216
|Number of TCP packets received by a pod
99
217
100
218
|egress_pkt_count
101
219
|Counter
220
+
|podName, podNamespace, nodeName
102
221
|Number of TCP packets sent out by a pod
103
222
104
223
|ingress_bytes_count
105
224
|Counter
225
+
|podName, podNamespace, nodeName
106
226
|Number of bytes received by a pod
107
227
108
228
|egress_bytes_count
109
229
|Counter
230
+
|podName, podNamespace, nodeName
110
231
|Number of bytes sent out by a pod
111
232
112
233
|bw_in_allowance_exceeded
113
234
|Counter
235
+
|eniID, nodeName
114
236
|Number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance
115
237
116
238
|bw_out_allowance_exceeded
117
239
|Counter
240
+
|eniID, nodeName
118
241
|Number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance
119
242
120
243
|pps_allowance_exceeded
121
244
|Counter
245
+
|eniID, nodeName
122
246
|Packets per second limit breached at a pod
123
247
124
248
|conntrack_allowance_exceeded
125
249
|Counter
250
+
|eniID, nodeName
126
251
|Connection tracking limit breached. An event is generated and logged on the node when the conntrack table reaches 90 to 95% of its limit.
127
252
128
-
|===
253
+
|====
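To make the table more concrete, here is a minimal sketch of how scraped output might be aggregated once it reaches your monitoring tooling. It uses only the Python standard library. The sample text is hypothetical: the metric and label names follow the table above, but the exact exposition layout the agent emits (for example, whether counters carry a `_total` suffix) is an assumption, and the parser handles only simple sample lines.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. Label names mirror the Dimensions column;
# the pod, namespace, and node values are invented for illustration.
SAMPLE_SCRAPE = """\
# TYPE ingress_bytes_count counter
ingress_bytes_count{podName="web-1",podNamespace="shop",nodeName="node-a"} 1200
ingress_bytes_count{podName="web-2",podNamespace="shop",nodeName="node-a"} 300
egress_bytes_count{podName="web-1",podNamespace="shop",nodeName="node-a"} 4500
# EOF
"""

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (metric name, labels dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, metadata, and the EOF marker
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def bytes_by_namespace(samples, metric="ingress_bytes_count"):
    """Sum one counter across pods, grouped by the podNamespace label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get("podNamespace", "unknown")] += value
    return dict(totals)


if __name__ == "__main__":
    print(bytes_by_namespace(parse_samples(SAMPLE_SCRAPE)))
```

In practice a Prometheus-compatible scraper would do this parsing for you; the sketch only shows how the dimensions in the table can be used to group counters.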
129
254
130
-
===== Supported System Metrics
255
+
===== Supported flow level metrics
131
256
132
257
[%header,cols="3"]
133
258
|===
@@ -150,7 +275,7 @@ Important note: system metrics are exported in https://openmetrics.io/[OpenMetri
150
275
151
276
|===
152
277
153
-
=== Service Map and Flow Table
278
+
=== Service map and flow table
154
279
155
280
image::images/nfm-eks-workflow.png[Illustration of how NFM works with EKS]
156
281
@@ -167,7 +292,7 @@ image::images/nfm-eks-workflow.png[Illustration of how NFM works with EKS]
167
292
168
293
The network flows pulled from the Top Contributors API are scoped to a 1 hour time range, and can include up to 500 flows from each category. For the service map, this means up to 1000 flows can be sourced and presented from the Intra AZ and Inter AZ flow categories over a 1 hour time range. For the flow table, this means that up to 3000 network flows can be sourced and presented from all 6 network flow categories over a 1 hour time range.
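The totals in the paragraph above follow directly from the per-category cap. The short sketch below reproduces the arithmetic; the constant and function names are illustrative and do not correspond to any NFM API.

```python
FLOWS_PER_CATEGORY = 500      # cap per flow category, as stated above
SERVICE_MAP_CATEGORIES = 2    # Intra AZ and Inter AZ
FLOW_TABLE_CATEGORIES = 6     # all network flow categories


def max_flows(categories: int, per_category: int = FLOWS_PER_CATEGORY) -> int:
    """Upper bound on flows presented for a given number of categories."""
    return categories * per_category


if __name__ == "__main__":
    print("service map:", max_flows(SERVICE_MAP_CATEGORIES))
    print("flow table:", max_flows(FLOW_TABLE_CATEGORIES))
```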
169
294
170
-
===== Example: Service Map
295
+
===== Example: Service map
171
296
172
297
_Deployment view_
173
298
@@ -185,7 +310,7 @@ _Pod view_
185
310
186
311
image::images/photo-gallery-pod.png[Illustration of service map with photo-gallery app in pod view]
187
312
188
-
===== Example: Flow Table
313
+
===== Example: Flow table
189
314
190
315
_{aws} service view_
191
316
@@ -195,8 +320,9 @@ _Cluster view_
195
320
196
321
image::images/cluster-view.png[Illustration of flow table in cluster view]
197
322
198
-
== Considerations and Limitations
323
+
== Considerations and limitations
199
324
* Container Network Observability in EKS is only available in regions where https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-Regions.html[Network Flow Monitor is supported].
200
325
* Supported system metrics are in OpenMetrics format, and can be directly scraped from the Network Flow Monitor (NFM) agent.
201
326
* To enable Container Network Observability in EKS using Infrastructure as Code (IaC) like https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/networkflowmonitor_monitor[Terraform], you need to have these dependencies defined and created in your configurations: NFM scope, NFM monitor and the NFM agent.
202
-
* Network Flow Monitor supports up to approximately 5 million flows per minute. This is approximately 5,000 EC2 instances (EKS worker nodes) with the Network Flow Monitor agent installed. Installing agents on more than 5000 instances may affect monitoring performance until additional capacity is available.
327
+
* Network Flow Monitor supports up to approximately 5 million flows per minute. This is approximately 5,000 EC2 instances (EKS worker nodes) with the Network Flow Monitor agent installed. Installing agents on more than 5,000 instances may affect monitoring performance until additional capacity is available.
328
+
* The NFM agent's EKS add-on must be version 1.1.0 or later.
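The capacity figures in the list above imply an average of roughly 1,000 flows per minute per instance. The sketch below turns that into a quick planning check. It is a back-of-the-envelope estimate only: the function names are hypothetical, the per-instance rate is something you would measure yourself, and the service's actual behavior near these approximate limits is not specified here.

```python
APPROX_MAX_FLOWS_PER_MINUTE = 5_000_000  # approximate figure from the list above
APPROX_MAX_INSTANCES = 5_000             # approximate figure from the list above


def implied_flows_per_instance() -> int:
    """Average per-instance flow rate implied by the two approximate figures."""
    return APPROX_MAX_FLOWS_PER_MINUTE // APPROX_MAX_INSTANCES


def within_documented_guidance(instances: int, flows_per_instance_per_minute: int) -> bool:
    """Rough check of a fleet against both approximate figures."""
    total_flows = instances * flows_per_instance_per_minute
    return (
        instances <= APPROX_MAX_INSTANCES
        and total_flows <= APPROX_MAX_FLOWS_PER_MINUTE
    )


if __name__ == "__main__":
    print(implied_flows_per_instance())
    print(within_documented_guidance(4_000, 1_000))
```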
latest/ug/workloads/workloads-add-ons-available-eks.adoc (1 addition, 1 deletion)
@@ -701,7 +701,7 @@ The {aws} provider for the Secrets Store CSI Driver is an add-on that enables re
701
701
702
702
The add-on does not require IAM permissions. However, application pods will require IAM permissions to fetch secrets from {aws} Secrets Manager and parameters from {aws} Systems Manager Parameter Store. After installing the add-on, access must be configured via IAM Roles for Service Accounts (IRSA) or EKS Pod Identity. To use IRSA, please refer to the Secrets Manager https://docs.aws.amazon.com/secretsmanager/latest/userguide/integrating_ascp_irsa.html[IRSA setup documentation]. To use EKS Pod Identity, please refer to the Secrets Manager https://docs.aws.amazon.com/secretsmanager/latest/userguide/ascp-pod-identity-integration.html[Pod Identity setup documentation].
703
703
704
-
{aws} suggests the `AWSSecretsManagerClientReadOnlyAccess` managed policy.
704
+
{aws} suggests the `AWSSecretsManagerClientReadOnlyAccess` https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_available-policies.html#security-iam-awsmanpol-AWSSecretsManagerClientReadOnlyAccess[managed policy].
705
705
706
706
For more information about the required permissions, see `AWSSecretsManagerClientReadOnlyAccess` in the {aws} Managed Policy Reference.