generated from amazon-archives/__template_Apache-2.0
-
Notifications
You must be signed in to change notification settings - Fork 30
Updated the Docker File, eks-aperf.sh and adding eks-aperf-README.md Adding support async-profiler on EKS #351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jrishabh248
wants to merge
1
commit into
aws:main
Choose a base branch
from
jrishabh248:main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,300 @@ | ||
| # Running APerf on EKS/Kubernetes | ||
|
|
||
| This guide explains how to run APerf on Amazon EKS (Elastic Kubernetes Service) or any Kubernetes cluster to collect performance metrics without requiring SSH access to the Kubernetes nodes. | ||
|
|
||
| ## Overview | ||
|
|
||
| This repository provides an automated script (`eks-aperf.sh`) that deploys APerf profiling pods on Amazon EKS worker nodes. It creates a privileged pod on a target node, runs performance profiling, generates reports, and copies the results locally. | ||
|
|
||
| ## Features | ||
|
|
||
| - **Node-specific deployment**: Target specific EKS worker nodes for profiling | ||
| - **Automated profiling**: Runs APerf record and report generation automatically | ||
| - **Security validation**: Checks namespace security policies before deployment | ||
| - **Resource monitoring**: Shows current resource usage on target nodes | ||
| - **Automatic cleanup**: Removes pods after profiling completion | ||
| - **Flexible configuration**: Customizable CPU/memory limits and APerf options | ||
| - **Multi-architecture support**: Works with both AMD64 and ARM64 nodes | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before you begin, ensure you have the following tools installed and configured on your laptop: | ||
|
|
||
| - **Git** - To clone this repository | ||
| - **kubectl** - Installed and connected to your EKS cluster | ||
| - **Docker** - To build the APerf container image | ||
| - **AWS CLI** - With AWS credentials configured for ECR access | ||
| - **jq** - For JSON parsing (used in scripts) | ||
| - Appropriate RBAC permissions to create pods in the target namespace | ||
| - Target namespace must allow privileged pods | ||
|
|
||
| ### Worker Node Requirements | ||
|
|
||
| - **Host path** `/opt/k8s/async-profiler/async-profiler-4.2-linux-arm64` must exist | ||
| - **Async Profiler binaries** must be present in the host path directory | ||
| - Directory should contain the complete async-profiler installation | ||
| - Proper file permissions for profiler execution | ||
|
|
||
| ### Application Pod Requirements | ||
|
|
||
| For the APerf pod to successfully profile your application pods, **both the application pod and APerf pod must mount the same async profiler host path**: | ||
|
|
||
| **Application Pod Configuration:** | ||
| ```yaml | ||
| volumeMounts: | ||
| - name: profiler-vol | ||
| mountPath: /opt/async-profiler | ||
| volumes: | ||
| - name: profiler-vol | ||
| hostPath: | ||
| path: /opt/k8s/async-profiler/async-profiler-4.2-linux-arm64 | ||
| type: DirectoryOrCreate | ||
| ``` | ||
|
|
||
| **Important**: Both pods must use the **exact same host path** to ensure the profiler can access and profile the target application processes. | ||
|
|
||
| ## Setup Instructions | ||
|
|
||
| ### Step 1: Clone Repository | ||
|
|
||
| Clone this repository to your laptop: | ||
| ```bash | ||
| git clone https://github.com/jrishabh248/aperf | ||
| cd aperf | ||
| ``` | ||
|
|
||
| ### Step 2: Verify Prerequisites | ||
|
|
||
| Ensure kubectl is connected to your EKS cluster: | ||
| ```bash | ||
| kubectl get nodes | ||
| ``` | ||
|
|
||
| Verify AWS credentials are configured: | ||
| ```bash | ||
| aws sts get-caller-identity | ||
| ``` | ||
|
|
||
| ### Step 3: Create ECR Repository | ||
|
|
||
| Create an Amazon ECR repository to contain the APerf image: | ||
|
|
||
| ```bash | ||
| # Optional: Set your AWS region if needed | ||
| # export AWS_REGION="us-west-2" | ||
|
|
||
| # Optional: Set AWS profile if needed | ||
| # export AWS_PROFILE="your-profile-name" | ||
|
|
||
| # Create ECR repository | ||
| aws ecr create-repository --repository-name aperf --region $AWS_REGION | ||
|
|
||
| # Get ECR repository URL | ||
| APERF_ECRREPO=$(aws ecr describe-repositories --repository-names aperf --region $AWS_REGION | jq -r '.repositories[0].repositoryUri') | ||
| echo "ECR Repository URL: $APERF_ECRREPO" | ||
|
|
||
| # Authenticate with ECR | ||
| aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $APERF_ECRREPO | ||
| ``` | ||
|
|
||
| ### Step 4: Build and Push APerf Container Image | ||
|
|
||
| Build the multi-architecture APerf container image and push it to ECR: | ||
|
|
||
| ```bash | ||
| # Build and push multi-architecture image (supports both AMD64 and ARM64) | ||
| docker buildx build --push --platform linux/amd64,linux/arm64 -t ${APERF_ECRREPO}:latest -f ./Dockerfile . | ||
| ``` | ||
|
|
||
| **Note**: If you don't have `docker buildx` configured for multi-platform builds, you can build for your specific architecture: | ||
|
|
||
| ```bash | ||
| docker build -t ${APERF_ECRREPO}:latest -f ./Dockerfile . | ||
| docker push ${APERF_ECRREPO}:latest | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Basic Usage | ||
|
|
||
| ```bash | ||
| ./eks-aperf.sh --aperf_image=<ECR_IMAGE_URL> --node=<NODE_NAME> | ||
| ``` | ||
|
|
||
| ### Advanced Usage | ||
|
|
||
| ```bash | ||
| ./eks-aperf.sh \ | ||
| --aperf_image=900063036704.dkr.ecr.ap-south-1.amazonaws.com/aperf \ | ||
| --node=ip-192-168-81-49.ap-south-1.compute.internal \ | ||
| --namespace=profiling \ | ||
| --aperf_options="--profile --profile-java -i 1 -p 150" \ | ||
| --cpu-request=2.0 \ | ||
| --memory-request=2Gi \ | ||
| --cpu-limit=8.0 \ | ||
| --memory-limit=8Gi | ||
| ``` | ||
|
|
||
| ### Identify Target Node | ||
|
|
||
| Find a target Kubernetes node where you want to collect APerf metrics: | ||
|
|
||
| ```bash | ||
| # List all pods with their nodes to identify a target node | ||
| kubectl get pods -A -o wide | ||
| ``` | ||
|
|
||
| Example node name: `ip-10-0-120-104.us-west-2.compute.internal` | ||
|
|
||
| ## Parameters | ||
|
|
||
| | Parameter | Required | Default | Description | | ||
| |-----------|----------|---------|-------------| | ||
| | `--aperf_image` | Yes | - | ECR image URL containing APerf profiler | | ||
| | `--node` | Yes | - | Kubernetes node name to run profiling on | | ||
| | `--namespace` | No | `default` | Kubernetes namespace for pod deployment | | ||
| | `--aperf_options` | No | `""` | Additional options to pass to APerf | | ||
| | `--cpu-request` | No | `1.0` | CPU request for the profiling pod | | ||
| | `--memory-request` | No | `1Gi` | Memory request for the profiling pod | | ||
| | `--cpu-limit` | No | `4.0` | CPU limit for the profiling pod | | ||
| | `--memory-limit` | No | `4Gi` | Memory limit for the profiling pod | | ||
| | `--help` | No | - | Show help message | | ||
|
|
||
| ## What the Script Does | ||
|
|
||
| 1. **Validates inputs**: Checks required parameters and cluster access | ||
| 2. **Node validation**: Verifies target node exists and shows instance type | ||
| 3. **Security check**: Validates namespace security policies for privileged pods | ||
| 4. **Resource monitoring**: Displays current resource usage on target node | ||
| 5. **Pod deployment**: Creates and deploys APerf profiling pod on target node | ||
| 6. **Profiling execution**: Runs APerf record with specified options | ||
| 7. **Report generation**: Creates APerf report and packages it as tar.gz | ||
| 8. **File retrieval**: Copies profiling results to local directory | ||
| 9. **Cleanup**: Removes the profiling pod from the cluster | ||
|
|
||
| ## Pod Configuration | ||
|
|
||
| The script creates a privileged pod with the following characteristics: | ||
|
|
||
| - **Privileged security context**: Required for system-level profiling | ||
| - **Host PID namespace**: Access to host processes | ||
| - **Host network**: Direct network access | ||
| - **Volume mounts**: | ||
| - `/boot` (read-only): Access to kernel symbols | ||
| - `/opt/async-profiler`: APerf profiler binaries | ||
| - **Node selector**: Ensures pod runs on specified node | ||
|
|
||
| ## Example Commands | ||
|
|
||
| ### Profile for 60 seconds with profiling enabled | ||
| ```bash | ||
| bash ./eks-aperf.sh \ | ||
| --aperf_image="${APERF_ECRREPO}:latest" \ | ||
| --node="ip-10-0-120-104.us-west-2.compute.internal" \ | ||
| --aperf_options="-p 60 --profile" \ | ||
| --namespace="aperf" | ||
| ``` | ||
|
|
||
| ### Profile with custom resource limits | ||
| ```bash | ||
| bash ./eks-aperf.sh \ | ||
| --aperf_image="${APERF_ECRREPO}:latest" \ | ||
| --node="ip-10-0-120-104.us-west-2.compute.internal" \ | ||
| --cpu-request="2.0" \ | ||
| --memory-request="2Gi" \ | ||
| --cpu-limit="8.0" \ | ||
| --memory-limit="8Gi" | ||
| ``` | ||
|
|
||
| ### Profile Java applications with 1-second intervals | ||
| ```bash | ||
| ./eks-aperf.sh \ | ||
| --aperf_image=your-ecr-repo/aperf:latest \ | ||
| --node=ip-10-0-1-100.ec2.internal \ | ||
| --aperf_options="--profile-java -i 1 -d 60" | ||
| ``` | ||
|
|
||
| ## Output Files | ||
|
|
||
| The script generates timestamped files: | ||
| - `aperf_report_YYYYMMDD-HHMMSS.tar.gz`: Complete profiling report archive | ||
|
|
||
| ## Example Output | ||
|
|
||
| ```bash | ||
| $ bash ./eks-aperf.sh --aperf_image="${APERF_ECRREPO}:latest" --namespace=aperf --node ip-10-0-120-104.us-west-2.compute.internal --aperf_options="-p 30 --profile" | ||
|
|
||
| Tageted node instance type... m6g.8xlarge | ||
| Check namespace security policy... Namespace 'aperf' has 'privileged' policy - privileged pods allowed. | ||
| Resource usage for pods on ip-10-0-120-104.us-west-2.compute.internal: | ||
| NAMESPACE NAME CPU(cores) MEMORY(bytes) | ||
| kube-system aws-node-fddbt 3m 58Mi | ||
| kube-system ebs-csi-node-dgjl9 1m 31Mi | ||
| kube-system kube-proxy-ct4n5 1m 15Mi | ||
| mongodb mongodb-5bd6669d6b-kmn4w 723m 1636Mi | ||
|
|
||
| Created pod configuration for node: ip-10-0-120-104.us-west-2.compute.internal | ||
| Deploying pod to Kubernetes... pod/aperf-pod-ip-10-0-120-104-us-west-2-compute-internal created | ||
| Waiting for pod to start... pod/aperf-pod-ip-10-0-120-104-us-west-2-compute-internal condition met | ||
| Running aperf profiling... | ||
| Generating aperf report... | ||
| Copying files from pod aperf-pod-ip-10-0-120-104-us-west-2-compute-internal... | ||
| Deleting pod to clean up resources... pod "aperf-pod-ip-10-0-120-104-us-west-2-compute-internal" deleted | ||
| Files copied to aperf_report_20250626-133204.tar.gz | ||
| Done! | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Common Issues | ||
|
|
||
| 1. **"Privileged pods NOT allowed"** | ||
| - The namespace has security restrictions | ||
| - Use a namespace that allows privileged pods or modify security policies | ||
|
|
||
| 2. **"Pod failed to reach ready state"** | ||
| - Check node resources and availability | ||
| - Verify ECR image accessibility from the node | ||
| - Review pod logs for specific errors | ||
|
|
||
| 3. **"No aperf report directory found"** | ||
| - APerf profiling may have failed | ||
| - Check APerf options and target processes | ||
| - Verify profiler has necessary permissions | ||
|
|
||
| ### Getting Node Names | ||
| ```bash | ||
| kubectl get nodes | ||
| ``` | ||
|
|
||
| ### Checking Namespace Security Policies | ||
| ```bash | ||
| kubectl get namespace <namespace> -o yaml | grep -A5 labels | ||
| ``` | ||
|
|
||
| ## Security Considerations | ||
|
|
||
| - The APerf pod runs with privileged access to collect all system-level metrics | ||
| - The pod has access in read-only mode to the host's `/boot` directory and processes PIDs | ||
| - Ensure your cluster's security policies allow privileged pods if required | ||
| - The pod is automatically cleaned up after execution | ||
| - Only use on trusted clusters and nodes | ||
| - Ensure ECR images are from trusted sources | ||
| - Monitor resource usage during profiling | ||
|
|
||
|
|
||
|
|
||
| ## Requirements Summary | ||
|
|
||
| - Kubernetes cluster with worker nodes | ||
| - kubectl access with pod creation permissions | ||
| - ECR access for pulling profiler images | ||
| - Sufficient node resources for profiling overhead | ||
| - **Worker Node Setup**: | ||
| - Host directory `/opt/k8s/async-profiler/async-profiler-4.2-linux-arm64` must exist | ||
| - Async Profiler binaries installed in the host path | ||
| - Proper file permissions for profiler execution | ||
| - **Application Pod Configuration**: | ||
| - Application pods must mount the same async profiler host path | ||
| - Both application and APerf pods need identical volume mount configuration | ||
| - Shared host path enables profiler access to target processes | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title says "Updated the Docker File, eks-aperf.sh and adding eks-aperf-README.md", but seems the change only includes readme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar pull request was opened here: https://github.com/aws/aperf/pull/355/files