Observability in EKS Anywhere

Monitoring, Logging, and Tracing for EKS Anywhere Clusters.

1: Overview
2: Verify EKS Anywhere cluster status
3: Connect EKS Anywhere clusters to the EKS console
4: Configure Fluent Bit for CloudWatch
5: Expose metrics for EKS Anywhere components

1 - Overview

Overview of observability in EKS Anywhere

Most Kubernetes-conformant observability tools can be used with EKS Anywhere. You can optionally use the EKS Connector to view your EKS Anywhere cluster resources in the Amazon EKS console; see the Connect to console page for details. EKS Anywhere includes the AWS Distro for OpenTelemetry (ADOT) and Prometheus for metrics and tracing as EKS Anywhere Curated Packages. You can use popular tooling such as Fluent Bit for logging, and you can track the progress of logging support in ADOT on the AWS Observability roadmap. For more information on EKS Anywhere Curated Packages, see the Package Management Overview.

AWS Integrations

AWS offers comprehensive monitoring, logging, alarming, and dashboard capabilities through services such as Amazon CloudWatch, Amazon Managed Prometheus (AMP), and Amazon Managed Grafana (AMG). With CloudWatch, you can take advantage of a highly scalable, AWS-native centralized logging and monitoring solution for EKS Anywhere clusters. With AMP and AMG, you can monitor your containerized applications on EKS Anywhere clusters at scale with the popular Prometheus and Grafana interfaces.

2 - Verify EKS Anywhere cluster status

Verify the status of EKS Anywhere clusters

Note

To check the status of a single cluster, configure kubectl to communicate with the cluster by setting the KUBECONFIG environment variable to point to your cluster’s kubeconfig file.
To check the status of workload clusters from a management cluster, configure kubectl with the kubeconfig of the management cluster.

Check cluster nodes

To verify that the expected number of cluster nodes are present and running, use the kubectl command to show that the nodes are Ready.

Worker nodes are named using the cluster name followed by the worker node group name. In the example below, the cluster name is mgmt and the worker node group name is md-0. The other nodes shown in the response are control plane or etcd nodes.

kubectl get nodes

NAME                              STATUS   ROLES           AGE     VERSION
mgmt-clrt4                        Ready    control-plane   3d22h   v1.27.1-eks-61789d8
mgmt-md-0-5557f7c7bxsjkdg-l2kpt   Ready    <none>          3d22h   v1.27.1-eks-61789d8
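
To turn the visual check above into something a script can act on, a small filter can count the nodes that are not Ready. This is a minimal sketch: the function name `count_not_ready_nodes` is hypothetical, and it assumes the default `kubectl get nodes --no-headers` column layout, where the second column is STATUS.

```shell
# Count nodes whose STATUS column is not exactly "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
count_not_ready_nodes() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}
```

For example, `kubectl get nodes --no-headers | count_not_ready_nodes` prints `0` when every node is Ready. A cordoned node reports `Ready,SchedulingDisabled`, so this sketch counts it as not Ready; relax the comparison if that is not what you want.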

Check cluster machines

To verify that the expected number of cluster machines are present and running, use the kubectl command to show that the machines are Running.

The machine objects are named using the cluster name as a prefix and there should be one created for each node in your cluster. In the example below, the command was run against a management cluster with a single attached workload cluster. When the command is run against a management cluster, all machines for the management cluster and attached workload clusters are shown.

kubectl get machines -A

NAMESPACE     NAME                              CLUSTER   NODENAME                          PROVIDERID                                       PHASE     AGE     VERSION
eksa-system   mgmt-clrt4                        mgmt      mgmt-clrt4                        vsphere://421a801c-ac46-f47e-de1f-f070ef990c4d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   mgmt-md-0-5557f7c7bxsjkdg-l2kpt   mgmt      mgmt-md-0-5557f7c7bxsjkdg-l2kpt   vsphere://421a4b9b-c457-fc4d-458a-d5092f981c5d   Running   3d22h   v1.27.1-eks-1-27-4
eksa-system   w01-7hzfh                         w01       w01-7hzfh                         vsphere://421a642b-f4ef-5764-47f9-5b56efcf8a4b   Running   15h     v1.27.1-eks-1-27-4
eksa-system   w01-etcd-z2ggk                    w01                                         vsphere://421ac003-3a1a-7dd9-ac83-bd0c75370cc4   Running   15h     
eksa-system   w01-md-0-799ffd7946x5gz8w-p94mt   w01       w01-md-0-799ffd7946x5gz8w-p94mt   vsphere://421a7b77-ca57-dc78-18bf-f361081a2c5e   Running   15h     v1.27.1-eks-1-27-4
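
The table output above is awkward to parse by column position, because the etcd machine has an empty NODENAME field and its remaining columns shift left. One way around this, sketched below, is to ask kubectl for only the name and phase as tab-separated values and filter those. The function name `machines_not_running` is hypothetical.

```shell
# Print the names of machines whose phase is not "Running".
# Expects tab-separated "name<TAB>phase" lines on stdin.
machines_not_running() {
  awk -F '\t' '$2 != "Running" { print $1 }'
}
```

A possible invocation is `kubectl get machines -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' | machines_not_running`, which prints nothing when all machines are Running.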

Check cluster components

To verify cluster components are present and running, use the kubectl command to show that the system Pods are Running. The number of Pods may vary based on the infrastructure provider (vSphere, bare metal, Snow, Nutanix, CloudStack), and whether the cluster is a workload cluster or a management cluster.

kubectl get pods -A

NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d22h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d22h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d22h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (16h ago)   3d22h
cert-manager                        cert-manager-548579646f-4tgb2                                    1/1     Running   0             3d22h
cert-manager                        cert-manager-cainjector-cbb6df554-w5fjx                          1/1     Running   0             3d22h
cert-manager                        cert-manager-webhook-54f748c89b-qnfr2                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-4c7mk                            1/1     Running   0             3d22h
eksa-packages                       ecr-credential-provider-package-nvlkb                            1/1     Running   0             3d22h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d22h
eksa-system                         eksa-controller-manager-76f484bd5b-x6qld                         1/1     Running   0             3d22h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d22h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d22h
kube-system                         cilium-lbqdt                                                     1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-jvrnh                                 1/1     Running   0             3d22h
kube-system                         cilium-operator-55c4778776-wjjrk                                 1/1     Running   0             3d22h
kube-system                         cilium-psqm2                                                     1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-kdtjc                                         1/1     Running   0             3d22h
kube-system                         coredns-69797695c4-r25vv                                         1/1     Running   0             3d22h
kube-system                         etcd-mgmt-clrt4                                                  1/1     Running   0             3d22h
kube-system                         kube-apiserver-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-controller-manager-mgmt-clrt4                               1/1     Running   0             3d22h
kube-system                         kube-proxy-588gj                                                 1/1     Running   0             3d22h
kube-system                         kube-proxy-hrksw                                                 1/1     Running   0             3d22h
kube-system                         kube-scheduler-mgmt-clrt4                                        1/1     Running   0             3d22h
kube-system                         kube-vip-mgmt-clrt4                                              1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-7vzjx                           1/1     Running   0             3d22h
kube-system                         vsphere-cloud-controller-manager-cqfs5                           1/1     Running   0             3d22h
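
With this many Pods, scanning the list by eye is error-prone. The sketch below prints only the Pods that are not fully ready. It assumes the default `kubectl get pods -A --no-headers` layout (column 3 is READY, column 4 is STATUS), and the function name `pods_not_ready` is hypothetical.

```shell
# Print "namespace/name" for pods that are not fully ready.
# Expects `kubectl get pods -A --no-headers` output on stdin, where
# column 3 is READY (for example 1/1) and column 4 is STATUS.
# Pods in the Completed state are skipped, since finished Jobs are expected.
pods_not_ready() {
  awk '{
    split($3, r, "/")
    if ($4 != "Completed" && ($4 != "Running" || r[1] != r[2])) print $1 "/" $2
  }'
}
```

Running `kubectl get pods -A --no-headers | pods_not_ready` on the healthy cluster shown above would print nothing.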

Check control plane components

You can verify the control plane is present and running by filtering Pods by the control-plane=controller-manager label.

kubectl get pod -A -l control-plane=controller-manager

NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS      AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-8665b88c65-v982t       1/1     Running   0             3d21h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-67595c55d8-z7627   1/1     Running   0             3d21h
capi-system                         capi-controller-manager-88bdd56b4-wnk66                          1/1     Running   0             3d21h
capv-system                         capv-controller-manager-644d9864dc-hbrcz                         1/1     Running   1 (15h ago)   3d21h
eksa-packages                       eks-anywhere-packages-784c6fc8b9-2t5nr                           1/1     Running   0             3d21h
etcdadm-bootstrap-provider-system   etcdadm-bootstrap-provider-controller-manager-6bcdd4f5d7-wvqw8   1/1     Running   0             3d21h
etcdadm-controller-system           etcdadm-controller-controller-manager-6f96f5d594-kqnfw           1/1     Running   0             3d21h

Check workload clusters from management clusters

Set the CLUSTER_NAME and KUBECONFIG environment variables for the management cluster:

export CLUSTER_NAME=mgmt
export KUBECONFIG=${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig

Check control plane resources for all clusters

Use the command below to check the status of cluster control plane resources. This is useful to verify clusters with multiple control plane nodes after an upgrade. The status for the management cluster and all attached workload clusters is shown.

kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io -n eksa-system

NAME   CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
mgmt   mgmt      true          true                   1          1       1                       3d22h   v1.27.1-eks-1-27-4
w01    w01       true          true                   1          1       1         0             16h     v1.27.1-eks-1-27-4

Use the command below to check the status of a cluster resource. This is useful to verify cluster health after any mutating cluster lifecycle operation. The status for the management cluster and all attached workload clusters is shown.

kubectl get clusters.cluster.x-k8s.io -A -o=custom-columns=NAME:.metadata.name,CONTROLPLANE-READY:.status.controlPlaneReady,INFRASTRUCTURE-READY:.status.infrastructureReady,MANAGED-EXTERNAL-ETCD-INITIALIZED:.status.managedExternalEtcdInitialized,MANAGED-EXTERNAL-ETCD-READY:.status.managedExternalEtcdReady

NAME   CONTROLPLANE-READY   INFRASTRUCTURE-READY   MANAGED-EXTERNAL-ETCD-INITIALIZED   MANAGED-EXTERNAL-ETCD-READY
mgmt   true                 true                   <none>                              <none>
w01    true                 true                   true                                true

3 - Connect EKS Anywhere clusters to the EKS console

Connect an EKS Anywhere cluster to the EKS console

The EKS Connector lets you connect your EKS Anywhere cluster to the EKS console. The connected console displays the EKS Anywhere cluster, its configuration, workloads, and their status. EKS Connector is a software agent that runs on your EKS Anywhere cluster and registers the cluster with the EKS console.

Visit the EKS Connector documentation for details on how to configure and run the EKS Connector.

4 - Configure Fluent Bit for CloudWatch

Using Fluent Bit for logging with EKS Anywhere clusters and CloudWatch

Fluent Bit is an open source, multi-platform log processor and forwarder which allows you to collect data/logs from different sources, then unify and send them to multiple destinations. It’s fully compatible with Docker and Kubernetes environments. Due to its lightweight nature, using Fluent Bit as the log forwarder for EKS Anywhere clusters enables you to stream application logs into Amazon CloudWatch Logs efficiently and reliably.

You can additionally use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices running on EKS Anywhere clusters. CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects.

On this page, we show how to set up Fluent Bit and Container Insights to send logs and metrics from your EKS Anywhere clusters to CloudWatch.

Prerequisites

An AWS Account (see AWS documentation to get started)
An EKS Anywhere cluster with IAM Roles for Service Accounts (IRSA) enabled: With IRSA, an IAM role can be associated with a Kubernetes service account. This service account can provide AWS permissions to the containers in any Pod that uses the service account, which enables the containers to securely communicate with AWS services. This removes the need to hardcode AWS security credentials as environment variables on your nodes. See the IRSA configuration page for details.

Note

The example uses eksapoc as the EKS Anywhere cluster name. You must adjust the configuration in the examples below if you use a different cluster name. Specifically, make sure to adjust the fluentbit.yaml manifest accordingly.
The example uses the us-west-2 AWS Region. You must adjust the configuration in the examples below if you are using a different region.

Before setting up Fluent Bit, first create an IAM Policy and Role to send logs to CloudWatch.

Step 1: Create IAM Policy

Go to IAM Policy in the AWS console.
Select the JSON tab in the policy editor.
Enter the policy below, then click Create Policy:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "EKSAnywhereLogging",
                    "Effect": "Allow",
                    "Action": "cloudwatch:*",
                    "Resource": "*"
                }
            ]
        }
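
If you prefer the AWS CLI to the console, the same policy can be created from a local file. The following is a sketch rather than the documented procedure: the file name `eksa-logging-policy.json` and the policy name `EKSAnywhereLogging` are assumptions chosen to match the role name used later on this page.

```shell
# Write the policy document from Step 1 to a local file.
# The file name is an assumption; override it with POLICY_FILE if needed.
POLICY_FILE="${POLICY_FILE:-eksa-logging-policy.json}"
cat > "$POLICY_FILE" << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EKSAnywhereLogging",
            "Effect": "Allow",
            "Action": "cloudwatch:*",
            "Resource": "*"
        }
    ]
}
EOF

# Create the policy from that file. Defined as a function so that nothing
# is sent to AWS until you call it explicitly.
create_logging_policy() {
  aws iam create-policy \
    --policy-name EKSAnywhereLogging \
    --policy-document "file://$POLICY_FILE"
}
```

Calling `create_logging_policy` with valid AWS credentials returns the new policy's ARN, which you need when attaching the policy to the role in Step 2.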

Step 2: Create IAM Role

Go to IAM Role in the AWS console.
Create a new role with the following settings:

In Identity Provider, enter the OIDC provider you created as part of the IRSA configuration.

In Audience, select sts.amazonaws.com. Click Next.
Select the permission policy that you created in Step 1: Create IAM Policy.
Provide the role name EKSAnywhereLogging and click Next.
Copy the role ARN and save it locally for the next step.
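
The console steps above produce a role whose trust policy allows the cluster's OIDC provider to assume it with the sts.amazonaws.com audience. As an illustration of what that trust policy looks like, the sketch below renders it from an account ID and an OIDC issuer. The function name `render_trust_policy` is hypothetical, and the values used in the example (`111122223333`, `oidc.example.com`) are placeholders, not values from this guide.

```shell
# Render an IAM trust policy for a role assumed through an OIDC provider.
#   $1: AWS account ID (12 digits)
#   $2: OIDC issuer host and path, without the https:// prefix
render_trust_policy() {
  cat << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$1:oidc-provider/$2"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "$2:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}
```

For example, `render_trust_policy 111122223333 oidc.example.com > trust.json` writes the document to a file that can be passed to `aws iam create-role --role-name EKSAnywhereLogging --assume-role-policy-document file://trust.json`. The permission policy from Step 1 can then be attached with `aws iam attach-role-policy`.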

Step 3: Install Fluent Bit

Create the amazon-cloudwatch namespace using this command:
```
kubectl create namespace amazon-cloudwatch
```

Create the Service Accounts for cloudwatch-agent and fluent-bit in the amazon-cloudwatch namespace. This step uses the role ARN that you saved earlier. Replace $RoleARN with your actual value.

cat << EOF | kubectl apply -f -
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
  annotations:
    # set this with value of OIDC_IAM_ROLE
    eks.amazonaws.com/role-arn: "$RoleARN"
    # optional: Defaults to "sts.amazonaws.com" if not set
    eks.amazonaws.com/audience: "sts.amazonaws.com"
    # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var
    #   to containers
    eks.amazonaws.com/sts-regional-endpoints: "true"
    # optional: Defaults to 86400 for expirationSeconds if not set
    #   Note: This value can be overwritten if specified in the pod
    #         annotation as shown in the next step.
    eks.amazonaws.com/token-expiration: "86400"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  annotations:
    # set this with value of OIDC_IAM_ROLE
    eks.amazonaws.com/role-arn: "$RoleARN"
    # optional: Defaults to "sts.amazonaws.com" if not set
    eks.amazonaws.com/audience: "sts.amazonaws.com"
    # optional: When set to "true", adds AWS_STS_REGIONAL_ENDPOINTS env var
    #   to containers
    eks.amazonaws.com/sts-regional-endpoints: "true"
    # optional: Defaults to 86400 for expirationSeconds if not set
    #   Note: This value can be overwritten if specified in the pod
    #         annotation as shown in the next step.
    eks.amazonaws.com/token-expiration: "86400"
EOF

The above command creates two Service Accounts:

serviceaccount/cloudwatch-agent created
serviceaccount/fluent-bit created
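
Because the heredoc above uses an unquoted EOF delimiter, the shell substitutes $RoleARN before kubectl receives the manifest. If the variable is unset, the Service Accounts are still created, but with an empty role-arn annotation. A quick format check before applying can catch this. The sketch below is illustrative: the function name `is_role_arn` is hypothetical, and it only accepts ARNs in the standard `aws` partition.

```shell
# Return 0 if $1 looks like an IAM role ARN, non-zero otherwise.
# Only the standard "aws" partition is accepted in this sketch.
is_role_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$'
}
```

For example, `is_role_arn "$RoleARN" || echo "RoleARN is not set to a role ARN"` prints a warning when the variable is empty or malformed.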

Now deploy Fluent Bit in your EKS Anywhere cluster to scrape and send logs to CloudWatch:

kubectl apply -f "https://anywhere.eks.amazonaws.com/manifests/fluentbit.yaml"

You should see the following output:

clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role changed
clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding changed
configmap/cwagentconfig changed
daemonset.apps/cloudwatch-agent changed
configmap/fluent-bit-cluster-info changed
clusterrole.rbac.authorization.k8s.io/fluent-bit-role changed
clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-role-binding changed
configmap/fluent-bit-config changed
daemonset.apps/fluent-bit changed

You can verify the DaemonSets have been deployed with the following command:
```
kubectl -n amazon-cloudwatch get daemonsets
```

If you are running the EKS Connector, you can also verify the status of the DaemonSets by logging in to the AWS console and navigating to Amazon EKS -> Cluster -> Resources -> DaemonSets.

Step 4: Deploy a test application

Deploy a simple test application to verify your setup is working properly.
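
This page does not name a specific test application. One option, shown here as an assumption rather than a requirement, is the hello-eks-a sample manifest used elsewhere in the EKS Anywhere documentation; any workload that writes logs to stdout would serve the same purpose. The variable `TEST_APP_MANIFEST` and the function `deploy_test_app` are hypothetical names.

```shell
# URL of a sample manifest. This value is an assumption, not taken from
# this page; override TEST_APP_MANIFEST to deploy a different workload.
TEST_APP_MANIFEST="${TEST_APP_MANIFEST:-https://anywhere.eks.amazonaws.com/manifests/hello-eks-a.yaml}"

# Apply the manifest to the cluster that kubectl currently points at.
deploy_test_app() {
  kubectl apply -f "$TEST_APP_MANIFEST"
}
```

After calling `deploy_test_app`, send a few requests to the application so that it produces log lines for Fluent Bit to forward.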

Step 5: View cluster logs and metrics

CloudWatch Logs

Open the CloudWatch console. The link opens the console and displays your currently available log groups.
Choose the EKS Anywhere cluster name that you want to view logs for. The log group name format is /aws/containerinsights/my-EKS-Anywhere-cluster/cluster.

The log group /aws/containerinsights/my-EKS-Anywhere-cluster/application contains logs sourced from /var/log/containers.

The log group /aws/containerinsights/my-EKS-Anywhere-cluster/dataplane contains logs for kubelet.service, kubeproxy.service, and docker.service.
To view the logs of the deployed test application, click the application log group, then click Search All.
Type HTTP 1.1 200 in the search box and press Enter. Matching log entries from the test application should appear.
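
The log group names follow a fixed pattern, so they can be derived from the cluster name when scripting against the AWS CLI. The helper below is a small sketch that covers only the application and dataplane groups described above; the function name `container_insights_log_groups` is hypothetical.

```shell
# Print the application and dataplane log group names for a cluster.
#   $1: EKS Anywhere cluster name
container_insights_log_groups() {
  for suffix in application dataplane; do
    printf '/aws/containerinsights/%s/%s\n' "$1" "$suffix"
  done
}
```

With the example cluster name from this page, `container_insights_log_groups eksapoc` prints `/aws/containerinsights/eksapoc/application` and `/aws/containerinsights/eksapoc/dataplane`. Either name can be passed to `aws logs filter-log-events --log-group-name <name> --filter-pattern '"200"'` to search from the command line instead of the console.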

CloudWatch Container Insights

Open the CloudWatch console. The link opens the Container Insights performance monitoring console and displays a dropdown to select your EKS clusters.

For more details on CloudWatch Logs, refer to What is Amazon CloudWatch Logs?

5 - Expose metrics for EKS Anywhere components

Expose metrics for EKS Anywhere components

Some Kubernetes system components, such as kube-controller-manager, kube-scheduler, kube-proxy, and etcd (stacked), expose metrics only on localhost by default. To expose metrics for these components so that monitoring systems like Prometheus can scrape them, you can deploy a proxy as a DaemonSet on the host network of the nodes. The proxy Pods also need to be configured with control plane tolerations so that they can be scheduled on the control plane nodes.

For etcd metrics, the steps outlined below apply only to a stacked etcd setup. For unstacked (external) etcd, metrics are already exposed on the https://<etcd-machine-ip>:2379/metrics endpoint and can be scraped by Prometheus directly without deploying a proxy.

Configure Proxy

To configure a proxy for exposing metrics on an EKS Anywhere cluster, you can perform the following steps:

Create a config map to store the proxy configuration.

Below is an example ConfigMap if you use HAProxy as the proxy server.

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy
data:
  haproxy.cfg: |
    defaults
      mode http
      timeout connect 5000ms
      timeout client 5000ms
      timeout server 5000ms
      default-server maxconn 10

    frontend kube-proxy
      bind \${NODE_IP}:10249
      http-request deny if !{ path /metrics }
      default_backend kube-proxy
    backend kube-proxy
      server kube-proxy 127.0.0.1:10249 check

    frontend kube-controller-manager
      bind \${NODE_IP}:10257
      http-request deny if !{ path /metrics }
      default_backend kube-controller-manager
    backend kube-controller-manager
      server kube-controller-manager 127.0.0.1:10257 ssl verify none check

    frontend kube-scheduler
      bind \${NODE_IP}:10259
      http-request deny if !{ path /metrics }
      default_backend kube-scheduler
    backend kube-scheduler
      server kube-scheduler 127.0.0.1:10259 ssl verify none check

    frontend etcd
      bind \${NODE_IP}:2381
      http-request deny if !{ path /metrics }
      default_backend etcd
    backend etcd
      server etcd 127.0.0.1:2381 check
EOF

Create a DaemonSet for the proxy and mount the config map volume onto the proxy Pods.

Below is an example configuration for the HAProxy DaemonSet.

cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      hostNetwork: true
      containers:
        - name: haproxy
          image: public.ecr.aws/eks-anywhere/kubernetes-sigs/kind/haproxy:v0.20.0-eks-a-54
          env:
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          ports:
            - name: kube-proxy
              containerPort: 10249
            - name: kube-ctrl-mgr
              containerPort: 10257
            - name: kube-scheduler
              containerPort: 10259
            - name: etcd
              containerPort: 2381
          volumeMounts:
            - mountPath: "/usr/local/etc/haproxy"
              name: haproxy-config
      volumes:
        - configMap:
            name: metrics-proxy
          name: haproxy-config
EOF

Configure Client Permissions

Create a new cluster role for the client to access the metrics endpoint of the components.

cat << EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
EOF

Create a new cluster role binding to bind the above cluster role to the client pod’s service account.

cat << EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: metrics-reader
  apiGroup: rbac.authorization.k8s.io
EOF

Verify that the metrics are exposed to the client pods by running the following commands:

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  containers:
  - command:
    - /bin/sleep
    - infinity
    image: curlimages/curl:latest
    name: test-container
    env:
    - name: NODE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
EOF

# Open a shell in the test pod
kubectl exec -it test-pod -- sh
# Run the remaining commands inside the pod
export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10257/metrics"
curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10259/metrics"
curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:10249/metrics"
curl -H "Authorization: Bearer ${TOKEN}" "http://${NODE_IP}:2381/metrics"
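
The four curl commands above each print a full metrics payload, which makes it hard to see at a glance whether every endpoint answered. As a sketch, the loop below reports only the HTTP status code per component. It uses the ports from the HAProxy frontends configured earlier on this page and assumes TOKEN and NODE_IP are set inside the test pod as shown above; the function names `metrics_urls` and `check_metrics` are hypothetical.

```shell
# Print "<component> <url>" for each proxied metrics endpoint on a node.
#   $1: node IP address (NODE_IP inside the test pod)
metrics_urls() {
  for entry in kube-proxy:10249 kube-controller-manager:10257 kube-scheduler:10259 etcd:2381; do
    printf '%s http://%s:%s/metrics\n' "${entry%%:*}" "$1" "${entry##*:}"
  done
}

# Request each endpoint and print "<component> <http status code>".
# Requires curl and a TOKEN variable holding the service account token.
check_metrics() {
  metrics_urls "$1" | while read -r name url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer ${TOKEN}" "$url")
    printf '%s %s\n' "$name" "$code"
  done
}
```

Inside the pod, `check_metrics "$NODE_IP"` prints one line per component. The functions use only POSIX shell features, so they should also work in the minimal shell of the curl image.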