Install Vector on Kubernetes
Kubernetes, also known as k8s, is an open source container orchestration system for automating application deployment, scaling, and management. This page covers installing and managing Vector on the Kubernetes platform.
The agent role is designed to collect all log data on each Kubernetes Node. Vector runs as a DaemonSet and tails logs for the entire Pod, automatically enriching those logs with Kubernetes metadata via the Kubernetes API. Collection is handled automatically, and you are expected to adjust your pipeline as necessary using Vector’s sources, transforms, and sinks.
We recommend running Vector in its own Kubernetes namespace. In the instructions here we’ll use
vector as a namespace but you’re free to choose your own.
kubectl create namespace --dry-run=client -o yaml vector > namespace.yaml
cat <<-'KUSTOMIZATION' > kustomization.yaml
# Override the namespace of all of the resources we manage.
namespace: vector

bases:
  # Include Vector recommended base (from git).
  - github.com/timberio/vector/distribution/kubernetes/kubectl?ref=v0.15.0

images:
  # Override the Vector image to avoid use of the sliding tag.
  - name: timberio/vector
    newName: timberio/vector
    newTag: v0.15.0-debian

resources:
  # A namespace to keep the resources at.
  - namespace.yaml

configMapGenerator:
  # Provide a custom `ConfigMap` for Vector.
  - name: vector-agent-config
    files:
      - vector.toml

generatorOptions:
  # We don't want a suffix for the `ConfigMap` name.
  disableNameSuffixHash: true
KUSTOMIZATION
cat <<-'VECTORCFG' > vector.toml
# The Vector Kubernetes integration automatically defines a
# `kubernetes_logs` source that is made available to you.
# You do not need to define a log source.
VECTORCFG
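The comment-only vector.toml above collects logs but sends them nowhere. As a minimal sketch of adjusting the pipeline, the following writes a vector.toml that routes the built-in kubernetes_logs source to a console sink. The sink name stdout and the choice of a console sink are assumptions made for illustration; substitute whichever sink fits your environment.

```shell
# Sketch only: overwrite vector.toml with a pipeline that prints collected
# logs to Vector's own stdout. The sink name `stdout` is hypothetical.
cat <<'VECTORCFG' > vector.toml
# The `kubernetes_logs` source is defined by the Vector Kubernetes
# integration, so only the sink is declared here.
[sinks.stdout]
type = "console"
inputs = ["kubernetes_logs"]
encoding.codec = "json"
VECTORCFG
```

If you use a file like this, write it before applying the kustomization so that the generated ConfigMap picks it up.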
kubectl apply -k .
kubectl logs -n vector daemonset/<controller-resource-name>
Vector is an end-to-end observability data pipeline designed to deploy under various roles. You mix and match these roles to create topologies. The intent is to make Vector as flexible as possible, allowing you to fluidly integrate Vector into your infrastructure over time. The deployment section demonstrates common Vector pipelines.
Vector checkpoints the current read position after each successful read. This ensures that Vector resumes where it left off when it’s restarted, which prevents data from being read twice. The checkpoint positions are stored in the data directory which is specified via the global
data_dir option, but can be overridden via the
data_dir option in the file source directly.
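The two levels of data_dir described above might look like the following in a configuration file. This is a sketch under stated assumptions: the file name, the source name app_logs, and all paths are hypothetical and chosen only to show a global setting next to a per-source override.

```shell
# Sketch only: a config fragment with a global `data_dir` and a
# source-level override. All names and paths are hypothetical.
cat <<'EXAMPLE' > data-dir-example.toml
# Global checkpoint directory used by default.
data_dir = "/var/lib/vector"

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
# Overrides the global `data_dir` for this source only.
data_dir = "/var/lib/vector/app_logs"
EXAMPLE
```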
The kubernetes_logs source can skip the logs from individual containers of a particular Pod. Add a vector.dev/exclude-containers annotation to the Pod and list the names of the containers to exclude, separated by commas, in the value of the annotation.
For example, an annotation whose value names container2 makes Vector skip logs originating from container2 of the annotated Pod, while logs from other containers in the Pod are still collected.
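To make the annotation concrete, here is a sketch of a Pod manifest that excludes container2. The Pod name, the file name, the images, and the commands are hypothetical; only the annotation key and the container name come from the text above.

```shell
# Sketch only: a Pod whose `container2` logs are skipped by Vector.
# Pod name, images, and commands are hypothetical.
cat <<'EXAMPLE' > pod-exclude-containers.yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  annotations:
    # Several names can be listed, separated by commas.
    vector.dev/exclude-containers: "container2"
spec:
  containers:
    - name: container1
      image: busybox
      command: ["sh", "-c", "while true; do echo collected; sleep 5; done"]
    - name: container2
      image: busybox
      command: ["sh", "-c", "while true; do echo skipped; sleep 5; done"]
EXAMPLE
```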
Vector enriches data with Kubernetes context. You can find a comprehensive list of fields in the
kubernetes_logs source output docs.
Vector provides rich filtering options for Kubernetes log collection:
- Built-in Pod and container exclusion rules
- The exclude_paths_glob_patterns option enables you to exclude Kubernetes log files by filename and path.
- The extra_field_selector option specifies the field selector to filter Pods with, to be used in addition to the built-in filter.
- The extra_label_selector option specifies the label selector to filter Pods with, to be used in addition to the built-in filter.
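The three options listed above could be combined as in the sketch below. It assumes you define your own kubernetes_logs source rather than relying on the one the integration provides; the source name k8s_filtered, the file name, and every selector and pattern value are hypothetical.

```shell
# Sketch only: a `kubernetes_logs` source using the filtering options.
# Source name and all selector/pattern values are hypothetical.
cat <<'EXAMPLE' > k8s-filter-example.toml
[sources.k8s_filtered]
type = "kubernetes_logs"
# Skip Pods carrying this (hypothetical) label value.
extra_label_selector = "app!=noisy-app"
# Skip Pods in this namespace.
extra_field_selector = "metadata.namespace!=kube-system"
# Skip log files matching these glob patterns.
exclude_paths_glob_patterns = ["**/*.gz", "**/*.tmp"]
EXAMPLE
```

Label and field selectors use the standard Kubernetes selector syntax, so any expression accepted by kubectl's --selector and --field-selector flags for Pods should be a reasonable starting point.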
Vector requires access to the Kubernetes API. Specifically, the
kubernetes_logs source uses the
/api/v1/pods endpoint to “watch” Pods from all namespaces.
Modern Kubernetes clusters run with a role-based access control (RBAC) scheme. RBAC-enabled clusters require some configuration to grant Vector the authorization to access Kubernetes API endpoints. As RBAC is currently the standard way of controlling access to the Kubernetes API, we ship the necessary configuration out of the box. See the ClusterRole, ClusterRoleBinding, and ServiceAccount in our kubectl YAML config and the rbac configuration in the Helm chart.
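For orientation, RBAC resources that grant watch access to Pods generally have the shape sketched below. This is not a copy of the configuration Vector ships: the resource names, the file name, and the single watch verb are assumptions based on the access requirement described above, and the shipped manifests may grant more.

```shell
# Sketch only: RBAC objects allowing a ServiceAccount to watch Pods in all
# namespaces. Resource names are hypothetical, not Vector's shipped config.
cat <<'EXAMPLE' > rbac-example.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vector-example
  namespace: vector
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector-example
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector-example
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector-example
subjects:
  - kind: ServiceAccount
    name: vector-example
    namespace: vector
EXAMPLE
```

A ClusterRole is used rather than a namespaced Role because the source watches Pods across all namespaces.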
If your cluster doesn’t use any access control scheme and doesn’t restrict access to the Kubernetes API, you don’t need to provide any extra configuration, as Vector should just work.
Clusters using a legacy ABAC scheme aren’t officially supported, although Vector might work if you configure access properly. We encourage you to switch to RBAC. If you use a custom access control scheme, make sure that Vector is granted access to the /api/v1/pods endpoint.
Vector communicates with the Kubernetes API to enrich the data it collects with Kubernetes context. In order to do that, Vector needs access to the Kubernetes API server. If Vector is running in a Kubernetes cluster, Vector connects to that cluster using the Kubernetes-provided access information.
In addition to access, Vector implements proper desync handling to ensure that communication is safe and reliable. This ensures that Vector doesn’t overwhelm the Kubernetes API or compromise its stability.
Vector’s Helm chart deployments simplify the setup and maintenance of metrics pipelines in Kubernetes. Each of the Helm charts provides an internal_metrics source and a prometheus sink out of the box. Agent deployments also expose host_metrics via the same prometheus sink.
Charts come with options to enable Prometheus integration via annotations or Prometheus Operator integration via PodMonitor. The Prometheus
node_exporter agent isn’t required when the
host_metrics source is enabled.
By default, Vector merges partial messages that are split due to the Docker size limit. For everything else, we recommend that you use the
reduce transform, which enables you to handle custom merging of things like stacktraces.
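As an illustration of custom merging, the sketch below configures a reduce transform that joins stack trace continuation lines onto the line that started them. It rests on several assumptions: that a new event starts whenever the message does not begin with whitespace, that events carry kubernetes.pod_name and kubernetes.container_name fields, and that the transform name and file name (both hypothetical) suit your setup. Check the reduce transform reference for your Vector version before relying on it.

```shell
# Sketch only: merge indented continuation lines (for example, stack
# traces) into the preceding event. Names and the regex are hypothetical.
cat <<'EXAMPLE' > reduce-example.toml
[transforms.merge_stacktraces]
type = "reduce"
inputs = ["kubernetes_logs"]
# Keep lines from different containers apart while merging.
group_by = ["kubernetes.pod_name", "kubernetes.container_name"]

# Assumption: a line that does not start with whitespace begins a new event.
[transforms.merge_stacktraces.starts_when]
type = "vrl"
source = '''match(string!(.message), r'^\S')'''

[transforms.merge_stacktraces.merge_strategies]
# Join the merged lines with newlines instead of keeping only the first.
message = "concat_newline"
EXAMPLE
```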
By default, the
kubernetes_logs source skips logs from Pods that have a
vector.dev/exclude: "true" label. You can configure additional exclusion rules via label or field selectors. See the available options.
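A Pod that opts out of collection through the built-in label could be declared as in this sketch. The Pod name, file name, image, and command are hypothetical; only the label key and value come from the text above.

```shell
# Sketch only: a Pod that Vector skips because of the exclusion label.
# Pod name, image, and command are hypothetical.
cat <<'EXAMPLE' > pod-exclude.yaml
apiVersion: v1
kind: Pod
metadata:
  name: excluded-pod
  labels:
    # The value must be the string "true", so keep the quotes.
    vector.dev/exclude: "true"
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "while true; do echo not collected; sleep 5; done"]
EXAMPLE
```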
To ensure that all data is collected, Vector continues to collect logs from Pods for some time after their removal. This ensures that Vector obtains some of the most important data, such as crash details.
We recommend the resource limits listed below when running Vector on Kubernetes.
resources:
  requests:
    memory: "64Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"
    cpu: "6000m"
The kubernetes_logs component is stateless, which means that its behavior is consistent across each input.
For the agent role, Vector stores its state in the host-mapped directory with a static path. If it’s redeployed, it’s able to continue from where it was interrupted.
Vector is tested extensively against Kubernetes. In addition to Kubernetes being Vector’s most popular installation method, Vector implements a comprehensive end-to-end test suite for all minor Kubernetes versions beginning with 1.14.