Configuration

This section covers configuring Vector and creating pipelines like the one shown above. Vector requires only a single TOML configuration file, which you can specify via the --config flag when starting Vector:

vector --config /etc/vector/vector.toml

Example

vector.toml
# Set global options
data_dir = "/var/lib/vector"
# Ingest data by tailing one or more files
[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/*.log"] # supports globbing
ignore_older = 86400 # 1 day
# Structure and parse the data
[transforms.apache_parser]
inputs = ["apache_logs"]
type = "regex_parser" # fast/powerful regex
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
# Sample the data to save on cost
[transforms.apache_sampler]
inputs = ["apache_parser"]
type = "sampler"
hash_field = "request_id" # sample _entire_ requests
rate = 50 # only keep 50%
# Send structured data to a short-term storage
[sinks.es_cluster]
inputs = ["apache_sampler"] # only take sampled data
type = "elasticsearch"
host = "http://79.12.221.222:9200" # local or external host
index = "vector-%Y-%m-%d" # daily indices
# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
inputs = ["apache_parser"] # don't sample for S3
type = "aws_s3"
region = "us-east-1"
bucket = "my-log-archives"
key_prefix = "date=%Y-%m-%d" # daily partitions, hive friendly format
batch_size = 10000000 # 10mb uncompressed
gzip = true # compress final objects
encoding = "ndjson" # new line delimited JSON

Global Options

OPTIONAL

data_dir (string)
The directory used for persisting Vector state, such as on-disk buffers, file checkpoints, and more. Please make sure the Vector process has write permissions to this directory. See Data Directory for more info.
No default. Example: "/var/lib/vector"

Sources

file: Ingests data through one or more local files and outputs log events.
journald: Ingests data through log records from journald and outputs log events.
kafka: Ingests data through Kafka 0.9 or later and outputs log events.
statsd: Ingests data through the StatsD UDP protocol and outputs metric events.
stdin: Ingests data through standard input (STDIN) and outputs log events.
syslog: Ingests data through the Syslog 5424 protocol and outputs log events.
tcp: Ingests data through the TCP protocol and outputs log events.
udp: Ingests data through the UDP protocol and outputs log events.
vector: Ingests data through another upstream Vector instance and outputs log and metric events.

+ request a new source

Transforms

add_fields: Accepts log events and allows you to add one or more log fields.
add_tags: Accepts metric events and allows you to add one or more metric tags.
coercer: Accepts log events and allows you to coerce log fields into fixed types.
field_filter: Accepts log and metric events and allows you to filter events by a log field's value.
grok_parser: Accepts log events and allows you to parse a log field value with Grok.
json_parser: Accepts log events and allows you to parse a log field value as JSON.
log_to_metric: Accepts log events and allows you to convert logs into one or more metrics.
lua: Accepts log events and allows you to transform events with a full embedded Lua engine.
regex_parser: Accepts log events and allows you to parse a log field's value with a Regular Expression.
remove_fields: Accepts log events and allows you to remove one or more log fields.
remove_tags: Accepts metric events and allows you to remove one or more metric tags.
sampler: Accepts log events and allows you to sample events with a configurable rate.
tokenizer: Accepts log events and allows you to tokenize a log field's value by splitting on whitespace, ignoring special wrapping characters, and zipping the tokens into ordered field names.

+ request a new transform

Sinks

aws_cloudwatch_logs: Batches log events to AWS CloudWatch Logs via the PutLogEvents API endpoint.
aws_kinesis_streams: Batches log events to AWS Kinesis Data Streams via the PutRecords API endpoint.
aws_s3: Batches log events to AWS S3 via the PutObject API endpoint.
blackhole: Streams log and metric events to a blackhole that simply discards data, designed for testing and benchmarking purposes.
clickhouse: Batches log events to ClickHouse via the HTTP Interface.
console: Streams log and metric events to the console (STDOUT or STDERR).
elasticsearch: Batches log events to Elasticsearch via the _bulk API endpoint.
file: Streams log events to a file.
http: Batches log events to a generic HTTP endpoint.
kafka: Streams log events to Apache Kafka via the Kafka protocol.
prometheus: Exposes metric events to the Prometheus metrics service.
splunk_hec: Batches log events to a Splunk HTTP Event Collector.
tcp: Streams log events to a TCP connection.
vector: Streams log events to another downstream Vector instance.

+ request a new sink

How It Works

Composition

The primary purpose of the configuration file is to compose pipelines. Pipelines are formed by connecting sources, transforms, and sinks through the inputs option.

Notice in the above example that each inputs option references the id assigned to a previous source or transform.
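For instance, a minimal sketch of a two-stage pipeline (the component ids here are hypothetical, and only the options needed to show composition are included):

vector.toml
[sources.my_source]
type = "stdin"

[transforms.my_parser]
inputs = ["my_source"] # consume events from the source above
type = "json_parser"

[sinks.my_console]
inputs = ["my_parser"] # consume the parsed events from the transform above
type = "console"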

Config File Location

The location of your Vector configuration file depends on your platform or operating system. For most Linux-based systems the file can be found at /etc/vector/vector.toml.

Data Directory

Vector requires a data_dir value for on-disk operations. Currently, the only operations that use this directory are Vector's on-disk buffers. Buffers are memory-based by default, but if you switch them to disk-based you'll need to specify a data_dir.
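As a sketch, assuming a hypothetical sink id and buffer options named type and max_size, a disk-based buffer paired with a global data_dir might look like:

vector.toml
data_dir = "/var/lib/vector" # where disk buffers are persisted

[sinks.my_s3_sink]
inputs = ["apache_sampler"]
type = "aws_s3"
region = "us-east-1"
bucket = "my-log-archives"

[sinks.my_s3_sink.buffer]
type = "disk"        # switch from the default memory-based buffer
max_size = 104900000 # assumed option: maximum buffer size in bytes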

Environment Variables

Vector will interpolate environment variables within your configuration file with the following syntax:

vector.toml
[transforms.add_host]
type = "add_fields"
[transforms.add_host.fields]
host = "${HOSTNAME}"

The entire ${HOSTNAME} variable is replaced, hence the requirement for quotes around the definition.

Escaping

You can escape environment variables by preceding them with a $ character. For example, $${HOSTNAME} will be treated literally in the above environment variable example.
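Building on the add_fields example above, a short sketch of the escaped form:

vector.toml
[transforms.add_host]
type = "add_fields"
[transforms.add_host.fields]
# no interpolation occurs; the field is set to the literal string "${HOSTNAME}"
host = "$${HOSTNAME}"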

Format

The Vector configuration file uses the TOML format for its simplicity, explicitness, and relaxed whitespace parsing. For more information, please refer to the excellent TOML documentation.

Template Syntax

Select configuration options support Vector's template syntax to produce dynamic values derived from the event's data. There are two special syntaxes:

  1. Strftime specifiers. Ex: date=%Y/%m/%d

  2. Event fields. Ex: {{ field_name }}

Each is described in more detail below.

Strftime specifiers

For simplicity, Vector allows you to supply strftime specifiers directly as part of the value to produce formatted timestamp values based on the event's timestamp field.

For example, given the following log event:

LogEvent {
  "timestamp": chrono::DateTime<2019-05-02T00:23:22Z>,
  "message": "message",
  "host": "my.host.com"
}

And the following configuration:

vector.toml
[sinks.my_s3_sink_id]
type = "aws_s3"
key_prefix = "date=%Y-%m-%d"

Vector would produce the following value for the key_prefix field:

date=2019-05-02

This effectively enables time partitioning for the aws_s3 sink.

Event fields

In addition to formatting the timestamp field, Vector allows you to directly access event fields with the {{ <field-name> }} syntax.

For example, given the following log event:

LogEvent {
  "timestamp": chrono::DateTime<2019-05-02T00:23:22Z>,
  "message": "message",
  "application_id": 1
}

And the following configuration:

vector.toml
[sinks.my_s3_sink_id]
type = "aws_s3"
key_prefix = "application_id={{ application_id }}/date=%Y-%m-%d"

Vector would produce the following value for the key_prefix field:

application_id=1/date=2019-05-02

This effectively enables application-specific time partitioning.

Value Types

All TOML value types are supported. For convenience, this includes: strings, integers, floats, booleans, dates/times, arrays, and tables.
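For illustration only, a sketch with hypothetical keys showing these value types in TOML syntax:

vector.toml
some_string = "apache_logs"
some_integer = 86400
some_float = 0.5
some_boolean = true
some_datetime = 2019-05-02T00:23:22Z
some_array = ["/var/log/apache2/*.log"]

[some_table]
nested_key = "value"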