tokenizer transform

The tokenizer transform accepts log events and allows you to tokenize a field's value by splitting on white space, ignoring special wrapping characters, and zip the tokens into ordered field names.

Config File

vector.toml (simple)
vector.toml (advanced)
[transforms.my_transform_id]
type = "tokenizer" # must be: "tokenizer"
inputs = ["my-source-id"]
field_names = ["timestamp", "level", "message"]
# For a complete list of options see the "advanced" tab above.

Examples

Given the following log line:

log
{
"message": "5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

vector.toml
[transforms.<transform-id>]
type = "tokenizer"
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]

A log event will be emitted with the following structure:

{
// ... existing fields
"remote_addr": "5.86.210.12",
"user_id": "zieme4647",
"timestamp": "19/06/2019:17:20:49 -0400",
"message": "GET /embrace/supply-chains/dynamic/vertical",
"status": "201",
"bytes": "20574"
}

A few things to note about the output:

  1. The message field was overwritten.

  2. The ident field was dropped since it contained a "-" value.

  3. All values are strings. To convert them, use the types table described in the Types section.

  4. Special wrapper characters, such as the wrapping [...] and "..." characters, were dropped.

How It Works

Blank Values

Both " " and "-" are considered blank values and their mapped field will be set to null.
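As an illustration only, the blank-value rule can be sketched in Python. This is not Vector's implementation. The helper names is_blank and zip_fields are hypothetical. The sketch assumes that a token counts as blank when it is empty, contains only white space, or equals "-", and that surplus tokens or surplus field names are ignored.

```python
def is_blank(token):
    # Assumption: empty or white-space-only tokens and "-" count as blank.
    return token.strip() == "" or token == "-"


def zip_fields(field_names, tokens):
    # Pair each field name with the token at the same position.
    # zip() stops at the shorter list, so extras on either side are ignored
    # (an assumption of this sketch).
    return {
        name: (None if is_blank(token) else token)
        for name, token in zip(field_names, tokens)
    }


event = zip_fields(
    ["remote_addr", "ident", "user_id"],
    ["5.86.210.12", "-", "zieme4647"],
)
print(event)
```

Here ident is mapped to None, which mirrors the null described above.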

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} to your Vector configuration file and the variable will be replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
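To make the substitution step concrete, here is a minimal Python sketch of replacing ${NAME} placeholders in configuration text before it is parsed. It is an illustration, not Vector's code. The function name interpolate is hypothetical. How Vector handles an undefined variable is not stated here, so the sketch simply leaves such placeholders untouched.

```python
import re

# Matches ${NAME}, where NAME looks like a typical environment variable name.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(config_text, env):
    def replace(match):
        name = match.group(1)
        # Assumption: undefined variables are left as-is.
        return env.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, config_text)


raw = 'inputs = ["${MY_ENV_VAR}"]'
print(interpolate(raw, {"MY_ENV_VAR": "my-source-id"}))
```

In real use, the second argument would be os.environ rather than a hand-written dictionary.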

Special Characters

In order to extract raw values and remove wrapping characters, we must treat certain characters as special. These characters will be discarded:

  • "..." - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes will be discarded.

  • [...] - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets will be discarded.

  • \ - Can be used to escape the above characters so that Vector treats them as literals.
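The rules above can be sketched as a small character-by-character scanner. The Python below is a rough approximation for illustration and is not Vector's actual parser. The function name tokenize is hypothetical. The sketch assumes that wrappers do not nest and that a backslash escapes whatever single character follows it.

```python
def tokenize(line):
    tokens = []
    current = []      # characters of the token being built
    closer = None     # character that closes the open wrapper, if any
    in_token = False  # True once a token has started (even if still empty)
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character: keep it literally, drop the backslash.
            current.append(line[i + 1])
            in_token = True
            i += 2
            continue
        if closer is not None:
            if ch == closer:
                closer = None       # discard the closing wrapper
            else:
                current.append(ch)  # spaces inside wrappers are preserved
        elif ch == '"':
            closer, in_token = '"', True
        elif ch == "[":
            closer, in_token = "]", True
        elif ch.isspace():
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_token:
        tokens.append("".join(current))
    return tokens


line = '5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] "GET /embrace/supply-chains/dynamic/vertical" 201 20574'
print(tokenize(line))
```

Applied to the log line from the Examples section, the sketch yields seven tokens, with the bracketed timestamp and the quoted request each kept as a single token.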

Types

By default, extracted (parsed) fields all contain string values. You can coerce these values into other types via the types table. For example:

[transforms.my_transform_id]
# ...
# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"
duration = "float"
success = "bool"
# A key may only be set once. Pick one of the following timestamp formats:
timestamp = "timestamp|%s"
# timestamp = "timestamp|%+"
# timestamp = "timestamp|%F"
# timestamp = "timestamp|%a %b %e %T %Y"

The available types are:

Type        Description
bool        Coerces to a true/false boolean. The 1/0 and t/f values are also coerced.
float       Coerces to a 64-bit float.
int         Coerces to a 64-bit integer.
string      Coerces to a string. Generally not necessary since values are extracted as strings.
timestamp   Coerces to a Vector timestamp. strftime specifiers must be used to parse the string.
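For intuition, the coercions listed above can be approximated in Python. This is a sketch under stated assumptions, not Vector's behavior. The function name coerce is hypothetical and boolean matching is assumed to be case-insensitive. Python's strptime understands a different set of specifiers than the ones in the configuration example (it has no %s or %+, for instance), so the timestamp branch only works with formats Python supports.

```python
from datetime import datetime


def coerce(value, type_spec):
    # "timestamp|%Y-%m-%d" splits into kind="timestamp", fmt="%Y-%m-%d".
    kind, _, fmt = type_spec.partition("|")
    if kind == "string":
        return value
    if kind == "int":
        return int(value)
    if kind == "float":
        return float(value)
    if kind == "bool":
        lowered = value.lower()  # assumption: case-insensitive matching
        if lowered in ("true", "t", "1"):
            return True
        if lowered in ("false", "f", "0"):
            return False
        raise ValueError(f"not a boolean: {value!r}")
    if kind == "timestamp":
        if not fmt:
            raise ValueError("timestamp needs a format after '|'")
        return datetime.strptime(value, fmt)
    raise ValueError(f"unknown type: {kind!r}")


# Coerce only the fields named in the types table; leave the rest as strings.
types = {"status": "int", "bytes": "int"}
event = {"status": "201", "bytes": "20574", "user_id": "zieme4647"}
coerced = {
    name: coerce(value, types[name]) if name in types else value
    for name, value in event.items()
}
print(coerced)
```

Fields without an entry in the types table stay as strings, which matches the default described at the start of this section.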

Troubleshooting

The best place to start troubleshooting is the Vector logs, which are typically located at /var/log/vector.log. Then follow the Troubleshooting Guide.

If the Troubleshooting Guide does not resolve your issue, please:

  1. If you encountered a bug, file a bug report.

  2. If you encountered a missing feature, file a feature request.

  3. If you need help, join our chat/forum community. You can post a question and search previous questions.

Alternatives

Finally, consider the following alternatives:

Resources