regex_parser transform

The regex_parser transform accepts log events and allows you to parse a field's value with a Regular Expression.

Config File

vector.toml (example)
[transforms.my_transform_id]
# REQUIRED - General
type = "regex_parser" # must be: "regex_parser"
inputs = ["my-source-id"]
regex = "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"
# OPTIONAL - General
drop_field = true # default
field = "message" # default
# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"
duration = "float"
success = "bool"
timestamp = "timestamp|%s"
# Other timestamp formats (TOML allows each key only once, so pick one):
# timestamp = "timestamp|%+"
# timestamp = "timestamp|%F"
# timestamp = "timestamp|%a %b %e %T %Y"

Options

REQUIRED - General

type (string)
The component type. Required; must be "regex_parser".

inputs ([string])
A list of upstream source or transform IDs. See Config Composition for more info. Required. Example: ["my-source-id"]

regex (string)
The Regular Expression to apply. Do not include the leading or trailing /. See Failed Parsing and Regex Debugger for more info. Required. Example: see the Config File above.

OPTIONAL - General

drop_field (bool)
Whether the parsed field should be dropped (removed) after parsing. Default: true

field (string)
The field to parse. See Failed Parsing for more info. Default: "message"

OPTIONAL - Types

types.* (string)
A mapping of field names to types: the key is the field name and the value is the type. strftime specifiers are supported for the timestamp type. Each value must be one of "string", "int", "float", "bool", or "timestamp|<strftime format>".
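A minimal sketch of a configuration that sets only the required options and relies on the defaults for field ("message") and drop_field (true). The regex and its level/detail capture names are illustrative assumptions, not a real log format:

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
# Single-quoted (literal) TOML string: \w reaches the regex engine unescaped.
regex = '^(?P<level>\w+) (?P<detail>.*)$'
# Not set, so the defaults apply: field = "message", drop_field = true.
# No types table: level and detail stay strings.
```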

Examples

Given the following log line:

log
{
"message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

vector.toml
[transforms.<transform-id>]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
[transforms.<transform-id>.types]
bytes_in = "int"
timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
status = "int"
bytes_out = "int"

A log event will be emitted with the following structure:

{
// ... existing fields
"host": "5.86.210.12",
"user": "zieme4647",
"bytes_in": 5667,
"timestamp": <19/06/2019:17:20:49 -0400>,
"method": "GET",
"path": "/embrace/supply-chains/dynamic/vertical",
"status": 201,
"bytes_out": 20574
}

Things to note about the output:

  1. The message field was removed, because drop_field defaults to true.

  2. The bytes_in, timestamp, status, and bytes_out fields were coerced; host, user, method, and path remain strings.
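Because drop_field defaults to true, the parsed field is removed after parsing. To keep the original message alongside the extracted fields, a sketch of the same transform sets drop_field = false. The regex is the one from the example; the types table is omitted here for brevity and can be carried over unchanged:

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# Keep "message" on the event after its value has been parsed.
drop_field = false
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
```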

How It Works

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} anywhere in your Vector configuration file and the variable will be replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
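As a sketch, the upstream ID and the field to parse could both come from the environment. SOURCE_ID and PARSE_FIELD are hypothetical variable names chosen for illustration:

```toml
[transforms.my_transform_id]
type = "regex_parser"
# SOURCE_ID and PARSE_FIELD are hypothetical; their values are substituted
# into the file before the configuration is evaluated.
inputs = ["${SOURCE_ID}"]
field = "${PARSE_FIELD}"
regex = '^(?P<level>\w+) (?P<detail>.*)$'
```

Running Vector with PARSE_FIELD=message set would make this equivalent to field = "message".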

Failed Parsing

If the value of field fails to match the provided regex, an error is logged and the event is kept or discarded depending on the drop_failed value.

A failure includes any event that does not successfully parse against the provided regex. This includes bad values as well as events missing the specified field.

Performance

The regex_parser transform has been involved in Vector's performance tests. Learn more in the Performance section.

Regex Debugger

To test the validity of the regex option, we recommend the Golang Regex Tester, as its regex syntax closely follows Rust's.

Regex Syntax

Vector follows the documented Rust Regex syntax, since Vector is written in Rust. This syntax is similar to Perl-style regular expressions but lacks a few features, such as look-around and backreferences.

Named Captures

You can name regex captures with the (?P<name>...) syntax. For example:

^(?P<timestamp>.*) (?P<level>\w*) (?P<message>.*)$

This captures timestamp, level, and message. All values are extracted as strings and must be coerced with the types table if another type is needed.

More info can be found in the Regex grouping and flags documentation.
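As a sketch, a pattern like the one above could be configured as follows. Two assumptions: timestamps contain no spaces, so the timestamp group uses \S+ instead of a greedy .* (which can otherwise swallow the level and part of the message), and they are written in the ISO 8601 form that the %+ specifier denotes. drop_field is set to false because the capture named message reuses the parsed field's name:

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# Keep the source field: the capture named "message" reuses its name.
drop_field = false
# Literal (single-quoted) TOML string: backslashes reach the regex engine unchanged.
regex = '^(?P<timestamp>\S+) (?P<level>\w*) (?P<message>.*)$'
# The same pattern as a basic (double-quoted) string needs doubled backslashes:
# regex = "^(?P<timestamp>\\S+) (?P<level>\\w*) (?P<message>.*)$"

[transforms.my_transform_id.types]
# Captures are strings until coerced; %+ denotes an ISO 8601 date and time.
timestamp = "timestamp|%+"
```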

Flags

Regex flags can be toggled with the (?flags) syntax. The available flags are:

Flag: Description

i: case-insensitive; letters match both upper and lower case

m: multi-line mode; ^ and $ match the beginning and end of each line

s: allow . to match \n

U: swap the meaning of x* and x*? (ungreedy mode)

u: Unicode support (enabled by default)

x: ignore whitespace and allow line comments (starting with #)

For example, to enable the case-insensitive flag you can write:

(?i)Hello world

More info can be found in the Regex grouping and flags documentation.
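A sketch of a flag in a full transform config. The level names and the level/detail capture names are illustrative assumptions:

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
# (?i) makes the whole pattern case-insensitive, so "ERROR", "error",
# and "Error" all match the level alternation.
regex = '(?i)^(?P<level>debug|info|warn|error) (?P<detail>.*)$'
```

The flag only affects matching: the captured level value keeps whatever case appeared in the log line.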

Types

By default, extracted (parsed) fields all contain string values. You can coerce these values into types via the types table as shown in the Config File example above. For example:

[transforms.my_transform_id]
# ...
# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"
duration = "float"
success = "bool"
timestamp = "timestamp|%s"
# Other timestamp formats (TOML allows each key only once, so pick one):
# timestamp = "timestamp|%+"
# timestamp = "timestamp|%F"
# timestamp = "timestamp|%a %b %e %T %Y"

The available types are:

Type: Description

bool: Coerces to a true/false boolean. The values 1/0 and t/f are also coerced.

float: Coerces to a 64-bit float.

int: Coerces to a 64-bit integer.

string: Coerces to a string. Generally not necessary, since values are extracted as strings.

timestamp: Coerces to a Vector timestamp. strftime specifiers must be used to parse the string, written as "timestamp|<strftime format>".
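A sketch that pairs each type from the Config File example with a capture that produces it. The input format (an epoch timestamp, an HTTP status, a duration in seconds, and a true/false flag, separated by spaces) is an assumption for illustration:

```toml
# Assumed input line: 1560965249 200 0.042 true
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
regex = '^(?P<timestamp>\d+) (?P<status>\d+) (?P<duration>[\d.]+) (?P<success>\w+)$'

[transforms.my_transform_id.types]
timestamp = "timestamp|%s" # %s: seconds since the Unix epoch
status = "int"
duration = "float"
success = "bool"
```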

Troubleshooting

The best place to start with troubleshooting is to check the Vector logs, which are typically located at /var/log/vector.log. Then follow the Troubleshooting Guide.

If the Troubleshooting Guide does not resolve your issue, please:

  1. If you encountered a bug, please file a bug report.

  2. If you encountered a missing feature, please file a feature request.

  3. If you need help, join our chat/forum community, where you can post a question and search previous questions.
