file source

Ingests data through one or more local files and outputs `log` events.

The file source is in beta. Please see the current enhancements and bugs for known issues. We kindly ask that you add any missing issues as it will help shape the roadmap of this component.

Config File

vector.toml (example)
[sources.my_source_id]
# REQUIRED - General
type = "file" # must be: "file"
exclude = ["/var/log/nginx/access.log"]
include = ["/var/log/nginx/*.log"]
# OPTIONAL - General
data_dir = "/var/lib/vector" # no default
fingerprint_bytes = 256 # default, bytes
glob_minimum_cooldown = 1000 # default, milliseconds
ignore_older = 86400 # no default, seconds
ignored_header_bytes = 0 # default, bytes
max_line_bytes = 102400 # default, bytes
start_at_beginning = false # default
# OPTIONAL - Context
file_key = "file" # default
host_key = "host" # default

Options

| Key | Type | Description |
| --- | --- | --- |
| REQUIRED - General | | |
| type | string | The component type. Required; must be: "file" |
| exclude | [string] | Array of file patterns to exclude. Globbing is supported. Takes precedence over the include option. Required. Example: ["/var/log/nginx/access.log"] |
| include | [string] | Array of file patterns to include. Globbing is supported. Required. Example: ["/var/log/nginx/*.log"] |
| OPTIONAL - General | | |
| data_dir | string | The directory used to persist file checkpoint positions. By default, the global data_dir is used. Please make sure the Vector process has write permissions to this directory. See Checkpointing for more info. No default. Example: "/var/lib/vector" |
| fingerprint_bytes | int | The number of bytes read from the head of the file to generate a unique fingerprint. See File Identification for more info. Default: 256. Unit: bytes |
| glob_minimum_cooldown | int | Delay between file discovery calls. This controls the interval at which Vector searches for files. See Auto Discovery and Globbing for more info. Default: 1000. Unit: milliseconds |
| ignore_older | int | Ignore files whose data modification date is older than this age. See File Rotation for more info. No default. Example: 86400. Unit: seconds |
| ignored_header_bytes | int | The number of bytes to skip ahead (or ignore) when generating a unique fingerprint. This is helpful if all files share a common header. See File Identification for more info. Default: 0. Unit: bytes |
| max_line_bytes | int | The maximum number of bytes a line can contain before being discarded. This protects against malformed lines or tailing incorrect files. Default: 102400. Unit: bytes |
| start_at_beginning | bool | When true, Vector reads from the beginning of new files. When false, Vector only reads new data added to the file. See Read Position for more info. Default: false |
| OPTIONAL - Context | | |
| file_key | string | The key name added to each event with the full path of the file. See Context for more info. Default: "file" |
| host_key | string | The key name added to each event representing the current host. See Context for more info. Default: "host" |

Examples

Given the following input:

/var/log/rails.log
2019-02-13T19:48:34+00:00 [info] Started GET "/" for 127.0.0.1

A log event will be emitted with the following structure:

log
{
"timestamp": <timestamp>, # current time
"message": "2019-02-13T19:48:34+00:00 [info] Started GET \"/\" for 127.0.0.1",
"file": "/var/log/rails.log", # original file
"host": "10.2.22.122" # current hostname
}

The "timestamp", "file", and "host" keys were automatically added as context. You can further parse the "message" key with a transform, such as the regex transform.

How It Works

Auto Discovery

Vector will continually look for new files matching any of your include patterns. The frequency is controlled via the glob_minimum_cooldown option. If a new file is added that matches any of the supplied patterns, Vector will begin tailing it. Vector maintains a unique list of files and will not tail a file more than once, even if it matches multiple patterns. You can read more about how Vector identifies files in the File Identification section.
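As a minimal sketch of the behavior described above, the following configuration lists two overlapping include patterns and sets the delay between discovery calls to five seconds. The source ID nginx_logs and the debug.log path are hypothetical; the option names are the ones listed in the Options section.

```toml
[sources.nginx_logs]
type = "file"
# Both patterns match /var/log/nginx/access.log; as described above,
# a file that matches multiple patterns is still tailed only once.
include = ["/var/log/nginx/*.log", "/var/log/nginx/access*.log"]
exclude = ["/var/log/nginx/debug.log"] # hypothetical path
glob_minimum_cooldown = 5000 # milliseconds between file discovery calls
```

A longer cooldown means fewer discovery calls, at the cost of new files being picked up later.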

Checkpointing

Vector checkpoints the current read position in the file after each successful read. This ensures that Vector resumes where it left off if restarted, preventing data from being read twice. The checkpoint positions are stored in the data directory, which is specified via the global data_dir option but can be overridden via the data_dir option on the file source directly.
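The relationship between the two data_dir settings might look like the sketch below. It assumes the global option is set at the top level of the configuration file; the source ID app_logs and all paths other than /var/lib/vector are hypothetical.

```toml
# Global data directory (assumed to be a top-level key).
data_dir = "/var/lib/vector"

[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]      # hypothetical path
exclude = ["/var/log/app/debug.log"]  # hypothetical path
# Overrides the global data_dir for this source only.
# Vector needs write permissions to this directory.
data_dir = "/var/lib/vector/app_logs" # hypothetical path
```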

Context

By default, the file source will add context keys to your events via the file_key and host_key options.
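For illustration, the sketch below renames both context keys. The source ID and the key names source_file and hostname are hypothetical choices, not defaults.

```toml
[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/*.log"]
exclude = ["/var/log/nginx/access.log"]
file_key = "source_file" # default is "file"
host_key = "hostname"    # default is "host"
```

With this configuration, each event should carry the file path under "source_file" and the current host under "hostname" instead of the default "file" and "host" keys.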

Delivery Guarantee

Due to the nature of this component, it offers a best effort delivery guarantee.

Environment Variables

Environment variables are supported through all of Vector's configuration. Simply add ${MY_ENV_VAR} in your Vector configuration file and the variable will be replaced before being evaluated.

You can learn more in the Environment Variables section.
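A minimal sketch of this substitution applied to the file source is shown below. LOG_DIR is a hypothetical variable name that would need to be set in Vector's environment; the source ID and file names are also illustrative.

```toml
[sources.app_logs]
type = "file"
# ${LOG_DIR} is replaced before the configuration is evaluated,
# for example with /var/log/app if LOG_DIR=/var/log/app.
include = ["${LOG_DIR}/*.log"]
exclude = ["${LOG_DIR}/debug.log"]
```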

File Deletions

If a file is deleted, Vector will flush the current buffer and stop tailing the file.

File Identification

By default, Vector identifies files by computing a cyclic redundancy check (CRC) over the first 256 bytes of the file. This serves as a fingerprint to uniquely identify the file. The number of bytes read can be controlled via the fingerprint_bytes and ignored_header_bytes options.

This strategy avoids the common pitfalls of using device and inode numbers, since inode numbers can be reused across files. This enables Vector to properly tail files in the event of rotation.
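When every file starts with the same header, the default fingerprint may not distinguish the files. The sketch below assumes, purely for illustration, a 64-byte shared header and a hypothetical export directory; only the option names come from this document.

```toml
[sources.csv_exports]
type = "file"
include = ["/var/log/exports/*.csv"]      # hypothetical path
exclude = ["/var/log/exports/summary.csv"] # hypothetical path
ignored_header_bytes = 64 # skip the assumed shared header when fingerprinting
fingerprint_bytes = 256   # bytes used for the fingerprint (the default)
```

Choose ignored_header_bytes to match the actual header length of your files; the value here is an assumption.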

File Rotation

Vector will follow files across rotations in the manner of tail. Because of the way Vector identifies files, it will properly recognize newly rotated files regardless of whether you use the copytruncate or create directive. To ensure Vector handles rotated files properly, we recommend:

  1. Ensure the include paths cover rotated files. For example, use /var/log/nginx*.log to recognize /var/log/nginx.2.log.

  2. Use either the copytruncate or create directives when rotating files. If historical data is compressed, or altered in any way, Vector will not be able to properly identify the file.

  3. Only delete files when they have exceeded the ignore_older age. Although losing data this way is extremely rare, this ensures you do not delete data before Vector has a chance to ingest it.
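The recommendations above could translate into a configuration along these lines. This is a sketch: the source ID and the excluded path are hypothetical, and the ignore_older value is simply the example value from the Options section.

```toml
[sources.nginx_logs]
type = "file"
# Matches /var/log/nginx.log as well as rotated files such as /var/log/nginx.2.log.
include = ["/var/log/nginx*.log"]
exclude = ["/var/log/nginx.debug.log"] # hypothetical path
# Rotated files older than one day are ignored, so they can then be deleted.
ignore_older = 86400 # seconds
```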

Globbing

Globbing is supported in all provided file paths. Files are discovered continually at a rate defined by the glob_minimum_cooldown option.

Line Delimiters

Each line is read until a newline delimiter (the 0xA byte) or EOF is found.

Read Position

By default, Vector will read new data only for newly discovered files, similar to the tail command. You can read from the beginning of the file by setting the start_at_beginning option to true.

Previously discovered files will be checkpointed, and the read position will resume from the last checkpoint.
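To ingest existing content rather than only new data, a configuration might look like the following sketch. The source ID and paths are hypothetical.

```toml
[sources.rails_logs]
type = "file"
include = ["/var/log/rails*.log"]     # hypothetical path
exclude = ["/var/log/rails.test.log"] # hypothetical path
# Read newly discovered files from the start instead of tailing from the end.
start_at_beginning = true
```

As noted above, this setting applies to newly discovered files; files that already have a checkpoint resume from that checkpoint.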

Troubleshooting

The best place to start troubleshooting is the Vector logs, which are typically located at /var/log/vector.log. After checking the logs, follow the Troubleshooting Guide.

If the Troubleshooting Guide does not resolve your issue, please:

  1. If you encountered a bug, please file a bug report.

  2. If you encountered a missing feature, please file a feature request.

  3. If you need help, join our chat/forum community. You can post a question and search previous questions.

Resources