Labeled tab-separated values
The Labeled Tab-separated Values (LTSV) format is a variant of tab-separated values (TSV). Each record in an LTSV file is a single line. Fields are separated by TAB characters, and each field consists of a label and a value separated by ':'. As with plain TSV, a line can be parsed simply by splitting it on TAB. Because every field carries its own unique label, new fields can be added freely and fields can appear in any order.
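As an illustration of the record layout described above, the following minimal Perl sketch writes a record as one LTSV line and reads it back. The helper names `format_ltsv` and `parse_ltsv` and the sample field values are assumptions made for this example and are not part of any LTSV library. The sketch also assumes that labels and values contain no TAB or newline characters and that every field contains a ':'.

```perl
use strict;
use warnings;

# Hypothetical helper: serialize a hash reference as one LTSV line.
# Keys are sorted only to make the output deterministic; the format
# itself does not require any particular field order.
sub format_ltsv {
    my ($rec) = @_;
    return join "\t", map { "$_:$rec->{$_}" } sort keys %$rec;
}

# Hypothetical helper: split the line on TAB, then split each field on
# the first ':' only, so that values may themselves contain ':'.
sub parse_ltsv {
    my ($line) = @_;
    chomp $line;
    my %rec = map { split /:/, $_, 2 } split /\t/, $line;
    return \%rec;
}

my $line = format_ltsv({
    host   => '127.0.0.1',
    time   => '[10/Oct/2000:13:55:36 -0700]',
    status => 200,
});
my $rec = parse_ltsv($line);
print "$rec->{time}\n";    # the colons inside the value survive the round trip
```

Splitting each field with a limit of 2 is what lets a value such as a timestamp keep its own colons; only the first ':' is treated as the label separator.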
As a replacement for Common Log Format
Common Log Format and its extended variant, Combined Log Format, are widely used as standard web server log formats. However, they are notoriously hard to parse, which complicates later analysis. Here is a sample parser in Perl:
my @common = qw/host ident user time req status size/;
my @combined = qw/referer ua/;
my @re_unquote = ( qr/\"(.*?)\"/, qr/\"((?:\\[\\\"]|.)*?)\"/ );
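# Build one pattern per quoting style: index 0 for plain quoted strings,
# index 1 for quoted strings that may contain backslash escapes.
my @re_common = map {
    qr{
        \A
        (\S+)     [ ] # host
        (\S+)     [ ] # ident
        (\S+)     [ ] # user
        (\[.*?\]) [ ] # time
        $_        [ ] # req
        (\S+)     [ ] # status
        (\S+)         # size
    }msx
} @re_unquote;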
my @re_combined = map { qr/\G\s+$_ $_/ms } @re_unquote;
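sub parse_line {
    my $line = shift;
    my %rec;
    # True if the line contains an escaped quote (\"), which selects
    # the escape-aware patterns.
    my $escaped = !( index( $line, '\"' ) < 0 );
    # /gc keeps pos() after the match so the next pattern can resume via \G.
    @rec{@common}   = ( $line =~ m/$re_common[$escaped]/gc );
    @rec{@combined} = ( $line =~ m/$re_combined[$escaped]/ );
    return \%rec;
}
LTSV makes it as simple as the following: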
sub parse_line_ltsv {
    +{ map { split ':', $_, 2 } split "\t", shift };
}
To log in LTSV instead of Common Log Format on Apache HTTP Server, use the following directive:
LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
See also
- Tab-separated values
- Delimiter-separated values