Merged

Dev #21

4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# tsvkit

-`tsvkit` is a fast, ergonomic toolkit for working with TSV tables. Written in Rust, it brings familiar data-wrangling verbs (join, cut, filter, mutate, summarize, reshape, slice, pretty-print) to the command line with consistent column selection, rich expressions, and streaming-friendly performance. The CLI is inspired by projects such as `csvtk`, `csvkit`, `datamash`, `awk`, `xsv`, and `mlr`, and many options are intentionally compatible with `csvtk` so existing users can transition quickly.
+`tsvkit` is a fast, ergonomic toolkit for working with TSV tables. Written in Rust, it brings familiar data-wrangling verbs (join, cut, filter, mutate, summarize, reshape, slice, pretty-print) to the command line with consistent column selection, rich expressions, and streaming-friendly performance. The CLI is inspired by projects such as `csvtk`, `csvkit`, `datamash`, `awk`, `xsv`, and `mlr`, and many options are intentionally compatible with `csvtk` so existing users can adapt quickly.

## Table of Contents
- [Overview](#overview)
@@ -30,7 +30,7 @@
- [Additional tips](#additional-tips)

## Overview
-`tsvkit` combines versatile column selection with an expression engine for statistics, filtering, and data transformation. This makes it straightforward to generate matrices from `samtools idxstats` or `featureCounts`, compute multi-column summaries, or pipe TSV/Excel data through complex workflows without leaving the shell. Multi-sheet Excel workbooks are supported alongside `.tsv`, `.tsv.gz`, and `.tsv.xz` files.
+`tsvkit` combines versatile column selection with an expression engine for statistics, filtering, and data transformation. This makes it straightforward to join multiple files and select columns from each to generate a data matrix (e.g., a gene count table), filter rows based on selected columns, compute multi-column summaries, or pipe TSV/Excel data through complex workflows without leaving the shell. Multi-sheet Excel workbooks are supported alongside `.tsv`, `.tsv.gz`, and `.tsv.xz` files.

### Key features
- Stream-friendly processing; every command reads from files or standard input and writes to standard output.
12 changes: 8 additions & 4 deletions src/filter.rs
@@ -98,16 +98,20 @@ pub fn run(args: FilterArgs) -> Result<()> {
         .collect::<Vec<_>>();
     let bound = bind_expression(expr_ast, &headers, false)?;
     let expected_width = headers.len();
-
-    if !headers.is_empty() {
-        writeln!(writer, "{}", headers.join("\t"))?;
-    }
+    let header_line = (!headers.is_empty()).then(|| headers.join("\t"));
+    let mut header_written = false;
     for record in reader.records() {
         let record = record.with_context(|| format!("failed reading from {:?}", args.file))?;
         if should_skip_record(&record, &input_opts, Some(expected_width)) {
             continue;
         }
         if evaluate(&bound, &record) {
+            if !header_written {
+                if let Some(line) = header_line.as_ref() {
+                    writeln!(writer, "{}", line)?;
+                }
+                header_written = true;
+            }
             emit_record(&record, &mut writer)?;
         }
     }
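
Note on the `src/filter.rs` hunk: the header is no longer written up front. It is now written only when the first record passes the filter, so an input with no matching rows appears to produce empty output rather than a header-only table. The sketch below isolates that pattern in a self-contained form. `filter_rows`, its in-memory row type, and the `keep` predicate are hypothetical stand-ins for illustration, not tsvkit's actual API.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not part of tsvkit): writes the rows that satisfy
/// `keep`, emitting the header line only once the first match is found.
fn filter_rows<W: Write>(
    headers: &[&str],
    rows: &[Vec<&str>],
    keep: impl Fn(&[&str]) -> bool,
    writer: &mut W,
) -> io::Result<()> {
    // Build the header line once; `None` when there are no headers.
    let header_line = (!headers.is_empty()).then(|| headers.join("\t"));
    let mut header_written = false;
    for row in rows {
        if !keep(row.as_slice()) {
            continue;
        }
        // Deferred header: only reached when at least one row matches.
        if !header_written {
            if let Some(line) = header_line.as_ref() {
                writeln!(writer, "{}", line)?;
            }
            header_written = true;
        }
        writeln!(writer, "{}", row.join("\t"))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let headers = ["gene", "count"];
    let rows = vec![vec!["a", "5"], vec!["b", "0"]];
    let mut out: Vec<u8> = Vec::new();
    // Keep rows whose second column is not "0".
    filter_rows(&headers, &rows, |r| r[1] != "0", &mut out)?;
    io::stdout().write_all(&out)?;
    Ok(())
}
```

In this sketch, a predicate that rejects every row leaves the writer untouched, which mirrors the behaviour the diff introduces. Whether downstream consumers expect a header-only table for empty results is a design choice worth confirming before relying on it.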