barcodactyl

Split barcoded reads into per-barcode FASTQ, SAM, or BAM files.

Features

  • Supports FASTQ(.gz), SAM, and BAM input
  • Outputs per-barcode files plus an unassigned bin
  • Preserves input format by default (FASTQ→FASTQ, SAM→SAM, BAM→BAM)
  • Regex-based barcode detection (default pattern: _barcodeNN; checks both the read name and the RG tag)
  • Streaming implementation for large datasets
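
To illustrate how regex-based detection and streaming fit together, here is a minimal Python sketch. It is not barcodactyl's actual code: the names detect_barcode and split_fastq are hypothetical, the default pattern is assumed to mean "_barcode" followed by two digits, and input is assumed to be four-line FASTQ records.

```python
import gzip
import re
from collections import Counter
from pathlib import Path

# Assumption: the documented default "_barcodeNN" means "_barcode" + two digits.
DEFAULT_PATTERN = re.compile(r"_barcode(\d{2})")


def detect_barcode(read_name, rg_tag=None, pattern=DEFAULT_PATTERN):
    """Return a label such as 'barcode01', or 'unassigned' if nothing matches."""
    # Check the read name first, then the RG tag if one was supplied.
    for text in (read_name, rg_tag):
        if text:
            match = pattern.search(text)
            if match:
                return "barcode" + match.group(1)
    return "unassigned"


def split_fastq(path, out_dir, pattern=DEFAULT_PATTERN):
    """Stream a FASTQ(.gz) file one record at a time into per-barcode files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if str(path).endswith(".gz") else open
    handles, counts = {}, Counter()
    try:
        with opener(path, "rt") as reads:
            while True:
                # Only one 4-line record is held in memory at any time.
                record = [reads.readline() for _ in range(4)]
                if not record[0]:
                    break
                name = record[0][1:].strip()  # drop the leading '@'
                label = detect_barcode(name, pattern=pattern)
                if label not in handles:
                    handles[label] = open(out_dir / f"{label}.fastq", "w")
                handles[label].writelines(record)
                counts[label] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Because each record is written as soon as it is read, memory use stays flat regardless of input size; only the per-barcode file handles and counters persist across records.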

Install

pip install barcodactyl

Usage

# Default: output format matches input
barcodactyl reads.fastq -o out/

# Force output format
barcodactyl reads.bam -o out/ --out-format sam

# Add a prefix to output file names
barcodactyl reads.fastq --prefix run1_

# Custom barcode regex (single-quoted so the shell passes it through unchanged)
barcodactyl reads.sam --pattern '_bc(\d{2})'
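
Before running a custom pattern on a large file, it can help to try it against a few sample read names with Python's re module. The sketch below uses the pattern from the example above; the helper barcode_digits and the sample names are made up for illustration, and whether barcodactyl uses the first capture group as the barcode label is an assumption.

```python
import re

# Same pattern as in the CLI example above.
pattern = re.compile(r"_bc(\d{2})")


def barcode_digits(read_name):
    """Return the captured two-digit barcode, or None if the name does not match."""
    match = pattern.search(read_name)
    return match.group(1) if match else None


# Hypothetical read names: one match, one with too few digits, one with a different tag.
for name in ["read42_bc07", "read43_bc7", "read44_barcode07"]:
    print(name, "->", barcode_digits(name))
```

Only the first name matches here: the second has a single digit, and the third uses "_barcode" rather than "_bc", so under this pattern both would presumably fall into the unassigned bin.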

Output summary

=== Per-barcode counts ===
barcode01    15,423
barcode02    18,902
unassigned      324
Total reads written: 34,649