This package provides k-mer serialization methods for the package kmers,
TaxIds of k-mers are optionally saved, while there’s no frequency information.
K-mers (represented in
uint64 in RAM ) are serialized in 8-Byte
(or less Bytes for shorter k-mers in compact format,
or much less Bytes for sorted k-mers) arrays and
optionally compressed in gzip format with extension of
TaxIds are optionally stored next to k-mers with 4 or less bytes.
Compression rate comparison
No TaxIds stored in this test.
||✔||gzipped plain text|
||✔||✔||gzipped encoded k-mers in fixed-length byte array|
||✔||✔||✔||gzipped encoded k-mers in shorter fixed-length byte array|
||✔||✔||✔||gzipped sorted encoded k-mers|
- a One k-mer is encoded as
uint64and serialized in 8 Bytes.
- b K-mers file is compressed in gzip format by default,
users can switch on global option
-C/--no-compressto output non-compressed file.
- c One k-mer is encoded as
uint64and serialized in 8 Bytes by default.
However few Bytes are needed for short k-mers, e.g., 4 Bytes are enough for
15-mers (30 bits). This makes the file more compact with smaller file size,
controled by global option
- d One k-mer is encoded as
uint64, all k-mers are sorted and compressed
using varint-GB algorithm.
- In all test, flag
--canonicalis ON when running
This package was originally maintained in unikmer.
The magic number of binary format is still
.unikmer for keeping compatibility.