K-FluDB v1.0

K-FluDB is a high-performance k-mer database that optimizes Influenza A surveillance by compressing genomic data by 99.64%. By eliminating redundancy across 50 subtype combinations, it achieves >99% accuracy in subtype identification and provides critical insights into viral reassortment and evolution for proactive public health response.

You can read the published article here: K-FluDB

The algorithm accepts any set of input sequences and generates a compressed version of the data, which can then be built into a Bowtie2 index for subtype-specific or species-specific annotation. Custom iterations of this pipeline can be adapted for any data type, including sequences from partner pathogen nodes such as Netherlands Arboviruses, Norway Databases, Spanish Pathogens Portal (RELECOV 2.0), Sweden (SLU), and Swiss Pathogens Portal.
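To make the idea of removing redundancy across subtypes concrete, here is a minimal Python sketch. It is not the published K-FluDB algorithm. It assumes, for illustration only, that "compression" means keeping just the k-mers that occur in exactly one subtype label. The function names, the toy sequences, and the base counts are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of length k in seq, skipping any with ambiguous bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def subtype_specific_kmers(records, k):
    """Return {label: sorted k-mers found only in that label}.

    records is an iterable of (label, sequence) pairs. Keeping only
    label-specific k-mers is an assumption of this sketch, not a
    description of the K-FluDB pipeline.
    """
    seen_in = defaultdict(set)
    for label, seq in records:
        for kmer in kmers(seq, k):
            seen_in[kmer].add(label)

    specific = defaultdict(list)
    for kmer, labels in seen_in.items():
        if len(labels) == 1:  # drop k-mers shared between labels
            specific[next(iter(labels))].append(kmer)
    return {label: sorted(ks) for label, ks in specific.items()}


def to_fasta(specific):
    """Format the retained k-mers as FASTA text, one record per k-mer."""
    lines = []
    for label in sorted(specific):
        for n, kmer in enumerate(specific[label]):
            lines.append(f">{label}_{n}")
            lines.append(kmer)
    return "\n".join(lines) + "\n"


def compression_percent(original_bases, compressed_bases):
    """Percentage reduction in size, e.g. 10000 -> 36 bases is about 99.64%."""
    return 100.0 * (1 - compressed_bases / original_bases)


# Toy input: two short, made-up sequences with hypothetical subtype labels.
toy_records = [("H1N1", "ACGTACGA"), ("H3N2", "ACGTTTGA")]
toy_specific = subtype_specific_kmers(toy_records, k=4)
toy_fasta = to_fasta(toy_specific)
```

The FASTA text produced by `to_fasta` could be written to a file and indexed with `bowtie2-build`, which takes a reference FASTA and an index base name. In practice, k would be chosen to suit the read length rather than the toy value of 4 used here.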

Availability and implementation: Three versions of K-FluDB, optimized for read lengths of 75, 150, and 300 nucleotides, are freely available at https://zenodo.org/records/17203072, and the source code is available at https://github.com/usjunco/pange
