k-mer

In bioinformatics, k-mers are substrings of length $k$ contained within a biological sequence. The term is used primarily in computational genomics and sequence analysis, where k-mers are composed of nucleotides (i.e. A, T, G, and C). k-mers are used to assemble DNA sequences,^[1] improve heterologous gene expression,^[2]^[3] identify species in metagenomic samples,^[4] and create attenuated vaccines.^[5] Usually, the term k-mer refers to all of a sequence's contiguous substrings of length $k$, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length $L$ contains $L-k+1$ k-mers (counting repeats), and there are $n^{k}$ possible k-mers in total, where $n$ is the number of possible monomers (e.g. four in the case of DNA).
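The counts above can be checked with a short Python sketch. It is illustrative only and does not come from any particular library: the function names `kmers` and `count_possible_kmers` are hypothetical, and the sequence is assumed to be a plain string over the DNA alphabet.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every contiguous length-k substring of `sequence`, in order.

    Repeated k-mers are kept, so the result has len(sequence) - k + 1
    entries whenever 1 <= k <= len(sequence), and is empty when k is larger.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_possible_kmers(n: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of n monomers (n ** k)."""
    return n ** k


if __name__ == "__main__":
    seq = "AGAT"  # the example sequence from the text
    for k in range(1, len(seq) + 1):
        print(k, kmers(seq, k))
    # DNA has four monomers, so n = 4 here.
    print(count_possible_kmers(4, 3))
```

For AGAT this lists the four monomers, three 2-mers, two 3-mers and one 4-mer described above. A list is returned rather than a set so that repeated k-mers, such as the two occurrences of A, stay visible in the count.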

[1]

[2]

[3]

[4]

[5]


Substrings of length k contained in a biological sequence / From Wikipedia, the free encyclopedia
