FM-index

In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,^[1] who describe it as an opportunistic data structure because it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space.^[2]

An FM-index can be used to efficiently count the occurrences of a pattern within the compressed text, as well as to locate the position of each occurrence. Both the query time and the required storage space are sublinear in the size of the input data.
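The counting and locating queries can be illustrated with a minimal Python sketch of backward search over the Burrows–Wheeler transform. It is a toy for illustration only and not the authors' implementation: it stores the full suffix array and an uncompressed occurrence table, so it does not achieve the space bounds of a real FM-index. The function names (`build_fm_index`, `backward_search`, `count`, `locate`) and the `$` sentinel are assumptions made for this example.

```python
def build_fm_index(text, sentinel="$"):
    """Build a toy, uncompressed FM-index (hypothetical helper for this sketch)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Naive suffix array construction; adequate only for short demo inputs.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (s[-1] wraps to the sentinel).
    bwt = "".join(s[i - 1] for i in sa)
    # c_table[c]: number of characters in s that sort strictly before c.
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(freq):
        c_table[ch] = total
        total += freq[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in freq}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, index):
    """Return the half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)
    # Process the pattern from its last character to its first.
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count(pattern, index):
    """Number of occurrences of pattern in the indexed text."""
    lo, hi = backward_search(pattern, index)
    return hi - lo


def locate(pattern, index):
    """Sorted starting positions of pattern in the indexed text."""
    lo, hi = backward_search(pattern, index)
    return sorted(index[0][lo:hi])


index = build_fm_index("abracadabra")
print(count("abra", index))   # 2
print(locate("abra", index))  # [0, 7]
```

Each step of the loop in `backward_search` narrows the interval using only the table of character counts and the occurrence table, which is why counting takes time proportional to the pattern length rather than the text length. Practical implementations replace the dense occurrence table with a compressed rank structure and keep only a sample of the suffix array for locating.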

The original authors later devised improvements to their approach, which they dubbed "FM-Index version 2".^[3] A further improvement, the alphabet-friendly FM-index, combines compression boosting and wavelet trees^[4] to significantly reduce the space usage for large alphabets.

The FM-index has found use in, among other places, bioinformatics.^[5]

[1]

[2]

[3]

[4]

[5]