Skip to main content

Data Structures and Algorithms

Log Structured Merge Trees (LSM)

A data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data

Skip Lists

A probabilistic data structure that allows O(logn){\mathcal {O}}(\log n) search complexity as well as O(logn){\mathcal {O}}(\log n) insertion complexity within an ordered sequence of nn elements. Thus it can get the best features of a sorted array (for searching) while maintaining a linked list-like structure that allows insertion, which is not possible with a static array.

Bloom Filters

A space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

https://en.wikipedia.org/wiki/Bloom_filter

HyperLogLog

HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset.[1] Calculating the exact cardinality of the unique elements of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities of > 109 with a typical accuracy (standard error) of 2%, using 1.5 kB of memory

https://en.wikipedia.org/wiki/HyperLogLog

Geospatial

GeoHash

Geohash - Wikipedia

QuadTree

Quadtree - Wikipedia

Google S2

Overview | S2Geometry