Range matching (RM) is a crucial component in computer systems, widely used in address translation, hard drives, network switches, and many more applications. RM is performed whenever one wishes to locate a range that contains an input number, given a large set of ranges. Any page-based mechanism uses RM, as pages are basically ranges. Longest prefix matching (LPM) uses ternary rules, which are also ranges. Firewalls are one example of multidimensional RM since ACL rules consist of several wildcarded fields that can be represented as ranges.
Existing algorithms for RM are either page-based, tree-based, or hash-table-based. Either way, they all rely on pointer-chasing techniques, which impede their scalability to large range sets that spill out of the CPU core cache. Furthermore, their inherent lack of access locality and the data-dependent nature of their memory accesses make hardware prefetchers ineffective.
Our research focuses on a new data structure, the Range Query Recursive Model Index (RQ-RMI). RQ-RMI uses shallow neural-nets (NN) for learning the distribution of a given set of ranges, turning the costly lookup operations into NN inference. Crucially, the RQ-RMI training algorithm guarantees a tight bound on its lookup latency, ensures its correctness, and converges fast: it can index hundreds of thousands of ranges in a few hundred milliseconds using a single CPU core.