Benchmarking packet classification algorithms can be tricky. While a synthetic combination of packet traces and classification rules can be generated using ClassBench, it is almost impossible to find a real-world packet trace alongside the rules used for classifying its packets. Even if you have access to real-world data, it might not be enough for your needs. This was the case in our work: we designed a new packet classification algorithm that scales with the number of classification rules, and wanted to test it on a real-world dataset with a very large number of rules. When we got to the point of benchmarking, we contacted several organizations and asked them to share their data anonymously. Those that agreed used considerably fewer rules than we needed.
Since we could not find a large enough set of real-world classification rules, we had to generate one synthetically. Nevertheless, we still wanted to test our algorithm on real-world packet traces (e.g., the ones from CAIDA). The thing is, you cannot test a packet classification algorithm on traffic unrelated to the classification rules, since you cannot guarantee the rule-match distribution. For example, it is possible that all the packets will match the same rule. This renders the benchmark ineffective: since the single matched rule will most probably be cached, the algorithm will never access RAM. The performance results would be wonderful, but irrelevant.
One solution is to modify the trace packet headers to match the rules in the generated rule-set. This raises another question: what is the correct way to map packets to rules? Can we say that the packets match 100% of the rules? And if so, in what order? Given a packet, which rule should it match? This is a difficult question to answer, as there are many parameters that can be tuned. In the end, we decided that all rules should eventually be matched. It makes sense: given a long-enough packet trace, a rule that is never matched can be considered dead and, in practice, removed from the rule-set (which would defeat the purpose of a large rule-set). Moreover, the mapping from packets to rules would have to be random, as we have no information on the relation between the two.
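To illustrate the idea, here is a minimal Python sketch of such a random mapping (the function `map_tuples_to_rules` and its interface are hypothetical, not part of the tools described below): it first assigns every rule to some 5-tuple, so no rule stays dead, and then maps the remaining 5-tuples uniformly at random.

```python
import random

def map_tuples_to_rules(unique_tuples, num_rules, seed=42):
    """Map each unique 5-tuple to a rule index such that every rule is
    matched at least once, and the rest of the mapping is uniform random.
    Assumes len(unique_tuples) >= num_rules."""
    rng = random.Random(seed)  # fixed seed for a reproducible benchmark
    rules = list(range(num_rules))
    rng.shuffle(rules)
    mapping = {}
    for i, five_tuple in enumerate(unique_tuples):
        if i < num_rules:
            mapping[five_tuple] = rules[i]              # cover every rule once
        else:
            mapping[five_tuple] = rng.randrange(num_rules)  # uniform random
    return mapping
```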
In the end, we took a real-world packet trace from CAIDA, extracted the 5-tuples from its packet headers, and uniformly mapped each unique 5-tuple to a matching rule. Note that this method preserves the traffic characteristics, including its temporal locality and inter-packet delay. To do the extraction and mapping efficiently, we developed these tools:
PCAP File Analyzer (GitHub)
Given a PCAP file with a trace of packets, this tool efficiently assigns a unique integer to each unique 5-tuple header. The list of integers represents the packets’ temporal locality and can be saved as a text file. The tool can also be used to extract the inter-packet delays (in usec) and packet sizes (in bytes) from the PCAP file.
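The core of the ID assignment can be sketched in a few lines of Python (a hypothetical `locality_stream` helper operating on already-extracted 5-tuples; the actual tool also does the PCAP parsing itself):

```python
def locality_stream(five_tuples):
    """Assign a unique integer ID to each distinct 5-tuple, in order of
    first appearance. The resulting ID sequence preserves the trace's
    temporal locality: recurring flows recur with the same ID."""
    ids = {}      # 5-tuple -> unique integer ID
    stream = []   # one ID per packet, in trace order
    for t in five_tuples:
        if t not in ids:
            ids[t] = len(ids)  # next unused integer
        stream.append(ids[t])
    return stream
```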
ClassBench to 5-Tuple Packet Map Generator (GitHub)
Given a ClassBench rule-set, this tool calculates, for each rule, a unique 5-tuple that matches that rule (and no other rule). The output is saved as a text file in the following format:
[#rule]: [ip-protocol] [src-ip] [dst-ip] [src-port] [dst-port]
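Assuming the fields are written as plain decimal numbers and dotted-quad addresses, a consumer of this file could parse it roughly as follows (`parse_packet_map` is a hypothetical helper, not part of the tool):

```python
def parse_packet_map(lines):
    """Parse lines of the form
    '<rule#>: <ip-protocol> <src-ip> <dst-ip> <src-port> <dst-port>'
    into a dict mapping rule number -> 5-tuple.
    Field encoding is an assumption based on the documented format."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rule_no, rest = line.split(":", 1)
        proto, src_ip, dst_ip, src_port, dst_port = rest.split()
        result[int(rule_no)] = (int(proto), src_ip, dst_ip,
                                int(src_port), int(dst_port))
    return result
```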
ClassBench to Open vSwitch (OVS) RuleSet Mapper (GitHub)
A simple tool that takes a ClassBench rule-set file and creates a text file with OpenFlow rules that can be loaded into OVS.
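As a rough illustration of what such a mapping involves, here is a hedged Python sketch that builds an `ovs-ofctl`-style match string from a single, already-parsed ClassBench rule (the function and the rule tuple layout are assumptions, not the tool's actual code). It handles only exact-match or fully wildcarded ports; a real converter must also expand arbitrary port ranges, e.g., into multiple masked matches.

```python
def classbench_to_openflow(rule, priority):
    """Sketch: one parsed ClassBench rule -> one OpenFlow match string.
    rule = (src_prefix, dst_prefix, sp_lo, sp_hi, dp_lo, dp_hi, ip_proto).
    Only exact or fully wildcarded ports are handled here."""
    src, dst, sp_lo, sp_hi, dp_lo, dp_hi, proto = rule
    parts = ["priority=%d" % priority, "ip"]
    if proto is not None:
        parts.append("nw_proto=%d" % proto)   # e.g., 6 == TCP
    if src != "0.0.0.0/0":
        parts.append("nw_src=%s" % src)
    if dst != "0.0.0.0/0":
        parts.append("nw_dst=%s" % dst)
    if sp_lo == sp_hi:                        # exact source port only
        parts.append("tp_src=%d" % sp_lo)
    if dp_lo == dp_hi:                        # exact destination port only
        parts.append("tp_dst=%d" % dp_lo)
    parts.append("actions=drop")              # placeholder action
    return ",".join(parts)
```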
DPDK Packet Generator (GitHub)
A native DPDK app that reads the aforementioned text files (locality, inter-packet-delay, packet-sizes, packet-map) and generates packets on the fly on a DPDK interface. The packets do not hold any meaningful payload; this is okay, as we only want to test the packet classification technique. This tool can also be used to generate packets at a constant or adaptive rate.
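Conceptually, the generator's replay loop combines the trace files with the packet map; a simplified Python model (a hypothetical `replay` function, assuming `packet_map` already maps locality IDs to 5-tuple headers) could look like this:

```python
def replay(locality_ids, delays_usec, sizes_bytes, packet_map):
    """Model of the replay loop: for each packet in the trace, yield the
    5-tuple header to send, its size, and its scheduled send time (usec
    since start). Ordering and inter-packet delays follow the trace."""
    now = 0.0
    for pkt_id, delay, size in zip(locality_ids, delays_usec, sizes_bytes):
        now += delay                          # pace by the recorded delays
        yield packet_map[pkt_id], size, now
```

The real tool does this in C on top of DPDK, crafting the headers and transmitting on the wire; running at a constant or adaptive rate simply means replacing the recorded delays with computed ones.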
We used these tools for the benchmarks in our paper Scaling Open vSwitch with a Computational Cache (USENIX NSDI ’22). Please cite us if you make use of any of these tools.