During the last three months of 2016, REANNZ hosted an intern, Richard Cziva, from the Networked System Research Lab of the University of Glasgow. Richard has been looking at how to better understand the quality of service perceived by our members.
The issue is that while there are many network monitoring systems (e.g. network-wide SNMP or NetFlow counters), they usually focus on the lower layers of the network (packet drops, link status, etc.) and provide only a coarse-grained view of a particular domain, using metrics that don't tell you how users actually experience network performance.
When research and education traffic crosses domain boundaries, it is affected by inter-domain routing changes, configuration errors and congestion in transit domains. This is especially important for us in New Zealand, because our geographic isolation means that traffic usually has to cross more than one boundary to reach its destination, amplifying these effects. To understand multi-domain performance, the perfSONAR project introduced a worldwide measurement infrastructure that enables specific measurements such as throughput, round-trip time (RTT), one-way delay and traceroute. That sounds great, but in practice it is mostly used to detect packet loss in unexpected places, which it does by generating synthetic traffic. It also only provides measurements between a set of measurement hosts, which can be quite far away from actual end users, so it too doesn't tell you how users actually experience the network.
To gain a better understanding of user-perceived performance, Richard has designed Ruru, a real-time, passive monitoring system developed from scratch at REANNZ. Ruru runs on a commodity server with a DPDK-enabled network card and calculates the RTT of every individual TCP flow from our users to understand wide-area latency. It also maps source and destination IP addresses to geographical locations (city/country) and to AS numbers, and visualises these measurements in real time. By aggregating statistics by source and destination location and AS number, Ruru gives a real-time understanding of wide-area traffic.
This image shows Ruru mapping traffic sources and destinations. The colours represent the round-trip time (RTT).
Ruru design and architecture:
Ruru works by analysing all traffic that is directed to it (for example, by tapping a physical link or mirroring traffic to it from an edge router). The RTT measurements are performed in an Intel Data Plane Development Kit (DPDK) application, which tracks the SYN, SYN-ACK and first ACK packets of every TCP flow in a highly optimised, multi-core way. The application calculates two RTTs: one between the source host and the tap point, and one between the tap point and the destination. The time between the SYN and the SYN-ACK passing the tap gives the tap-to-destination RTT, and the time between the SYN-ACK and the first ACK gives the source-to-tap RTT. The sum of these two values is essentially the end-to-end RTT perceived by the end user; however, each part on its own can be very interesting, depending on the deployment scenario.
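A minimal Python sketch of the handshake timing described above, assuming per-packet timestamps in microseconds taken at the tap, and assuming the caller has already normalised each packet's addresses into a client-to-server flow key. The class and method names are hypothetical; Ruru's actual measurement code is a multi-core DPDK application, not this.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (client_ip, client_port, server_ip, server_port).
# The SYN-ACK travels server -> client, so its addresses are assumed to be
# swapped back into this orientation before calling on_synack().
FlowKey = Tuple[str, int, str, int]


@dataclass
class Handshake:
    syn_ts: int                      # when the SYN passed the tap (microseconds)
    synack_ts: Optional[int] = None  # when the SYN-ACK passed the tap


@dataclass
class RttSample:
    flow: FlowKey
    external_rtt: int  # tap -> destination -> tap (SYN to SYN-ACK)
    internal_rtt: int  # tap -> source -> tap (SYN-ACK to first ACK)

    @property
    def total_rtt(self) -> int:
        # Approximates the end-to-end RTT seen by the source host.
        return self.external_rtt + self.internal_rtt


class HandshakeTracker:
    """Sketch of per-flow SYN/SYN-ACK/ACK tracking at a single tap point."""

    def __init__(self) -> None:
        self.pending: Dict[FlowKey, Handshake] = {}

    def on_syn(self, flow: FlowKey, ts: int) -> None:
        # A new SYN starts (or restarts) the handshake for this flow.
        self.pending[flow] = Handshake(syn_ts=ts)

    def on_synack(self, flow: FlowKey, ts: int) -> None:
        # Keep only the first SYN-ACK; retransmissions would inflate the RTT.
        hs = self.pending.get(flow)
        if hs is not None and hs.synack_ts is None:
            hs.synack_ts = ts

    def on_ack(self, flow: FlowKey, ts: int) -> Optional[RttSample]:
        hs = self.pending.get(flow)
        if hs is None or hs.synack_ts is None:
            return None  # not the first ACK of a handshake we saw
        del self.pending[flow]  # one sample per flow, then stop tracking
        return RttSample(flow,
                         external_rtt=hs.synack_ts - hs.syn_ts,
                         internal_rtt=ts - hs.synack_ts)
```

Because both timestamps in each difference are taken at the same tap, this approach needs no clock synchronisation with the end hosts. For example, a SYN at 0 µs, SYN-ACK at 120000 µs and ACK at 125000 µs yields an external RTT of 120 ms and an internal RTT of 5 ms.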
The DPDK application publishes RTT measurements on ZeroMQ sockets to a module that looks up the geolocation and AS number of the source and destination IPs using IP2Location databases. The enriched measurements are then sent to our frontend server, where statistics are calculated and individual latency measurements are forwarded to the user's browser via WebSockets.
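IP-to-location databases are commonly distributed as tables of integer address ranges; assuming that layout, the enrichment step can be a binary search over sorted, non-overlapping ranges. This sketch uses a hypothetical in-memory table (`GeoTable`, `enrich`) rather than any IP2Location library API, handles IPv4 only, and its example entries are made up, using documentation address blocks and documentation AS numbers.

```python
import bisect
import ipaddress
from typing import List, NamedTuple, Optional


class GeoRange(NamedTuple):
    start: int    # first IPv4 address in the range, as an integer
    end: int      # last IPv4 address in the range (inclusive)
    city: str
    country: str
    asn: int


class GeoTable:
    """Hypothetical range table; assumes ranges do not overlap."""

    def __init__(self, ranges: List[GeoRange]) -> None:
        self.ranges = sorted(ranges, key=lambda r: r.start)
        self.starts = [r.start for r in self.ranges]

    def lookup(self, ip: str) -> Optional[GeoRange]:
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the rightmost range whose start is <= addr.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr <= self.ranges[i].end:
            return self.ranges[i]
        return None  # address falls in a gap between ranges


def enrich(sample: dict, table: GeoTable) -> dict:
    """Return a copy of an RTT sample with location and AS fields added."""
    out = dict(sample)
    for side in ("src", "dst"):
        hit = table.lookup(sample[side])
        out[side + "_city"] = hit.city if hit else None
        out[side + "_country"] = hit.country if hit else None
        out[side + "_asn"] = hit.asn if hit else None
    return out


def ip(s: str) -> int:
    return int(ipaddress.IPv4Address(s))


# Made-up example entries (documentation prefixes and AS numbers).
EXAMPLE_TABLE = GeoTable([
    GeoRange(ip("192.0.2.0"), ip("192.0.2.255"), "Wellington", "NZ", 64500),
    GeoRange(ip("198.51.100.0"), ip("198.51.100.255"), "Glasgow", "GB", 64501),
])
```

Each lookup costs O(log n) in the number of ranges, which matters when every flow's two endpoints must be enriched in real time.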
In the browser, a high-performance map draws every individual latency measurement received over the WebSocket. The map uses WebGL and is capable of drawing thousands of lines per second. Alongside the live map, the UI presents aggregated statistics (min, max, median, average, etc.) and options to search the statistics. These statistics can be used to identify poorly performing source or destination ASes and regions, as well as newly introduced low-latency CDNs.
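As an illustration of the aggregated statistics, here is a minimal sketch that groups enriched RTT samples by an arbitrary key, for example a (source AS, destination AS) pair, and computes the summary values listed above. The function name and key choice are assumptions; for simplicity it keeps every sample in memory, whereas a long-running real-time system would more likely use windowed or streaming estimators.

```python
import statistics
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def aggregate(samples: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, dict]:
    """Group (key, rtt) pairs by key and summarise each group."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, rtt in samples:
        groups[key].append(rtt)
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
        }
        for key, values in groups.items()
    }
```

Ranking the groups by median RTT, e.g. `max(stats, key=lambda k: stats[k]["median"])`, is one simple way to surface a poorly performing AS pair; the median is less sensitive than the mean to a few retransmitted handshakes.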
Ruru is designed to process measurements as a modular pipeline, into which new modules can easily be introduced. For instance, one could add a filter to the pipeline that keeps only NREN traffic (by filtering on AS numbers) without modifying the measurement or geo-localiser modules.
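The modular design can be illustrated with chained Python generators, where each stage consumes the previous stage's output and can be added without touching the others. This is a sketch of the idea, not Ruru's actual module interface: `as_filter`, `run_pipeline` and the sample field names are hypothetical, and the filter assumes a flow counts as NREN traffic if either endpoint's AS is in the allowed set.

```python
from typing import Callable, Iterable, Iterator, Set

Sample = dict  # an enriched RTT measurement, e.g. {"src_asn": ..., "dst_asn": ...}
Stage = Callable[[Iterable[Sample]], Iterator[Sample]]


def as_filter(allowed_asns: Set[int]) -> Stage:
    """Hypothetical stage: keep samples where either endpoint's AS is allowed."""
    def stage(samples: Iterable[Sample]) -> Iterator[Sample]:
        for s in samples:
            if s.get("src_asn") in allowed_asns or s.get("dst_asn") in allowed_asns:
                yield s
    return stage


def run_pipeline(source: Iterable[Sample], stages: Iterable[Stage]) -> Iterator[Sample]:
    """Chain stages lazily so each one consumes the previous one's output."""
    stream: Iterable[Sample] = source
    for stage in stages:
        stream = stage(stream)
    return iter(stream)
```

Because each stage only sees an iterable of samples, inserting the filter between the geo-localiser and the statistics stage requires no change to either neighbour.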
Presentations on Ruru:
Ruru has attracted attention from both the NREN and the networking research communities. Presentations so far include the SCOttish Networking Event (SCONE) in Glasgow and the upcoming eResearch NZ conference in Queenstown.
Richard Cziva is a PhD candidate at the Networked System Research Lab of the University of Glasgow, with international publications on Software-Defined Networking and Network Function Virtualization. Richard has been working as an intern with REANNZ, funded by both REANNZ and the Graduate School Mobility Scholarship of the College of Science and Engineering, University of Glasgow. Richard previously worked at NORDUnet (the Nordic NREN) in Copenhagen, where he developed SDN/OpenFlow facilities for the GÉANT Testbed project.
All member traffic information included in this project is anonymised. If you have any questions about this, please contact REANNZ.