It’s one thing to collect high-volume NetFlow and quite another to report on it, and that’s where the rub lies. The engineering needed to collect millions of flows per second and write them to some type of backend isn’t all that hard for a seasoned developer. Getting the data back out of storage in a timely manner is where expert design comes into play. And really, how fast is fast? Well, Google sort of sets that bar, and its results are pretty much instant. Drawing comparisons to Google, however, isn’t all that fair. The Google architecture is supported by thousands of servers, and most enterprise NetFlow collection consumers don’t have the deep pockets necessary to build out a Google-like architecture. They want it all on one system, or at least on something perceived as one system.
A single system, however, is not always the best choice. Sometimes sending all the flows back from a remote location over a WAN can cause congestion. In these situations, distributed collectors make the most sense. Enterprise NetFlow systems provide a single interface for viewing the data across dozens of collectors, all while keeping the distribution transparent to the user. In other situations, sending the flows up to the cloud, where a cluster of servers can support the collection, makes the most sense. The vendor can help you decide which strategy is the best fit for your environment.
If you speak with vendors, some will tell you that their system scales to millions of flows per second while dropping nothing. The question is: how do you know whether they are or aren’t dropping flows? To answer this reliably, the system should be counting and reporting on the flow sequence numbers. If the collector is missing flow sequence numbers (MFSNs) from one specific router and only that router, the problem is likely the router or the network. If the collector is seeing MFSNs across all routers that are sending flows, then the problem starts to look like the collector not keeping up. Without monitoring for MFSNs, we can’t be sure the collector is collecting everything we expect it to.
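The MFSN check described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor’s implementation: it assumes NetFlow v5 header semantics, where `flow_sequence` is the running count of flows exported before the packet and `count` is the number of records in it. NetFlow v9 and IPFIX count differently, so the arithmetic would need adjusting. The `SequenceTracker` class and its method names are hypothetical.

```python
from collections import defaultdict

SEQ_MODULO = 2 ** 32  # the sequence counter is 32 bits wide and wraps around


class SequenceTracker:
    """Counts missed flows per exporter (assumes NetFlow v5 header semantics)."""

    def __init__(self):
        self.expected = {}              # exporter -> next expected flow_sequence
        self.missed = defaultdict(int)  # exporter -> flows presumed lost

    def observe(self, exporter, flow_sequence, count):
        """Record one export packet header from the given exporter."""
        expected = self.expected.get(exporter)
        if expected is not None:
            gap = (flow_sequence - expected) % SEQ_MODULO
            if gap >= SEQ_MODULO // 2:
                # A huge gap is more likely a late or reordered packet
                # than real loss, so ignore it in this sketch.
                return
            self.missed[exporter] += gap
        self.expected[exporter] = (flow_sequence + count) % SEQ_MODULO

    def diagnose(self):
        """Apply the rule of thumb: one lossy exporter vs. all of them."""
        lossy = sorted(e for e in self.expected if self.missed[e] > 0)
        if not lossy:
            return "no missing sequence numbers"
        if len(self.expected) > 1 and len(lossy) == len(self.expected):
            return "collector likely not keeping up"
        return "exporter or network path likely at fault: " + ", ".join(lossy)


tracker = SequenceTracker()
tracker.observe("r1", 0, 30)
tracker.observe("r1", 60, 30)   # flows 30..59 never arrived
tracker.observe("r2", 100, 10)
tracker.observe("r2", 110, 10)  # r2 is contiguous
print(tracker.diagnose())
```

Here only `r1` shows a gap, so the sketch points at that exporter or its network path. If every exporter showed gaps at once, it would point at the collector instead.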
When problems strike, a fast, responsive interface is great, but it needs to be combined with ease of filtering. The ability to quickly include this and exclude that really speeds up the investigative process. All vendors claim the best interface. Make sure you try it out for yourself, and watch any short tutorials they may have to help you appreciate the vendor-specific features.
With VoIP, BitTorrent, Skype, iCloud and the like now on the network, administrators are dealing with even more flows. On the NetFlow and IPFIX reporting side, vendors often find that two or three issues come into play when scaling NetFlow tools:
High-speed NetFlow collection can lead to very large database tables. Large tables, if not indexed or queried correctly, can lead to poor performance in traffic analysis reporting. As a consumer, how a vendor deals with enormous amounts of flow data can and should be part of your vendor selection process.
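To make the indexing point concrete, here is a small sketch using Python’s built-in `sqlite3` module as a stand-in backend. The `flows` table, its columns, and the index name are assumptions made for illustration; real collectors use their own storage engines and schemas. The sketch compares the query plan for a typical report (bytes from one exporter over a time window) before and after adding an index that matches the filter.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return SQLite's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified flow table
conn.execute(
    "CREATE TABLE flows (exporter TEXT, ts INTEGER, src TEXT, dst TEXT, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [(f"r{i % 4}", i, "10.0.0.1", "10.0.0.2", 1500) for i in range(10_000)],
)

REPORT_SQL = "SELECT SUM(bytes) FROM flows WHERE exporter = ? AND ts >= ? AND ts < ?"
PARAMS = ("r1", 1000, 2000)

plan_before = query_plan(conn, REPORT_SQL, PARAMS)  # full table scan
conn.execute("CREATE INDEX idx_flows_exporter_ts ON flows (exporter, ts)")
plan_after = query_plan(conn, REPORT_SQL, PARAMS)   # index search on exporter + time range
total = conn.execute(REPORT_SQL, PARAMS).fetchone()[0]

print(plan_before)
print(plan_after)
```

Without the index the report has to read every row in the table. With it, the engine can jump straight to the rows for one exporter and one time window, which is the difference that matters once the table holds billions of flows.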
High NetFlow volumes do not necessarily mean you have to use multiple distributed NetFlow collectors. Many NetFlow and IPFIX collectors can handle tens of thousands or even over one hundred thousand flows per second with a single appliance (e.g. Scrutinizer). Distributed NetFlow collection should be configured when sending all of the flows over a wide area link doesn’t make sense. Enterprise NetFlow analysis requires a careful understanding of the IT manager’s goals, the budget constraints and the potential bottleneck areas on the network.
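A back-of-the-envelope estimate can help decide whether a wide area link can carry the exports. The sketch below assumes NetFlow v5 framing (a 24-byte header, 48-byte records, at most 30 records per datagram) plus 28 bytes of IPv4 and UDP headers, and it ignores layer 2 framing. NetFlow v9 and IPFIX record sizes depend on the templates in use, so treat the result as a rough guide only. The function name `export_mbps` is hypothetical.

```python
V5_HEADER_BYTES = 24   # NetFlow v5 packet header
V5_RECORD_BYTES = 48   # one NetFlow v5 flow record
V5_MAX_RECORDS = 30    # maximum records per v5 datagram
UDP_IP_OVERHEAD = 28   # 20-byte IPv4 header + 8-byte UDP header


def export_mbps(flows_per_second, records_per_packet=V5_MAX_RECORDS):
    """Estimate export bandwidth in Mbit/s for a given flow rate."""
    packets_per_second = flows_per_second / records_per_packet
    bytes_per_second = (
        packets_per_second * (V5_HEADER_BYTES + UDP_IP_OVERHEAD)
        + flows_per_second * V5_RECORD_BYTES
    )
    return bytes_per_second * 8 / 1_000_000


for rate in (10_000, 100_000):
    print(f"{rate} flows/s is about {export_mbps(rate):.1f} Mbit/s")
```

Under these assumptions, 100,000 flows per second works out to roughly 40 Mbit/s of export traffic, which is easy on a LAN but can be a real burden on a modest WAN link.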
Work with your vendor to determine whether a single flow collector or distributed NetFlow collection is in your company’s best interest. Beware of necessary add-on modules, and remember to ask about the yearly maintenance cost.