It’s one thing to collect high-volume NetFlow and quite another to report on it, and that’s really where the rub lies. The engineering needed to collect millions of flows per second and write them to some type of backend really isn’t all that hard for a seasoned developer. Getting the data back out of storage in a timely manner is where expert design comes into play. And really, how fast is fast? Google sort of sets that bar, and its results are generally pretty much instant. Drawing comparisons to Google, however, really isn’t all that fair. First of all, the Google architecture is supported by thousands of servers. Most enterprise NetFlow consumers don’t have the deep pockets needed to build out a Google-like architecture. They want it all on one system, or at least on something perceived as one system.
A single system, however, is not always the best choice. Sometimes sending all the flows back from a remote location over a WAN can cause congestion. In these situations, distributed collectors make the most sense. Enterprise NetFlow systems will provide a single interface for viewing the data across dozens of collectors, all while keeping the distribution transparent to the user. In other situations, it makes the most sense to send the flows up to the cloud, where a cluster of servers can support the collection. The vendor can help you decide which strategy is the best fit for your environment.
If you speak with vendors, some will tell you that their systems scale to millions of flows per second while dropping nothing. The question is: how do you know whether they are dropping flows? To answer this reliably, the system should be counting and reporting on flow sequence numbers. If the collector is missing flow sequence numbers (MFSNs) from one specific router, and only that router, the problem is likely the router or the network. If the collector is seeing MFSNs across all routers that are sending flows, the problem starts to look like the collector not keeping up. Without monitoring for MFSNs, we can’t be sure the collector is collecting everything we expect it to.
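Here is a minimal sketch of MFSN counting and the one-router-versus-all-routers rule of thumb described above. It assumes IPFIX-style semantics, where each message's sequence number equals the number of data records the exporter sent earlier in that observation domain (NetFlow v9 sequence numbers count export packets instead, so the increment would differ). The class and function names, and the "several exporters" middle case, are hypothetical additions for illustration, not any vendor's implementation.

```python
from collections import defaultdict

MOD = 2 ** 32  # IPFIX sequence numbers are 32-bit and wrap around


class SequenceTracker:
    """Counts missing flow sequence numbers (MFSNs) per exporter.

    Assumption: a message's sequence number equals the number of data
    records previously sent in that observation domain, so the next
    expected value is this message's seq + its record count.
    """

    def __init__(self):
        self.expected = {}               # (exporter, domain) -> next expected seq
        self.missing = defaultdict(int)  # exporter -> records presumed lost

    def observe(self, exporter, domain, seq, record_count):
        key = (exporter, domain)
        if key in self.expected:
            gap = (seq - self.expected[key]) % MOD
            if gap >= MOD // 2:
                return  # looks like a late or duplicate message; ignore it
            self.missing[exporter] += gap
        self.expected[key] = (seq + record_count) % MOD


def diagnose(tracker, exporters):
    """One lossy router points at that router or its path;
    loss from every router points at the collector."""
    lossy = [e for e in exporters if tracker.missing.get(e, 0) > 0]
    if not lossy:
        return "no missing sequence numbers"
    if len(lossy) == 1:
        return f"check router/network for {lossy[0]}"
    if len(lossy) == len(exporters):
        return "collector may not be keeping up"
    return "loss on several exporters; compare their network paths"


tracker = SequenceTracker()
tracker.observe("10.0.0.1", 0, 0, 10)
tracker.observe("10.0.0.1", 0, 10, 5)
tracker.observe("10.0.0.1", 0, 20, 5)   # records 15-19 never arrived
tracker.observe("10.0.0.2", 0, 100, 8)
tracker.observe("10.0.0.2", 0, 108, 8)
verdict = diagnose(tracker, ["10.0.0.1", "10.0.0.2"])
```

Keying the expectation on (exporter, observation domain) matters because one router can export several independent sequence-number streams. A production collector would also want a reordering window rather than simply ignoring messages that arrive late.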
When problems strike, a fast, responsive interface is great, but it needs to be combined with ease of filtering. The ability to quickly include this and exclude that really speeds up the investigative process. All vendors claim to have the best interface. Make sure you try it out for yourself, and watch any short tutorials they may have to help you appreciate the vendor-specific features.
Need more information? Visit the Free NetFlow web site to learn more about what to expect from the best free NetFlow collector.
As learned from the part 1 blog above, the phrase “Flow Direction” can mean at least two things in the world of NetFlow and IPFIX. To the security professional, flow direction means “who started it”. To the engineer who is developing software to export IPFIX, flow direction tells us where the flow was metered and whether that was done on ingress or egress. This post is about the latter.
Here’s IANA’s IPFIX definition of flowDirection: “The direction of the Flow observed at the Observation Point.” There are only two values defined: ingress and egress.
Above, we can see that another term, "observation point", has been introduced. So, to explain flow direction, we first need to understand what an observation point (OP) is. The OP tells us where the flow was metered, and by metered I mean collected. To put a visual to these terms, let's consider the diagram below.
Given the above, traffic leaving the host is ingress to the router's eth0 and egresses from the router's eth1. Traffic coming back from the cloud is ingress to the router's eth1 and egresses from the router's eth0. As a result, observation points x and y can each potentially report both forward and return traffic. This is why IPFIX needs the flowDirection Information Element: for a single interface on the router, it tells us whether the metering was done as the traffic was coming in (ingress, indicated by 0x00) or going out (egress, indicated by 0x01).
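To make the two-interface walk-through concrete, here is a small sketch that derives the flowDirection value (IPFIX Information Element 61) an observation point would report. Because the diagram is not reproduced here, the placement of OP "x" on eth0 and OP "y" on eth1 is an assumption, as are the helper names and the documentation IP addresses.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FlowDirection(IntEnum):
    # Values defined for the IPFIX flowDirection Information Element
    INGRESS = 0x00
    EGRESS = 0x01


@dataclass
class FlowRecord:
    observation_point: str   # hypothetical label such as "x"
    interface: str
    flow_direction: FlowDirection
    src: str
    dst: str


def meter(op, op_interface, in_if, out_if, src, dst) -> Optional[FlowRecord]:
    """Return the record an OP on `op_interface` would export for traffic
    entering the router on `in_if` and leaving on `out_if`."""
    if op_interface == in_if:
        direction = FlowDirection.INGRESS
    elif op_interface == out_if:
        direction = FlowDirection.EGRESS
    else:
        return None  # this traffic never crosses the OP's interface
    return FlowRecord(op, op_interface, direction, src, dst)


HOST, CLOUD = "192.0.2.10", "198.51.100.20"  # documentation addresses

# Assumption: OP "x" sits on eth0 and OP "y" sits on eth1.
forward_x = meter("x", "eth0", "eth0", "eth1", HOST, CLOUD)
return_x = meter("x", "eth0", "eth1", "eth0", CLOUD, HOST)
forward_y = meter("y", "eth1", "eth0", "eth1", HOST, CLOUD)
return_y = meter("y", "eth1", "eth1", "eth0", CLOUD, HOST)
```

OP "x" reports the forward flow as ingress and the return flow as egress, while "y" reports the opposite. Without flowDirection in the record, a collector looking at either interface could not tell those two cases apart.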
There are some "post-" Information Elements which report egress traffic counts. The idea is that a single observation point, "x", follows traffic through the routing process and can report both the original and the post- counters.
Keep in mind that it's unnecessary to have "post-" versions of all the information elements, or to duplicate every element according to whether it observes incoming or outgoing traffic. Developers can simply create an Observation Point on the outgoing interface and report the ingress Information Elements.
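The sketch below contrasts the two approaches just described: one observation point reporting both original and post- counters, versus a second observation point after the process that reports only the ordinary counters. It assumes, purely for illustration, that the routing process may discard packets. The packet model, the `drop_port_23` policy and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    length: int    # bytes on the wire
    dst_port: int


def drop_port_23(pkt):
    """Hypothetical routing-stage policy: discard packets sent to port 23."""
    return None if pkt.dst_port == 23 else pkt


def one_point_with_post(packets, process):
    """A single OP reports the original and the post- counters."""
    out = [p for p in map(process, packets) if p is not None]
    return {
        "packetDeltaCount": len(packets),
        "octetDeltaCount": sum(p.length for p in packets),
        "postPacketDeltaCount": len(out),
        "postOctetDeltaCount": sum(p.length for p in out),
    }


def two_points_plain(packets, process):
    """Two OPs, before and after the process, each report only the
    ordinary counters for the traffic arriving at them."""
    out = [p for p in map(process, packets) if p is not None]
    before = {"observationPointId": 1,
              "packetDeltaCount": len(packets),
              "octetDeltaCount": sum(p.length for p in packets)}
    after = {"observationPointId": 2,
             "packetDeltaCount": len(out),
             "octetDeltaCount": sum(p.length for p in out)}
    return before, after


packets = [Packet(100, 443), Packet(60, 23), Packet(1500, 443)]
single = one_point_with_post(packets, drop_port_23)
before, after = two_points_plain(packets, drop_port_23)
```

The second OP's plain counters carry the same numbers as the first OP's post- counters, which is why post- variants are not needed for every Information Element.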
NOTE: Above, the flow collector will not know the location of the OP or whether the flow was collected on ingress or egress. Knowing this could become important for traffic analysis, but there is currently no metadata available for it. If there were, the sequence of events would certainly become important as well.
Suppose that the observation points are associated with a traffic filter which discards packets. It's crucial to know whether the reported counts represent the original traffic ingressing the filter or the filtered traffic egressing it. Therefore, for consistency and to avoid ambiguity, we define the observation as the "incoming" count.
MAC addresses may be useful for simple observations based on router interfaces. However, if you're following traffic through a sequence of processes and you want to report the count between each of them, you'll need some other way of identifying the specific observation points.
For example, suppose you put filtering and sampling processes ahead of your switching infrastructure and report the layer2FrameTotalCount between each process.
Each of these observation points would report the same MAC addresses, which means they could only be distinguished by their observation point IDs. Working out the details on how to export this series of observations could end up being a fun project!
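Here is a rough sketch of that filter, sampler and switch chain. Each observation point reports the frames arriving at its stage (the "incoming" convention) together with an observationPointId. The stage functions, the `macPairs` summary field and the documentation MAC addresses are hypothetical. Each run is treated as the whole metering interval so that the "total" count stays simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    vlan: int


def run_pipeline(frames, stages):
    """Push frames through ordered (observationPointId, stage) pairs.
    Each OP meters the frames arriving at its stage; each stage returns
    the frames it passes on."""
    records = []
    for op_id, stage in stages:
        records.append({
            "observationPointId": op_id,
            "layer2FrameTotalCount": len(frames),
            # Hypothetical summary field: distinct (src, dst) MAC pairs seen
            "macPairs": sorted({(f.src_mac, f.dst_mac) for f in frames}),
        })
        frames = stage(frames)
    return records


def drop_vlan_99(frames):
    """Hypothetical filter: discard frames tagged with VLAN 99."""
    return [f for f in frames if f.vlan != 99]


def sample_1_in_2(frames):
    """Hypothetical systematic sampler: keep every second frame."""
    return frames[::2]


def switch(frames):
    """Stand-in for the switching infrastructure: forwards everything."""
    return list(frames)


HOST_MAC, GW_MAC = "00:00:5e:00:53:01", "00:00:5e:00:53:02"  # documentation MACs
frames = [Frame(HOST_MAC, GW_MAC, 99 if i % 4 == 0 else 10) for i in range(8)]
stages = [(1, drop_vlan_99), (2, sample_1_in_2), (3, switch)]
records = run_pipeline(frames, stages)
```

The three records carry identical MAC addresses and differ only in their counts. The observationPointId is therefore the only field that says which step in the chain each count came from.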
Questions? Reach out to me or my friend Paul Aitken, who was a HUGE help on this post. I’ve written on this topic a few times, and with each blog I’d like to think that I’m getting closer to helping those struggling with this area of IPFIX. I wrote a post titled “ingress or egress” back in June of 2009, then again here in Feb of 2012, and again here in May of 2013. I guess I’ll keep trying.