Tuesday, June 14, 2016

Merchant silicon based routing, flow analytics, and telemetry

Drivers for growth describes how switches built on Broadcom's merchant silicon ASICs dominate the current generation of data center switches, reduce hardware costs, and support an open ecosystem of switch operating systems (Cumulus Linux, OpenSwitch, Dell OS10, Broadcom FASTPATH, Pica8 PicOS, Open Network Linux, etc.).

The router market is poised to be similarly disrupted with the introduction of devices based on Broadcom's Jericho ASIC, which has the capacity to handle over 1 million routes in hardware (the full Internet routing table is currently around 600,000 routes).
An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive. - Leaving Fixed Function Switches Behind For Universal Leafs
Broadcom Jericho ASICs are currently available in Arista 7500R/7280R routers and in Cisco NCS 5000 series routers. Expect further disruption to the router market when white box versions of the 1U router hardware enter the market.
There was general enthusiasm for Broadcom Jericho based routers in a recent discussion on the North American Network Operators' Group (NANOG) mailing list, Arista Routing Solutions, so merchant silicon based routers can be expected to sell well.
The Broadcom Jericho ASICs also include hardware instrumentation to support industry standard sFlow traffic monitoring and streaming telemetry. For example, the following commands enable sFlow on all ports on an Arista router:
sflow source-interface Management1
sflow destination 170.1.1.11
sflow polling-interval 30
sflow sample 65535
sflow run
See EOS System Configuration Guide for details.
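The sflow sample 65535 command in the example above sets the packet sampling ratio to 1-in-65535. The following back-of-envelope calculation is a sketch for estimating the resulting sample stream; the link speed, load, packet size and port count are illustrative assumptions, not recommendations:
#!/usr/bin/env python
# Estimate expected sFlow samples per second for an assumed
# configuration; all values below are illustrative assumptions.
link_bps = 100e9        # 100G port, assumed fully loaded
avg_pkt_bytes = 800     # assumed average packet size
sampling_ratio = 65535  # 1-in-N, from "sflow sample 65535"
ports = 48              # assumed number of monitored ports

pps_per_port = link_bps / (avg_pkt_bytes * 8)
samples_per_sec = ports * pps_per_port / sampling_ratio
print("expected samples/sec: %.1f" % samples_per_sec)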

Cisco supports standard sFlow on its merchant silicon based switch platforms - see Cisco adds sFlow support, Cisco adds sFlow support to Nexus 9K series, and Cisco SF250, SG250, SF350, SG350, SG350XG, and SG550XG series switches. Unfortunately, IOS XR on Cisco's Jericho based routers doesn't yet support sFlow. Instead, a complex set of commands is required to configure Cisco's proprietary NetFlow and streaming telemetry protocols:
RP/0/RP0/CPU0:router#config
RP/0/RP0/CPU0:router(config)#flow exporter-map exp1
RP/0/RP0/CPU0:router(config-fem)#version v9
RP/0/RP0/CPU0:router(config-fem-ver)#options interface-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#options sampler-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template data timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template options timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#exit 
RP/0/RP0/CPU0:router(config-fem)#transport udp 12515
RP/0/RP0/CPU0:router(config-fem)#source Loopback0
RP/0/RP0/CPU0:router(config-fem)#destination 170.1.1.11
RP/0/RP0/CPU0:router(config-fem)#exit
RP/0/RP0/CPU0:router(config)#flow monitor-map MPLS-IPv6-fmm
RP/0/RP0/CPU0:router(config-fmm)#record mpls ipv6-fields labels 3
RP/0/RP0/CPU0:router(config-fmm)#exporter exp1
RP/0/RP0/CPU0:router(config-fmm)#cache entries 10000
RP/0/RP0/CPU0:router(config-fmm)#cache permanent
RP/0/RP0/CPU0:router(config-fmm)#exit
RP/0/RP0/CPU0:router(config)#sampler-map FSM
RP/0/RP0/CPU0:router(config-sm)#random 1 out-of 65535
RP/0/RP0/CPU0:router(config-sm)# exit
And further commands are needed to enable monitoring on each interface (and there can be a large number of interfaces given the high port density of these routers):
RP/0/RP0/CPU0:router(config)#interface HundredGigE 0/3/0/0
RP/0/RP0/CPU0:router(config-if)#flow mpls monitor MPLS-IPv6-fmm sampler FSM ingress
See Netflow Configuration Guide for Cisco NCS 5500 Series Routers, IOS XR Release 6.0.x for configuration details and limitations.

We are still not done; further steps are required to enable the equivalent of sFlow's streaming telemetry.

Create policy file defining the counters to export:
{
 "Name": "Test",
 "Metadata": {
  "Version": 25,
  "Description": "This is a sample policy",
  "Comment": "This is the first draft",
  "Identifier": "data that may be sent by the encoder to the mgmt stn"
 },
 "CollectionGroups": {
  "FirstGroup": {
   "Period": 30,
   "Paths": [
    "RootOper.InfraStatistics.Interface(*).Latest.GenericCounters"
   ]
  }
 }
}
Copy the policy file to router:
$ scp Test.policy cisco@170.1.1.1:/telemetry/policies
Finally, configure the JSON encoder:
Router# configure
Router(config)#telemetry encoder json
Router(config-telemetry-json)#policy group FirstGroup
Router(config-policy-group)#policy Test
Router(config-policy-group)#destination ipv4 170.1.1.11 port 5555
Router(config-policy-group)#commit
See Cisco IOS XR Telemetry Configuration Guide for details.
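As a rough illustration of the collector side, the sketch below simply accepts the TCP connection the router opens to the configured destination (port 5555) and reports whatever arrives. This is an assumption-laden illustration: the message framing used by the IOS XR JSON encoder is not implemented here; consult the telemetry guide for the actual format.
#!/usr/bin/env python
# Bare-bones telemetry receiver sketch. Accepts connections on the
# destination port configured above (5555) and prints raw data;
# decoding the JSON encoder's actual message framing is omitted.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('0.0.0.0', 5555))
s.listen(1)
while True:
    conn, addr = s.accept()
    print('connection from %s' % str(addr))
    while True:
        data = conn.recv(4096)
        if not data:
            break
        print('received %d bytes' % len(data))
    conn.close()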
Software defined analytics describes how the sFlow architecture disaggregates the flow analytics pipeline and integrates telemetry export to reduce complexity and increase flexibility. The reduced configuration complexity is clearly illustrated by the two configuration examples above.

Unlike the complex and disparate monitoring mechanisms in IOS XR, sFlow offers a simple, flexible and unified monitoring solution that exposes the full monitoring capabilities of the Broadcom Jericho ASIC. Expect a future release of IOS XR to add sFlow support, since sFlow is a natural fit for the hardware capabilities of Jericho based router platforms and its addition would provide feature parity with Cisco's merchant silicon based switches.

Finally, the real-time visibility provided by sFlow supports a number of important use cases for high performance routers (a minimal detection sketch follows the list), including:
  • DDoS mitigation
  • Load balancing ECMP paths
  • BGP route analytics
  • Traffic engineering
  • Usage based accounting
  • Enforcing usage quotas
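For example, the DDoS mitigation use case can be driven directly from the sFlow-RT REST API described elsewhere on this blog. The sketch below assumes sFlow-RT is running on the collector configured above (170.1.1.11, default REST port 8008) and that a flow named 'udp' tracking frames by destination address has already been defined; the flow definition and threshold are illustrative assumptions:
#!/usr/bin/env python
# Minimal DDoS detection sketch using the sFlow-RT REST API.
# Assumed flow definition, e.g.:
#  curl -H "Content-Type:application/json" -X PUT --data \
#  '{"keys":"ipdestination","value":"frames"}' \
#  http://170.1.1.11:8008/flow/udp/json
import requests
import time

rt = 'http://170.1.1.11:8008'
threshold = 100000  # frames/sec, an assumed attack threshold

while True:
    r = requests.get(rt + '/metric/ALL/udp/json')
    for m in r.json():
        for k in m.get('topKeys', []):
            if k['value'] > threshold:
                # hand off to mitigation here, e.g. announce a
                # remotely triggered blackhole for k['key']
                print('possible DDoS target %s at %.0f frames/sec'
                      % (k['key'], k['value']))
    time.sleep(10)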

Thursday, June 2, 2016

OVS Orbit podcast with Ben Pfaff

OVS Orbit Episode 6 is a wide ranging discussion between Ben Pfaff and Peter Phaal covering the industry standard sFlow measurement protocol, the implementation of sFlow in Open vSwitch, and the network analytics use cases and application areas supported by sFlow, including: OpenStack, Open Virtual Network (OVN), DDoS mitigation, ECMP load balancing, Elephant and Mice flows, Docker containers, Network Function Virtualization (NFV), and microservices.

Follow the link to listen to the podcast, read the extensive show notes, follow related links, and subscribe to the podcast.

Friday, May 6, 2016

sFlow to IPFIX/NetFlow

RESTflow explains how the sFlow architecture shifts the flow cache from devices to external software and describes how the sFlow-RT REST API can be used to program and query flow caches. Exporting events using syslog describes how flow records can be exported using the syslog protocol to Security Information and Event Management (SIEM) tools such as Logstash and Splunk. This article demonstrates how sFlow-RT can be used to define and export flows using the IP Flow Information eXport (IPFIX) protocol (the IETF standard based on NetFlow version 9).

For example, the following command defines a cache that will maintain flow records for TCP flows on the network, capturing IP source and destination addresses, source and destination port numbers and the bytes transferred and sending flow records to address 10.0.0.162:
curl -H "Content-Type:application/json" -X PUT --data \
'{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport","value":"bytes","ipfixCollectors":["10.0.0.162"]}' \
http://localhost:8008/flow/tcp/json
Running Wireshark's tshark command line utility on 10.0.0.162 verifies that flows are being received:
# tshark -i eth0 -V udp port 4739
Running as user "root" and group "root". This could be dangerous.
Capturing on lo
Frame 1 (134 bytes on wire, 134 bytes captured)
    Arrival Time: Aug 24, 2013 10:44:06.096082000
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 134 bytes
    Capture Length: 134 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:udp:cflow]
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Destination: 00:00:00_00:00:00 (00:00:00:00:00:00)
        Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 00:00:00_00:00:00 (00:00:00:00:00:00)
        Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.162 (10.0.0.162), Dst: 10.0.0.162 (10.0.0.162)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 120
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)
        0.. = Reserved bit: Not Set
        .1. = Don't fragment: Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 64
    Protocol: UDP (0x11)
    Header checksum: 0x2532 [correct]
        [Good: True]
        [Bad : False]
    Source: 10.0.0.162 (10.0.0.162)
    Destination: 10.0.0.162 (10.0.0.162)
User Datagram Protocol, Src Port: 56109 (56109), Dst Port: ipfix (4739)
    Source port: 56109 (56109)
    Destination port: ipfix (4739)
    Length: 100
    Checksum: 0x15b9 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
Cisco NetFlow/IPFIX
    Version: 10
    Length: 92
    Timestamp: Aug 24, 2013 10:44:06.000000000
        ExportTime: 1377366246
    FlowSequence: 74
    Observation Domain Id: 0
    Set 1
        Template FlowSet: 2
        FlowSet Length: 40
        Template (Id = 258, Count = 8)
            Template Id: 258
            Field Count: 8
            Field (1/8)
                .000 0000 1000 0010 = Type: exporterIPv4Address (130)
                Length: 4
            Field (2/8)
                .000 0000 1001 0110 = Type: flowStartSeconds (150)
                Length: 4
            Field (3/8)
                .000 0000 1001 0111 = Type: flowEndSeconds (151)
                Length: 4
            Field (4/8)
                .000 0000 0000 1000 = Type: IP_SRC_ADDR (8)
                Length: 4
            Field (5/8)
                .000 0000 0000 1100 = Type: IP_DST_ADDR (12)
                Length: 4
            Field (6/8)
                .000 0000 1011 0110 = Type: TCP_SRC_PORT (182)
                Length: 2
            Field (7/8)
                .000 0000 1011 0111 = Type: TCP_DST_PORT (183)
                Length: 2
            Field (8/8)
                .000 0000 0101 0101 = Type: BYTES_TOTAL (85)
                Length: 8
    Set 2
        DataRecord (Template Id): 258
        DataRecord Length: 36
        Flow 1
            ExporterAddr: 10.0.0.20 (10.0.0.20)
            [Duration: 65.000000000 seconds]
                StartTime: Aug 24, 2013 10:43:01.000000000
                EndTime: Aug 24, 2013 10:44:06.000000000
            SrcAddr: 10.0.0.16 (10.0.0.16)
            DstAddr: 10.0.0.20 (10.0.0.20)
            SrcPort: 48859
            DstPort: 443
            Octets: 228045
The output demonstrates how the flow cache definition is exported as an IPFIX Template and the individual flow records are exported as one or more Flow entries within a DataRecord.

What might not be apparent is that this single sFlow-RT configuration command enables network wide monitoring of TCP connections, even in a network containing hundreds of physical switches, thousands of virtual switches, different switch models, multiple vendors etc. In contrast, if devices maintain their own flow caches then each switch needs to be re-configured whenever monitoring requirements change - typically a time consuming and complex manual process, see Software defined analytics.
While IPFIX provides a useful method of exporting IP flow records to legacy monitoring solutions, logging flow records is only a small subset of the applications for sFlow analytics. The real-time network, server, and application analytics provided by sFlow-RT deliver actionable data through APIs and can easily be integrated with a wide variety of on-site and cloud based orchestration, DevOps and Software Defined Networking (SDN) tools.

Tuesday, September 8, 2015

Cisco adds sFlow support to Nexus 9K series

Cisco adds support for the sFlow standard in the Cisco Nexus 9000 Series 7.0(3)I2(1) NX-OS Release. Combined with the Nexus 3000/3100 series, which have included sFlow support since NX-OS 5.0(3)U4(1),  Cisco now offers cost effective, built-in, visibility across the full spectrum of data center switches.
Cisco network engineers might not be familiar with the multi-vendor sFlow technology since it is a relatively new addition to Cisco products. The article, Cisco adds sFlow support, describes some of the key features of sFlow and contrasts them to Cisco NetFlow.
Nexus 9000 switches can be operated in NX-OS mode or ACI mode:
  • NX-OS mode includes a number of open features such as sFlow, Python, NX-API, and Bash that integrate with an open ecosystem of orchestration tools such as Puppet, Chef, CFEngine, and Ansible. "By embracing the open culture of development and operations (DevOps) and creating a more Linux-like environment in the Cisco Nexus 9000 Series, Cisco enables IT departments with strong Linux skill sets to meet business needs efficiently," Cisco Nexus 9000 Series Switches: Integrate Programmability into Your Data Center. Open APIs are becoming increasingly popular, preventing vendor lock-in, and allowing organizations to benefit from the rapidly increasing range of open hardware and software solutions to reduce costs and increase agility.
  • ACI mode is a closed solution that relies on proprietary hardware and places the switches under the control of Cisco's APIC (Application Policy Infrastructure Controller) - eliminating many of the features, including sFlow, available in NX-OS mode. The ACI solution is more expensive and the closed platform locks customers into Cisco hardware and solutions.
SDN fabric controllers compares tightly coupled (ACI) and loosely federated (NX-OS) approaches to virtualizing data center networking and there are a number of articles on this blog exploring use cases for real-time sFlow analytics in the data center.

Monday, December 1, 2014

Open vSwitch 2014 Fall Conference


Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. Open vSwitch is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2014 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: large scale operation experiences at Rackspace, implementing stateful firewalls, Docker networking, and acceleration technologies (Intel DPDK and Netmap/VALE).

The video above is a recording of the following sFlow related talk from the conference:
Traffic visibility and control with sFlow (Peter Phaal, InMon)
sFlow instrumentation has been included in Open vSwitch since version 0.99.1 (released 25 Jan 2010). This talk will introduce the sFlow architecture and discuss how it differs from NetFlow/IPFIX, particularly in regards to delivering real-time flow analytics to an SDN controller. The talk will demonstrate that sFlow measurements from Open vSwitch are identical to sFlow measurements made in hardware on bare metal switches, providing unified, end-to-end, measurement across physical and virtual networks. Finally, Open vSwitch / Mininet will be used to demonstrate Elephant flow detection and marking using a combination of sFlow and OpenFlow.
Slides and videos for all the conference talks will soon be available on the Open vSwitch web site.

Sunday, November 10, 2013

UDP packet replication using Open vSwitch

UDP protocols such as sFlow, syslog, NetFlow, IPFIX and SNMP traps, have many advantages for large scale network and system monitoring, see Push vs Pull.  In a typical deployment each managed element is configured to send UDP packets to a designated collector (specified by an IP address and port). For example, in a simple sFlow monitoring system all the switches might be configured to send sFlow data to UDP port 6343 on the host running the sFlow analysis application. Complex deployments may require multiple analysis applications, for example: a first application providing analytics for software defined networking, a second focused on host performance, a third addressing packet capture and security, and the fourth looking at application performance. In addition, a second copy of each application may be required for redundancy. The challenge is getting copies of the data to all the application instances in an efficient manner.

There are a number of approaches to replicating UDP data, each with limitations:
  1. IP Multicast - if the data is sent to an IP multicast address then each application could subscribe to the multicast channel and receive a copy of the data. This sounds great in theory, but in practice configuring and maintaining IP multicast connectivity can be a challenge, and all the agents and collectors would need to support IP multicast. IP multicast also doesn't address the situation where multiple applications run on a single host, since each application would have to receive the UDP data on a different port.
  2. Replicate at source - each agent could be configured to send a copy of the data to each application. Replicating at source is a configuration challenge (all agents need to be reconfigured if you add an additional application). This approach is also wasteful of bandwidth - multiple copies of the same data are sent across the network.
  3. Replicate at destination - a UDP replicator, or "samplicator" application receives the stream of UDP messages, copies them and resends them to each of the applications. This functionality may be deployed as a stand alone application, or be an integrated function within an analysis application. The replicator application is a single point of failure - if it is shut down none of the applications receive data. The replicator adds delay to the measurements and at high data rates can significantly increase UDP loss rate as the datagrams are received, sent, and received again. 
This article will examine a fourth option, using software defined networking (SDN) techniques to replicate and distribute data within the network. The Open vSwitch is implemented in the Linux kernel and includes OpenFlow and network virtualization features that will be used to build the replication network.

First, you will need a server (or virtual machine) running a recent version of Linux. Next download and install Open vSwitch.

Next, configure the Open vSwitch to handle networking for the server:
ovs-vsctl add-br br0
ovs-vsctl add-port br0 eth0
ifconfig eth0 0
ifconfig br0 10.0.0.1/24
Now configure the UDP agents to send their data to 10.0.0.1. You should be able to run a collector application for each service port (e.g. sFlow 6343, syslog 514, etc.).
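Before installing full analysis applications, a quick way to confirm that datagrams are arriving on a given service port is a minimal UDP listener; the sketch below (shown for the sFlow port, 6343) is purely for testing:
#!/usr/bin/env python
# Minimal UDP listener to confirm datagrams are arriving on the
# sFlow port (6343); prints the sender address and payload size.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('0.0.0.0', 6343))
while True:
    data, addr = s.recvfrom(65535)
    print('%s sent %d bytes' % (addr[0], len(data)))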

The first case to consider is replicating the datagrams to a second port on the server (sending packets to App 1 and App 2 in the diagram). First, use the ovs-vsctl command to list the OpenFlow port numbers on the virtual switch:
% ovs-vsctl --format json --columns name,ofport list Interface
{"data":[["eth0",1],["br0",65534]],"headings":["name","ofport"]}
We are interested in replicating packets received on eth0 and output shows that the corresponding OpenFlow port is 1.

The Open vSwitch provides a command line utility ovs-ofctl that uses the OpenFlow protocol to configure forwarding rules in the vSwitch. The following OpenFlow rule will replicate sFlow datagrams:
in_port=1 dl_type=0x0800 nw_proto=17 tp_dst=6343 actions=LOCAL,mod_tp_dst:7343,normal
The match part of the rule looks for packets received on port 1 (in_port=1), where the Ethernet type is IPv4 (dl_type=0x0800), the IP protocol is UDP (nw_proto=17), and the destination UDP port is 6343 (tp_dst=6343). The actions section of the rule is the key to building the replication function. The LOCAL action delivers the original packet as intended. The destination port is then changed to 7343 (mod_tp_dst:7343) and the modified packet is sent through the normal processing path to be delivered to the application.

Save this rule to a file, say replicate.txt, and then use ovs-ofctl to apply the rule to br0:
ovs-ofctl add-flows br0 replicate.txt
At this point a second sFlow analysis application listening for sFlow datagrams on port 7343 should start receiving data - sflowtool is a convenient way to verify that the packets are being received:
sflowtool -p 7343
The second case to consider is replicating the datagrams to a remote host (sending packets to App 3 in the diagram).
in_port=1 dl_type=0x0800 nw_proto=17 tp_dst=6343 actions=LOCAL,mod_tp_dst:7343,normal,mod_nw_src:10.0.0.1,mod_nw_dst:10.0.0.2,normal
The extended rule includes additional actions that modify the source IP address of the packets (mod_nw_src:10.0.0.1) and the destination IP address (mod_nw_dst:10.0.0.2) and send the packet through the normal processing path. Since we are relying on the routing functionality in the Linux stack to deliver the packet, make sure that routing is enabled - see How to Enable IP Forwarding in Linux.
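On most Linux systems IPv4 forwarding can be enabled with the following command (add net.ipv4.ip_forward=1 to /etc/sysctl.conf to make the setting persist across reboots):
sysctl -w net.ipv4.ip_forward=1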
Unicast reverse path filtering (uRPF) is a mechanism that routers use to drop spoofed packets (i.e. packets where the source address doesn't belong to the subnet on the access port the packet was received on). uRPF should be enabled wherever practical because spoofing is used in a variety of security and denial of service attacks, e.g. DNS amplification attacks. By modifying the IP source address to be the address of the forwarding host (10.0.0.1) rather than the original source IP address, the OpenFlow rule ensures that the packet will pass through uRPF filters, both on the host and on the access router. Rewriting the sFlow source address does not cause any problems because the sFlow protocol identifies the original source of the data within its payload and doesn't rely on the IP source address. However, other UDP protocols (for example, NetFlow/IPFIX) rely on the IP source address to identify the source of the data. In this case, removing the mod_nw_src action will leave the IP source address unchanged, but the packet may well be dropped by uRPF filters. Newer Linux distributions implement strict uRPF by default; however, it can be disabled if necessary, see Reverse Path Filtering.
This article has only scratched the surface of the capabilities of the Open vSwitch. In situations where passing the raw packets across the network isn't feasible, the Open vSwitch can be configured to send the packets over a tunnel (sending packets to App 4 in the diagram). Tunnels, in conjunction with OpenFlow, can be used to create a virtual UDP distribution overlay network with its own addressing scheme and topology - Open vSwitch is used by a number of network virtualization vendors (e.g. VMware NSX). In addition, more complex filters can also be implemented, forwarding datagrams based on source subnet to different collectors etc.

The replication functions don't need to be performed in software in the virtual switch. OpenFlow rules can be pushed to OpenFlow capable hardware switches, which can perform the replication or source based forwarding functions at wire speed. A full blown controller based solution isn't necessarily required; the ovs-ofctl command can be used to push OpenFlow rules to physical switches.

More generally, building flexible UDP datagram distribution and replication networks is an interesting use case for software defined networking. The power of software defined networking is that you can adapt the network behavior to suit the needs of the application - in this case overcoming the limitations of existing UDP distribution solutions by modifying the behavior of the network.

Wednesday, August 28, 2013

RESTflow

Figure 1: Embedded, on-switch flow cache with flow record export
This article describes RESTflow™, a new method for exporting flow records that has significant advantages over current approaches to flow export.

A flow record summarizes a set of packets that share common attributes - for example, a typical flow record includes ingress interface, source IP address, destination IP address, IP protocol, source TCP/UDP port, destination TCP/UDP port, IP ToS, start time, end time, packet count and byte count.

Figure 1 shows the steps performed by the switch in order to construct flow records. First the stream of packets is likely to be sampled (particularly in high-speed switches). Next, the sampled packet header is decoded to extract key fields. A hash function is computed over the keys in order to look up the flow record in the flow cache. If an existing record is found, its values are updated, otherwise a record is created for the new flow. Records are flushed from the cache based on protocol information (e.g. if a FIN flag is seen in a TCP packet), a timeout, inactivity, or when the cache is full. The flushed records are finally sent to the traffic analysis application using one of the many formats that switches use to export flow records (e.g. NetFlow, IPFIX, J-Flow, NetStream, etc.).
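To make the pipeline concrete, the following minimal Python sketch models the flow cache steps described above (key lookup, counter update, timeout based flush); it is a conceptual illustration, not the implementation of any particular switch:
#!/usr/bin/env python
# Conceptual model of an on-switch flow cache: look up a record by
# flow key, update its counters, and flush records that exceed the
# active timeout. Packet sampling and header decode are omitted.
import time

ACTIVE_TIMEOUT = 60  # seconds
cache = {}           # flow key -> flow record

def update(key, nbytes):
    now = time.time()
    rec = cache.get(key)  # the dict hashes the key internally
    if rec is None:
        rec = {'start': now, 'packets': 0, 'bytes': 0}
        cache[key] = rec
    rec['end'] = now
    rec['packets'] += 1
    rec['bytes'] += nbytes

def flush():
    now = time.time()
    for key, rec in list(cache.items()):
        if now - rec['start'] > ACTIVE_TIMEOUT:
            export(key, rec)  # send the flow record to the collector
            del cache[key]

def export(key, rec):
    print(key, rec)

# example: account a decoded, sampled packet to its 5-tuple; the
# record is only exported once the active timeout expires
update(('10.0.0.16', '10.0.0.20', 6, 48859, 443), 1500)
flush()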
Figure 2: External software flow cache with flow record export
Figure 2 shows the relationship between the widely supported sFlow standard for packet export and flow export. With sFlow monitoring, the decode, hash, flow cache and flush functionality are no longer implemented on the switch. Instead, sampled packet headers are immediately sent to the traffic analysis application which decodes the packets and analyzes the data. In typical deployments, large numbers of switches stream sFlow data to a central sFlow analyzer. In addition, sFlow provides a polling function; switches periodically send standard interface counters to the traffic analysis applications, eliminating the need for SNMP polling, see Link utilization.
There are significant advantages to moving the flow cache to external software: the article Superlinear discusses some of the scalability implications of on device flow caches and Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX describes how on device flow caches delay measurements and make them less useful for software defined networking (SDN) applications.
The following example uses the sFlow-RT analyzer to demonstrate flow record export based on sFlow packet data received from a network of switches.
Figure 3: Performance aware software defined networking
Figure 3 from Performance aware software defined networking shows how sFlow-RT exposes the active flow cache to applications that address important use cases, such as DDoS mitigation, large flow load balancing, multi-tenant performance isolation, traffic engineering, and packet capture.

The recent extension of the REST API to support flow record export provides a useful log of network activity that can be incorporated into security information and event management (SIEM) tools.

Three types of query combine to deliver the RESTflow flexible flow definition and export interface:

1. Define flow cache

The following command instructs the central sFlow-RT analytics engine running on host 10.0.0.162 to build a flow cache for TCP flows and log the completed flows:
curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport", "value":"bytes", "log":true}' http://10.0.0.162:8008/flow/tcp/json
What might not be apparent is that this single sFlow-RT configuration command enables network wide monitoring of TCP connections, even in a network containing hundreds of physical switches, thousands of virtual switches, different switch models, multiple vendors etc. In contrast, if devices maintain their own flow caches then each switch needs to be re-configured whenever monitoring requirements change - typically a time consuming and complex manual process, see Software defined analytics.

To illustrate the point, the following command defines an additional network wide flow cache for records describing DNS (UDP port 53) requests and logs the completed flows:
curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource", "value":"frames", "filter":"udpdestinationport=53", "log":true}' http://10.0.0.162:8008/flow/dns/json

2. Query flow cache definition

The following command retrieves the flow definitions:
$ curl http://10.0.0.162:8008/flow/json
{
 "dns": {
  "filter": "udpdestinationport=53",
  "fs": ",",
  "keys": "ipsource",
  "log": true,
  "n": 5,
  "t": 2,
  "value": "frames"
 },
 "tcp": {
  "fs": ",",
  "keys": "ipsource,ipdestination,tcpsourceport,tcpdestinationport",
  "log": true,
  "n": 5,
  "t": 2,
  "value": "bytes"
 }
}
The definition for a specific flow can also be retrieved:
$ curl http://10.0.0.162:8008/flow/tcp/json
{
 "fs": ",",
 "keys": "ipsource,ipdestination,tcpsourceport,tcpdestinationport",
 "log": true,
 "n": 5,
 "t": 2,
 "value": "bytes"
}

3. Retrieve flow records

The following command retrieves flow records logged by all the flow caches:
curl http://10.0.0.162:8008/flows/json?maxFlows=2
[
 {
  "agent": "10.0.0.20",
  "dataSource": "2",
  "end": 1377658682679,
  "flowID": 250,
  "flowKeys": "10.0.0.162",
  "name": "dns",
  "start": 1377658553679,
  "value": 400
 },
 {
  "agent": "10.0.0.20",
  "dataSource": "5",
  "end": 1377658681678,
  "flowID": 249,
  "flowKeys": "10.0.0.20,10.0.0.236,47571,3260",
  "name": "tcp",
  "start": 1377658613678,
  "value": 1217600
 }
]
And the following command retrieves flow records from a specific cache:
$ curl "http://10.0.0.162:8008/flows/json?name=dns&maxFlows=2"
[
 {
  "agent": "10.0.0.28",
  "dataSource": "53",
  "end": 1377658938378,
  "flowID": 400,
  "flowKeys": "10.0.0.162",
  "name": "dns",
  "start": 1377658398378,
  "value": 400
 },
 {
  "agent": "10.0.0.20",
  "dataSource": "2",
  "end": 1377658682679,
  "flowID": 251,
  "flowKeys": "10.0.0.71",
  "name": "dns",
  "start": 1377658612679,
  "value": 400
 }
]
The JSON encoded text based output is easy to read and widely supported by programming tools.
Transporting large amounts of flow data using a text based protocol might seem inefficient when compared to binary flow record export protocols such as IPFIX, NetFlow etc. However, one of the advantages of a REST API is that it builds on the mature and extensive capabilities built into the HTTP protocol stack. For example, most HTTP clients are capable of handling compression and will set the HTTP Accept-Encoding header to indicate that they are willing to accept compressed data. The sFlow-RT web server responds by compressing the data before sending it, resulting in a 20 times reduction in data volume. Similarly, using a REST API allows users to leverage the existing infrastructure to load balance, encrypt, authenticate, cache and proxy requests.
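For example, Python's requests library advertises compression support by default and transparently decompresses the response (the URL assumes the sFlow-RT instance used above):
import requests

r = requests.get('http://10.0.0.162:8008/flows/json?maxFlows=100')
# the client advertised compression support automatically
print(r.request.headers['Accept-Encoding'])  # gzip, deflate
print(len(r.content))  # decompressed JSON body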
The real power of the RESTflow API becomes apparent when it is accessed programmatically. For example, the following Python script defines the TCP flow described earlier and continuously retrieves new flow records:
#!/usr/bin/env python
import requests
import json
import signal

rt = 'http://10.0.0.162:8008'
name = 'tcp'

# remove the flow definition on CTRL+C so the cache is cleaned up
def sig_handler(sig,frame):
  requests.delete(rt + '/flow/' + name + '/json')
  exit(0)
signal.signal(signal.SIGINT, sig_handler)

# define the flow cache and enable logging of completed flows
flow = {'keys':'ipsource,ipdestination,tcpsourceport,tcpdestinationport',
        'value':'frames',
        'log':True}
r = requests.put(rt + '/flow/' + name + '/json',data=json.dumps(flow))

# long poll for new flow records, resuming from the last flowID seen
flowurl = rt + '/flows/json?name=' + name + '&maxFlows=100&timeout=60'
flowID = -1
while True:
  r = requests.get(flowurl + "&flowID=" + str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  # records are returned newest first, so print in arrival order
  flowID = flows[0]["flowID"]
  flows.reverse()
  for f in flows:
    print str(f['flowKeys']) + ',' + str(int(f['value'])) + ',' + str(f['end'] - f['start']) + ',' + f['agent'] + ',' + str(f['dataSource'])
The following command runs the script, which results in the newly arriving flow records being printed as comma separated text:
$ ./tcp_flows.py 
10.0.0.16,10.0.0.236,38834,3260,4000,98100,10.0.0.16,5
10.0.0.151,10.0.0.152,39046,22,837800,60000,10.0.0.28,2
10.0.0.151,10.0.0.152,39046,22,851433,60399,10.0.0.20,25
10.0.0.20,10.0.0.16,443,48859,12597,64000,10.0.0.253,1
10.0.0.152,10.0.0.151,22,39046,67049,61800,10.0.0.28,19
Instead of simply printing the flow records, the script could easily add them to scale out databases like MongoDB so that they can be combined with other types of information and easily searched.
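A minimal sketch of that idea using the pymongo driver (pymongo 3 or later and a local MongoDB instance are assumed; the database and collection names are arbitrary):
from pymongo import MongoClient

db = MongoClient('localhost', 27017).sflow  # assumed local MongoDB

def store(f):
    # split the comma separated flow keys defined for the 'tcp' flow
    src, dst, sport, dport = f['flowKeys'].split(',')
    db.tcp_flows.insert_one({
        'src': src, 'dst': dst,
        'sport': int(sport), 'dport': int(dport),
        'value': int(f['value']),
        'start': f['start'], 'end': f['end'],
        'agent': f['agent'], 'dataSource': f['dataSource']})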

The sFlow-RT REST API doesn't just provide access to completed flows; access to real-time information on in progress flows is available by querying the central flow cache. For example, the following command searches the flow cache and reports the most active flow in the network (based on current data transfer rate, i.e. bits/second).
$ curl http://10.0.0.162:8008/metric/ALL/tcp/json
[{
 "agent": "10.0.0.28",
 "dataSource": "2",
 "metricN": 9,
 "metricName": "tcp",
 "metricValue": 29958.06882871171,
 "topKeys": [
  {
   "key": "10.0.0.20,10.0.0.28,443,40870",
   "updateTime": 1377664899679,
   "value": 29958.06882871171
  },
  {
   "key": "10.0.0.236,10.0.0.28,3260,56044",
   "updateTime": 1377664888679,
   "value": 23.751630816369214
  }
 ],
 "updateTime": 1377664899679
}]
As well as identifying the most active flow, the query result also identifies the switch and port carrying the traffic (out of potentially tens of thousands of ports being monitored).

While flow records are a useful log of completed flows, the ability to track flows in real time transforms traffic monitoring from a reporting tool to a powerful driver for active control, unlocking the capabilities of software defined networking to dynamically adapt the network to changing demand. Embedded flow caches in networking devices are not easily accessible and even if there were a programmatic way to access the on device cache, polling thousands of devices would take so long that the information would be stale by the time it was retrieved.
Figure 4: Visibility and the software defined data center
Looking at the big picture, flow export is only one of many functions that can be performed by an sFlow analyzer, some of which have been described on this blog. Providing simple, programmatic, access allows these functions to be integrated into the broader orchestration system. REST APIs are the obvious choice since they are already widely used in data center orchestration and monitoring tools.
Embedded flow monitoring solutions typically require CLI access to the network devices to define flow caches and direct flow record export. Access to switch configurations is tightly controlled by the network management team and configuration changes are often limited to maintenance windows. This conservatism results in part because hardware resources on the devices need to be carefully managed - for example, a misconfigured flow cache can destabilize the performance of the switch. In contrast, the central sFlow analyzer is software running on a server with relatively abundant resources that can safely support large numbers of requests without any risk of destabilizing the network.
The REST APIs in sFlow-RT are part of a broader movement to break out of the networking silo and integrate management of network resources with the orchestration tools used to automatically manage compute, storage and application resources. Automation transforms the network from a fragile static resource into a robust and flexible resource that can be adapted to support the changing demands of the applications it supports.

Wednesday, May 1, 2013

Software defined analytics

Figure 1: Performance aware software defined networking
Software defined networking (SDN) separates the network Data Plane and Control Plane, permitting external software to monitor and control network resources. Open Southbound APIs like sFlow and OpenFlow are an essential part of this separation, connecting network devices to external controllers, which in turn present high level Open Northbound APIs to SDN applications.

This article demonstrates the architectural similarities between OpenFlow and sFlow configuration and use within an SDN stack. Developers working in the SDN field are likely familiar with the configuration and use of OpenFlow and it is hoped that this comparison will be helpful as a way to understand how to incorporate sFlow measurement technology to create performance aware SDN solutions such as load balancing, DDoS protection and packet brokers.

OpenFlow and sFlow

In this example, Open vSwitch, Floodlight and sFlow-RT are used to demonstrate how switches are configured to use the OpenFlow and sFlow protocols to communicate with the centralized control plane. Next, representative Northbound REST API calls are used to illustrate how control plane software presents network wide visibility and control functionality to SDN applications.

1. Connect switches to control plane

Configure each switch to connect to the OpenFlow controller:
ovs-vsctl set-controller br0 tcp:10.0.0.1:6633
Similarly, configure each switch to send measurements to the sFlow analyzer:
ovs-vsctl -- --id=@sflow create sflow agent=eth0 target=\"10.0.0.1:6343\" sampling=1000 polling=20 -- set bridge br0 sflow=@sflow
2. REST APIs for network wide visibility and control

The following command uses the Floodlight static flow pusher API to set up a forwarding path:
curl -d '{"switch": "00:00:00:00:00:00:00:01", "name":"flow-mod-1", "cookie":"0", "priority":"32768", "ingressport":"1","active":"true", "actions":"output=2"}' http://10.0.0.1:8080/wm/core/staticflowentrypusher/json
The following command uses sFlow-RT's flow API to setup monitoring of TCP flows across all switches:
curl -H "Content-Type:application/json" -X PUT --data "{keys:'ipsource,ipdestination,tcpsourceport,tcpdestinationport', value:'bytes'}" http://10.0.0.1:8008/flow/tcp/json
Next, the following command finds the top TCP flow currently in progress anywhere in network:
curl http://10.0.0.1:8008/metric/ALL/tcp/json
[{
 "agent": "10.0.0.30",
 "dataSource": "2",
 "metricN": 14,
 "metricName": "incoming",
 "metricValue": 3.4061718002956964E7,
 "topKeys": [{
  "key": "10.0.0.52,10.0.0.54,80,52577",
  "updateTime": 1367092118446,
  "value": 3.4061718002956964E7
 }],
 "updateTime": 1367092118446
}]
The response doesn't just identify the flow, HTTP packets from a web server 10.0.0.52 to a client 10.0.0.54, it also identifies the switch and port carrying the traffic, information that would allow the OpenFlow controller to take action to rate limit, tap, re-route or block this traffic.
Figure 2: Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX
The flexible Software Defined Analytics (SDA) functionality shown in this example is possible because the sFlow architecture shifts analytic functions to external software, relying on minimal core measurements embedded in the switch hardware data plane to deliver wire-speed performance. The simplicity and openness of the sFlow standard has resulted in widespread adoption in merchant silicon and by switch vendors.

In contrast, measurement technologies such as Cisco NetFlow and IPFIX perform traffic analysis using specialized hardware on the switch.  Configuring the hardware measurement features can be complex: for example, monitoring TCP flows using Cisco's Flexible NetFlow requires the following CLI commands:

1. define a flow record
flow record tcp-analysis
match transport tcp destination-port
match transport tcp source-port
match ipv4 destination address
match ipv4 source address
collect counter bytes
2. specify the collector
flow exporter export-to-server
destination 10.0.0.1
transport udp 9985
template data timeout 60
3. define a flow cache
flow monitor my-flow-monitor
record tcp-analysis
exporter export-to-server
cache timeout active 60
4. enable flow monitoring on each switch interface:
interface Ethernet 1/0
ip flow monitor my-flow-monitor input
...
interface Ethernet 1/48
ip flow monitor my-flow-monitor input
5. For network wide visibility, go to step 1 and repeat for each switch in the network

Based on the architecture of on-switch flow analysis and this configuration example, it is apparent that there are limitations to this approach to monitoring, particularly in the context of software defined networking:
  1. Flexible NetFlow is complex to configure, see Complexity Kills.
  2. Configuration changes to switches are typically limited to infrequent maintenance windows making it difficult to deploy new measurements.
  3. Each flow cache (step 3 in the Flexible NetFlow configuration) consumes significant on-switch memory, limiting the number of simultaneous flow measurements that can be made, and taking memory that could be used for additional forwarding rules.
  4. Hardware differences mean that measurements are inconsistent between vendors, or even between different products from the same vendor, see Snowflakes, IPFIX, NetFlow and sFlow.
  5. Adding support for new protocols, like GRE, VXLAN etc. involves upgrading switch firmware and may require new hardware.
What about using OpenFlow counters to drive analytics? Since maintaining OpenFlow counters relies on switch hardware to decode packets and track flows, OpenFlow based traffic measurement shares many of the same limitations described for NetFlow/IPFIX, see Hey, You Darned Counters! Get Off My ASIC!

On the other hand, software defined analytics based on the sFlow standard is highly scaleable and extremely flexible. For example, adding an additional flow definition to report on tunneled traffic across the data center involves a single additional REST API call:
curl -H "Content-Type:application/json" -X PUT --data "{keys:'stack,ipsource,ipdestination,ipsource.1,ipdestination.1', value:'bytes'}" http://10.0.0.1:8008/flow/stack/json
The following command retrieves the top tunneled flow:
curl http://10.0.0.1:8008/metric/ALL/stack/json
[{
 "agent": "10.0.0.253",
 "dataSource": "3",
 "metricN": 6,
 "metricName": "stack",
 "metricValue": 74663.29589986047,
 "topKeys": [{
  "key": "eth.ip.gre.ip.tcp,10.0.0.151,10.0.0.152,10.0.201.1,10.0.201.2",
  "updateTime": 1367096917146,
  "value": 74663.29589986047
 }],
 "updateTime": 1367096917146
}]
The result shows that the top tunneled flow currently traversing the network is a TCP connection in a GRE tunnel between inner addresses 10.0.201.1 and 10.0.201.2.
Note: Monitoring and controlling tunneled traffic is an important use case since tunnels are widely used for network virtualization and IPv6 migration, see Tunnels and Down the rabbit hole.
Perhaps the greatest limitation of on-switch flow analysis is the fact that the measurements are delayed on the switch, making them inaccessible to SDN applications, see Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX. Centralized flow analysis liberates measurements from the devices to deliver real-time network wide analytics that support new classes of performance aware SDN application such as: load balancing, DDoS protection and packet brokers.

Tuesday, January 8, 2013

Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX

Figure 1: Low latency software defined networking control loop
The articles SDN and delay and Delay and stability describe the critical importance of low measurement delay in constructing stable and effective controls. This article will examine the difference in measurement latency between sFlow and NetFlow/IPFIX and their relative suitability for driving control decisions.
Figure 2: sFlow and NetFlow agent architectures
Figure 2 shows the architectural differences between the sFlow and IPFIX/NetFlow instrumentation in a switch:
  1. NetFlow/IPFIX Cisco NetFlow and IPFIX (the IETF standard based on NetFlow) define a protocol for exporting flow records. A flow record summarizes a set of packets that share common attributes - for example, a typical flow record includes ingress interface, source IP address, destination IP address, IP protocol, source TCP/UDP port, destination TCP/UDP port, IP ToS, start time, end time, packet count and byte count. Figure 2 shows the steps performed by the switch in order to construct flow records. First the stream of packets is likely to be sampled (particularly in high-speed switches). Next, the sampled packet header is decoded to extract key fields. A hash function is computed over the keys in order to look up the flow record in the flow cache. If an existing record is found, its values are updated, otherwise a record is created for the new flow. Records are flushed from the cache based on protocol information (e.g. if a FIN flag is seen in a TCP packet), a timeout, inactivity, or when the cache is full. The flushed records are finally sent to the traffic analysis application.
  2. sFlow With sFlow monitoring, the decode, hash, flow cache and flush functionality are no longer implemented on the switch. Instead, sampled packet headers are immediately sent to the traffic analysis application which decodes the packets and analyzes the data. In addition, sFlow provides a polling function, periodically sending standard interface counters to the traffic analysis applications, eliminating the need for SNMP polling, see Link utilization.
The flow cache introduces significant measurement delay for NetFlow/IPFIX based monitoring since the measurements are only accessible to management applications once they are flushed from the cache and sent to a traffic analyzer. In contrast, sFlow has no cache - measurements are immediately sent and can be quickly acted upon, resulting in extremely low measurement delay.
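The difference in measurement delay is easy to quantify. With packet sampling, the expected time to first observe a large flow depends only on the sampling ratio and the flow's packet rate, while a flow cache delays the first report by at least the active timeout. A rough illustration (all traffic values are assumptions):
#!/usr/bin/env python
# Compare expected detection delay for a large flow: sFlow packet
# sampling vs. a NetFlow/IPFIX cache with a 60s active timeout.

sampling_ratio = 1000   # assumed 1-in-1000 packet sampling
flow_pps = 10000.0      # assumed large flow of 10K packets/sec
active_timeout = 60     # typical minimum NetFlow active timeout

# under 1-in-N random sampling the expected number of packets until
# the first sample is N, so the expected time to the first sample is:
sflow_delay = sampling_ratio / flow_pps
print('sFlow: first sample after ~%.2f seconds' % sflow_delay)
print('NetFlow: first flow record after %d seconds' % active_timeout)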

Open vSwitch is a useful testbed for demonstrating the impact of the flow cache on measurement delay since it can simultaneously export both NetFlow and sFlow, allowing a side-by-side comparison. The article, Comparing sFlow and NetFlow in a vSwitch, describes how to configure sFlow and NetFlow on the Open vSwitch and demonstrates some of the differences between the two measurement technologies. However, this article focusses on the specific issue of measurement delay.

Figure 3 shows the experimental setup, with sFlow directed to InMon sFlow-RT and NetFlow directed to SolarWinds Real-Time NetFlow Analyzer.

Note: Both tools are available at no charge, making it easy for anyone to reproduce these results.


Figure 3: Latency of large flow detection using sFlow and NetFlow
The charts in Figure 3 show how each technology reports on a large data transfer. The charts have been aligned to have the same time axis so you can easily compare them. The vertical blue line indicates the start of the data transfer.
  1. sFlow By analyzing the continuous stream of sFlow messages from the switch, sFlow-RT immediately detects and continuously tracks the data transfer, from the moment the transfer starts to its completion just over two minutes later.
  2. NetFlow The Real-Time NetFlow Analyzer doesn't report on the transfer until it receives the first NetFlow record 60 seconds after the data transfer started, indicated by the first vertical red line. The 60 second delay corresponds to the active timeout used to flush records from the flow cache. A second NetFlow record, indicated by the second red line, is responsible for the second spike 60 seconds later, and a final NetFlow record, received after the transfer completes and indicated by the third red line, is responsible for the third spike in the chart.
Note: A one minute active timeout is the lowest configurable value on many Cisco switches (the default is 30 minutes), see Configuring NetFlow and NetFlow Data Export.

The large measurement delay imposed by the NetFlow/IPFIX flow cache makes the technology unsuitable for SDN control applications. The measurement delay can lead to instability since the controller is never sure of the current traffic levels and may be taking action based on stale data reported for flows that are no longer active.

In contrast, the sFlow measurement system quickly detects and continuously tracks large flows, allowing an SDN traffic management application to reconfigure switches and balance the paths that active flows take across the network.

Thursday, November 1, 2012

Finding elephants



The Blind Men and the Elephant
John Godfrey Saxe (1816-1887)

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"

The Second, feeling of the tusk,
Cried, "Ho, what have we here,
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
"I see," quoth he, "the Elephant
Is very like a snake!"

The Fourth reached out an eager hand,
And felt about the knee
"What most this wondrous beast is like
Is mighty plain," quoth he:
"'Tis clear enough the Elephant
Is very like a tree!"

The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"

The Sixth no sooner had begun
About the beast to grope,
Than seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!

MORAL.

So oft in theologic wars,
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean,
And prate about an Elephant
Not one of them has seen!

This poem is just one version of the popular Blind Men and an Elephant story. There are many conclusions that you can draw from the story, but this version captures the idea that you can't properly understand something unless you have seen it in its entirety.

Similar arguments occur among data center operations teams advocating for different performance monitoring technologies such as NetFlow, IPFIX, SNMP, WMI, JMX, libvirt etc. Many of these arguments arise because teams are siloed, with each group arguing from their narrow perspective and dismissing the concerns of other teams. While narrowly focussed tools have their place, they miss the big picture.

Cloud-based architectures tightly couple network, storage and compute resources into a large scale, flexible platform for delivering application services. The article System boundary describes the need for a holistic approach to organizing and managing data center resources. A comprehensive, cloud-oriented approach to monitoring is essential for troubleshooting performance problems, automating operations and fully exploiting the potential of cloud architectures to increase efficiency by adapting to changing demand.

The sFlow standard addresses the challenge of cloud monitoring by embedding instrumentation throughout the cloud infrastructure in order to provide a comprehensive, real-time, view of the performance of all the individual network, server and application resources as well as the cloud as a whole. The sFlow architecture is designed to see elephants, not just individual body parts. It's only by gaining this comprehensive view of performance that large scale cloud data center environments can be properly understood and managed.

Wednesday, September 19, 2012

Packets and Flows

Figure 1: Sending a picture over a packet switched network
Figure 1 illustrates how data is transferred over a packet switched network. Host A is in the process of transferring a picture to Host B. The picture has been broken up into parts and each part is sent as a separate packet. Three packets containing parts 8, 9 and 10 are in transit and are in the process of being forwarded by switches Z, Y and X respectively.

In the example, Host A is responsible for breaking up the picture into parts and transmitting the packets. Host B is responsible for re-constructing the picture, detecting parts that are missing, corrupted, or delivered out of order and sending acknowledgement packets back to Host A, which is then responsible for resending packets if necessary.

The packet switches are responsible for transmitting, or forwarding, the packets. Each packet switch examines the destination address (e.g. To: B) and sends the packet on a link that will take the packet closer to its destination. The switches are unaware of the contents of the packets they forward, in this case the picture fragments and part numbers. Figure 1 only shows packets relating to the image transfer between Host A and Host B, but in reality the switches will be simultaneously forwarding packets from many other hosts.
Figure 2: Sorting mail
The mail sorting room shown in Figure 2 is a good analogy for the function performed by a packet switch. Letters arrive in the sorting room and are quickly placed into pigeon holes based on destination. The mail sorters don't know or care what's in the letters, they are focused on quickly reading the destination address on each envelope and placing the letter in a pigeon hole along with other letters to the same region so that the letters can be sent to another sorting facility closer to the destination.

In a packet switched network, each host and switch has a different perspective on data transfers and maintains different state in order to perform its task. Managing the performance of the communication system requires a correct understanding of the nature of the task that each element is responsible for and a way to monitor how effectively that task is being performed.

As an example of a poorly fitting model, consider the concept of "flow records" that are often presented as an intuitive way to monitor and understand traffic on packet switched networks. Continuing our example, the data transfer would be represented by two flow records, one accounting for packets from Host A to Host B and another accounting for packets from Host B to Host A.
Figure 3: Telephone operator
There is an inherent appeal in flow records since they are similar to the familiar "call records" that you see on a telephone bill, recording the number dialed, the time the call started and the call duration. However, as the switchboard and patch cords demonstrate in Figure 3, telephone networks are circuit switched, i.e. a dedicated circuit is set up between the two telephones involving all the switches in the path. It is easy to see how a circuit switch might easily generate call records by considering the manual operator. The operator just needs to note when they connected the call using the patch cord, who they connected to, and when they terminated the call by pulling the plug.

Viewing packet switches through the lens of a circuit oriented measurement is misleading. Start by considering the steps that the mail sorter in Figure 2 would have to go through in order to create flow records. The mail sorter would be required to keep track of the From: and To: address information on each letter, count the number of letters that Host A sent to Host B, and open the letters to peek inside to decide whether each letter was part of an existing conversation or the start of a new one. This task is extremely cumbersome and error prone, and the resulting flow records don't monitor the task that the mail sorter is actually performing; for example, flow records won't tell you how many postcards, letters and parcels were sorted.

Packet and circuit switched networks have very different characteristics and an effective monitoring system will collect measurements that are relevant to the performance of the network:
  • Circuit switches can handle a limited number of simultaneous connections; if more calls are attempted, the excess calls are blocked (i.e. receive a busy signal). Blocking probabilities and sizing of circuit switches are analyzed using Erlang calculations.
  • Packet switches don't block. Instead, packets are interleaved as they are forwarded. If the number of packets arriving exceeds the forwarding capacity of the switch, packets may be delayed as they wait to be serviced, or discarded if there are too many packets already waiting in the queue. Queuing delays and packet discard probabilities are analyzed using queuing theory (both calculations are illustrated in the sketch after this list).
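The back-of-envelope Python sketch below illustrates both analyses. The trunk count, offered load, utilization and buffer size are made-up numbers, and the M/M/1/K queue is just one simple way to model a switch port:

def erlang_b(servers: int, offered_erlangs: float) -> float:
    """Blocking probability for a circuit switch with the given trunk count."""
    b = 1.0
    for n in range(1, servers + 1):
        b = (offered_erlangs * b) / (n + offered_erlangs * b)
    return b

def mm1k_loss(arrival_rate: float, service_rate: float, queue_size: int) -> float:
    """Discard probability for an M/M/1/K model of a packet switch port."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        return 1.0 / (queue_size + 1)
    return (1 - rho) * rho ** queue_size / (1 - rho ** (queue_size + 1))

# Circuit switch: 10 trunks offered 7 Erlangs of calls -> ~8% of calls blocked
print(f"Erlang B blocking: {erlang_b(10, 7.0):.3f}")
# Packet switch: port at 80% utilization, room for 20 packets -> ~0.2% discards
print(f"M/M/1/K discard:   {mm1k_loss(0.8, 1.0, 20):.5f}")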
To make the example in Figure 1 concrete, let Host A be an Apache web server, Host B a laptop running a web browser, and the picture transfer the response to the HTTP request http://www.lolcats.com/popular/159.html

The following table compares the switch, host and application measurements provided by sFlow and NetFlow:

Switch
  sFlow: Each switch exports packet oriented measurements: interface counters and randomly sampled packet headers together with the associated forwarding decisions.
  NetFlow: Each switch exports connection oriented flow records that include source address, destination address, protocol, bytes, packets and duration. Note: Many switches aren't capable of making these measurements and so go unmonitored.

Host
  sFlow: The server exports standard host metrics, including CPU, memory and disk performance.
  NetFlow: None. NetFlow is generally only implemented in network devices.

Application
  sFlow: The web server exports standard HTTP metrics, including request counts and randomly sampled web requests that provide detailed information such as the URL, referrer, server address, client address, user, browser, request bytes, response bytes, status and duration. The web server also reports maximum, idle and active workers.
  NetFlow: None. NetFlow is typically only implemented in network devices.

NetFlow takes a network centric view of measurement and tries to infer application behavior by examining packets in the network. NetFlow imposes a stateful, connection oriented model on core devices that should be stateless. Unfortunately, the resulting flow measurements aren't a natural fit for packet switches, providing a distorted view of the operation of these devices. For example, the switch makes forwarding decisions on a packet by packet basis, and these decisions can change over the lifetime of a flow; the packet oriented measurements made by sFlow accurately capture forwarding decisions, but flow oriented measurements can be misleading. Another example builds on the mail sorting analogy in Figure 2: packet oriented measurements support analysis of small, large and jumbo frames (postcards, letters and parcels), but this detail is lost in flow records.
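A quick simulation shows what is lost. Randomly sampled packets retain each packet's size, so the frame-size mix can be estimated by scaling the sample counts up by the sampling rate, while a flow record reduces the same traffic to totals. The traffic mix and sampling rate below are invented:

import random
from collections import Counter

random.seed(1)
SAMPLING_RATE = 100  # 1-in-100 random sampling

# Synthetic traffic: mostly small ACKs, some full-size frames, a few jumbos
traffic = [64] * 60_000 + [1_500] * 35_000 + [9_000] * 5_000
random.shuffle(traffic)

samples = [size for size in traffic if random.randrange(SAMPLING_RATE) == 0]

# Scale each sampled frame size up by the sampling rate to estimate totals
estimate = Counter(samples)
for size in sorted(estimate):
    print(f"{size:>5}B frames: estimated {estimate[size] * SAMPLING_RATE:>6},"
          f" actual {traffic.count(size):>6}")

# A flow record covering the same traffic reduces to two numbers:
print(f"flow record: packets={len(traffic)}, bytes={sum(traffic)}")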

Flows are an abstraction that is useful for understanding end-to-end traffic traversing the packet switches. However, the flow abstraction describes connections created by the communication end points and to properly measure connection performance, one needs to instrument those end points. Hosts are responsible for initiating and terminating flows and are a natural place to report flows, but the traditional flow model ignores important detail that the host can provide. For example, the host is in a position to include important details about services such as user names, URLs, response times, and status codes as well as information about the computational resources needed to deliver the services; information that is essential for managing service capacity, utilization and response times.

While NetFlow is network centric and tries to infer information about applications from network packets (which is becoming increasingly difficult as more traffic is encrypted), the sFlow standard takes a systems approach, exposing information from network, servers and applications in order to provide a comprehensive view of performance.

Measurement isn't simply about producing pretty charts. The ultimate goal is to be able to act on the measurements and control performance. Control requires a model of behavior that allows performance to be predicted, and measurements that characterize demand and show how closely the system performance matches the predictions. The sFlow standard is well suited to automation, providing comprehensive measurements based on models of network, server and application performance. The Data center convergence, visibility and control presentation describes the critical role that measurement plays in managing costs and optimizing performance.

Wednesday, September 12, 2012

Snowflakes, IPFIX, NetFlow and sFlow

Snow flakes by Wilson Bentley
Each snowflake is unique and beautiful. However, while such immense diversity is attractive in nature, variation in data center management standards results in operational complexity, making it difficult to implement the automation and control needed to effectively manage at scale.

The following table examines the approaches taken by the IPFIX and sFlow standards by contrasting how they handle four basic aspects of measurement.

Note: The IPFIX standard is based on Cisco's NetFlow™ version 9 protocol and most of the points of comparison apply equally to NetFlow.

Packet

IPFIX: IPFIX currently defines over 50 fields relating to the packet header (see IP Flow Information Export (IPFIX) Entities):
  • protocolIdentifier
  • ipClassOfService
  • tcpControlBits
  • sourceTransportPort
  • sourceIPv4Address
  • destinationTransportPort
  • destinationIPv4Address
  • sourceIPv6Address
  • destinationIPv6Address
  • flowLabelIPv6
  • icmpTypeCodeIPv4
  • igmpType
  • sourceMacAddress
  • vlanId
  • ipVersion
  • ipv6ExtensionHeaders
  • destinationMacAddress
  • icmpTypeCodeIPv6
  • icmpTypeIPv4
  • icmpCodeIPv4
  • icmpTypeIPv6
  • icmpCodeIPv6
  • udpSourcePort
  • udpDestinationPort
  • tcpSourcePort
  • tcpDestinationPort
  • tcpSequenceNumber
  • tcpAcknowledgementNumber
  • tcpWindowSize
  • tcpUrgentPointer
  • tcpHeaderLength
  • ipHeaderLength
  • totalLengthIPv4
  • payloadLengthIPv6
  • ipTTL
  • nextHeaderIPv6
  • ipDiffServCodePoint
  • ipPrecedence
  • fragmentFlags
  • ipPayloadLength
  • udpMessageLength
  • isMulticast
  • ipv4IHL
  • ipv4Options
  • tcpOptions
  • ipTotalLength
  • ethernetHeaderLength
  • ethernetPayloadLength
  • ethernetTotalLength
  • dot1qVlanId
  • dot1qPriority
  • dot1qCustomerVlanId
  • dot1qCustomerPriority
  • ethernetType
The IPFIX standard does not require vendors to support all of these fields; each vendor is free to export any combination of fields it chooses, and none of the fields are mandatory. The result is that each vendor and each product produces unique and incompatible data.

sFlow: The sFlow standard specifies a single way to report packet attributes, the packet header, ensuring that every vendor and product produces compatible results.

Every sFlow compatible device deployed since the sFlow standard was published in 2001 provides visibility into every protocol that has ever run, or will ever run, over Ethernet. The packet header includes all the protocol fields exported by IPFIX, as well as fields associated with emerging protocols such as FCoE, AoE, TRILL, NVGRE and VxLAN that have yet to be defined in IPFIX.
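To see why exporting the raw header is sufficient, note that a collector can derive any of the IPFIX packet fields listed above by decoding it. A minimal sketch using only the Python standard library (the example header bytes are hand-built, with made-up addresses):

import struct

def decode_sampled_header(header: bytes) -> dict:
    """Derive a few IPFIX-style fields from a raw sampled packet header."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", header[:14])
    fields = {"sourceMacAddress": src_mac.hex(":"),
              "destinationMacAddress": dst_mac.hex(":"),
              "ethernetType": hex(ethertype)}
    if ethertype == 0x0800:  # IPv4
        ihl = (header[14] & 0x0F) * 4
        fields.update({
            "ipVersion": header[14] >> 4,
            "ipTTL": header[22],
            "protocolIdentifier": header[23],
            "sourceIPv4Address": ".".join(map(str, header[26:30])),
            "destinationIPv4Address": ".".join(map(str, header[30:34])),
        })
        if header[23] == 6 and len(header) >= 14 + ihl + 4:  # TCP ports
            sport, dport = struct.unpack_from("!HH", header, 14 + ihl)
            fields.update({"tcpSourcePort": sport, "tcpDestinationPort": dport})
    return fields

hdr = bytes.fromhex(
    "ffffffffffff" "0a0000000001" "0800"              # Ethernet
    "4500003c0000400040060000" "c0a80001" "c0a80002"  # IPv4 192.168.0.1 -> .2
    "005000501234567800000000")                       # TCP ports 80 -> 80
print(decode_sampled_header(hdr))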
Time

IPFIX: IPFIX has over 30 elements that can be used to represent time (see IP Flow Information Export (IPFIX) Entities):
  • flowEndSysUpTime
  • flowStartSysUpTime
  • flowStartSeconds
  • flowEndSeconds
  • flowStartMilliseconds
  • flowEndMilliseconds
  • flowStartMicroseconds
  • flowEndMicroseconds
  • flowStartNanoseconds
  • flowEndNanoseconds
  • flowStartDeltaMicroseconds
  • flowEndDeltaMicroseconds
  • flowDurationMilliseconds
  • flowDurationMicroseconds
  • observationTimeSeconds
  • observationTimeMilliseconds
  • observationTimeMicroseconds
  • observationTimeNanoseconds
  • monitoringIntervalStartMilliSeconds
  • monitoringIntervalEndMilliSeconds
  • collectionTimeMilliseconds
  • maxExportSeconds
  • maxFlowEndSeconds
  • minExportSeconds
  • minFlowStartSeconds
  • maxFlowEndMicroseconds
  • maxFlowEndMilliseconds
  • maxFlowEndNanoseconds
  • minFlowStartMicroseconds
  • minFlowStartMilliseconds
  • minFlowStartNanoseconds
The IPFIX standard allows vendors to report time using these elements in any combination, or to omit timestamps altogether. In order to report time consistently, every agent must have a real-time clock and be time synchronized. Finally, it is left up to the vendors to decide how often to export data, so an IPFIX collector must understand each vendor's implementation in order to be certain that it has received all the data and to detect data loss.

sFlow: The sFlow standard requires that data be sent immediately. The stateless nature of the protocol means that data can be combined and timestamps added by the central sFlow collector without any need for timestamps or time synchronization among the agents.

Note: The sFlow datagrams do contain a time stamp, the agent uptime in milliseconds at the time the datagram was sent.
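In practice, this means a collector can be as simple as a UDP listener that stamps each datagram on arrival. A sketch (6343 is the standard sFlow collector port; decoding the datagram contents is omitted):

import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 6343))

while True:
    datagram, (agent_ip, _) = sock.recvfrom(65535)
    arrival = time.time()  # the collector's clock; agents need no synchronization
    print(f"{arrival:.3f} {agent_ip} {len(datagram)} bytes")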
Sampling

IPFIX: IPFIX currently defines eight different algorithms for packet sampling (see IANA Packet Sampling Parameters):
  • Systematic count-based Sampling
  • Systematic time-based Sampling
  • Random n-out-of-N Sampling
  • Uniform probabilistic Sampling
  • Property match Filtering
  • Hash based Filtering using BOB
  • Hash based Filtering using IPSX
  • Hash based Filtering using CRC
Vendors are not required to implement any of these algorithms and are free to invent their own sampling schemes (see NetFlow-lite). In addition, many of the standard algorithms can be shown to be inaccurate.
sFlow: The sFlow standard mandates a single, statistically valid, sampling algorithm. All sFlow compliant vendors and products implement the same algorithm and produce accurate, interoperable results.
URL

IPFIX: There is no standard IPFIX element for exporting a URL. However, IPFIX does allow vendor extensions, resulting in multiple schemes for exporting URL data. Examples include:
  • nProbe URLs are additional fields that can be included as flow keys when configuring the probe.
  • Dell SonicWall URLs are included in an HTTP specific table and link to flow records.
  • Citrix AppFlow URLs are included in an HTTP request table with links to additional HTTP response and ingress/egress connection tables. 
In each case, in addition to the URL element itself being vendor specific, the information model associated with the exported URLs is also unique, reflecting the internal architecture of the exporting device.
sFlow: The sFlow standard mandates a set of HTTP counters and transaction attributes that ensures consistent reporting from HTTP aware entities such as web servers (Apache, Tomcat, NGINX etc.) and load balancers (F5 etc.), irrespective of vendor or internal architecture.

Each URL is exported as part of the standard transaction record that includes: client IP, server IP, referrer, authuser, user-agent, mime-type, status, request-bytes, response-bytes and response time. In addition, the sFlow standard defines a unified data model that links measurements from network devices, servers and application instances to provide a comprehensive, data center wide view of performance.
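Returning to the Sampling row, the sketch below shows the kind of 1-in-N random sampling and scaling that sFlow relies on. Drawing the skip uniformly is one simple way to achieve the required mean; the 95% confidence bound printed at the end is the commonly quoted sFlow accuracy estimate for c samples:

import random

random.seed(2)
N = 256  # configured sampling rate (1-in-N)

def sample(stream, rate):
    """Count down a random skip with mean == rate; sample when it hits zero."""
    skip = random.randint(1, 2 * rate - 1)
    for pkt in stream:
        skip -= 1
        if skip == 0:
            yield pkt
            skip = random.randint(1, 2 * rate - 1)

total = 1_000_000
c = sum(1 for _ in sample(range(total), N))  # number of samples taken
print(f"samples: {c}, estimated packets: {c * N} (actual {total})")
print(f"expected error at 95% confidence: <= {196 * (1.0 / c) ** 0.5:.1f}%")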

From the examples in the table, it is apparent that the IPFIX and sFlow standards take two very different approaches. The IPFIX standard is descriptive, defining a standard set of attributes that vendors can use to describe the information they choose to export. The result is that vendors use IPFIX to differentiate each product, reporting a unique and inconsistent set of measurements based on its internal architecture and product features. In contrast, the sFlow standard is prescriptive, defining a set of measurements that every vendor must implement. While IPFIX provides a way to describe each "snowflake", the sFlow standard results from vendors working together to identify common measurements and implement them in an interoperable way.

Henry Ford transformed the auto industry by moving from hand-made, custom parts to standardized components and processes that allowed for mass production. The data center is undergoing a similar transformation, from small, static, custom environments to large scale, commoditized, flexible, cloud architectures. The sFlow standard delivers the universal performance measurements needed for automation, enjoys broad vendor support, and, along with other disruptive technologies like 10G Ethernet, merchant silicon, Software Defined Networking (SDN), OpenFlow, networked storage and virtualization, is enabling this transformation.