SDN AND OPENFLOW: THE HYPE AND THE HARSH REALITY
Ivan Pepelnjak, CCIE #1354 Emeritus

Copyright © 2014 ipSpace.net AG

WARNING AND DISCLAIMER

This book is a collection of blog posts written between March 2011 and the book publication date, providing independent information about Software Defined Networking and OpenFlow. Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. Read the introductory paragraphs before the blog post headings to understand the context in which the blog posts were written, and make sure you read the Introduction section. The information is provided on an “as is” basis. The authors and ipSpace.net shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.

CONTENT AT A GLANCE

FOREWORD
INTRODUCTION
1  THE INITIAL HYPE
2  SOFTWARE DEFINED NETWORKING 101
3  OPENFLOW BASICS
4  OPENFLOW IMPLEMENTATION NOTES
5  OPENFLOW SCALABILITY CHALLENGES
6  OPENFLOW AND SDN USE CASES
7  SDN BEYOND OPENFLOW


FOREWORD

Ivan asked me to write the intro for his latest book on Software Defined Networking, and I’m a bit mystified why. Granted, he’s like the control plane to my forwarding plane. The brilliant technical insights I’ve gathered from Ivan’s web site and webinars have provided me with valuable content and creative inspiration ever since I first discovered them. In fact, I almost feel like I’m cheating at my job. Every time I clarify SDN in a conversation with “It’s the decoupling of the logical from the physical,” I want to insert a footnote referencing him.

I remember the first time I heard him on a podcast, I thought to myself, “This guy must be super smart, because he sounds like a Bond villain and I can only grasp 50% of what he’s saying.” I started telling colleagues about him: “Hey, check this guy out. His webinars will make your brain bleed out of your ears!” Trust me, in my circle that’s a HUGE compliment.

When I was chosen to attend my first Tech Field Day event, I was most excited because I would finally get to meet Ivan in person. All my engineering friends were jealous, and I was almost apoplectic when the moment finally arrived, fearful I would do something foolish like confuse SMTP and SNMP.

This is when I discovered a really wonderful aspect of Ivan: if you’re ever lucky enough to interact with him personally (stalking doesn’t count), you’ll find him to be witty, friendly, generous and gracious. He never makes you feel stupid for not understanding a protocol, the details of an RFC or an IEEE standard. He’s the consummate educator and a giving mentor to almost anyone who asks. The more I know him, the more I admire and respect his dedication to engineering. It truly is a vocation for him.


I guess I need to say something about SDN now, so here goes. While it could be the idea that finally revolutionizes networking, data centers and even security, I advise caution. Vendors will latch onto this new buzzword like a pit bull and promote it as the industry’s new secret sauce. With this book, you’ll be able to separate facts from hype and make educated decisions regarding your own infrastructure.

Michele Chubirka
Security architect, analyst, writer and podcaster
December 2013


INTRODUCTION

OpenFlow and Software Defined Networking (SDN) entered mainstream awareness in March 2011, when several large cloud providers and Internet Service Providers formed the Open Networking Foundation. More than three years later, the media still doesn’t understand the basics of SDN, and many networking engineers feel threatened by what they see as a fundamental shift in the way they do their jobs.

In the meantime, I published over a hundred blog posts on ipSpace.net trying to debunk the myths, explain how SDN and OpenFlow work, and describe their advantages and limitations. Most of the posts were responses to external triggers – false claims, vendor launches, or questions I received from my readers.

This book contains a collection of the most relevant blog posts describing the concepts of SDN and OpenFlow. I cleaned up the blog posts and corrected obvious errors and omissions, but also tried to leave most of the content intact. The commentaries between the individual blog posts will help you understand the timeline and the context in which a particular blog post was written.

The book covers these topics:

 Debunking of the initial hype surrounding the OpenFlow public launch and the most blatant misconceptions (Chapter 1);
 Overview of what SDN is, what its benefits might be, and deliberations on whether or not it makes sense (Chapter 2);
 Introduction to OpenFlow, from architectural basics to protocol details, deployment and forwarding models (Chapter 3);
 OpenFlow implementation notes, describing the peculiarities of hardware and software implementations of OpenFlow switches (Chapter 4);
 OpenFlow scalability challenges, from control-plane complexity to packet punting and limitations of flow table updates (Chapter 5);
 OpenFlow use cases, from the production deployment at Google to interesting ready-to-use architectures and musings on potential future uses (Chapter 6);
 SDN beyond OpenFlow (Chapter 7), covering BGP-based SDN, NETCONF, I2RS, Cisco’s onePK and Plexxi’s controller-based data center fabrics.

You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

 Start with the SDN, OpenFlow and NFV Resources page;
 Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
 The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
 Finally, I’m always available for short online or on-site consulting engagements.

As always, please do feel free to send me any questions you might have – the best way to reach me is the contact form on my web site (www.ipSpace.net).

Happy reading!
Ivan Pepelnjak
July 2014


1  THE INITIAL HYPE

Academic researchers had been working on OpenFlow concepts (a distributed data plane with a centralized controller) for years, but in early 2011 a fundamental marketing shift happened: major cloud providers (Google) and Internet Service Providers (Deutsche Telekom) created the Open Networking Foundation (ONF) to push forward commercial adoption of OpenFlow and Software Defined Networking (SDN) – or at least their definition of it.

Since then, every single vendor has started offering SDN products. Almost none of them come even close to the (narrow) vision promoted by the Open Networking Foundation (centralized control plane with distributed data plane), NEC’s ProgrammableFlow being a notable exception. Most vendors decided to SDN-wash their existing products, branding their existing APIs as open, and claiming they have SDN-enabled products.

MORE INFORMATION

You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

 Start with the SDN, OpenFlow and NFV Resources page;
 Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
 Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
 The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
 Finally, I’m always available for short online or on-site consulting engagements.


As usual, the industry media didn’t help – they enthusiastically jumped onto the OpenFlow/SDN bandwagon and started propagating myths. More than two years later they still don’t understand the fundamentals of SDN, and tend to focus exclusively on how SDN is supposed to hurt Cisco (or not).

IN THIS CHAPTER:

 OPEN NETWORKING FOUNDATION – FABRIC CRAZINESS REACHES NEW HEIGHTS
 OPENFLOW FAQ: WILL THE HYPE EVER STOP?
 OPENFLOW IS LIKE IPV6
 FOR THE RECORD: I AM NOT AGAINST OPENFLOW
 NETWORK FIELD DAY – FIRST IMPRESSIONS
 I APOLOGIZE, BUT I’M EXCITED
 THE REALITY – TWO YEARS LATER
 CONTROL AND DATA PLANE SEPARATION – THREE YEARS LATER
 TWO AND A HALF YEARS AFTER OPENFLOW DEBUT, THE MEDIA REMAINS CLUELESS
 WHERE’S THE REVOLUTIONARY NETWORKING INNOVATION?
 FALLACIES OF GUI


In March 2011, industry media quickly picked up the buzz created by the Open Networking Foundation (ONF) press releases and started exaggerating the already extravagant claims made by ONF, prompting me to write the following blog post.

OPEN NETWORKING FOUNDATION – FABRIC CRAZINESS REACHES NEW HEIGHTS

Some of the biggest buyers of networking gear have decided to squeeze some extra discount out of the networking vendors and threatened them with an open-source alternative, hoping to repeat the Linux/Apache/MySQL/PHP saga that made it possible to build server farms out of low-cost commodity gear with almost zero licensing costs. They formed the Open Networking Foundation, found a convenient technology (OpenFlow) and launched another major entrant in the Buzzword Bingo – Software-Defined Networking (SDN).

Networking vendors, either trying to protect their margins by stalling the progress of this initiative, or stampeding into another Wild West gold rush (hoping to unseat their bigger competitors with low-cost standards-based alternatives), have joined the foundation in hordes; the list of initial members reads like Who’s Who in Networking.

Now, let’s try to figure out what SDN might be all about. The ONF mission statement (on the first page) says “SDN allows owners and operators of networks to control and manage their networks to best serve their needs.” Are the founding members of ONF trying to tell us they have no control over their networks and lack network management systems? It must be something else. How about this one (from the same paragraph): “OpenFlow seeks to increase network functionality while lowering the cost associated with operating networks.” Now we’re getting somewhere – I told you it was all about reducing costs (starting with the networking vendors’ margins).

(Some of) the industry media happily joined the craze, parroting meaningless phrases from various press releases. Consider, for example, this article from IT World Canada.

“SDN would give network operators the ability to virtualize network resources, being able to dynamically improve latency or security on demand.” If you want to do that, you can do it today, using dynamic routing protocols or QoS (latency), vShield/VSG (on-demand security) or a number of virtualized networking appliances. Also, protocols like RSVP that signal per-session bandwidth needs have been around for more than a decade, but somehow never caught on. Must be the fault of those stupid networking vendors.

“Sites like Facebook, Google or Yahoo would be able to tailor their networks so searches would be blindingly fast.” I never realized the main search problem was network bandwidth. I always somehow thought it was related to large datasets, CPU, database indices ... Anyhow, if network bandwidth is the bottleneck, why don’t they upgrade to next-generation Ethernet (10G/40G)? Ah, yes, it might be expensive. How about deploying a Clos network architecture? Ouch, it might be a nightmare to configure and manage. How exactly will SDN solve this problem?

“Stock exchanges could assure brokerage customers on the other side of the globe they’d get financial data as fast as a dealer beside the exchange.” Will SDN manage to flatten and shrink the Earth, will it change the speed of light, or will it use large-scale quantum entanglement?

“It could be programmed to order certain routers to be powered down during off-peak power periods.” What stops you from doing that today?


Don’t get me wrong – OpenFlow might be a good idea and it will probably lead to interesting new opportunities (assuming they can solve the scalability and resilience issues) ... and I’m absolutely looking forward to the podcast we’re recording later today (available on the Packet Pushers web site). However, there are plenty of open standards in the networking industry (including XML-based network configuration and management) waiting to be used. There are also (existing, standard) technologies that you can use to solve most of the problems these people are complaining about. The problem is that these standards and technologies are not used by operating systems or applications (when was the last time you deployed a server running OSPF to get seamless multihoming?).

The main problems we’re facing today arise primarily from non-scalable application architectures and a broken TCP/IP stack. In a world with scale-out applications you don’t need fancy combinations of routing, bridging and whatever else; you just need fast L3 transport between endpoints. In an Internet with a decent session layer or a multipath transport layer (be it SCTP, Multipath TCP or something else) you don’t need load balancers, BGP sessions with end customers to support multihoming, or LISP. All these kludges were invented to support OS/App people firmly believing in the fallacies of distributed computing.

How is SDN supposed to change that? I’m anxiously waiting to see an answer beyond marketing/positioning/negotiating bullshit bingo.


Not surprisingly, the OpenFlow hype did not subside, and totally inaccurate articles started appearing in industry press, prompting me to write yet another rant in April 2011.

OPENFLOW FAQ: WILL THE HYPE EVER STOP?

Network World published another masterpiece last week: FAQ: What is OpenFlow and why is it needed? Following the physics-changing promises made during the Open Networking Foundation launch, one would hope to get some straight facts; obviously things don’t work that way. Let’s walk through some of the points. While most of them might not be too incorrect from an oversimplified perspective, they do over-hype a potentially useful technology way out of proportion.

NW: “OpenFlow is a programmable network protocol designed to manage and direct traffic among routers and switches from various vendors.” This one is just a tad misleading. OpenFlow is actually a protocol that allows a controller to download forwarding tables into one or more switches. Whether that manages or directs traffic depends on what the controller is programmed to do.

NW: “The technology consists of three parts: [...] and a proprietary OpenFlow protocol for the controller to talk securely with switches.” Please do decide what you think proprietary means. All parts of the OpenFlow technology are defined in publicly available documents under a BSD-like license.

NW: “OpenFlow is designed to provide consistency in traffic management and engineering by making this control function independent of the hardware it's intended to control.” How can a low-level flow-table-control API provide what this statement claims it does? It all depends on the controller implementation.


NW: “The programmability of the MPLS capabilities of a particular vendor's platform is specific to that vendor.” And the OpenFlow-related capabilities of individual switches will depend on specific implementations by specific vendors. Likewise, the capabilities of an OpenFlow controller will be specific to that vendor. What exactly is the fundamental change?

NW: “MPLS is a Layer 3 technique while OpenFlow is a Layer 2 method.” Do I need to elaborate on this gem? Let’s just point out that OpenFlow works with MAC addresses, IP subnets, IP flow 5-tuples, VLANs or MPLS labels. Whatever a switch can do, OpenFlow can control. But wait ... OpenFlow has no provision for IPv6 at all. Maybe Network World is so futuristic they consider a technology without IPv6 support a layer-2 technology.
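What “downloading forwarding tables into switches” looks like in practice is easier to see in code. The following is a minimal sketch, assuming the open-source Ryu controller framework and an OpenFlow 1.3 switch (neither is mentioned in the original post); the match fields, output port and priority are illustrative only.

# A minimal sketch of a controller downloading one forwarding entry into a
# switch, assuming the Ryu OpenFlow controller framework and OpenFlow 1.3.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class StaticFlowPusher(app_manager.RyuApp):
    """Install one static flow entry on every switch that connects."""

    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofproto = dp.ofproto
        parser = dp.ofproto_parser

        # Match on whatever the switch can classify on -- here an IPv4
        # destination; it could just as well be a MAC address, a VLAN tag,
        # an MPLS label, or a full 5-tuple.
        match = parser.OFPMatch(eth_type=0x0800, ipv4_dst="192.0.2.10")
        actions = [parser.OFPActionOutput(2)]  # forward out of port 2
        inst = [parser.OFPInstructionActions(
            ofproto.OFPIT_APPLY_ACTIONS, actions)]

        # The FlowMod message is the "download a forwarding entry" part;
        # everything beyond that (topology, policy, failure handling) is
        # whatever logic the controller application chooses to implement.
        mod = parser.OFPFlowMod(datapath=dp, priority=100,
                                match=match, instructions=inst)
        dp.send_msg(mod)

Running something like ryu-manager static_flow.py against an Open vSwitch instance should be enough to see the entry appear in the switch’s flow table – whether that ends up “managing and directing traffic” is entirely up to the controller application around it.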


In another blog post, I compared OpenFlow to IPv6 – the evangelists of both technologies promised way more than the technologies were ever capable of delivering.

OPENFLOW IS LIKE IPV6

Frequent eruptions of OpenFlow-related hype (a recent one caused by the Brocade Technology Day Summit; I’m positive Interop will not lag behind) call for continuous myth-busting efforts.

Let’s start with a widely-quoted (and immediately glossed-over) fact from Professor Scott Shenker, a founding board member of the ONF: “[OpenFlow] doesn't let you do anything you couldn't do on a network before.”

To understand his statement, remember that OpenFlow is nothing more than a standardized version of the communication protocol between control and data plane. It does not define a radically new architecture, it does not solve distributed or virtualized networking challenges, and it does not create new APIs that applications could use. The only thing it provides is the exchange of TCAM (flow) data between a controller and one or more switches.

Cold-fusion-like claims are nothing new in the IT industry. More than a decade ago another group of people tried to persuade us that changing the network layer address length from 32 bits to 128 bits and writing it in hex instead of decimal solves global routing and multihoming and improves QoS, security and mobility. After the reality distortion field collapsed, we were left with the same set of problems exacerbated by the purist approach of the original IPv6 architects.


Learn from the past bubble bursts. Whenever someone makes an extraordinary claim about OpenFlow, remember the “it can’t do anything you couldn’t do before” fact and ask yourself:

 Did we have a similar functionality in the past? If not, why not? Was there no need, or were the vendors too lazy to implement it (don't forget they usually follow the money)?
 Did it work? If not, why not?
 If it did – do we really need a new technology to replace a working solution?
 Did it get used? If not, why not? What were the roadblocks? Why would OpenFlow remove them?

Repeat this exercise regularly and you’ll probably discover the new emperor’s clothes aren’t nearly as shiny as some people would make you believe.


The OpenFlow pundits quickly labeled me as an OpenFlow hater, but I was just my grumpy old self ;) Here’s the blog post (from May 2011) that tried to set the record straight (not that such things would ever work).

FOR THE RECORD: I AM NOT AGAINST OPENFLOW

... as some of its supporters seem to believe every now and then (I do get a severe allergic reaction when someone claims it will change the laws of physics, or when I’m faced with technical inaccuracies, not to mention knee-jerking financial experts). Even more, assuming it can cross the adoption gap, it could fundamentally change the business models of networking vendors (maybe not in the way you’d like them to be changed). You can read more about my OpenFlow views in the article I wrote for SearchNetworking.

On the more technological front, I still don’t expect to see miracles. Most OpenFlow-related ideas I’ve heard about have been tried (and failed) before. I fail to see why things would be different just because we use a different protocol to program the forwarding tables.


In just a few months, everyone was talking about OpenFlow and SDN, and Stephen Foskett, the mastermind behind GestaltIT, decided to organize the first ever OpenFlow symposium in September 2011. The vendor and user presentations we’ve seen at that symposium, combined with the vendor presentations we’ve attended during the Networking Tech Field Day 2 seemed very promising – everyone was talking about the right topics and tried to address real-life scalability concerns.

NETWORK FIELD DAY – FIRST IMPRESSIONS

We finished a fantastic Network Field Day (second edition) yesterday. While it will take me a while (and 20+ blog posts) to recover from the information blast I received during the last two days, here are the first impressions:

Explosion of innovation – and it’s not just OpenFlow and/or SDN. Last year we saw some great products and a few good ideas (earning me the “grumpy old man that’s hard to make smile” fame); this year almost every vendor had something that excited me. If you were watching the video stream, you probably got sick and tired of my “wow, that’s cool” comments. I apologize, but that’s how I felt.

Everyone gets the problem ... and some of the vendors were trying to tell us what the problem is in a CIO-level pitch. Not a good idea. However, it’s refreshing to see that everyone identified the same problem (large-scale data centers, VM mobility ...), that it’s the problem we’re all familiar with, and that it’s actually getting solved.


Most vendors have sensible answers. They are addressing different parts of the big problem, they talk about different technologies, but the answers aren’t bad. For example, every time I spotted a scalability issue, they were aware of it and/or had good answers (if not a solution).

Layer-2 is fading away (again). While every switching vendor will tell you how you can build large L2 domains with their fabric, nobody is actually pushing them anymore. And the only time layer-2 Data Center Interconnect (DCI) appeared on a slide, there was a unicorn image next to it. Even more, two vendors actually said they think long-distance VM mobility is not a good idea (you’ll have to watch the videos to figure out who they were).

We’re cutting through the hype. Even the OpenFlow symposium was hypeless. It’s so nice being able to spend three days with highly intelligent people who are excited about the next great thing (whatever it is), while being perfectly realistic about its current state and its limitations.

You’ll see lots of new things in the future. Even if you’re working in an SMB environment, you might get exposed to OpenFlow in the not-too-distant future (more about that in an upcoming post).

Get ready for a bumpy ride. Lots of exciting technologies are being developed. Some of them make perfect sense, some others less so. Some of them might work, some might fade away (not because they would be inherently bad, but because of bad execution). Now is the time to jump on those bandwagons – get involved (hint: you just might start with IPv6), build a test lab, kick the tires, figure out whether the new technologies might be a good fit for your environment when they become stable.

Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full disclosure (or a more precise one by Tony Bourke).


Even more, the real-life approach of numerous vendors I’ve seen during those two events made me overly optimistic – I thought we just might be able to get to real-life OpenFlow and SDN use cases without the usual vendor jousting and get-rich-quick startup mentality. This is what I wrote in October 2011:

I APOLOGIZE, BUT I’M EXCITED

The last few days were exquisite fun: it was great meeting so many people focusing on a single technology (OpenFlow) and concept (Software-Defined Networking, whatever that means) that just might overcome some of the old obstacles (and introduce new ones). You should be at least a bit curious what this is all about, and even if you don’t see yourself ever using OpenFlow or any other incarnation of SDN in your network, it never hurts to enhance your resume with another technology (as long as it’s relevant; don’t put CICS programmer at the top of it).

Watching the presentations from the OpenFlow symposium is a great starting point. I would start with the ones from Igor Gashinsky (Yahoo!) and Ed Crabbe (Google) – they succinctly explained the problems they’re facing in their networks and how they feel OpenFlow could solve them.

If you’re an IaaS cloud provider, this is the time to start thinking about the potential OpenFlow could bring to your network, and if you’re not talking to NEC, Big Switch or Nicira, you’re missing out. I would also talk with Juniper (more about that later).

Next step: watch the vendor presentations from the OpenFlow symposium. Kyle Forster presented a high-level overview of the Big Switch architecture, Curt Beckmann from Brocade added a healthy dose of reality check (highly appreciated), David Meyer (Cisco) presented an interesting perspective on robustness and complexity (and several OpenFlow use cases), Don Clark from NEC talked about their OpenFlow products (watch the video, the PDF is not online), and finally David Ward from Juniper presented the hybrid approach: use OpenFlow in combination with (not as a replacement for) existing technologies.

The afternoon technical Q&A panel just confirmed that numerous vendors understand the challenges associated with OpenFlow deployments outside of small lab setups very well, and that they’re actively working on solving those problems and making OpenFlow a viable technology.

Two vendors expanded their coverage of OpenFlow during the Network Field Day: David Ward from Juniper did a technical deep dive (don’t skip the Junos automation part at the beginning of the video, it’s interesting ... and you just might spot the VRF Smurf) and NEC even showed us a demo of their OpenFlow-based switched network.

Luckily there are still some cool-headed people around (read Ethan Banks’ OpenFlow State of the Union and Derick Winkworth’s More Open Flow Symposium Notes), but I can’t help myself. The grumpy old man from the L3 ivory tower is excited (listen to the Packet Pushers OpenFlow/SDN podcast if you don’t believe me), and not just about OpenFlow. I still can’t believe that I stumbled upon so many interesting or cool technologies or solutions in the last few days. It could be that it’s just vendors adapting to the blogging audience, or there might actually be something fundamentally new coming to light, like MPLS (then known as tag switching) was in the late 1990s.

Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full disclosure (or a more precise one by Tony Bourke).


The hard reality of the intervening two years crushed all my high hopes. This is the reality of OpenFlow and SDN as I saw it in November 2013:

THE REALITY – TWO YEARS LATER

Major vendors (with the exception of NEC) haven’t made any progress. Juniper still hasn’t delivered on its promises. Cisco still hasn’t shipped an OpenFlow switch or an SDN controller (although they announced both months ago). Brocade supposedly has OpenFlow on their high-end routers, and Arista supports OpenFlow on its old high-end switch (but not in a GA EOS release). Every major vendor is talking about SDN, but it’s mostly SDN-washing (aka CLI-in-API-disguise). Cisco is talking about onePK and has shipped an early-adopter SDK, but it will take a while before we see onePK in GA code on a widespread platform.

Startups aren’t doing any better. Big Switch is treading water and trying to find a useful use case for their controller. Nicira was acquired by VMware and is moving away from OpenFlow. Contrail was acquired by Juniper and recently shipped its product (which has nothing to do with OpenFlow and not much with SDN). LineRate Systems was acquired by F5 and disappeared.

We haven’t seen customer deployments either. Facebook is doing interesting things (but from what I’ve heard they’re not OpenFlow-based), Google has an OpenFlow/SDN deployment, but they could have done the exact same thing with classical routers and PCEP, and Microsoft’s SDN is based on BGP (and works fine).

It seems like reality hit OpenFlow, and it was a very hard hit … and according to Gartner we haven’t reached the trough of disillusionment yet.


In January 2014 I took another look at what the Open Networking Foundation founding members managed to achieve between March 2011 (the beginning of OpenFlow/SDN hype) and early 2014. The only one that made significant progress on the “centralized control plane” front was Google. Since I wrote this blog post, Facebook launched their own switch operating system, which seems to be working along the same lines as classical network operating systems (one device, one control plane).

CONTROL AND DATA PLANE SEPARATION – THREE YEARS LATER

Almost three years ago the OpenFlow/SDN hype exploded and the Open Networking Foundation started promoting the concept of physically separate control and data planes. Let’s see how far its founding members got in the meantime:

 Google implemented their inter-DC WAN network with switches that use OpenFlow within a switching fabric, and BGP/IS-IS and something akin to PCEP between sites;
 Facebook is working on the networking platform for their Open Compute Project. It seems they’ve got to switch hardware specs; I haven’t heard about software running on those switches yet … or maybe they’ll go down the same path as Google (we got cheap switches, and we have our own software – goodbye and thank you!);
 Yahoo! was talking about custom changes to standard networking protocols. I haven’t heard about their progress since the first OpenFlow Symposium; the April 2012 presentation from Igor Gashinsky still concluded with “Where’s My Pony?”;
 Deutsche Telekom is still using traditional routers and a great NFV platform;
 Microsoft implemented SDN using BGP, with a central controller, but not a centralized control plane (a short sketch of this approach follows the list);
 I have no idea what Verizon is doing.
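For readers wondering what “SDN with BGP” might look like in practice, here’s a minimal sketch – my illustration of the general idea, not a description of Microsoft’s implementation. It assumes the open-source ExaBGP daemon is configured to run this script as an API process; the prefixes and next hops are made up.

# A minimal sketch of BGP-based SDN: a controller-side script injects routes
# into the network through ExaBGP, which turns the announce commands read
# from this process's stdout into BGP UPDATEs toward its configured peers.
import sys
import time

# Routes the "controller" wants the network to use, e.g. to steer traffic
# toward a scrubbing appliance or a specific anycast instance.
DESIRED_ROUTES = [
    ("198.51.100.0/24", "192.0.2.1"),
    ("203.0.113.0/24", "192.0.2.1"),
]


def main() -> None:
    for prefix, next_hop in DESIRED_ROUTES:
        sys.stdout.write(f"announce route {prefix} next-hop {next_hop}\n")
        sys.stdout.flush()

    # Stay alive; exiting would make ExaBGP withdraw the announced routes.
    while True:
        time.sleep(60)


if __name__ == "__main__":
    main()

The charm of this approach is that the “southbound protocol” is plain old BGP – the routers already speak it, and the network keeps forwarding on its own if the controller goes away.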

In the networking vendor world, NEC seems to be the only company with a mature commercial product that matches the ONF definition of SDN. Cisco has just shipped the initial version of their controller, as did HP, and those products seem pretty limited at the moment.

Wondering why I didn’t include Big Switch Networks in the above list? My definition of shipping includes publicly available product documentation, or (at the very minimum) something resembling a data sheet with feature descriptions, system requirements and maximum limits. I couldn’t find either on the Big Switch web site.

On the other hand, the virtual networking world was always full of solutions with separate control and data planes, starting with the venerable VMware Distributed vSwitch and Nexus 1000V, and continuing with newer entrants, from the Hyper-V extensible switch and VMware NSX to Juniper Contrail and IBM’s 5000V and DOVE. Some of these solutions were used years before the explosion of the OpenFlow/SDN hype (only we didn’t know we should call them SDN).


In the meantime, the industry media still hasn’t grasped the basics of SDN. Here’s my response to a particularly misleading article written in November 2013:

TWO AND A HALF YEARS AFTER OPENFLOW DEBUT, THE MEDIA REMAINS CLUELESS

If you repeat something often enough, it becomes a “fact” (or an urban myth). SDN is no exception; the industry press loves to explain SDN like this:

“[SDN] takes the high-end features built into routers and switches and puts them into software that can run on cheaper hardware. Corporations still need to buy routers and switches, but they can buy fewer of them and cheaper ones.”

That nice soundbite contains at least one stupidity per sentence:

SDN cannot move hardware features into software. If a device relies on hardware forwarding, you cannot move the same feature into software without significantly impacting the forwarding performance.

SDN software runs on cheaper hardware. Ignoring the intricacies of custom ASICs and merchant silicon (and the fact that Cisco produces more custom ASICs than all merchant silicon vendors combined), complexity and economies of scale dictate hardware costs. It’s pretty hard to make cheaper hardware with the same performance and feature set. However, all networking vendors bundle the software with the hardware devices and expense R&D costs (instead of including them in COGS) to boost their perceived margins.


Does the above paragraph sound like Latin to you? Don’t worry – just keep in mind that software usually costs about as much as (or more than) the hardware it runs on, but you don’t see that.

Corporations can buy fewer routers and switches. It can’t get any better than this. If you need 100 10GE ports, you need 100 10GE ports. If you need two devices for two WAN uplinks (for redundancy), you need two devices. SDN won’t change the port count, redundancy requirements, or laws of physics.

Corporations can buy cheaper [routers and switches]. Guess what – you still need the software to run them, and until we see the price tags of SDN controllers and do a TCO calculation, claims like this one remain wishful thinking (you did notice I’m extremely diplomatic today, didn’t you?).


Finally, numerous marketers and SDN/OpenFlow pundits keep repeating how they’ll save the (networking) world and bring true nirvana to network operations with their flashy new gadgets. Nothing could be further from the truth, because we cannot get rid of the legacy permeating the whole TCP/IP stack, as I explained in this post written in July 2013:

WHERE’S THE REVOLUTIONARY NETWORKING INNOVATION?

In his recent blog post Joe Onisick wrote: “What network virtualization doesn’t provide, in any form, is a change to the model we use to deploy networks and support applications. [...] All of the same broken or misused methodologies are carried forward. [...] Faithful replication of today’s networking challenges as virtual machines with encapsulation tunnels doesn’t move the bar for deploying applications.”

Much as I agree with him, we can’t change much on planet Earth, because VMs use Ethernet NICs (so we need some form of VLANs to cater to the infinite creativity of some people), IP addresses (so we need L3 forwarding), a broken TCP stack (requiring load balancers to fix it), and obviously can’t be relied upon to be sufficiently protected (so we need external firewalls). Furthermore, unless we manage to stop shifting the problems around, networking as a whole won’t get simpler.

What overlay network virtualization does bring us is a decoupling that makes the physical infrastructure less complex, so it can focus on packet forwarding instead of zillions of customer-specific features preferably baked into custom ASICs. Obviously that’s not a good thing for everyone out there.


The final bit of hype I want to dispel is the misleading focus on CLI that we use to configure networking devices. CLI is not the problem, and GUI will not save the world.

FALLACIES OF GUI

I love Greg Ferro’s characterization of CLI: “We need to realise that the CLI is a ‘power tools’ for specialist tradespeople and not a ‘knife and fork’ for everyday use.”

However, you do know that most devices’ GUI offers nothing more than what the CLI does, don’t you? Where’s the catch? For whatever reason, people find colorful screens full of clickable items less intimidating than a blinking cursor on a black background. Makes sense – after all, you can see all the options you have; you can try pulling down things to explore possible values, and commit the changes once you think you enabled the right set of options. Does that make a product easier to use? Probably. Will it result in a better-performing product? Hardly.

Have you ever tried to configure OSPF through a GUI? How about trying to configure usernames and passwords for individual wireless users? In both cases you’re left with the same options you’d have in the CLI (because most vendors implement the GUI as eye candy in front of the CLI or API). If you know how to configure OSPF or a RADIUS server, the GUI helps you break the language barrier (example: moving from Cisco IOS to Junos); if you don’t know what OSPF is, the GUI still won’t save the day ... or it might, if you try clicking all the possible options until you get one that seems to work (expect a few meltdowns on the way if you’re practicing your clicking skills on a live network).


What casual network admins need are GUI wizards – a tool that helps you achieve a goal while keeping your involvement to a minimum. For example, “I need IP routing between these three boxes. Go do it!” should translate into “Configure OSPF in area 0 on all transit interfaces.” When you see a GUI offering this level of abstraction, please let me know (a short sketch of the idea follows at the end of this post).

In the meantime, I’m positive that engineers who have to get a job done quickly prefer using the CLI over a clickety-click GUI (and I’m not the only one), regardless of whether they have to configure a network device, a Linux server, Apache, MySQL, MongoDB or a zillion other products. Why do you think Microsoft invested so heavily in PowerShell?
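Coming back to the wizard example above, here’s a minimal sketch of that level of abstraction; the device names, interface names and IOS-style commands are illustrative assumptions, not a real product.

# A minimal sketch of the "wizard" abstraction described above: expand
# "I need IP routing between these three boxes" into "OSPF in area 0 on
# all transit interfaces". Everything below is illustrative.
from typing import Dict, List

# The only thing the operator specifies: which transit interfaces
# connect the three boxes.
TRANSIT_INTERFACES: Dict[str, List[str]] = {
    "r1": ["GigabitEthernet0/0", "GigabitEthernet0/1"],
    "r2": ["GigabitEthernet0/0", "GigabitEthernet0/1"],
    "r3": ["GigabitEthernet0/0", "GigabitEthernet0/1"],
}


def ospf_config(interfaces: List[str]) -> List[str]:
    """Expand the intent into IOS-style configuration for one device."""
    cmds = ["router ospf 1"]
    for intf in interfaces:
        cmds += [f"interface {intf}", " ip ospf 1 area 0"]
    return cmds


if __name__ == "__main__":
    for device, interfaces in TRANSIT_INTERFACES.items():
        print(f"! --- {device} ---")
        print("\n".join(ospf_config(interfaces)))

The point is not the dozen lines of Python; it’s that the operator expressed a goal and never had to type (or mistype) a single OSPF command.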


2  SOFTWARE DEFINED NETWORKING 101

The Open Networking Foundation (ONF), launched in March 2011, quickly defined Software Defined Networking (SDN) as an architecture with a centralized control plane that controls multiple physically distinct devices. That definition definitely suits one of the ONF founding members (Google), but is it relevant to the networking community at large? Or does it make more sense to focus on network programmability, or on using existing protocols (BGP) in novel ways?

This chapter contains my introductory posts on SDN-related topics, musings on what makes sense, and a few thoughts on the career changes we might experience in the upcoming years. You’ll find more details in subsequent chapters, including an overview of OpenFlow, in-depth analysis of OpenFlow-based architectures, some real-life OpenFlow and SDN deployments, and alternate approaches to SDN.

MORE INFORMATION

You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

 Start with the SDN, OpenFlow and NFV Resources page;
 Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
 Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
 The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
 Finally, I’m always available for short online or on-site consulting engagements.


IN THIS CHAPTER:

 WHAT EXACTLY IS SDN (AND DOES IT MAKE SENSE)?
 BENEFITS OF SDN
 DOES CENTRALIZED CONTROL PLANE MAKE SENSE?
 HOW DID SOFTWARE DEFINED NETWORKING START?
 WE HAD SDN IN 1993 … AND DIDN’T KNOW IT
 STILL WAITING FOR THE STUPID NETWORK
 IS CLI IN MY WAY … OR IS IT JUST A SYMPTOM OF A BIGGER PROBLEM?
 OPENFLOW AND SDN – DO YOU WANT TO BUILD YOUR OWN RACING CAR?
 SDN, WINDOWS AND FRUITY ALTERNATIVES
 SDN, CAREER CHOICES AND MAGIC GRAPHS
 RESPONSE: SDN’S CASUALTIES


The very strict definition of SDN as understood by Open Networking Foundation promotes an architecture with strict separation between a controller and totally dumb devices that cannot do more than forward packets based on forwarding rules downloaded from the controller. Does that definition make sense? This is what I wrote in January 2014:

WHAT EXACTLY IS SDN (AND DOES IT MAKE SENSE)?

When the Open Networking Foundation claimed ownership of Software-Defined Networking, they defined it as the separation of control and data plane:

“[SDN is] The physical separation of the network control plane from the forwarding plane, and where a control plane controls several devices.”

Does this definition make sense, or is it too limiting? Is there more to SDN? Would a broader scope make more sense?

A BIT OF HISTORY

It’s worth looking at the founding members of ONF and their interests: most of them are large cloud providers looking for the cheapest possible hardware, preferably using a standard API so it can be sourced from multiple suppliers, driving the prices even lower. Most of them are big enough to write their own control plane software (and Google already did).

A separation of control plane (running their own software) and data plane (implemented in low-cost white-label switches) was exactly what they wanted to see, and the Stanford team working on OpenFlow provided the architectural framework they could use. No wonder ONF pushes this particular definition of SDN.

MEANWHILE, DEEP BELOW THE CLOUDY HEIGHTS

I have yet to meet a customer (academics might be an exception) that would consider writing their own control-plane software; most of my customers aren’t anywhere close to writing an SDN application on top of a controller framework (OpenDaylight, Cisco XNC or the HP VAN SDN controller). Buying a shrink-wrapped application bundled with commercial support might be a different story … but then nobody really cares whether such a solution uses OpenFlow or RFC 2549; the protocols and encapsulation mechanisms used within a controller-based network solution are often proprietary and thus impossible to troubleshoot anyway.

On the other hand, I keep hearing about common themes:

 The need for faster, more standardized, and automated provisioning;
 The need for programmable network elements and vendor-neutral programming mechanisms (I’m looking at you, netmod working group) – see the sketch after this list;
 Centralized policies and decision making based on end-to-end visibility;
 Easier integration of network elements with orchestration and provisioning systems.
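To make the “programmable network elements” theme a bit more tangible, here’s a minimal sketch of vendor-neutral provisioning using NETCONF and the standard ietf-interfaces YANG model, via the open-source ncclient library; the device address, credentials, interface name and the assumption of a writable running datastore are mine, not part of the original post.

# A minimal sketch of vendor-neutral device programmability: set an
# interface description through NETCONF using the ietf-interfaces YANG
# model. Address, credentials and interface name are illustrative.
from ncclient import manager

INTERFACE_CONFIG = """
<config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
    <interface>
      <name>GigabitEthernet0/0</name>
      <description>uplink - set by provisioning script</description>
    </interface>
  </interfaces>
</config>
"""


def set_description(host: str) -> None:
    # The same script and the same payload work on any device that
    # implements NETCONF and the ietf-interfaces model.
    with manager.connect(host=host, port=830, username="admin",
                         password="admin", hostkey_verify=False) as m:
        m.edit_config(target="running", config=INTERFACE_CONFIG)


if __name__ == "__main__":
    set_description("192.0.2.10")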

Will the physical separation of control and forwarding planes solve any of these? It might, but there are numerous tools out there that can do the same without overhauling everything we’ve been doing for the last 30 years.


We don’t need the physical separation of the control plane to solve our problems (although the ability to control individual forwarding entries does help) … and it will probably take a decade before we glimpse the promised savings of white-label switches and open-source software (even Greg Ferro stopped believing that).

NOW WHAT?

Does it make sense to accept the definition of SDN that makes sense to the ONF founding members but not to your environment? Shall we strive for a different definition of SDN, or just move on, declare it as meaningless as the clouds, and focus on solving our problems? Would it be better to talk about NetOps? Maybe we should stop talking and start doing – there are plenty of things you can do within existing networks using existing protocols.


Every new networking technology is supposed to solve most of our headaches. SDN is no exception. The reality might be a bit different.

BENEFITS OF SDN

Paul Stewart wrote a fantastic blog post in May 2014 listing the potential business benefits of SDN (as promoted by SDN evangelists and SDN-washing vendors). Here’s his list:

 Abstracted Control Plane for a Central Point of Management;
 Granular Control of Flows (as required/desired);
 Network Function Virtualization and Service Chaining;
 Decreased dependence on devices like load balancers;
 Facilitation of system orchestration;
 Easier troubleshooting/visibility;
 Platform for chargeback/showback;
 Decreased complexity and cost;
 Increased ability to utilize hardware and interconnections;
 DevOps friendly architecture.

I have just one problem with this list – I’ve seen a similar list of benefits of IPv6:


Figure 2-1: IPv6 myths

Unfortunately, the reality of IT in general and IPv6 in particular is a bit different. The overly hyped IPv6 benefits remain myths and legends; all we got were longer addresses, incompatible protocols (OSPFv3, anyone?), and half-thought-out implementations (example: DNS autoconfiguration) riddled with religious wars (try asking “why don’t we have the first-hop router in DHCPv6” on any IPv6 mailing list ;). For more information, watch the fantastically cynical presentation Enno Rey gave at the Troopers 2014 IPv6 Security Summit, or check my IPv6 resources.


With Open Networking Foundation adamantly promoting their definition of SDN, and based on experiences with previous (now mostly extinct) centralized architectures, one has to ask a simple question: does it make sense? Here’s what I thought in May 2014:

DOES CENTRALIZED CONTROL PLANE MAKE SENSE?

A friend of mine sent me a challenging question: “You've stated a couple of times that you don't favor the OpenFlow version of SDN due to a variety of problems like scaling and latency. What model/mechanism do you like? Hybrid? Something else?”

Before answering the question, let’s step back and ask another one: does the centralized control plane, as evangelized by ONF, make sense?

A BIT OF HISTORY

As always, let’s start with one of the greatest teachers: history. We’ve had centralized architectures for decades, from SNA to various WAN technologies (SDH/SONET, Frame Relay and ATM). They all share a common problem: when the network partitions, the nodes cut off from the central intelligence stop functioning (in the SNA case) or remain in a frozen state (WAN technologies).

One might be tempted to conclude that the ONF version of SDN won’t fare any better than the switched WAN technologies. Reality is far worse:




 WAN technologies had little control-plane interaction with the outside world (example: Frame Relay LMI), and those interactions were run by the local devices, not from the centralized control plane;
 WAN devices (SONET/SDH multiplexers, or ATM and Frame Relay switches) had local OAM functionality that allowed them to detect link or node failures and reroute around them using preconfigured backup paths. One could argue that those devices had local control plane, although it was never as independent as control planes used in today’s routers. Interestingly, MPLS-TP wants to reinvent the glorious past and re-introduce centralized path management, yet again proving RFC 1925 section 2.11.

The last architecture (that I remember) that used a truly centralized control plane was SNA, and if you’re old enough you know how well that ended.

WOULD A CENTRAL CONTROL PLANE MAKE SENSE IN LIMITED DEPLOYMENTS?

A central control plane is obviously a single point of failure, and network partitioning is a nightmare if you have a central point of control. Large-scale deployments of the ONF variant of SDN are thus out of the question.

But does it make sense to deploy a centralized control plane in smaller independent islands (campus networks, data center availability zones)? Interestingly, numerous data center architectures already use a centralized control plane, so we can analyze how well they perform:




 Juniper XRE can control up to four EX8200 switches, or a total of 512 10GE ports;
 Nexus 7700 can control 64 fabric extenders with 3072 ports, plus a few hundred directly attached 10GE ports;
 HP IRF can bind together two 12916 switches for a total of 1536 10GE ports;
 QFabric Network Node Group could control eight nodes, for a total of 384 10GE ports.

NEC ProgrammableFlow seems to be an outlier – they can control up to 200 switches, for a total of over 9000 GE (not 10GE) ports … but they don’t run any control-plane protocols (apart from ARP and dynamic MAC learning) with the outside world: no STP, LACP, LLDP, BFD or routing protocols.

One could argue that we could get an order of magnitude beyond those numbers if only we were using proper control plane hardware (Xeon CPUs, for example). I don’t buy that argument till I actually see a production deployment, and do keep in mind that the NEC ProgrammableFlow controller uses decent Intel-based hardware. Real-time distributed systems with fast feedback loops are way more complex than most people looking from the outside realize (see also RFC 1925, section 2.4).

DOES CENTRAL CONTROL PLANE MAKE SENSE?

It does in certain smaller-scale environments (see above) … as long as you can guarantee redundant connectivity between the controller and the controlled devices, or don’t care what happens after a link loss (see also: wireless access points). Does it make sense to generate a huge hoopla while reinventing this particular wheel? I would spend my energy doing something else.


I absolutely understand why NEC went down this path – they did something extraordinary to differentiate themselves in a very crowded market. I also understand why Google decided to use this approach, and why they evangelize it as much as they do. I’m just saying that it doesn’t make that much sense for the rest of us.

Finally, do keep in mind that the whole world of IT is moving toward scale-out architectures. Netflix & Co. are already there, and the enterprise world is grudgingly taking the first steps. In the meantime, OpenFlow evangelists talk about the immeasurable revolutionary merits of a centralized scale-up architecture. They must be living on a different planet.


Just in case you’re wondering how the OpenFlow/SDN movement started, here’s a bit of pre-2011 history.

HOW DID SOFTWARE DEFINED NETWORKING START?

Software-Defined Networking is clearly a tautological term – after all, software has defined networking device behavior ever since we stopped using Token Ring MAUs and unmanaged hubs. The Open Networking Foundation claims it owns the definition of the term (which makes approximately as much sense as someone claiming they own the definition of red-colored clouds), but I was always wondering who coined the term in the first place.

I finally found the answer in a fantastic overview of the technologies and ideas that led to OpenFlow and SDN, published in the December 2013 issue of acmqueue. According to that article, SDN first appeared in an article published by MIT Technology Review that explains how Nick McKeown and his team at Stanford use OpenFlow:

Frustrated by this inability to fiddle with Internet routing in the real world, Stanford computer scientist Nick McKeown and colleagues developed a standard called OpenFlow that essentially opens up the Internet to researchers, allowing them to define data flows using software--a sort of "software-defined networking."

You did notice the “a sort of” classification and quotes around SDN, didn’t you? It’s pretty obvious how the article uses “software-defined networking” to illustrate the point… but once marketing took over all hope for reasonable discussion was lost, and SDN became as meaningless as cloud.


Assuming we forget the ONF-promoted definition of SDN and define SDN as “network programmed from a central controller”, it’s obvious we had SDN for at least 20 years.

WE HAD SDN IN 1993 … AND DIDN’T KNOW IT

I had three SDN 101 presentations during my 2013 visit to South Africa and tried really hard to overcome my grumpy skeptic self and find the essence of SDN while preparing for them. As I’ve been thinking about controllers, central visibility and network device programmability, it struck me: we already had SDN in 1993.

In 1993 we were (among other things) an Internet Service Provider offering dial-up and leased line Internet access. Being somewhat lazy, we hated typing in the same commands every time we had to provision a new user (in pre-TACACS+ days we had to use local authentication to have autocommand capability for dial-up users) and developed a solution that automatically changed the router configurations after we added a new user. Here’s a high-level diagram of what we did:


Figure 2-2: Simple router provisioning system built in 1993

An HTML user interface (written in Perl) gave the operators easy access to the user database (probably implemented as a text file – we were true believers in the NoSQL movement in those days), and a backend Perl script generated router configuration commands from the user definitions and downloaded them (probably through rcp – the details are a bit sketchy) to the dial-up access servers.

The next revision of the software included support for leased line users – the script generated interface configurations and static routes for our core router (it was actually an MGS, but I found no good MGS images on the Internet) or one of the access servers (for users using asynchronous modems).

How is that different from all the shiny new stuff vendors are excitedly talking about? Beats me, I can’t figure it out ;) … and as I said before, you don’t always need new protocols to solve old problems.
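To give a rough flavor of what that backend script had to do, here’s a minimal Python sketch of the same idea (the original was Perl and is long gone, so the database format, field names and command templates below are hypothetical reconstructions, not the original code):

```python
# Hypothetical reconstruction (not the original Perl): read a colon-separated
# user database and turn every record into router configuration commands.

def dialup_config(user, password):
    # Local username with an autocommand - roughly what pre-TACACS+ setups needed
    return [f"username {user} password {password}",
            f"username {user} autocommand ppp"]

def leased_line_config(user, interface, network, mask):
    # Interface description plus a static route pointing at the user's link
    return [f"interface {interface}",
            f" description leased line for {user}",
            f"ip route {network} {mask} {interface}"]

def generate_config(db_file="users.txt"):
    # Assumed record format: user:password:dialup
    #                    or: user:password:leased:interface:network:mask
    commands = []
    with open(db_file) as db:
        for line in db:
            fields = line.strip().split(":")
            if fields[2] == "dialup":
                commands += dialup_config(fields[0], fields[1])
            else:
                commands += leased_line_config(fields[0], *fields[3:6])
    return commands

if __name__ == "__main__":
    print("\n".join(generate_config()))
```

The missing piece (pushing the generated commands to the devices) was a file transfer in 1993; today you’d use SSH or an API, but the idea is identical.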


While we’re happily arguing the merits of reinvented architectures, we keep forgetting that the basics of sound network architecture have been known for over a decade… and we still haven’t made any progress getting closer to them.

STILL WAITING FOR THE STUPID NETWORK

More than 15 years ago the cover story of ACM netWorker magazine discussed the dawn of the stupid network – an architecture with smart edge nodes and simple packet forwarding code. Obviously we learned nothing in all those years – we’re still having the same discussions.

Here are a few juicy quotes from that article (taken completely out of context solely for your enjoyment):

- The telcos seemed to "fall asleep at the switch" at the core of their network.
- "Keep it simple, stupid," or KISS, is an engineering virtue. The Intelligent Network, however, is anything but simple; it is a marketing concept for scarce, complicated, high-priced services.
- The Intelligent Network impedes innovation. Existing features are integrally spaghetti-coded into the guts of the network, and new features must intertwine with the old.
- Infrastructure improvements are rapidly making the telcos' Intelligent Network a distinctly second-rate choice.
- The bottom line, though, is not the infrastructure; it is the innovation that the Stupid Network unleashes.

The whole article is well worth reading, more so considering it’s over 15 years old and still spot-on.


Some SDN proponents claim that the way we configure networking devices (using CLI) is the biggest networking problem we’re facing today. They also conveniently forget that every scalable IT solution uses automation, text files and CLI… because they work, and allow experienced operators to work faster.

IS CLI IN MY WAY … OR IS IT JUST A SYMPTOM OF A BIGGER PROBLEM?

My good friend Ethan published a blog post in February 2014 rightfully complaining how various vendor CLIs hamper our productivity. He’s absolutely correct from the productivity standpoint, and I agree with his conclusions (we need a layer of abstraction), but there’s more behind the scenes.

We’re all sick of CLI. I don’t think anyone would disagree. However, CLI is not our biggest problem. We happen to be exposed to the CLI on a daily basis due to lack of automation tools and lack of an abstraction layer; occasional fights with the usual brown substance flowing down the application stack don’t help either.

The CLI problem is mostly hype. The “we need to replace CLI with (insert-your-favorite-gizmo)” hype was generated by SDN startups (one in particular) that want to sell their “disruptive” way of doing things to the venture capitalists. BTW, the best way to configure their tools is through CLI.

CLI is still the most effective way of doing things – ask any really proficient sysadmin, web server admin or database admin how they manage their environment. It’s not through a point-and-click GUI, it’s through automation tools coupled with simple CLI commands (because automation tools don’t work that well when they have to simulate mouse clicks).
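As a trivial illustration of the “automation tools wrapping simple CLI commands” point, a few lines of Python using the paramiko SSH library can push the same commands to a whole list of devices (the addresses, credentials and commands below are placeholders, and this is a sketch, not a production tool):

```python
import paramiko   # generic SSH library; device addresses and credentials are placeholders

DEVICES = ["192.0.2.11", "192.0.2.12"]
COMMANDS = ["show version", "show ip route summary"]

def run_commands(host, username, password, commands):
    """Open an SSH session to a device and run a list of CLI commands."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, password=password, look_for_keys=False)
    results = {}
    for cmd in commands:
        # Note: some network devices only offer an interactive shell, in which
        # case client.invoke_shell() would be needed instead of exec_command().
        stdin, stdout, stderr = client.exec_command(cmd)
        results[cmd] = stdout.read().decode()
    client.close()
    return results

for device in DEVICES:
    for cmd, text in run_commands(device, "admin", "secret", COMMANDS).items():
        print(f"=== {device}: {cmd} ===\n{text}")
```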


CLI generates vendor lock-in. Another pile of startup hype – in this case coming from startups that want to replace the network device lock-in with controller lock-in (here’s a similar story).

WE’RE NOT UNIQUE

Startups and pundits would like to persuade you how broken “traditional” networking is, but every other field in IT has to deal with the same problems – just try to manage a Windows server with Linux commands, or create tables on Microsoft SQL Server with MySQL or Oracle syntax … even Linux distributions don’t have the same command set. The true difference between other IT fields and networking is that the other people did something to solve their problems while we keep complaining.

Networking is no worse than any other IT discipline; we just have to start moving forward, create community tools, and vote with our wallets. Whenever you have a choice between two comparable products from different vendors, buy the one that offers greater flexibility and programmability.

Don’t know what to look for? Talk with your server and virtualization buddies (I hope you’re on speaking terms with them, or it’s high time you buy them a beer or two). If they happen to use Puppet or Chef to manage servers, you might try to use the same tools to manage your routers and switches. Your favorite boxes don’t support the tools used by the rest of your IT? Maybe it’s time to change the vendor.


It’s reasonably easy to add automation and orchestration on top of an existing network implementation. Throwing away decades of field experience and replacing existing solutions with an OpenFlow-based controller is a totally different story, as I explained in May 2013:

OPENFLOW AND SDN – DO YOU WANT TO BUILD YOUR OWN RACING CAR?

The OpenFlow zealots are quick to point out the beauties of the centralized control plane, and the huge savings you can expect from using commodity hardware and open-source software. What they usually forget to tell you is that you also have to reinvent all the wheels the networking industry has invented in the last 30 years.

Imagine you want to build your own F1 racing car... but the only component you got is a super-duper racing engine from Mercedes Benz. You're left with the "easy" task of designing the car body, suspension, gears, wheels, brakes and a few other choice bits and pieces. You can definitely do all that if you're Google or the McLaren team, but not if you're a Sunday hobbyist mechanic. No wonder some open-source OpenFlow controllers look like Red Bull Flugtag contestants.

Does that mean we should ignore OpenFlow? Absolutely not, but unless you want to become really fluent in real-time event-driven programming (which might look great on your resume), you should join me watching from the sidelines until there's a solid controller (maybe we'll get it with Daylight, Floodlight definitely doesn't fit the bill) and some application architecture blueprints.


Till then, it might make sense to focus on more down-to-earth technologies; after all, you don't exactly need OpenFlow and a central controller to solve real-life problems, like Tail-f clearly demonstrated with their NCS software.


“Openness” (for whatever value of “Open”) is another perceived benefit of SDN. In reality, you’re trading hardware vendor lock-in for controller vendor lock-in.

SDN, WINDOWS AND FRUITY ALTERNATIVES

Brad Hedlund made a pretty valid comment to my “NEC Launched a Virtual OpenFlow Switch” blog post: “On the other hand, it's NEC end-to-end or no dice”, implying the ultimate vendor lock-in. Of course he’s right and while, as Bob Plankers explains, you can never escape some lock-in (part 1, response from Greg Ferro, part 2 – all definitely worth reading), you do have to ask yourself “am I looking for Windows or Mac?”

There are all sorts of arguments one hears from Mac fanboys (here’s a networking related one) but regardless of what you think of Mac and OSX, there’s the undisputable truth: compared to the reloadful experience we get on most Windows-based boxes, Macs and OSX are rock solid; I have to reboot my Macbook every other blue moon. Even Windows is stable when running on a Macbook (apart from upgrade-induced reboots).

Before you start praising Steve Jobs and blaming Bill Gates and Microsoft at large, consider a simple fact: OSX runs on a tightly controlled hardware platform built with stability and reliability in mind. Windows has to run on every possible underperforming concoction a hardware vendor throws at you (example: my “high-end” laptop cannot record system audio because the 6-letter hardware vendor wanted to save $0.02 on the sound chipset and chose the cheapest possible one), and has to deal with all sorts of crap third-party device drivers loaded straight into the operating system kernel.


Now, what do you want to have in your mission-critical SDN/OpenFlow data center networking infrastructure: a Mac-like tightly controlled and vendor-tested mix of equipment and associated controller, or a Windows-like hodgepodge of boxes from numerous vendors, controlled by third-party software that might have never encountered the exact mix of the equipment you have?

If you’re young and brazen (like I was two decades ago), go ahead and be your own system integrator. If you’re too old and covered with vendor-inflicted scars, you might prefer a tested end-to-end solution regardless of what Gartner says in vendor-sponsored reports (and even solutions that vendor X claims were tested don’t always work). Just don’t forget to consider the cost of downtime in your total-cost-of-ownership calculations.


SDN controllers will replace networking engineers, at least if you believe what the SDN or virtualization vendors are telling you. I don’t think we have to worry about that happening in the foreseeable future (and nothing has changed since I wrote the following blog post in late 2012).

SDN, CAREER CHOICES AND MAGIC GRAPHS

The current explosion of SDN hype (further fueled by the recent VMworld announcement of Software-Defined Data Centers) made some networking engineers understandably nervous. This is the question I got from one of them:

I have 8 plus years in Cisco, have recently passed my CCIE RS theory, and was looking forward to complete the lab test when this SDN thing hit me hard. Do you suggest completing the CCIE lab looking at this new future of Networking?

Short answer: the sky is not falling, CCIE still makes sense, and IT will still need networking people. However, as I recently collected a few magic graphs for a short keynote speech, let me reuse them to illustrate this particular challenge we’re all facing.

Starting with the obvious, here’s the legendary Diffusion of Innovations: every idea is first adopted by a few early adopters, followed by early and late majority.


Figure 2-3: Diffusion of ideas (source: Wikipedia)

Networking in general is clearly in the late majority/laggards phase. What’s important for our discussion is the destruction of value-add through the diffusion process. Oh my, I sound like a freshly-baked MBA whiz-kid, let’s reword it: as a technology gets adopted, more people understand it, the job market competition increases, and thus it’s harder to get a well-paying job in that particular technology area. Supporting Windows desktops might be a good example.


As a successful technology matures, it moves through the four parts of another magic matrix (this one from Boston Consulting Group).

Figure 2-4: Boston Consulting Group matrix

Initially every new idea is a great unknown, with only a few people brave enough to invest time in it (CCIE R&S before Cisco made it mandatory for Silver/Gold partner status). After a while, a successful idea explodes into a star with huge opportunities and fat margins (example: CCIE R&S a decade ago, Nicira-style SDN today … at least for Nicira’s founders), degenerates into a cash cow as the market slowly gets saturated (CCIE R&S is probably at this stage by now) and finally (when everyone starts doing it) becomes an old dog not worth bothering with.

Does it make sense to invest in something that’s probably in the cash cow stage? The theory says “as much as needed to keep it alive”, but don’t forget that CCIE R&S will likely remain very relevant for a long time:

- The protocol stacks we’re using haven’t changed in the last three decades (apart from extending the address field from 32 to 128 bits), and although people are working on proposals like MPTCP, those proposals are still in the experimental stage;
- Regardless of all the SDN hoopla, neither OpenFlow nor other SDN technologies address the real problems we’re facing today: the lack of a session layer in TCP and the use of IP addresses in the application layer. They just give you different tools to implement today’s kludges;
- Cisco is doing constant refreshes of its CCIE programs to keep them in the early adopters or early majority technology space, so the CCIE certification is not getting commoditized;
- If you approach the networking certifications the right way, you’ll learn a lot about the principles and fundamentals, and you’ll need that knowledge regardless of the daily hype.

Now that I’ve mentioned experimental technologies – don’t forget that not all of them get adopted (even by early adopters). Geoffrey Moore made millions writing a book that pointed out that obvious fact. Of course he was smart enough to invent a great-looking wrapper – he called it Crossing the Chasm.


Figure 2-5: The chasm before the mainstream market adoption (source: Crossing the Chasm & Inside the Tornado)

The crossing-the-chasm dilemma is best illustrated with Gartner Hype Cycles. After all the initial hype (that we’ve seen with OpenFlow and SDN) resulting in a peak of inflated expectations, there’s the ubiquitous trough of disillusionment. Some technologies die in that quagmire; in other more successful cases we eventually figure out how to use them (slope of enlightenment).


Figure 2-6: Gartner hype cycle (source: Wikipedia)

We still don’t know how well SDN will do crossing the chasm (according to the latest Gartner charts, OpenFlow still hasn’t reached the hype peak – I dread what's still lying ahead of us); we’ve seen only a few commercial products and none of them has anything close to widespread adoption (not to mention the reality of three IT geographies).


Anyhow, since you’ve decided you want to work in networking, one thing is certain: technology will change (whatever the change will be), and it will happen with or without you. At every point in your career you have to invest some of your time into learning something new. Some of those new things will be duds; others might turn into stars. See also Private Clouds Will Change IT Jobs, Not Eliminate Them by Mike Fratto. Finally, don’t ask me for “what will the next big thing be” advice. Browse through the six years of my blog posts. You might notice a clear shift in focus; it’s there for a reason.


Finally, here’s a response to an “industry press” gem I wrote in 2013:

RESPONSE: SDN’S CASUALTIES

An individual focused more on sensationalism than content deemed it appropriate to publish an article declaring networking engineers an endangered species on an industry press web site that I considered somewhat reliable in the past. The resulting flurry of expected blog posts included an interesting one from Steven Iveson in which he made a good point: it’s easy for the cream-of-the-crop not to be concerned, but what about others lower down the pile.

As always, it makes sense to do a bit of a reality check.

- While everyone talks about SDN, the products are scarce, and it will take years before they appear in a typical enterprise network. Apart from NEC’s ProgrammableFlow and overlay networks, most other SDN-washed things I’ve seen are still point products.
- Overlay virtual networks seem to be the killer app of the moment. They are extremely useful and versatile ... if you’re not bound to VLANs by physical appliances. We’ll have to wait for at least another refresh cycle before we get rid of them.
- Data center networking is hot and sexy, but it’s only a part of what networking is. I haven’t seen a commercial SDN app for enterprise WAN, campus or wireless (I’m positive I’m wrong – write a comment to correct me), because that’s not where the VCs are looking at the moment.

Also, consider that the “my job will be lost to technology” sentiments started approximately 200 years ago, and yet the population has increased by almost an order of magnitude in the meantime, there are obviously way more jobs now (in absolute terms) than there were in those days, and nobody in his right mind wants to do the menial chores that the technology took over.

Obviously you should be worried if you’re a VLAN provisioning technician. However, with everyone writing about SDN you know what’s coming down the pipe, and you have a few years to adapt, expand the scope of your knowledge, and figure out where it makes sense to move (and don’t forget to focus on where you can add value, not what job openings you see today). If you don’t do any of the above, don’t blame SDN when the VLANs (finally) join the dinosaurs and you have nothing left to configure.

Finally, I’m positive there will be places using VLANs 20 years from now. After all, AS/400s and APPN are still kicking and people are still fixing COBOL apps (that IBM just made sexier with XML and Java support).


3

OPENFLOW BASICS

Based on the exorbitant claims made by the industry press you might have concluded there must be some revolutionary concepts in the OpenFlow technology. Nothing could be further from the truth – OpenFlow is a very simple technology that allows a controller to program forwarding entries in a networking device.

Did you ever encounter a Catalyst 5000 with a Route Switch Module (RSM), or a combination of a Catalyst 5000 and an external router, using Multilayer Switching (MLS)? Those products used an architecture identical to OpenFlow almost 20 years ago, the only difference being the relative openness of the OpenFlow protocol.

This chapter will answer a number of basic OpenFlow questions, including:

MORE INFORMATION

You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;
- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
- Finally, I’m always available for short online or on-site consulting engagements.




- What is OpenFlow?
- What can different versions of OpenFlow do?
- How can a controller implement control-plane protocols (like LACP, STP or routing protocols) … and does it have to?
- Can we deploy OpenFlow in combination with traditional forwarding mechanisms?

IN THIS CHAPTER:

MANAGEMENT, CONTROL AND DATA PLANES IN NETWORK DEVICES AND SYSTEMS
WHAT EXACTLY IS THE CONTROL PLANE?
WHAT IS OPENFLOW?
WHAT IS OPENFLOW (PART 2)?
OPENFLOW PACKET MATCHING CAPABILITIES
OPENFLOW ACTIONS
OPENFLOW DEPLOYMENT MODELS
FORWARDING MODELS IN OPENFLOW NETWORKS
YOU DON’T NEED OPENFLOW TO SOLVE EVERY AGE-OLD PROBLEM
OPENFLOW AND IPSILON: NOTHING NEW UNDER THE SUN


OPENFLOW: BIOS DOES NOT A SERVER MAKE
SDN CONTROLLER NORTHBOUND API IS THE CRUCIAL MISSING PIECE
IS OPENFLOW THE BEST TOOL FOR OVERLAY VIRTUAL NETWORKS?
IS OPENFLOW USEFUL?


The fundamental principle underlying OpenFlow and Software Defined Networking (as defined by the Open Networking Foundation) is the decoupling of the control and data planes, with the data (forwarding) plane running in a networking device (switch or router) and the control plane implemented in a central controller, which controls numerous dumb devices. Let’s start with the basics – what are the data, control and management planes?

MANAGEMENT, CONTROL AND DATA PLANES IN NETWORK DEVICES AND SYSTEMS

Every single network device (or a distributed system like QFabric) has to perform at least three distinct activities:

- Process the transit traffic (that’s why we buy them) in the data plane;
- Figure out what’s going on around it with the control plane protocols;
- Interact with its owner (or Network Management System – NMS) through the management plane.

Routers are used as a typical example in every text describing the three planes of operation, so let’s stick to this time-honored tradition:

- Interfaces, IP subnets and routing protocols are configured through management plane protocols, ranging from CLI to NETCONF and the latest buzzword – northbound RESTful API;
- The router runs control plane routing protocols (OSPF, EIGRP, BGP …) to discover adjacent devices and the overall network topology (or reachability information in case of distance/path vector protocols);
- The router inserts the results of the control-plane protocols into the Routing Information Base (RIB) and the Forwarding Information Base (FIB). Data plane software or ASICs use the FIB structures to forward the transit traffic (see the sketch after this list);
- Management plane protocols like SNMP can be used to monitor the device operation, its performance, interface counters …
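Here’s a minimal, purely illustrative Python sketch of the RIB-to-FIB relationship mentioned in the list above (the data structures and administrative-distance values are simplified assumptions, not any vendor’s implementation): the control plane collects routes from several protocols in the RIB, and only the best route per prefix ends up in the FIB used by the forwarding path.

```python
# Illustrative-only RIB/FIB model: several protocols contribute routes,
# the best one (lowest administrative distance) per prefix goes into the FIB.

rib = [
    # (prefix, next_hop, protocol, administrative_distance)
    ("10.0.0.0/24", "192.0.2.1", "ospf",   110),
    ("10.0.0.0/24", "192.0.2.9", "static", 1),
    ("10.1.0.0/16", "192.0.2.5", "bgp",    20),
]

def build_fib(rib_entries):
    """Keep only the best (lowest-distance) route for each prefix."""
    fib = {}
    for prefix, next_hop, proto, distance in rib_entries:
        if prefix not in fib or distance < fib[prefix][1]:
            fib[prefix] = (next_hop, distance, proto)
    return fib

print(build_fib(rib))
# {'10.0.0.0/24': ('192.0.2.9', 1, 'static'), '10.1.0.0/16': ('192.0.2.5', 20, 'bgp')}
```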


Figure 3-1: Management, control and data planes

The management plane is pretty straightforward, so let’s focus on a few intricacies of the control and data planes.

We usually have routing protocols in mind when talking about control plane protocols, but in reality the control plane protocols perform numerous other functions, including:

- Interface state management (PPP, LACP);
- Connectivity management (BFD, CFM);
- Adjacent device discovery (hello mechanisms present in most routing protocols, ES-IS, ARP, IPv6 ND, uPNP SSDP);
- Topology or reachability information exchange (IP/IPv6 routing protocols, IS-IS in TRILL/SPB, STP);
- Service provisioning (RSVP for IntServ or MPLS/TE, uPNP SOAP calls).

The data plane should be focused on forwarding packets, but is commonly burdened by other activities:

- NAT session creation and NAT table maintenance;
- Neighbor address gleaning (example: dynamic MAC address learning in bridging, IPv6 SAVI);
- NetFlow accounting (sFlow is cheap compared to NetFlow);
- ACL logging;
- Error signaling (ICMP).

Data plane forwarding is hopefully performed in dedicated hardware or in high-speed code (within the interrupt handler on low-end Cisco IOS routers), while the overhead activities usually happen on the device CPU (sometimes even in userspace processes – the switch from high-speed forwarding to user-mode processing is commonly called punting). In reactive OpenFlow architectures a punting decision sends a packet all the way to the OpenFlow controller.


Regardless of the implementation details, it’s obvious the device CPU represents a significant bottleneck (in some cases the switch to CPU-based forwarding results in several orders of magnitude lower performance) – that’s the main reason one has to rate-limit ACL logging and protect the device CPU with Control Plane Protection features.


It seems it’s easy to define what a network device control plane is (and how it’s different from the data plane)… until someone starts unearthing the interesting corner cases.

WHAT EXACTLY IS THE CONTROL PLANE?

Tassos opened an interesting can of worms in a comment to my Management, Control and Data Planes post: is an ICMP response to a forwarded packet (TTL exceeded, fragmentation needed or destination unreachable) a control- or data-plane activity?

Other control plane protocols (BGP, OSPF, LDP, LACP, BFD ...) are more clear-cut – they run between individual network devices (usually adjacent, but there’s also targeted LDP and multihop BGP) and could be (at least in theory) made to run across a separate control plane network (or VRF).

Control plane protocols usually run over data plane interfaces to ensure shared fate – if the packet forwarding fails, the control plane protocol fails as well – but there are scenarios (example: optical gear) where the data plane interfaces cannot process packets, forcing you to run control plane protocols across a separate set of interfaces.

Typical control plane protocols aren’t data-driven: a BGP, LACP or BFD packet is never sent as a direct response to a data plane packet. ICMP is different: some ICMP packets are sent as replies to other ICMP packets, others are triggered by data plane packets (ICMP unreachables and ICMPv6 neighbor discovery).


Trying to classify protocols based on where they’re run is also misleading. It’s true that the networking device CPU almost always generates ICMP requests and responses (it doesn’t make sense to spend silicon real estate to generate ICMP responses). In some cases, ICMP packets might be generated in the slow path, but that’s just how a particular network operating system works. Let’s ignore those dirty details for the moment; just because a device’s CPU touches a packet doesn’t make that packet a control plane packet. Vendor terminology doesn’t help us either – most vendors talk about Control Plane Policing or Protection. These mechanisms usually apply to control plane protocols as well as data plane packets punted from ASICs to the device CPU. Even IETF terminology isn’t exactly helpful – while C in ICMP does stand for Control, it doesn’t necessarily imply control plane involvement. ICMP is simply a protocol that passes control messages (as opposed to user data) between IP devices. Honestly, I’m stuck. Is ICMP a control plane protocol that’s triggered by data plane activity or is it a data plane protocol? Can you point me to an authoritative source explaining what ICMP is? Share your thoughts in the comments!


Now that we know what data, control and management planes are, let’s see how OpenFlow fits into the picture.

WHAT IS OPENFLOW?

A typical networking device (bridge, router, switch, LSR ...) runs all the control protocols (including port aggregation, STP, TRILL, MAC address learning and routing protocols) in the control plane (usually implemented in a central CPU or supervisor module), and downloads the forwarding instructions into the data plane structures, which can be simple lookup tables or specialized hardware (hash tables or TCAMs).

In architectures with distributed forwarding hardware the control plane has to use a communications protocol to download the forwarding information into data plane instances. Every vendor uses its own proprietary protocol (Cisco uses IPC – InterProcess Communication – to implement distributed CEF); OpenFlow tries to define a standard protocol between control plane and associated data plane elements.

The OpenFlow zealots would like you to believe that we’re just one small step away from implementing Skynet; the reality is a bit more sobering. You need a protocol between control and data plane elements in all distributed architectures, starting with modular high-end routers and switches. Almost every modular high-end switch that you can buy today has one or more supervisor modules and numerous linecards performing distributed switching (preferably over a crossbar matrix, not over a shared bus). In such a switch, an OpenFlow-like protocol runs between the supervisor module(s) and the linecards.


Moving into more distributed space, the fabric architectures with central control plane (HP’s IRF, Cisco’s VSS) use an OpenFlow-like protocol between the central control plane and the forwarding instances. You might have noticed that all vendors support a limited number of high-end switches in a central control plane architecture (Cisco’s VSS cluster has two nodes and HP’s IRF cluster can have up to four high-end switches). This decision has nothing to do with vendor lock-in and lack of open protocols, but rather reflects the practical challenges of implementing a high-speed distributed architecture (alternatively, you might decide to believe the whole networking industry is a confusopoly of morons who are unable to implement what every post-graduate student can simulate with open source tools).

Moving deeper into the technical details, the OpenFlow Specs page on the OpenFlow web site contains a link to the OpenFlow Switch Specification v1.1.0, which defines:

- OpenFlow tables (the TCAM structure used by OpenFlow);
- OpenFlow channel (the session between an OpenFlow switch and an OpenFlow controller);
- OpenFlow protocol (the actual protocol messages and data structures).

The designers of OpenFlow had to make the TCAM structure very generic if they wanted to offer an alternative to numerous forwarding mechanisms implemented today. Each entry in the flow tables contains the following fields: ingress port, source and destination MAC address, ethertype, VLAN tag & priority bits, MPLS label & traffic class (starting with OpenFlow 1.1), IP source and destination address (and masks), layer-4 IP protocol, IP ToS bits and TCP/UDP port numbers.
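To give a rough feel for how generic those entries are, here’s a hedged Python sketch of such a match structure (the field names are mine, loosely mirroring the list above, and a field left as None is treated as a wildcard – this is an illustration, not the OpenFlow wire format):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FlowMatch:
    # Fields roughly mirroring the OpenFlow 1.1 match structure described above;
    # None means "wildcard - match anything" for that field.
    in_port: Optional[int] = None
    eth_src: Optional[str] = None
    eth_dst: Optional[str] = None
    eth_type: Optional[int] = None
    vlan_id: Optional[int] = None
    vlan_pcp: Optional[int] = None
    mpls_label: Optional[int] = None
    mpls_tc: Optional[int] = None
    ip_src: Optional[str] = None       # address/mask, e.g. "10.0.0.0/24"
    ip_dst: Optional[str] = None
    ip_proto: Optional[int] = None
    ip_tos: Optional[int] = None
    tp_src: Optional[int] = None       # TCP/UDP source port
    tp_dst: Optional[int] = None

# Match all TCP traffic toward 10.0.0.0/24 port 80, regardless of the other fields
web_match = FlowMatch(eth_type=0x0800, ip_dst="10.0.0.0/24", ip_proto=6, tp_dst=80)
print({k: v for k, v in asdict(web_match).items() if v is not None})
```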


To make the data plane structures scalable, OpenFlow 1.1 introduces the concept of multiple flow tables linked into a tree (and group tables to support multicasts and broadcasts). This concept allows you to implement multi-step forwarding, for example:

- Check inbound ACL (table #1);
- Check QoS bits (table #2);
- Match local MAC addresses and move into the L3/MPLS table; perform L2 forwarding otherwise (table #3);
- Perform L3 or MPLS forwarding (tables #4 and #5).

You can pass metadata between tables to make the architecture even more versatile. The proposed flow table architecture is extremely versatile (and I’m positive there’s a PhD thesis being written proving that it is a superset of every known and imaginable forwarding paradigm), but it will have to meet the harsh reality before we see full-blown OpenFlow switch products. You can implement the flow tables in software (in which case the versatility never hurts, but you’ll have to wait a few years before the Moore’s Law curve catches up with terabit speeds) or in hardware, where the large TCAM entries will drive the price up.
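Here’s a toy Python sketch of the multi-table idea described above (the table contents, the metadata usage and the “router MAC” value are all invented for illustration; a real pipeline lives in TCAM and group tables, not in Python functions):

```python
# Toy model of an OpenFlow 1.1-style multi-table pipeline; all rules are invented.
# Every table returns ("goto", next_table, metadata) or ("action", description, metadata).

def acl_table(pkt, meta):                        # table #1: inbound ACL
    if pkt.get("tcp_dst") == 23:
        return ("action", "drop", meta)          # block telnet
    return ("goto", qos_table, meta)

def qos_table(pkt, meta):                        # table #2: QoS classification
    meta = dict(meta, queue=1 if pkt.get("ip_dscp", 0) >= 46 else 0)
    return ("goto", l2_l3_table, meta)

def l2_l3_table(pkt, meta):                      # table #3: L2 vs L3 decision
    if pkt["eth_dst"] == "00:00:5e:00:01:01":    # packet addressed to the "router MAC"
        return ("goto", l3_table, meta)
    return ("action", f"l2-forward queue={meta['queue']}", meta)

def l3_table(pkt, meta):                         # table #4: IP forwarding
    return ("action", f"l3-forward to {pkt['ip_dst']} queue={meta['queue']}", meta)

def pipeline(pkt):
    verdict, target, meta = "goto", acl_table, {}
    while verdict == "goto":
        verdict, target, meta = target(pkt, meta)
    return target                                # final action description

packet = {"eth_dst": "00:00:5e:00:01:01", "ip_dst": "10.1.1.1", "ip_dscp": 46, "tcp_dst": 80}
print(pipeline(packet))                          # l3-forward to 10.1.1.1 queue=1
```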


I started getting more detailed OpenFlow questions after the initial “What is OpenFlow” post, and tried to answer the most common ones in a follow-up post.

WHAT IS OPENFLOW (PART 2)?

Here’s a typical list of questions I’m getting from my readers:

I don’t think OpenFlow is clearly defined yet. Is it a protocol? A model for control plane – forwarding plane interaction? An abstraction of the forwarding plane? An automation technology? Is it a virtualization technology? I don’t think there is consensus on these things yet.

OpenFlow is very well defined. It’s a control plane (controller) – data plane (switch) protocol that allows the control plane to:

- Modify forwarding entries in the data plane;
- Send control protocol (or data) packets through any port of any controlled data-plane device;
- Receive (and process) packets that cannot be handled by the data plane forwarding rules. These packets could be control-plane protocol packets (for example, LLDP) or user data packets that need special processing.

As part of the protocol, OpenFlow defines abstract data plane structures (forwarding table entries) that have to be implemented by OpenFlow-compliant forwarding devices (switches). Is it an abstraction of the forwarding plane? Yes, as far as it defines data structures that can be used in OpenFlow messages to update data plane forwarding structures.


Is it an automation technology? No, but it can be used to automate network deployments. Imagine a cluster of OpenFlow controllers with shared configuration rules that use the packet carrying capabilities of the OpenFlow protocol to discover network topology (using LLDP or a similar protocol), build a shared topology map of the network, and use it to download forwarding entries into the controlled data planes (switches). Such a setup would definitely automate new device provisioning in a large-scale network. Alternatively, you could use OpenFlow to create additional forwarding (actually packet dropping) entries in access switches or wireless access points deployed throughout your network, resulting in a scalable multi-vendor ACL solution.

Is it a virtualization technology? Of course not. However, its data structures can be used to perform MAC address, IP address or MPLS label lookup and push user packets into VLANs (or push additional VLAN tags to implement Q-in-Q) or MPLS-labeled frames, so you can implement most commonly used virtualization techniques (VLANs, Q-in-Q VLANs, L2 MPLS-based VPNs or L3 MPLS-based VPNs) with it.

There’s no reason you couldn’t control a soft switch (embedded in the hypervisor) with OpenFlow. An open-source hypervisor switch implementation (Open vSwitch) that has “many extensions for virtualization” is already available and can be used with Xen/XenServer (it’s the default networking stack in XenServer 6.0) or KVM. Open vSwitch became the de-facto OpenFlow switch reference implementation. It’s used by many hardware and software vendors, including VMware, which uses Open vSwitch in the multi-hypervisor version of NSX.


I’m positive the list of Open vSwitch extensions is hidden somewhere in its somewhat cryptic documentation (or you could try to find them in the source code), but the list of OpenFlow 1.2 proposals implemented by Open vSwitch or sponsored by Nicira should give you some clues:

- IPv6 matching with IPv6 header rewrite;
- Virtual Port Tunnel configuration protocol and GRE/L3 tunnel support;
- Controller master/slave switch. A must for resilient large-scale solutions.

Summary: OpenFlow is like C++. You can use it to implement all sorts of interesting solutions, but it’s just a tool.


OpenFlow can match on almost any field in layer-2 (Ethernet, 802.1Q, PBB, MPLS), layer-3 (IPv4 and IPv6) and layer-4 (TCP and UDP) headers. Here’s an overview covering OpenFlow versions 1.0 through 1.3.

OPENFLOW PACKET MATCHING CAPABILITIES

The original OpenFlow specification (version 1.0) allowed a controller to specify matches on MAC and IPv4 addresses in forwarding entries downloaded to OpenFlow switches. Later versions of the OpenFlow protocol added matching capabilities on almost all fields encountered in typical modern networks, as shown in the following table (see the release notes of the latest OpenFlow specification for more details).

Match condition | Version
Input port | 1.0
Ethernet source and destination MAC addresses | 1.0
Ethernet frame type | 1.0
VLAN tag | 1.0
802.1p value | 1.0
802.1ad (Q-in-Q) VLAN tags | 1.1
Provider Backbone Bridging (PBB – 802.1ah) | 1.3
MPLS tags | 1.1
MPLS bottom-of-stack matching | 1.3

NETWORK LAYER MATCHING
Source and destination IP addresses (with subnet masks) | 1.0
ToS/DSCP bits | 1.0
Layer-4 IP protocol | 1.0
IP addresses in ARP packets | 1.0
IPv6 header fields (addresses, traffic class, higher-level protocols) | 1.2
IPv6 extension headers | 1.3

TRANSPORT LAYER MATCHING
TCP and UDP port numbers | 1.0
SCTP port numbers | 1.1
ICMP type and code fields | 1.0
ICMPv6 support | 1.2

OTHER OPTIONS
Extensible matching (matching on any bit pattern) | 1.2

OpenFlow switches might not support all match conditions specified in the OpenFlow version they support. For example, most data center switches don’t support MPLS or PBB matching. Furthermore, some switches might implement certain matching actions in software. For example, early OpenFlow code for HP Procurve switches implemented layer-3 forwarding in hardware and layer-2 forwarding in software, resulting in significantly reduced forwarding performance.


After matching a packet, an OpenFlow forwarding entry performs a list of actions on the matched packet. This blog post lists actions supported in OpenFlow versions 1.0 through 1.3.

OPENFLOW ACTIONS

Every OpenFlow forwarding entry has two components:

- Flow match specification, which can use any combination of fields listed in the previous table;
- List of actions to be performed on the matched packets.

The initial OpenFlow specification contained the basic actions one needs to implement MAC and IPv4 forwarding, as well as actions one might need to implement NAT or load balancing. Later versions of the OpenFlow protocol added support for MPLS, IPv6 and Provider Backbone Bridging (PBB). OpenFlow switches might not support all actions specified in the OpenFlow version they support; for example, most switches don’t support MAC, IP address or TCP/UDP port number rewrites.

OpenFlow action | Version
Send to output port (or normal processing) | 1.0
Set output queue | 1.1
Process the packet through specified group (example: LAG or fast failover) | 1.1
Drop packet | 1.0
Send input packet to controller | 1.0
Add or remove 802.1q VLAN ID and 802.1p priority | 1.0
Rewrite source or destination MAC address | 1.0
Add or remove 802.1ad (Q-in-Q) tags | 1.1
Provider Backbone Bridging (PBB – 802.1ah) push and pop | 1.3
Push or pop MPLS tags | 1.1

NETWORK LAYER ACTIONS
Rewrite source or destination IP address | 1.0
Rewrite DSCP header | 1.0
Decrement TTL | 1.1

TRANSPORT LAYER ACTIONS
Rewrite TCP or UDP port numbers | 1.0

OTHER OPTIONS
Extensible rewriting (rewriting any bit pattern) | 1.2
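Putting the two tables together, a complete forwarding entry conceptually combines one match with an ordered action list. The following Python sketch is purely illustrative (the dictionary layout and action names are mine, not the OpenFlow wire-format structures):

```python
# Illustrative flow entry combining a match and an ordered action list
# (field and action names are descriptive, not the OpenFlow wire format).

flow_entry = {
    "priority": 100,
    "match": {                       # web traffic arriving on port 3
        "in_port": 3,
        "eth_type": 0x0800,          # IPv4
        "ip_proto": 6,               # TCP
        "tp_dst": 80,
    },
    "actions": [                     # executed in order on every matched packet
        ("set_vlan_id", 200),        # tag the traffic (OpenFlow 1.0 action)
        ("set_queue", 1),            # output queue (OpenFlow 1.1 action)
        ("output", 12),              # send out port 12
    ],
}

def lookup(entry, packet):
    """Return the action list if the packet matches every specified field."""
    if all(packet.get(field) == value for field, value in entry["match"].items()):
        return entry["actions"]
    return None

print(lookup(flow_entry, {"in_port": 3, "eth_type": 0x0800, "ip_proto": 6, "tp_dst": 80}))
```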


The all-or-nothing approach to OpenFlow was quickly replaced with a more realistic view. An OpenFlow-only deployment is potentially viable in dedicated greenfield environments, but even there it’s sometimes better to rely on functionality already available in networking devices instead of reinventing all the features and protocols that were designed, programmed, tested and deployed in the last 20 years. Not surprisingly, the traditional networking vendors quickly moved from an OpenFlow-only approach to a plethora of hybrid solutions.

OPENFLOW DEPLOYMENT MODELS

I hope you never believed the “OpenFlow networking nirvana” hype in which smart open-source programmable controllers control dumb low-cost switches, busting the “networking = mainframes” model and bringing the Linux-like golden age to every network. As the debates during the OpenFlow symposium clearly illustrated, the OpenFlow reality is way more complex than it appears at first glance.

To make it even more interesting, at least four different models for OpenFlow deployment have already emerged:


NATIVE OPENFLOW

The switches are totally dumb; the controller performs all control-plane functions, including running control-plane protocols with the outside world. For example, the controller has to use packet-out messages to send LACP, LLDP and CDP packets to adjacent servers and packet-in messages to process inbound control-plane packets from attached devices.

This model has at least two serious drawbacks even if we ignore the load placed on the controller by periodic control-plane protocols:

- The switches need IP connectivity to the controller for the OpenFlow control session. They can use an out-of-band network (where OpenFlow switches appear as IP hosts), similar to the QFabric architecture. They could also use in-band communication sufficiently isolated from the OpenFlow network to prevent misconfigurations (VLAN 1, for example), in which case they would probably have to run STP (at least in VLAN 1) to prevent bridging loops.
- Fast control loops like BFD are hard to implement with a central controller, more so if you want to have very fast response times.

NEC seems to be using this model quite successfully (although they probably have a few extensions), but has already encountered its inherent limitations: a single controller can control up to ~50 switches, and rerouting around failed links takes around 200 msec (depending on the network size). For more details, watch their Networking Tech Field Day presentation. NEC has since enhanced the scalability of their controller – a single controller cluster can manage over 200 switches.


NATIVE OPENFLOW WITH EXTENSIONS

A switch controlled entirely by the OpenFlow controller could perform some of the low-level control-plane functions independently. For example, it could run LLDP and LACP, and bundle physical links into port channels (link aggregation groups). Likewise, it could perform load balancing across multiple links without involvement of the controller.

OpenFlow got multipathing support in version 1.1. In late 2013 there are only a few commercially available switches supporting OpenFlow 1.3 (vendors decided to skip versions 1.1 and 1.2).

Some controller vendors went down that route and significantly extended OpenFlow 1.1. For example, Nicira has added support for generic pattern matching, IPv6 and load balancing. Needless to say, the moment you start using OpenFlow extensions or functionality implemented locally on the switch, you destroy the mirage of the nirvana described at the beginning of the article – we’re back in the muddy waters of incompatible extensions and hardware compatibility lists. The specter of Fibre Channel looms large.

SHIPS IN THE NIGHT

Switches have a traditional control plane; the OpenFlow controller manages only certain ports or VLANs on trunked links. The local control plane (or linecards) can perform the tedious periodic tasks like running LACP, LLDP and BFD, passing only the link status to the OpenFlow controller. The controller-to-switch communication problem is also solved: the TCP session between them traverses the non-OpenFlow part of the network.

This approach is commonly used in academic environments where OpenFlow is running in parallel with the production network. It’s also one of the viable pilot deployment models.

INTEGRATED OPENFLOW

OpenFlow classifiers and forwarding entries are integrated with the traditional control plane. For example, Juniper’s OpenFlow implementation inserts compatible flow entries (those that contain only destination IP address matching) as ephemeral static routes into the RIB (Routing Information Base). OpenFlow-configured static routes can also be redistributed into other routing protocols.

Figure 3-2: Integrated OpenFlow (source: Juniper's presentation @ OpenFlow Symposium)


Going a step further, Juniper’s OpenFlow model presents routing tables (including VRFs) as virtual interfaces to the OpenFlow controller (or so it was explained to me). It’s thus possible to use OpenFlow on the network edge (on user-facing ports), and combine the flexibility it offers with traditional routing and forwarding mechanisms. From my perspective, this approach makes most sense: don’t rip-and-replace the existing network with a totally new control plane, but augment the existing well-known mechanisms with functionality that’s currently hard (or impossible) to implement. You’ll obviously lose the vague promised benefits of Software Defined Networking, but I guess that the ability to retain field-proven mechanisms while adding customized functionality and new SDN applications more than outweighs that.


An OpenFlow network can emulate any network behavior supported by its components (hardware or virtual switches), from hop-by-hop forwarding to path-based forwarding paradigms.

FORWARDING MODELS IN OPENFLOW NETWORKS

A few days ago Tom (@NetworkingNerd) Hollingsworth asked a seemingly simple question: “OpenFlow programs hop-by-hop packet forwarding, right? No tunnels?” and wasn’t satisfied with my standard answer, so here’s a longer explanation.

Before we get started, keep in mind OpenFlow is just a tool that one can use (or not) in numerous environments. Tom’s question is (almost) equivalent to “C programs use string functions, right?” Some do, some don’t, depends on what you’re trying to do.

POINT OPENFLOW DEPLOYMENTS

Sometimes you can solve your problem by using OpenFlow on individual (uncoupled) devices. Typical use cases:

- Edge security policy – authenticate users (or VMs) and deploy per-user ACLs before connecting a user to the network (example: IPv6 first-hop security);
- Programmable SPAN ports – use OpenFlow entries on a single switch to mirror selected traffic to a SPAN port;
- DoS traffic blackholing – use OpenFlow to block DoS traffic as close to the source as possible, using N-tuples for more selective traffic targeting than the more traditional RTBH approach (see the sketch below);
- Traffic redirection – use OpenFlow to redirect an interesting subset of traffic to a network services appliance (example: IDS).

Using OpenFlow on one or more isolated devices is simple (no interaction with adjacent devices) and linearly scalable – you can add more devices and controllers as needed because there’s no tight coupling anywhere in the system.
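To illustrate the DoS blackholing use case from the list above, here’s a hedged Python sketch of the drop entries a controller might install on an edge switch (a generic representation with placeholder addresses and ports; a real controller would encode these as OpenFlow flow-mod messages):

```python
# Illustrative drop entries for the DoS blackholing use case: match a full
# N-tuple (instead of just a destination prefix, as RTBH would) and drop the
# traffic on the edge port it arrives on. Addresses and ports are placeholders.

def blackhole_entry(in_port, src_net, dst_ip, dst_port):
    return {
        "priority": 1000,            # above the normal forwarding entries
        "match": {
            "in_port": in_port,
            "eth_type": 0x0800,      # IPv4
            "ipv4_src": src_net,     # attacking prefix (placeholder)
            "ipv4_dst": dst_ip,      # victim address (placeholder)
            "ip_proto": 17,          # UDP - e.g. reflection attack traffic
            "udp_dst": dst_port,
        },
        "actions": [],               # an empty action list means "drop" in OpenFlow
    }

# One entry per edge port facing the attack traffic
for entry in [blackhole_entry(port, "198.51.100.0/24", "192.0.2.10", 53) for port in (1, 2, 7)]:
    print(entry)
```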

FABRIC OPENFLOW DEPLOYMENTS
Most OpenFlow products being developed these days try to solve the OpenFlow fabric use case (because existing data center fabrics and the Internet clearly don’t work, right?). In these scenarios the OpenFlow controller manages all the switches in the forwarding path and has to install forwarding entries on every one of them. Not surprisingly, developers of these products took different approaches based on their understanding of networking challenges and the limitations of OpenFlow devices.

Some solutions (example: VMware NSX) bypass the complexities of fabric forwarding by establishing end-to-end something-over-IP tunnels, effectively reducing the fabric to a single hop.

Path-based forwarding. Install end-to-end path forwarding entries into the fabric and assign user traffic to paths at the edge nodes (aka Edge and Core OpenFlow). Bonus points if you’re smart enough to pre-compute and install backup paths. If this looks like a description of MPLS LSPs, FECs and FRR, you’re spot on. There are only so many ways you can solve a problem in a scalable way.

The dirty details of path-based forwarding vary based on the hardware capabilities of the switches you use and your programming preferences. Using MPLS or PBB would be the cleanest option – those packet formats are well understood by network troubleshooting tools, so an unlucky engineer trying to fix a problem in an OpenFlow-based fabric would have a fighting chance.

Unfortunately you won’t see much PBB or MPLS in OpenFlow products any time soon – they require OpenFlow 1.3 (or vendor extensions) and hardware support that’s often lacking in switches used for OpenFlow forwarding these days. OpenFlow controller developers are trying to bypass those problems with creative uses of packet headers (VLAN or MAC rewrite comes to mind), making a troubleshooter’s job much more interesting.

Hop-by-hop forwarding. Install flow-matching N-tuples in every switch along the path. The result is an architecture that works great in PowerPoint and lab tests, but breaks down in anything remotely similar to a production network due to scalability problems, primarily FIB update challenges.

If an OpenFlow controller using the hop-by-hop forwarding paradigm implements proactive flow installation (install N-tuples based on configuration and topology), it just might work in small deployments. If it uses reactive flow installation (punt new flows to the controller, install microflow entries on every hop for each new flow), it deserves a nomination for a Darwin Award.
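Here is what the edge/core split of path-based forwarding could look like in practice – a hedged sketch using Ryu and OpenFlow 1.3 MPLS actions (assuming the hardware actually supports them, which, as noted above, is far from guaranteed). The label value, ports, and function names are made up for illustration.

```python
# Sketch of the path-based forwarding idea with OpenFlow 1.3 and Ryu: the edge
# switch classifies traffic onto a pre-computed path by pushing an MPLS label;
# core switches only need one label-match entry per path. Label 100 and the
# output ports are invented values. dst_prefix is a Ryu masked match value,
# e.g. ('10.1.0.0', '255.255.0.0').
def install_edge_path_entry(dp, dst_prefix, label, out_port):
    parser = dp.ofproto_parser
    ofp = dp.ofproto

    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=dst_prefix)
    actions = [
        parser.OFPActionPushMpls(ethertype=0x8847),   # add an MPLS shim header
        parser.OFPActionSetField(mpls_label=label),   # select the pre-computed path
        parser.OFPActionOutput(out_port),
    ]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                  match=match, instructions=inst))


def install_core_path_entry(dp, label, out_port):
    parser = dp.ofproto_parser
    ofp = dp.ofproto

    # Core switches never look at the customer payload - just the path label
    match = parser.OFPMatch(eth_type=0x8847, mpls_label=label)
    inst = [parser.OFPInstructionActions(
        ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionOutput(out_port)])]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                  match=match, instructions=inst))
```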

WHY DOES IT MATTER?
Would you buy a core router that only supports RIPv1? Would you use a solution that uses PBR instead of routing protocols? Would you use NetFlow-based forwarding with flows being instantiated by a central router (remember Multi-Layer Switching on Cat5000)? Probably not – we’ve learned the hard way which protocols and architectures work and which ones don’t.

OpenFlow is an emerging technology, and you’ll stumble upon numerous vendors (from startups to major brand names) selling you OpenFlow-based solutions (and pixie dust). It’s important to understand how these solutions work behind the scenes when evaluating them. Everything will work great in your 2-node proof-of-concept lab, but you might encounter severe scalability limitations in a real-life deployment.

Networking engineers’ reactions to OpenFlow were easy to predict – from “this will never work” to “here’s how I can solve my problem with OpenFlow.” It turns out we can solve many problems without involving OpenFlow; the traditional networking protocols are often good enough.

YOU DON’T NEED OPENFLOW TO SOLVE EVERY AGE-OLD PROBLEM
Two great blog posts appeared almost simultaneously: the evergreen Fallacies of Distributed Computing from Bob Plankers and the forward-looking Understanding Hadoop Clusters and the Network from Brad Hedlund. Read them both before continuing (they are both great reads) and try to figure out why I’m mentioning them in the same sentence (no, it’s not the fact that Hadoop uses distributed computing).

OK, here’s the quote that ties them together. While describing rack awareness Brad wrote:

What is NOT cool about Rack Awareness at this point is the manual work required to define it the first time, continually update it, and keep the information accurate. If the rack switch could auto-magically provide the Name Node with the list of Data Nodes it has, that would be cool. Or vice versa, if the Data Nodes could auto-magically tell the Name Node what switch they’re connected to, that would be cool too. Even more interesting would be a OpenFlow network, where the Name Node could query the OpenFlow controller about a Node’s location in the topology.

The “only” problem with Brad’s reasoning is that we already have the tools to do exactly what he’s looking for. The magic acronym is LLDP (802.1AB). LLDP was standardized years ago and is available on numerous platforms, including Catalyst and Nexus switches and the Linux operating system (for example, lldpad is part of the standard Fedora distribution). Not to mention that every DCB-compliant switch must support LLDP, as the DCBX protocol uses LLDP to advertise DCB settings between adjacent nodes.

The LLDP MIB is standard and allows anyone with SNMP read access to discover the exact local LAN topology – the connected port names, adjacent nodes (and their names), and their management addresses (IPv4 or IPv6). The management addresses that should be present in LLDP advertisements can then be used to expand the topology discovery beyond the initial set of nodes (assuming your switches do include them in LLDP advertisements; for example, NX-OS does but Force10 doesn't). Building the exact network topology from the LLDP MIB is a trivial exercise, and even a somewhat reasonable API is available (yeah, having an API returning a network topology graph would be even cooler). Mapping the Hadoop Data Nodes to ToR switches and Name Nodes can thus be done on existing gear using existing protocols.

Would OpenFlow bring anything to the table? Actually not – it also needs packets exchanged between adjacent devices to discover the topology, and the easiest thing for OpenFlow controllers to use is ... ta-da ... LLDP ... oops, OFDP, because LLDP just wasn’t good enough. The “only” difference is that in the traditional network the devices would send LLDP packets themselves, whereas in the OpenFlow world the controller would use Packet-Out messages of the OpenFlow control session to send LLDP packets from individual controlled devices and wait for Packet-In messages from other devices to discover which device received them.
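A rough sketch of that "trivial exercise", assuming pysnmp, SNMPv2c read access with a placeholder community string, and the standard LLDP-MIB OID for lldpRemSysName; a real tool would also read the remote port IDs and management addresses and recurse through the discovered neighbors.

```python
# Minimal sketch of LLDP-based neighbor discovery over SNMP (pysnmp assumed
# installed; the "public" community string is a placeholder). It walks the
# lldpRemSysName column of the standard LLDP-MIB and returns the names of the
# directly attached devices. The OID index also encodes the local port number
# if you need the port-to-neighbor mapping.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, nextCmd)

LLDP_REM_SYS_NAME = '1.0.8802.1.1.2.1.4.1.1.9'

def lldp_neighbors(switch, community='public'):
    names = []
    for (err, err_status, _, var_binds) in nextCmd(
            SnmpEngine(), CommunityData(community),
            UdpTransportTarget((switch, 161)), ContextData(),
            ObjectType(ObjectIdentity(LLDP_REM_SYS_NAME)),
            lexicographicMode=False):          # stay within the lldpRemSysName subtree
        if err or err_status:
            break
        for name, value in var_binds:
            names.append(str(value))
    return names

if __name__ == '__main__':
    # Hypothetical seed switch; repeat the walk on discovered neighbors to map the fabric
    print(lldp_neighbors('tor-switch-1.example.com'))
```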

The Linux configuration wouldn’t change much: if you want the switches to see the hosts, you still have to run an LLDP (or OFDP or whatever you call it) daemon on the hosts. Last but definitely not least, you could use the well-defined SNMP protocol, with a number of readily available Linux or Windows libraries, to read the LLDP results from the SNMP MIB in the “old world” devices.

I’m still waiting to see the high-level SDN/OpenFlow API; everything I’ve seen so far are OpenFlow virtualization attempts (multiple controllers accessing the same devices) and discussions indicating a standard API isn’t necessarily a good idea. Really? Haven’t you learned anything from the database world?

So, why did I mention the two posts at the beginning of this article? Because Bob pointed out that “those who cannot remember the past are condemned to repeat it.” At the moment, OpenFlow seems to fit the bill perfectly.

We’re now coming to the “skeptic” part of this chapter. Let’s start with an easy observation: ideas similar to OpenFlow were floated in the 1990s (and failed miserably).

OPENFLOW AND IPSILON: NOTHING NEW UNDER THE SUN
Several companies were trying to solve the IP+ATM integration problem in the mid-nineties, most of them using IP-based architectures (Cisco, IBM, 3Com), while Ipsilon tried its luck with a flow-based solution. I found a great overview of IP+ATM solutions in an article published on the University of Washington web site. This is what the article has to say about Ipsilon’s approach (and if you really want to know the details, read GSMP (RFC 1987) and Ipsilon Flow Management Protocol (RFC 1953)):

An IP switch controller routes like an ordinary router, forwarding packets on a default VC. However, it also performs flow classification for traffic optimization.

Replace IP switch controller with OpenFlow controller and default VC with switch-to-controller OpenFlow session.

Once a flow is identified, the IP switch sets up a cut-through connection by first establishing a VC for subsequent flow traffic, and then by asking the upstream node to use this VC.

Likewise, some people propose downloading 5-tuples or 12-tuples into all the switches along the flow path. The only difference is that 15 years ago engineers understood that virtual circuit labels use fewer resources than 5-to-12-tuple policy-based routing.

As expected, Ipsilon’s approach had a few scaling issues. From the same article:

The bulk of the criticism, however, relates to Ipsilon's use of virtual circuits. Flows are associated with application-to-application conversations and each flow gets its very own VC. Large environments like the Internet with millions of individual flows would exhaust VC tables.

Not surprisingly, a number of people (myself included) who still remember a bit of networking history are making the exact same argument about the use of microflows in OpenFlow environments ... but it seems RFC 1925 (section 2.11) will yet again carry the day.

An hour after publishing this blog post, I realized (reading an article by W.R. Koss) that Ed Crabbe mentioned Ipsilon as the first attempt at SDN during his OpenFlow Symposium presentation.

Continuing the skeptic streak: do you really expect to get a network operating system just because you have a protocol that allows you to download forwarding tables into a switch? The blog post was written in 2011, when the shortcomings of OpenFlow weren’t that well understood. Three years later (August 2014), all we have is a single production-grade commercial controller (NEC ProgrammableFlow).

OPENFLOW: BIOS DOES NOT A SERVER MAKE
Greg (@etherealmind) Ferro invited me to a Packet Pushers podcast discussing OpenFlow with Matt Davey (then working at Indiana University). I was pleasantly surprised by Matt’s realistic attitude (you should really listen to the whole podcast); it was nice to hear that they’re running a countrywide pilot with OpenFlow-enabled switches deployed at several universities, and some of the applications he mentioned (for example, the capability to download ACLs into the switch from your customized application) definitely tickled my inner geek.

However, I’m even more convinced that the brouhaha surrounding the Open Networking Foundation has little grounds in the realities of OpenFlow. Remember: OpenFlow is a protocol allowing controlling software to download forwarding table entries into one or more switches (which can be L2, L3 or LSR switches). Any OpenFlow-based solution requires two components: the switching hardware with OpenFlow-capable firmware and the controlling software using the OpenFlow protocol.

The OpenFlow protocol will definitely enable many copycat vendors to buy merchant silicon, put it together and start selling their products with little investment in R&D (like the PC motherboard manufacturers are doing today). I am also positive the silicon manufacturers (like Broadcom) will
have “How to Build an OpenFlow Switch with Our Chipset” application notes available as soon as they find OpenFlow commercially viable. Hopefully we’ll see another Dell (or HP) emerge, producing low-cost reasonable-quality products in the low-end to mid-range market ... but all these switches will still need networking software controlling them.

If you’re old enough to remember the original PCs from IBM, you’ll easily recognize the parallels. IBM documented the PC hardware architecture and BIOS API (you even got the BIOS source code), allowing numerous third-party vendors to build adapter cards (and later PC clones), but all those machines had to run an operating system ... and most of them used MS-DOS (and later Windows). Almost three decades later, the vast majority of PCs still run on Microsoft’s operating systems.

Some people think that the potential adoption of the OpenFlow protocol will magically materialize open-source software to control the OpenFlow switches, breaking the bonds of proprietary networking solutions. In reality, the companies that invested heavily in networking software (Cisco, Juniper, HP and a few others) might be the big winners ... if they figure out fast enough that they should morph into software-focused companies. Cisco has clearly realized the winds are changing and started talking about the inclusion of OpenFlow in the NX-OS operating system. I would bet their first OpenFlow implementation won’t be an OpenFlow-enabled Nexus switch.

Moving a bit further, you cannot program a controller unless it has a well-defined API you can use (the northbound API). More than two years after the creation of the Open Networking Foundation, we still don’t have a specification (not even a public draft), and every controller vendor uses a different API. The situation might improve with the release of Open Daylight, an open-source OpenFlow controller that will (if it becomes widely used) set a de-facto standard.

SDN CONTROLLER NORTHBOUND API IS THE CRUCIAL MISSING PIECE
Imagine you’d like to write a simple Perl (or Python, Ruby, JavaScript – you get the idea) script to automate a burdensome function on your server (or router/switch from any vendor running Linux/BSD behind the scenes) that the vendor never bothered to implement. The script interpreter relies on numerous APIs being available from the operating system – from the process API (to load and start the interpreter) to the file system API, console I/O API, memory management API, and probably a few others. Now imagine none of those APIs were standardized (the various mutually incompatible dialects of Tcl used by Cisco IOS come to mind) – that’s the situation we’re facing in the SDN land today.

If we accept the analogy of OpenFlow being the x86 instruction set (it’s actually more like the p-code machine from UCSD Pascal days, but let’s not go there today), and all we want to do is to write a simple script that will (for example) redirect the backup-to-tape traffic to a secondary path during peak hours, we need a standard API to get the network topology, create a path across the network,
and create an ingress Forwarding Equivalence Class (FEC) to map the backup traffic to that path. In short, we need what’s called the SDN Controller Northbound API.
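Purely as an illustration of what such a script could look like if a standard northbound API existed, here is a hypothetical sketch – every endpoint, field and object name below is invented, precisely because no such standard exists.

```python
# Purely hypothetical sketch of a script against an imagined northbound API -
# no such API exists; the controller URL, resource names and payload fields
# are invented for illustration only.
import requests

CONTROLLER = 'https://sdn-controller.example.com/api/v1'   # hypothetical endpoint

def redirect_backup_traffic(src_subnet, dst_subnet, avoid_link):
    # 1. Get the network topology from the controller
    topology = requests.get(f'{CONTROLLER}/topology').json()

    # 2. Ask the controller to compute and install a path that avoids the busy link
    path = requests.post(f'{CONTROLLER}/paths', json={
        'from': topology['edges'][src_subnet],
        'to': topology['edges'][dst_subnet],
        'exclude-links': [avoid_link],
    }).json()

    # 3. Map the backup traffic (an ingress FEC) onto that path at the network edge
    requests.post(f'{CONTROLLER}/fecs', json={
        'match': {'ipv4_src': src_subnet, 'ipv4_dst': dst_subnet, 'tcp_dst': 5555},
        'path-id': path['id'],
    })
```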

THERE IS NO STANDARD NORTHBOUND API
I have some bad news for you: nobody is working on standardizing such an API (read a great summary by Brad Casemore, and make sure to read all the articles he linked to).

Are you old enough to remember the video games for the early IBM PC? None of them used MS-DOS. They were embedded software solutions that you had to boot off a floppy disk (remember those?), and then they took over all the hardware you had. That’s exactly what we have in the SDN land today.

Don’t try to tell me I’ve missed Flowvisor – an OpenFlow controller that allocates slices of actual hardware to individual OpenFlow controllers. I haven’t; but using Flowvisor to solve this problem is like using Xen (or KVM or ESXi) to boot multiple embedded video games in separate VMs. Not highly useful for a regular guy trying to steer some traffic around the network (or fix any one of the other small things that bother us), is it?

Also, don’t tell me each SDN controller has an API. While NEC and startups like Big Switch Networks are creating something akin to a network operating system that we could use to program our network (no, I really don’t want to deal with topology discovery and fast failover myself), and each one of them has an API, no two APIs are even remotely similar. I still remember the days when there were at least a dozen operating systems running on top of the 8088 processor, and it was mission impossible to write a meaningful application that would run on more than a few of them without major porting efforts.

LET’S SPECULATE
There might be several good reasons for the current state of affairs:

- The only people truly interested in OpenFlow are the Googles of the world (Nicira is using OpenFlow purely as an information transfer tool to get MAC-to-IP mappings into their vSwitches);
- Developers figure out all sorts of excellent reasons why their dynamic and creative work couldn’t possibly be hammered into the tight confines of a standard API;
- Nobody is interested in creating a Linux-like solution; everyone is striving to achieve the maximum possible vendor lock-in;
- We still don’t know what we’re looking for.

The reality is probably a random mixture of all four (and a few others), but that doesn’t change the basic facts: until there’s a somewhat standard and stable API (like SQL-86) that I could use with SDN controllers from multiple vendors, I’m better off using Cisco ONE or the Junos XML API; otherwise I’m just trading one lock-in for another (as ecstatic users of umbrella network management systems would be more than happy to tell you). On the other hand, if I stick with Cisco or Juniper (and implement a simple abstraction layer in my application to work with both APIs), at least I can be pretty positive they’ll still be around in a year or two.

When you have a hammer, every problem seems like a nail. Nicira and later Open Daylight tried to implement network virtualization with OpenFlow. As it turns out, they might have used the wrong tool.

IS OPENFLOW THE BEST TOOL FOR OVERLAY VIRTUAL NETWORKS?
Overlay virtual networks were the first commercial-grade OpenFlow use case – Nicira’s Network Virtualization Platform (NVP – now VMware NSX for Multiple Hypervisors) used OpenFlow to program the hypervisor virtual switches (Open vSwitches – OVS). OpenStack is using the same approach in its OVS Neutron plugin, and it seems Open Daylight aims to reinvent that same wheel, replacing the OVS plugin (an agent running on the hypervisor host) with a central controller.

Does that mean one should use OpenFlow to implement overlay virtual networks? Not really – OpenFlow is not exactly the best tool for the job.

EASY START: ISOLATED LAYER-2 OVERLAY NETWORKS
Most OVS-based solutions (VMware NSX for Multiple Hypervisors, OpenStack …) use OpenFlow to program forwarding entries in hypervisor virtual switches. In an isolated layer-2 overlay virtual network OpenFlow isn’t such a bad fit – after all, the hypervisor virtual switches need nothing more than a mapping between VM MAC addresses and hypervisor transport IP addresses, and that information is readily available in the cloud orchestration system.

The OpenFlow controller can thus proactively download the forwarding information to the switches, and stay out of the forwarding path, ensuring reasonable scalability. BTW, even this picture isn’t all rosy – Nicira had to implement virtual tunnels to work around the OpenFlow point-to-point interface model.
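As a rough illustration of that proactive model, here is a sketch that pushes per-VM forwarding entries into OVS using plain ovs-ofctl flow syntax instead of an OpenFlow controller session – the bridge name, VXLAN port number, VNI and MAC-to-hypervisor mappings are invented, and the assumption is a VXLAN port configured with remote_ip=flow.

```python
# Sketch of proactive flow installation for an isolated L2 overlay segment:
# the controller already knows every VM MAC address and the IP address of the
# hypervisor hosting it (from the cloud orchestration system), so it can push
# one unicast entry per remote VM. Assumes an OVS bridge "br-int" with a VXLAN
# port (ofport 10) configured with options:remote_ip=flow; all values invented.
import subprocess

VXLAN_OFPORT = 10
VNI = 5001

# MAC-to-hypervisor mapping as it would come from the orchestration system
vm_location = {
    'fa:16:3e:00:00:01': '172.16.1.11',
    'fa:16:3e:00:00:02': '172.16.1.12',
}

def install_overlay_entries(bridge='br-int'):
    for vm_mac, hypervisor_ip in vm_location.items():
        flow = (f'table=0,priority=10,dl_dst={vm_mac},'
                f'actions=set_field:{VNI}->tun_id,'
                f'set_field:{hypervisor_ip}->tun_dst,'
                f'output:{VXLAN_OFPORT}')
        subprocess.run(['ovs-ofctl', 'add-flow', bridge, flow], check=True)
```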

THE FIRST GLITCHES: LAYER-2 GATEWAYS
Adding layer-2 gateways to overlay virtual networks reveals the first shortcomings of OpenFlow. Once the layer-2 environment stops being completely deterministic (layer-2 gateways introduce the need for dynamic MAC learning), the solution architects have only a few choices:

- Perform dynamic MAC learning in the OpenFlow controller – all frames with unknown source MAC addresses are punted to the controller, which builds the dynamic MAC address table and downloads the modified forwarding information to all switches participating in a layer-2 segment. This is the approach used by NEC’s ProgrammableFlow solution. Drawback: the controller gets involved in the data plane, which limits the scalability of the solution.
- Offload dynamic MAC learning to specialized service nodes, which serve as an intermediary between the predictable static world of virtual switching and the dynamic world of VLANs. It seems NVP used this approach in one of its early releases. Drawback: the service nodes become an obvious chokepoint, and the additional hop through a service node increases latency.
- Give up, half-ditch OpenFlow, and either implement dynamic MAC learning in the virtual switches in parallel with OpenFlow, or report dynamic MAC addresses to the controller using a non-OpenFlow protocol (to avoid data path punting to the controller). It seems recent versions of VMware NSX use this approach.

THE KILLER: DISTRIBUTED LAYER-3 FORWARDING
Every layer-2 overlay virtual networking solution must eventually support distributed layer-3 forwarding (the customers that matter usually want that for one reason or another). Regardless of how you implement the distributed forwarding, hypervisor switches need ARP entries (see this blog post for more details), and have to reply to ARP queries from the virtual machines. Even without the ARP proxy functionality, someone has to reply to the ARP queries for the default gateway IP address.

ARP is a nasty beast in an OpenFlow world – it’s a control-plane protocol and thus not implementable in pure OpenFlow switches. The implementers have (yet again) two choices:

- Punt the ARP packets to the controller, which yet again places the OpenFlow controller in the forwarding path (and limits its scalability) – see the sketch after this list;
- Solve layer-3 forwarding with a different tool (the approach used by VMware NSX and by distributed layer-3 forwarding in OpenStack Icehouse).
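Here is a minimal sketch of the first option: punting ARP requests for the default gateway to a Ryu-based controller and answering them there. The gateway IP/MAC values are placeholders, and a real implementation would obviously need rate limiting and per-tenant tables.

```python
# Minimal sketch: answer ARP requests for the default gateway in the controller
# (Ryu, OpenFlow 1.3 assumed). GW_IP/GW_MAC are invented placeholder values.
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import packet, ethernet, arp, ether_types

GW_IP = '10.1.1.1'
GW_MAC = '02:00:de:ad:be:ef'

class ArpProxyMixin(object):
    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg, dp = ev.msg, ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        in_port = msg.match['in_port']

        pkt = packet.Packet(msg.data)
        req = pkt.get_protocol(arp.arp)
        if not req or req.opcode != arp.ARP_REQUEST or req.dst_ip != GW_IP:
            return                      # not an ARP request for the gateway

        # Build the ARP reply on behalf of the (distributed) default gateway
        reply = packet.Packet()
        reply.add_protocol(ethernet.ethernet(dst=req.src_mac, src=GW_MAC,
                                             ethertype=ether_types.ETH_TYPE_ARP))
        reply.add_protocol(arp.arp(opcode=arp.ARP_REPLY,
                                   src_mac=GW_MAC, src_ip=GW_IP,
                                   dst_mac=req.src_mac, dst_ip=req.src_ip))
        reply.serialize()

        # Send the reply back out the port the request came in on
        dp.send_msg(parser.OFPPacketOut(
            datapath=dp, buffer_id=ofp.OFP_NO_BUFFER,
            in_port=ofp.OFPP_CONTROLLER,
            actions=[parser.OFPActionOutput(in_port)], data=reply.data))
```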

DO WE REALLY NEED OPENFLOW?
With all the challenges listed above, does it make sense to use OpenFlow to control overlay virtual networks? Not really. OpenFlow is like a Swiss Army knife (or a duck) – it can solve many problems, but is not ideal for any one of them.

Instead of continuously adjusting the tool to make it fit for the job, let’s step back a bit and ask another question: what information do we really need to implement layer-2 and layer-3 forwarding in an overlay virtual network? All we need are three simple lookup tables that can be installed via any API mechanism of your choice (Hyper-V uses PowerShell):

- IP forwarding table;
- ARP table;
- VM MAC-to-underlay IP table.

Some implementations would have a separate connected-interfaces table; other implementations would merge that with the forwarding table. There are also implementations merging the ARP and IP forwarding tables.

These three tables, combined with local layer-2 and layer-3 forwarding, are all you need – see the sketch below. Wouldn’t it be better to keep things simple instead of introducing yet another less-than-perfect abstraction layer?
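A minimal sketch of those three tables and the lookup chain a hypervisor forwarder would run through – the class, the helper method and the sample entries are invented for illustration; any real implementation would obviously use the hypervisor’s native constructs.

```python
# Sketch of the three lookup tables described above and how a hypervisor
# forwarder could chain them for an outgoing packet. Pure illustration -
# table names, helper functions and sample entries are invented.
import ipaddress
from dataclasses import dataclass

@dataclass
class OverlayTables:
    ip_forwarding: dict    # destination prefix  -> next-hop IP (or "connected")
    arp: dict              # next-hop / VM IP    -> VM MAC address
    mac_to_underlay: dict  # VM MAC address      -> hypervisor (VTEP) IP

    def forward(self, dst_ip):
        """Return (inner destination MAC, underlay destination IP) for dst_ip."""
        # 1. IP forwarding table: longest-prefix match to find the next hop
        prefixes = sorted(self.ip_forwarding, key=lambda p: p.prefixlen, reverse=True)
        next_hop = next(self.ip_forwarding[p] for p in prefixes
                        if ipaddress.ip_address(dst_ip) in p)
        if next_hop == 'connected':
            next_hop = dst_ip
        # 2. ARP table: next-hop IP -> VM MAC
        vm_mac = self.arp[next_hop]
        # 3. MAC-to-underlay table: VM MAC -> hypervisor transport IP
        return vm_mac, self.mac_to_underlay[vm_mac]

tables = OverlayTables(
    ip_forwarding={ipaddress.ip_network('10.1.1.0/24'): 'connected'},
    arp={'10.1.1.20': 'fa:16:3e:00:00:02'},
    mac_to_underlay={'fa:16:3e:00:00:02': '172.16.1.12'},
)
print(tables.forward('10.1.1.20'))   # ('fa:16:3e:00:00:02', '172.16.1.12')
```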

The blog post explaining how OpenFlow doesn’t fit the needs of overlay virtual networks triggered a flurry of questions along the lines of “do you think there’s no need for OpenFlow?” Here’s the response:

IS OPENFLOW USEFUL?
OpenFlow is just a tool that allows you to install PBR-like forwarding entries into networking devices using a standard protocol that should work across multiple vendors (more about that in another blog post). From this perspective OpenFlow offers the same functionality as BGP FlowSpec or ForCES, and a major advantage: it’s already implemented in networking gear from numerous vendors.

Where could you use PBR-like functionality? I’m positive you already have a dozen ideas with various levels of craziness; here are a few more:

- Network monitoring (flow entries have counters);
- Intelligent SPAN ports that collect only the traffic you’re interested in;
- Transparent service insertion;
- Scale-out stateful network services;
- Distributed DoS prevention;
- Policy enforcement (read: ACLs) at the network edge.

OpenFlow has another advantage over BGP FlowSpec – it has the packet-in and packet-out functionality that allows the controller to communicate with devices outside of the OpenFlow network. You could use this functionality to implement new control-plane protocols or (for example) an interesting layered authentication scheme that is not available in off-the-shelf switches.
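Picking up the first item on the list above (flow entries have counters), here is a hedged sketch of how a Ryu-based application could poll those counters with flow-stats requests; the polling interval and log format are arbitrary, and the mixin assumes it is combined with a regular RyuApp.

```python
# Sketch of the "network monitoring" idea: OpenFlow flow entries carry
# packet/byte counters, so a controller can poll them instead of deploying a
# separate probe. Ryu and OpenFlow 1.3 assumed; intended to be mixed into a
# RyuApp (which provides self.logger).
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib import hub


class FlowCounterMixin(object):
    def start_polling(self, datapath, interval=10):
        def _poll():
            while True:
                parser = datapath.ofproto_parser
                datapath.send_msg(parser.OFPFlowStatsRequest(datapath))
                hub.sleep(interval)
        hub.spawn(_poll)

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def flow_stats_reply_handler(self, ev):
        # Each entry carries the match fields plus packet and byte counters
        for stat in ev.msg.body:
            self.logger.info('match=%s packets=%d bytes=%d',
                             stat.match, stat.packet_count, stat.byte_count)
```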

Summary: OpenFlow is a great low-level tool that can help you implement numerous interesting ideas, but I wouldn’t spend my time reinventing the switching fabric wheel (or other things we already do well).

4 OPENFLOW IMPLEMENTATION NOTES

It’s easy to say “OpenFlow allows you to separate the forwarding and control planes, and control multiple devices from a single controller”, but how do you implement the control plane? How does the control plane interact with the outside world? How do you implement legacy protocols in an OpenFlow controller… and do you have to implement them? You’ll get answers to all these questions in this chapter.

Can you build an OpenFlow-based network with existing hardware? Is it possible to build a multi-vendor network? These questions are answered in the second half of the chapter, which focuses on vendor-specific implementation details.

MORE INFORMATION
You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

- Start with the SDN, OpenFlow and NFV Resources page;
- Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
- Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
- The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
- Finally, I’m always available for short online or on-site consulting engagements.

IN THIS CHAPTER:
- CONTROL PLANE IN OPENFLOW NETWORKS
- IS OPEN VSWITCH CONTROL PLANE IN-BAND OR OUT-OF-BAND?
- IMPLEMENTING CONTROL-PLANE PROTOCOLS WITH OPENFLOW
- LEGACY PROTOCOLS IN OPENFLOW-BASED NETWORKS
- OPENFLOW 1.1 IN HARDWARE: I WAS WRONG
- OPTIMIZING OPENFLOW HARDWARE TABLES
- OPENFLOW SUPPORT IN DATA CENTER SWITCHES
- MULTI-VENDOR OPENFLOW – MYTH OR REALITY?
- HYBRID OPENFLOW, THE BROCADE WAY
- OPEN DAYLIGHT – INTERNET EXPLORER OR LINUX OF THE SDN WORLD?

How do you build a control plane network in a distributed controller-based system? How does the controller communicate with the devices it controls? Should it use in-band or out-of-band communication? This blog post, written in late 2013, tries to provide some answers.

CONTROL PLANE IN OPENFLOW NETWORKS
It’s easy to say “SDN is the physical separation of the network control plane from the forwarding plane, and where a control plane controls several devices,” handwave over the details, and let someone else figure them out. Implementing that concept in a reliable manner is a totally different undertaking.

OPENFLOW CONTROL PLANE 101
In an OpenFlow-based network architecture, the controller (or a cluster of redundant controllers) implements control-plane functionality: discovering the network topology and external endpoints (or adjacent network devices), computing the forwarding entries that have to be installed into individual network devices, and downloading them into the controlled network devices using the OpenFlow protocol.

Figure 4-1: OpenFlow control plane 101

OpenFlow is an application-level protocol running on top of TCP (and optionally TLS) – the controller and controlled device are IP hosts using IP connectivity services of some unspecified control plane network. Does that bring back fond memories of SDH/SONET days? It should.

WHO WILL BUILD THE CONTROL PLANE NETWORK?
The answer you’ll get from academic-minded OpenFlow zealots is likely “that’s out of scope, let’s focus on the magic new stuff the separation of control and data planes brings you.” Pretty useless, right? We need a more detailed answer before we start building OpenFlow-based solutions. As always, history is our best teacher: similar architectures commonly used out-of-band control-plane networks.

OUT-OF-BAND OPENFLOW CONTROL PLANE
The easiest (but not necessarily the most cost-effective) approach to an OpenFlow control plane network is to build a separate network connecting the management ports of the OpenFlow switches with the OpenFlow controller. NEC is using this approach in their ProgrammableFlow solution, as is Juniper in its QFabric architecture. There’s something slightly ironic in this approach: you have to build a traditional L2 or L3 network to control the new gear.

Figure 4-2: Out-of-band OpenFlow control plane network

You could (in theory) build another OpenFlow-controlled network to implement the control-plane network you need, but you’d quickly end up with turtles all the way down.

On the other hand, an out-of-band control plane network is safe: we know how to build a robust L3 network with traditional gear, and a controller bug cannot disrupt the control-plane communication. I would definitely use this approach in a data center environment, where the costs of implementing a dedicated 1GE control-plane network wouldn’t be prohibitively high.

Would the same approach work in WAN/Service Provider environments? Of course it would – after all, we’ve been using it forever to manage traditional optical gear. Does it make sense? It definitely does if you already have an out-of-band network, less so if someone asks you to build a new one to support their bleeding-edge SDN solution.

IN-BAND CONTROL PLANE
It’s possible (in theory) to get OpenFlow switches working with an in-band control plane, but it’s a complex and potentially risky undertaking. To get an understanding of the complexities involved, read the relevant Open vSwitch documentation, which succinctly explains the challenges and the OVS solution. That solution would work under optimal circumstances on properly configured switches, but I would still use an out-of-band control plane in networks with transit OpenFlow-controlled switches (a transit switch being a switch passing control-plane traffic between the controller and another switch).

BUT GOOGLE GOT IT TO WORK
No, they didn’t. They use OpenFlow within the data center edge to control the low-cost fixed-configuration switches they used to implement a large-scale routing device. They still run IS-IS and BGP between data centers, and use something functionally equivalent to PCEP to download centrally computed traffic-engineering tunnels into the data center edge routers.

A few days after I wrote the “Control plane in OpenFlow networks” blog post, I got a comment saying “we worked really hard to implement numerous safeguards that make Open vSwitch in-band control plane safe.” Here’s the whole story:

IS OPEN VSWITCH CONTROL PLANE IN-BAND OR OUT-OF-BAND?
A few days ago I described how most OpenFlow data center fabric solutions use an out-of-band control plane (a separate control-plane network). Can we do something similar when running an OpenFlow switch (example: Open vSwitch) in a hypervisor host? TL&DR answer: Sure we can. Does it make sense? It depends.

Open vSwitch supports an in-band control plane, but that’s not the focus of this post.

If you buy servers with a half dozen interfaces (I wouldn't), then it makes perfect sense to follow the usual design best practices published by hypervisor vendors, and allocate a pair of interfaces to user traffic, another pair to management/control plane/vMotion traffic, and a third pair to storage traffic. Problem solved.

Figure 4-3: Interfaces dedicated to individual hypervisor functions

Buying servers with two 10GE uplinks (what I would do) definitely makes your cabling friend happy and reduces the overall networking costs, but does result in a slightly more interesting hypervisor configuration. Best case, you split the 10GE uplinks into multiple virtual uplink NICs (example: Cisco's Adapter FEX, Broadcom's NIC Embedded Switch, or SR-IOV) and transform the problem into a known problem (see above) … but what if you're stuck with two uplinks?

Figure 4-4: Logical interfaces created on physical NICs appear as physical interfaces to the hypervisor

OVERLAY VIRTUAL NETWORKS TO THE RESCUE
If you implement all virtual networks (used by a particular hypervisor host) with overlay virtual networking technology, you don't have a problem. The virtual switch in the hypervisor (for example, OVS) has no external connectivity; it just generates IP packets that have to be sent across the transport network. The uplinks are thus used for control-plane traffic and encapsulated user traffic – the OpenFlow switch never touches the physical uplinks.

Figure 4-5: Overlay virtual networks are not connected to the physical NICs

INTEGRATING OPENFLOW SWITCH WITH PHYSICAL NETWORK
Finally, there's the scenario where an OpenFlow-based virtual switch (usually OVS) provides VLAN-based switching, and potentially interferes with control-plane traffic running over shared uplinks. Most products solve this challenge by somehow inserting the control-plane TCP stack in parallel with the OpenFlow switch.

Figure 4-6: Hypervisor TCP/IP stack running in parallel with the Open vSwitch

For example, the OVS Neutron agent creates a dedicated bridge for each uplink, and connects the OVS uplinks and the host TCP/IP stack to the physical uplinks through the per-interface bridge. That setup ensures the control-plane traffic continues to flow even when a bug in the Neutron agent or OVS breaks VM connectivity across OVS. For more details see the OpenStack Networking in Too Much Detail blog post published on the Red Hat OpenStack site.

Figure 4-7: External bridges used by Neutron OVS plugin

Regardless of how an OpenFlow-based network is implemented, it has to exchange information with the outside world – routing protocol information with adjacent routers, STP BPDUs with adjacent switches, and LACP control frames with all adjacent devices (including some servers). Similar to the forwarding model, the OpenFlow controller designers could use numerous implementation paths.

IMPLEMENTING CONTROL-PLANE PROTOCOLS WITH OPENFLOW
The true OpenFlow zealots would love you to believe that you can drop whatever you’ve been doing before and replace it with a clean-slate solution using the dumbest (and cheapest) possible switches and OpenFlow controllers. In the real world, your shiny new network has to communicate with the outside world … or you could take the approach most controller vendors did: decide to pretend STP is irrelevant, and ask people to configure static LAGs because you’re also not supporting LACP.

HYBRID-MODE OPENFLOW WITH TRADITIONAL CONTROL PLANE
If you’re implementing hybrid-mode OpenFlow, you’ll probably rely on the traditional software running in the switches to handle the boring details of control-plane protocols and use OpenFlow only to add new functionality (example: edge access lists).

Needless to say, this approach usually won’t result in better forwarding behavior. For example, it would be hard to implement layer-2 multipathing in a hybrid OpenFlow network if the switches rely on STP to detect and break the loops.

OPENFLOW-BASED CONTROL PLANE
In an OpenFlow-only network, the switches have no standalone control plane logic, and thus the OpenFlow controller (or a cluster of controllers) has to implement the control plane and control-plane protocols. This is the approach Google used in their OpenFlow deployment – the OpenFlow controllers run IS-IS and BGP with the outside world.

The OpenFlow protocol provides two messages the controllers can use to implement any control-plane protocol they wish:

- The Packet-out message is used by the OpenFlow controller to send packets through any port of any controlled switch.
- The Packet-in message is used to send messages from the switches to the OpenFlow controller. You could configure the switches to send all unknown packets to the controller, or set up flow matching entries (based on the controller’s MAC/IP address and/or TCP/UDP port numbers) to select only those packets the controller is truly interested in.

For example, you could write a very simple implementation of STP (similar to what Avaya is doing on their ERS-series switches when they run MLAG) where the OpenFlow controller would always pretend to be the root bridge and shut down any ports where inbound BPDUs would indicate someone else is the root bridge:

- Get the list of ports with a Read State message;
- Send BPDUs through all the ports, claiming the controller is the root bridge with very high priority;
- Configure flow entries that match the multicast destination address used by STP and forward those packets to the controller;
- Inspect incoming BPDUs, and shut down the port if the BPDU indicates someone else claims to be the root bridge.
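A sketch of the OpenFlow plumbing behind that recipe (Ryu and OpenFlow 1.3 assumed): one function installs the punt-BPDUs-to-controller entry, the other administratively disables a port once the controller has decided an incoming BPDU claims a better root bridge. Parsing the BPDU itself (the root bridge ID comparison) is deliberately left out.

```python
# Controller-side plumbing for the poor man's STP guard described above.
# "dp" is a Ryu Datapath object; "port" is an OFPPort structure (e.g. from the
# port description reply). OpenFlow 1.3 assumed; values are illustrative.
STP_MULTICAST = '01:80:c2:00:00:00'

def punt_bpdus(dp):
    """Send every frame addressed to the STP multicast MAC to the controller."""
    parser, ofp = dp.ofproto_parser, dp.ofproto
    match = parser.OFPMatch(eth_dst=STP_MULTICAST)
    actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER, ofp.OFPCML_NO_BUFFER)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=1000,
                                  match=match, instructions=inst))

def shut_down_port(dp, port):
    """Administratively disable a port that received a superior BPDU."""
    parser, ofp = dp.ofproto_parser, dp.ofproto
    dp.send_msg(parser.OFPPortMod(datapath=dp, port_no=port.port_no,
                                  hw_addr=port.hw_addr,
                                  config=ofp.OFPPC_PORT_DOWN,
                                  mask=ofp.OFPPC_PORT_DOWN,
                                  advertise=0))
```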

SUMMARY
The OpenFlow protocol allows you to implement any control-plane protocol you wish in the OpenFlow controller; if a controller does not implement the protocols you need in your data center, it’s not due to a lack of OpenFlow functionality, but due to other factors (fill in the blanks).

If the OpenFlow product you’re interested in uses hybrid-mode OpenFlow (where the control plane resides in the traditional switch software) or uses OpenFlow to program overlay networks (example: Nicira’s NVP), you don’t have to worry about its control-plane protocols.

If, however, someone tries to sell you software that’s supposed to control your physical switches, and does not support the usual set of protocols you need to integrate the OpenFlow-controlled switches with the rest of your network (example: STP, LACP, LLDP on L2 and some routing protocol on L3), think twice. If you use the OpenFlow-controlled part of the network in an isolated fabric or small-scale environment, you probably don’t care whether the new toy supports STP or OSPF; if you want to integrate it with the rest of your existing data center network, be very careful.

Most OpenFlow controller vendors try to ignore the legacy control plane protocols; after all, there’s no glory to be had in implementing LACP, LLDP or STP. Their myopic vision might hinder the success of your OpenFlow deployment, as you’ll have to integrate the new network with the legacy equipment.

LEGACY PROTOCOLS IN OPENFLOW-BASED NETWORKS
I’m positive your CIO will get a visit from a vendor offering clean-slate OpenFlow/SDN-based data center fabrics in the not-so-distant future. At that moment, one of the first questions you should ask is “how well does your new wonderland integrate with my existing network?” or more specifically “which L2 and L3 protocols do you support?”

At least one of the vendors offering OpenFlow controllers that manage physical switches has a simple answer: use a static LAG to connect your existing gear with our OpenFlow-based network (because our controller doesn’t support LACP), use static routes (because we don’t run any routing protocols), and don’t create any L2 loops in your network (because we also don’t have STP). If you wonder how reliable that is, you obviously haven’t implemented a redundant network with static routes before.

However, to be a bit more optimistic, the need for legacy protocol support depends primarily on how the new solution integrates with your network.

Overlay solutions (like VMware NSX) don’t interact with the existing network at all. A hypervisor running Open vSwitch and using STT or GRE appears as an IP host to the network, and uses existing Linux mechanisms (including NIC bonding and LACP) to solve the L2 connectivity issues. Layer-2 gateways included with VMware NSX for Multiple Hypervisors support STP and LACP. VM-based gateways included with VMware NSX for vSphere run routing protocols (BGP, OSPF and IS-IS) and rely on the underlying hypervisor’s support of layer-2 control plane protocols (LACP and LLDP).

Hybrid OpenFlow solutions that only modify the behavior of the user-facing network edge (example: per-user access control) are also OK. You should closely inspect what the product does and ensure it doesn’t modify the network device behavior you rely upon in your network, but in principle you should be fine. For example, the XenServer vSwitch Controller modifies just the VM-facing behavior, but not the behavior configured on uplink ports.

Rip-and-replace OpenFlow-based network fabrics are the truly interesting problem. You’ll have to connect existing hosts to them, so you’d probably want to have LACP support (unless you’re a VMware-only shop), and they’ll have to integrate with the rest of the network, so you should ask for at least:

- LACP, if you plan to connect anything but vSphere hosts to the fabric … and you’ll probably need a device to connect the OpenFlow-based part of the network to the outside world;
- LLDP or CDP. If nothing else, they simplify troubleshooting, and they are implemented on almost everything, including the vSphere vSwitch;
- STP, unless the OpenFlow controller implements split-horizon bridging like vSphere’s vSwitch, but even then we need basic things like BPDU guard;
- A routing protocol, if the OpenFlow-based solution supports L3 (OSPF comes to mind).

Call me a grumpy old man, but I wouldn’t touch an OpenFlow controller that doesn’t support the above-mentioned protocols. Worst case, if I were forced to implement a network using such a controller, I would make sure it’s totally isolated from the rest of my network. Even then a single point of failure wouldn’t make much sense, so I would need two firewalls or routers – and static routing in redundant scenarios breaks sooner or later. You get the picture.

To summarize: dynamic link status and routing protocols were created for a reason. Don’t allow glitzy new-age solutions to daze you, or you just might experience a major headache down the road.

In 2011 I thought we might have to wait a few years before seeing the first products supporting the multiple lookup tables introduced by OpenFlow 1.1. I was wrong about the lack of hardware support for OpenFlow 1.1 – the first proof-of-concept products appeared a few months later. Unfortunately that product never became mainstream because the hardware it uses is too expensive – we had to wait till September 2013 to get the first production-grade OpenFlow 1.3 switches (almost all vendors decided to skip OpenFlow versions 1.1 and 1.2).

OPENFLOW 1.1 IN HARDWARE: I WAS WRONG
Earlier this month I wrote “we’ll probably have to wait at least a few years before we’ll see a full-blown hardware product implementing OpenFlow 1.1” (and probably repeated something along the same lines during the OpenFlow Packet Pushers podcast). I was wrong (and I won’t split hairs and claim that an academic proof-of-concept doesn’t count). Here it is: @nbk1 pointed me to a 100 Gbps switch implementing the latest-and-greatest OpenFlow 1.1.

The trick lies in the NP-4 network processors from EZchip. These amazing beasts are powerful enough to handle the linked tables required by OpenFlow 1.1; the researchers “just” had to implement the OpenFlow API and compile OpenFlow TCAM structures into NP-4 microcode.

I have to admit I’m impressed (and as some people know, that’s not an easy task). It doesn’t matter whether the solution can handle full 100 Gbps or what the pps figures are; they got very far very soon using off-the-shelf hardware, so it shouldn’t be impossibly hard to repeat the performance and launch a commercial product. The only question is the price of the NP-4 chipset (including the associated TCAM they were using) – can someone build a reasonably priced switch out of that hardware?

Initial hardware OpenFlow implementations installed OpenFlow forwarding rules in TCAM (the specialized memory used to implement packet filters and policy-based routing), resulting in a dismal maximum number of forwarding entries. Most vendors quickly realized it’s possible to combine multiple hardware tables available in their switching silicon, and present them as a single table to an OpenFlow controller.

OPTIMIZING OPENFLOW HARDWARE TABLES
Initial OpenFlow hardware implementations used a simplistic approach: install all OpenFlow entries in TCAM (the hardware that’s used to implement ACLs and PBR) and hope for the best. That approach was good enough to get you a tick-in-the-box on RFP responses, but it fails miserably when you try to get OpenFlow working in a reasonably sized network.

On the other hand, many problems people try to solve with OpenFlow, like data center fabrics, involve simple destination-only L2 or L3 switching. Problems that can be solved with destination-only L2 or L3 switching are so similar to what we’re doing with traditional routing protocols that I keep wondering whether it makes sense to reinvent that particular well-working wheel, but let’s not go there.

The switching hardware vendors realized in the last few months what the OpenFlow developers were doing and started implementing forwarding optimizations – they would install OpenFlow entries that require 12-tuple matching in TCAM, and entries that specify only a destination MAC address or destination IP prefix in L2 and L3 switching structures (usually hash tables for L2 switching and
some variant of binary tree for L3 switching). The two or three switching tables would appear as a single OpenFlow table to the controller, and the hardware switch would be able to install more flows. Quite ingenious ;)

The vendors using this approach include Arista (L2), Cisco (L2), and Dell Force 10 (L2 and L3). HP is using both the MAC table and TCAM in its 5900 switch, but presents them as two separate tables to the OpenFlow controller (at least that was my understanding of their documentation – please do correct me if I got it wrong), pushing the optimization challenge back to the controller.
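To illustrate the optimization, here is a toy classification function a switch agent (or controller) could use to decide where a flow entry can live; the field sets and rules are heavily simplified and purely illustrative.

```python
# Illustrative sketch of the table-placement optimization described above: an
# agent inspects the match fields of each OpenFlow entry and decides whether it
# fits the (large) L2 MAC table, the L3 prefix table, or has to burn a (scarce)
# TCAM entry. Field names follow OpenFlow conventions; rules are simplified.
L2_FIELDS = {'eth_dst'}
L3_FIELDS = {'eth_type', 'ipv4_dst'}

def placement(match_fields):
    """Return the hardware table a flow entry with these match fields could use."""
    fields = set(match_fields)
    if fields <= L2_FIELDS:
        return 'L2 MAC table'          # destination-MAC-only entry
    if fields <= L3_FIELDS:
        return 'L3 LPM table'          # destination-prefix-only entry
    return 'TCAM'                      # anything else needs a full 12-tuple entry

print(placement({'eth_dst': '00:11:22:33:44:55'}))                  # L2 MAC table
print(placement({'eth_type': 0x0800, 'ipv4_dst': '10.0.0.0/8'}))    # L3 LPM table
print(placement({'eth_type': 0x0800, 'ipv4_dst': '10.1.1.1',
                 'ip_proto': 6, 'tcp_dst': 80}))                     # TCAM
```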

In spring 2014 most data center switching vendors supported OpenFlow on at least some of their products. Here’s an overview documenting the state of data center switching market in May 2014:

OPENFLOW SUPPORT IN DATA CENTER SWITCHES
Good news: In the last few months, almost all major data center Ethernet switching vendors (Arista, Cisco, Dell Force 10, HP, and Juniper) released a documented GA version of OpenFlow on some of their data center switches. Bad news: no two vendors have even remotely comparable functionality.

All the information in this blog post comes from publicly available vendor documentation (configuration guides, command references, release notes). NEC is the only vendor mentioned in this blog post that does not have public documentation, so it’s impossible to figure out (from the outside) what functionality their switches support.

Some other facts:

- Most vendors offer OpenFlow 1.0. Exceptions: HP and NEC;
- Most vendors have a single OpenFlow lookup table (one of the limitations of OpenFlow 1.0); HP has a single table on the 12500, two tables on the 5900, and a totally convoluted schema on ProCurve switches;
- Most vendors work with a single controller. Cisco’s Nexus switches can work with up to 8 concurrent controllers, HP switches with up to 64 concurrent controllers;

- Many vendors optimize the OpenFlow lookup table by installing L2-only or L3-only flow entries in dedicated hardware (which still looks like the same table to the OpenFlow controller);
- OpenFlow table sizes remain dismal. Most switches support low thousands of 12-tuple flows. Exception: NEC edge switches support between 64K and 160K 12-tuple flows;
- While everyone supports full 12-tuple matching (additionally, HP supports IPv6, MPLS, and PBB), almost no one (apart from HP) offers significant packet rewrite functionality. Most vendors can set the destination MAC address or push a VLAN tag; HP’s 5900 can set any field in the packet, copy/decrement IP or MPLS TTL, and push VLAN, PBB or MPLS tags.

Summary: It’s nigh impossible to implement anything but destination-only L2+L3 switching at scale using existing hardware (the latest chipsets from Intel or Broadcom aren’t much better) … and I wouldn’t want to be a controller vendor dealing with the idiosyncrasies of all the hardware out there – all you can do consistently across most hardware switches is forward packets (without rewrites), drop packets, or set VLAN tags.

Based on the state of OpenFlow support in existing data center switches (see the previous post), it’s fair to ask the question “is it realistic to expect multi-vendor OpenFlow deployments?” The answer I got in May 2013 was “no, unless you want to live with extremely baseline functionality”. The situation wasn’t any better in August 2014 when this chapter was last updated.

MULTI-VENDOR OPENFLOW – MYTH OR REALITY?

NEC demonstrated a multi-vendor OpenFlow network @ Interop Las Vegas, linking physical switches from Arista, Brocade, Centec, Dell, Extreme, Intel and NEC, and virtual switches in Linux (OVS) and Hyper-V (PF1000) environments in a leaf-and-spine fabric controlled by the ProgrammableFlow controller (watch the video of Samrat Ganguly demonstrating the network). Does that mean we’ve entered the era of multi-vendor OpenFlow networking? Not so fast.

You see, building real-life networks with fast feedback loops and fast failure reroutes is hard. It took NEC years to get a stable, well-performing implementation, and they had to implement numerous OpenFlow 1.0 extensions to get all the features they needed. For example, they circumvented the flow update rate challenges by implementing a very smart architecture effectively equivalent to the Edge+Core OpenFlow ideas.

In an NEC-only ProgrammableFlow network, the edge switches (be they PF5240 GE switches or PF1000 virtual switches in a Hyper-V environment) do all the hard work, while the core switches do simple path forwarding. Rerouting around a core link failure is thus just a matter of path rerouting, not flow rerouting, reducing the number of entries that have to be rerouted by several orders of magnitude.


Figure 4-8: Interop 2013 OpenFlow demo network (source: NEC Corporation of America)

In a mixed-vendor environment, the ProgrammableFlow controller obviously cannot use all the smarts of the PF5240 switches; it has to fall back to the least common denominator (vanilla OpenFlow 1.0) and install granular flows in every single switch along the path, significantly increasing the time it takes to install new flows after a core link failure.


Will multi-vendor OpenFlow get any better? It might – OpenFlow 1.3 has enough functionality to implement the Edge+Core design, but of course there aren’t too many OpenFlow 1.3 products out there ... and even the products that have been announced might not have the features the ProgrammableFlow controller needs to scale the OpenFlow fabric. For the moment, the best advice I can give you is: “If you want to have a working OpenFlow data center fabric, stick with an NEC-only solution.”


Most traditional data center switching vendors implemented hybrid OpenFlow functionality that allows an OpenFlow controller to manage individual ports or VLANs instead of the whole switch. Brocade was probably the first vendor that shipped a working solution (in June 2012).

HYBRID OPENFLOW, THE BROCADE WAY

A few days after Brocade unveiled its SDN/OpenFlow strategy, Katie Bromley organized a phone call with Keith Stewart who kindly explained to me some of the background behind their current OpenFlow support. Apart from the fact that it runs on the 100GE adapters, the most interesting part is their twist on the hybrid OpenFlow deployment.

The “traditional” hybrid OpenFlow model (what Keith called hybrid switch) is well known (and supported by multiple vendors): an OpenFlow-capable switch has two forwarding tables (or FIBs), a regular one (built from source MAC address gleaning or routing protocol information) and an OpenFlow-controlled one. Some ports of the switch use one of the tables, other ports use the other. Effectively, a hardware switch supporting hybrid switch OpenFlow is split into two independent switches that operate in a ships-in-the-night fashion.

More interesting is the second hybrid mode Brocade supports: the hybrid port mode, where the OpenFlow FIB augments the traditional FIB. Brocade’s switches using the hybrid port approach can operate in protected or unprotected mode:

 Protected hybrid port mode uses the OpenFlow FIB for certain VLANs or packets matching a packet filter (ACL). This mode allows you to run OpenFlow in parallel (ships-in-the-night) with the traditional forwarding over the same port – a major win if you’re not willing to spend money for two 100GE ports (one for OpenFlow traffic, another for regular traffic).

 Unprotected hybrid port mode performs a lookup in the OpenFlow FIB first and uses the traditional FIB as a fallback mechanism (in case there’s no match in the OpenFlow table). This mode can be used to augment the traditional forwarding mechanisms (example: OpenFlow-controlled PBR) or create value-added services on top of (not in parallel with) the traditional network. The difference between the two lookup orders is sketched right after this list.
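The difference between the two modes boils down to lookup order. Here is a minimal conceptual sketch in Python (the table structures, field names and match helper are made up for clarity; this is not Brocade’s implementation):

# Conceptual sketch of the two hybrid port modes (illustrative only, not vendor code).
# Each "FIB" is just an ordered list of (match-predicate, action) pairs.

def lookup(fib, packet):
    """Return the action of the first matching entry, or None if nothing matches."""
    for match, action in fib:
        if match(packet):
            return action
    return None

def protected_hybrid(packet, openflow_fib, traditional_fib, openflow_vlans):
    # Protected mode: OpenFlow owns some VLANs, the traditional control plane
    # owns the rest -- two ships-in-the-night lookups sharing one port.
    if packet["vlan"] in openflow_vlans:
        return lookup(openflow_fib, packet)
    return lookup(traditional_fib, packet)

def unprotected_hybrid(packet, openflow_fib, traditional_fib):
    # Unprotected mode: try the OpenFlow FIB first, fall back to the traditional
    # FIB when there is no match -- OpenFlow augments the regular forwarding.
    return lookup(openflow_fib, packet) or lookup(traditional_fib, packet)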

The set of applications that one can build with the hybrid OpenFlow is well known – from policy-based routing and traffic engineering to bandwidth-on-demand. However, Brocade MLX has one more trick up its sleeve: it supports packet replication actions that can be used to implement behavior similar to IP multicast or SPAN port functionality. You can use that feature in environments that need reliable packet delivery over UDP to increase the chance that at least a single copy of the packet will reach the destination.

I like the hybrid approach Brocade took (it’s quite similar to what Juniper is doing with its integrated OpenFlow) and the interesting new features (like the packet replication), but the big question remains unanswered: where are the applications (aka OpenFlow controllers)? At the moment, everyone (Brocade included) is partnering with NEC or demoing their gear with public-domain controllers. Is this really the best the traditional networking vendors can do? I sincerely hope not.


Is Open Daylight the right answer to the controller wars that seemed inevitable in early 2013? Here’s my take (written in February 2013):

OPEN DAYLIGHT – INTERNET EXPLORER OR LINUX OF THE SDN WORLD?

You’ve probably heard that the networking hardware vendors decided to pool resources to create an open-source OpenFlow controller. Just in case you’re wondering whether they lost their minds (no, they didn’t), here’s my cynical take.

Are you old enough to remember how Microsoft killed the browser market? After the World Wide Web exploded (and caught Microsoft totally unprepared), there was a blooming browser market (with Netscape being the absolute market leader). Microsoft couldn’t compete in that market with an immature product (Internet Explorer) and decided it was best to destroy the market. They made Internet Explorer freely available and the rest is history – after the free product won the browser wars (it’s hard to beat free and good enough) it took years for reasonable alternatives to emerge. Not surprisingly, browser innovation almost stopped until Internet Explorer lost its dominant market position.

Even if you don’t remember Netscape Navigator, you’ve probably heard of Linux. Have you ever wondered how you could get a high-quality open-source operating system for free? Check the list of top Linux contributors (pages 9-11 of the Linux Kernel Development report) – Red Hat, Intel, Novell and IBM. You might wonder why Intel and IBM invest in Linux. It’s simple: the less users have to pay for the operating system, the more money will be left to buy hardware. For more details, you absolutely have to read Be Wary of Geeks Bearing Gifts by Simon Wardley.

So what will Daylight be? Another Internet Explorer (killing the OpenFlow controller market, Big Switch in particular) or another Linux (a good product ensuring OpenFlow believers continue spending money on hardware, not software)? I'm hoping we'll get a robust networking Linux, but your guess is as good as mine.


5

OPENFLOW SCALABILITY CHALLENGES

An architecture in which a central controller runs the control plane and uses attached devices as pure forwarding elements has numerous scalability challenges, including:

 Flow-based forwarding paradigm doesn’t scale;

 Hop-by-hop forwarding paradigm imposes significant overhead in large-scale networks. Path forwarding paradigm works much better;

 Existing hardware (data center switches) supports low thousands of full OpenFlow entries, making it useless for large-scale deployments;

 Existing hardware switches can install at most a few thousand new flow entries per second;

 Data plane punting and packet forwarding to the controller in existing switches is extremely slow when compared to the regular data plane forwarding performance.

MORE INFORMATION

You’ll find additional SDN- and OpenFlow-related information on ipSpace.net web site:

 Start with the SDN, OpenFlow and NFV Resources page;

 Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;

 Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);

 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;

 Finally, I’m always available for short online or on-site consulting engagements.


This chapter describes numerous challenges every OpenFlow controller implementation has to overcome to work well in large-scale environments. Use it as a (partial) checklist when evaluating OpenFlow controller products and solutions.

IN THIS CHAPTER:

OPENFLOW FABRIC CONTROLLERS ARE LIGHT-YEARS AWAY FROM WIRELESS ONES
OPENFLOW AND FERMI ESTIMATES
50 SHADES OF STATEFULNESS
FLOW TABLE EXPLOSION WITH OPENFLOW 1.0 (AND WHY WE NEED OPENFLOW 1.3)
FLOW-BASED FORWARDING DOESN’T WORK WELL IN VIRTUAL SWITCHES
PROCESS, FAST AND CEF SWITCHING AND PACKET PUNTING
CONTROLLER-BASED PACKET FORWARDING IN OPENFLOW NETWORKS
CONTROL-PLANE POLICING IN OPENFLOW NETWORKS
PREFIX-INDEPENDENT CONVERGENCE (PIC): FIXING THE FIB BOTTLENECK
FIB UPDATE CHALLENGES IN OPENFLOW NETWORKS


FORWARDING STATE ABSTRACTION WITH TUNNELING AND LABELING
EDGE AND CORE OPENFLOW (AND WHY MPLS IS NOT NAT)
EDGE PROTOCOL INDEPENDENCE: ANOTHER BENEFIT OF EDGE-AND-CORE LAYERING
VIRTUAL CIRCUITS IN OPENFLOW 1.0 WORLD
MPLS IS NOT TUNNELING
WHY IS OPENFLOW FOCUSED ON L2-4?
DOES CPU-BASED FORWARDING PERFORMANCE MATTER FOR SDN?
OPENFLOW AND THE STATE EXPLOSION


OpenFlow controllers are usually compared with wireless controllers (particularly when someone tries to prove that they’re a good idea). Nothing could be further from the truth.

OPENFLOW FABRIC CONTROLLERS ARE LIGHT-YEARS AWAY FROM WIRELESS ONES

When talking about OpenFlow and the whole idea of controller-based networking, people usually say “well, it’s nothing radically new, we’ve been using wireless controllers for years and they work well, so the OpenFlow ones will work as well.” Unfortunately the comparison is totally misleading. While OpenFlow-based data center fabrics and wireless controller-based networks look very similar on a high-level PowerPoint diagram, in reality they’re light-years apart. Here are just a few dissimilarities that make OpenFlow-based fabrics so much more complex than the wireless controllers.

TOPOLOGY MANAGEMENT

Wireless controllers work with the devices on the network edge. A typical wireless access point has two interfaces: a wireless interface and an Ethernet uplink, and the wireless controller isn’t managing the Ethernet interface or any control-plane protocols that interface might have to run. The wireless access point communicates with the controller through an IP tunnel and expects someone else to provide IP connectivity, routing and failure recovery.


The underlying physical topology of the network is thus totally abstracted and invisible to the wireless controller.

Data center fabrics are built from high-speed switches with tens of 10/40GE ports, and the OpenFlow controller must manage topology discovery, topology calculation, flow placement, failure detection and fast rerouting. There are zillions of things you have to do in data center fabrics that you never see in a controller-based wireless network.

TRAFFIC FLOW

In traditional wireless networks all traffic flows through the controller (there are some exceptions, but let’s ignore them for the moment). The hub-and-spoke tunnels between the controller and the individual access points carry all the user traffic and the controller is making all the smart forwarding decisions.

In an OpenFlow-based fabric the controller should make a minimal number of data-plane decisions (ideally: none) because every time you have to punt packets to the controller, you reduce the overall network performance (not to mention the dismal capabilities of today’s switches when they have to do CPU-based packet forwarding across an SSL session).

AMOUNT OF TRAFFIC

Wireless access points handle megabits of traffic, making hub-and-spoke controller-based forwarding a viable alternative.


Data center fabrics are usually multi-terabit structures (every single pizza-box ToR switch has over a terabit of forwarding capacity) – three to four orders of magnitude faster than the wireless network we’re comparing them with. Controller-based forwarding is totally unrealistic.

FORWARDING INFORMATION

In a traditional controller-based wireless network, the access point forwarding is totally stupid – the access points forward the data between directly connected clients (if allowed to do so) or send the data received from them into the IP tunnel established with the controller (and vice versa). There’s no forwarding state to distribute; all an access point needs to know are the MAC addresses of the wireless clients.

In an OpenFlow-based fabric the controller must distribute as much forwarding, filtering and rewriting (example: decrementing TTL) information as possible to the OpenFlow-enabled switches to minimize the amount of traffic flowing through the controller. Furthermore, smart OpenFlow controllers build forwarding information in a way that allows the switches to cope with link failures (the controller has to install backup entries with lower matching priority); you wouldn’t want to have an overloaded controller and a burnt-out switch CPU every time a link goes down, network topology is lost, and the switch (in deep panic) forwards all the traffic to the controller.

The functionality of a good OpenFlow controller that proactively pre-programs backup forwarding entries (example: NEC ProgrammableFlow) is very similar to MPLS Traffic Engineering with Fast Reroute; you cannot expect its complexity to be significantly lower than that.


REAL-TIME EVENTS

User roaming is the only real-time event in a controller-based wireless network (remember: access point uplink failure is not handled by the controller). Access points do most of the work on their own (the expected behavior is specified in IEEE standards anyway), and the controller just updates the MAC forwarding information. The worst thing that can happen if the controller is too slow is a slight delay experienced by the user (noticeable only on voice calls and by players of WoW sessions running around large buildings).

The other near-real-time wireless event is user authentication, which often takes seconds (or my wireless network is severely misconfigured). Yet again, nothing critical; the controller can take its time.

In data center fabrics, you have to react to a failure in milliseconds and reprogram the forwarding entries on tens of switches (unless you know what you’re doing and already installed the precomputed backup entries – see above).

FREQUENCY OF REAL-TIME EVENTS

Wireless controllers probably handle between tens and a few hundred real-time events per second (unless you had a power glitch and every user wants to log into the network at the same time). OpenFlow controllers that implement flow-based forwarding (flow entries are downloaded into the switches for each individual TCP/UDP session – a patently bad idea if I ever saw one) are designed to handle millions of flow setups per second (not that the physical switches could take that load).


SUMMARY

As you can see, wireless controllers have nothing to do with OpenFlow controllers; they aren’t even remotely similar in requirements or complexity (the only exception being OpenFlow controllers that program just the network edge, like Nicira’s NVP). Comparing the two is misleading and hides the real scope of the problem; no wonder some people would love you to believe otherwise because that makes selling the controller-based fabrics easier.

In reality, an OpenFlow controller managing a physical data center fabric is a complex piece of real-time software, as anyone who tried to build a high-end switch or router has learned the hard way.


Before going into the details of OpenFlow scalability challenges, let’s try to estimate the size of the problem we’re dealing with.

OPENFLOW AND FERMI ESTIMATES

Fast advances in networking technologies (and the pixie dust sprinkled on them) blinded us – we lost our gut feeling and rules of thumb. Guess what: contrary to what we love to believe, networking isn’t unique. Physicists faced the same challenge for a long time; one of them was so good that they named the whole problem category after him. Every time someone tries to tell you what your problem is, and how their wonderful new gizmo will solve it, it’s time for another Fermi estimate.

Let’s start with a few examples.

Data center bandwidth. A few weeks ago a clueless individual working for a major networking vendor wrote a blog post (which unfortunately got pulled before I could link to it) explaining how network virtualization differs from server virtualization because we don’t have enough bandwidth in the data center. A quick estimate shows a few ToR switches have all the bandwidth you usually need (you might need more due to traffic bursts and the number of server ports you have to provide, but that’s a different story).

VM mobility for disaster avoidance needs. A back-of-the-napkin calculation shows you can’t evacuate more than half a rack per hour over a 10GE link. The response I usually get when I prod networking engineers into doing the calculation: “OMG, that’s just hilarious. Why would anyone want to do that?”


And now for the real question that triggered this blog post: some people still think we can implement stateful OpenFlow-based network services (NAT, FW, LB) in hardware. How realistic is that?

Scenario: web application(s) hosted in a data center with a 10GE WAN uplink. Questions:

 How many new sessions are established per second (how many OpenFlow flows does the controller have to install in the hardware)?

 How many parallel sessions will there be (how many OpenFlow flows does the hardware have to support)?

Facts (these are usually the hardest to find):

1. Size of an average web page is ~1MB
2. An average web page loads in ~5 seconds
3. An average web page uses ~20 domains
4. An average browser can open up to 6 sessions per hostname

Using facts #3 and #4 we can estimate the total number of sessions needed for a single web page. It’s anywhere between 20 and 120; let’s be conservative and use 20. Using fact #1 and the previous result, we can estimate the amount of data transferred over a typical HTTP session: 50KB. Assuming a typical web page takes 5 seconds to load, a typical web user receives 200 KB/second (1.6 Mbps) over 20 sessions or 10 KB/second (80 kbps) per session. Seems low, but do remember that most of the time the browser (or the server) waits due to RTT latency and TCP slow start issues.


Assuming a constant stream of users with these characteristics, we get 125,000 new sessions over a 10GE link every 5 seconds, or 25,000 new sessions per second per 10 Gbps.

Always do a reality check. Is this number realistic? Load balancing vendors support way more connections per second (cps) @ 10 Gbps speeds. F5 BIG-IP 4000s claims 150K cps @ 10 Gbps, and VMware claims its NSX Edge Services Router (improved vShield Edge) will support 30K cps @ 4 Gbps. It seems my guesstimate is on the lower end of reality (if you have real-life numbers, please do share them in comments!).

Modern web browsers use persistent HTTP sessions. Browsers want to keep sessions established as long as possible; web servers serving high-volume content commonly drop them after ~15 seconds to reduce the server load (Apache is notoriously bad at handling a very high number of concurrent sessions). 25,000 cps x 15 seconds = 375,000 flow records.

Trident-2-based switches can handle 100K+ L4 OpenFlow entries (at least Big Switch claimed so when we met @ NFD6). That’s definitely on the low end of the required number of sessions at 10 Gbps; do keep in mind that the total throughput of a typical Trident-2 switch is above 1 Tbps, or two orders of magnitude higher. Enterasys switches support 64M concurrent flows @ 1 Tbps, which seems to be enough. The flow setup rate on Trident-2-based switches is supposedly still in the low thousands, or an order of magnitude too low to support a single 10 Gbps link (the switches based on this chipset usually have 64 10GE interfaces).

Now is the time for someone to invoke the ultimate Moore’s Law spell and claim that the hardware will support whatever number of flow entries in the not-so-distant future. Good luck with that; I’ll settle for an Intel Xeon server that can be pushed to 25 Mpps. OpenFlow has its uses, but large-scale stateful services are obviously not one of them.
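If you want to play with the numbers yourself, the whole back-of-the-envelope calculation fits in a few lines of Python; the inputs are the assumptions listed above, so change them and watch the results move:

# Fermi estimate: stateful OpenFlow services on a 10GE uplink (rough numbers, assumptions only).
uplink_bps        = 10e9      # 10 Gbps WAN uplink
page_size_bytes   = 1e6       # ~1 MB per average web page
page_load_seconds = 5         # ~5 seconds to load a page
sessions_per_page = 20        # conservative estimate (20 domains x 1 session each)
session_hold_time = 15        # web servers drop idle persistent sessions after ~15 s

per_user_bps     = page_size_bytes * 8 / page_load_seconds     # ~1.6 Mbps per active user
concurrent_users = uplink_bps / per_user_bps                   # ~6,250 users on the uplink
new_sessions_per_second = concurrent_users * sessions_per_page / page_load_seconds
concurrent_sessions     = new_sessions_per_second * session_hold_time

print(f"new sessions per second: {new_sessions_per_second:,.0f}")   # ~25,000
print(f"concurrent flow entries: {concurrent_sessions:,.0f}")       # ~375,000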


State kept by networking devices is obviously one of the factors impacting scalability. Let’s see how much state we might need, how we can reduce the amount of state kept in a device, and how we can get rid of real-time state changes.

50 SHADES OF STATEFULNESS

A while ago Greg Ferro wrote a great article describing integration of overlay and physical networks in which he wrote that “an overlay network tunnel has no state in the physical network”, triggering an almost-immediate reaction from Marten Terpstra (of RIPE fame, now @ Plexxi) arguing that the network (at least the first ToR switch) knows the MAC and IP address of the hypervisor host and thus has at least some state associated with the tunnel.

Marten is correct from a purely scholastic perspective (using his argument, the network keeps some state about TCP sessions as well), but what really matters is how much state is kept, which device keeps it, how it’s created and how often it changes.

HOW MUCH STATE DOES A DEVICE KEEP?

The end hosts have to keep state for every single TCP and UDP session, but most transit network devices (apart from abominations like NAT) don’t care about those sessions, making the Internet as fast as it is.


Decades ago we had a truly reliable system that kept session state in every single network node; it never lost a packet, but it barely coped with 2 Mbps links (the old-timers might remember it as X.25).

The state granularity should get ever coarser as you go deeper into the network core – edge switches keep MAC address tables and ARP/ND caches of adjacent end hosts, core routers know about IP subnets, routers in the public Internet know about the publicly advertised prefixes (including every prefix Bell South ever assigned to one of its single-homed customers), while the high-speed MPLS routers know about BGP next hops and other forwarding equivalence classes (FECs).

WHICH DEVICE KEEPS THE STATE

A well-designed architecture has complexity (and state) concentrated at the network edge. The core devices keep minimum state (example: IP subnets), while the edge devices keep session state. In a virtual network case, the hypervisors should know the VM endpoints (MAC addresses, IP addresses, virtual segments) and the physical devices just the hypervisor IP address, not the other way round.

Furthermore, as much state as possible should be stored in low-speed devices using software-based forwarding. It’s pretty simple to store a million flows in a software-based Open vSwitch (updating them is a different story) and mission-impossible to store 10,000 5-tuple flows in the Trident 2 chipset used by most ToR switches.


HOW IS STATE CREATED

Systems with control-plane (proactive) state creation (example: routing table built from routing protocol information) are always more scalable than systems that have to react to data-plane events in real time (example: MAC address learning or NAT table maintenance). Data-plane-driven state is particularly problematic for devices with hardware forwarding – packets that change state (example: TCP SYN packets creating new NAT translations) might have to be punted to the CPU.

Finally, there’s the “soft state” – cases where the protocol designers needed state in the network, but didn’t want to create a proper protocol to maintain it, so the end devices get burdened with periodic state refresh messages, and the transit devices spend CPU cycles refreshing the state. RSVP is a typical example, and everyone running large-scale MPLS/TE networks simply loves the periodic refresh messages sent by tunnel head-ends – they keep the core routers processing them cozily warm.

HOW OFTEN DOES STATE CHANGE

Devices with slow-changing state (example: BGP routers) are clearly more stable than devices with fast-changing state (example: Carrier-Grade NAT).


SUMMARY

Whenever you’re evaluating a network architecture or reading a vendor whitepaper describing a next-generation unicorn-tears-blessed solution, try to identify how much state individual components keep, how it’s created and how often it changes. Hardware devices storing plenty of state tend to be complex and expensive (keep that in mind when evaluating the next application-aware fabric).

Not surprisingly, RFC 3439 (Some Internet Architectural Guidelines and Philosophy) gives you similar advice, although in a way more eloquent form.


In the initial “What is OpenFlow” blog post I mentioned multi-table support and why it’s crucial to a scalable OpenFlow implementation. It took me almost two years to write a follow-up blog post explaining the scalability problems of OpenFlow 1.0.

FLOW TABLE EXPLOSION WITH OPENFLOW 1.0 (AND WHY WE NEED OPENFLOW 1.3)

The number of flows in hardware switches (dictated by the underlying TCAM size) is one of the major roadblocks in a large-scale OpenFlow deployment. Vendors are supposedly making progress, with Intel claiming up to 4000 12-tuple flow entries in their new Ethernet Switch FM6700 series. Is that good enough? As always, it depends.

First, let’s put the “4000 flows” number in perspective. It’s definitely a bit better than what current commodity switches can do (for vendors trying to keep mum about their OpenFlow limitations, check their ACL sizes – flow entries would use the same TCAM), but NEC had 64,000+ flows on the PF5240 years ago and Enterasys has 64 million flows per box with their CoreFlow2 technology. Judge for yourself whether 4000 flows is such a major step forward.

Now let’s focus on whether 4000 flows is enough. As always, the answer depends on the use case, network size and implementation details. This blog post will focus on the last part.


USE CASE: DATA CENTER FABRIC

The simplest possible data center use case is a traditional (non-virtualized) data center network implemented with OpenFlow (similar to what NEC is doing with their Virtual Tenant Networks). An OpenFlow-based network trying to get feature parity with low-cost traditional ToR switches should support:

 Layer-2 and layer-3 forwarding;

 Per-port or per-MAC ingress and egress access lists.

We’ll focus on a single layer-2 segment (you really don’t want to get me started on the complexities of scalable OpenFlow-based layer-3 forwarding) implemented on a single hardware switch. Our segment will have two web servers (ports 1 and 2), a MySQL server (port 3), and a default gateway on port 4. The default gateway could be a firewall, a router, or a load balancer – it really doesn’t matter if we stay focused on layer-2 forwarding.

STEP 1: SIMPLE MAC-BASED FORWARDING

The OpenFlow controller has to install a few forwarding rules in the switch to get the traffic started. Ignoring the multi-tenancy requirements, you need a single flow forwarding rule per destination MAC address:


Flow match          Action
DMAC = Web-1        Forward to port 1
DMAC = Web-2        Forward to port 2
DMAC = MYSQL-1      Forward to port 3
DMAC = GW           Forward to port 4

It seems we don’t need much TCAM:

N_flows = N_MAC

Smart switches wouldn’t store the MAC-only flow rules in TCAM; they would use other forwarding structures available in the switch, like MAC hash tables.

STEP 2: MULTI-TENANT INFRASTRUCTURE

If you want to implement multi-tenancy, you need multiple forwarding tables (like VRFs), which are not available in OpenFlow 1.0, or you have to add the tenant ID to the existing forwarding table. Traditional switches would do it in two steps:

 Mark inbound packets with VLAN tags;

 Perform packet forwarding based on destination MAC address and VLAN tag.


Switches using the OpenFlow 1.0 forwarding model cannot perform more than one lookup during the packet forwarding process – they must match the input port and destination MAC address in a single flow rule, resulting in a flow table similar to this one:

Flow match                          Action
SrcPort = Port 2, DMAC = Web-1      Forward to port 1
SrcPort = Port 3, DMAC = Web-1      Forward to port 1
SrcPort = Port 4, DMAC = Web-1      Forward to port 1
SrcPort = Port 1, DMAC = Web-2      Forward to port 2
SrcPort = Port 3, DMAC = Web-2      Forward to port 2
SrcPort = Port 4, DMAC = Web-2      Forward to port 2
…

The number of TCAM entries needed to support multi-tenant layer-2 forwarding has exploded:

N_flows = Σ (N_MAC × N_Ports), summed across all tenants


STEP 3: ACCESS LISTS

Let’s assume we want to protect the web servers with an input (server-to-switch) port ACL, which would look similar to this one:

Flow match                          Action
TCP SRC = 80                        Permit
TCP SRC = 443                       Permit
TCP DST = 53 & IP DST = DNS         Permit
TCP DST = 25 & IP DST = Mail        Permit
TCP DST = 3306 & IP DST = MySql     Permit
Anything else                       Drop

By now you’ve probably realized what happens when you try to combine the input ACL with other forwarding rules. The OpenFlow controller has to generate a Cartesian product of all three requirements: the switch needs a flow entry for every possible combination of input port, ACL entry and destination MAC address.

N_flows = Σ (N_MAC × N_Ports × N_ACL)
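To get a feeling for the numbers, here is the flow-count arithmetic in a few lines of Python. The port, MAC, ACL and tenant counts are illustrative assumptions (a 48-port ToR switch, 10 tenant segments with 20 MAC addresses each, and the 6-line ACL above), not measurements from any particular switch:

# Flow-table explosion with a single OpenFlow 1.0 lookup table (illustrative numbers).
ports_per_switch = 48     # ToR switch edge ports (assumption)
macs_per_tenant  = 20     # MAC addresses per tenant segment (assumption)
acl_entries      = 6      # ACL lines per port (the example ACL above)
tenants          = 10     # tenant segments on the switch (assumption)

# Step 1: destination-MAC-only forwarding
step1 = tenants * macs_per_tenant
# Step 2: (input port, destination MAC) entries for multi-tenancy
step2 = tenants * macs_per_tenant * ports_per_switch
# Step 3: Cartesian product of input ports, ACL entries and destination MACs
step3 = tenants * macs_per_tenant * ports_per_switch * acl_entries

print(step1, step2, step3)   # 200 / 9,600 / 57,600 entries -- far beyond a ~4000-entry TCAM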


OPENFLOW 1.3 TO THE RESCUE

Is the situation really as hopeless as illustrated above? Of course not – smart people trying to implement real-life OpenFlow solutions quickly realized OpenFlow 1.0 works well only in PPT, lab tests, PoCs and glitzy demos, and started working on a solution. OpenFlow 1.1 (and later versions) have the concept of tables – independent lookup tables that can be chained in any way you wish (further complicating the life of hardware vendors). This is how you could implement our requirements with switches supporting OpenFlow 1.3:

 Table #1 – ACL and tenant classification table. This table would match input ports (for tenant classification) and ACL entries, drop the packets not matched by input ACLs, and redirect the forwarding logic to the correct per-tenant table.

 Tables #2 .. #n – per-tenant forwarding tables, matching destination MAC addresses and specifying output ports.

The first table could be further optimized in networks using the same (overly long) access list on numerous ports. That decision could also be made dynamically by the OpenFlow controller.

A typical switch would probably have to implement the first table with a TCAM. All the other tables could use the regular MAC forwarding logic (the MAC forwarding table is usually orders of magnitude bigger than TCAM). Scalability problem solved.

Summary: Buy switches and controllers that support OpenFlow 1.3.
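Using the same illustrative numbers as in the previous sketch, the multi-table approach splits the problem between the (small) TCAM and the (much larger) MAC table roughly like this:

# Same example, but with OpenFlow 1.3-style table chaining (illustrative numbers).
ports_per_switch = 48
macs_per_tenant  = 20
acl_entries      = 6
tenants          = 10

# Table #1 (TCAM): one classification/ACL entry per (input port, ACL line) pair
tcam_entries = ports_per_switch * acl_entries                 # 288

# Tables #2..#n (MAC table): one destination-MAC entry per tenant host
mac_table_entries = tenants * macs_per_tenant                 # 200

print(tcam_entries, mac_table_entries)
# 288 TCAM entries plus 200 MAC-table entries instead of ~57,600 TCAM entries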


BUT THERE ARE NO OPENFLOW 1.3-COMPLIANT SWITCHES ON THE MARKET

Not true anymore. NEC is shipping OpenFlow 1.3 on their ProgrammableFlow switches, as does HP on its 5900- and 12500-series switches.

CAN WE STILL USE OPENFLOW 1.0 SWITCHES?

Of course you can – either make sure the use case is small enough so the Cartesian product of your independent requirements fits into the existing TCAM, or figure out which vendors have table-like extensions to OpenFlow 1.0 (hint: NEC does, or their VTN wouldn’t work in real-life networks).


After you spend a few minutes researching the data sheets of existing OpenFlow-capable switches from major networking vendors it becomes painfully obvious that flow-based forwarding makes no sense on hardware switching platforms. Surprisingly, the virtual switches aren’t much better.

FLOW-BASED FORWARDING DOESN’T WORK WELL IN VIRTUAL SWITCHES

I hope it’s obvious to everyone by now that flow-based forwarding doesn’t work well in existing hardware. Switches designed for a large number of flow-like forwarding entries (NEC ProgrammableFlow switches, Enterasys data center switches and a few others) might be an exception, but even they can’t cope with the tremendous flow update rate required by reactive (flow-by-flow) flow setup ideas. One would expect virtual switches to fare better. That doesn’t seem to be the case.

A FEW DEFINITIONS FIRST

Flow-based forwarding is sometimes defined as forwarding of individual transport-layer sessions (sometimes also called microflows). Numerous failed technologies are pretty good proof that this approach doesn’t scale.

Other people define flow-based forwarding as anything that is not destination-address-only forwarding. I don’t really understand how this definition differs from an MPLS Forwarding Equivalence Class (FEC) and why we need a new confusing term.


MICROFLOW FORWARDING IN OPEN VSWITCH

Initial versions of Open vSwitch were a prime example of an ideal microflow-based forwarding architecture: the in-kernel forwarding module performed microflow forwarding and punted all unknown packets to the user-mode daemon. The user-mode daemon would then perform packet lookup (using OpenFlow forwarding entries or any other forwarding algorithm) and install a microflow entry for the newly discovered flow in the kernel module. Third parties (example: Midokura MidoNet) use the Open vSwitch kernel module in combination with their own user-mode agent to implement non-OpenFlow forwarding architectures.

If you’re old enough to remember the Catalyst 5000, you’re probably getting unpleasant flashbacks of NetFlow switching … but the problems we experienced with that solution must have been caused by poor hardware and an underperforming CPU, right? Well, it turns out virtual switches don’t fare much better.

Digging deep into the bowels of Open vSwitch reveals an interesting behavior: flow eviction. Once the kernel module hits the maximum number of microflows, it starts throwing out old flows. Makes perfect sense – after all, that’s how every caching system works – until you realize the default limit is 2500 microflows, which is barely good enough for a single web server and definitely orders of magnitude too low for a hypervisor hosting 50 or 100 virtual machines.
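To see why a 2500-entry microflow cache hurts, here is a toy cache simulation. The traffic model (uniformly random packets spread over 20,000 active flows) and the plain LRU eviction are my simplifying assumptions, not the actual Open vSwitch algorithm:

# Toy simulation: microflow cache misses when active flows outnumber cache slots.
import random
from collections import OrderedDict

CACHE_SIZE   = 2500      # old OVS default microflow limit
ACTIVE_FLOWS = 20000     # flows active on a busy hypervisor (assumption)
PACKETS      = 200000

cache, misses = OrderedDict(), 0
for _ in range(PACKETS):
    flow = random.randrange(ACTIVE_FLOWS)       # pick a random active flow
    if flow in cache:
        cache.move_to_end(flow)                 # cache hit, refresh LRU position
    else:
        misses += 1                             # miss: punt to the user-mode daemon
        cache[flow] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)           # evict the least recently used flow

print(f"miss rate: {misses / PACKETS:.0%}")     # stays high -- every miss is a punt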


WHY, OH WHY?

The very small microflow cache size doesn’t make any obvious sense. After all, web servers easily handle 10,000 sessions and some Linux-based load balancers handle an order of magnitude more sessions per server. While you can increase the default cache size, one’s bound to wonder what the reason for the dismally low default value is.

I wasn’t able to figure out what the underlying root cause is, but I suspect it has to do with per-flow accounting – flow counters have to be transferred from the kernel module to the user-mode daemon periodically. Copying hundreds of thousands of flow counters over a kernel-to-user socket at short intervals might result in “somewhat” noticeable CPU utilization.

HOW CAN YOU FIX IT?

Isn’t it obvious? You drop the whole notion of microflow-based forwarding and do things the traditional way. OVS moved in this direction with release 1.11, which implemented megaflows (coarser OpenFlow-like forwarding entries) in the kernel module, and moved flow eviction from the kernel to the user-mode OpenFlow agent (which makes perfect sense as kernel forwarding entries almost exactly match user-mode OpenFlow entries).

Not surprisingly, no other virtual switch uses microflow-based forwarding. VMware vSwitch, Cisco’s Nexus 1000V and IBM’s 5000V make forwarding decisions based on destination MAC addresses, Hyper-V and Contrail based on destination IP addresses, and even VMware NSX for vSphere uses a distributed vSwitch and an in-kernel layer-3 forwarding module.


After establishing the size of the problem, let’s move forward to the first scalability obstacle – controller-based packet forwarding. A review of how an existing network operating system (Cisco IOS) behaves might help you understand the challenges of large-scale OpenFlow implementations.

PROCESS, FAST AND CEF SWITCHING AND PACKET PUNTING

Process switching is the oldest, simplest and slowest packet forwarding mechanism in Cisco IOS. Packets received on an interface trigger an interrupt; the interrupt handler identifies the layer-3 protocol based on layer-2 packet headers (example: Ethertype in Ethernet packets) and queues the packets to (user mode) packet forwarding processes (IP Input and IPv6 Input processes in Cisco IOS).

Once the input queue of a packet forwarding process becomes non-empty, the operating system schedules it. When there are no higher-priority processes ready to be run, the operating system performs a context switch to the packet forwarding process. When the packet forwarding process wakes up, it reads the next entry from its input queue, performs destination address lookup and numerous other functions that might be configured on input and output interfaces (NAT, ACL ...), and sends the packet to the output interface queue.

Not surprisingly, this mechanism is exceedingly slow ... and Cisco IOS is not the only operating system struggling with that – just ask anyone who tried to run high-speed VPN tunnels implemented in Linux user mode processes on SOHO routers.


Interrupt switching (packet forwarding within the interrupt handler) is much faster as it doesn’t involve context switching and potential process preemption. There’s a gotcha, though – if you spend too much time in an interrupt handler, the device becomes non-responsive, starts adding unnecessary latency to forwarded packets, and eventually starts dropping packets due to receive queue overflows (you don’t believe me? Configure debug all on the console interface of a Cisco router).

There’s not much you can do to speed up ACLs (which have to be read sequentially) and NAT is usually not a big deal (assuming the programmers were smart enough to use hash tables). Destination address lookup might be a real problem, more so if you have to do it numerous times (example: the destination is a BGP route with a BGP next hop based on a static route with a next hop learnt from OSPF). Welcome to fast switching.

Fast switching is a reactive cache-based IP forwarding mechanism. The address lookup within the interrupt handler uses a cache of destinations to find the IP next hop, outgoing interface, and outbound layer-2 header. If the destination is not found in the fast switching cache, the packet is punted to the IP(v6) Input process, which eventually performs a full-blown destination address lookup (including ARP/ND resolution) and stores the results in the fast switching cache.

Fast switching worked great two decades ago (there were even hardware implementations of fast switching) ... until the bad guys started spraying the Internet with vulnerability scans. No caching code works well with miss rates approaching 100% (because every packet is sent to a different destination) and very high cache churn (because nobody designed the cache to have 100,000 or more entries). When faced with a simple host scanning activity, routers using fast switching in combination with a high number of IP routes (read: Internet core routers) experienced severe brownouts because most of the received packets had destination addresses that were not yet in the fast switching cache, and so the packets had to be punted to process switching. Welcome to CEF switching.


CEF switching (or Cisco Express Forwarding) is a proactive, deterministic IP forwarding mechanism. The routing table (RIB) as computed by routing protocols is copied into the forwarding table (FIB), where it’s combined with adjacency information (ARP or ND table) to form a deterministic lookup table. When a router uses CEF switching, there’s (almost) no need to punt packets sent to unknown destinations to the IP Input process; if a destination is not in the FIB, it does not exist.

There are still cases where CEF switching cannot do its job. For example, packets sent to IP addresses on directly connected interfaces cannot be sent to destination hosts until the router performs ARP/ND MAC address resolution; these packets have to be sent to the IP Input process. The directly connected prefixes are thus entered as glean adjacencies in the FIB, and as the router learns the MAC address of the target host (through an ARP or ND reply), it creates a dynamic host route in the FIB pointing to the adjacency entry for the newly-discovered directly-connected host.

Actually, you wouldn’t want to send too many packets to the IP Input process; it’s better to create the host route in the FIB (pointing to the bit bucket, /dev/null or something equivalent) even before the ARP/ND reply is received to ensure subsequent packets sent to the same destination are dropped, not punted – behavior nicely exploitable by an ND exhaustion attack.

It’s pretty obvious that the CEF table must stay current. For example, if the adjacency information is lost (due to ARP/ND aging), the packets sent to that destination are yet again punted to process switching. No wonder the router periodically refreshes ARP entries to ensure they never expire.
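A heavily simplified model might make the glean/drop behavior easier to visualize. This is a conceptual sketch of the data structures described above, not Cisco’s implementation; the addresses and adjacency names are made up:

# Conceptual CEF-like FIB: longest-prefix match plus adjacency lookup (illustrative only).
import ipaddress

# FIB: prefix -> adjacency; an adjacency is either complete rewrite info, "glean" or "drop"
fib = {
    ipaddress.ip_network("0.0.0.0/0"):     "adj-upstream",   # complete adjacency
    ipaddress.ip_network("192.0.2.0/24"):  "glean",          # connected subnet, no ARP entry yet
    ipaddress.ip_network("192.0.2.10/32"): "adj-host-10",    # host route created after ARP reply
    ipaddress.ip_network("192.0.2.66/32"): "drop",           # ARP in progress: drop, don't punt
}

def forward(dst):
    addr = ipaddress.ip_address(dst)
    prefix = max((p for p in fib if addr in p), key=lambda p: p.prefixlen)  # longest match
    adjacency = fib[prefix]
    if adjacency == "glean":
        return "punt to IP Input process (trigger ARP/ND)"
    if adjacency == "drop":
        return "drop in hardware"
    return f"rewrite + forward via {adjacency}"

for dst in ("8.8.8.8", "192.0.2.10", "192.0.2.20", "192.0.2.66"):
    print(dst, "->", forward(dst))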


Controller-based packet forwarding in an OpenFlow implementation is almost exactly like process switching in Cisco IOS. Here are the details:

CONTROLLER-BASED PACKET FORWARDING IN OPENFLOW NETWORKS

One of the attendees of the ProgrammableFlow webinar sent me an interesting observation: “Though there is a separate control plane and a separate data plane, it appears that there is crossover from one to the other. Consider the scenario when flow tables are not programmed and so the packets will be punted by the ingress switch to PFC. The PFC will then forward these packets to the egress switch so that the initial packets are not dropped. So in some sense we are seeing packets traversing the boundaries of the typical data plane and control plane and vice versa.”

He’s absolutely right, and if the above description reminds you of fast and process switching you’re spot on. There really is nothing new under the sun.

OpenFlow controllers use one of the following two approaches to switch programming (more details @ NetworkStatic):

 Proactive flow table setup, where the controller downloads flow entries into the switches based on user configuration (ex: ports, VLANs, subnets, ACLs) and network topology;

 Reactive flow table setup (or flow-driven forwarding), where the controller downloads flow entries into the switches based on the unknown traffic the OpenFlow switches forward to the controller (a conceptual sketch of this loop follows right after the list).
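The reactive approach is easy to recognize in code; it’s the classic learning-switch loop every OpenFlow tutorial starts with. The sketch below is deliberately framework-agnostic: the helper functions at the top are made-up stand-ins for the flood, flow-mod and packet-out primitives a real controller (POX, Ryu, Floodlight, ProgrammableFlow...) would provide.

# Conceptual reactive (flow-driven) L2 forwarding loop -- not a real controller API.
def flood(switch, packet, in_port):                 # hypothetical stand-in
    print(f"{switch}: flood packet received on port {in_port}")

def install_flow(switch, match, out_port):          # hypothetical stand-in for a flow-mod
    print(f"{switch}: install flow {match} -> output:{out_port}")

def send_packet_out(switch, packet, out_port):      # hypothetical stand-in for a packet-out
    print(f"{switch}: packet-out to port {out_port}")

mac_table = {}   # (switch, MAC) -> port, learned from punted packets

def handle_packet_in(switch, in_port, packet):
    """Called for every packet the switch could not match in its flow table."""
    mac_table[(switch, packet["src_mac"])] = in_port           # learn the source MAC

    out_port = mac_table.get((switch, packet["dst_mac"]))
    if out_port is None:
        flood(switch, packet, in_port)                         # unknown destination: flood
        return

    # Install a flow entry so subsequent packets stay in the data plane,
    # then forward the packet that triggered the punt.
    install_flow(switch, {"dst_mac": packet["dst_mac"]}, out_port)
    send_packet_out(switch, packet, out_port)

# Every new MAC address (or evicted flow entry) goes through this path --
# which is exactly the packet punting problem described in this chapter.
handle_packet_in("sw1", 1, {"src_mac": "aa:aa", "dst_mac": "bb:bb"})   # flood
handle_packet_in("sw1", 2, {"src_mac": "bb:bb", "dst_mac": "aa:aa"})   # install + forward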

Even though I write about flow tables, don’t confuse them with the per-flow forwarding that Doug Gourlay loves almost as much as I do. A flow entry might match solely on destination MAC address, making flow tables equivalent to MAC address tables, or it might match the destination IP address with the longest IP prefix in the flow table, making the flow table equivalent to a routing table or FIB.

The controller must know the topology of the network and all the endpoint addresses (MAC addresses, IP addresses or IP subnets) for the proactive (predictive?) flow setup to work. If you had an OpenFlow controller emulating an OSPF or BGP router, it would be easy to use proactive flow setup; after all, the IP routes never change based on the application traffic observed by the switches.

Intra-subnet L3 forwarding is already a different beast. One could declare ARP/ND to be an authoritative control-plane protocol (please don’t get me started on the shortcomings of ARP and whether ES-IS would be a better solution) in which case you could use proactive flow setup to create host routes toward IP hosts (using an approach similar to Mobile ARP – what did I just say about nothing being really new?).

However, most vendors’ marketing departments (with a few notable exceptions) think their gear needs to support every bridging-abusing stupidity ever invented, from load balancing schemes that work best with hubs to floating IP or MAC addresses used to implement high-availability solutions. End result: the network has to support dynamic MAC learning, which makes OpenFlow-based networks reactive – nobody can predict when and where a new MAC address will appear (and it’s not guaranteed that the first packet sent from the new MAC address will be an ARP packet), so the switches have to send user traffic with unknown source or destination MAC addresses to the controller, and we’re back to packet punting.


Some bridges (lovingly called layer-2 switches) don’t punt packets with unknown MAC addresses to the CPU, but perform dynamic MAC address learning and unknown unicast flooding in hardware... but that’s not how OpenFlow is supposed to work.

Within a single device the software punts packets from hardware (or interrupt) switching to CPU/process switching; in a controller-based network the switches punt packets to the controller. Plus ça change, plus c'est la même chose.


Packets punted to the controller from the data plane of an OpenFlow switch represent a significant burden on the switch CPU. A large number of punted packets (triggered, for example, by an address scan) can easily result in a denial-of-service attack. It’s time to reinvent another wheel: control-plane policing (CoPP).

CONTROL-PLANE POLICING IN OPENFLOW NETWORKS

The Controller-Based Packet Forwarding in OpenFlow Networks post generated the obvious question: “does that mean we need some kind of Control-Plane Protection (CoPP) in the OpenFlow controller?” Of course it does, but things aren’t as simple as that.

The weakest link in today’s OpenFlow implementations (like NEC’s ProgrammableFlow) is not the controller, but the dismal CPU used in the hardware switches. The controller could handle millions of packets per second (that’s the flow setup rate claimed by Floodlight developers), while the switches usually burn out at thousands of flow setups per second. The CoPP function thus has to be implemented in the OpenFlow switches (like it’s implemented in linecard hardware in traditional switches), and that’s where the problems start – OpenFlow didn’t have usable rate-limiting functionality until version 1.3, which added meters.

OpenFlow meters are a really cool concept – they have multiple bands, and you can apply either DSCP remarking or packet dropping at each band – that would allow an OpenFlow controller to closely mimic the CoPP functionality and apply different rate limits to different types of control or punted traffic.
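To make the meter idea more concrete, here is a plain-Python illustration of what a CoPP-style OpenFlow 1.3 meter conceptually contains (two bands: remark first, drop above a hard limit) and how a punting flow entry would reference it. The rates and the dictionary layout are my own illustrative choices, not the syntax of any particular controller or switch:

# Conceptual OpenFlow 1.3 meter protecting the controller (CoPP-like), expressed as plain data.
punt_meter = {
    "meter_id": 1,
    "unit": "packets_per_second",
    "bands": [
        {"rate": 500,  "action": "dscp_remark", "prec_level": 1},   # soft limit: deprioritize
        {"rate": 1000, "action": "drop"},                           # hard limit: drop excess punts
    ],
}

# Flow entries that punt traffic to the controller reference the meter, so
# packet-in traffic (ARP, unknown MACs, ...) cannot exceed roughly 1000 pps.
punt_flow = {
    "match":        {"eth_type": 0x0806},                           # example: ARP
    "instructions": [{"meter": punt_meter["meter_id"]},
                     {"output": "CONTROLLER"}],
}

print(punt_meter, punt_flow, sep="\n")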


Unfortunately, only a few hardware switches available on the market support OpenFlow 1.3 yet, and some of them might not support meters (or meters on flows sent to the controller). In the meantime, proprietary extensions galore – NEC used one to limit unicast flooding in its ProgrammableFlow switches.


Time to move forward to another scalability roadblock: the number of flows you can install in a hardware device per second. This limitation has nothing to do with OpenFlow; the choke point is the communication path between the switch CPU and the forwarding hardware. Traditional switches and routers had the same problems and solved them with Prefix Independent Convergence.

PREFIX-INDEPENDENT CONVERGENCE (PIC): FIXING THE FIB BOTTLENECK

Did you rush to try OSPF Loop Free Alternate (LFA) on a Cisco 7200 after reading my LFA blog post ... and discover, to your disappointment, that it only works on the Cisco 7600? The reason is simple: while LFA does add feasible-successor-like behavior to OSPF, its primary mission is to improve RIB-to-FIB convergence time.

If you want to know more details, I would strongly suggest you browse through the IP Fast Reroute Applicability presentation Pierre Francois gave @ EuroNOG 2011. To summarize what he told us:

 It’s relatively easy to fine-tune OSPF or IS-IS and get convergence times in tens of milliseconds. SPF runs reasonably fast on modern processors, more so with incremental SPF optimizations.

 A platform using software-based switching can use the SPF results immediately (thus there’s no real need for LFA on a Cisco 7200).

 The true bottleneck is the process of updating distributed forwarding tables (FIBs) from the IP routing table (RIB) on platforms that use hardware switching. That operation can take a relatively long time if you have to update many prefixes.


The generic optimization of the RIB-to-FIB update process is known as Prefix-Independent Convergence (PIC): if the routing protocols can pre-compute alternate paths, a suitably designed FIB can use that information to cache alternate next hops. Updating such a FIB no longer involves numerous updates to individual prefixes; you only have to change the next-hop reachability information.

PIC was first implemented for BGP (you can find more details, including interesting discussions of FIB architectures, in another presentation Pierre Francois had @ EuroNOG), which usually carries hundreds of thousands of prefixes that point to a few tens of different next hops. It seems some Service Providers carry way too many routes in OSPF or IS-IS, so it made sense to implement LFA for those routing protocols as well.

In its simplest form, BGP PIC goes a bit beyond existing EBGP/IBGP multipathing and copies backup path information into the RIB and FIB. Distributing alternate paths throughout the network requires numerous additional tweaks, from modified BGP path propagation rules to modified BGP route reflector behavior.
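To make the “prefix-independent” part concrete, here’s a minimal sketch (plain Python, illustrative prefixes and next hops) of a FIB in which prefixes point to a shared path-list object instead of directly to a next hop. A next-hop failure is then repaired with a single path-list update regardless of how many prefixes use it:

```python
# Hypothetical sketch of a PIC-style FIB: prefixes reference a shared path list,
# so a next-hop failure is repaired with one update instead of one per prefix.

class PathList:
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary

    def fail_primary(self):
        # Single O(1) operation; none of the prefixes below need to be touched.
        self.active = self.backup

# One shared path list used by tens of thousands of prefixes
pathlist = PathList(primary="10.0.0.1", backup="10.0.0.2")
fib = {f"203.0.{i // 256}.{i % 256}/32": pathlist for i in range(65_536)}

def lookup(prefix):
    return fib[prefix].active

print(lookup("203.0.1.1/32"))   # 10.0.0.1
pathlist.fail_primary()         # core link toward 10.0.0.1 goes down
print(lookup("203.0.1.1/32"))   # 10.0.0.2 -- no per-prefix FIB rewrite needed
```

Without the shared path list, the same failure would require one FIB update per prefix, which is exactly the bottleneck PIC removes.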


Adding support for OpenFlow on an existing switch doesn’t change the underlying hardware. OpenFlow agent on a hardware device has to deal with the same challenges as the traditional control-plane software.

FIB UPDATE CHALLENGES IN OPENFLOW NETWORKS

Last week I described the problems high-end service provider routers (or layer-3 switches if you prefer that terminology) face when they have to update a large number of entries in their forwarding tables (FIBs). Will these problems go away when we introduce OpenFlow into our networks? Absolutely not: OpenFlow is just another mechanism to download forwarding entries (this time from an external controller), not a laws-of-physics-changing miracle.

NEC, the only company I’m aware of that has production-grade OpenFlow deployments and is willing to talk about them, admitted as much in their Networking Tech Field Day 2 presentation (watch the ProgrammableFlow Architecture and Use Cases video around 12:00). Their particular controller/switch combination can set up 600-1000 flows per switch per second (which is still way better than what researchers using HP switches found and documented in the DevoFlow paper – they measured roughly 275 flow setups per second).

Now imagine the core of a simple L2 network built from tens of switches, connecting hundreds of servers and thousands of VMs. Using traditional L2 forwarding techniques, each switch would have to know the MAC address of each VM ... and the core switches would have to update thousands of entries after a link failure, resulting in multi-second convergence times. Obviously OpenFlow-based networks need prefix-independent convergence (PIC) as badly as anyone else.
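A quick back-of-the-envelope calculation (illustrative numbers, assuming the per-switch flow setup rates quoted above) shows how fast the flow-mod budget evaporates after a core link failure:

```python
# Rough convergence estimate after a core link failure, using illustrative numbers.
vms = 4000                  # per-VM MAC/flow entries each core switch has to repair
flow_mods_per_second = 800  # optimistic per-switch setup rate (the NEC range above)

print(f"~{vms / flow_mods_per_second:.1f} seconds to rewrite per-VM entries")  # ~5.0 s

# With path-based (FEC) forwarding, the core only tracks paths between switches:
paths = 200                 # tens of switches -> a few hundred inter-switch paths
print(f"~{paths / flow_mods_per_second:.2f} seconds with path-based core entries")
```

Five seconds of convergence versus a fraction of a second: that gap is the whole argument for the forwarding state abstraction discussed in the rest of this chapter.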


Figure 5-1: Core link failure in an OpenFlow network

OpenFlow 1.0 could use flow-matching priorities to implement primary/backup forwarding entries, and OpenFlow 1.1 provides a fast failover mechanism in its group tables that could be used for prefix-independent convergence, but it’s questionable how far you can get with existing hardware devices, and PIC doesn’t work in all topologies anyway.

Just in case you’re wondering how existing L2 networks work at all: the data plane in high-speed switches performs dynamic MAC learning and populates the forwarding table in hardware; the communication between the control and the data plane is limited to the bare minimum (which is another reason why implementing OpenFlow agents on existing switches is like attaching a jetpack to a camel).
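The fast-failover groups mentioned above are conceptually simple: every bucket watches a port, and the first bucket whose watched port is still alive gets used, without the controller ever being involved. The sketch below models that behavior in plain Python (port numbers and the group ID are made up for illustration; no controller framework or vendor implementation is implied):

```python
# Hypothetical model of an OpenFlow 1.1+ fast-failover group: the switch picks
# the first bucket whose watched port is up, without asking the controller.

port_state = {1: "up", 2: "up"}   # live port status maintained by the switch

fast_failover_group = {
    "group_id": 10,
    "type": "fast_failover",
    "buckets": [
        {"watch_port": 1, "actions": ["output:1"]},   # primary uplink
        {"watch_port": 2, "actions": ["output:2"]},   # pre-installed backup
    ],
}

def select_bucket(group):
    for bucket in group["buckets"]:
        if port_state[bucket["watch_port"]] == "up":
            return bucket["actions"]
    return ["drop"]

print(select_bucket(fast_failover_group))   # ['output:1']
port_state[1] = "down"                      # primary uplink fails
print(select_bucket(fast_failover_group))   # ['output:2'] -- no flow-mods needed
```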


Is there another option? Sure – it’s called forwarding state abstraction or, for those more familiar with MPLS terminology, Forwarding Equivalence Classes (FEC). While you might have thousands of servers or VMs in your network, you have only hundreds of possible paths between switches. The trick every single OpenFlow controller vendor has to use is to replace endpoint-based forwarding entries in the core switches with path-indicating forwarding entries. Welcome back to virtual circuits and the BGP-free MPLS core. It’s amazing how the old tricks keep resurfacing in new disguises every few years.
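A minimal sketch of that trick (plain Python, invented MAC addresses and path labels): the edge switch classifies traffic toward an egress switch into a path label, and the core switches forward on the label alone, so the core table size is proportional to the number of paths, not the number of VMs:

```python
# Hypothetical edge/core forwarding sketch: core state scales with paths, not endpoints.

# Edge switch: map each destination VM MAC to the path (FEC) toward its egress switch.
edge_classifier = {
    "00:00:00:00:10:01": "path-to-switch-B",
    "00:00:00:00:10:02": "path-to-switch-B",
    "00:00:00:00:20:01": "path-to-switch-C",
}

# Core switch: one entry per path, regardless of how many VMs sit behind it.
core_forwarding = {
    "path-to-switch-B": "output:48",
    "path-to-switch-C": "output:49",
}

def forward(dst_mac):
    label = edge_classifier[dst_mac]   # classification happens once, at the edge
    return core_forwarding[label]      # the core looks at the label only

print(forward("00:00:00:00:10:02"))    # output:48
print(len(core_forwarding), "core entries for", len(edge_classifier), "endpoints")
```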


Forwarding state abstraction (known as Forwarding Equivalence Classes in MPLS lingo) is the only way toward scalable OpenFlow fabrics. The following blog post (written in February 2012) has some of the details:

FORWARDING STATE ABSTRACTION WITH TUNNELING AND LABELING

Yesterday I described how the limited flow setup rates offered by most commercially available switches force the developers of production-grade OpenFlow controllers to drop the microflow ideas and focus on state abstraction (people living in a dreamland usually go in a totally opposite direction). Before going into OpenFlow-specific details, let’s review the existing forwarding state abstraction technologies.

A MOSTLY THEORETICAL DETOUR

Most forwarding state abstraction solutions that I’m aware of use a variant of the Forwarding Equivalence Class (FEC) concept from MPLS:

All the traffic that expects the same forwarding behavior gets the same label;



The intermediate nodes no longer have to inspect the individual packet/frame headers; they forward the traffic solely based on the FEC indicated by the label.

The grouping/labeling operation thus greatly reduces the forwarding state in the core nodes (you can call them P-routers, backbone bridges, or whatever other terminology you prefer) and improves the core network convergence due to the significantly reduced number of forwarding entries in the core nodes.

Figure 5-2: MPLS forwarding diagram from the Enterprise MPLS/VPN Deployment webinar

The core network convergence is improved due to the reduced amount of state, not due to the pre-computed alternate paths that Prefix-Independent Convergence or MPLS Fast Reroute use.

FROM THEORY TO PRACTICE

There are two well-known techniques you can use to transport traffic grouped in a FEC across the network core: tunneling and virtual circuits (or Label Switched Paths if you want to use non-ITU terminology).


When you use tunneling, the FEC is the tunnel endpoint – all traffic going to the same tunnel egress node uses the same tunnel destination address. All sorts of tunneling mechanisms have been proposed to scale layer-2 broadcast domains and virtualized networks (IP-based layer-3 networks scale way better by design):

Provider Backbone Bridges (PBB – 802.1ah), Shortest Path Bridging-MAC (SPBM – 802.1aq) and vCDNI use MAC-in-MAC tunneling – the destination MAC address used to forward user traffic across the network core is the MAC address of the egress bridge or of the destination physical server (in the vCDNI case).

Figure 5-3: SPBM forwarding diagram from the Data Center 3.0 for Networking Engineers webinar



VXLAN, NVGRE and GRE (used by Open vSwitch) use MAC-over-IP tunneling, which scales way better than MAC-over-MAC tunneling because the core switches can do another layer of state abstraction (subnet-based forwarding and IP prefix aggregation).


Figure 5-4: Typical VXLAN architecture from the Introduction to Virtual Networking webinar



TRILL is closer to VXLAN/NVGRE than to SPB/vCDNI as it uses full L3 tunneling between TRILL endpoints with L3 forwarding inside RBridges and L2 forwarding between RBridges.

Figure 5-5: TRILL forwarding diagram from the Data Center 3.0 for Networking Engineers webinar


With tagging or labeling, a short tag is attached in front of the data (ATM VPI/VCI, MPLS label stack on point-to-point links) or somewhere in the header (VLAN tags) instead of encapsulating the user’s data into a full L2/L3 header. The core network devices perform packet/frame forwarding based exclusively on the tags. That’s how SPBV, MPLS and ATM work.

Figure 5-6: MPLS-over-Ethernet frame format from the Enterprise MPLS/VPN Deployment webinar

MPLS-over-Ethernet, commonly used in today’s high-speed networks, is an abomination, as it uses both L2 tunneling between adjacent LSRs and labeling ... but that’s what you get when you have to reuse existing hardware to support new technologies.


A few months after I wrote the “Forwarding State Abstraction” blog post, Martin Casado and his team presented an article with similar ideas at the HotSDN conference. Here’s my summary of that article (written in August 2012):

EDGE AND CORE OPENFLOW (AND WHY MPLS IS NOT NAT)

More than a year ago, I explained why end-to-end flow-based forwarding doesn’t scale (and Doug Gourlay did the same using way more colorful language) and what the real-life limitations are. Not surprisingly, the gurus that started the whole OpenFlow movement came to the same conclusions and presented them at the HotSDN conference in August 2012 ... but even that hasn’t stopped some people from evangelizing the second coming.

THE PROBLEM

Contrary to what some pundits claim, flow-based forwarding will never scale. If you’ve been around long enough to experience the ATM-to-the-desktop failure, the Multi-Layer Switching (MLS) kludges, the demise of end-to-end X.25, or the cost of traditional circuit-switched telephony, you know what I’m talking about. If not, supposedly it’s best to learn from your own mistakes – be my guest.

Before someone starts the Moore’s Law incantations: software-based forwarding will always be more expensive than predefined hardware-based forwarding. Yes, you can push tens of gigabits through a highly optimized multi-core Intel server. You can also push 1.2 Tbps through a Broadcom chipset at a comparable price. The ratios haven’t changed much in the last decades, and I don’t expect them to change in the near future.

SCALABLE ARCHITECTURES

The scalability challenges of flow-based forwarding have been well understood (at least within the IETF; the ITU is living on a different planet) for decades. That’s why we have destination-only forwarding, variable-length subnet masks and summarization, and Diffserv (with a limited number of traffic classes) instead of Intserv (with per-flow QoS).

The limitations of destination-only hop-by-hop forwarding have also been well understood for at least two decades and resulted in the MPLS architecture and various MPLS-based applications (including MPLS Traffic Engineering). There’s a huge difference between the MPLS TE forwarding mechanism (which is the right tool for the job) and the distributed MPLS TE control plane (which sucks big time). Traffic engineering is ultimately an NP-complete knapsack problem best solved with centralized end-to-end visibility.

The MPLS architecture solves the forwarding rigidity problems while maintaining core network scalability by recognizing that while each flow might be special, numerous flows share the same forwarding behavior. Edge MPLS routers (edge LSRs) thus sort the incoming packets into forwarding equivalence classes (FEC) and use a different Label Switched Path (LSP) across the network for each of the forwarding classes.


Please note that this is a gross oversimplification. I’m trying to explain the fundamentals and (following the great example of physicists) ignore all the details ... oops, take the ideal case.

The simplest classification implemented in all MPLS-capable devices today is destination prefix-based classification (equivalent to traditional IP forwarding), but there’s nothing in the MPLS architecture that would prevent you from using N-tuples to classify the traffic based on source addresses, port numbers, or any other packet attribute (yet again, ignoring the reality of having to use PBR with the infinitely disgusting route-map CLI to achieve that).

MPLS IS JUST A TOOL

Always keep in mind that every single network technology is a tool, not a solution (some of them might be solutions looking for a problem, but that’s another story), and some tools are more useful in some scenarios than others ... which still doesn’t make them good or bad, but applicable or inapplicable.

Also, after more than a decade of tinkering, the vendor MPLS implementations leave a lot to be desired. If you hate a particular vendor’s CLI or implementation kludges, blame them, not the technology.


EDGE AND CORE OPENFLOW

After this short MPLS digression, let’s come back to the headline topic. Large-scale OpenFlow-based solutions face two significant challenges:

It’s hard to build resilient networks with a centralized control plane and unreliable transport between the controller and the controlled devices (this problem was well known in the days of Frame Relay and ATM);



You must introduce layers of abstraction in order to scale the network.

Martin Casado, Teemu Koponen, Scott Shenker and Amin Tootoonchian addressed the second challenge in their Fabric: A Retrospective on Evolving SDN paper, where they propose two layers in an SDN architectural framework:

Edge switches, which classify the packets, perform network services, and send the packets across core fabric toward the egress edge switch;



Core fabric, which provides end-to-end transport.

Not surprisingly, they’re also proposing to use MPLS labels as the fabric forwarding mechanism.

WHERE’S THE BEEF?

The fundamental difference between the typical MPLS networks we have today and the SDN Fabric proposed by Martin Casado et al. is the edge switch control/management plane: FEC classification is downloaded into the edge switches through OpenFlow (or some similar mechanism).


Existing MPLS implementations or protocols have no equivalent mechanism, and a mechanism for a consistent implementation of a distributed network edge policy would be highly welcome (all of my enterprise OpenFlow use cases fall into this category).

FINALLY, IS MPLS NAT?

Now that we’ve covered MPLS fundamentals, I have to mention another pet peeve of mine: let’s see why it’s ridiculous to compare MPLS to NAT.

As explained above, MPLS edge routers classify ingress packets into FECs and attach a label signifying the desired treatment to each packet. The original packet is not changed in any way; any intermediate node can get the raw packet content if needed.

NAT, on the other hand, always changes the packet content (at least the layer-3 addresses, sometimes also layer-4 port numbers), or it wouldn’t be NAT. NAT breaks transparent end-to-end connectivity, MPLS doesn’t.

MPLS is similar to lossless compression (ZIP), NAT is similar to lossy compression (JPEG). Do I need to say more?


Using Forwarding Equivalence Classes (FECs) and path-based forwarding in an OpenFlow network results in another simplification: core switches don’t have to support the same rich functionality as the edge switches.

EDGE PROTOCOL INDEPENDENCE: ANOTHER BENEFIT OF EDGE-AND-CORE LAYERING

I asked Martin Casado to check whether I correctly described his HotSDN’12 paper in my Edge and Core OpenFlow post, and he replied with another interesting observation:

“The (somewhat nuanced) issue I would raise is that [...] decoupling [also] allows evolving the edge and core separately. Today, changing the edge addressing scheme requires a wholesale upgrade to the core.”

The 6PE architecture (IPv6 on the edge, MPLS in the core) is a perfect example of this concept.

WHY DOES IT MATTER?

Traditional scalable network designs always have at least two layers: the access or aggregation layer, where most of the network services are performed, and the core layer, which provides high-speed transport across a stable network core.

In IP-only networks, the core and access routers (aka layer-3 switches) share the same forwarding mechanism (ignoring the option of having default routing in the access layer); if you want to introduce a new protocol (example: IPv6) you have to deploy it on every single router throughout the network, including all core routers.

On the other hand, you can introduce IPv6, IPX or AppleTalk (not really), or anything else in an MPLS network without upgrading the core routers. The core routers continue to provide a single function: optimal transport based on MPLS paths signaled by the edge routers (either through LDP, MPLS-TE, MPLS-TP or more creative approaches, including NETCONF-configured static MPLS labels).

The same ideas apply to OpenFlow-configured networks. The edge devices have to be smart and support a rich set of flow matching and manipulation functionality; the core (fabric) devices only have to match on simple packet tags (VLAN tags, MAC addresses with PBB encapsulation, MPLS labels ...) and provide fast packet forwarding.

IS THIS AN IVORY TOWER DREAM?

Apart from MPLS, there are several real-life SDN implementations of this concept:

Nicira’s NVP is providing virtual networking functionality in OpenFlow-controlled hypervisor switches that use simple IP transport (with STT or GRE encapsulation) across the network core;



Microsoft’s Hyper-V Network Virtualization uses a similar architecture with PowerShell instead of OpenFlow/OVSDB as the hypervisor configuration API;



NEC’s ProgrammableFlow solution uses PF5420 (with 160K OpenFlow entries) at the edge and PF5820 (with 750 full OpenFlow entries and 80K MAC entries) at the core.

Before you mention (multicast-based) VXLAN in the comments: I fail to see anything software-defined in a technology that uses flooding to learn dynamic VM-MAC-to-VTEP-IP mappings.


The idea of edge and core OpenFlow makes perfect sense, but OpenFlow 1.0 doesn’t support MPLS. Could we use something else to make it work? The following blog post was written in February 2012; in summer 2014 I inserted a few comments to illustrate how we got nowhere in more than two years.

VIRTUAL CIRCUITS IN OPENFLOW 1.0 WORLD

Two days ago I described how you can use tunneling or labeling to reduce the forwarding state in the network core (which you have to do if you want to have reasonably fast convergence with currently available OpenFlow-enabled switches). Now let’s see what you can do in the very limited world of OpenFlow 1.0 (which is what most OpenFlow-enabled switches shipping in summer 2014 support).

OPENFLOW 1.0 DOES NOT SUPPORT TUNNELING OF ANY SORT

Open vSwitch (an OpenFlow-capable soft switch running on Linux/Xen/KVM) can use GRE tunnels to exchange MAC frames between hypervisor hosts across an IP backbone, but cannot use OpenFlow to provision those tunnels – it uses the Open vSwitch Database (OVSDB) to get its configuration information (including GRE tunnel definitions).

After the GRE tunnels have been created, they appear as regular interfaces within the Open vSwitch; an OpenFlow controller can use them in flow entries to push user packets across GRE tunnels to other hypervisor hosts.


Tunneling support within existing OpenFlow-enabled data center switches is virtually non-existent (Juniper’s MX routers with the OpenFlow add-on might be an exception), primarily due to hardware constraints. We will probably see VXLAN/NVGRE/GRE implementations in data center switches in the next few months, but I expect most of those implementations to be software-based and thus useless for anything but a proof-of-concept (August 2014: no major data center switching vendor supports OpenFlow over any tunneling technology).

Cisco already has a VXLAN-capable chipset in the M-series linecards; believers in merchant silicon will have to wait for the next-generation chipsets (August 2014: Broadcom’s and Intel’s chipsets support VXLAN, but so far no vendor has shipped a VXLAN termination that would work with OpenFlow).

OPENFLOW 1.0 HAS LIMITED LABELING FUNCTIONALITY

MPLS support was added to OpenFlow in release 1.1, and while MPLS-capable hardware devices could use MPLS labeling with OpenFlow, there aren’t many devices that support both MPLS and OpenFlow today (yet again, talk to Juniper). Forget MPLS for the moment.

VLAN stacking was also introduced in OpenFlow 1.1. While it would be a convenient labeling mechanism (similar to SPBV, but with a different control plane), many data center switches don’t support Q-in-Q (802.1ad). No VLAN stacking today.

The only standard labeling mechanism left to OpenFlow-enabled switches is thus VLAN tagging (OpenFlow 1.0 supports VLAN tagging, VLAN translation and tag stripping). You could use VLAN tags to build virtual circuits across the network core (similar to what MPLS labels do) and use the source/destination MAC combination at the egress node to recreate the original VLAN tag, but the solution is messy, hard to troubleshoot, and immense fun to audit.

But wait, it gets worse.

THE REALITY

I had the virtual circuits discussion with multiple vendors during the OpenFlow Symposium and Networking Tech Field Day, and we always came to the same conclusions:

Forwarding state abstraction is mandatory;



OpenFlow 1.0 has very limited functionality;



Standard tagging/tunneling mechanisms are almost useless due to hardware/OpenFlow limitations (see above);



Everyone uses their own secret awesomesauce to solve the problem ... often with proprietary OpenFlow extensions.

Someone was also kind enough to give me a hint that solved the secret awesomesauce riddle: “We can use any field in the frame header in any way we like.”

Looking at the OpenFlow 1.0 specs (assuming no proprietary extensions are used), you can rewrite the source and destination MAC addresses to indicate whatever you wish – you have 96 bits to work with. Assuming the hardware devices support wildcard matches on MAC addresses (either by supporting OpenFlow 1.1 or a proprietary extension to OpenFlow 1.0), you could use the 48 bits of the destination MAC address to indicate the egress node, egress port, and egress MAC address.
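Here’s a rough sketch of what such destination-MAC reuse could look like (plain Python; the bit layout, table contents and addresses are entirely made up to illustrate the idea and are not taken from any vendor’s implementation):

```python
# Hypothetical encoding of forwarding instructions into the 48 bits of the
# destination MAC address: 16 bits of egress node, 16 bits of egress port,
# 16 bits of host index resolved back to the real MAC at the egress node.

def encode_dmac(node, port, host_index):
    value = (node << 32) | (port << 16) | host_index
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

def decode_dmac(dmac):
    value = int(dmac.replace(":", ""), 16)
    return (value >> 32) & 0xFFFF, (value >> 16) & 0xFFFF, value & 0xFFFF

# The egress node keeps a local table to restore the original destination MAC.
egress_host_table = {7: "52:54:00:ab:cd:07"}

fake_dmac = encode_dmac(node=3, port=12, host_index=7)
print(fake_dmac)                              # 00:03:00:0c:00:07
node, port, host = decode_dmac(fake_dmac)
print(node, port, egress_host_table[host])    # 3 12 52:54:00:ab:cd:07
```

The sketch also makes the troubleshooting problem obvious: a packet capture in the core would show destination MAC addresses that mean nothing to anyone without the controller’s encoding scheme.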


I might have doubts about the VLAN translation mechanism described in the previous paragraph (and I am positive many security-focused engineers will have doubts), but the reuse-the-header-fields approach is even more interesting to support. How can you troubleshoot a network if you never know what the source/destination MAC addresses really mean?

SUMMARY

Before buying an OpenFlow-based data center network, figure out what the vendors are doing (they will probably ask you to sign an NDA, which is fine), including:

What are the mechanisms used to reduce forwarding state in the OpenFlow-based network core?



What’s the actual packet format used in the network core (or: how are the fields in the packet header really used?)



Will you be able to use standard network analysis tools to troubleshoot the network?



Which version of OpenFlow are they using?



Which proprietary extensions are they using (or not using)?



Which switch/controller combinations are tested and fully supported?


Let’s conclude the “forwarding scalability” part of this chapter with a slightly irrelevant detour: is MPLS tunneling?

MPLS IS NOT TUNNELING

Greg (@etherealmind) Ferro started an interesting discussion on Google+, claiming MPLS is just tunneling and a duct tape like NAT. I would be the first one to admit MPLS has its complexities and shortcomings, but calling it a tunnel just confuses the innocents. MPLS is not tunneling, it’s a virtual-circuits-based technology, and the difference between the two is a major one.

You can talk about tunneling when a protocol that should be lower in the protocol stack gets encapsulated in a protocol that you’d usually find above or next to it. MAC-in-IP, IPv6-in-IPv4, IP-over-GRE-over-IP, MAC-over-VPLS-over-MPLS-over-GRE-over-IPsec-over-IP ... these are tunnels. IP-over-MPLS-over-PPP/Ethernet is not tunneling, just like IP-over-LLC1-over-TokenRing or IP-over-X.25-over-LAPD wasn’t.

It is true, however, that MPLS uses virtual circuits, but they are not identical to tunnels. Just because all packets between two endpoints follow the same path and the switches in the middle don’t inspect their IP headers doesn’t mean you’re using a tunneling technology.

One-label MPLS is (almost) functionally equivalent to two well-known virtual circuit technologies: ATM or Frame Relay (that was also its first use case). However, MPLS-based networks scale better than those using ATM or Frame Relay because of two major improvements:

Automatic setup of virtual circuits based on network topology (core IP routing information), both between the core switches and between the core (P-routers) and edge (PE-routers) devices. Unless configured otherwise, the IP routing protocol performs topology autodiscovery and LDP establishes a full mesh of virtual circuits across the core.

VC merge: virtual circuits from multiple ingress points to the same egress point can merge within the network. VC merge significantly reduces the overall number of VCs (and the amount of state the core switches have to keep) in fully meshed networks.

It’s interesting to note that the ITU wants to cripple MPLS to the point of being equivalent to ATM/Frame Relay. MPLS-TP introduces an out-of-band management network and management-plane-based virtual circuit establishment.

DOES IT MATTER?

It might seem like I’m splitting hairs just for the fun of it, but there’s a significant scalability difference between virtual circuits and tunnels: devices using tunnels appear as hosts to the underlying network and require no in-network state, while solutions using virtual circuits (including MPLS) require per-VC state entries (MPLS: inbound-to-outbound label mapping in the LFIB) on every forwarding device in the path. Even more, end-to-end virtual circuits (like MPLS TE) require state maintenance (provided by periodic RSVP signaling in MPLS TE) involving every single switch in the VC path.

You can find scalability differences even within the MPLS world: MPLS/VPN-over-mGRE (tunneling) scales better than pure label-based MPLS/VPN (virtual circuits) because MPLS/VPN-over-mGRE relies on IP transport and not on end-to-end LSPs between PE-routers. You can summarize loopback addresses if you use MPLS/VPN-over-mGRE; doing the same in end-to-end-LSP-based MPLS/VPN networks breaks them. L2TPv3 scales better than AToM for the same reason.


All VC-based solutions require a signaling protocol between the end devices and the core switches (or an out-of-band layer-8+ communication and management-plane provisioning). The two common protocols used in MPLS networks are LDP (for IP routing-based MPLS) and RSVP (for traffic engineering). Secure and scalable inter-domain signaling protocols are rare; VC-based solutions are thus usually limited to a single management domain (state explosion is another problem that limits the size of a VC-based network).

The only global networks using on-demand virtual circuits were the telephone system and X.25; one of them already died because of its high per-bit costs, and the other one is surviving primarily because we’re replacing virtual circuits (TDM voice calls) with tunnels (VoIP).

TANGENTIAL AFTERTHOUGHTS

Don’t be sloppy with your terminology. There’s a reason we use different terms to indicate different behavior – it helps us understand the implications (ex: scalability) of the technology. For example, it’s important to understand why bridging differs from routing and why it’s wrong to call them both switching, and it helps if you understand that Fibre Channel actually uses routing (hidden deep inside switching terminology).


Based on all the limitations documented in this chapter, it’s easy to see why nobody tries to use OpenFlow to solve problems that reside above the transport layer (the following blog post was written in the autumn of 2012; nothing has changed in the meantime).

WHY IS OPENFLOW FOCUSED ON L2-4?

Another great question I got from David Le Goff: “So far, SDN is relying or stressing mainly the L2-L3 network programmability (switches and routers). Why are most of the people not mentioning L4-L7 network services such as firewalls or ADCs? Why would those elements not have to be SDNed with an OpenFlow support, for instance?”

To understand the focus on L2/L3 switching, let’s go back a year and a half to the laws-of-physics-changing big bang event. OpenFlow started as a research project used by academics working on clean-slate network architectures, and it was not the first or the only approach to a distributed control/data plane architecture (for more details, watch Ed Crabbe’s presentation from the OpenFlow Symposium). However, suddenly someone felt the great urge to get OpenFlow monetized, had to invent a fancy name, and thus SDN was born.

The main proponents of OpenFlow/SDN (in the Open Networking Foundation sense) are still the Googles of the world, and what they want is the ability to run their own control plane on top of commodity switching hardware. They don’t care that much about L4-7 appliances, or people who’d like to program those appliances from orchestration software. They have already solved the L4-7 appliance problem with existing open-source tools running on commodity x86 hardware.

DOES OPENFLOW/SDN MAKE SENSE IN L4-7 WORLD?

It makes perfect sense to offer programmable APIs in L4-7 appliances, and an ever-increasing number of vendors is doing that, from major vendors like F5 (with its Open API) to startups like Embrane and LineRate Systems.

However, appliance configuration and programming is a totally different problem that cannot be solved with OpenFlow. OpenFlow is not a generic programming language but a simple protocol that allows you to download forwarding information from a controller to the data plane residing in a networking element.

IS OPENFLOW STILL USEFUL IN L4-7 WORLD?

If you really want to use OpenFlow to implement a firewall or a load balancer (not that it’s always a good idea), you can use the same architecture Cisco used to implement the fast path in its Virtual Security Gateway (VSG) firewall: send all traffic to the central controller until the controller decides it has enough information to either block or permit the flow, at which time the flow information (5-tuple) is installed in the forwarding elements.

Does this sound like Multi-Layer Switching, the technology every Catalyst 5000 user loved to death? Sure it does. Does it make sense? Well, it failed miserably the first time, but maybe we’ll get luckier with the next attempt.
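A minimal sketch of that punt-then-offload logic (plain Python; the policy, 5-tuples and actions are invented for illustration, and no real controller or firewall API is implied):

```python
# Hypothetical reactive fast-path sketch: the first packet of a flow is punted to
# the controller, which decides once and then installs a 5-tuple entry so the
# remaining packets of the flow never leave the forwarding element.

installed_flows = {}   # 5-tuple -> action, i.e. the hardware fast path

def policy_decision(five_tuple):
    src, dst, proto, sport, dport = five_tuple
    return "permit" if dport in (80, 443) else "drop"

def handle_packet(five_tuple):
    if five_tuple in installed_flows:          # fast path: no controller involved
        return installed_flows[five_tuple]
    action = policy_decision(five_tuple)       # slow path: punt to the controller
    installed_flows[five_tuple] = action       # offload the decision as a flow entry
    return action

flow = ("10.1.1.1", "10.2.2.2", "tcp", 51515, 443)
print(handle_packet(flow))   # first packet: decided by the controller -> permit
print(handle_packet(flow))   # subsequent packets: matched in hardware -> permit
```

The scaling properties are exactly those of Multi-Layer Switching: fine as long as the flow setup rate stays below what the controller and the switch flow tables can absorb.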


Does it make sense to use OpenFlow on virtual switches, or is its usability limited to hardware devices? I tried to give a few hints in July 2012 while answering questions from David Le Goff, who was at that time working for 6WIND.

DOES CPU-BASED FORWARDING PERFORMANCE MATTER FOR SDN?

David Le Goff sent me several great SDN-related questions. Here’s the first one: “What is your take on the performance issue with software-based equipment when dealing with general purpose CPU only? Do you see this challenge as a hard stop to SDN business?”

The short answer (as always) is it depends. However, I think most people approach this issue the wrong way.

First, let’s agree that SDN means programmable networks (or more precisely, network elements that can be configured through a reasonable and documented API), not the Open Networking Foundation’s self-serving definition.

Second, I hope we agree it makes no sense to perpetuate the existing spaghetti mess we have in most data centers. It’s time to decouple content and services from the transport, decouple virtual networks from the physical transport, and start building networks that provide equidistant endpoints (in which case it doesn’t matter to which port a load balancer or firewall is connected).


Now, assuming you’ve cleaned up your design, you have switches that do fast packet forwarding and have little need for additional services, and services-focused elements (firewalls, caches, load balancers) that work on L4-7. These two sets of network elements have totally different requirements:

Implementing fast (and dumb) packet forwarding on L2 (bridge) or L3 (router) on generic x86 hardware makes no sense. It makes perfect sense to implement the control plane on generic x86 hardware (almost all switch vendors use this approach) and generic OS platform, but it definitely doesn’t make sense to let the x86 CPU get involved with packet forwarding. Broadcom's chipset can do a way better job for less money.



L4-7 services are usually complex enough to require lots of CPU power anyway. Firewalls configured to perform deep packet inspection and load balancers inspecting HTTP sessions must process the first few packets of every session in the CPU anyway, and only then potentially offload the flow record to dedicated hardware. With optimized networking stacks, it’s possible to get reasonable forwarding performance on well-designed x86 platforms, so there’s little reason to use dedicated hardware in L4-7 appliances today (SSL offload is still a grey area).

On top of everything else, the shortsighted design of dedicated hardware used by L4-7 appliances severely limits your options. Just ask a major vendor that needed years to roll out IPv6-enabled load balancers and high-performance IPv6-enabled firewall blades ... and still doesn’t have hardware-based deep packet inspection of IPv6 traffic.


SUMMARY

While it’s nice to have high-performance packet forwarding on a generic x86 architecture, the performance of software switching is definitely not an SDN showstopper. Also keep in mind that a software appliance running on a single vCPU can provide up to a few gigabits of forwarding performance, that there are plenty of cores in today’s Xeon-based servers (10 Gbps per physical server is thus very realistic), and that not many people have multiple 10GE uplinks from their data centers.


The final blog post in this chapter illustrates what happens when overexcited engineers forget the harsh limits of reality. I hope this chapter gave you enough information to analyze how bad the idea described in this blog post is (the blog post was written in late 2011, but there are still people proposing similar “solutions” in 2014).

OPENFLOW AND THE STATE EXPLOSION

While everyone deeply involved with OpenFlow agrees it’s just a low-level tool that can’t solve problems we couldn’t solve in the past (just like replacing Tcl with C++ won’t help you prove P = NP), occasionally you stumble across mind-boggling ideas that are so simple you have to ask yourself: “were we really that stupid?” One of them, which obviously impressed James Hamilton, is the solution to load balancing that requires no load balancers. Before clicking Read more, watch this video and try to figure out what the solution is and why we’re not using it in large-scale networks.

The proposal is truly simple: it uses anycast with per-flow forwarding. All servers have the same IP address, and the OpenFlow controller establishes a path from each client to one of the servers. In its most simplistic implementation, a flow entry is installed in all devices in the path every time a client establishes a session with a server (you could easily improve it by using MPLS LSPs or any other virtual circuit/tunneling mechanism in the core).

Now ask yourself: will this ever scale? Of course it won’t. It might be a good solution for long-lived sessions (after all, that’s how voice networks handle 800-numbers), but not for the data world where a single client could establish tens of TCP sessions per second.
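A quick back-of-the-envelope estimate (with invented but plausible numbers) shows why per-session state in every network device explodes so quickly:

```python
# Illustrative state-explosion arithmetic for per-flow anycast load balancing.
clients = 10_000
sessions_per_client_per_second = 10     # "tens of TCP sessions per second"
average_session_lifetime = 30           # seconds
path_length = 5                         # switches touched by each flow entry

new_flows_per_second = clients * sessions_per_client_per_second
concurrent_flows = new_flows_per_second * average_session_lifetime

print(f"{new_flows_per_second:,} flow setups per second across the network")
print(f"{new_flows_per_second * path_length:,} flow-mods per second to install them")
print(f"{concurrent_flows:,} concurrent flow entries somewhere in the fabric")
```

Compare the resulting 100,000 flow setups per second with the 600-1000 flow setups per second per switch measured earlier in this chapter and the scaling problem is obvious.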


A quick look back confirms that hunch: all technologies that required per-session state in every network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so expensive that nobody ever established more than a few sessions) and voice networks (where sessions usually last for minutes ... or hours if teenagers get involved).

Load balancers work as well as they do because a single device in the whole path (the load balancer) keeps the per-session state, and because you can scale them out – if they become overloaded, you just add another pair of redundant devices with new IP addresses to the load balancing pool (and use DNS-based load balancing on top of them).

Some researchers have quickly figured out the scaling problem, and there’s work being done to make OpenFlow-based load balancing scale better, but one has to wonder: after they’re done and their solution scales, will it be any better than what we have today, or will it just be different?

Moral of the story – every time you hear about an incredible solution to a well-known problem, ask yourself: why weren’t we using it in the past? Were we really that stupid, or are there some inherent limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link failures? And don’t forget: history is a great teacher.


6 OPENFLOW AND SDN USE CASES

Traditional networking architectures and protocols are a perfect solution to a specific set of problems: shortest-path destination-only layer-2 and layer-3 forwarding. It’s amazing how many problems one can solve with such a specific toolset, from scale-out data center fabrics to the global Internet. More complex challenges (example: traffic engineering) have been solved using the traditional architecture of distributed, loosely coupled, independent nodes (example: MPLS TE), but could benefit from centralized network visibility. Finally, the traditional solutions haven’t even tried to tackle some of the harder networking problems (example: megaflow-based forwarding or centralized policies with on-demand deployment) that could be solved with a controller-based architecture.

MORE INFORMATION

You’ll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:

Start with the SDN, OpenFlow and NFV Resources page;



Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;



Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);



The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;



Finally, I’m always available for short online or on-site consulting engagements.


This chapter contains several real-life SDN solutions, most of them OpenFlow-based. For alternate approaches see the SDN Beyond OpenFlow chapter, for even more use cases watch the publicly available videos from my OpenFlow-based SDN Use Cases webinar.

IN THIS CHAPTER:

OPENFLOW: ENTERPRISE USE CASES
OPENFLOW @ GOOGLE: BRILLIANT, BUT NOT REVOLUTIONARY
COULD IXPS USE OPENFLOW TO SCALE?
IPV6 FIRST-HOP SECURITY: IDEAL OPENFLOW USE CASE
OPENFLOW: A PERFECT TOOL TO BUILD SMB DATA CENTER
SCALING DOS MITIGATION WITH OPENFLOW
NEC+IBM: ENTERPRISE OPENFLOW YOU CAN ACTUALLY TOUCH
BANDWIDTH-ON-DEMAND: IS OPENFLOW THE SILVER BULLET?
OPENSTACK/QUANTUM SDN-BASED VIRTUAL NETWORKS WITH FLOODLIGHT
NICIRA, BIGSWITCH, NEC, OPENFLOW AND SDN


Half a year after the public launch of OpenFlow and SDN (in autumn 2011), we had already identified numerous enterprise use cases. Most of them are still largely ignored, as every startup and major networking vendor rushes toward the (supposedly) low-hanging fruit of data center fabrics and cloud-scale virtual networks.

OPENFLOW: ENTERPRISE USE CASES

One of the comments I usually get about OpenFlow is: “sounds great and I’m positive Yahoo! and Google will eventually use it, but I see no enterprise use case” (see also this blog post).

Obviously nobody would go for a full-blown native OpenFlow deployment, and we’ll probably see the hybrid (ships-in-the-night) approach more often in research labs than in enterprise networks, but there’s always the integrated mode that allows you to add OpenFlow-based functionality on top of existing networking infrastructure.

Leaving aside the pretentious claims of how OpenFlow will solve hard problems like global load balancing, there are four functions you can easily implement with OpenFlow (Tony Bourke wrote about them in more detail); a rough sketch of all four follows the list:

packet filters – flow classifier followed by a drop or normal action;



policy based routing – flow classifier followed by outgoing interface and/or VLAN tag push;



static routes – flow classifiers using only destination IP prefix and



NAT – some OpenFlow switches might support source/destination IP address/port rewrites.
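Here is a minimal sketch of those four functions expressed as generic match/action entries (plain Python; the match fields and actions follow OpenFlow 1.0 concepts, but the addresses, ports and priorities are invented, and the NAT-style rewrite is an optional capability, as noted above):

```python
# Hypothetical flow entries for the four enterprise functions listed above.
# Highest priority wins; "normal" means "hand the packet back to the regular
# switching/routing pipeline" (the integrated mode of operation).

flow_entries = [
    # Packet filter: classify and drop
    {"priority": 300, "match": {"nw_src": "10.10.10.0/24", "tp_dst": 23},
     "actions": ["drop"]},

    # Policy-based routing: classify, push a VLAN tag and pick an egress port
    {"priority": 200, "match": {"nw_src": "10.20.0.0/16"},
     "actions": ["push_vlan:200", "output:12"]},

    # Static route: destination-prefix-only classifier
    {"priority": 100, "match": {"nw_dst": "192.0.2.0/24"},
     "actions": ["output:3"]},

    # NAT (only on switches that support address/port rewrites)
    {"priority": 150, "match": {"nw_dst": "198.51.100.80", "tp_dst": 80},
     "actions": ["set_nw_dst:10.0.0.80", "output:7"]},

    # Everything else: forward normally (the existing control plane stays in charge)
    {"priority": 0, "match": {}, "actions": ["normal"]},
]

for entry in sorted(flow_entries, key=lambda e: -e["priority"]):
    print(entry["priority"], entry["match"], "->", entry["actions"])
```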

Combine that with the ephemeral nature of OpenFlow (whatever the controller downloads into the networking device does not affect the running/startup configuration and disappears when it’s no longer needed), and with the ability to use the same protocol with multiple product families, either from one or multiple vendors, and you have a pretty interesting combo.

Actually, I don’t care whether the mechanism used to change networking devices’ forwarding tables is OpenFlow or something completely different, as long as it’s programmable, multi-vendor and integrated with the existing networking technologies. As I wrote a number of times, OpenFlow is just a TCAM/FIB/packet classifier download tool.

Remember one of OpenFlow’s primary use cases: “add functionality where the vendor is lacking it” (see Igor Gashinsky’s presentation from the OpenFlow Symposium for a good coverage of that topic). Now stop for a minute and remember how many times you badly needed some functionality along the lines of the four functions I mentioned above (packet filters, PBR, static routes, NAT) that you couldn’t implement at all, or that required a hodgepodge of expect scripts (or XML/NETCONF requests if you’re a Junos automation fan) that you had to modify every time you deployed a different device type or a different software release.

Here are a few ideas I got in the first 30 seconds (if you get other ideas, please do write a comment):

User authentication for devices that don’t support 802.1X;



Per-user access control (I guess NAC is the popular buzzword) that works identically on dial-up, VPN, wireless and wired access devices;



Push user into a specific VLAN based on whatever he’s doing (or based on customized user authentication);



Give users controlled access to a single application in another VLAN (combine that with NAT to solve return path problems);



Layer-2 service insertion, be it firewall, IDS/IPS, WAAS or some yet-unknown device;


Looking at my short list, it seems @beaker was right: security just might be the killer app for OpenFlow/SDN – OpenFlow could be used either to implement some security features (packet filters and traffic steering), to help integrate traditional security functions with the rest of the network, or to implement dynamic security services insertion at any point in the network – something we badly need but almost never get.


Google uses OpenFlow to control the WAN edge routers they built from commodity switching components. The details of their implementation are proprietary (and they haven’t open-sourced their solution); here’s what I was able to deduce from publicly available information in May 2012:

OPENFLOW @ GOOGLE: BRILLIANT, BUT NOT REVOLUTIONARY

Google unveiled some details of its new internal network at the Open Networking Summit in April, and predictably the industry press and OpenFlow pundits exploded with “this is the end of networking as we know it” glee. Unfortunately I haven’t seen a single serious technical analysis of what it is they’re actually doing and how different their new network is from what we have today.

This is a work of fiction, based solely on the publicly available information presented by Google’s engineers at the Open Networking Summit (plus an interview or two published by the industry press). Read and use it at your own risk.

WHAT IS GOOGLE DOING?

After supposedly building their own switches, Google decided to build their own routers. They use a distributed multi-chassis architecture with a redundant central control plane (not unlike Juniper’s XRE/EX8200 combo). Let’s call their combo a G-router.


A G-router is used as a WAN edge device in their data centers and runs traditional routing protocols: EBGP with the data center routers and IBGP+IS-IS across the WAN with other G-routers (or traditional gear during the transition phase).

On top of that, every G-router has a (proprietary, I would assume) northbound API that is used by Google’s Traffic Engineering (G-TE): a centralized application that analyzes the application requirements, computes the optimal paths across the network, and creates those paths through the network of G-routers using the above-mentioned API. I wouldn’t be surprised if G-TE used MPLS forwarding instead of installing 5-tuples into mid-path switches; doing Forwarding Equivalence Class (FEC) classification at the head-end device instead of at every hop is way simpler and less loop-prone.

Like MPLS-TE, G-TE runs in parallel with the traditional routing protocols. If it fails (or an end-to-end path is broken), G-routers can always fall back to traditional BGP+IGP-based forwarding, and like with MPLS-TE+IGP, you’ll still have a loop-free (although potentially suboptimal) forwarding topology.
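The essence of such a centralized traffic engineering application is easy to sketch. The code below is purely my illustration of the general concept (nothing in it is based on Google’s actual implementation; the topology, capacities and demand sizes are invented): place each demand on the pre-computed path with the most headroom, and fall back to plain IGP forwarding when nothing fits:

```python
# Illustrative centralized TE sketch: place demands on the least-utilized of the
# pre-computed paths; anything that doesn't fit falls back to plain IGP forwarding.

paths = {
    ("DC1", "DC2"): [
        {"name": "direct",  "capacity_gbps": 100, "used_gbps": 80},
        {"name": "via-DC3", "capacity_gbps": 100, "used_gbps": 20},
    ],
}

def place_demand(src, dst, demand_gbps):
    candidates = [p for p in paths[(src, dst)]
                  if p["capacity_gbps"] - p["used_gbps"] >= demand_gbps]
    if not candidates:
        return "fallback: BGP+IGP shortest path"
    best = max(candidates, key=lambda p: p["capacity_gbps"] - p["used_gbps"])
    best["used_gbps"] += demand_gbps
    return f"install path {best['name']} via northbound API"

print(place_demand("DC1", "DC2", 30))   # install path via-DC3 ...
print(place_demand("DC1", "DC2", 90))   # fallback: BGP+IGP shortest path
```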

IS IT SO DIFFERENT?

Not really. Similar concepts (central path computation) were used in ATM and Frame Relay networks, as well as in early MPLS-TE implementations (before Cisco implemented OSPF/IS-IS traffic engineering extensions and RSVP, that was all you had). Some networks are supposedly still running offline TE computations and static MPLS TE tunnels because they give you way better results than the distributed MPLS-TE/autobandwidth/automesh kludges.


MPLS-TP is also going in the same direction – paths are computed by the NMS, which then installs in/out label mappings (and fast failover alternatives if desired) into the Label Switch Routers (LSRs).

THEN WHAT IS DIFFERENT?

Google is (as far as I know) the first one that implemented the end-to-end system: gathering application needs, computing paths, and installing them in the routers in real time. You could do the same thing (should you wish to do it) with traditional gear using NETCONF with a bit of MPLS-TP sprinkled on top (or your own API if you have switches that can be easily programmed in a decent programming language – Arista immediately comes to mind), but it would be a “slight” nightmare and would still suffer the drawbacks of distributed signaling protocols (even static MPLS-TE tunnels use RSVP these days).

The true difference between their implementation and everything else on the market is thus that they did it the right way, learning from all the failures and mistakes we made in the last two decades.

WHY DID THEY DO IT?

Wouldn’t you do the same, assuming you had the necessary intellectual potential and resources? Google’s engineers built themselves a high-end router with a modern scale-out software architecture that runs only the features they need (with no code bloat and no bugs from unrelated features), and they can extend the network functionality in any way they wish with the northbound API.

Even though they had to make a hefty investment in the G-router platform, they claim their network already converges almost 10x faster than before (on the other hand, it’s not hard to converge faster than IS-IS with default timers), and has average link utilization above 90% (which in itself is a huge money-saver).

HYPE GALORE

Based on the information from the Open Networking Summit (which is all the information I have at the moment), you might wonder what all the hype is about. In one word: OpenFlow. Let’s try to debunk those claims a bit.

Google is running an OpenFlow network. Get lost. Google is using OpenFlow between the controller and the adjacent chassis switches because (like everyone else) they need a protocol between the control plane and the forwarding planes, and they decided to use an already-documented one instead of inventing their own (the extra OpenFlow hype could also persuade hardware vendors and chipset manufacturers to implement more OpenFlow capabilities in their next-generation products).

Google built their own routers ... and so can you. Really? Based on the scarce information from the ONS talks and the interview in Wired, Google probably threw more money and resources at the problem than a typical successful startup. They effectively decided to become a router manufacturer, and they did. Can you repeat their feat? Maybe, if you have comparable resources.

Google used open-source software ... so the monopolistic Ciscos of the world are doomed. Just in case you believe the fairy-tale conclusion, let me point out that many Internet exchanges use open-source software for BGP route servers, and almost all networking appliances and most switches built today run on open-source software (namely Linux or FreeBSD). It’s the added value that matters, in Google’s case their traffic engineering solution.


Google built an open network – really? They use standard protocols (BGP and IS-IS) like everyone else and their traffic engineering implementation (and probably the northbound API) is proprietary. How is that different (from the openness perspective) from networks built from Juniper’s or Cisco’s gear?

CONCLUSIONS

Google’s engineers did a great job – it seems they built a modern routing platform that everyone would love to have, and an awesome traffic engineering application. Does it matter to you and me? Probably not; I don’t expect them to give their crown jewels away. Does it matter that they used OpenFlow? Not really, it’s a small piece of their whole puzzle. Will someone else repeat their feat and bring a low-cost high-end router to the market? I doubt it, but I hope to be wrong.


OpenFlow might be an ideal tool to solve interesting problems that are too rare to merit the attention of traditional networking vendors. Internet Exchange Points (IXPs) might be one of those scenarios.

COULD IXPS USE OPENFLOW TO SCALE? The SDN industry probably considers me an old and grumpy naysayer (and I’m positive Mrs Y has a special place in their hearts after her recent blog post), so I tried really hard to find a real-life example where OpenFlow could be used to solve the mid-market innovator’s dilemma to balance my usual OpenFlow and SDN presentation. Internet Exchange Points (IXPs) seemed a perfect fit – they are high-speed mission-critical environments usually implemented as geographically stretched layer-2 networks, facing all sorts of security and scaling problems. Deploying OpenFlow on IXP edge switches would result in a standardized security posture that wouldn’t rely on the idiosyncrasies of a particular vendor’s implementation, and we could use OpenFlow to implement an ARP sponge (or turn ARPs into unicasts sent to an ARP server). I presented these ideas at MENOG 12 in March 2013 and got a few somewhat interested responses … and then I asked a really good friend with significant operational experience in IXP environments for feedback. Not surprisingly, the reply was a cold shower: I am not quite sure how this improves current situation. Except for the ARP sponge everything else seem to be implemented by vendors in one form or another. For the ARP sponge, AMS-IX uses great software developed in house that they’ve open-sourced. As always, from the ops perspective proven technologies beat shiny new tools.


On a somewhat tangential topic, Dean Pemberton runs OpenFlow in production at the New Zealand Internet Exchange. His deployment model is totally different: the IXP is a layer-3 fabric (not a layer-2 fabric like most Internet exchanges), and his route server is the only way to exchange BGP routes between members. He’s using Quagga and RouteFlow to program Pica8 switches. A note from a grumpy skeptic: his deployment works great because he’s carrying a pretty limited number of BGP routes – the Pica8 switches he’s using support up to 12K routes. IPv4 or IPv6? Who knows, the data sheet ignores that nasty detail.


First-hop IPv6 security is another morass lacking a systemic solution. Could we solve it with OpenFlow? Yes, we could… but there’s nobody approaching this problem from the controller-based perspective (at least based on my knowledge in August 2014).

IPV6 FIRST-HOP SECURITY: IDEAL OPENFLOW USE CASE Supposedly it’s a good idea to be able to identify which one of your users had a particular IP address at the time when that source IP address created significant havoc. We have a definitive solution for the IPv4 world: DHCP server logs combined with DHCP snooping, IP source guard and dynamic ARP inspection. The IPv6 world is a mess: read this e-mail message from the v6ops mailing list and watch Eric Vyncke’s RIPE65 presentation for the excruciating details.

SHORT SUMMARY
• Many layer-2 switches still lack IPv6 first-hop security feature parity with IPv4;
• IPv6 uses three address allocation algorithms (SLAAC, privacy extensions, DHCPv6) and it’s quite hard to enforce a specific one;
• Host implementations are wildly different (aka: The nice thing about standards is that you have so many to choose from);
• IPv6 address tracking is a hodgepodge of kludges.


WHAT IF THERE WERE AN OPENFLOW SOLUTION? Now imagine a parallel universe in which the edge switches support OpenFlow 1.3 and IPv6 (the only vendors matching these criteria in August 2014 are NEC and HP). IPv6 address tracking would become an ideal job for an OpenFlow controller:
• Whenever a new end-host appears on the network, it’s authenticated, and its MAC address is logged. Only that MAC address can be used on that port (many switches already implement this functionality).
• Whenever an end-host starts using a new IPv6 source address, the packets are not matched by any existing OpenFlow entries and thus get forwarded to the OpenFlow controller.
• The OpenFlow controller decides whether the new source IPv6 address is legal (enforcing DHCPv6-only address allocation if needed), logs the new IPv6-to-MAC address mapping, and modifies the flow entries in the first-hop switch. The IPv6 end-host can use many IPv6 addresses – each one of them is logged immediately.
• Ideally, if the first-hop switches support all the nuances introduced in OpenFlow 1.2, the controller can install neighbor advertisement (NA) filters, effectively blocking ND spoofing.

Will this nirvana appear anytime soon? Not likely. Most switch vendors support only OpenFlow 1.0, which is totally IPv6-ignorant. Also, solving real-life operational issues is never as sexy as promoting the next unicorn-powered fountain of youth.
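To make the idea a bit more tangible, here’s a minimal sketch of the controller-side logic written as a Ryu application (Ryu is just one of several OpenFlow 1.3-capable controller frameworks; the authentication and logging hooks are hypothetical placeholders, not a complete solution):

```python
# Hedged sketch: track IPv6-to-MAC bindings on the first-hop switch.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3
from ryu.lib.packet import packet, ethernet, ipv6

class V6AddressTracker(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg
        dp = msg.datapath
        parser = dp.ofproto_parser
        in_port = msg.match['in_port']
        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocol(ethernet.ethernet)
        ip6 = pkt.get_protocol(ipv6.ipv6)
        if ip6 is None:
            return
        # Placeholder policy check (e.g. "was this address assigned via DHCPv6?")
        if not self.address_is_legal(eth.src, ip6.src):
            return          # no flow entry installed; the host gets nowhere
        self.log_binding(dp.id, in_port, eth.src, ip6.src)
        # Install a flow entry so subsequent packets are forwarded in hardware
        match = parser.OFPMatch(in_port=in_port, eth_type=0x86DD,
                                eth_src=eth.src, ipv6_src=ip6.src)
        actions = [parser.OFPActionOutput(dp.ofproto.OFPP_NORMAL)]
        inst = [parser.OFPInstructionActions(dp.ofproto.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                      match=match, instructions=inst))

    def address_is_legal(self, mac, ip6_src):
        return True         # assumption: replace with real DHCPv6/RA policy

    def log_binding(self, dpid, port, mac, ip6_src):
        self.logger.info("dpid %s port %s: %s -> %s", dpid, port, mac, ip6_src)
```

The interesting part is that the switch only has to support OpenFlow 1.3 IPv6 matches and the send-to-normal action; everything else (policy, logging, NA filters) lives in the controller.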


Imagine a world where you can buy a prepackaged data center (or a pod for your private cloud deployment), with compute, storage and networking handled from a single central management console. As of August 2014, NEC is still the only vendor with a commercial-grade data center fabric product using OpenFlow. Most other vendors use more traditional architectures, and the virtualization world is quickly moving toward overlay virtual networks. Anyhow, this is how I envisioned potential OpenFlow use in a small data center in 2012:

OPENFLOW: A PERFECT TOOL TO BUILD SMB DATA CENTER When I was writing about the NEC+IBM OpenFlow trials, I figured out a perfect use case for OpenFlow-controlled network forwarding: SMB data centers that need fewer than a few hundred physical servers – be they bare-metal servers or hypervisor hosts (hat tip to Brad Hedlund for nudging me in the right direction a while ago). As I wrote before, OpenFlow-controlled network forwarding (example: NEC, BigSwitch) experiences a totally different set of problems than OpenFlow-controlled edge (example: Nicira or the XenServer vSwitch Controller).


THE DREAM As you can imagine, it’s extremely simple to configure an OpenFlow-controlled switch: configure its own IP address, management VLAN, and the controller’s IP address, and let the controller do the rest. Once the networking vendors figure out “the fine details”, they could use dedicated management ports for an out-of-band OpenFlow control plane (similar to what QFabric is doing today), DHCP to assign an IP address to the switch, and a new DHCP option to tell the switch where the controller is. The DHCP server would obviously run on the OpenFlow controller, and the whole control plane infrastructure would be completely isolated from the outside world, making it pretty secure. The extra hardware cost for significantly reduced complexity (no per-switch configuration and a single management/SNMP IP address): two dumb 1GE switches (to make the setup redundant), hopefully running MLAG (to get rid of STP). Finally, assuming server virtualization is the most common use case in a SMB data center, you could tightly couple the OpenFlow controller with VMware’s vCenter, and let vCenter configure the whole network:
• CDP or LLDP would be used to discover server-to-switch connectivity;
• The OpenFlow controller would automatically download port group information from vCenter and automatically provision VLANs on server-to-switch links (see the sketch below for the vCenter side of that idea);
• Going a step further, the OpenFlow controller could automatically configure static port channels based on the load balancing settings configured on port groups.


End result: a decently large layer-2 network with no STP, automatic multipathing, and automatic adjustment to VLAN changes, with a single management interface, and the minimum number of moving parts. How cool is that?
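Reading the port group data out of vCenter is the easy part. Here’s a rough sketch using the pyVmomi library; the vCenter hostname and credentials are placeholders, and a real controller integration would obviously subscribe to vCenter events instead of polling:

```python
# Hedged sketch: list distributed port groups and their VLAN IDs from vCenter.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='automation', pwd='secret',
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
    for pg in view.view:
        vlan_spec = pg.config.defaultPortConfig.vlan
        vlan_id = getattr(vlan_spec, 'vlanId', None)   # None for trunk/PVLAN port groups
        print(pg.name, vlan_id)
finally:
    Disconnect(si)
```

An OpenFlow controller could map that information into VLAN provisioning on the server-facing switch ports discovered through CDP or LLDP.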

SCENARIO#1 – GE-ATTACHED SERVERS If you decide to use GE-attached servers, and run virtual machines on them, it would be wise to use four to six uplinks per hypervisor host (two for VM data, two for kernel activities, optionally two more for iSCSI or NFS storage traffic). You could easily build a GE Clos fabric using switches from NEC America: PF5240 (ToR switch) as leaf nodes (you’d have almost no oversubscription with 48 GE ports and 4 x 10GE uplinks), and PF5820 (10 GE switch) as spine nodes and the interconnection point with the rest of the network. Using just two PF5820 spine switches you could get over 1200 1GE server ports – enough to connect 200 to 300 servers (probably hosting anywhere between 5,000 and 10,000 VMs). You'd want to keep the number of switches controlled by the OpenFlow controller low to avoid scalability issues. NEC claims they can control up to 200 ToR switches with a controller cluster; I would be slightly more conservative.

SCENARIO#2 – 10GE ATTACHED SERVERS Things get hairy if you want to use 10GE-attached servers (or, to put it more diplomatically, IBM and NEC are not yet ready to handle this use case):
• If you want true converged storage with DCB, you have to use IBM’s switches (NEC does not have DCB), and even then I’m not sure how DCB would work with OpenFlow.
• PF5820 (NEC) and G8264 (IBM) have 40GE uplinks, but I have yet to see a 40GE OpenFlow-enabled switch with enough port density to serve as the spine node. At the moment, it seems that bundles of 10GE uplinks are the way to go.
• It seems (according to data sheets, but I could be wrong) NEC supports 8-way multipathing, and we’d need at least 16-way multipathing to get 3:1 oversubscription.

Anyhow, assuming all the bumps eventually do get ironed out, you could have a very easy-to-manage network connecting a few hundred 10GE-attached servers.

WILL IT EVER HAPPEN? I remain skeptical, mostly because every vendor seems obsessed with cloud computing and zettascale data centers, ignoring the mid-scale market … but there might be a silver lining. This idea would make most sense if you could buy a prepackaged data center (think VCE block) at a reasonably low price (to make it attractive to SMB customers). A few companies have all the components one would need in a SMB data center (Dell, HP, IBM), and Dell just might be able to pull it off (while HP is telling everyone how they’ll forever change the networking industry). And now that I’ve mentioned Dell: how about configuring your data center through a user-friendly web interface, and having it shipped to your location in a few weeks?


OpenFlow is an ideal tool when you want to augment software-based networking services with packet forwarding at hardware speeds. This post describes a DoS prevention solution demonstrated by NEC and Radware in spring 2013:

SCALING DOS MITIGATION WITH OPENFLOW NEC and a slew of its partners demonstrated an interesting next step in the SDN saga @ Interop Las Vegas 2013: multi-vendor SDN applications. Load balancing, orchestration and security solutions from A10, Silver Peak, Red Hat and Radware were happily cooperating with the ProgrammableFlow controller. A curious mind obviously wants to know what’s behind the scenes. Masterpieces of engineering? Large integration projects ... or is it just a smart application of API glue? In most cases, it’s the latter. Let’s look at the ProgrammableFlow – Radware integration. Here’s a slide from NEC’s white paper. An interesting high-level view, but no details. The Radware press release is even less helpful (but it’s definitely a masterpiece of marketing).


Figure 6-1: NEC+Radware high-level solution architecture


Fortunately Ron Meyran provided more details on the Radware blog, as did Lior Cohen in his SDN Central Demo Friday presentation:
• DefenseFlow software monitors the flow entries and counters provided by an OpenFlow controller, and tries to identify abnormal traffic patterns;
• The abnormal traffic is diverted to a Radware DefensePro appliance that scrubs the traffic before it’s returned to the data center.

Both operations are easily done with the ProgrammableFlow API – it provides both flow data and the ability to redirect the traffic to a third-party next hop (or MAC address) based on a dynamically-configured access list. Here’s a CLI example from the ProgrammableFlow webinar; the API call would be very similar (but formatted as a JSON or XML object):


Figure 6-2: Sample ProgrammableFlow traffic steering configuration


WHY IS THIS USEFUL? Deep packet inspection is CPU-intensive and hard to implement at high speeds. DPI products and solutions (including traffic scrubbing appliances like DefensePro) thus tend to be expensive. 40Gbps of DefensePro DPI (DefensePro 40420) sets you back almost half a million dollars (according to this price list). Doing initial triage and subsequent traffic blackholing in cheaper forwarding hardware (programmed through OpenFlow) and diverting a small portion of the traffic through the scrubbing appliance significantly improves the average bandwidth a DPI solution can handle at reasonable cost.

IS THIS SOMETHING ONLY OPENFLOW COULD DO? Of course not – flow monitoring and statistics have been available for decades, either in Netflow or sFlow format. Likewise, we’ve been using PBR to redirect traffic for decades, and configuring PBR through NETCONF is not exactly rocket science ... and of course there’s FlowSpec that real-life engineers sometimes use to mitigate real-life DoS attacks (although, like any other tool, it does fail every now and then). However, an OpenFlow controller does provide a more abstracted API – instead of configuring PBR entries that push traffic toward next hop (or an MPLS TE tunnel if you’re an MPLS ninja) and modifying router configuration while doing so, you just tell the OpenFlow controller that you want the traffic redirected toward a specific MAC address, and the necessary forwarding entries automagically appear all across the path. Finally, there’s the sexiness factor. Mentioning SDN instead of Netflow or PBR in your press release is infinitely more attractive to bedazzled buyers.
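The redirection mechanism itself is trivial to express in OpenFlow. Here’s a hedged sketch of the kind of flow entry a controller might install to shunt suspect traffic toward a scrubbing appliance (Ryu-style code illustrating the underlying mechanism, not the actual NEC/Radware implementation; the prefix, MAC address and output port are made-up placeholders, and the return path is ignored):

```python
# Hedged sketch: steer traffic for a victim prefix toward a scrubbing appliance.
def divert_to_scrubber(dp, victim_net=('198.51.100.0', '255.255.255.0'),
                       scrubber_mac='00:00:5e:00:53:99', scrubber_port=48):
    parser = dp.ofproto_parser
    ofp = dp.ofproto
    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=victim_net)
    actions = [parser.OFPActionSetField(eth_dst=scrubber_mac),
               parser.OFPActionOutput(scrubber_port)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    # High priority so this entry wins over the regular forwarding entries
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=1000,
                                  match=match, instructions=inst))
```

Removing the diversion once the attack subsides is a matching OFPFlowMod with command=OFPFC_DELETE.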


WILL IT SCALE? You should be aware of the major OpenFlow scaling issues by now, and I hope you’ve realized that real-life switches have real-life limitations. Most of the existing hardware reuses ACL entries when you ask for full-blown OpenFlow flow entries. Now go and check the ACL table size on your favorite switch, and imagine you need one entry for each flow spec you want to monitor or divert to the DPI appliance. Done? Disappointed? Pleasantly surprised? However, a well-tuned solution using the right combination of hardware and software (example: NEC’s PF5240, which can handle 160,000 L2, IPv4 or IPv6 flows in hardware) just might work. Still, we’re early in the development cycle, so make sure you do thorough (stress) testing before buying anything ... and just in case you need a rock-solid traffic generator, Spirent will be more than happy to sell you one (or a few).


NEC and IBM gave me access to one of their early ProgrammableFlow customers. This is what I got out of that discussion which took place in February 2012. In the meantime, I’ve encountered at least one large-scale production deployment of ProgrammableFlow, proving that NEC’s solution works in large data centers.

NEC+IBM: ENTERPRISE OPENFLOW YOU CAN ACTUALLY TOUCH I didn’t expect we’d see multi-vendor OpenFlow deployment any time soon. NEC and IBM decided to change that and Tervela, a company specialized in building messaging-based data fabrics, decided to verify their interoperability claims. Janice Roberts who works with NEC Corporation of America helped me get in touch with them and I was pleasantly surprised by their optimistic view of OpenFlow deployment in typical enterprise networks.

A BIT OF A BACKGROUND Tervela’s data fabric solutions typically run on top of traditional networking infrastructure, and an underperforming network (particularly long outages triggered by suboptimal STP implementations) can severely impact the behavior of the services running on their platform. They were looking for a solution that would perform way better than what their customers are typically using today (large layer-2 networks), while at the same time being easy to design, provision and operate. It seems that they found a viable alternative to existing networks in a combination of NEC’s ProgrammableFlow Controller and IBM’s BNT 8264 switches.

EASY TO DEPLOY? As long as your network is not too big (NEC claimed their controller can manage up to 50 switches in their Networking Tech Field Day presentation, and the later releases of ProgrammableFlow increased that limit to 200), the design and deployment isn’t too hard according to Tervela’s engineers:
• They decided to use an out-of-band management network and connected the management port of the BNT 8264 to the management network (they could also use any other switch port).
• All you have to configure on the individual switch is the management VLAN, a management IP address and the IP address of the OpenFlow controllers.
• The ProgrammableFlow controller automatically discovers the network topology using LLDP packets sent from the controller through individual switch interfaces.
• After those basic steps, you can start configuring virtual networks in the OpenFlow controller (see the demo NEC made during the Networking Tech Field Day).
Obviously, you’d want to follow some basic design rules, for example:
• Make the management network fully redundant (read the QFabric documentation to see how that’s done properly);
• Connect the switches into a structure somewhat resembling a Clos fabric, not in a ring or a random mess of cables.


TEST RESULTS – LATENCY Tervela’s engineers ran a number of tests, focusing primarily on latency and failure recovery. They found out that (as expected) the first packet exchanged between a pair of VMs experiences an 8-9 millisecond latency because it’s forwarded through the OpenFlow controller, with subsequent packets having a latency they were not able to measure (their tool has a 1 msec resolution). Lesson#1 – If the initial packet latency matters, use proactive programming mode (if available) to pre-populate the forwarding tables in the switches; Lesson#2 – Don’t do full 12-tuple lookups unless absolutely necessary. You’d want to experience the latency only when the inter-VM communication starts, not for every TCP/UDP flow (not to mention that capturing every flow in a data center environment is a sure recipe for disaster).
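To illustrate what “proactive programming” means in controller terms: instead of waiting for packet-in events, the controller pushes the (pre-computed) forwarding entries the moment a switch connects. A minimal, generic sketch using the Ryu framework (not ProgrammableFlow’s actual proactive mode; MAC addresses and port numbers are made-up placeholders):

```python
# Hedged sketch: proactively install forwarding entries when a switch connects,
# so the first packet never has to take the slow path through the controller.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

# Placeholder "topology database": MAC address -> output port on this switch
PRECOMPUTED_PATHS = {'00:00:5e:00:53:01': 1, '00:00:5e:00:53:02': 2}

class ProactiveForwarding(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_connected(self, ev):
        dp = ev.msg.datapath
        parser = dp.ofproto_parser
        ofp = dp.ofproto
        for mac, port in PRECOMPUTED_PATHS.items():
            match = parser.OFPMatch(eth_dst=mac)   # coarse L2 match, not a 12-tuple
            actions = [parser.OFPActionOutput(port)]
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
            dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                          match=match, instructions=inst))
```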

TEST RESULTS – FAILURE RECOVERY Very fast failure recovery was another pleasant surprise. They tested just the basic scenario (parallel primary/backup links) and found that in most cases the traffic switches over to the second link in less than a millisecond, indicating that NEC/IBM engineers did a really good job and pre-populated the forwarding tables with backup entries. If it takes 8-9 milliseconds for the controller to program a single flow into the switches (see latency above), it’s totally impossible that the same controller would do a massive reprogramming of the forwarding tables in less than a millisecond. The failure response must have been preprogrammed in the forwarding tables.


There were a few outliers (10-15 seconds), probably caused by a lack of failure detection at the physical layer. As I wrote before, detecting link failures via control packets sent by the OpenFlow controller doesn’t scale – you need distributed linecard protocols (LACP, BFD) if you want to have a scalable solution. NEC added OAM functionality in later releases of ProgrammableFlow, probably solving this problem.

Finally, assuming their test bed allowed the ProgrammableFlow controller to prepopulate the backup entries, it would be interesting to observe the behavior of a four-node square network, where it’s impossible to find a loop-free alternate path unless you use virtual circuits like MPLS Fast Reroute does.

TEST RESULTS – BANDWIDTH ALLOCATION AND TRAFFIC ENGINEERING One of the interesting things OpenFlow should enable is bandwidth-aware flow routing. Tervela’s engineers were somewhat disappointed to discover that the software/hardware combination they were testing doesn’t meet those expectations yet. They were able to reserve a link for high-priority traffic and observe automatic load balancing across alternate paths (which would be impossible in a STP-based layer-2 network), but they were not able to configure statistics-based routing (route important flows across underutilized links).


NEXT STEPS? Tervela’s engineers said the test results made them confident in the OpenFlow solution from NEC and IBM. They plan to run more extensive tests and if those test results work out, they’ll start recommending OpenFlow-based solutions as a Proof-of-Concept-level alternative to their customers.

A HUGE THANK YOU! This blog post would never happen without Janice Roberts who organized the exchange of ideas, and Michael Matatia, Jake Ciarlante and Brian Gladstein from Tervela who were willing to spend time with me sharing their experience.


Every time a new networking technology appears, someone tries to solve the Bandwidth-on-Demand problem with it. OpenFlow is no exception.

BANDWIDTH-ON-DEMAND: IS OPENFLOW THE SILVER BULLET? Whenever the networking industry invents a new (somewhat radical) technology, bandwidth-on-demand seems to be one of the much-touted use cases. OpenFlow/SDN is no different – Juniper used its OpenFlow implementation (Open vSwitch sitting on top of Junos SDK) to demonstrate Bandwidth Calendaring (see Dave Ward’s presentation @ OpenFlow Symposium for more details), Greg Ferro was talking about the same topic in his fantastic Introduction to OpenFlow/SDN webinar, and Dmitri Kalintsev recently blogged “How about an ability for things like Open vSwitch ... to actually signal the transport network its connectivity requirements ... say desired bandwidth” I have only one problem with these ideas: I’ve seen them before. In the last 20 years, at least three technologies have been invented to solve the bandwidth-on-demand problem: RSVP, ATM Switched Virtual Circuits (SVC) and MPLS Traffic Engineering (MPLS-TE). None of them was ever widely used to create a ubiquitous bandwidth-on-demand service. I’m positive very smart network operators (including major CDN and content providers like Google) use MPLS-TE very creatively. I’m also sure there are environments where RSVP is a mission-critical functionality. I’m just saying bandwidth-on-demand is like IP multicast – it’s used by 1% of the networks that badly need it. All three technologies I mentioned above faced the same set of problems:

• Per-flow (or per-granular-FEC) state in the network core never scales. This is what killed RSVP and ATM SVCs.
• It’s pretty hard to traffic engineer just the elephant flows. Either you do it properly and traffic engineer all traffic, or you end up with a suboptimal network.
• Reacting to short-term changes in bandwidth requirements can cause interesting oscillations in the network (I’m positive Petr Lapukhov could point you to a dozen sources analyzing this problem).
• Nobody above the network layer really cares – it’s way simpler to blame the network when the bandwidth fairy fails to deliver.

You don’t think the last bullet is real? Then tell me how many off-the-shelf applications have RSVP support ... even though RSVP has been available in Windows and Unix/Linux servers for ages. How many applications can mark their packets properly? How many of them allow you to configure the DSCP value to use (apart from IP phones)? Similarly, it’s not hard to implement bandwidth-on-demand for specific elephant flows (inter-DC backup, for example) with a pretty simple combination of MPLS-TE and PBR, potentially configured with NETCONF (assuming you have a platform with a decent API). You could even do it with SNMP – pre-instantiate the tunnels and PBR rules and enable the tunnel interface by changing ifAdminStatus. When have you last seen it done? So, although I’m the first one to admit OpenFlow is an elegant tool to integrate flow classification (previously done with PBR) with traffic engineering (using MPLS-TE or any of the novel technologies proposed by Juniper) using the hybrid deployment model, being a seasoned skeptic, I just don’t believe we’ll reach the holy grail of bandwidth-on-demand during this hype cycle. However, being an eternal optimist, I sincerely hope I’m wrong.
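Just to show how low-tech the SNMP variant really is, here’s a hedged sketch using the pysnmp library to flip ifAdminStatus on a pre-built tunnel interface (the device address, community string and ifIndex are placeholders):

```python
# Hedged sketch: bring up a pre-instantiated tunnel interface by setting
# ifAdminStatus (1 = up, 2 = down) via SNMP.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity,
                          Integer32, setCmd)

TUNNEL_IFINDEX = 42     # placeholder ifIndex of the pre-built MPLS-TE tunnel

errorIndication, errorStatus, errorIndex, varBinds = next(
    setCmd(SnmpEngine(),
           CommunityData('private'),
           UdpTransportTarget(('192.0.2.1', 161)),
           ContextData(),
           ObjectType(ObjectIdentity('IF-MIB', 'ifAdminStatus', TUNNEL_IFINDEX),
                      Integer32(1))))

if errorIndication or errorStatus:
    print('SNMP set failed:', errorIndication or errorStatus.prettyPrint())
else:
    print('Tunnel interface enabled')
```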


In one of their pivoting phases Big Switch Networks proposed to implement virtual networking with MAC-layer access control lists installed through OpenFlow. I’m not aware of any commercial deployment of this idea.

OPENSTACK/QUANTUM SDN-BASED VIRTUAL NETWORKS WITH FLOODLIGHT A few years before MPLS/VPN was invented, I’d worked with a service provider who wanted to offer L3-based (peer-to-peer) VPN service to their clients. Having a single forwarding table in the PErouters, they had to be very creative and used ACLs to provide customer isolation (you’ll find more details in the Shared-router Approach to Peer-to-peer VPN Model section of my MPLS/VPN Architectures book). Now, what does that have to do with OpenFlow, SDN, Floodlight and Quantum?

THE BIG PICTURE Big Switch has released a plug-in for Quantum that provides OpenFlow-based virtual network support with their open-source Floodlight controller, and they use layer-2 ACLs to implement virtual networks, confirming the infinite wisdom of RFC 1925: Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.


HOW DOES IT WORK? The 30,000-foot perspective first:
• OpenStack virtual networks are created with the REST API of the Quantum (networking) component of OpenStack;
• Quantum uses back-end plug-ins to create the virtual networks in the actual underlying network fabric. Quantum (and the rest of OpenStack) does not care how the virtual networks are implemented as long as they provide isolated L2 domains.
And a quick look behind the scenes:
• Big Switch decided to implement virtual networks with dynamic OpenFlow-based L2 ACLs instead of using VLAN tags.
• The REST API offered by Floodlight’s VirtualNetworkFilter module offers simple methods that create virtual networks and assign MAC addresses to them.
• The VirtualNetworkFilter intercepts new flow setup requests (PacketIn messages to the Floodlight controller), checks that the source and destination MAC address belong to the same virtual network, and permits or drops the packet.
• If the VirtualNetworkFilter accepts the flow, Floodlight’s Forwarding module installs the flow entries for the newly-created flow throughout the network.

The current release of Floodlight installs per-flow entries throughout the network. I’m not particularly impressed with the scalability of this approach (and I’m not the only one).
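Conceptually, the isolation check itself is trivial – it’s the 20-year-old shared-router ACL idea expressed in a few lines of code. A toy sketch (my illustration of the concept, not Floodlight’s actual implementation; the MAC addresses and tenant names are placeholders):

```python
# Toy sketch of MAC-based virtual network isolation: permit a flow only when
# both MAC addresses belong to the same virtual network.
mac_to_vnet = {
    '00:00:5e:00:53:01': 'tenant-red',
    '00:00:5e:00:53:02': 'tenant-red',
    '00:00:5e:00:53:10': 'tenant-blue',
}

def flow_permitted(src_mac: str, dst_mac: str) -> bool:
    src_vnet = mac_to_vnet.get(src_mac)
    dst_vnet = mac_to_vnet.get(dst_mac)
    return src_vnet is not None and src_vnet == dst_vnet

# The hard part is everything around this check: persisting the mappings,
# surviving controller failures, and not programming one entry per flow.
```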


DOES IT MAKE SENSE? The Floodlight controller and its Quantum plug-in have a very long way to go before I’d use them in a production environment:
• The Floodlight controller is a single point of failure (there’s no provision for a redundant controller);
• Unless I can’t read Java code (which wouldn’t surprise me at all), the VirtualNetworkFilter stores all mappings (including MAC membership information) in in-memory structures that are lost if the controller or the server on which it runs crashes;
• As mentioned above, per-flow entries used by the Floodlight controller don’t scale at all (more about that in an upcoming post).

The whole thing is thus a nice proof-of-concept tool that will require significant efforts (probably including a major rewrite of the forwarding module) before it becomes production-ready. However, we should not use Floodlight to judge the quality of the yet-to-be-released commercial OpenFlow controller from Big Switch Networks. This is how Mike Cohen explained the differences: I want to highlight that all of the points you raised around production deployability and flow scalability (and some you didn't around how isolation is managed / enforced) are indeed addressed in significant ways in our commercial products. There’s a separation between what's in Floodlight and the code folks will eventually see from Big Switch. As always, I might become a believer once I see the product and its documentation.


The final blog post in this chapter was written in early 2012 when the industry press still wasn’t able to figure out what individual companies using OpenFlow were doing. Although it’s a bit old, it still provides an overview of different solutions that use OpenFlow as a low-level forwarding table programming tool. In the meantime, VMware bought Nicira (as I predicted in the last paragraph), and Nicira’s NVP became the basis for VMware’s NSX.

NICIRA, BIGSWITCH, NEC, OPENFLOW AND SDN Numerous articles published in the last few days describing how Nicira clashes head-on with Cisco and Juniper just proved that you should never let facts interfere with a good story (let alone an eye-catching headline). Just in case you got swayed by those catchy stories, here’s the real McCoy (as I see it):

WHAT ARE THEY ACTUALLY DOING? Nicira is building a virtual networking solution using tunneling (VLAN tags, MAC-over-GRE or whatever else is available) between hypervisor switches. It expects the underlying network transport to do its work, be it at layer-2 or layer-3. An Open vSwitch appears as a regular VLAN-capable learning switch or as an IP host to the physical network, and uses existing non-OpenFlow mechanisms to interact with the network.


Deployment paradigm: complexity belongs to the hypervisor soft switches, let’s keep the network simple. It should provide no more and no less than optimal transport between equidistant hypervisor hosts (Clos fabrics come to mind). Target environment: Large cloud builders and other organizations leaning toward Xen/OpenStack. NEC and BigSwitch are building virtual networks by rearranging the forwarding tables in the physical switches. Their OpenFlow controllers are actively reconfiguring the physical network, creating virtual networks out of VLANs, interfaces, or sets of MAC/IP addresses. Deployment paradigm: we know hypervisor switches are stupid and can’t see beyond VLANs, so we’ll make the network smarter (aka VM-aware networking). Target environment: large enterprise networks and those that build cloud solutions with existing software using VLAN-based virtual switches.

COMPETITIVE HOT SPOTS? Between Nicira and NEC/BigSwitch: few. There is an overlap in functionality (NEC and BigSwitch can obviously manage Open vSwitch as well), but not much overlap in typical use case or sweet-spot target environments (I am positive there will be marketing efforts to shoehorn all of them in places where they don’t really fit, but that’s a different story).

Between Nicira and Cisco/Juniper switches: few. Large cloud providers already got rid of enterprise kludges and use simple L2 or L3 fabrics. The Facebooks, Googles and Amazons of the world run on IP; they don’t care much about TRILL-like inventions. Some of them buy equipment from Juniper, Cisco, Force10 or Arista, some of them build their own boxes, but however they build their network, that won’t change because of Nicira. No wonder Michael Bushong from Juniper embraced Nicira’s solution.

Between Nicira and Cisco’s Nexus 1000V: not at the moment. Open vSwitch runs on Xen/KVM, Nexus 1000V runs on VMware/Hyper-V. Open vSwitch runs on vSphere, but with way lower throughput than Nexus 1000V. Obviously Cisco could easily turn the Nexus 1000V VSM into an OpenFlow controller (I predicted that would be their first move into the OpenFlow world, and was proven dead wrong) and manage Open vSwitches, but there’s nothing at the moment to indicate they’re considering it.

Between BigSwitch/NEC and Cisco/Juniper: this one will be fun to watch, more so with IBM, Brocade and HP clearly joining the OpenFlow camp and Juniper cautiously being on the sidelines.

However, Nicira might trigger an interesting mindset shift in the cloud aspirant community: all of a sudden, Xen/OpenStack/Quantum makes more sense from the scalability perspective. A certain virtualization vendor will indubitably notice that ... unless they already focused their true efforts on PaaS (at which point all of the above becomes a moot point).


7 SDN BEYOND OPENFLOW

The SDN = Centralized Control Plane (preferably using OpenFlow) definition promoted by Open Networking Foundation (ONF) is too narrow for most real-life use cases, as it forces a controller vendor to reinvent all the mechanisms we had in networking devices for the last 30 years, and make them work within a distributed system with unreliable communication paths. Many end-users (including Microsoft, a founding member of ONF) and vendors took a different approach, and created solutions that use traditional networking protocols in a different way, rely on overlays to reduce the complexity through decoupling, or use a hierarchy of control planes to achieve better resilience. This chapter starts with a blog post describing the alternate approaches to SDN and documents several potentially usable protocols and solutions.

MORE INFORMATION You’ll find additional SDN- and OpenFlow-related information on ipSpace.net web site:
• Start with the SDN, OpenFlow and NFV Resources page;
• Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
• Numerous ipSpace.net webinars describe SDN, network programmability and automation, and OpenFlow (some of them are freely available thanks to industry sponsors);
• The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function virtualization and SDDC technologies in your network;
• Finally, I’m always available for short online or on-site consulting engagements.


IN THIS CHAPTER:
• THE FOUR PATHS TO SDN
• THE MANY PROTOCOLS OF SDN
• EXCEPTION ROUTING WITH BGP: SDN DONE RIGHT
• NETCONF = EXPECT ON STEROIDS
• DEAR $VENDOR, NETCONF != SDN
• WE NEED BOTH OPENFLOW AND NETCONF
• CISCO ONE: MORE THAN JUST OPENFLOW/SDN
• THE PLEXXI CHALLENGE (OR: DON’T BLAME THE TOOLS)
• I2RS – JUST WHAT THE SDN GOLDILOCKS IS LOOKING FOR?


The very strict definition of SDN as understood by the Open Networking Foundation promotes an architecture with strict separation between a controller and totally dumb devices that cannot do more than forward packets based on forwarding rules downloaded from the controller. This definition is too narrow for most use cases, resulting in numerous solutions and architectures being branded as SDN. Most of these solutions fall into one of the four categories described in the blog post I wrote in August 2014.

THE FOUR PATHS TO SDN After the initial onslaught of SDN washing, four distinct approaches to SDN have started to emerge, from centralized control plane architectures to smart reuse of existing protocols. As always, each approach has its benefits and drawbacks, and there’s no universally best solution. You just got four more (somewhat immature) tools in your toolbox. And now for the details.

CONTROL-DATA PLANE SEPARATION The “original” (or shall I say orthodox) SDN definition comes from the Open Networking Foundation and calls for a strict separation of control and data planes, with a single control plane being responsible for multiple data planes. That definition, while serving the goals of ONF founding members, is at the moment mostly irrelevant for most enterprise or service provider organizations, which cannot decide to become a router manufacturer to build a few dozen WAN edge routers… and based on the amount of resources NEC invested in ProgrammableFlow over the last few years, it’s not realistic to expect that we’ll be able to use OpenDaylight in production environments any time soon (assuming you’d want to use it in an architecture with a single central failure point in the first place). FYI, I’m not blaming OpenFlow. OpenFlow is just a low-level tool that can be extremely handy when you’re trying to implement unusual ideas.

Reasonably-sized organizations could use OpenFlow to augment the forwarding functionality of existing network devices (in which case the only hardware one could use are a few HP switches, as no other major vendor supports the send-to-normal OpenFlow action). I am positive there will be people building OpenFlow controllers controlling forwarding fabrics, but they’ll eventually realize what a monumental task they undertook when they have to reinvent all the wheels the networking industry invented in the last 30 years, including:
• Topology discovery;
• Fast failure detection (including detection of bad links, not just lost links);
• Fast reroute around failures;
• Path-based forwarding and prefix-independent convergence;
• Scalable linecard protocols (LACP, LLDP, STP, BFD …).


OVERLAY VIRTUAL NETWORKS The proponents of overlay virtual networking solutions use the same architectural approach that worked well with Telnet (replacing X.25 PAD), VoIP (replacing telephone exchanges) or iSCSI, not to mention the global Internet – reduce the complexity of the problem by decoupling transport fabric from edge functionality (a more cynical mind might be tempted to quote RFC 1925 section 2.6a). The decoupling approach works well assuming there are no leaky abstractions (in other words, the overlay can ignore the transport network – which wasn’t exactly the case in Frame Relay or ATM networks). Overlay virtual networks work well over fabrics with equidistant endpoints, and fail as miserably as any other technology when being misused for long-distance VLAN extensions.

VENDOR-SPECIFIC APIS After the initial magical dust of SDN-washing settled down, a few vendors remained standing (I’m skipping those that allow you to send configuration commands in an XML envelope and call that programmability):
• Arista has eAPI (access to the EOS command line through REST) as well as the capability to install any Linux component on their switches, and use programmatic access to EOS data structures (sysdb);
• Cisco’s onePK gives you extensive access to the inner workings of Cisco IOS and IOS XE (haven’t found anything NX-OS-related on DevNet);
• Juniper has some SDK that’s safely tucked behind a partner-only regwall. Just the right thing to do in 2014.
• F5 has had iRules and iControl for years (and there’s a Perl library to use it, which is totally awesome).

Not surprisingly, vendors would love you to use their API. After all, that’s the ultimate lock-in they can get.

REUSE OF EXISTING PROTOCOLS While the vendors and the marketers were fighting the positioning battles, experienced engineers did what they do best – they found a solution to a problem with the tools at hand. Many scalable real-life SDN implementations (as opposed to the works-great-in-PowerPoint ones) use BGP to modify forwarding information in the network (or even filter traffic with BGP FlowSpec), and implement programmatic access to BGP with something like ExaBGP. Finally, don’t forget that we’ve been using remote-triggered black holes for years (the RFC describing it is five years old, but the technology itself is way older) – we just didn’t know we were doing SDN back in those days.


WHICH ONE SHOULD I USE? You know the answer: it depends. If you’re planning to implement novel ideas in the data center, overlay virtual networks might be the way to go (more so as you can change the edge functionality without touching the physical networking infrastructure). Do you need flexible dynamic ACLs or PBR? Use OpenFlow (or even better, DirectFlow if you have Arista switches). Looking for a large-scale solution that controls the traffic in a LAN or WAN fabric? BGP might be the way to go. Finally, you can do things you cannot do with anything else with some vendor APIs (but do remember the price you’re paying).


The following text is a slightly reworded blog post I wrote in April 2013:

THE MANY PROTOCOLS OF SDN One could use a number of existing protocols to implement a controller-based networking solution depending on the desired level of interaction between the controller and the controlled devices. The following diagram lists some of them sorted by the networking device plane they operate on.

Figure 7-1: The many protocols of SDN


NETCONF, OF-Config (a YANG data model used to configure OpenFlow devices through NETCONF) and XMPP (a chat protocol creatively used by Arista EOS) operate at the management plane – they can change network device configuration or monitor its state.

Remote Triggered Black Holes is one of the oldest solutions using BGP as the mechanism to modify the network’s forwarding behavior from a central controller. Some network virtualization vendors use BGP to build MPLS/VPN-like overlay virtual networking solutions. I2RS and PCEP (a protocol used to create MPLS-TE tunnels from a central controller) operate on the control plane (parallel to traditional routing protocols). BGP-LS exports link state topology and MPLS-TE data through BGP.

OVSDB is a protocol that treats control-plane data structures as database tables and enables a controller to query and modify those structures. It’s used extensively in VMware’s NSX, but could be used to modify any data structure (assuming one defines an additional schema that describes the data).

OpenFlow, MPLS-TP, ForCES and FlowSpec (PBR through BGP used by creative network operators like CloudFlare) work on the data plane and can modify the forwarding behavior of a controlled device. OpenFlow is the only one of them that defines data-to-control-plane interactions (with the Packet In and Packet Out OpenFlow messages).


Microsoft was one of the first companies to document their use of BGP to implement a controller-based architecture. Numerous similar solutions have been described since the time I wrote this blog post (October 2013) – it seems BGP is becoming one of the most popular SDN implementation tools.

EXCEPTION ROUTING WITH BGP: SDN DONE RIGHT One of the holy grails of data center SDN evangelists is controller-driven traffic engineering (throwing more leaf-and-spine bandwidth at the problem might be cheaper, but definitely not sexier). Obviously they don’t call it traffic engineering as they don’t want to scare their audience with MPLS TE nightmares, but the idea is the same. Interestingly, you don’t need new technologies to get as close to that holy grail as you wish; Petr Lapukhov got there with a 20-year-old technology – BGP.

THE PROBLEM I’ll use a well-known suboptimal network to illustrate the problem: a ring of four nodes (it could be anything, from a monkey-designed fabric, to a stack of switches) with heavy traffic between nodes A and D.


Figure 7-2: Sample network diagram

In a shortest-path forwarding environment you cannot spread the traffic between A and D across all links (although you might get close with a large bag of tricks). Can we do any better with controller-based forwarding? We definitely should. Let’s see how we can tweak BGP to serve our SDN purposes.

INFRASTRUCTURE: USING BGP AS IGP If you want to use BGP as the information delivery vehicle for your SDN needs, you MUST ensure it’s the highest priority routing protocol in your network. The easiest design you can use is a BGP-only network using BGP as a more scalable (albeit a bit slower) IGP.


Figure 7-3: Using BGP as a large-scale IGP

BGP-BASED SDN CONTROLLER After building a BGP-only data center, you can start to insert controller-generated routes into it: establish an IBGP session from the controller (cluster) to every BGP router and use higher local preference to override the EBGP-learned routes. You might also want to set no-export community on those routes to ensure they aren’t leaked across multiple routers.


Figure 7-4: BGP-based SDN controller

Obviously I’m handwaving over lots of moving parts – you need topology discovery, reliable next hops, and a few other things. If you really want to know all those details, listen to the Packet Pushers podcast where we deep dive around them (hint: you could also engage me to help you build it).
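One of those moving parts is how the controller actually speaks BGP. A popular open-source choice is ExaBGP, which runs an external “API process” and reads routing commands from that process’s standard output. Here’s a hedged sketch of such a process injecting a controller-generated route with a higher local preference and the no-export community (the prefix, next hop and the ExaBGP neighbor configuration are placeholders and omitted details, not Petr’s actual implementation):

```python
#!/usr/bin/env python3
# Hedged sketch of an ExaBGP API process: ExaBGP executes this script and
# reads announce/withdraw commands from its stdout.
import sys
import time

# Placeholder controller decision: send traffic for this prefix via this next hop
PREFIX = '10.0.4.0/24'
NEXT_HOP = '10.0.1.2'

def announce(prefix, next_hop):
    # local-preference 200 beats the EBGP-learned paths (default 100);
    # community 65535:65281 is the well-known no-export community
    sys.stdout.write('announce route %s next-hop %s local-preference 200 '
                     'community [65535:65281]\n' % (prefix, next_hop))
    sys.stdout.flush()

announce(PREFIX, NEXT_HOP)

# Keep the process alive; ExaBGP withdraws the routes if the process exits
while True:
    time.sleep(60)
```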


RESULTS: UNEQUAL-COST MULTIPATH The SDN controller in our network could decide to split the traffic between A and D across multiple paths. All it has to do to make it work is to send the following IBGP routing updates for prefix D:
• Two identical BGP paths (with next hops B and D) to A (to ensure the BGP route selection process in A uses BGP multipathing);
• A BGP path with next hop C to B (B might otherwise send some of the traffic for D to A, resulting in a forwarding loop between B and A).

Figure 7-5: Unequal cost multipathing with BGP-based SDN controller


You can get even fancier results if you run MPLS in your network (hint: read the IETF draft on remote LFA to get a few crazy ideas).

MORE INFORMATION
• Routing Design for Large-Scale Data Centers (Petr’s presentation @ NANOG 55)
• Use of BGP for Routing in Large-Scale Data Centers (IETF draft)
• Centralized Routing Control in BGP Networks (IETF draft)


Not surprisingly, the SDN-washing (labeling whatever you have as SDN) started just a few months after the initial SDN hype, with some people calling their NETCONF implementation SDN. This is what NETCONF really is.

NETCONF = EXPECT ON STEROIDS After the initial explosion of OpenFlow/SDN hype, a number of people made claims that OpenFlow is not the tool one can use to make SDN work, and NETCONF is commonly mentioned as an alternative (not surprisingly, considering that both Cisco IOS and Junos support it). Unfortunately, considering today’s state of NETCONF, nothing can be further from the truth.

WHAT IS NETCONF? NETCONF (RFC 6241) is an XML-based protocol used to manage the configuration of networking equipment. It allows the management console (“manager”) to issue commands and change the configuration of networking devices (“NETCONF agents”). In this respect, it’s somewhat similar to SNMP, but since it uses XML, it provides a much richer set of functionality than the simple key/value pairs of SNMP. For more details, I would strongly suggest you listen to the NETCONF Packet Pushers podcast.
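For a feel of what that looks like in practice, here’s a hedged sketch using the ncclient Python library to open a NETCONF session and retrieve the running configuration (the host and credentials are placeholders; what comes back inside the data element is, of course, vendor-specific):

```python
# Hedged sketch: retrieve the running configuration over NETCONF with ncclient.
from ncclient import manager

with manager.connect(host='192.0.2.1', port=830,
                     username='admin', password='admin',
                     hostkey_verify=False) as m:
    # <get-config> is one of the few operations standardized by NETCONF itself;
    # the payload it returns is whatever data model the device implements.
    reply = m.get_config(source='running')
    print(reply.xml)
```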


WHAT’S WRONG WITH NETCONF? Greg Ferro made a great analogy in the above-mentioned podcast: NETCONF is like SNMPv2/v3 (the transport protocol) and YANG (the language used to describe valid NETCONF messages) is like ASN.1 (the syntax describing SNMP variables). However, there’s a third component in the SNMP framework: a large set of standardized MIBs that are implemented by almost all networking vendors. It’s thus possible to write a network management application using a standard MIB that would work with equipment from all vendors that decided to implement that MIB. For example, should the Hadoop developers decide to use LLDP to auto-discover the topology of Hadoop clusters, they could rely on the LLDP MIB being available in switches from most data center networking vendors. Apart from a few basic aspects of session management, no such standardized data structure exists in the NETCONF world. For example, there’s no standardized command (specified in an RFC) that you could use to get the list of interfaces, shut down an interface, or configure an IP address on an interface. The drafts are being written by the NETMOD working group, but it will take a while before they reach RFC status and get implemented by major vendors. Every single vendor that graced us with a NETCONF implementation thus uses its own proprietary format within NETCONF’s XML envelope. In most cases, the vendor-specific part of the message maps directly into existing CLI commands (in the Junos case, the commands are XML-formatted because Junos uses XML internally). Could I thus write a NETCONF application that would work with Cisco IOS and Junos? Sure I could … if I implemented a vendor-specific module for every device family I plan to support in my application.


WHY WOULD YOU USE NETCONF? Let’s consider the alternatives: decades ago we configured network devices over Telnet sessions using expect scripts – simple automation scripts that would specify what one needs to send to the device, and what response one should expect. You could implement the scripts with the original expect tool, or with a scripting language like Tcl or Perl. Using a standard protocol that provides clear message delineation (expect scripts were mainly guesswork and could break with every software upgrade done on the networking devices) and error reporting (another guesswork part of the expect scripts) is evidently a much more robust solution, but it’s still too little and delivered way too slowly. What we need is a standard mechanism of configuring a multi-vendor environment, not a better wrapper around existing CLI (although the better wrapper does come in handy).
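For readers who never had the pleasure: this is roughly what those scripts looked like, here rewritten with the Python pexpect library (device address, credentials and prompts are placeholders). Every pattern in it is guesswork that breaks the moment the CLI output changes – which is exactly the problem NETCONF’s structured messages solve:

```python
# Hedged sketch of the old-school approach: scrape a router CLI with pexpect.
import pexpect

child = pexpect.spawn('ssh admin@192.0.2.1', timeout=10)
child.expect('assword:')          # fragile: assumes a password prompt
child.sendline('secret')
child.expect('#')                 # fragile: assumes a '#' prompt
child.sendline('show ip interface brief')
child.expect('#')
print(child.before.decode())      # whatever the device printed, unstructured
child.sendline('exit')
child.close()
```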

NETCONF (or XMPP as used by Arista) operates solely on the management plane, making it an interesting device configuration mechanism, but we might need more to implement something that could rightfully be called SDN. This is my response (written in October of 2012) to SDN-washing activities performed by a large data center vendor.

DEAR $VENDOR, NETCONF != SDN

Some vendors, feeling the urge to SDN-wash their products, claim that the ability to “program” them through NETCONF (or XMPP or whatever other similar mechanism) makes them SDN-blessed. There might be a yet-to-be-discovered vendor out there that creatively uses NETCONF to change the device behavior in ways that cannot be achieved by CLI or GUI configuration, but most of them use NETCONF as a reliable Expect script.

More precisely: what I’ve seen being done with NETCONF or XMPP is executing CLI commands or changing device (router, switch) configuration on-the-fly using a mechanism that is slightly more reliable than a Perl script doing the same thing over an SSH session. Functionally it’s the same thing as typing the exec-level or configuration commands manually (only a bit faster and with no autocorrect).

What’s missing? A few examples: you cannot change the device behavior beyond the parameters already programmed into its operating system (like you could with iRules on F5 BIG-IP). You cannot implement new functionality (apart from trivial things like configuring and removing static routes or packet/route filters). And yet some $vendors I respect call that SDN. Give me a break, I know you can do better than that.

Most NETCONF implementations don’t allow you to go below the device configuration level. On the other hand, OpenFlow by itself isn’t enough to implement a self-sufficient SDN solution, as it doesn’t allow the controller to configure the initial state of the attached devices. In a solution that implements novel forwarding functionality we might need both.

WE NEED BOTH OPENFLOW AND NETCONF

Every time I write about a simple use case that could benefit from OpenFlow, I invariably get a comment along the lines of “you can do that with NETCONF”. Repeated often enough, such comments might make an outside observer believe you don’t need OpenFlow for Software Defined Networking (SDN), which is simply not true. Here are at least three fundamental reasons why that’s the case.

CONTROL/DATA PLANE SEPARATION

Whether you need OpenFlow for SDN obviously depends on how you define SDN. Networking components have been defined by their software from the moment they became smarter than cables connecting individual hosts (around the time IBM launched the 3705 in the seventies, if not earlier), so you definitely don’t need OpenFlow to implement networking defined by software.

Lame joking aside, the definition of SDN as promoted by the Open Networking Foundation requires the separation of control and data planes, and you simply can’t do that with NETCONF. If anything, ForCES would be the right tool for the job, but you haven’t heard much about ForCES from your favorite vendor, have you … even though its development has been slowly progressing (or not, depending on your point of view) for the last decade.

IMPLEMENTING NEW FUNCTIONALITY

NETCONF is a protocol that allows you to modify a networking device’s configuration. OpenFlow is a protocol that allows you to modify its forwarding table. If you need to reconfigure a device, NETCONF is the way to go. If you want to implement new functionality (whatever it is) that is not easily configurable within the software your networking device is running, you’d better be able to modify the forwarding plane directly.

There might be interesting things you could do through network device configuration with NETCONF (installing route maps with policy-based routing, access lists, or static MPLS in/out label mappings, for example), but installing the same entries via OpenFlow would be way easier, simpler and (most importantly) device- and vendor-independent.

For example, NETCONF has no standard mechanism you can use today to create an ACL and apply it to an interface. You can create an ACL on a Cisco IOS/XR/NX-OS or a Junos switch or router with NETCONF, but the actual contents of the NETCONF message would be vendor-specific. To support devices made by multiple vendors, you’d have to implement vendor-specific functionality in your NETCONF controller. By contrast, you could install the same forwarding entries (with the DROP action) through OpenFlow into any OpenFlow-enabled switch (the “only” question being whether these entries would be executed in hardware or on the central CPU).
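To make the last paragraph more tangible, here’s a minimal sketch of installing a drop entry through OpenFlow 1.3 using the Ryu controller framework; the datapath object would come from the controller application’s switch-features event handler, and the IP address and timeout are placeholder values. The same flow entry can be pushed into any OpenFlow 1.3 switch regardless of vendor.

  # Sketch: install a "drop traffic from this source" entry via OpenFlow 1.3 (Ryu framework).
  # The datapath object comes from a Ryu controller application; values are placeholders.

  def install_drop_entry(datapath, src_ip="10.0.0.1", priority=100):
      parser = datapath.ofproto_parser

      # Match IPv4 packets from the offending source address
      match = parser.OFPMatch(eth_type=0x0800, ipv4_src=src_ip)

      # An empty instruction list means "drop" in OpenFlow 1.3
      mod = parser.OFPFlowMod(
          datapath=datapath,
          priority=priority,
          match=match,
          instructions=[],
          hard_timeout=300,   # ephemeral by design: the entry expires on its own
      )
      datapath.send_msg(mod)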

EPHEMERAL STATE

The NETCONF protocol modifies device configuration. Whatever you configure with NETCONF appears in the device configuration and can be saved from the running configuration to the permanent (startup) one when you decide to save the changes. You might not want that to happen if all you want to do is apply a temporary ACL to an interface or create an MPLS-TP-like traffic engineering tunnel (computed externally, not signaled through RSVP).

OpenFlow-created entries in the forwarding table are by definition temporary. They don’t appear in the device configuration (and are probably fun to troubleshoot because they appear only in the forwarding table) and are lost on device reload or link loss.

CAN WE DO IT WITHOUT NETCONF?

Given all of the above, can we implement SDN networks without NETCONF? Of course we can, assuming we go down the OpenFlow-only route, but not many users or vendors considering OpenFlow are willing to do that (Google being one of the exceptions); most of them would like to retain the field-proven smarts of their networking devices and augment them with additional functionality configured through OpenFlow.

In a real-life network we will thus need both: NETCONF to configure the existing software running in networking devices (hopefully through standardized messages in the not-too-distant future), and potentially OpenFlow to add new functionality where needed.

Not surprisingly, some vendors reacted to the SDN movement by launching their own proprietary APIs. Cisco’s onePK is (as of August 2014) by far the most comprehensive one.

CISCO ONE: MORE THAN JUST OPENFLOW/SDN

As expected, Cisco launched its programmable networks strategy (Cisco Open Networking Environment – ONE) at Cisco Live US ... and as we all hoped, it was more than just OpenFlow support on the Nexus 3000. It was also totally different from the usual “we support OpenFlow on our gear” me-too announcements we’ve seen in the last few months.

One of the most important messages of Cisco’s ONE launch is that OpenFlow is just a small part of the big picture. That’s pretty obvious to anyone who has tried to understand what OpenFlow is all about, and we’ve heard it before, but realistic statements like this tend to get lost in all the hype generated by OpenFlow zealots and the industry press.

Figure 7-6: Cisco OnePK high level overview

The second, even more important message is “let’s not reinvent the wheel.” Google might have the needs and resources to write their own OpenFlow controllers, northbound API, and custom applications on top of that API; the rest of us would just like to get our job done with minimum hassle. To help us get there, Cisco plans to add the One Platform Kit (onePK) API to IOS, IOS-XR and NX-OS.

Figure 7-7: Cisco OnePK APIs

WHY IS ONEPK IMPORTANT?

You probably remember the “OpenFlow is like the x86 instruction set” statement made by Kyle Forster in 2011. Now, imagine you’d like to write a small Perl script on top of the x86 instruction set. You can’t do that; you’re missing a whole stack in the middle – the operating system, file system, user authentication and authorization, shell, CLI utilities, Perl interpreter ... you get the picture.

OpenFlow has the same problem – it’s useless without a controller with a northbound API, and there’s no standard northbound API at the moment. If I want to modify packet filters on my wireless access point, or create a new traffic engineering tunnel, I have to start from scratch.

That’s where onePK comes in – it gives you high-level APIs that allow you to inspect or modify the behavior of the production-grade software you already have in your network. You don’t have to deal with low-level details; you can (hopefully – we have to see the API first) focus on getting your job done.

OPEN OR PROPRIETARY?

No doubt the OpenFlow camp will be quick to claim onePK is proprietary. Of course it is, but so is almost every other SDK or API in this industry. If you decide to develop an iOS application, you cannot run it on Windows 7; if your orchestration software works with VMware’s API, you cannot use it to manage Hyper-V.

The real difference between networking and most other parts of IT is that in networking you have a choice. You can use onePK, in which case your application will work only with Cisco IOS and its cousins, or you could write your own application stack (or use a third-party one) using OpenFlow to communicate with the networking gear. The choice is yours.

MORE DETAILS

You can get more details about Cisco ONE on Cisco’s web site and its data center blog, and a number of bloggers published really good reviews:

- Derick Winkworth is underwear-throwing excited about Cisco ONE;
- Jason Edelman did an initial analysis of Cisco’s SDN material and is waiting to see the results of the Cisco ONE announcement;
- Colin McNamara’s blog post is a bit more product focused.

Plexxi implemented an interesting controller-based architecture that combines smart autonomous switches with a central controller. The fabric can work without the controller, but behaves better when the controller is present.

THE PLEXXI CHALLENGE (OR: DON’T BLAME THE TOOLS)

Plexxi has an incredibly creative data center fabric solution: they paired data center switching with CWDM optics, programmable ROADMs and controller-based traffic engineering to get something that looks almost like a distributed switched version of FDDI (or Token Ring for the FCoTR fans). Not surprisingly, the tools we use to build traditional networks don’t work well with their architecture.

In a recent blog post Marten Terpstra hinted at the shortcomings of the Shortest Path First (SPF) approach used by every single modern routing algorithm. Let’s take a closer look at why Plexxi’s engineers couldn’t use SPF.

ONE RING TO RULE THEM ALL

The cornerstone of the Plexxi ring is the optical mesh that’s automatically built between the switches. Each switch can control 24 lambdas in the CWDM ring (8 lambdas pass through the switch) and uses them to establish connectivity with (not so very) adjacent switches:

- Four lambdas (40 Gbps) are used to connect to the adjacent (east and west) switch;
- Two lambdas (20 Gbps) are used to connect to four additional switches in both directions.

Figure 7-8: The Plexxi optical network

The CWDM lambdas established by Plexxi switches build a chordal ring. Here’s the topology you get in a 25-node network:

Figure 7-9: Topology of a 25 node Plexxi ring

And here’s what a 10-node topology would look like:

Figure 7-10: Topology of a 10 node Plexxi ring

The beauty of the Plexxi ring is the ease of horizontal expansion: assuming you got the wiring right, all you need to do to add a new ToR switch to the fabric is to disconnect a cable between two switches and insert the new switch between them, as shown in the next diagram. You could do it in a live network, as long as the network survives a short-term drop in fabric bandwidth while the CWDM ring is reconfigured.

Figure 7-11: Adding a new Plexxi switch into an existing ring

FULL MESH SUCKS WITH SPF ROUTING

Now imagine you’re running a shortest path routing protocol over a chordal ring topology. Smaller chordal rings look exactly like a full mesh, and we know that a full mesh is the worst possible fabric topology. You need non-SPF routing to get reasonable bandwidth utilization and more than 20 (or 40) Gbps of bandwidth between a pair of nodes.
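Here’s a quick sketch (using the Python networkx library) that shows why: if you build a chordal ring along the lines described above – every switch linked to its neighbors at offsets one through five in both directions – a 10-node ring degenerates into a full mesh, so SPF sees exactly one (direct, lower-bandwidth) shortest path between any pair of switches. The offsets and sizes are simplified illustrations, not Plexxi’s actual wiring plan.

  # Sketch: why SPF treats a small chordal ring like a full mesh (uses the networkx library).
  # Offsets and ring sizes are simplified illustrations of the ring described above.
  import itertools
  import networkx as nx

  def chordal_ring(n, offsets=(1, 2, 3, 4, 5)):
      g = nx.Graph()
      for node in range(n):
          for offset in offsets:
              g.add_edge(node, (node + offset) % n)
      return g

  for n in (10, 25):
      g = chordal_ring(n)
      # Count node pairs whose shortest path is a single direct chord
      direct = sum(1 for a, b in itertools.combinations(g.nodes, 2) if g.has_edge(a, b))
      total = n * (n - 1) // 2
      print(f"{n}-node ring: {direct} of {total} node pairs are directly connected")

  # In the 10-node case every pair is adjacent, so SPF/ECMP pushes all traffic between
  # two switches onto the single direct chord instead of spreading it around the ring.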

There are at least two well-known solutions to the non-SPF routing challenge:

- Central controllers (well known from SONET/SDH, Frame Relay and ATM days);
- Distributed traffic engineering (thoroughly hated by anyone who had to operate a large MPLS TE network close to its maximum capacity).

Plexxi decided to use a central controller, not to provision virtual circuits (like we did in the ATM days) but to program the UCMP (Unequal Cost Multipath) forwarding entries in their switches.

Does that mean we should forget everything we know about routing algorithms and SPF-based ECMP and rush into controller-based fabrics? Of course not. SPF and ECMP are just tools. They have well-known characteristics and well-understood use cases (for example, they work great in leaf-and-spine fabrics). In other words, don’t blame the hammer if you decided to buy screws instead of nails.
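To illustrate the UCMP idea (a generic toy example, not Plexxi’s actual algorithm), here’s a sketch that splits traffic between two switches in proportion to the bottleneck bandwidth of each candidate path instead of using only the shortest one; the topology fragment and bandwidth figures are made up.

  # Toy UCMP illustration (not Plexxi's actual algorithm): weight each candidate path
  # by its bottleneck bandwidth instead of using only the shortest path.
  import networkx as nx

  g = nx.Graph()
  # Hypothetical fragment of a ring: a direct 20 Gbps chord between A and B,
  # plus a two-hop 40 Gbps detour over the adjacent switch C.
  g.add_edge("A", "B", bw=20)
  g.add_edge("A", "C", bw=40)
  g.add_edge("C", "B", bw=40)

  paths = list(nx.all_simple_paths(g, "A", "B", cutoff=2))
  bottlenecks = [min(g[u][v]["bw"] for u, v in zip(path, path[1:])) for path in paths]
  total = sum(bottlenecks)
  for path, bw in zip(paths, bottlenecks):
      print(f"path {path}: weight {bw}/{total}")

  # SPF/ECMP would use only the one-hop path (20 Gbps); UCMP weights let the pair
  # use both paths, for up to 60 Gbps of fabric bandwidth.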

In the summer of 2012 the IETF launched yet another working group to develop a protocol that could interact with routers on the control plane. I2RS (initially called IRS) might be exactly what a resilient SDN solution needs … assuming it ever gets off the ground.

I2RS – WHAT THE SDN GOLDILOCKS IS LOOKING FOR?

Most current SDNish tools are too cumbersome for everyday use: OpenFlow is too granular (the controller interacts directly with the FIB or TCAM), and NETCONF is too coarse (it works on the device configuration level and thus cannot be used to implement anything the networking device can’t already do). In many cases, we’d like an external application to interact with the device’s routing table or routing protocols (similar to tracked static routes available in Cisco IOS, but without the configuration hassle). Interface to the Routing System (I2RS) is a new initiative that should provide just what we might need in those cases.

To learn more about IRS, you might want to read the problem statement and framework drafts, view the slides presented at IETF84, or even join the irs-discuss mailing list. Even if you don’t want to know those details, but consider yourself a person interested in routing and routing protocols, do read two excellent e-mails written by Russ White: in the first one he explained how IRS might appear as yet another routing protocol and benefit from the existing routing-table-related infrastructure (including admin distance and route redistribution); in the second one he described several interesting use cases.

Is I2RS the SDN porridge we’re looking for? It’s way too early to tell (we need to see more than an initial attempt to define the problem and the framework), but the idea is definitely promising.
