Changing network architectures and cultures, part 1 - SDN
October 31, 2014
These are momentous days for networking. About five years ago, the networking industry seemed stuck in a boring routine of incremental bandwidth upgra...
Looking in the rearview mirror
First, before we get all futuristic, let’s get current. Some gradual changes were already happening in networking at the same time that immense changes were happening in datacenters at the edge of networks. A good place to start is with modern networking’s origin mythology, or at least my tongue-in-cheek version of it:
In the beginning was the Internet
Users saw best effort traffic delivery and said, “It is good enough”
Bandwidth became fruitful and multiplied
Soon the Internet was under the sea, teeming on the land, and flying in the air
Yet telecom desired a business-worthy companion and created MPLS from Internet
Then big businesses asked, “Must traffic truly only be best-effort?”
So engineers saw that traffic engineering was attractive to behold and partook of it
Humor aside, the Internet was the last truly big thing to happen in networking, and most everything since then has been a build-out and growing up of that concept – ubiquitous, high-speed, best-effort traffic delivery. The Internet has become a utility as relied upon and as necessary as electricity. Perhaps trillions of dollars have gone into the development and deployment of this worldwide technology, architecture, and infrastructure. Sure, the pipes have gotten fatter, traffic moves faster, and telecoms have made some specialized versions of the Internet for business customers based on MPLS and RSVP-TE traffic engineering extensions to help deliver VPNs and crude levels of traffic differentiation. But in essence, today’s networks are all derived from the Internet. Traffic is primarily switched, prioritized, and routed end-to-end on a packet-by-packet basis. The service concept is primarily utility-like, built for simplicity and ease of use. When’s the last time you really had to struggle to get a connection to a network for technical reasons? It’s not perfect, but as the faux-origin myth above expresses, it is definitely good enough.
The rise of a utility-grade, global internetwork has enabled some truly astounding things to happen at the edge of the network, and that is what has led to the software-defined networking (SDN) transition that we are about to hurtle into. All the tech running in datacenters and in users’ hands that communicates across networks has gone through a profound revolution. Server and desktop virtualization, cloud computing, infrastructure as a service (IaaS), software as a service (SaaS), and the smart device and mobile app ecosystem are evidence of this revolution. The cultural impact on users is known as IT consumerization – the expectation that all information technology should be fast, easy, and self-served. All of this happened because the Internet serves as the communications platform and enabler.
So if things were so good, why are we facing this transition to SDN, network functions virtualization (NFV) and the Internet of Things (IoT)? The answers vary a bit, but they all come together in the end.
One of the beauties of the Internet is that it is utility-like in terms of simplicity and reliability. However, Internet-style networks also suffer from this simplicity because the control plane of the network has very few external controls available, and they are essentially inaccessible to the endpoints or client devices that utilize the network. Since the Internet was designed for reliability, both control and data planes are located on each switch (or IP router) that operates as part of the network. The control plane is essentially a distributed computing architecture, where all the participating routers share their local connectivity state information (such as which of their links are up or down, and which IP address prefixes they are advertising), and then all the routers take that same state information and algorithmically calculate in parallel the optimal path from any point A to any point B. I’m simplifying here, of course, but the point is that the control plane is implemented this way for resilience. If a link goes down, all the other routers learn about it and can recalculate the best way to route traffic based on that news.
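To make that distributed calculation a little more concrete, here is a minimal Python sketch. It assumes a toy three-router topology with made-up link costs, and the names `shortest_path`, `fail_link`, and `LSDB` are purely illustrative. Every router would hold the same link-state database and run the same shortest-path calculation (Dijkstra's algorithm here) on its own copy; real routing protocols do a great deal more than this.

```python
import heapq


def shortest_path(lsdb, src, dst):
    """Dijkstra over a link-state database shaped {node: {neighbor: cost}}.

    Returns (total_cost, path), or (inf, []) if dst is unreachable.
    """
    dist = {src: 0}
    prev = {}
    visited = set()
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return float("inf"), []
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def fail_link(lsdb, a, b):
    """Return a copy of the database with the a-b link removed in both directions."""
    new = {node: dict(nbrs) for node, nbrs in lsdb.items()}
    new[a].pop(b, None)
    new[b].pop(a, None)
    return new


# Hypothetical topology: every router holds this same view.
LSDB = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
```

With this toy data, `shortest_path(LSDB, "A", "C")` gives the two-hop route through B at cost 2, and `shortest_path(fail_link(LSDB, "B", "C"), "A", "C")` falls back to the direct link at cost 5. That is the resilience property in miniature: once the failure is known, every router reaches the same new answer on its own.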
There are some problems though:
• There are very few changes you can make in the way these algorithmic calculations are performed – and the changes that are possible have to be made on the individual routers themselves
• If you want to make changes in network traffic, you usually have to go to multiple router nodes and make configuration changes on those routers to influence how routing will work in the network
In practice, this is rarely done because it’s difficult to figure out how an individual configuration change will affect network-wide behavior. It’s very easy to make a mistake and then really hard to trace down the root cause of the issue afterwards. This is a recipe for chaos, which means that outside of a very few knobs that are turned with extreme care, IP routing is usually left to function in its default operational mode.
Here’s another conundrum. A critical piece of the “control plane” – the controls over how the data plane traffic itself is prioritized – is not even dealt with by the routing protocols. This is all configured individually on the ports of all the routers in a network. This means, in practice, that you can’t have much complexity. You choose a few classes of service to define the ladder of traffic priority, then configure the exact same values on all the routers.
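A rough sketch of why this stays simple in practice: if the same small priority ladder has to be copied onto every router by hand, the only workable check is whether every copy still matches. The class names, queue numbers, and the `find_drift` helper below are hypothetical and do not come from any vendor's configuration model.

```python
# Hypothetical ladder: class of service -> queue number (0 = highest priority).
REFERENCE_LADDER = {
    "network-control": 0,
    "voice": 1,
    "business": 2,
    "best-effort": 3,
}


def find_drift(router_configs, reference):
    """Report routers whose class-of-service ladder differs from the reference.

    Returns {router: {class_name: (expected_queue, actual_queue)}}.
    A value of None means the class is missing on one side.
    """
    drift = {}
    for router, config in router_configs.items():
        diffs = {}
        for cls in sorted(set(reference) | set(config)):
            expected, actual = reference.get(cls), config.get(cls)
            if expected != actual:
                diffs[cls] = (expected, actual)
        if diffs:
            drift[router] = diffs
    return drift
```

For example, if one router out of many has `"voice"` mapped to queue 2 instead of 1, `find_drift` returns an entry for that router only. Every additional class multiplies the number of per-router values that must stay in lockstep, which is part of why the ladder is kept short.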
The most advanced networks in the world today can do something a bit better than this by using MPLS and the RSVP-TE protocol (which requires configuration of all the routers at the edge of the network) to reserve certain amounts of bandwidth through a network for a select number of MPLS tunnels. This is typically limited to managing high-volume traffic trunking across large IP/MPLS networks. Traffic engineering extensions to IP routing protocols (OSPF-TE and ISIS-TE) can monitor routers for bandwidth availability, and RSVP-TE can be configured on those edge routers to adjust traffic engineering policies based on information gathered by the protocol perhaps every 30 to 60 minutes. Only a small minority of IP networks in the world have engineering staff sophisticated enough to use RSVP-TE at all.
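The reservation idea can be sketched in a few lines of Python. This is a deliberately simplified model, not the RSVP-TE protocol itself: the `reserve_tunnel` function, the link names, and the bandwidth figures are all assumptions made for illustration. It shows the all-or-nothing check that a tunnel is only admitted if every link along its path still has enough unreserved bandwidth.

```python
def reserve_tunnel(available, path, bandwidth):
    """Try to reserve `bandwidth` on every link along `path`.

    `available` maps a directed link (from_router, to_router) to its
    unreserved bandwidth, in whatever unit the caller uses consistently.
    The reservation is all-or-nothing: if any link is short, nothing is
    changed and False is returned.
    """
    hops = list(zip(path, path[1:]))
    if any(available.get(hop, 0) < bandwidth for hop in hops):
        return False
    for hop in hops:
        available[hop] -= bandwidth
    return True
```

Given `available = {("A", "B"): 100, ("B", "C"): 40}`, a first request for 30 units along `["A", "B", "C"]` succeeds and leaves 70 and 10 units respectively; a second identical request is refused because the B-C link can no longer carry it.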
You might ask, “What about network management tools? Don’t they help?” Of course there are network management systems that can manage the configurations on all the routers. There are separate systems that will do one-time calculations to optimize advanced routing configurations such as MPLS and RSVP-TE traffic engineering. There are even technologies such as route analytics that passively participate in the routing control plane and in parallel calculate all the network paths in a routed network. Nonetheless, all these tools are hard to integrate, and they exist only in the realm of network monitoring and troubleshooting. Furthermore, it takes highly trained personnel to even understand what all this data means in most cases.
Now, you may be thinking that this is interesting as a matter of network management and troubleshooting, but let’s extrapolate. What if you want, for business reasons, to have different applications receive different network behavior policies? Rather than default to the response rooted in a religious affiliation with how the Internet works today – which is, “Why would you ever want to do that?” – let’s assume that there are very legitimate reasons that a business user, for example, might want policies that enforce real IT priorities based on the value or criticality of the application. When push comes to shove, certain applications must win and receive different treatment than others. For example, in an electrical utility, the traffic that handles the control of the grid is not only business critical, but can impact people’s lives when disrupted. On top of everything else, that traffic is subject to regulatory oversight, so it would also consistently need to be treated with higher priority and afforded optimized paths compared to most other applications in that utility’s network. Well, the issue with traditional IP/MPLS networks is that, for the reasons mentioned above, there is really no practical way to make this happen. The control plane that manages exactly how traffic gets through the network is inaccessible, the traffic prioritization has to be configured on individual routers, and network management tools are barely able to give enough information for experts to use.
This is where SDN comes in. Turns out the answer to being able to manage the “control plane” of paths, traffic prioritization, bandwidth allocation, etc., is to no longer have the control plane be distributed as it always has been. Sacrilege! Yet, it just makes sense. You need the network to provide a unified point of control through which applications can access the control plane and request services. There are multiple ways that SDN can be delivered, but in all cases the key notion is that the control plane is centralized in a controller or set of federated controllers that manage control plane functions of the routers through potentially multiple southbound protocols or communication methods, and also provide a unified northbound API that affords applications programmatic access to control plane functionality.
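The shape of that architecture can be sketched in Python. Everything here is hypothetical: the `Controller` and `RecordingDriver` classes, the `request_service` method, and the rule format are invented for illustration and do not correspond to any real controller product or API. The sketch assumes a single controller with a global view of the topology, a northbound call that an application makes, and a southbound driver that would normally program the routers (here it only records what it was asked to install).

```python
from collections import deque


class RecordingDriver:
    """Stand-in southbound driver. A real driver would speak some
    device-facing protocol; this one just records the requested rules."""

    def __init__(self):
        self.rules = []

    def install_rule(self, router, flow_id, next_hop, priority):
        self.rules.append((router, flow_id, next_hop, priority))


class Controller:
    """Centralized control plane: one global topology view, one API."""

    def __init__(self, topology, driver):
        self.topology = topology  # {router: [neighbor, ...]}
        self.driver = driver

    def _find_path(self, src, dst):
        # Breadth-first search: fewest hops, using the global view.
        queue = deque([[src]])
        seen = {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nbr in self.topology.get(path[-1], []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(path + [nbr])
        return None

    def request_service(self, flow_id, src, dst, priority):
        """Northbound entry point: an application asks for a path."""
        path = self._find_path(src, dst)
        if path is None:
            raise ValueError(f"no path from {src} to {dst}")
        # Southbound: push one forwarding rule per hop along the path.
        for router, next_hop in zip(path, path[1:]):
            self.driver.install_rule(router, flow_id, next_hop, priority)
        return path
```

An application for the electrical utility example might call `controller.request_service("grid-control", "A", "C", priority=7)` and never touch an individual router. The design point is that the path decision and the per-router configuration both happen in one place, behind one interface.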
There are two main ways of implementing SDN. The first is what could be called “overlay,” where overlay traffic tunnels are instantiated from edge to edge over the underlying routed or switched network as requested by applications. The other way is what could be called “underlay,” where the actual routing/switching behavior is controlled by centralized controllers. OpenFlow is the de facto standard today for this type of SDN.
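For the “underlay” style, the switch's forwarding behavior comes down to a table of match/action entries that the controller installs. The Python sketch below is a heavily simplified model in the spirit of an OpenFlow flow table, not the actual OpenFlow data structures or wire format: the field names, action strings, and `lookup` function are assumptions. It shows the basic rule that the highest-priority matching entry wins, with a catch-all entry handing unknown traffic to the controller.

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority entry whose match
    fields all equal the packet's fields, or None if nothing matches."""
    best = None
    for entry in flow_table:
        matches = all(packet.get(k) == v for k, v in entry["match"].items())
        if matches and (best is None or entry["priority"] > best["priority"]):
            best = entry
    return best["action"] if best else None


# Hypothetical entries a controller might have installed on one switch.
FLOW_TABLE = [
    {"priority": 100, "match": {"ip_dst": "10.0.0.5", "tcp_dst": 443}, "action": "output:2"},
    {"priority": 10, "match": {"ip_dst": "10.0.0.5"}, "action": "output:1"},
    {"priority": 0, "match": {}, "action": "send-to-controller"},
]
```

With this table, traffic to 10.0.0.5 on TCP port 443 leaves on port 2, other traffic to 10.0.0.5 leaves on port 1, and anything else is sent to the controller for a decision.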
For large-scale network operators, SDN offers a way to construct service offerings that are easier and faster to deploy because the centralized controllers and APIs allow self-service and even third-party applications to directly request services from the network. This application enablement is crucial for service providers to compete in a market where agile, web-based companies can quickly spin up applications and services that ride over the top (OTT) of the service provider’s simple utility network to capture revenues and profits.
Where is SDN today? It’s still in its early stages, but it is growing and there are real-world problems being solved. A couple of years ago it would have been possible to critique software-defined networks by saying that only hyperscale companies like Google were doing anything with SDN. However, Gartner now reports that there are between 500 and 1,000 live SDN deployments in production networks. It’s still early, but SDN seems to be catching on.
SDN is part of the transformation occurring in the nature of network behavior, service architecture, and scale of network communication. While these changes are significant, a cultural change to agile, DevOps, and continuous notions is equally important. Engineers working in the networking space need to be aware of these trends and ensure their personal and career paths correspond with these tectonic shifts.
Part 2 will investigate how NFV and the IoT are having a similar impact on the world of networking.