Changing network architectures and cultures, part 2 - NFV and the IoT
November 21, 2014
In the past five years, networking has experienced several industry disruptions, new paradigms (and not just new protocols), new architectures. This a...
The rise of a utility-grade, global internetwork has enabled some truly amazing innovations at the edge of the network. All the tech running in datacenters and in users’ hands that communicates across networks has gone through a profound revolution. Server and desktop virtualization, cloud computing, infrastructure as a service (IaaS), software as a service (SaaS), the smart device, and mobile app ecosystem are evidence of this revolution. The cultural impact on users is known as IT consumerization – the expectation that all information technology should be fast, easy, and self-served. All of this happened because the Internet serves as the communications platform and enabler.
Network Functions Virtualization
Network functions virtualization (NFV) is, in a sense, really simple, and you can reduce it to these three steps:
1. Take any function that performs or is part of a network service (such as a firewall, load balancer, traffic optimizer, or LTE mobile control plane gateway)
2. Remove that function from dedicated hardware appliances that typically get severely underutilized and are very difficult to manage in combination with other appliances because of the time and physical space required to rack and cable appliances together
3. Put that function onto a virtual machine (VM) whose hardware resources can be elastically managed via software, and which can be configured fairly easily in series – or service chain – via software
The benefits of NFV boil down to a few things:
• You can rapidly, flexibly create service chains without going through a hardware management process, which means that you can go from concept or customer request to implementation in a much more competitive timeframe.
• You can manage hardware capacity flexibly, which is particularly important when considering that traffic may vary quite a bit based on time of day, day of week, and special events.
• You can move many of these functions onto commoditized datacenter hardware and away from many separate, specialized appliances, which reduces capital expenditures (CAPEX) as well as operating expenditures (OPEX) as you consolidate training, upgrades, and management systems, and gain greater efficiencies from high-scale datacenter equipment versus individual appliances.
One of the technical implications of NFV comes from service chaining. If you’ve gained the speed and efficiency of virtual network functions (VNFs), you certainly don’t want to be doing slow, manual network configurations to connect these things together. Again, SDN comes into play here. By being able to efficiently manage network paths and QoS between VNFs in a service chain via software, you can achieve overall time to market and ease of deployment improvements that are a quantum leap over the traditional way of doing things.
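To make the service chaining idea a bit more concrete, here is a minimal Python sketch. It assumes a chain is just an ordered list of VNF names and derives the hop-by-hop connections that would have to be configured between them. The function name, the endpoint names, and the idea of handing these hops to a controller are illustrative assumptions, not any particular SDN controller's API.

```python
from typing import List, Tuple


def chain_hops(ingress: str, vnfs: List[str], egress: str) -> List[Tuple[str, str]]:
    """Return the ordered (source, destination) hops for a service chain.

    Hypothetical helper: each hop is a path that would otherwise be
    configured by hand between two appliances.
    """
    nodes = [ingress] + vnfs + [egress]
    # Pair every node with its successor to get the links in traversal order.
    return list(zip(nodes, nodes[1:]))


# Illustrative chain built from the function types mentioned earlier.
hops = chain_hops(
    "subscriber_edge",
    ["firewall", "load_balancer", "traffic_optimizer"],
    "internet_gateway",
)
for src, dst in hops:
    print(f"{src} -> {dst}")
```

Reordering or extending the chain is a change to a list rather than a trip to the rack, which is the point of doing this in software.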
The culture change accompanying the NFV network architecture shift
Due to the relative opacity and inaccessibility of the inner workings of networks to the outside world (including both network management and client applications), network engineering has remained one of the least automated domains of IT and telecom infrastructure. The ongoing assumption is that the design, assembly, and testing of networks are slow, engineer-driven processes because it takes so much knowledge, experience, and intuition to understand how things work. As a result, networks tend to have change and certification cycles that follow the waterfall model. A prime example is that it takes on the order of six to nine months to certify new network upgrades and changes in service provider settings.
Here’s the problem: the whole idea behind SDN and NFV is predicated on agility – speed in conceiving and deploying VNF service chains; speed in creating applications that can simply talk to a unified network control plane API through centralized controllers and request specific services from the network. That’s all well and good, but if it takes you eons to certify network changes, certification becomes a real bottleneck. Imagine a new SDN-enabled application is rolled out or upgraded – you’ve got to certify that it will work against all the underlying hardware in the network. After all, just because the northbound API works doesn’t mean that everything in the southbound direction will come up roses. If you’re trying to have agile application development cycles push out new functionality rapidly, things are going to get pretty tangled in multi-month certification cycles.
That’s the top-down application view, but it also applies to bottom-up. Large networks are continuously changing OS images, getting hardware upgrades, etc. How do you certify that the application layer will work with the new network layer stuff?
These issues aren’t just about SDN; let’s think about just the certification challenges of NFV. We know that with NFV you can create any service chain quickly and with enough flexibility to respond to customer opportunities and requests, evolve service offerings, etc. Previously, the throttle on this was hardware reconfiguration, but with that out of the way, there’s no end to the creativity and agility you can exercise. Oops, that does sound like a QA nightmare, doesn’t it? How big will the service chain QA matrix grow? Well, infinity would be hyperbolic, but let’s just say it’d be very large and constantly growing. How do you address this continuous barrage of new service chains to certify?
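A rough back-of-the-envelope calculation shows how fast that matrix can grow. The sketch below assumes, purely for illustration, that a chain is an ordered sequence of distinct VNFs and that chains are capped at four functions; the catalog sizes are made up, and real deployments would add further multipliers for hardware and OS image variants.

```python
from math import perm  # available in Python 3.8+


def ordered_chain_count(num_vnfs: int, max_length: int) -> int:
    """Count ordered chains of 1..max_length distinct VNFs from a catalog."""
    longest = min(num_vnfs, max_length)
    # perm(n, k) is the number of ordered selections of k items out of n.
    return sum(perm(num_vnfs, k) for k in range(1, longest + 1))


# Hypothetical catalog sizes; chains capped at four functions.
for catalog_size in (4, 8, 12):
    print(catalog_size, ordered_chain_count(catalog_size, max_length=4))
# prints:
# 4 64
# 8 2080
# 12 13344
```

Tripling the catalog from 4 to 12 functions takes the number of candidate chains from 64 to more than 13,000 under these assumptions, which is why certifying each one by hand does not scale.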
The answer is that internal network operations must go through a culture change – away from the almighty network engineering hero who rules processes with brains, experience, and superior intuition, and towards a collaborative, agile, and automated process. AT&T, in its Domain 2.0 white paper, which describes the company’s vision for essentially turning a lumbering telecom giant into a nimble software/digital enterprise, says it this way:
“There remains much to do before this vision [Domain 2.0] can be implemented, including pivots from networking craft to software engineering, and from carrier operations models to cloud “DevOps” models. We also see an important pivot to embrace agile development in preference to existing waterfall models.”
Having spoken to network engineers within AT&T, I can say that there is a palpable feeling that you’re either going on this journey or you’re not going to be on the train in a certain amount of time. In the words of one engineer: “Either you change, or you don’t have a future career with AT&T.”
How does this culture change happen? Obviously, top-down management mandates have a critical impact, but practically speaking, for organizations that have a lot of networking infrastructure in place (especially with years and perhaps decades of accumulated assets), one of the primary thrusts needs to be aiming for agile/continuous cycles based on ever-increasing automation of the infrastructure. The importance of infrastructure automation in particular cannot be overstated.
Let’s look at the test cycle some more because it’s a great example. With the heroic (and gruesomely time-consuming) cycles it can take to manually set up network testbed environments, engineers are understandably reluctant to give up that gear once they’ve got it just right. Because engineers skilled enough to do this work are so rare, the manual, engineer-centric way of doing things means that millions of dollars of equipment can sit idle, shedding thousands of CAPEX write-down dollars per hour and consuming costly space, power, and non-stop cooling. However, if you can automate the entire test infrastructure so that it functions like a cloud, then QA engineers can focus where their knowledge really counts – on designing and automating increasingly broad and deep tests to ensure that applications, service chains, and infrastructure all play nicely together. If you think about things in this manner, then there are some important take-aways. First off, there needs to be a realistic assessment of:
• The current state and likely evolutionary process of the network infrastructure – It does no good to pretend that there aren’t millions or billions of dollars of non-SDN, full-metal networking devices in place, and that they aren’t going away anytime soon. It also isn’t wise to think of the DevOps culture as something that will only work over the newest infrastructure. The reality of the hardware is that it has to be part of the culture change.
• The skills mix and attitudes of the engineering team – It is very tempting to hire software engineers with a DevOps background from a web company to come in and “transform” everything. But if collaboration is key, then setting up personnel silos isn’t helpful. Culture change is not the same thing as a tiger team. That said, obviously there will also need to be leaders, and it’s also important to assess who in the organization is open to change and who is determined to dig in their heels and resist. Beyond attitude, a current team composed 98 percent of vendor-trained domain experts, who may have some scripting skills but aren’t professional software programmers, means that you have to look for ways of doing automation and process that leverage their skills without losing their expertise and contributions.
Network infrastructure: Automate everything
A relentless commitment to automating all the infrastructure in a way that is cognizant of the above realities will yield tons of productive results. It might be that the first stage is to move from manually operated design and QA labs to cloud-like, automated infrastructure, even though the way that design and testing are done is still pretty manual. This might seem like a no-brainer, but it does require culture change as people are invited to adopt new ways of working and thinking, knowing that their deliverables remain under the same deadlines that have been so hard to meet in the past. The experience of many network manufacturers and telecoms shows that this step alone can dramatically cut down test cycles, from months to days in some cases.
Success inspires folks to come on board. Top-down commitment to continuing an iterative, collaborative process of improvement will capitalize on these early gains and help organizations move down the road to dynamic sandbox-based network service and topology design, network test automation, data-driven network testing, continuous integration, and continuous deployment (where relevant).
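As a minimal illustration of what "cloud-like" lab infrastructure could mean in practice, the Python sketch below models a shared pool of test devices that are leased to a test run and returned when it finishes, rather than held indefinitely by one engineer. The class, device names, and run identifiers are hypothetical; a real lab automation system would also handle cabling, topology, and configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LabPool:
    """Hypothetical shared pool of lab devices, leased per test run."""
    free: List[str]
    leases: Dict[str, List[str]] = field(default_factory=dict)

    def reserve(self, run_id: str, count: int) -> Optional[List[str]]:
        """Lease `count` devices to a run, or return None if unavailable."""
        if count > len(self.free) or run_id in self.leases:
            return None
        devices, self.free = self.free[:count], self.free[count:]
        self.leases[run_id] = devices
        return devices

    def release(self, run_id: str) -> None:
        """Return a run's devices to the pool so other runs can use them."""
        self.free.extend(self.leases.pop(run_id, []))


pool = LabPool(free=["switch-1", "switch-2", "router-1"])
first = pool.reserve("regression-101", 2)   # gets switch-1 and switch-2
second = pool.reserve("regression-102", 2)  # None: only one device is free
pool.release("regression-101")              # gear goes back to the pool
third = pool.reserve("regression-102", 2)   # now succeeds
```

The design choice worth noting is that release is part of the workflow: gear returns to the pool automatically at the end of a run, so idle equipment becomes the exception rather than the norm.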