Our NetOps Journey: From Vetting White Box Options to Taking Development In-House
March 05, 2019
My work within our organization for the past 15 years has consistently included squeezing as much efficiency as possible out of practically-resourced hardware. So, when the product team advised our engineers to pursue production of a new x86 64-bit Operations Manager appliance featuring gigabytes of memory, SATA SSDs and so on…well, frankly I got pretty excited.
I assumed it would be a piece of cake: we would leverage an ODM with a cost-effective x86 micro/mini server that we could use as a white box solution and be on our way. The reality, it turned out, was a lot more complicated. If you ever find yourself in the position of building out your own network hardware, you might find the story of our journey to be a helpful one.
We set out to create an Operations Manager appliance that could utilize a vendor-neutral open architecture, provide automation of NetOps workflows to securely provision remote IT infrastructure, and offer a generic server-centric platform supporting container-based services and solutions.
We had our own ideas for the scope of the new product, but also wanted our customers to have their say and share their wish-lists for the product. These enterprises expected the product to support all the capabilities of our existing remote infrastructure management appliances, as well as new services. Practically, this meant the white box serving as our product’s foundation would need to comfortably accommodate a generous range of serial, USB, and Ethernet ports, as well as various combinations of embedded 4G-LTE (and do it all in a 1RU form factor).
“Open platform” organizations and top-tier ODMs
Our first candidates for sourcing the server box came in the form of white box reference designs from open platform organizations such as the Open Compute Project, OpenSwitch, the Open Networking Foundation and others. This quickly got us in the weeds trying to make sense of various standards and often narrow use cases.
We soon understood that these boxes are intended for the SDN/hyperscale data center market, and, for the most part, were produced by top-tier ODMs and purchased by the largest technology companies (Amazon, Apple, Google, etc.), telcos, and service providers. However, we found that those “open” chassis reference designs were too tightly coupled to their specific use cases, and too inflexible to fulfill our diverse connectivity needs. Unfortunately, shoehorning our product into these designs might have twisted our dream of a thoroughbred white box into an unwieldy Frankenstein nightmare. The search continued.
Next, we investigated the possibility of using computer-on-module boards (COMs) in place of complete systems. COMs are standardized x86-based nano-motherboards that expose a myriad of expansion buses, allowing the extensive customization needed to produce appliances like the one we had in mind. The COM board and heatsink are then assembled onto a custom-designed carrier board carrying all the ports, connectivity, storage, and power chips our product would require.
COMs provided a successful proof of concept, but issues stemming from an inability to avoid using a licensed BIOS – coupled with high costs and reliability issues given the complexity of production – ultimately steered us away from this option.
Keeping the design in-house
Having ruled out the available white box options on a number of fronts, our engineering team resolved to take matters into our own hands and design an efficient, white-box-esque, standard open x86 platform appliance in-house. At the same time, a concerted effort from our hardware, firmware, and software engineers ensured that we would leverage the most available and affordable merchant silicon whenever possible. Furthermore, we resolved to make all decisions with efficiency, ease of integration, code re-use, compatibility, and ongoing maintenance top of mind.
The month we devoted to vetting and selecting product components felt like a lifetime, but by the end we had an initial system-wide block diagram from which to begin designing the hardware in detail.
As we proceeded through this process, we were encouraged by how highly integrated, low-power, and efficient 64-bit x86 systems-on-chip had become over the past few years, even accounting for the various supporting parts needed to keep them running. We found that chips targeted at network appliances were especially efficient, making our power and environmental goals achievable and our product’s peripheral connectivity quite logical. From here, it was just a matter of investing a few months of arduous work into creating motherboard and expansion board circuit diagrams, PCB layouts, bills of materials, enclosure drawings, and detailed assembly instructions – the kind of work you really love if you’re an engineer who’s always wanted to play with powerful toys.
Adventures in prototyping
Now it was time to order the parts, blank boards and enclosures we needed, and thus we embarked on our prototyping adventure. Our teams equipped themselves for this leg of the journey with all sorts of support tools and prepped to start on the boot ROM/BIOS code (coreboot for x86). The prototypes arrived and, inevitably, almost nothing worked the first time – but we came armed to troubleshoot hardware, firmware, and software issues with a toolbox full of signal scopes, analyzers, on-chip debug adapters, ROM emulators, BIOS code decoders and more.
Guided by insights from these tools, we made steady progress. A few board trace cuts, jumpers and part rearrangements, iterative code changes and inclusions, and we made our way to earn those special moments of victory: observing ROM console output, deploying a diagnostics payload to the motherboard, and finally getting a tiny Linux image operational.
From there, we stress-tested and fully verified additional hardware components, from M.2 memory to the SATA SSD, sensors, Ethernet ports, USB ports, serial ports, and the Trusted Platform Module (TPM) chip. We enhanced our bootloaders and developed various Linux images using the Yocto open source build system and the board support package for our chosen system-on-chip. Finally, we created a dedicated Linux image for production testing on mass-produced units, incorporating diagnostics to stress-test all components. The image also provisioned the network ports and stored the operational Linux image and applications onto the SSD for customer use.
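The component checks in a production-test image like this can be thought of as a small harness that probes each device and aggregates pass/fail results. The sketch below is purely illustrative – the device paths and check names are assumptions for the sake of example, not Opengear’s actual test code:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a production-test harness: each check probes one
component; the harness runs them all and reports pass/fail per component."""
import os


def check_device_node(path):
    """Pass if the expected device node exists (e.g. SSD or TPM).
    The paths used below are illustrative assumptions."""
    return os.path.exists(path)


# Registry of component checks (names and probes are hypothetical).
CHECKS = {
    "sata_ssd": lambda: check_device_node("/dev/sda"),
    "tpm": lambda: check_device_node("/dev/tpm0"),
}


def run_checks(checks):
    """Run every check, mapping component name -> True (pass) / False (fail).
    An OSError from a probe counts as a failure rather than aborting the run."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in run_checks(CHECKS).items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")
```

A real test image would add stress loops and log capture per component, but the pattern – a registry of independent probes whose failures are isolated from one another – is the core idea.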
And then, out of the fog, our OM2200 Operations Manager was born, through a journey that was at times as long and painful and labor intensive as any birthing process – but with a result that, at least in this engineer’s eyes, is just as beautiful.
Our NetOps “in-house optimized white-box” platform journey is becoming a road well-travelled as further x86 platforms (AMD and Intel) are now under development. Despite the computing world being dominated by large motherboard/white box ODMs with deep pockets and strong supply chains, there is still room for innovation and optimization. New solutions will be aimed at servicing the burgeoning edge network and SD-branch markets.
Tony Merenda is a co-founder and works in Hardware Engineering at Opengear, a company that builds remote infrastructure management solutions for enterprises. Over his nearly 40-year career, Tony has concentrated on designing communications (WAN, X.25, etc.), networking (LAN/WAN) and security (VPN/FW) hardware appliances based on various flavours of Unix, Linux and BSD.