There’s been a fair amount of talk lately touting the benefits and supposed inevitability of converging all your data center I/O traffic onto Ethernet. But convergence for convergence’s sake does not add value and might cause more problems than it solves, at least right now.
Before you make the leap toward a fully converged data center, I’d urge you to step back and consider: What’s driving the value proposition?
Some networking realities
There’s a certain appeal to the idea of convergence, with promises of simplifying your IT and eliminating the redundancy and extra cost of buying, maintaining and managing multiple networks, along with the multiple silos of skill sets they entail.
But people are confused when they assume the same dynamics apply to data center convergence as were in play during classic format battles such as VHS vs. Betamax. There’s a really big difference. For starters, VHS and Betamax (or, more recently, Blu-ray and HD-DVD) solved the same problem, for the same customer, with the same use case and the same set of content suppliers (i.e., Hollywood studios). In these classic cases, convergence is driven by obvious supplier and customer benefits.
In contrast, Fibre Channel, InfiniBand and Ethernet solve three very different problems, with three different types of customers, three different use cases, three different sets of suppliers (with a notable exception: Cisco sells all three) and three different technology life cycles. In this case, the primary value proposition of convergence is in favor of vendors who want to grab business growth at the expense of customers.
Let’s step back a minute and think about the realities if Fibre Channel, InfiniBand and Ethernet traffic were converged over a single Ethernet network.
Most TCP applications are fundamentally different from SCSI or high-performance compute clustering applications. The topologies are different, the security models are different, the endpoint communication models are different, the underlying services have a different intent, and the notions of what’s static, dynamic and persistent are different.
Just as significant, storage demands and use cases are evolving as rapidly as TCP/IP. Convergence of voice has been aided by the fact that a VoIP call consumes very little bandwidth and will not grow over time, due to the limits of human perception. As IP network bandwidth continues to rise, voice requirements become easier and easier to accommodate. Assigning a high priority to VoIP packets is safe, because there are few of them and they won’t starve other network users.
Fibre Channel, however, is much more demanding and is a moving target, as well. It’s not clear how the network will behave if any high-bandwidth consumer is given higher priority than other high-bandwidth consumers.
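To put rough numbers on that contrast, here is a minimal sketch of strict-priority scheduling on a single link. Everything in it is an illustrative assumption rather than a measurement: the 10 Gbps link, the roughly 100 kbps per VoIP call, the 8 Gbps storage flow, and the hypothetical helper `strict_priority_share`. Real converged-Ethernet schedulers are more sophisticated than strict priority, so treat this only as the back-of-the-envelope version of the argument.

```python
def strict_priority_share(link_kbps, high_demand_kbps, low_demand_kbps):
    """Return (high_served, low_served) in kbps under strict priority.

    High-priority traffic is served first; low-priority traffic only gets
    whatever link capacity is left over.
    """
    high_served = min(high_demand_kbps, link_kbps)
    low_served = min(low_demand_kbps, link_kbps - high_served)
    return high_served, low_served


# Illustrative numbers only (assumptions, not measurements).
LINK_KBPS = 10_000_000         # a 10 Gbps link
OTHER_DEMAND_KBPS = 9_000_000  # 9 Gbps of ordinary, lower-priority traffic

# Case 1: 100 VoIP calls at an assumed 100 kbps each get high priority.
voip = strict_priority_share(LINK_KBPS, 100 * 100, OTHER_DEMAND_KBPS)

# Case 2: a storage flow able to fill 8 Gbps gets the same high priority.
storage = strict_priority_share(LINK_KBPS, 8_000_000, OTHER_DEMAND_KBPS)

print("VoIP prioritized:    high=%d kbps, other=%d kbps" % voip)
print("Storage prioritized: high=%d kbps, other=%d kbps" % storage)
```

Under these assumptions, prioritizing voice costs the other traffic nothing: it still gets its full 9 Gbps. Prioritizing the storage flow leaves the same traffic only 2 Gbps of the 9 Gbps it wants.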
Ethernet is simply not a proven multi-tenant architecture in this case. Note that iSCSI — not widely adopted yet, but gradually growing — offers an IP mechanism for adapting storage block traffic to Ethernet. The knock on iSCSI has been the fact that it is “different” from Fibre Channel, requires a different technology ecosystem and reduces performance. This provides a very interesting clue as to the differences between the TCP/IP/Ethernet ecosystem and the Fibre Channel requirements set.
There’s a reason that Fibre Channel and InfiniBand continue to survive, despite repeated predictions of their demise. It’s because they fulfill solution needs in ways that Ethernet does not. To look at it another way: If standard Ethernet were easily able to handle extreme low-latency clustering and serial SCSI as well as InfiniBand and Fibre Channel, it’s likely that InfiniBand and Fibre Channel would not be deployed today.
They haven’t disappeared, and Ethernet hasn’t taken over their roles. Let’s examine why not.
The myth of convergence cost savings
One of the arguments proposed for total data center convergence is that Ethernet is so cost-effective compared to Fibre Channel and InfiniBand. But what is overlooked in that argument is that Ethernet’s cost structure is directly related to the TCP/IP ecosystem. Ethernet is a low-cost approach because it’s simple.
A network that can occasionally drop a packet is significantly different from a network that cannot drop any. A network where endpoints implement a robust general-purpose communications state machine (like TCP/IP) will be easier to administer than a network that supports an endpoint state machine completely unlike the underlying network implementation, as is the case with the SCSI model for block I/O implemented on top of a network such as Fibre Channel. In fact, much of the complexity of managing Fibre Channel is a result of the need to present a SCSI model to the endpoints.
To accommodate a network sharing both Fibre Channel blocks and IP traffic, you must modify the existing Ethernet standard. Hop counts and distance will need to be limited for lossless traffic. On the supply side of the networking equation, the modified Ethernet requires different — and more expensive — equipment. And on the consumption side, customers will assume they get anything Ethernet offers, which will not be true.
The “new” Ethernet, with all its new constraints, will not be the Ethernet we are familiar with. And with this specialization it will likely not offer the same cost structure as the more ubiquitous Ethernet.
And what about the big problem that so many people cite regarding Fibre Channel: the problem of vendor interoperability? Ironically, one of the advantages touted for FCoE (Fibre Channel over Ethernet) as compared to iSCSI is that customers will be able to do “all the same Fibre Channel things the same Fibre Channel ways” — i.e., no change in devices or management model. Thus, not only will the existing Fibre Channel interoperability problems persist, but there will now be a whole new set of interoperability issues related to the new Ethernet layer below.
Why so many networks survive
InfiniBand is a perfect example of a network technology serving a durable niche market. This niche has staying power because InfiniBand does important things better than the mainstream is able to do.
Inter-server communication latency is a very important characteristic in some (but not all) high-performance clusters, and low latency can be easily compromised without an end-to-end solution. InfiniBand excels in minimizing latency and, not coincidentally, is frequently consumed as an end-to-end solution. If you try to “part out” a multi-server cluster solution, latency is one of the first items at risk. InfiniBand-based clusters are tried-and-true and the solution TCO has stabilized — and Ethernet suppliers can choose many better growth opportunities for their investments than low-latency clustering. In the meantime, InfiniBand and Fibre Channel suppliers are able to control their cost structures by reusing many of the same components as Ethernet.
As a result, viable InfiniBand suppliers survive to serve the niche. The same end-to-end solutions for high-performance compute clustering aren’t offered in the Ethernet ecosystem – not because no one has thought to do it, but because an Ethernet-based result will probably always be somewhat inferior to the comparable InfiniBand solution, and the ROI does not make sense (notwithstanding a long line of science projects funded by well-meaning but misinformed venture capitalists).
With Fibre Channel, the advantage is also specificity of solution: Fibre Channel traffic presents a fundamentally different network use case than IP traffic. To do Fibre Channel right, you need the network to do the heavy lifting of providing reliable communication (unlike with TCP), as well as supporting the abstractions and flat virtual topology associated with the SCSI model. Modifying Ethernet to handle Fibre Channel means adding a layer to the Fibre Channel network, as well as (potentially) forcing the SAN team to rely on the network team in IT for management and troubleshooting (with requisite finger pointing to follow!). Storage integrity suffers at the same time Ethernet’s advantages are diminished. Again, what’s the point?
Convergence over InfiniBand
In the high-performance computing (HPC) community, in particular, some well-respected experts are predicting (or, perhaps, hoping) that InfiniBand, not Ethernet, will become the common data center networking infrastructure.
In certain HPC installations, that approach might make some sense. For organizations that have already invested in a specialized InfiniBand infrastructure, running Ethernet traffic over an existing InfiniBand network might help simplify their infrastructure.
However, for the rest of the world, converging all their network traffic over InfiniBand would represent a complete overhaul of their infrastructure, which they will obviously not do.
Is convergence really inevitable?
In the end, some aspects of data center networking convergence probably make sense for some organizations, while other aspects are not practical or don’t offer sufficient benefits to warrant the move — certainly not now, and maybe not ever. Also, new technologies always emerge that change the conversation entirely.
Certainly the value proposition of “consolidating server NICs and HBAs and cables” sounds very short-sighted, given the extra complexity that will be introduced to the core network in order to provide this consolidation. Let’s face it — vendors are eager to sell customers a new “super NIC” at a new (read: higher) price point. Likewise, vendors are eager to sell customers a new “super switch” also at a new (read: higher) price point. In this new world, if risk-averse customers choose to segregate FCoE traffic from the TCP/IP traffic (for reasons of security, QoS or troubleshooting), there will be just as many NICs as today, and at a higher price!
To eliminate a highly effective and specialized niche network such as InfiniBand or Fibre Channel just to achieve convergence over Ethernet – or to replace a cost-effective Ethernet infrastructure with an expensive InfiniBand or Fibre Channel infrastructure – you must weigh the real costs and benefits. In my experience, the benefit of collapsing these separately successful networks into a single infrastructure is too small, and the benefits of retaining the multiple networks are too great, to justify a vote for convergence.
Before you rush to jump on the convergence bandwagon, look closely at what you’re really achieving, and what you might be giving up. Despite what you might read and hear, data center network convergence is not inevitable.
Lin Nease, Director of Emerging Technologies for ProCurve Networking by HP, is responsible for ProCurve's data center strategy.