Over the last 10 or so years, application performance demands have increasingly been outpacing Moore’s law in a variety of fields, particularly deep learning and AI. The solution has been to adopt heterogeneous accelerated processors, such as GPUs, FPGAs and various specialized ASICs. With the implementation of these alternative compute architectures, hardware has inevitably become more complex, and software more abstract, to keep up with the shifting landscape.
Virtualization in the Datacenter
The x86 platform, a set of backward-compatible instruction set architectures, has dominated the datacenter for the better part of the last two decades. In the late 1990s, the virtualization of x86 took place to increase its efficiency and let multiple operating systems simultaneously use its processor resources. Then, having different levels of storage systems within the datacenter created the need for storage to be virtualized. Continuing this domino effect, networking then had to be virtualized with software-defined networking.
The Virtualized Heterogeneous Datacenter
Similar to the way virtualization advancements surrounding x86 made state-of-the-art technologies available to the broad market, modern solutions will be needed to make at-scale heterogeneous compute widely accessible.
Just a few years ago, heterogeneous processors were not yet widely adopted. Within a short time, they have become exceedingly popular, with many enterprises taking advantage of their benefits and flexibility. Even the public cloud now offers GPUs, FPGAs and specialized ASICs for on-demand consumption.
Despite the many benefits of a heterogeneous architecture, onboarding and managing GPUs and other co-processors is complex. Modern datacenters will require a virtualization layer to homogenize the heterogeneous datacenter, allowing organizations to manage their use of co-processors with efficiency and ease.
Deep Learning and AI as the Beachhead Use Case for the Heterogeneous Co-processor-Based Datacenter
A heterogeneous datacenter usually has plenty of CPU systems available to run applications, while specialized hardware is only sparsely available. Onramping to specialized hardware like GPUs and other co-processors is complex and manual, resource management of heterogeneous systems is not as simple as for CPU-based systems, and sharing GPU systems across users is not straightforward either. This makes managing a datacenter or cloud with co-processors non-trivial and costly, usually requiring a lot of custom solutions.
Development of AI and deep learning applications demands truly elastic compute, from experimental testing to full-throttle model training and parameter sweeping. This calls for more sophisticated co-processor virtualization, batch scheduling, and resource orchestration approaches than those that assume a static allocation of GPUs to each user throughout their interaction.
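The contrast between static and elastic allocation can be sketched in a few lines. The following toy Python scheduler is purely illustrative (the `GpuPool` class and its `submit`/`finish` methods are hypothetical names, not part of any real orchestrator): jobs hold GPUs only while they run, so a parameter sweep can borrow the whole pool the moment an earlier job finishes, instead of each user keeping a fixed slice idle.

```python
from collections import deque

class GpuPool:
    """Toy elastic GPU pool (hypothetical sketch, not a real orchestrator):
    jobs hold GPUs only while running, and queued jobs start as GPUs free up."""

    def __init__(self, total_gpus):
        self.free = total_gpus      # GPUs currently unallocated
        self.waiting = deque()      # FIFO queue of (job_id, gpus_needed)
        self.running = {}           # job_id -> gpus held

    def submit(self, job_id, gpus_needed):
        self.waiting.append((job_id, gpus_needed))
        self._schedule()

    def finish(self, job_id):
        # Return the job's GPUs to the pool, then try to start queued jobs.
        self.free += self.running.pop(job_id)
        self._schedule()

    def _schedule(self):
        # Start queued jobs in order while enough GPUs are free.
        while self.waiting and self.waiting[0][1] <= self.free:
            job_id, n = self.waiting.popleft()
            self.free -= n
            self.running[job_id] = n

pool = GpuPool(4)
pool.submit("train-a", 2)   # starts immediately; 2 of 4 GPUs remain free
pool.submit("sweep-b", 4)   # queued: only 2 GPUs free
pool.finish("train-a")      # frees 2 GPUs, so sweep-b now starts
```

Under static allocation, "sweep-b" could never run at all on this cluster, since no single user's fixed share would span 4 GPUs; the elastic pool lends it the whole cluster the moment it is free. Real orchestrators add preemption, fairness, and topology awareness on top of this basic idea.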
Additionally, the DevOps effort needed to set up and manage deep learning environments is a big problem. This is no different from the problems we have observed in previous major technological shifts. When the web was first introduced, DevOps was a huge challenge, and now, as AI enters application development along with new types of co-processors beyond x86, the same DevOps problem has resurfaced.
Virtualization brings the ability to essentially homogenize a heterogeneous datacenter, turning it into a well-understood problem so that application lifecycle management becomes easy. In addition, virtualization enables composable, software-defined compute infrastructures, which are much more cost effective. Given its explosive growth over the past few years, deep learning and AI is the perfect beachhead use case for the heterogeneous datacenter.
The Future and Beyond
True AI is closer than we think. Neural network architectures are getting more complex and sophisticated as they try to mimic human speech and intelligence, and we're seeing deep learning applied all around us, everywhere from retail and finance to manufacturing and medicine, including drug discovery and personalized medicine. VC dollars are pouring into the industry, startups are emerging left and right, and more industries are adopting deep learning as a competitive advantage. This explosive growth of deep learning will lead more cloud service providers to offer co-processors like GPUs, FPGAs and specialized ASICs like TPUs. As these co-processors become widely available, more verticals and use cases will emerge that we have not even thought of yet.
As the modern heterogeneous datacenter continues to change, it's imperative that it keep up with increasing application demands while remaining efficient and manageable. To meet these challenges, the datacenter of the future demands the following: a flexible, virtualized 'backbone' to connect GPUs, FPGAs and other ASICs, and a management plane to efficiently manage such a datacenter.
At Bitfusion, we are building such a management plane around AI as a beachhead, with the grander vision of becoming the operating system for the heterogeneous datacenter. Try our infrastructure management solution Flex on AWS and leverage CPUs, GPUs or FPGAs for your workloads — https://www.bitfusion.io/aws.