Why Bitfusion

Making AI Development Simple

Deep learning and other AI techniques are disrupting every industry in remarkable ways: self-driving cars, drones, virtual assistants, more accurate medical diagnoses, automatic lead generation, better customer service, cybersecurity, and much more.

However, the current approach to creating and deploying these applications is slow, hard, and often unenjoyable. Data scientists must:

  • Install and constantly upgrade deep learning software modules and hardware drivers
  • Become familiar with how to deploy GPUs and how to manage their quirks
  • Build complex data pipelines and workflow processes
  • Train their models serially, one run at a time, forcing weeks between iterations
  • Tune and optimize models with low visibility and minimal automation

Even teams of data scientists working together can take months or even years to bring deep learning and AI applications to production.

The time scale and process for building AI are more like hardware development than agile software development.

On top of these development challenges…

Data scientists, engineering, and DevOps teams must:

  • Orchestrate fleets of servers and multiple data storage layers
  • Leverage outdated job schedulers to collaborate and share resources
  • Hijack a package management or container management solution for workloads and hardware it wasn’t designed for

Meanwhile, IT and management are trying to control costs, increase utilization, and maintain high service levels with hardware that is unique, inconsistent, and often idle for long stretches.

A Better Way

At Bitfusion, we took our groundbreaking GPU virtualization technology, combined it with pre-built machine images, and layered on a great user experience with intelligent automation to create the industry’s first end-to-end deep learning and AI development platform.

From Big Data and Big Compute to AI


Deep learning didn’t happen overnight — but three huge levers allowed it to proliferate quickly.

Big data is growing massively, and you can’t just store it — it has to be queryable and analyzable.

Yahoo, Facebook, Google, and others pioneered technologies like Hadoop and later Spark to ensure that massive data could be collected while still being “online” for data scientists. Now this tech is being adopted industry-wide.

Raw compute power has grown exponentially.

Because deep learning’s matrix multiplications can be parallelized across thousands of cores, GPUs (graphics processing units), normally reserved for gaming or oil and gas simulations, turn out to be an ideal platform for training AI models.
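
As a rough illustration (a minimal sketch, not part of Bitfusion’s platform, and assuming PyTorch with CUDA support is installed), the same matrix multiplication can be moved from CPU to GPU by changing only the device it runs on:

    # Minimal sketch: the same matrix multiply on CPU and on GPU.
    # Assumes PyTorch built with CUDA support; not Bitfusion-specific.
    import torch

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    c_cpu = a @ b                               # runs on a handful of CPU cores

    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()       # copy operands to GPU memory
        c_gpu = a_gpu @ b_gpu                   # thousands of GPU cores work in parallel
        print(torch.allclose(c_cpu, c_gpu.cpu(), rtol=1e-3, atol=1e-2))

The arithmetic is identical in both cases; the speedup comes purely from the GPU executing those multiply-accumulate operations across thousands of cores at once.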

Data center infrastructure has become a cheap and plentiful commodity.

In the past, big data and big compute meant investing in hugely expensive, warehouse-scale supercomputers or physical data centers. Today, the equivalent of several supercomputers can be chained together in minutes and paid for by the hour, the minute, or even the function call.

At Bitfusion, we enable you to develop, train, and deploy deep learning and AI applications that crawl massive datasets and leverage both scale-up and scale-out compute, quickly and cost-effectively.

GPUs and The Heterogeneous Data Center

Application performance demands are outpacing Moore’s Law. To combat this, modern data centers now include heterogeneous hardware systems beyond CPUs, each tailored to specific kinds of workloads and offering different levels of flexibility:

  • GPUs for graphics, simulations, and AI training
  • FPGAs (field programmable gate arrays) for AI inference and flexible software-like hardware programming
  • TPUs (tensor processing units) designed specifically for deep learning’s matrix multiplication operations
  • And many others in development

A fundamentally different approach to computing is required to tackle the time and cost challenge, meet performance demands, and take advantage of these diverse hardware-based innovations to maximize R&D investments.

At Bitfusion, we champion this approach through our innovative software so that developers and data scientists can:

  • Optimize their deep learning and AI applications
  • Take advantage of diverse hardware
  • Make it simpler for their supporting IT organizations to manage heterogeneous hardware environments efficiently