Announcing the Beta Launch of WoolyAI: The Era of Unbound GPU Execution

Today, we’re thrilled to announce the beta launch of WoolyAI Acceleration Service, a revolutionary GPU Cloud service built on WoolyStack, our cutting-edge CUDA abstraction layer.

Reimagining GPU Resource Utilization

GPU resource consumption and management in machine learning today is both constrained and highly inefficient. It is constrained by the dominance of CUDA (Nvidia) in the ML software ecosystem, and it is inefficient because organizations must choose between cost-efficiency, resource utilization, SLA goals, and control when consuming GPUs from cloud service providers or setting up their own managed GPU clusters.

We have built the Wooly Abstraction Layer, which decouples kernel shader execution from the applications that use CUDA. We are launching this first phase for PyTorch applications. In this abstraction layer, applications are compiled to a new binary, and their shaders are compiled into the Wooly Instruction Set. At runtime, a kernel shader launch event transfers the shader over the network from a CPU host to a GPU host, where it is recompiled and its execution is managed to maximize GPU resource utilization, isolate workloads from one another, and remain cross-compatible across hardware vendors, before being handed off to the respective GPU hardware runtime and drivers. The Wooly Abstraction Layer sits at the intersection of application software and hardware, optimizing GPU performance like an operating system for GPUs.
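To make that flow concrete, here is a minimal Python sketch of the runtime path described above, under stated assumptions: every name in it (WoolyRuntime, compile_to_wooly_is, the wire format) is hypothetical and purely illustrative, since WoolyStack's internals are not public.

```python
# Hypothetical sketch of the runtime path: intercept a kernel shader launch
# on the CPU host, ship the shader to a GPU host, and receive the result.
# None of these names are real WoolyAI APIs; they only illustrate the flow.
import pickle
import socket

class WoolyRuntime:
    """Runs on the CPU host and intercepts kernel shader launches."""

    def __init__(self, gpu_host: str, port: int = 9999):
        self.gpu_host = gpu_host
        self.port = port

    def compile_to_wooly_is(self, shader_source: str) -> bytes:
        # Placeholder: a real implementation would lower the shader to a
        # vendor-neutral instruction set.
        return shader_source.encode()

    def launch_kernel(self, shader_source: str, args: tuple):
        # 1. Compile the shader into the (hypothetical) Wooly Instruction Set.
        wooly_is = self.compile_to_wooly_is(shader_source)

        # 2. Ship the compiled shader and its arguments to the GPU host.
        payload = pickle.dumps({"shader": wooly_is, "args": args})
        with socket.create_connection((self.gpu_host, self.port)) as conn:
            conn.sendall(payload)
            # 3. The GPU host recompiles the shader for its local hardware,
            #    schedules it alongside other tenants' workloads, and sends
            #    back the result once execution completes.
            return pickle.loads(conn.recv(65536))
```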

Here’s how it works:

  • Decoupled Execution: Shaders are compiled into the Wooly Instruction Set, allowing for cross-vendor GPU compatibility.
  • Dynamic GPU Allocation: At runtime, kernel shaders (in the Wooly Instruction Set) are transferred over the network from CPU hosts to GPU hosts, where execution is dynamically managed to ensure maximum GPU resource utilization.
  • Multi-Tenant Efficiency: Instead of reserving GPUs in fixed partitions, WoolyAI flexibly assigns GPU memory and processing cycles to workloads based on predefined SLAs.
  • Actual resource consumption metrics: Our service tracks the actual GPU core processing and memory resources consumed during shader execution, ensuring cost-efficient execution (see the sketch after this list).
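To illustrate why metering actual consumption matters, the sketch below compares time-based billing with consumption-based billing. All rates and utilization figures are invented for the example; they are not WoolyAI prices.

```python
# Hypothetical comparison of time-based vs. consumption-based GPU billing.
# Every number here is made up for illustration.
hours = 10.0                 # wall-clock time the workload holds a GPU
avg_core_utilization = 0.25  # GPU cores busy only 25% of the time
avg_mem_utilization = 0.40   # 40% of GPU memory actually used

# Time-based: pay for the whole reservation regardless of utilization.
time_rate = 2.00             # $ per GPU-hour for a reserved instance
time_based_cost = hours * time_rate

# Consumption-based: pay only for core-hours and memory-hours consumed.
core_rate = 1.50             # $ per core-hour actually consumed
mem_rate = 0.50              # $ per memory-hour actually consumed
consumption_cost = hours * (avg_core_utilization * core_rate
                            + avg_mem_utilization * mem_rate)

print(f"time-based:        ${time_based_cost:.2f}")   # $20.00
print(f"consumption-based: ${consumption_cost:.2f}")  # $5.75
```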

Introducing WoolyAI Acceleration Service

Built on WoolyStack, our GPU Cloud service allows users to run PyTorch applications seamlessly without modifying their existing workflows. Unlike traditional cloud GPU solutions that either lock users into expensive, underutilized instances or require disruptive workflow changes, WoolyAI Acceleration Service enables:

  • Data scientists to keep working with their PyTorch applications inside CPU-backed container environments, while shaders execute on GPUs through the WoolyAI Acceleration Service (see the sketch below).
  • GPU usage billing based on the actual GPU core processing and memory resources consumed during execution, not on time used.
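As a concrete example of an unmodified workload, the snippet below is ordinary PyTorch with no WoolyAI-specific code. The one assumption, stated here rather than guaranteed above, is that the WoolyStack runtime inside the CPU-backed container exposes the remote GPU through PyTorch's usual cuda device, so the script runs as-is.

```python
# Ordinary PyTorch code, unchanged. The process runs on a CPU-only host;
# the (assumed) WoolyStack runtime intercepts the CUDA calls and executes
# the kernels on a remote GPU host.
import torch
import torch.nn as nn

device = torch.device("cuda")  # resolved by the abstraction layer, not a local GPU

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).to(device)

x = torch.randn(64, 784, device=device)  # kernel launches shipped to the GPU host
logits = model(x)
print(logits.shape)  # torch.Size([64, 10])
```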

Join the Beta Today

This is just the beginning. While we currently support PyTorch applications, we are actively expanding our capabilities to include other CUDA-based applications.

Be among the first to experience the future of Unbound GPU Execution with WoolyAI Acceleration Service.

Sign up for the beta now!