ML Efficiency
Modern machine learning (ML) systems are typically optimized for data center GPUs, tailored to large-scale, cloud-centric workloads. This leaves a performance and accessibility gap on widely used but under-optimized hardware: CPUs, consumer GPUs, and edge devices.
ML Efficiency research focuses on filling this gap through hardware-aware optimization techniques, pushing the limits of commodity hardware.
Key research areas include:
- Quantization and low-bit precision: utilize accelerated low-bit compute to speed up inference and training and to reduce memory requirements.
- Custom kernels: hand-crafted compute kernels for popular model architectures and quantization schemes.
- System optimization: training and inference pipelines for cost-effective setups (single-GPU or multi-GPU systems without high-speed interconnect).
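To make the quantization idea above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, one common low-bit scheme: floats are mapped to integers via a single scale factor, so weights and activations can be stored and computed in 8 bits. The function names and the NumPy-based setup are illustrative, not a specific library's API.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    # One scale for the whole tensor; real systems often use per-channel scales.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Round-to-nearest bounds the per-element error by half a quantization step.
max_err = np.max(np.abs(x - x_hat))
```

The int8 tensor uses 4x less memory than float32, and on hardware with accelerated int8 instructions the matmuls themselves also run faster; the cost is the bounded rounding error measured above.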