ML Efficiency
Modern machine learning (ML) systems are typically optimized for data center GPUs, tailored to large-scale, cloud-centric workloads. This leaves a performance and accessibility gap on widely used but under-optimized hardware: CPUs, consumer GPUs, and edge devices.
ML Efficiency research focuses on filling this gap through hardware-aware optimization techniques, pushing the limits of commodity hardware.
Key research areas include:
- Quantization and low-bit precision: utilize accelerated low-bit compute to speed up inference and training and to reduce memory requirements.
- Custom kernels: hand-crafted compute kernels for popular model architectures and quantization schemes.
- System optimization: training and inference pipelines for cost-effective setups (single-GPU or multi-GPU systems without high-speed interconnect).
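To make the quantization idea above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, one common low-bit scheme: floats are mapped to integers via a single scale factor, so weights and activations can be stored and computed in 8 bits. The function names and the NumPy-based setup are illustrative, not a specific library's API.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    # One scale for the whole tensor; real systems often use per-channel scales.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Round-to-nearest bounds the per-element error by half a quantization step.
max_err = np.max(np.abs(x - x_hat))
```

The int8 tensor uses 4x less memory than float32, and on hardware with accelerated int8 instructions the matmuls themselves also run faster; the cost is the bounded rounding error measured above.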