Vision
At Menlo, we develop computer vision approaches that bridge the gap between 2D and 3D understanding. Our research focuses on models that interpret and reason about visual information in ways more closely aligned with human perception.
Research Projects
VoxRep (Apr 2025)
🎯 VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models
VoxRep is a framework that enables standard Vision-Language Models (VLMs) to interpret structured 3D voxel data, giving them the richer spatial comprehension that robotics, autonomous navigation, and virtual reality applications depend on. It processes the 3D voxel space as a systematic sequence of 2D slices, demonstrating that general-purpose 2D models can effectively learn 3D representations.
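To make the slicing idea concrete, here is a minimal sketch of how a voxel grid can be cut into an ordered stack of 2D slices. It is an illustration only, not the VoxRep implementation: the `[z][y][x]` grid layout, the integer occupancy values, and the helper names `make_grid` and `slice_voxels` are all assumptions made for this example, and the step that hands each slice to a VLM's image encoder is omitted.

```python
from typing import List

Slice = List[List[int]]    # one 2D slice
VoxelGrid = List[Slice]    # assumed layout: grid[z][y][x]


def make_grid(depth: int, height: int, width: int) -> VoxelGrid:
    """Create an all-zero voxel grid (hypothetical helper for this sketch)."""
    return [[[0] * width for _ in range(height)] for _ in range(depth)]


def slice_voxels(grid: VoxelGrid, axis: int = 0) -> List[Slice]:
    """Cut the grid into an ordered list of 2D slices along one axis.

    axis=0 slices along z (each slice is indexed [y][x]),
    axis=1 slices along y (each slice is indexed [z][x]),
    axis=2 slices along x (each slice is indexed [z][y]).
    """
    depth, height, width = len(grid), len(grid[0]), len(grid[0][0])
    if axis == 0:
        return [[row[:] for row in grid[z]] for z in range(depth)]
    if axis == 1:
        return [[grid[z][y][:] for z in range(depth)] for y in range(height)]
    if axis == 2:
        return [[[grid[z][y][x] for y in range(height)] for z in range(depth)]
                for x in range(width)]
    raise ValueError("axis must be 0, 1, or 2")


# Usage: a 2 x 2 x 3 grid with a single occupied voxel at z=1, y=0, x=2.
grid = make_grid(depth=2, height=2, width=3)
grid[1][0][2] = 1
z_slices = slice_voxels(grid, axis=0)
print(len(z_slices), "slices of size", len(z_slices[0]), "x", len(z_slices[0][0]))
```

Each slice is an ordinary 2D array, so a model that only accepts images can consume the stack one slice at a time while the slice order preserves the position along the third axis.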
Links:
- Paper: https://arxiv.org/abs/2503.21214
- Model: https://huggingface.co/Menlo/voxel-representation-gemma3-4b
- GitHub: https://github.com/menloresearch/voxel-representation