In recent AI research, a team of researchers from MIT and Harvard University has introduced a groundbreaking framework called “Follow Anything” (FAn). The system addresses the limitations of current object-following robotic systems and presents an innovative solution for real-time, open-set object tracking and following.
Existing robotic object-following systems suffer from two primary shortcomings: they can recognize only a fixed set of object categories, and they offer no user-friendly way to specify target objects. The new FAn system tackles these issues with an open-set approach that can seamlessly detect, segment, track, and follow a wide range of objects while adapting to novel ones through text, image, or click queries.
The core features of the proposed FAn system can be summarized as follows:
Open-Set Multimodal Approach: FAn introduces a novel methodology that facilitates real-time detection, segmentation, tracking, and following of any object within a given environment, regardless of its category.
Unified Deployment: The system is designed for easy deployment on robotic platforms, focusing on micro aerial vehicles, enabling efficient integration into practical applications.
Robustness: The system incorporates re-detection mechanisms to handle scenarios where tracked objects are occluded or temporarily lost during the tracking process.
The fundamental objective of the FAn system is to empower robotic systems equipped with onboard cameras to identify and track objects of interest. This involves ensuring the object remains within the camera’s field of view as the robot moves.
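Keeping the object centered in the camera’s view is the classic visual-servoing problem. As a rough illustration (not the paper’s actual controller), a minimal proportional controller maps the tracked object’s pixel offset from the image center to turn-rate commands; the function name and gain values below are illustrative assumptions.

```python
# Hedged sketch of a proportional visual-servoing step. The tracker is assumed
# to supply the followed object's bounding box in pixel coordinates; gains and
# the command convention are illustrative, not taken from the FAn paper.

def servo_command(bbox, frame_w, frame_h, k_yaw=0.005, k_pitch=0.005):
    """Map the tracked box's offset from the image center to rate commands.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    Returns (yaw_rate, pitch_rate): positive yaw turns right, positive pitch tilts up.
    """
    # Center of the tracked bounding box.
    cx = (bbox[0] + bbox[2]) / 2.0
    cy = (bbox[1] + bbox[3]) / 2.0
    # Pixel error between the object center and the image center.
    err_x = cx - frame_w / 2.0
    err_y = frame_h / 2.0 - cy  # image y grows downward, so flip the sign
    # Proportional control: command is gain times error.
    return k_yaw * err_x, k_pitch * err_y
```

Run each frame, this drives the error toward zero, so the object stays near the center of the field of view as the robot moves.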
FAn leverages state-of-the-art Vision Transformer (ViT) models to achieve this objective. These models are optimized for real-time processing and merged into a cohesive system. The researchers exploit the strengths of various models, such as the Segment Anything Model (SAM) for segmentation, DINO for general-purpose visual features, and CLIP for grounding visual concepts in natural language, combined in a lightweight detection and semantic segmentation scheme. Additionally, real-time tracking is facilitated using the (Seg)AOT and SiamMask models. A lightweight visual servoing controller is also introduced to govern the object-following process.
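The open-set selection step can be pictured as matching a query embedding (from a text prompt, an example image, or a clicked region) against the embeddings of candidate segments. The sketch below, with illustrative function names and plain cosine similarity, assumes those embeddings have already been produced by models such as SAM, DINO, or CLIP; it is not the paper’s implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def best_matching_segment(segment_embs, query_emb):
    """Pick the candidate segment whose embedding best matches the query.

    segment_embs: list of embedding vectors, one per candidate segment.
    query_emb: embedding of the text / image / click query.
    Returns (index of best segment, its similarity score).
    """
    sims = [cosine(e, query_emb) for e in segment_embs]
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx, sims[idx]
```

Because the match is done in a shared embedding space rather than against a fixed label set, any object the query describes can be selected — the essence of the open-set design.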
The researchers conducted comprehensive experiments to evaluate FAn’s performance across diverse objects in zero-shot detection, tracking, and following scenarios. The results demonstrated that the system can follow objects of interest seamlessly and efficiently in real time.
In conclusion, the FAn framework represents a comprehensive solution for real-time object tracking and following, eliminating the limitations of closed-set systems. Its open-set nature, multimodal compatibility, real-time processing, and adaptability to new environments make it a significant advancement in robotics. Moreover, the team’s commitment to open-sourcing the system underscores its potential to benefit a wide array of real-world applications.
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
The post MIT and Harvard Researchers Propose (FAn): A Comprehensive AI System that Bridges the Gap between SOTA Computer Vision and Robotic Systems- Providing an End-to-End Solution for Segmenting, Detecting, Tracking, and Following any Object appeared first on MarkTechPost.