Dr. Shao Ling of Terminus Group: Spatial Intelligence as a Prerequisite for Building World Models


    Large Models and Spatial Intelligence: The Leap from Cognition to Action


    Today, large model technologies are reshaping the foundations of spatial intelligence through architectural innovation and multimodal integration, accelerating its transition from laboratory exploration to industrial applications. Traditional AI has largely focused on structured data processing and rule execution, whereas the value of spatial intelligence lies in solving spatial reasoning challenges arising from the diversity and complexity of the physical world. With spatial intelligence, machines can perceive, understand, and interact with their environment in three dimensions in a human-like manner.


    Although deep learning models have shown remarkable success in computer vision, the efficient integration of multimodal data and execution of complex tasks in dynamic environments remain key challenges to be addressed.


    In this issue, we speak with Dr. Shao Ling, President, Chief Scientist, and Head of the AI Lab at Terminus Group, for an in-depth discussion of spatial intelligence, multimodal fusion, and the future of technological transformation.



    Key Highlights from the Dialogue


    Q1: What is spatial intelligence? How does it relate to the World Model?


    Shao Ling: Spatial intelligence is an advanced extension of computer vision. Its essence is to enable machines to perceive, understand, reason about, and interact with the three-dimensional physical world. It goes beyond object recognition to include insights into spatial relationships, environmental context, and their impact on interactive behaviors.


    Spatial intelligence can be regarded as a critical stage of AI development following large language models (LLMs), bridging the gap between perception and action. The World Model, by contrast, is an internal representation of the physical world, designed to predict environmental states and support planning. While not identical, the two are closely related: spatial intelligence provides the perception and understanding necessary for World Models, while World Models transform this information into predictive and decision-making capabilities.


    Q2: What are the mainstream approaches to achieving spatial intelligence?


    Shao Ling: There is no single pathway to spatial intelligence. In addition to the widely discussed Large World Models (LWMs), several complementary approaches exist:


    • Knowledge- and reasoning-based: emphasizing structured knowledge and logical inference.

    • Multimodal fusion-based: enhancing understanding through the integration of vision, speech, environmental, and other sensory channels.

    • Embodied intelligence-based: relying on continuous interaction with the environment to achieve autonomous learning and exploration.


    At Terminus Group, we adopt a hybrid path that integrates spatial perception models with industry knowledge bases, domain models, and AI agents. This approach leverages the advantages of real-world scenarios while enhancing robustness and generalization.


    Q3: What are Terminus Group's advantages?


    Shao Ling: Our strength lies in years of accumulation in AIoT infrastructure, industry data, and scenario understanding, enabling us to develop integrated spatial intelligence solutions. Specifically:


    • AIoT provides data collection and real-time perception.

    • Domain models capture industry-specific knowledge and intrinsic relationships.

    • Spatial intelligent agents serve as a universal intelligence foundation, supporting deployment in diverse scenarios such as campuses, buildings, transportation, and energy.


    Q4: What technical barriers must be overcome to move from large language models to spatial intelligence?


    Shao Ling: The realization of spatial intelligence relies on multiple interdisciplinary technologies:

    • Perception: computer vision and deep learning.

    • Understanding: learning 3D representations of geometry and topology.

    • Reasoning: applying vision-language models (VLMs) and reinforcement learning (RL) for spatial semantic reasoning.

    • Execution: leveraging embodied intelligence and environmental simulation to train agents in virtual 3D environments.


    Terminus Group has years of experience in CV, deep learning, VLMs, RL, 3D simulation, and environmental modeling. We are also developing large multimodal spatial intelligence models capable of generating control commands directly from edge sensor data, enabling smarter, more efficient decision-making and execution in real-world scenarios.


    Q5: Multimodal data fusion and alignment remain bottlenecks. How does Terminus Group address them?


    Shao Ling: We believe intelligence emerges from diversity and distribution. With our extensive AIoT deployments, Terminus Group can collect dozens of data modalities, including vision, speech, text, environmental, geographic, and biosignal data. Using dynamic adaptive temporal synchronization, we achieve cross-modal alignment. Combined with large model pre-training and reinforcement learning, we are building multimodal spatial intelligence models closely aligned with real-world applications.
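To make the idea of temporal alignment across modalities concrete, here is a minimal sketch. The interview does not describe how Terminus Group's dynamic adaptive temporal synchronization works, so this is not that method: it is a much simpler stand-in that matches each sample of one stream to the nearest-in-time sample of another, within a fixed tolerance. The function name `align_nearest`, the millisecond timestamps, and the tolerance value are all assumptions made for illustration.

```python
from bisect import bisect_left


def align_nearest(ref_ts, other_ts, tolerance):
    """Match each reference timestamp to the nearest timestamp in other_ts.

    Hypothetical helper, not Terminus Group's method. Both lists hold
    timestamps in the same unit, and other_ts must be sorted ascending.
    Returns one entry per reference timestamp: the index of the matched
    sample in other_ts, or None when nothing lies within the tolerance.
    """
    matches = []
    for t in ref_ts:
        i = bisect_left(other_ts, t)
        # Only the neighbours just before and just after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        best = min(candidates, key=lambda j: abs(other_ts[j] - t), default=None)
        if best is not None and abs(other_ts[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


# Assumed example data: camera frames every 100 ms, an irregular sensor stream.
camera_ts_ms = [0, 100, 200, 300]
sensor_ts_ms = [20, 110, 350]
print(align_nearest(camera_ts_ms, sensor_ts_ms, tolerance=40))
# → [0, 1, None, None]
```

Frames with no sensor reading close enough in time are left unmatched rather than paired with a stale value. An adaptive scheme would presumably vary the tolerance or correct for clock drift per modality, which this sketch does not attempt.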


    Looking ahead, Terminus Group will develop specialized intelligent agents driven by overseas market demand:


    • Short-term (within 1 year): launch the mobile AI agent HALI, empowering wearables and robots with human-like reasoning, long-term memory, and personalization.

    • Long-term (3–5 years): explore general-purpose agents, advancing high-dimensional spatial intelligence, autonomous learning, and multi-agent collaboration to provide scalable intelligence foundations for more complex scenarios.


    Conclusion


    From large language models to spatial intelligence, AI is evolving from being a "knowledge provider" to becoming an "autonomous actor." By integrating AIoT, domain models, and multimodal large models, Terminus Group is continuously expanding the boundaries of spatial intelligence. This is not only a technological evolution but also a reflection of how Chinese technology enterprises are shaping competitive advantages in the global industrial landscape.
