Towards AI-Assisted Immersive Learning: Factor Analysis of Learning Effect in K-CubeEdu-Metaverse
Ye Jia, Chen Li, Zackary P. T. Sin, Xiangzhi Eric Wang, Jiongning Lian, Peter H. F. Ng, Xiao Huang, George Baciu, Jiannong Cao, Qing Li
IEEE International Conference on Metaverse 2025 (2025)

Abstract
This study examines the impact of an AI-powered Virtual Teaching Assistant (NivTA) within a VR-based Edu-Metaverse (K-Cube), highlighting the roles of social presence, trust, and engagement in shaping learning outcomes. Grounded in Social Presence Theory, the Uses and Gratifications framework, and the Cognitive-Affective Theory of Learning with Media (CASTLE), our AI-Assisted Immersive Learning Framework emphasizes both cognitive and affective dimensions. A user study with 21 participants in a Cave Automatic Virtual Environment (CAVE) setting collected quantitative and qualitative data on trust, social presence, engagement, workload, and learning performance. Partial Least Squares Structural Equation Modeling revealed that heightened social presence fosters trust, which in turn drives behavioral, cognitive, and affective engagement. Notably, cognitive social presence was directly linked to better knowledge test scores, while confidence in test responses stemmed primarily from all forms of engagement. Overall, these findings underscore the significance of nurturing trust and social presence to enhance learner engagement and outcomes in AI-driven immersive educational environments.
From System Description to Mechanism
The 2024 NivTA paper described a system (a virtual teaching assistant in a CAVE-VR environment, with LLM and KG integration) but did not empirically test how or why it affected learning. This paper closes that gap. It deploys NivTA within K-Cube, the edu-metaverse platform, and uses Partial Least Squares Structural Equation Modeling (PLS-SEM) to estimate the hypothesized pathways from social presence through trust and engagement to learning outcomes. The result is a candidate mechanism, not a demo.
The Theoretical Scaffold
The paper draws on three theoretical frameworks, which it integrates into what the authors call the AI-Assisted Immersive Learning Framework:
- Social Presence Theory: the sense of being with another (human or AI) in a mediated environment. In the context of NivTA, social presence is not just about copresence with other learners — it's about whether the AI teaching assistant feels like a social entity rather than a tool.
- Uses and Gratifications (U&G): a media psychology framework that asks what needs users seek to satisfy through media and whether those needs are met. Applied to the edu-metaverse, it frames learning not just as information transfer but as a gratifying experience that competes with other media for student attention.
- CASTLE (Cognitive-Affective Theory of Learning with Media): a dual-processing framework that distinguishes cognitive engagement (processing information, building mental models) from affective engagement (emotional investment, motivation, enjoyment), and argues that both matter for learning, but through different pathways.
The theoretical contribution is the synthesis: social presence triggers trust, trust enables both cognitive and affective engagement, and these forms of engagement differentially predict learning outcomes. This is more specific and testable than any of the three frameworks alone.
The Empirical Model
Twenty-one participants used NivTA in a CAVE-VR environment for a learning session. The study measured social presence (both cognitive and affective dimensions), trust in the AI assistant, three forms of engagement (behavioral, cognitive, affective), workload (NASA-TLX), learning performance (knowledge test scores), and response confidence.
The PLS-SEM analysis produced a path model with several noteworthy features:
Social presence → Trust → Engagement. The primary chain in the model runs from social presence (particularly cognitive social presence, the sense that the AI "understands" you and that you can "reach" it) through trust in the AI assistant to all three forms of engagement. Trust is the mediator: in the model, social presence reaches engagement only by way of trust, and trust in turn depends on social presence. This means that building an engaging AI teaching assistant requires both social design (making the AI feel present, responsive, addressable) and reliability design (making the AI accurate, consistent, and transparent enough to earn trust).
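To make the mediation logic concrete, here is a minimal sketch of how an indirect effect is decomposed in a three-variable chain. It is not PLS-SEM and not the authors' analysis: it uses ordinary standardized regression on single observed scores, and the data are synthetic (the generating coefficients 0.7 and 0.6, the noise levels, and all variable names are assumptions chosen for illustration only).

```python
import random


def standardize(values):
    """Return z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


def mediation_paths(x, m, y):
    """Standardized paths for x -> m -> y plus a direct x -> y path."""
    r_xm, r_xy, r_my = corr(x, m), corr(x, y), corr(m, y)
    denom = 1 - r_xm ** 2
    a = r_xm                                # x -> m
    b = (r_my - r_xm * r_xy) / denom        # m -> y, controlling for x
    direct = (r_xy - r_xm * r_my) / denom   # x -> y, controlling for m
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": r_xy}


# Synthetic data only: 21 hypothetical learners, invented effect sizes.
rng = random.Random(0)
presence = [rng.gauss(0, 1) for _ in range(21)]
trust = [0.7 * p + rng.gauss(0, 0.7) for p in presence]
engagement = [0.6 * t + rng.gauss(0, 0.8) for t in trust]

paths = mediation_paths(presence, trust, engagement)
for name, value in paths.items():
    print(f"{name:>8}: {value:+.3f}")
```

In this decomposition the total effect always equals the direct effect plus the indirect effect (a times b). A trust-mediated pattern like the one described above would show up as a sizeable indirect term and a direct term near zero.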
Cognitive social presence has a direct path to knowledge. Unlike affective social presence (feeling emotionally connected to the AI), cognitive social presence — the sense that meaningful information exchange is happening — directly predicted better knowledge test scores, unmediated by trust or engagement. This is the paper's most actionable finding: if your goal is knowledge transfer, invest in making the AI feel like a competent information source, not just a friendly presence.
Engagement predicts confidence, not knowledge. All three forms of engagement (behavioral, cognitive, affective) predicted how confident students felt about their answers, but none directly predicted whether those answers were correct. This is consistent with the ICBL 2025 finding (from the same research group) that engagement and knowledge are not tightly coupled. In this sample, engagement tracked how much students felt they had learned, while cognitive social presence tracked what they had actually learned.
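The three features above can be written down as a small directed graph, which makes it easy to ask which constructs sit upstream of each outcome. This is only a sketch of the structure as summarized here: the construct names are illustrative labels, and the edge from affective social presence to trust is an assumption (the summary says social presence in general fosters trust, with cognitive social presence singled out).

```python
# Directed paths as described in the summary above (labels are illustrative).
PATHS = {
    "cognitive_social_presence": ["trust", "knowledge"],
    "affective_social_presence": ["trust"],  # assumption: see lead-in
    "trust": [
        "behavioral_engagement",
        "cognitive_engagement",
        "affective_engagement",
    ],
    "behavioral_engagement": ["confidence"],
    "cognitive_engagement": ["confidence"],
    "affective_engagement": ["confidence"],
}


def upstream(target, paths=PATHS):
    """Return every construct with a directed route to `target`."""
    found = set()
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for source, targets in paths.items():
            if node in targets and source not in found:
                found.add(source)
                frontier.append(source)
    return found


knowledge_sources = upstream("knowledge")
confidence_sources = upstream("confidence")
print(sorted(knowledge_sources))
print(sorted(confidence_sources))
```

Under these edges, knowledge has a single upstream construct, while confidence sits downstream of everything except knowledge.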
Design Implications
The path model suggests a triage for resource allocation in AI-assisted immersive learning systems:
- If the goal is knowledge: invest in cognitive social presence — response quality, information relevance, the AI's ability to demonstrate understanding of the learner's question. This is primarily a prompt engineering and KG-integration problem.
- If the goal is engagement: invest in trust-building — reliability, consistency, transparency about the AI's limitations. This is primarily a system design and communication design problem.
- If the goal is confidence: any form of engagement works, but it's worth asking whether confidence without knowledge is educationally valuable or potentially harmful (overconfidence effects are well-documented in learning science).
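One simple way to check whether confidence is outrunning knowledge is to compare mean stated confidence with the proportion of correct answers. The sketch below assumes confidence is recorded per answer on a 0 to 1 scale, which may not match the instrument used in the study; the function name and the sample values are hypothetical.

```python
def overconfidence_gap(confidences, correct):
    """Mean stated confidence (0-1 scale) minus proportion of correct answers.

    Positive values suggest overconfidence, negative values underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(1 for c in correct if c) / len(correct)
    return mean_confidence - accuracy


# Hypothetical learner: high confidence on four items, two answered correctly.
gap = overconfidence_gap([0.9, 0.8, 0.9, 0.7], [True, False, False, True])
print(f"overconfidence gap: {gap:+.3f}")
```

A system optimized for engagement alone could be monitored with a measure like this to see whether confidence gains are matched by gains in test accuracy.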
Boundaries
The sample is small (21 participants), and PLS-SEM, while appropriate for exploratory path modeling with modest samples, produces path coefficients that need validation in larger, confirmatory studies. The single-session design means we don't know whether the social-presence-to-trust pathway strengthens or decays with repeated exposure — trust in an AI teaching assistant may follow a very different trajectory than trust in a human instructor. The CAVE-VR setting, while ecologically valid for the NivTA system, limits generalizability to lower-immersion platforms (desktop, mobile) where social presence is harder to establish.
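The sample-size concern can be illustrated with a percentile bootstrap, the resampling idea commonly used to judge the stability of PLS-SEM path estimates. The sketch below bootstraps a plain Pearson correlation rather than a PLS path coefficient, and the data are synthetic (the 0.6 slope and the noise level are assumptions), so it shows the procedure, not the study's results.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Synthetic data only: 21 hypothetical participants.
rng = random.Random(1)
presence = [rng.gauss(0, 1) for _ in range(21)]
trust = [0.6 * p + rng.gauss(0, 0.8) for p in presence]

lo, hi = bootstrap_ci(presence, trust)
print(f"r = {pearson(presence, trust):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With samples of this size the interval tends to be wide, which is why the reported path coefficients are best read as provisional until replicated.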
The AI-Assisted Immersive Learning Framework is proposed but not tested against competing models. The PLS-SEM analysis shows that the proposed paths fit the data, but it does not show that alternative path structures (e.g., engagement → social presence, or trust → social presence → engagement) fit worse. Model comparison with a larger sample is needed to establish the causal directionality the framework claims.
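A small worked example shows why fit alone cannot settle direction. In a three-variable chain, the only testable constraint is that the two end variables are uncorrelated once the middle variable is partialled out. The chain presence → trust → engagement and its exact reverse imply the same constraint, whereas a chain with a different middle variable (trust → presence → engagement) implies a different one. The correlations below are hypothetical values chosen so that the first constraint holds exactly.

```python
def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b after partialling out c."""
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5


# Hypothetical correlations (not from the paper).
r_presence_trust = 0.7
r_trust_engagement = 0.6
r_presence_engagement = 0.42  # equals 0.7 * 0.6, as a pure chain would imply

# presence -> trust -> engagement, and engagement -> trust -> presence,
# both imply: presence and engagement are uncorrelated given trust.
chain_residual = partial_corr(
    r_presence_engagement, r_presence_trust, r_trust_engagement
)

# trust -> presence -> engagement instead implies:
# trust and engagement are uncorrelated given presence.
alt_residual = partial_corr(
    r_trust_engagement, r_presence_trust, r_presence_engagement
)

print(f"presence, engagement | trust: {chain_residual:+.3f}")
print(f"trust, engagement | presence: {alt_residual:+.3f}")
```

Here the first residual is essentially zero and the second is clearly not, so these data would count against the alternative with presence as mediator. They could not distinguish the proposed chain from its reverse, which is the directionality problem noted above.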