Literature Management

Title: Balancing Learning and Engagement in Game-Based Learning Environments with Multi-objective Reinforcement Learning
Source: DOI: 10.1007/978-3-319-61425-0_27
Author(s): Robert Sawyer, Jonathan Rowe, and James Lester
Online Reference:
Abstract:

Game-based learning environments create rich learning experiences that are both effective and engaging. Recent years have seen growing interest in data-driven techniques for tutorial planning, which dynamically personalize learning experiences by providing hints, feedback, and problem scenarios at run-time. In game-based learning environments, tutorial planners are designed to adapt gameplay events in order to achieve multiple objectives, such as enhancing student learning or student engagement, which may be complementary or competing aims. In this paper, we introduce a multi-objective reinforcement learning framework for inducing game-based tutorial planners that balance between improving learning and engagement in game-based learning environments. We investigate a model-based, linear-scalarized multi-policy algorithm, Convex Hull Value Iteration, to induce a tutorial planner from a corpus of student interactions with a game-based learning environment for middle school science education. Results indicate that multi-objective reinforcement learning creates policies that are more effective at balancing multiple reward sources than single-objective techniques. A qualitative analysis of select policies and multi-objective preference vectors shows how a multi-objective reinforcement learning framework shapes the selection of tutorial actions during students’ game-based learning experiences to effectively achieve targeted learning and engagement outcomes.

File:
Relevant Principles (APA): Principle 6 — Clear, timely, and explanatory feedback is important for student learning.
Notes (Theories):

Game-based tutoring systems typically pursue multiple objectives (e.g., learning outcomes and engagement); dynamically balancing these objectives can improve the system's effectiveness.

Notes (Technologies):

This study uses reinforcement learning (RL) to design an intelligent tutoring system for multi-objective educational games.

To address data sparsity, the authors decompose tutorial planning into several distinct sub-problems, represented as adaptable event sequences (AESs).

Each AES is modeled as a multi-objective Markov decision process (MOMDP) with its own state representation, action set, state-transition model, and reward model.
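The linear-scalarized MOMDP idea can be sketched as follows: each (state, action) pair yields a vector of rewards (learning, engagement), which a preference weight vector collapses into a scalar. This is a minimal illustration under assumed toy states, actions, and reward values, not the paper's actual model.

```python
import numpy as np

# Hypothetical states and vector-valued rewards for one AES.
# R[(state, action)] is a 2-vector: (learning reward, engagement reward).
R = {
    ("intro", "hint"):    np.array([0.2, 0.5]),
    ("intro", "no_hint"): np.array([0.1, 0.6]),
    ("lab", "hint"):      np.array([0.7, 0.3]),
    ("lab", "no_hint"):   np.array([0.4, 0.4]),
}

def scalarize(reward_vec, w):
    """Linear scalarization: collapse a reward vector into a scalar using preference weights w."""
    return float(np.dot(w, reward_vec))

# A preference vector weighting learning vs. engagement.
w = np.array([0.7, 0.3])

# The (state, action) pair preferred under this weighting.
best = max(R, key=lambda sa: scalarize(R[sa], w))
print(best, scalarize(R[best], w))
```

Varying the preference vector `w` traces out different trade-offs between the two objectives; Convex Hull Value Iteration computes the set of policies that are optimal for some such weighting.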

Each MOMDP uses the same two reward models, based on: (1) participants' normalized learning gains, and (2) post-game self-reports.

Each MOMDP's environment model is estimated by maximum likelihood: state-action transitions are counted and observed rewards are averaged over the training corpus.
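The maximum-likelihood estimation step can be sketched as follows, using a hypothetical toy corpus (in the paper, states and rewards come from student game logs):

```python
from collections import defaultdict

# Toy corpus of (state, action, next_state, reward_vector) tuples.
corpus = [
    ("s0", "hint", "s1", (0.5, 0.2)),
    ("s0", "hint", "s1", (0.3, 0.4)),
    ("s0", "hint", "s2", (0.1, 0.1)),
    ("s0", "skip", "s2", (0.0, 0.6)),
]

trans_counts = defaultdict(lambda: defaultdict(int))
reward_sums = defaultdict(lambda: [0.0, 0.0])
sa_counts = defaultdict(int)

for s, a, s_next, r in corpus:
    trans_counts[(s, a)][s_next] += 1
    sa_counts[(s, a)] += 1
    reward_sums[(s, a)][0] += r[0]
    reward_sums[(s, a)][1] += r[1]

# Maximum-likelihood transition model: P(s' | s, a) = count(s, a, s') / count(s, a)
P = {sa: {s_next: c / sa_counts[sa] for s_next, c in nxt.items()}
     for sa, nxt in trans_counts.items()}

# Expected reward vector per (state, action): mean of observed reward vectors.
R_hat = {sa: tuple(total / sa_counts[sa] for total in sums)
         for sa, sums in reward_sums.items()}

print(P[("s0", "hint")])      # transition probabilities estimated from counts
print(R_hat[("s0", "hint")])  # mean (learning, engagement) reward vector
```

With the transition model and reward vectors estimated this way, a multi-objective planning algorithm (in the paper, Convex Hull Value Iteration) can be run offline on the induced model.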

Notes (Applications):

This paper proposes a multi-objective reinforcement learning framework for tutorial planning in game-based learning environments, addressing the problem of incorporating multiple reward sources (e.g., learning and engagement) into a data-driven tutorial-planning framework.

Notes (Impacts):
Tags: