Abstract

For each goal-oriented dialog task of interest, large amounts of data need to be collected for end-to-end learning of a neural dialog system. Collecting that data is a costly and time-consuming process. Instead, we show that we can use only a small amount of data, supplemented with data from a related dialog task. Naively learning from related data fails to improve performance, as the related data can be inconsistent with the target task. We describe a meta-learning based method that selectively learns from the related dialog task data. Our approach leads to significant accuracy improvements in an example dialog task.

Introduction

One key benefit of goal-oriented dialog systems that are trained end-to-end is that they only require examples of dialog for training. Avoiding the modular structure of pipeline methods removes the human effort involved in creating intermediate annotations to train the modules. The end-to-end structure also enables automatic adaptation of the system, with different components of the model changing together. This flexibility is particularly valuable when applying the system to a new domain.

However, end-to-end systems currently require significantly more data, increasing the human effort in data collection. The most common method for training is Supervised Learning (SL) using a dataset of dialogs of human agents performing the task of interest (Bordes et al., 2017; Eric and Manning, 2017; Wen et al., 2017). To produce an effective model, the dataset needs to be large, high quality, and in the target domain. That means that for each new dialog task of interest, large amounts of new data have to be collected. The time and money involved in that collection process limit the potential application of these systems. We propose a way to reduce this cost by selectively learning from data from related dialog tasks: tasks that have parts or subtasks that are similar to the new task of interest.
Specifically, we describe a method for learning which related-task examples to learn from. Our approach uses meta-gradients to automatically meta-learn a scalar weight ∈ (0, 1) for each of the related task data points, such that learning from the weighted related task data points improves the performance of the dialog system on the new task of interest. These weights are dynamically adjusted over the course of training in order to learn most effectively. We still learn from data for the target task, but do not need as much to achieve the same results.

To demonstrate this idea, we considered two experiments. First, we confirmed that the method can work in an ideal setting. We constructed a classification task where the related task data is actually from the same task, but with the incorrect label for 75% of examples, and there is an input feature that indicates whether the label is correct or not. Our approach is able to learn to ignore the misleading data, achieving close to the performance of a model trained only on the correct examples.

Second, we evaluated the approach on a personalized restaurant reservation task with limited training data. Here, the related task is also restaurant reservation, but without personalization and with additional types of interactions. We compare our approach to several standard alternatives, including multi-task learning and using the related data for pre-training only. Our approach is consistently the best, indicating its potential to effectively learn which parts of the related data to learn from and which to ignore. Successfully learning from available related task data can allow us to build end-to-end goal-oriented dialog systems for new tasks faster, with reduced cost and human effort in data collection.

Related Work

The large cost of collecting data for every new dialog task has been widely acknowledged, motivating a range of efforts.
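The weighting scheme described above can be illustrated with a toy bilevel update: each related-task example i gets a weight sigmoid(v_i) ∈ (0, 1); an inner gradient step trains the model on the weighted related-task loss plus the target-task loss; and an outer meta-gradient step adjusts v_i by differentiating the target-task loss through that inner step. The sketch below is illustrative only and is not the paper's model or hyperparameters: it substitutes a one-parameter linear regressor for a dialog system, uses squared loss, and marks half the related examples as "inconsistent" by negating their labels; the step sizes alpha and beta are hand-picked for the toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target task: scalar linear regression y = 2x, with only a few examples.
x_tgt = rng.normal(size=8)
y_tgt = 2.0 * x_tgt

# Related task: half the examples follow the target rule, half are
# inconsistent with it (label negated), mimicking related-task mismatch.
n_rel = 40
x_rel = rng.normal(size=n_rel)
consistent = np.arange(n_rel) < n_rel // 2
y_rel = np.where(consistent, 2.0 * x_rel, -2.0 * x_rel)

theta = 0.0              # model parameter (slope)
v = np.zeros(n_rel)      # meta-learned logits; weight_i = sigmoid(v_i)
alpha, beta = 0.05, 5.0  # inner and outer step sizes (toy values)

for step in range(300):
    w = sigmoid(v)
    # Per-example gradient of squared loss 0.5*(theta*x - y)^2 wrt theta.
    g_rel = (theta * x_rel - y_rel) * x_rel
    g_tgt = np.mean((theta * x_tgt - y_tgt) * x_tgt)
    # Inner update: learn from weighted related data plus target data.
    theta_new = theta - alpha * (np.mean(w * g_rel) + g_tgt)
    # Outer (meta) update: one-step meta-gradient of the target loss
    # through the inner step.  By the chain rule,
    #   dL_tgt(theta')/dv_i = g_tgt(theta') * d(theta')/dv_i
    #   d(theta')/dv_i     = -alpha * w_i*(1-w_i) * g_rel_i / n_rel
    g_tgt_new = np.mean((theta_new * x_tgt - y_tgt) * x_tgt)
    dv = g_tgt_new * (-alpha) * w * (1.0 - w) * g_rel / n_rel
    v -= beta * dv
    theta = theta_new

weights = sigmoid(v)
```

Because the meta-gradient rewards related examples whose gradients align with the target-task gradient, the weights on consistent examples rise above those on inconsistent ones over training, and the learned slope moves toward the target value of 2 — a one-parameter analogue of learning which related dialogs to learn from and which to ignore.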
One approach is to transfer knowledge from other data to cope with the limited availability of training dialog data for the new task of interest. For example, Zhao et al. (2020) split the dialog model such that most of the model can be learned using ungrounded dialogs and plain text; only a small part of the dialog model, with a small number of parameters, is trained with the dialog data available for the task of interest. In contrast, we explore how to learn from related grounded dialogs, and also without any specific constraints on the structure of the end-to-end dialog system architecture. Wen et al. (2016) pre-train the model with data automatically generated from different tasks, and Lin et al. (2020) use pre-trained language models as initialization and then fine-tune the dialog model with data from the task of interest. These ideas are complementary to our approach, as we make no assumptions about how the model was pre-trained.

Recently, there has been work that explored ways to automatically learn certain aspects of the transfer process using meta-learning. Xu et al. (2020) look at the problem of learning a joint dialog policy using Reinforcement Learning (RL) in a multi-domain setting, which can then be transferred to a new domain. They decomposed the state and action representation into features that correspond to low-level components that are shared across domains, facilitating cross-domain transfer. They also proposed a Model-Agnostic Meta-Learning (MAML; Finn et al., 2017) based extension that learns to adapt faster to a new domain. Madotto et al. (2019), Mi et al. (2019), Qian and Yu (2019), and Dai et al. (2020) also look at multi-domain settings. They use MAML-based meta-learning methods to learn an initialization that adapts fast with few dialog samples from a new task. All of the papers above consider settings where there is access to a large set of training tasks. The meta-learning systems learn to transfer knowledge to a new test task by learning how to do transfer on different training tasks. While each task only has a limited amount of dialog data, they need a lot of tasks during training.
In contrast, we look at a setting where the task from which we want to transfer knowledge and the task to which we want to transfer it are the only tasks available at training time. Any learning about how to transfer knowledge has to happen from just these two tasks. None of the above methods are applicable to this setting.

Learning a task while simultaneously meta-learning certain aspects of the learning process has been done successfully in some SL and RL settings recently. Wu et al. (2018) and Wichrowska et al. (2017) use meta-learning to adapt hyperparameters such as the learning rate, and even to learn entire optimizers, during training for SL tasks such as image classification. Given a single task, Zheng et al. (2018) successfully meta-learn intrinsic rewards that help the agent perform well on that task. Xu et al. (2018) use meta-gradients to learn RL training hyperparameters such as the discount factor and bootstrapping parameters. The meta-gradient technique used in our proposed method is closely related to Rajendran et al. (2020). They learn intrinsic rewards for an RL agent acting in a given domain, such that learning with those intrinsic rewards improves the performance of the agent on the task of interest in a different domain.