NLP – Reading Comprehension – 2015: MRC models – teaching machines to read and comprehend [seminal work] [cloze-style task] [first large-scale supervised MRC training corpus] [three model architectures: Deep LSTM, Attentive Reader, Impatient Reader]
Original paper: "Teaching Machines to Read and Comprehend"
- What question do the authors set out to study? In an era of rapidly developing neural networks, the basic question is how to design a network architecture for machine reading comprehension.
- Deep network models could not be trained for this task because large-scale training corpora were lacking, so the authors construct a training dataset.
- With that data in place, the authors propose three basic deep neural network models that solve the reading-comprehension task and serve as baselines for other researchers.
I. Overview of Machine Reading Comprehension
1. Retrieval-based QA vs. reading-comprehension QA
2. The machine reading comprehension task
3. Machine reading comprehension datasets
A timeline of machine reading comprehension, told mainly from the dataset perspective: the datasets proposed at different points in time and their respective characteristics.
4. Research contributions of the paper
https://cs.nyu.edu/~kcho/DMQA/
5. Historical significance of the paper
II. The Training Dataset Constructed by the Authors
http://web.archive.org/web/20150408044315id_/http://edition.cnn.com/2015/03/26/asia/taiwan-taipei-movie-location/index.html
@entity2 ( @entity1 ) @entity0 is fast becoming the go - to @entity4 city for some of @entity7 's biggest hitters . @entity8 is filming his latest opus " silence " there , and @entity11 director @entity10 chose @entity0 over seven other @entity4 cities for his sci - fi thriller " @entity15 , " starring @entity14 . and this may " @entity19 " star @entity18 will begin filming in the city for a movie project . tasked with attracting international film makers , @entity23 , head of the @entity24 , said that 92 foreign film crews shot in the city last year , up from 56 in 2013 . " we hope the whole island can be like a big studio , " she told @entity1 while in @entity2 thursday . it was n't always this way . for years , @entity37 , while home to acclaimed filmmakers like @entity35 and @entity36 , was overlooked in favor of its ritzier neighbors @entity2 , @entity41 and @entity42 . and even when @entity7 came knocking , the city did n't always have its act together . the producers of " @entity46 , " released in 2006 , had wanted to film at @entity0 , at that time the world 's tallest skyscraper . but faced with red tape and reluctance , they ended up choosing the 53 - story @entity51 for @entity54 's memorable bungee jump . " we lost an opportunity for the world to get to know @entity0 , " says @entity23 . @entity23 's commission was set up in 2008 to court international film makers but it was n't until director @entity61 , who was born in @entity37 , filmed the @entity66 - winning hit " @entity65 " on the island that it began to earn a reputation as an accommodating and affordable place to shoot . @entity61 filmed the memorable and technically difficult scenes of a shipwrecked boy and a tiger at a purpose - built facility at abandoned airport in the @entity37 city of @entity73 . with its relatively unknown cityscape , @entity37 can also function as a generic @entity4 backdrop . 
the island , which was a @entity77 colony , is already being used as a stand - in for @entity77 . @entity77 director @entity79 used @entity37 's high - speed rail system in a bullet - strewn action sequence for crime drama " @entity82 , " which competed at the @entity83 . @entity77 rail authorities turned him away . and @entity8 's " silence , " due to release in 2016 , is a historic drama about two @entity87 priests who travel to @entity77 . @entity37 is also a popular alternative to @entity90 , where there are many restrictions on filmmakers -- authorities can censor scripts considered politically sensitive or obscene . @entity2 director @entity96 used both @entity37 and @entity90 as locations in " @entity98 . " dubbed @entity90 's " titanic , " it focuses on a ship that sank when the @entity102 government fled @entity90 for @entity37 in 1949 as the @entity103 took over -- a sensitive period in @entity90 history . @entity37 and @entity90 are still governed separately . @entity90 authorities asked @entity96 to tone down the heroics of a @entity102 soldier , according to the @entity109 -- not something that @entity37 would ever require , says @entity23 . the commission offers incentives for international film crews . up to $ 2 million is available per movie -- half of that as a cash subsidy . but just as important is the island 's versatility as a location , says @entity23 . while many of the film crews are from neighboring @entity4 countries , the city has hosted crews from @entity126 and @entity127 , while the @entity128 shot some of its newly released drama " @entity132 " in the city . " it 's a small island . within half an hour , you can go from the streets to the mountains to the sea . "
@placeholder is shooting his latest movie in the city
@entity8
@entity23:Jao
@entity24:Taipei Film Commission
@entity36:Hou Hsiao-hsien
@entity82:Shield of Straw
@entity83:Cannes International Film Festival
@entity87:Jesuit
@entity132:X + Y
@entity2:Hong Kong
@entity1:CNN
@entity0:Taipei
@entity7:Hollywood
@entity11:French
@entity4:Asian
@entity8:Martin Scorsese
@entity51:Shanghai Bank of China Tower
@entity79:Takashi Miike
@entity54:Tom Cruise
@entity77:Japan
@entity73:Taichung
@entity15:Lucy
@entity14:Scarlett Johansson
@entity35:Edward Yang
@entity37:Taiwan
@entity10:Luc Besson
@entity19:The Walking Dead
@entity18:Andrew Lincoln
@entity102:Nationalist
@entity103:Communists
@entity128:BBC
@entity98:The Crossing
@entity126:Latvia
@entity96:Woo
@entity127:Germany
@entity90:China
@entity46:Mission Impossible III
@entity41:Shanghai
@entity42:Tokyo
@entity66:Oscar
@entity65:Life of Pi
@entity109:South China Morning Post
@entity61:Lee
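Each example in the released corpus pairs an anonymized context like the one above with a cloze query, the answer entity, and an entity map. As a minimal sketch (the helper names and the inlined excerpt are illustrative assumptions, not the official reader code), restoring the original strings and extracting the candidate list could look like:

```python
import re

# Entity map shipped with each example (excerpt from the sample above).
entity_map = {
    "@entity8": "Martin Scorsese",
    "@entity0": "Taipei",
    "@entity7": "Hollywood",
}

query = "@placeholder is shooting his latest movie in the city"
answer = "@entity8"
context = "@entity8 is filming his latest opus in @entity0 , a @entity7 favourite ."

def deanonymize(text, mapping):
    """Replace @entityN markers with their original strings."""
    return re.sub(r"@entity\d+", lambda m: mapping.get(m.group(0), m.group(0)), text)

def candidates(ctx):
    """The candidate answers are exactly the entities occurring in the context."""
    return sorted(set(re.findall(r"@entity\d+", ctx)))

readable = deanonymize(query.replace("@placeholder", answer), entity_map)
# readable == "Martin Scorsese is shooting his latest movie in the city"
```

The anonymization is the point of the corpus design: it forces models to answer from the context rather than from world knowledge about the named entities.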
III. Model Architectures for Machine Reading Comprehension
1. Deep LSTM Reader
When the paper was published (2015) the Transformer did not yet exist; LSTMs were still the mainstream model for text-sequence modeling, so the authors use an LSTM as the feature extractor.
Input: the query and the document are concatenated (separated by a delimiter) and fed into the LSTM as one sequence;
Output: the concatenation of the final hidden states of the two LSTM layers, followed by a classifier that picks one answer entity (every example comes with a list of candidate entities);
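The steps above can be sketched with a toy single-layer LSTM in NumPy (the paper uses a deeper LSTM with skip connections; all sizes, random weights, and token ids below are illustrative assumptions, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate order in the stacked weights: input, forget, output, candidate."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])
    f = sigmoid(z[H:2*H])
    o = sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:4*H])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

E, H = 8, 16                           # toy embedding and hidden sizes
W = rng.normal(0, 0.1, (4*H, E))
U = rng.normal(0, 0.1, (4*H, H))
b = np.zeros(4*H)

vocab = rng.normal(0, 0.1, (20, E))    # toy embedding table
sequence = [3, 5, 0, 7, 9, 11]         # query, delimiter (id 0), then document, as one flat sequence

h, c = np.zeros(H), np.zeros(H)
for tok in sequence:
    h, c = lstm_step(vocab[tok], h, c, W, U, b)

# Score each candidate entity against the final hidden state g(d, q) = h.
cand_emb = rng.normal(0, 0.1, (4, H))  # 4 toy candidate-entity embeddings
scores = cand_emb @ h
probs = np.exp(scores) / np.exp(scores).sum()
answer = int(np.argmax(probs))
```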
2. Attentive Reader (LSTM + attention) [the query as a whole attends over every token of the document to compute "relevance" weights]
2.1 Input
Input: the query and the document are encoded by two separate LSTMs;
2.2 Interaction (the query as a whole attends over each document token)
The hidden vector of the t-th token of the document is the concatenation of the forward and backward LSTM states:
y_d(t)=\overrightarrow{y}_d(t)||\overleftarrow{y}_d(t)
The hidden vector of the whole query is:
u=\overrightarrow{y}_q(|q|)||\overleftarrow{y}_q(1)
That is, the final state of the forward LSTM (at the last query token) is concatenated with the final state of the backward LSTM (at the first query token) to form the query representation;
Compute the attention weights (additive attention). In the paper they are defined as:
m(t)=\tanh(W_{ym}y_d(t)+W_{um}u)
s(t)\propto\exp(w_{ms}^{\top}m(t))
r=y_{d}s
Each document token is scored against the query vector u, the scores are normalized with a softmax, and the document representation r is the attention-weighted sum of the y_d(t).
2.3 Output
Output: after attention, a classifier (linear layer + softmax) over the joint representation of r and u selects one answer entity.
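The attention step can be sketched in NumPy as follows (toy sizes and random weights are assumptions; a trained model would learn W_ym, W_um, and w_ms):

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, A = 6, 16, 12                  # toy document length, hidden size, attention size
y_d = rng.normal(0, 1, (T, 2*H))     # bidirectional document states y_d(t)
u   = rng.normal(0, 1, (2*H,))       # query vector u

W_ym = rng.normal(0, 0.1, (A, 2*H))
W_um = rng.normal(0, 0.1, (A, 2*H))
w_ms = rng.normal(0, 0.1, (A,))

m = np.tanh(y_d @ W_ym.T + W_um @ u)  # m(t) = tanh(W_ym y_d(t) + W_um u), shape (T, A)
scores = m @ w_ms                     # one scalar score per document token
s = np.exp(scores - scores.max())
s /= s.sum()                          # softmax attention weights s(t)
r = s @ y_d                           # r = sum_t s(t) y_d(t)
```

Subtracting `scores.max()` before exponentiating is the usual numerically stable softmax; it does not change the resulting weights.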
3. The Impatient Reader (LSTM + refined attention) [every token of the query attends over every token of the document to compute "relevance" weights]
3.1 Input
Input: the query and the document are encoded by two separate LSTMs;
3.2 Interaction
The hidden vector of the t-th token of the document is (same as in the Attentive Reader):
y_d(t)=\overrightarrow{y}_d(t)||\overleftarrow{y}_d(t)
The hidden vector of the whole query is (same as in the Attentive Reader):
u=\overrightarrow{y}_q(|q|)||\overleftarrow{y}_q(1)
The hidden vector of the i-th token of the query is:
y_q(i)=\overrightarrow{y}_q(i)||\overleftarrow{y}_q(i)
where |q| denotes the length of the query. The reader recurrently updates a document representation r(i) as it reads the query token by token; r(|q|), the state after the last query token, is the final vector representation used for prediction. In the paper the recurrence is:
m(i,t)=\tanh(W_{dm}y_d(t)+W_{rm}r(i-1)+W_{qm}y_q(i)),\quad 1\le i\le|q|
s(i,t)\propto\exp(w_{ms}^{\top}m(i,t))
r(i)=y_{d}^{\top}s(i)+\tanh(W_{rr}r(i-1)),\quad r(0)=r_{0}
3.3 Output
Output: after this more elaborate attention, a classifier (linear layer + softmax) selects one answer entity.
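A minimal NumPy sketch of the Impatient Reader's recurrent attention (toy sizes and random weights are assumptions, not trained parameters): the model re-attends over the document once per query token, carrying a running document representation r.

```python
import numpy as np

rng = np.random.default_rng(0)

T, Q, H, A = 6, 3, 16, 12            # toy doc length, query length, hidden size, attention size
y_d = rng.normal(0, 1, (T, 2*H))     # document token states
y_q = rng.normal(0, 1, (Q, 2*H))     # query token states

W_dm = rng.normal(0, 0.1, (A, 2*H))
W_rm = rng.normal(0, 0.1, (A, 2*H))
W_qm = rng.normal(0, 0.1, (A, 2*H))
W_rr = rng.normal(0, 0.1, (2*H, 2*H))
w_ms = rng.normal(0, 0.1, (A,))

r = np.zeros(2*H)                    # r(0)
for i in range(Q):
    # m(i, t) mixes the document token, the running summary r(i-1), and the i-th query token
    m = np.tanh(y_d @ W_dm.T + W_rm @ r + W_qm @ y_q[i])
    scores = m @ w_ms
    s = np.exp(scores - scores.max())
    s /= s.sum()                     # attention weights s(i, t) over the document
    r = s @ y_d + np.tanh(W_rr @ r)  # r(i): fresh read of the document plus a recurrent term
# r now holds r(|q|), the final representation used to score candidate entities
```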
IV. Experimental Results
1. Model comparison
Traditional machine reading comprehension baselines:
- Maximum frequency: pick the entity that occurs most frequently in the document as the answer
- Exclusive frequency: pick the entity that occurs most frequently in the document but does not appear in the query
- Frame-semantic model: answer via semantic (frame-semantic) parsing
- Word distance model: answer via word-distance-based alignment between query and document
Neural machine reading comprehension methods:
- On large-scale datasets, the neural methods outperform the traditional ones
- Adding a more elaborate attention mechanism further improves model performance
- The relative effectiveness of the different attention schemes varies across datasets
2. Case study
Query: who created a fall fashion show for moms?
Visualizing the model's attention shows that ent63 receives the highest attention weight
Reading the article confirms that the model attends to the same places a human would look at when answering the question
V. Summary
Key points:
- How to construct a large-scale corpus that can support training neural networks
- How to design a model in which the passage and the query interact
Innovations:
- The first large-scale supervised training dataset for the task, enabling follow-up research
- Three basic neural network models, including fairly elaborate attention mechanisms
Takeaways:
- This paper targets cloze-style reading comprehension; how to construct other task types is a direction for follow-up research
- When the model classifies over entities, interpretability is poor; key problems such as causal reasoning are not touched; and the dataset is relatively simple.
- The simplest model here merely concatenates the query and the passage. How can query and passage interact more effectively? A large body of later models was proposed precisely to build richer query-passage interaction.