Deep Learning (Natural Language Processing) Seq2Seq Study Notes (Using a GRU with Information Compression) (Part 2)

Time: 2023-09-11 14:20:00

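0 Preface:

In the previous model, both the encoder and the decoder were multi-layer RNNs. With multiple layers we could apply dropout between them, and because we used an LSTM, each step returned a cell state as well as a hidden state. However, the multi-layer RNNs and the design of the linear layer made the computation heavy without compressing the information well. This time we use a GRU, and both the encoder and the decoder are single-layer RNNs, with the aim of handling information compression better and improving our Seq2Seq model.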

1 Encoder

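In a multi-layer RNN, dropout is applied between the layers. Since our GRU has only one layer, we do not pass dropout as an argument to it; if we try to pass a dropout value to a single-layer RNN, PyTorch displays a warning.

A GRU returns only a hidden state; unlike an LSTM, it has no cell state.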

import random

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, dropout):
        super().__init__()

        self.hid_dim = hid_dim
        
        self.embedding = nn.Embedding(input_dim, emb_dim)
        
        self.rnn = nn.GRU(emb_dim, hid_dim) #no dropout argument, as there is only one layer
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, src):
        
        #src = [src len, batch size]
        
        embedded = self.dropout(self.embedding(src))
        
        #embedded = [src len, batch size, emb dim]
        
        outputs, hidden = self.rnn(embedded) #no cell state!
        
        #outputs = [src len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        
        #outputs are always from the top hidden layer
        
        return hidden
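
The two points above (a GRU has no cell state, and a single-layer RNN warns when given a dropout value) can be checked with a small sketch. The sizes are toy values assumed for illustration, not the ones used in this tutorial, and the exact warning text may differ between PyTorch versions.

```python
import warnings

import torch
import torch.nn as nn

# Toy sizes, assumed for illustration only
emb_dim, hid_dim, src_len, batch_size = 8, 16, 5, 3
embedded = torch.randn(src_len, batch_size, emb_dim)

# A GRU returns (outputs, hidden): there is no cell state
gru = nn.GRU(emb_dim, hid_dim)
outputs, hidden = gru(embedded)

# An LSTM returns (outputs, (hidden, cell))
lstm = nn.LSTM(emb_dim, hid_dim)
_, (lstm_hidden, lstm_cell) = lstm(embedded)

# For a single-layer, unidirectional GRU the last output equals the final hidden state
same_final_state = torch.allclose(outputs[-1], hidden[0])

# Passing dropout to a single-layer RNN triggers a warning at construction time
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    nn.GRU(emb_dim, hid_dim, num_layers=1, dropout=0.5)
dropout_warned = len(caught) > 0

print(outputs.shape, hidden.shape, same_final_state, dropout_warned)
```

This is why the Encoder keeps a separate nn.Dropout for the embeddings and returns only hidden.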

2 Decoder

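The decoder implementation differs considerably from the previous model, and it relieves some of the information compression.

2.1 Two changes:

The same context vector $z$ returned by the encoder is reused at every time-step of the decoder: $s_t = \text{DecoderGRU}(d(y_t), s_{t-1}, z)$
The embedding of the current token, $d(y_t)$, and the context vector $z$ are passed to the linear layer together with the hidden state: $\hat{y}_{t+1} = f(d(y_t), s_t, z)$ (previously, we predicted the next token using only the top-layer hidden state at each time-step)

How these two changes reduce information compression:

The decoder hidden state no longer needs to contain information about the source sequence, because the context vector is always available as an input
Adding $y_t$ to the linear layer also means that this layer can see directly what the token is, without having to get that information from the hidden state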
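
As a rough illustration of the two changes, the sketch below runs one decoding step with a bare nn.GRU and nn.Linear instead of the full Decoder class. The sizes and the random tensors standing in for $d(y_t)$ and $z$ are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Toy sizes, assumed for illustration only
emb_dim, hid_dim, output_dim, batch_size = 4, 6, 10, 2

embedded = torch.randn(1, batch_size, emb_dim)  # stands in for d(y_t)
context = torch.randn(1, batch_size, hid_dim)   # stands in for z
hidden = context.clone()                        # s_{t-1}, here the initial state s_0 = z

# Change 1: the GRU input is the embedding concatenated with the context
gru = nn.GRU(emb_dim + hid_dim, hid_dim)
_, hidden = gru(torch.cat((embedded, context), dim=2), hidden)

# Change 2: the linear layer sees d(y_t), s_t and z together
f = nn.Linear(emb_dim + hid_dim * 2, output_dim)
features = torch.cat(
    (embedded.squeeze(0), hidden.squeeze(0), context.squeeze(0)), dim=1
)
prediction = f(features)

print(features.shape, prediction.shape)
```

The input widths emb_dim + hid_dim and emb_dim + hid_dim * 2 are the same ones that appear in the Decoder constructor.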

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, dropout):
        super().__init__()

        self.hid_dim = hid_dim
        self.output_dim = output_dim
        
        self.embedding = nn.Embedding(output_dim, emb_dim)
        
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim)
        
        self.fc_out = nn.Linear(emb_dim + hid_dim * 2, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, input, hidden, context):
        
        #input = [batch size]
        #hidden = [n layers * n directions, batch size, hid dim]
        #context = [n layers * n directions, batch size, hid dim]
        
        #n layers and n directions in the decoder will both always be 1, therefore:
        #hidden = [1, batch size, hid dim]
        #context = [1, batch size, hid dim]
        
        input = input.unsqueeze(0)
        
        #input = [1, batch size]
        
        embedded = self.dropout(self.embedding(input))
        
        #embedded = [1, batch size, emb dim]
                
        emb_con = torch.cat((embedded, context), dim = 2)
            
        #emb_con = [1, batch size, emb dim + hid dim]
            
        output, hidden = self.rnn(emb_con, hidden)
        
        #output = [seq len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        
        #seq len, n layers and n directions will always be 1 in the decoder, therefore:
        #output = [1, batch size, hid dim]
        #hidden = [1, batch size, hid dim]
        
        output = torch.cat((embedded.squeeze(0), hidden.squeeze(0), context.squeeze(0)), 
                           dim = 1)
        
        #output = [batch size, emb dim + hid dim * 2]
        
        prediction = self.fc_out(output)
        
        #prediction = [batch size, output dim]
        
        return prediction, hidden

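3 Seq2Seq model

We now put the Encoder and the Decoder together.

In this implementation, we need to make sure that the hidden dimensions of the encoder and the decoder are the same.

The procedure is similar to the previous experiment.

Briefly, the steps are:

Create an output tensor to hold all predictions, $\hat{Y}$
Feed the source sequence $X$ into the encoder to receive the context vector
Set the initial decoder hidden state to the context vector, $s_0 = z = h_T$
Use a batch of <sos> tokens as the first input, $y_1$
Then decode within a loop:

Pass the input token $y_t$, the previous hidden state $s_{t-1}$ and the context vector $z$ into the decoder

Receive a prediction $\hat{y}_{t+1}$ and a new hidden state $s_t$

Decide whether to use teacher forcing, and set the next input accordingly (either the ground-truth next token in the target sequence or the highest-scoring predicted next token)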
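
The teacher forcing decision in the last step can be sketched in plain Python. The helper name next_decoder_input and the string tokens are hypothetical and exist only for this example; the model itself works on tensors of token indices.

```python
import random

def next_decoder_input(true_token, predicted_token, teacher_forcing_ratio, rng=random):
    """Pick the ground-truth token with probability teacher_forcing_ratio, else the prediction."""
    teacher_force = rng.random() < teacher_forcing_ratio
    return true_token if teacher_force else predicted_token

rng = random.Random(0)  # seeded generator so the run is repeatable

# rng.random() is in [0, 1), so a ratio of 1.0 always uses the ground truth
always_truth = [next_decoder_input("truth", "pred", 1.0, rng) for _ in range(100)]

# and a ratio of 0.0 always uses the model's own prediction
never_truth = [next_decoder_input("truth", "pred", 0.0, rng) for _ in range(100)]

# a ratio of 0.5 mixes the two, roughly half and half
mixed = [next_decoder_input("truth", "pred", 0.5, rng) for _ in range(1000)]
print(mixed.count("truth") / len(mixed))
```

A ratio of 0 corresponds to the setting used during evaluation, where the decoder only ever sees its own predictions.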

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"
        
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):
        
        #src = [src len, batch size]
        #trg = [trg len, batch size]
        #teacher_forcing_ratio is probability to use teacher forcing
        #e.g. if teacher_forcing_ratio is 0.75 we use ground-truth inputs 75% of the time
        
        batch_size = trg.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        
        #tensor to store decoder outputs
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)
        
        #last hidden state of the encoder is the context
        context = self.encoder(src)
        
        #context also used as the initial hidden state of the decoder
        hidden = context
        
        #first input to the decoder is the <sos> tokens
        input = trg[0,:]
        
        for t in range(1, trg_len):
            
            #insert input token embedding, previous hidden state and the context state
            #receive output tensor (predictions) and new hidden state
            output, hidden = self.decoder(input, hidden, context)
            
            #place predictions in a tensor holding predictions for each token
            outputs[t] = output
            
            #decide if we are going to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio
            
            #get the highest predicted token from our predictions
            top1 = output.argmax(1) 
            
            #if teacher forcing, use actual next token as next input
            #if not, use predicted token
            input = trg[t] if teacher_force else top1

        return outputs

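3.1 Training the Seq2Seq model

This is very similar to the previous tutorial, "Deep Learning (Natural Language Processing) Seq2Seq Study Notes (Hands-On Practice)".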

INPUT_DIM = len(SRC.vocab)
OUTPUT_DIM = len(TRG.vocab)
ENC_EMB_DIM = 256
DEC_EMB_DIM = 256
HID_DIM = 512
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, DEC_DROPOUT)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Seq2Seq(enc, dec, device).to(device)

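Although our encoder and decoder each use only a single-layer RNN, we actually have more parameters than the previous model in "Deep Learning (Natural Language Processing) Seq2Seq Study Notes (Hands-On Practice)".

This is because the input sizes of the GRU and the linear layer have increased.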
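
To get a feel for how much the larger inputs cost, the sketch below counts parameters by hand. It does not reproduce the previous model; it only compares a GRU and a linear layer with and without the extra inputs. The formula follows the parameter layout of PyTorch's single-layer nn.GRU (three gates, two bias vectors each), and VOCAB is a hypothetical vocabulary size, since the real one depends on len(TRG.vocab).

```python
def gru_param_count(input_size, hidden_size):
    # Single-layer, unidirectional GRU: 3 gates, each with input weights,
    # hidden weights and two bias vectors (input bias and hidden bias)
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)

def linear_param_count(in_features, out_features):
    # Weight matrix plus bias vector
    return in_features * out_features + out_features

EMB_DIM, HID_DIM = 256, 512  # values used in this tutorial
VOCAB = 1000                 # hypothetical output vocabulary size

# GRU fed only the embedding vs. the embedding concatenated with the context
gru_plain = gru_param_count(EMB_DIM, HID_DIM)
gru_with_context = gru_param_count(EMB_DIM + HID_DIM, HID_DIM)

# Linear layer fed only the hidden state vs. embedding + hidden state + context
fc_plain = linear_param_count(HID_DIM, VOCAB)
fc_with_extras = linear_param_count(EMB_DIM + HID_DIM * 2, VOCAB)

print(gru_plain, gru_with_context)  # 1182720 1969152
print(fc_plain, fc_with_extras)     # 513000 1281000
```

Under these assumptions, the wider inputs add several hundred thousand parameters to each of the two layers.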

The training loop and the eval function are built in the same way as in the previous experiment.

3.2 train

def train(model, iterator, optimizer, criterion, clip):
    
    model.train()
    
    epoch_loss = 0
    
    for i, batch in enumerate(iterator):
        
        src = batch.src
        trg = batch.trg
        
        optimizer.zero_grad()
        
        output = model(src, trg)
        
        #trg = [trg len, batch size]
        #output = [trg len, batch size, output dim]
        
        output_dim = output.shape[-1]
        
        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)
        
        #trg = [(trg len - 1) * batch size]
        #output = [(trg len - 1) * batch size, output dim]
        
        loss = criterion(output, trg)
        
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        
        optimizer.step()
        
        epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)
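
Two details of the training step may be easier to see in isolation: dropping the first (<sos>) position and flattening before the loss, and what gradient clipping does to the gradient norm. The sketch below uses random tensors and a plain nn.CrossEntropyLoss as stand-ins; the sizes and the max_norm value are assumptions for the example.

```python
import torch
import torch.nn as nn

# Toy sizes, assumed for illustration only
trg_len, batch_size, output_dim = 4, 2, 7
output = torch.randn(trg_len, batch_size, output_dim)      # stand-in for model(src, trg)
trg = torch.randint(0, output_dim, (trg_len, batch_size))  # stand-in for batch.trg

# Position 0 is never predicted by the decoding loop, so it is skipped,
# and the remaining positions are flattened into one long batch
flat_output = output[1:].view(-1, output_dim)
flat_trg = trg[1:].view(-1)

criterion = nn.CrossEntropyLoss()
loss = criterion(flat_output, flat_trg)

# Gradient clipping: rescale gradients so their total norm is at most max_norm
w = nn.Parameter(torch.ones(10))
(w * 100).sum().backward()  # every gradient entry is 100
norm_before = w.grad.norm().item()
torch.nn.utils.clip_grad_norm_([w], max_norm=1.0)
norm_after = w.grad.norm().item()

print(flat_output.shape, flat_trg.shape, norm_before, norm_after)
```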

3.4 eval

def evaluate(model, iterator, criterion):
    
    model.eval()
    
    epoch_loss = 0
    
    with torch.no_grad():
    
        for i, batch in enumerate(iterator):

            src = batch.src
            trg = batch.trg

            output = model(src, trg, 0) #turn off teacher forcing

            #trg = [trg len, batch size]
            #output = [trg len, batch size, output dim]

            output_dim = output.shape[-1]
            
            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

            #trg = [(trg len - 1) * batch size]
            #output = [(trg len - 1) * batch size, output dim]

            loss = criterion(output, trg)

            epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)
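
The evaluation function relies on model.eval() and torch.no_grad(). A minimal sketch of what each one changes, using a standalone nn.Dropout and nn.Linear as stand-ins for the model:

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(0.5)
x = torch.ones(1000)

# In training mode, entries are zeroed at random and the survivors are
# scaled by 1 / (1 - p), which is 2 for p = 0.5
dropout.train()
train_out = dropout(x)

# In eval mode, dropout is the identity
dropout.eval()
eval_out = dropout(x)

# Under no_grad, no computation graph is built for the outputs
layer = nn.Linear(4, 2)
with torch.no_grad():
    y = layer(torch.randn(3, 4))

print(torch.equal(eval_out, x), y.requires_grad)
```

So the evaluation loss is computed with deterministic layers and without the memory cost of tracking gradients.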

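3.5 Analysis of the experimental results:

4 Summary

Judging from the results alone, the test loss is better than in the previous experiment. This is a good sign that this model architecture is doing the right thing!

Relieving the information compression therefore seems to be the way forward, and in the next tutorial we will look at this further.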
