Deep Learning (NLP) Seq2Seq Study Notes: Using a GRU and Reducing Information Compression (Part 2)


Contents

0 Preface

1 Encoder

2 Decoder

2.1 Two changes

3 The Seq2Seq model

3.1 Training the Seq2Seq model

3.2 train

3.3 eval

3.4 Analysis of the experimental results

4 Summary


0 Preface

In the previous model, both the encoder and the decoder were multi-layer RNNs. Multi-layer RNNs let us apply dropout between the layers, and because we used an LSTM, a cell state was returned alongside the hidden state. However, that design forces a large amount of information through the model without compressing it effectively. In this post we switch to a GRU and design both the Encoder and the Decoder as single-layer RNNs, aiming to handle the information compression better and so improve our Seq2Seq model.

1 Encoder

In a multi-layer RNN, dropout is applied between the layers; since our GRU now has only a single layer, we no longer pass a dropout value to it. If we try to pass one anyway, PyTorch will display a warning.

Note also that the GRU returns only a hidden state and no cell state (unlike the LSTM).
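As a quick illustration of the warning (a minimal sketch with toy sizes of our own choosing, not part of the model code):

import torch.nn as nn

# A single-layer GRU with non-zero dropout triggers a UserWarning, because
# inter-layer dropout has no layer boundary to act on:
rnn = nn.GRU(input_size=8, hidden_size=16, num_layers=1, dropout=0.5)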

import random

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, dropout):
        super().__init__()

        self.hid_dim = hid_dim
        
        self.embedding = nn.Embedding(input_dim, emb_dim) #no dropout as only one layer!
        
        self.rnn = nn.GRU(emb_dim, hid_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, src):
        
        #src = [src len, batch size]
        
        embedded = self.dropout(self.embedding(src))
        
        #embedded = [src len, batch size, emb dim]
        
        outputs, hidden = self.rnn(embedded) #no cell state!
        
        #outputs = [src len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        
        #outputs are always from the top hidden layer
        
        return hidden
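As a quick sanity check of the Encoder (a minimal sketch with toy sizes of our own choosing, assuming the imports and the class above):

enc = Encoder(input_dim=100, emb_dim=32, hid_dim=64, dropout=0.5)
src = torch.randint(0, 100, (7, 4))   # [src len = 7, batch size = 4]
context = enc(src)
print(context.shape)                  # torch.Size([1, 4, 64])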

2 Decoder

The decoder implementation differs considerably from the previous model's: here we alleviate some of the information compression.

2.1 Two changes

  • The same context vector $z$ returned by the encoder is reused at every time step in the decoder: $s_t = \text{DecoderGRU}(d(y_t), s_{t-1}, z)$

  • The embedding of the current token, $d(y_t)$, together with the context vector $z$, is also passed to the linear layer that predicts the next token: $\hat{y}_{t+1} = f(d(y_t), s_t, z)$ (previously, the next token was predicted using only the top-layer hidden state at each time step)

How do these two changes reduce the information compression?

  • If the decoder's hidden state no longer needs to carry all of the information about the source sequence, the context vector can always be consulted directly, since it is fed in as an input at every step
  • Passing $d(y_t)$ into the linear layer also means this layer can see directly which token is being processed, rather than having to extract that information from the hidden state

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, dropout):
        super().__init__()

        self.hid_dim = hid_dim
        self.output_dim = output_dim
        
        self.embedding = nn.Embedding(output_dim, emb_dim)
        
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim)
        
        self.fc_out = nn.Linear(emb_dim + hid_dim * 2, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, input, hidden, context):
        
        #input = [batch size]
        #hidden = [n layers * n directions, batch size, hid dim]
        #context = [n layers * n directions, batch size, hid dim]
        
        #n layers and n directions in the decoder will both always be 1, therefore:
        #hidden = [1, batch size, hid dim]
        #context = [1, batch size, hid dim]
        
        input = input.unsqueeze(0)
        
        #input = [1, batch size]
        
        embedded = self.dropout(self.embedding(input))
        
        #embedded = [1, batch size, emb dim]
                
        emb_con = torch.cat((embedded, context), dim = 2)
            
        #emb_con = [1, batch size, emb dim + hid dim]
            
        output, hidden = self.rnn(emb_con, hidden)
        
        #output = [seq len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        
        #seq len, n layers and n directions will always be 1 in the decoder, therefore:
        #output = [1, batch size, hid dim]
        #hidden = [1, batch size, hid dim]
        
        output = torch.cat((embedded.squeeze(0), hidden.squeeze(0), context.squeeze(0)), 
                           dim = 1)
        
        #output = [batch size, emb dim + hid dim * 2]
        
        prediction = self.fc_out(output)
        
        #prediction = [batch size, output dim]
        
        return prediction, hidden
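Continuing the toy example from the Encoder section (a sketch; `context` comes from the Encoder check above and also serves as the initial hidden state $s_0$):

dec = Decoder(output_dim=120, emb_dim=32, hid_dim=64, dropout=0.5)
tok = torch.randint(0, 120, (4,))     # [batch size = 4]
prediction, hidden = dec(tok, context, context)   # context doubles as s_0
print(prediction.shape)               # torch.Size([4, 120])
print(hidden.shape)                   # torch.Size([1, 4, 64])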

3 The Seq2Seq model

Putting the Encoder and Decoder together, we implement the model shown in the tutorial's architecture diagram:

In this experiment we must ensure that the encoder and the decoder have the same hidden dimension.

This is similar to the previous experiment.

Briefly, the steps are:

  • Create an outputs tensor to hold all the predictions, $\hat{Y}$
  • Feed the source sequence $X$ into the encoder to receive the context vector
  • Set the initial decoder hidden state to the context vector, $s_0 = z = h_T$
  • Use a batch of <sos> tokens as the first input, $y_1$
  • Then decode within a loop:

Insert the input token $y_t$, the previous hidden state $s_{t-1}$, and the context vector $z$ into the decoder

Receive a prediction, $\hat{y}_{t+1}$, and a new hidden state, $s_t$

Then decide whether to apply teacher forcing, setting the next input appropriately (either the ground-truth next token from the target sequence or the highest-scoring predicted token)

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"
        
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):
        
        #src = [src len, batch size]
        #trg = [trg len, batch size]
        #teacher_forcing_ratio is probability to use teacher forcing
        #e.g. if teacher_forcing_ratio is 0.75 we use ground-truth inputs 75% of the time
        
        batch_size = trg.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        
        #tensor to store decoder outputs
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)
        
        #last hidden state of the encoder is the context
        context = self.encoder(src)
        
        #context also used as the initial hidden state of the decoder
        hidden = context
        
        #first input to the decoder is the <sos> tokens
        input = trg[0,:]
        
        for t in range(1, trg_len):
            
            #insert input token embedding, previous hidden state and the context state
            #receive output tensor (predictions) and new hidden state
            output, hidden = self.decoder(input, hidden, context)
            
            #place predictions in a tensor holding predictions for each token
            outputs[t] = output
            
            #decide if we are going to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio
            
            #get the highest predicted token from our predictions
            top1 = output.argmax(1) 
            
            #if teacher forcing, use actual next token as next input
            #if not, use predicted token
            input = trg[t] if teacher_force else top1

        return outputs
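The same toy pieces can be wired into the full model (again a sketch, reusing `enc`, `dec`, and `src` from the examples above):

model_toy = Seq2Seq(enc, dec, torch.device('cpu'))
trg = torch.randint(0, 120, (9, 4))   # [trg len = 9, batch size = 4]
outputs = model_toy(src, trg)
print(outputs.shape)                  # torch.Size([9, 4, 120])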

3.1 Training the Seq2Seq model

This is very similar to the previous tutorial, 深度学习(自然语言处理)Seq2Seq学习笔记(动手实践) (see reference 2 below).

INPUT_DIM = len(SRC.vocab)
OUTPUT_DIM = len(TRG.vocab)
ENC_EMB_DIM = 256
DEC_EMB_DIM = 256
HID_DIM = 512
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, DEC_DROPOUT)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Seq2Seq(enc, dec, device).to(device)

Although the encoder and decoder here use only single-layer RNNs, this model actually has more parameters than the previous one from 深度学习(自然语言处理)Seq2Seq学习笔记(动手实践).

This is because the input sizes of both the GRU and the linear layer have increased.
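We can verify this with a small helper that counts trainable parameters (a sketch; the helper name is our own, following the idiom used in the referenced pytorch-seq2seq repository):

def count_parameters(model):
    # sum over all parameter tensors that require gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')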

Creating the training loop and the eval function works much like in the previous experiment.
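For completeness, a minimal sketch of the setup those functions assume, using the torchtext SRC/TRG fields from the previous post (an assumption on our part), with the loss masked on padding positions:

import torch.optim as optim

optimizer = optim.Adam(model.parameters())

# ignore the loss on padding positions
TRG_PAD_IDX = TRG.vocab.stoi[TRG.pad_token]
criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)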

3.2 train

def train(model, iterator, optimizer, criterion, clip):
    
    model.train()
    
    epoch_loss = 0
    
    for i, batch in enumerate(iterator):
        
        src = batch.src
        trg = batch.trg
        
        optimizer.zero_grad()
        
        output = model(src, trg)
        
        #trg = [trg len, batch size]
        #output = [trg len, batch size, output dim]
        
        output_dim = output.shape[-1]
        
        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)
        
        #trg = [(trg len - 1) * batch size]
        #output = [(trg len - 1) * batch size, output dim]
        
        loss = criterion(output, trg)
        
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        
        optimizer.step()
        
        epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)

3.3 eval

def evaluate(model, iterator, criterion):
    
    model.eval()
    
    epoch_loss = 0
    
    with torch.no_grad():
    
        for i, batch in enumerate(iterator):

            src = batch.src
            trg = batch.trg

            output = model(src, trg, 0) #turn off teacher forcing

            #trg = [trg len, batch size]
            #output = [trg len, batch size, output dim]

            output_dim = output.shape[-1]
            
            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

            #trg = [(trg len - 1) * batch size]
            #output = [(trg len - 1) * batch size, output dim]

            loss = criterion(output, trg)

            epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)
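Finally, a sketch of the outer loop tying `train` and `evaluate` together (`train_iterator` and `valid_iterator` come from the previous post; the epoch count, clip value, and checkpoint filename are our own assumptions):

N_EPOCHS = 10
CLIP = 1

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, valid_iterator, criterion)

    # keep the checkpoint with the lowest validation loss
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut2-model.pt')

    print(f'Epoch: {epoch+1:02} | Train Loss: {train_loss:.3f} | Val. Loss: {valid_loss:.3f}')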

3.4 Analysis of the experimental results

4 Summary

Judging purely by the results, the test loss here is better than in the previous experiment. This is a good sign that this model architecture is doing something right!

Alleviating information compression therefore seems to be a necessary step, and the next tutorial will focus on it further.

References

  1. pytorch-seq2seq: https://github.com/bentrevett/pytorch-seq2seq
  2. 深度学习(自然语言处理)Seq2Seq学习笔记(动手实践): https://blog.csdn.net/qq_37457202/article/details/108836103#2.7%20%E8%AF%84%E4%BC%B0%EF%BC%9A