Experiment Log 1
Tags: log, experiment
Date: 2023-09-14 09:11:23
1. tagger with the wikipedia-pubmed-and-PMC-w2v word embeddings
Loading pretrained embeddings from ../.local/lib/python3.5/site-packages/neuroner/data/word_vectors/wikipedia-pubmed-and-PMC-w2v.txt...
WARNING: 5443657 invalid lines
Loaded 0 pretrained embeddings.
0 / 18309 (0.0000%) words have been initialized with pretrained embeddings.
0 found directly, 0 after lowercasing, 0 after lowercasing + zero.
Compiling...
Problem: every line of the embedding file is reported as invalid, so no pretrained embeddings are loaded.
2. tagger with the PMC-w2v word embeddings
Loading pretrained embeddings from ./dataset/PMC-w2v.txt...
WARNING: 2515687 invalid lines
Loaded 0 pretrained embeddings.
0 / 18141 (0.0000%) words have been initialized with pretrained embeddings.
0 found directly, 0 after lowercasing, 0 after lowercasing + zero.
Compiling...
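Same problem again: the word embeddings cannot be loaded.
Fix: I found the cause. The dimension of the vectors in the embedding file differs from the default word dimension, so the dimension has to be set explicitly with --word_dim 200. With that set, loading works: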
Found 10407 unique words (115614 in total)
Loading pretrained embeddings from ./dataset/PMC-w2v.txt...
Found 80 unique characters
Found 9 unique named entity tags
4595 / 4598 / 4840 sentences in train / dev / test.
Saving the mappings to disk...
Loading pretrained embeddings from ./dataset/PMC-w2v.txt...
WARNING: 1 invalid lines
Loaded 2515686 pretrained embeddings.
17963 / 18141 (99.0188%) words have been initialized with pretrained embeddings.
17876 found directly, 46 after lowercasing, 41 after lowercasing + zero.
Compiling...
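The three counts in the log (found directly, after lowercasing, after lowercasing + zero) suggest a fallback lookup. Below is a minimal sketch of how I read that message, not the loader's actual code. The function name lookup_pretrained is made up for illustration, and I am assuming that "zero" means replacing every digit with 0.

```python
import re


def lookup_pretrained(word, pretrained):
    """Return (matched_key, how) for a vocabulary word.

    Hypothetical helper: tries the word as-is, then lowercased, then
    lowercased with every digit replaced by "0" (an assumption about
    what "lowercasing + zero" means in the log).
    """
    if word in pretrained:
        return word, "direct"
    lower = word.lower()
    if lower in pretrained:
        return lower, "lowercase"
    zeroed = re.sub(r"\d", "0", lower)
    if zeroed in pretrained:
        return zeroed, "lowercase+zero"
    return None, "missing"
```

Under this reading, the 178 words that stayed uninitialized (18141 - 17963) are the ones for which all three attempts fail.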
Training currently uses the CDR dataset from Att.
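To avoid guessing the value for --word_dim, the dimension can be read off the embedding file before training. This is a rough sketch, assuming the file is plain text with one word followed by its vector components per line, separated by whitespace. The name infer_embedding_dim and the max_lines parameter are my own and are not part of tagger.

```python
from collections import Counter


def infer_embedding_dim(path, max_lines=1000):
    """Guess the vector dimension of a text-format embedding file.

    Counts the number of fields after the first token on each of the first
    max_lines lines and returns the most common count. A word2vec-style
    header line (vocabulary size and dimension) is simply outvoted.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            fields = line.rstrip().split()
            if len(fields) >= 2:
                counts[len(fields) - 1] += 1
    if not counts:
        raise ValueError("no usable lines in {}".format(path))
    return counts.most_common(1)[0][0]
```

Calling infer_embedding_dim("./dataset/PMC-w2v.txt") should return the value to pass as --word_dim. The single invalid line left in the successful run may well be such a header line, though I have not checked the file to confirm it.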
3. tagger with the chemdner_pubmed_drug.word2vec_model_token4_d50 word embeddings