RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Time: 2023-09-27 14:19:49
Background
While training BERT today, I ran into this bug:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
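I searched around online and found that almost every answer blames a batch size that is too large, but reducing the batch size did not fix the problem for me.
Debugging process
Since this is a fairly rare CUDA error, why not try running on the CPU first?
So I changed device = 'cuda' if torch.cuda.is_available() else 'cpu'
to simply device = 'cpu'
and reran the code, which produced the following error (only the last few lines are shown):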
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
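This kind of failure can be reproduced in isolation. The following is a minimal sketch, assuming PyTorch is installed; the table size and token IDs are made up for illustration and do not come from the original training code.

```python
import torch
import torch.nn as nn

# Hypothetical toy embedding table with 10 rows, so valid indices are 0..9.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Hypothetical input: the last ID (12) has no row in the table.
token_ids = torch.tensor([1, 5, 12])

try:
    embedding(token_ids)  # runs on the CPU, so the failure is a Python exception
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

Running the lookup on the CPU is what turns the problem into a readable Python exception that points at the embedding layer.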
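It is easy to see that the embedding layer cannot look up some of the input indices, which means the vocabulary size is set incorrectly. So I went back through the BERT code I had written: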
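import torch.nn as nn
from transformers import BertConfig, BertModel  # assuming the Hugging Face transformers package

class BERT(nn.Module):
    def __init__(self, vocab):
        super().__init__()
        self.vocab = vocab
        self.config = BertConfig()
        # Bug: the model and its embedding table are built here with the default vocab_size
        self.model = BertModel(config=self.config)
        # This assignment comes too late to affect the embedding layer
        self.config.vocab_size = len(vocab)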
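Clearly, the vocabulary size was changed only after the BERT model had been instantiated, so the change has no effect: the embedding layer had already been built with the config's default vocab_size. On the GPU, the same out-of-range lookup surfaced as the opaque cuBLAS error above. Swapping the last two lines fixed the problem!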
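For reference, the reordered constructor might look like the sketch below. It assumes the Hugging Face transformers package. The `**config_kwargs` parameter, the toy vocabulary, and the tiny model dimensions are hypothetical additions that keep the example quick to run; they are not part of the original code.

```python
import torch.nn as nn
from transformers import BertConfig, BertModel


class BERT(nn.Module):
    def __init__(self, vocab, **config_kwargs):
        super().__init__()
        self.vocab = vocab
        self.config = BertConfig(**config_kwargs)
        # Set the vocabulary size BEFORE building the model, so the
        # embedding table is allocated with one row per vocabulary entry.
        self.config.vocab_size = len(vocab)
        self.model = BertModel(config=self.config)


# Hypothetical toy vocabulary and tiny dimensions, for illustration only.
vocab = [f"tok{i}" for i in range(50)]
bert = BERT(
    vocab,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# The embedding table now has exactly len(vocab) rows.
num_rows = bert.model.get_input_embeddings().num_embeddings
print(num_rows == len(vocab))
```

Comparing the number of embedding rows with `len(vocab)` right after construction is a cheap sanity check that would catch this ordering mistake before any training step runs.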