PyTorch Bug Notes

`RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.`

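Solution
`view()` requires the tensor's elements to be laid out contiguously in memory, but a tensor can become non-contiguous (for example after `transpose()` or `permute()`). Call `.contiguous()` first to copy it into contiguous memory:

    out = out.contiguous().view(out.size(0), -1)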
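A minimal sketch of how the error arises and how the fix behaves. The tensor shapes and the names `x`, `y`, `flat_a`, and `flat_b` are made up for illustration:

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
y = x.permute(0, 2, 1)       # same storage, permuted strides: no longer contiguous
print(y.is_contiguous())     # False

view_failed = False
try:
    y.view(y.size(0), -1)    # this layout cannot be flattened without a copy
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat_a = y.contiguous().view(y.size(0), -1)  # copy into contiguous memory, then view
flat_b = y.reshape(y.size(0), -1)            # reshape copies only when it has to
```

Both `flat_a` and `flat_b` hold the same values; `reshape` is simply the shorter spelling, as the error message itself suggests.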

`RuntimeError: The size of tensor a (32) must match the size of tensor b (28) at non-singleton` or `ValueError: expected sequence of length 51 at dim 1 (got 0)`

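Problem and solution
Converting a Python list to a tensor fails with a dimension mismatch when the nested lists have different lengths. Inspect the elements of the list to find the entries with the wrong length, then decide how to fix them.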
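One way to locate the offending entries is to print the length of each element. The sketch below uses made-up data (`seqs`) and, as one possible fix rather than the only one, drops empty entries and pads the rest with `torch.nn.utils.rnn.pad_sequence`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [[1, 2, 3], [4, 5], []]       # hypothetical ragged data
lengths = [len(s) for s in seqs]
print(lengths)                        # the odd lengths show which entries are wrong

conversion_failed = False
try:
    torch.tensor(seqs)                # ragged nested lists cannot form one tensor
except ValueError as err:
    conversion_failed = True
    print("conversion failed:", err)

# Assumed fix for this example: drop empty entries, pad the rest with zeros.
kept = [torch.tensor(s, dtype=torch.long) for s in seqs if len(s) > 0]
padded = pad_sequence(kept, batch_first=True, padding_value=0)
```

Whether padding, truncating, or dropping is right depends on where the short entries come from, so it is worth checking the data source first.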

`StyleGAN fails to compile on an RTX 3090`

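Set the target CUDA architecture before building the extension. The RTX 3090 has compute capability 8.6; `8.0` is the value that worked here, and `8.6` may be the better choice if the installed CUDA toolkit supports it:

    export TORCH_CUDA_ARCH_LIST="8.0"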

`RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0`

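The problem is in `tensor.sub_(mean[:, None, None]).div_(std[:, None, None])`.
When a PNG image is read and then normalized in the transform, `Image` may load it as RGBA, that is, with four channels. Image processing usually works with three channels only, so the normalization mean and std have three entries, and the channel counts do not match.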

Solution
Add `.convert('RGB')`:

    image = Image.open("img_path").convert('RGB')
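To check the channel count without needing a file on disk, the sketch below builds a small in-memory RGBA image as a stand-in for a PNG with an alpha channel (the size and color are arbitrary):

```python
from PIL import Image

# Stand-in for a PNG that carries an alpha channel.
rgba = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
rgb = rgba.convert("RGB")                 # drops the alpha channel

print(rgba.mode, len(rgba.getbands()))    # RGBA 4
print(rgb.mode, len(rgb.getbands()))      # RGB 3
```

Printing `image.mode` right after `Image.open(...)` is a quick way to confirm whether a given file is the source of the 4-versus-3 mismatch.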

`torch.utils.cpp_extension.load hangs with no response`

Reference article: "torch.utils.cpp_extension.load卡住无响应"

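The main cause is a file lock. In the `.cache` directory under `/home/username/`, find the `torch_extensions` folder, check whether any of its subfolders contains a `lock` file, and delete it if so.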
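The manual search can be scripted. This is a sketch under two assumptions: the cache lives at `~/.cache/torch_extensions`, and the leftover file is literally named `lock`. The helper names `find_stale_locks` and `remove_stale_locks` are hypothetical, and removal should only be run when no extension build is actually in progress:

```python
from pathlib import Path


def find_stale_locks(cache_root=None):
    """Return every file named 'lock' below the torch_extensions cache."""
    if cache_root is not None:
        root = Path(cache_root)
    else:
        root = Path.home() / ".cache" / "torch_extensions"  # assumed default location
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


def remove_stale_locks(cache_root=None):
    """Delete the lock files found above and return the removed paths."""
    removed = []
    for lock in find_stale_locks(cache_root):
        lock.unlink()
        removed.append(lock)
    return removed


if __name__ == "__main__":
    # Read-only by default: list the candidates before deleting anything.
    print(find_stale_locks())
```

Listing first and deleting second keeps the step reversible in spirit: if the list is empty, the hang has a different cause.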

`TypeError: Caught TypeError in DataLoader worker process 0`

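The problem is very likely in the transform; it actually has little to do with the number of workers (`num_workers`).

The first time I hit this bug, the transform passed in was `None`.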
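A sketch of a dataset that tolerates `transform=None`, using a hypothetical `ToyDataset` class with made-up items. Indexing the dataset directly (or setting `num_workers=0` on the `DataLoader`) runs `__getitem__` in the main process, which tends to give a much more readable traceback than the worker wrapper does:

```python
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical map-style dataset used only for illustration."""

    def __init__(self, items, transform=None):
        self.items = items
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        # Guard against transform=None instead of calling it unconditionally.
        if self.transform is not None:
            item = self.transform(item)
        return item


dataset = ToyDataset([1.0, 2.0, 3.0], transform=None)
first = dataset[0]   # runs __getitem__ in the main process, outside any worker
```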

`RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.`

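During training, this can happen when `backward()` is called more than once in a single iteration. It means gradients were propagated twice through the same intermediate variable, whose saved buffers had already been freed by the first backward pass.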

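Solution
Check whether any intermediate (non-leaf) node is reused. If so, use one of the following on that node. Both cut the graph at that node, so no gradient flows back past it:

    var_new = var.clone().detach().requires_grad_(True)

or

    var = var.detach()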
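A small reproduction, with made-up values, of the error and of the detach-based fix. The names `w`, `hidden`, and `hidden_new` are illustrative:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)   # leaf
hidden = w * w                                # intermediate (non-leaf) node

loss_a = hidden.sum()
loss_a.backward()          # frees the buffers saved for the w * w node

second_backward_failed = False
try:
    loss_b = (hidden * 2).sum()
    loss_b.backward()      # has to pass through the freed w * w node again
except RuntimeError as err:
    second_backward_failed = True
    print("second backward failed:", err)

# Fix: cut the graph at the reused node and treat it as a fresh leaf.
hidden_new = hidden.clone().detach().requires_grad_(True)
loss_c = (hidden_new * 2).sum()
loss_c.backward()          # stops at hidden_new, never reaches w * w
```

After this, `hidden_new.grad` holds the gradient of `loss_c`, while `w.grad` still holds only the contribution from `loss_a`. If the second loss is supposed to update `w` as well, `retain_graph=True` on the first `backward()` (as the error message suggests) is the alternative to detaching.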
PyTorch `Out of memory` (GPU memory usage) notes

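Things to check:
Computing a loss without backpropagating it (the graph stays in memory for as long as the loss tensor is kept).
The batch size used when loading data.
The image size used when computing the loss (running the same loss on 1024 and 256 images can differ greatly in memory usage).
Even with the generator's gradients disabled (`requires_grad=False`), images produced by a forward pass through the generator can still carry gradients, for example when the generator's input requires grad.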
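The last point can be checked directly. In this sketch a `torch.nn.Linear` layer stands in for a generator and `z` for a latent being optimized; both are assumptions made for illustration:

```python
import torch

model = torch.nn.Linear(4, 1)            # hypothetical stand-in for a generator
for p in model.parameters():
    p.requires_grad_(False)              # freeze the "generator"

z = torch.randn(2, 4, requires_grad=True)   # e.g. a latent being optimized

out = model(z)
print(out.requires_grad)                 # True: the graph is kept because z requires grad

with torch.no_grad():                    # no graph is recorded inside this block
    out_eval = model(z)
print(out_eval.requires_grad)            # False

# When a loss is only logged, store a Python float rather than the tensor,
# so the running total does not keep the graph alive.
running = 0.0
loss = out.pow(2).mean()
running += loss.item()
```

Wrapping loss computations that are never backpropagated in `torch.no_grad()` avoids holding their graphs in GPU memory.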

GPU memory not released after a PyTorch program exits

List the processes that are using the GPU devices:

    fuser -v /dev/nvidia*

Kill the leftover process:

    kill -9 PID
