-
Pytorch and Chainer are in great resemblance, from architecture to API grammer.
-
x_cuda = x.cuda(device_id=0)returns a new tensor on gpu0, meanwhile x is kept as it was; i.e., this is copy op. After this, change x has no effect on x_cuda and vice versa. Use.cpu()to copy the tensor from GPU to -
Pytorch variable has no .zero_grad() function as Chainer, use
.data.zero_()to do the job (In Pytorch, only nn.Module class has .zero_grad() attribute) -
Pytorch use Module.train() & Model.eval() calls with empty param to switch between training mode and evaluation mode
-
torch.Tensor() is just an alias of torch.FloatTensor(), not what I expected as an universal constructer which would determine the dtype according to input ndarray. torch.from_numpy() does this job meanwhile.
-
For RNN with multiple layers and dropout enabled, the pytorch implementation does not apply dropout for the last layer.
-
The difference between
LSTMandLSTMCelllies in the input shape: input ofLSTMis 3D (B, T, D) whereas input ofLSTMCellis 2D (B, D), i.e.,LSTMCellis used for just one time step. -
The
LSTMimplementation of Pytorch has two problems: 1) no peepholes 2) two biases ini_t, f_t, g_t, o_t, which is nonsense (according to Facebook's comment, it's from CuDNN's convention) -
Pytorch does not support numpy-style broadcasting, so to do element-wise multiplication, for example
X(3, 50) andy(50), you need do.unsqueezeand then.expand:X * y.unsqueeze(0).expand_as(X) -
Even with
batch_first=True, the hiddens returned by LSTM(GRU, etc) are still of size(num_layers * num_directions, batch, hidden_size) -
The
CNNimplementation of Pytorch does not do filter flipping by default; and its speed is comparable to Theano'sCNN, produces exactly the same result, meanwhileconvolve2d()of scipy is about 2 * times slower, and result slightly different (within 10eps); source code of Pytorch's convolution resides inpytorch/torch/csrc/autograd/functions/convolution.cpp -
Pytorch does not do weight initialization automatically, you have to define a
reset_parameters()function yourself, and call it at model initialization. -
Tensor.numpy()return a numpy array sharing the memory. In another word, it just returns the memory pointer. So we can use this mechanism to MODIFY the value of a tensor, though not intuitive. -
To set a model/module to switch between train/predict mode, call
nn.Module.train(True/False). This function will recursively set every child module's mode. NEVER useself.training=True/False, this only applies to the current module without effecting its children.