[Code Reading] A Detailed Guide to Defining Your Own CUDA Functions in PyTorch
Currently, many of the modules used in 3D networks, especially point-based ones, have no official implementation in PyTorch, so we have to write them ourselves; examples include the FPS, group, and query functions in PointNet++. Until now I had only used these functions, and my modifications were limited to the Python layer. This time, let's dig into how to define such a function ourselves and hook it into PyTorch so that it can be used like any other op.
The official documentation clearly describes two ways to make your own CUDA functions available in PyTorch: one compiles them ahead of time into a Python package; the other loads them while the program is running. Personally, I think the compiled approach is better: it produces a Python package that is easy to call from other projects as well.
First, let's look at how the PyTorch interface is set up; for now, assume the functions themselves have already been written.
Setting up the PyTorch interface
The compiled approach
Assume the functions we want to expose are already implemented; in this example they are the xxx.cpp, xxx.cu, and xxx.h files under pointnet2/src. How do we hook them into a PyTorch interface? That is handled by pointnet2/setup.py:
# These two imports are boilerplate; setuptools is what turns our
# custom functions into an installable package
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    # the package name is pointnet2
    name='pointnet2',
    ext_modules=[
        # the extension module's name is pointnet2_cuda, i.e. you will `import pointnet2_cuda`
        # list the xxx.cpp, xxx.cu, and xxx.h files that belong to this extension
        CUDAExtension('pointnet2_cuda', [
            'src/pointnet2_api.cpp',
            'src/ball_query.cpp',
            'src/ball_query_gpu.cu',
            'src/group_points.cpp',
            'src/group_points_gpu.cu',
            'src/interpolate.cpp',
            'src/interpolate_gpu.cu',
            'src/sampling.cpp',
            'src/sampling_gpu.cu',
        ],
        # the options below can usually stay as-is
        extra_compile_args={'cxx': ['-g'],
                            'nvcc': ['-O2']})
    ],
    cmdclass={'build_ext': BuildExtension}
)
Before we can use these functions, we have to run
python setup.py install
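If the build succeeds, the extension module is importable right away. A quick sanity check (the module name is the one passed to CUDAExtension above):

import pointnet2_cuda
print(pointnet2_cuda.__file__)  # path of the compiled .so, if the install worked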
This command simply collects the functions we defined into a package and installs it. Which raises a question: the package is installed, but what interface do we use to call the functions?
That interface is defined in pointnet2/src/pointnet2_api.cpp:
#include <torch/serialize/tensor.h>
#include <torch/extension.h>

// include the headers of the functions we wrote
#include "ball_query_gpu.h"
#include "group_points_gpu.h"
#include "sampling_gpu.h"
#include "interpolate_gpu.h"

// use PYBIND11_MODULE, which torch/extension.h already pulls in
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    // name used when calling from Python: ball_query_wrapper
    // the C++ function it binds to: ball_query_wrapper_fast
    // the docstring shown by help() in Python: "ball_query_wrapper_fast"
    m.def("ball_query_wrapper", &ball_query_wrapper_fast, "ball_query_wrapper_fast");
    m.def("group_points_wrapper", &group_points_wrapper_fast, "group_points_wrapper_fast");
    m.def("group_points_grad_wrapper", &group_points_grad_wrapper_fast, "group_points_grad_wrapper_fast");
    m.def("gather_points_wrapper", &gather_points_wrapper_fast, "gather_points_wrapper_fast");
    m.def("gather_points_grad_wrapper", &gather_points_grad_wrapper_fast, "gather_points_grad_wrapper_fast");
    m.def("furthest_point_sampling_wrapper", &furthest_point_sampling_wrapper, "furthest_point_sampling_wrapper");
    m.def("three_nn_wrapper", &three_nn_wrapper_fast, "three_nn_wrapper_fast");
    m.def("three_interpolate_wrapper", &three_interpolate_wrapper_fast, "three_interpolate_wrapper_fast");
    m.def("three_interpolate_grad_wrapper", &three_interpolate_grad_wrapper_fast, "three_interpolate_grad_wrapper_fast");
}
With that, the interface PyTorch needs to call is complete. So how is it actually called?
That happens in pointnet2/pointnet2_utils.py; take GatherOperation as an example:
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.autograd import Function
from typing import Tuple

# import the package we defined
import pointnet2_cuda as pointnet2

# a custom PyTorch function must inherit from torch.autograd.Function
class GatherOperation(Function):

    # define the forward pass; variables saved on ctx are passed on to backward
    @staticmethod
    def forward(ctx, features: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        """
        :param ctx:
        :param features: (B, C, N)
        :param idx: (B, npoint) index tensor of the features to gather
        :return:
            output: (B, C, npoint)
        """
        assert features.is_contiguous()
        assert idx.is_contiguous()

        B, npoint = idx.size()
        _, C, N = features.size()
        output = torch.cuda.FloatTensor(B, C, npoint)

        # call the function we defined to do the computation
        pointnet2.gather_points_wrapper(B, C, N, npoint, features, idx, output)

        # stash the variables needed by the backward pass on ctx
        ctx.for_backwards = (idx, C, N)
        return output

    # define the backward pass; its first argument is ctx, and the number of
    # remaining arguments equals the number of outputs of forward
    @staticmethod
    def backward(ctx, grad_out):
        # recover the variables saved during the forward pass
        idx, C, N = ctx.for_backwards
        B, npoint = idx.size()

        grad_features = Variable(torch.cuda.FloatTensor(B, C, N).zero_())
        grad_out_data = grad_out.data.contiguous()
        pointnet2.gather_points_grad_wrapper(B, C, N, npoint, grad_out_data, idx, grad_features.data)
        # the number of returned values must match the number of inputs to
        # forward (excluding ctx)
        return grad_features, None

# the way to call our Function is outputs = xxx.apply(inputs); pulling .apply
# out here lets us simply write outputs = gather_operation(inputs)
gather_operation = GatherOperation.apply
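A minimal usage sketch (the shapes follow the docstring above; the sizes and the int32 index dtype are illustrative assumptions):

import torch

features = torch.randn(2, 64, 1024).cuda()           # (B, C, N)
idx = torch.randint(0, 1024, (2, 128)).cuda().int()  # (B, npoint), int32 for the CUDA kernel
out = gather_operation(features, idx)                # (B, C, npoint)
print(out.shape)                                     # torch.Size([2, 64, 128])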
The load-at-runtime approach
Take the code in PVCNN as an example. PVCNN's xxx.cpp, xxx.cu, and xxx.h files all live in the modules/functional/src folder.
Following the same order as the compiled approach: first, how do the xxx.cpp and xxx.cu files become known to PyTorch? That happens in modules/functional/backend.py:
import os
from torch.utils.cpp_extension import load

_src_path = os.path.dirname(os.path.abspath(__file__))
_backend = load(name='_pvcnn_backend',
                extra_cflags=['-O3', '-std=c++17'],
                sources=[os.path.join(_src_path, 'src', f) for f in [
                    'ball_query/ball_query.cpp',
                    'ball_query/ball_query.cu',
                    'grouping/grouping.cpp',
                    'grouping/grouping.cu',
                    'interpolate/neighbor_interpolate.cpp',
                    'interpolate/neighbor_interpolate.cu',
                    'interpolate/trilinear_devox.cpp',
                    'interpolate/trilinear_devox.cu',
                    'sampling/sampling.cpp',
                    'sampling/sampling.cu',
                    'voxelization/vox.cpp',
                    'voxelization/vox.cu',
                    'bindings.cpp',
                ]]
                )
__all__ = ['_backend']
This is essentially boilerplate; only name and sources need to be adapted to your own files. Note that load() JIT-compiles the sources the first time it is imported and caches the resulting extension, so only the first run pays the compilation cost.
That brings up the next question: PyTorch now knows about these source files, but how do we call them, and which Python name corresponds to which C++ function? The interface is defined the same way as in the compiled approach, in modules/functional/src/bindings.cpp:
#include <pybind11/pybind11.h>

#include "ball_query/ball_query.hpp"
#include "grouping/grouping.hpp"
#include "interpolate/neighbor_interpolate.hpp"
#include "interpolate/trilinear_devox.hpp"
#include "sampling/sampling.hpp"
#include "voxelization/vox.hpp"

PYBIND11_MODULE(_pvcnn_backend, m) {
  m.def("gather_features_forward", &gather_features_forward,
        "Gather Centers' Features forward (CUDA)");
  m.def("gather_features_backward", &gather_features_backward,
        "Gather Centers' Features backward (CUDA)");
  m.def("furthest_point_sampling", &furthest_point_sampling_forward,
        "Furthest Point Sampling (CUDA)");
  m.def("ball_query", &ball_query_forward, "Ball Query (CUDA)");
  m.def("grouping_forward", &grouping_forward,
        "Grouping Features forward (CUDA)");
  m.def("grouping_backward", &grouping_backward,
        "Grouping Features backward (CUDA)");
  m.def("three_nearest_neighbors_interpolate_forward",
        &three_nearest_neighbors_interpolate_forward,
        "3 Nearest Neighbors Interpolate forward (CUDA)");
  m.def("three_nearest_neighbors_interpolate_backward",
        &three_nearest_neighbors_interpolate_backward,
        "3 Nearest Neighbors Interpolate backward (CUDA)");
  m.def("trilinear_devoxelize_forward", &trilinear_devoxelize_forward,
        "Trilinear Devoxelization forward (CUDA)");
  m.def("trilinear_devoxelize_backward", &trilinear_devoxelize_backward,
        "Trilinear Devoxelization backward (CUDA)");
  m.def("avg_voxelize_forward", &avg_voxelize_forward,
        "Voxelization forward with average pooling (CUDA)");
  m.def("avg_voxelize_backward", &avg_voxelize_backward,
        "Voxelization backward (CUDA)");
}
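Once load() has run, these bound names are plain attributes of _backend; a quick sketch:

from modules.functional.backend import _backend
# every name registered via m.def in bindings.cpp is callable here
print([name for name in dir(_backend) if not name.startswith('_')])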
And then the final question: we know how the functions in the xxx.cpp files are called from Python, but how do we wrap them into a PyTorch Function? Again, this works exactly as in the compiled approach. Take modules/functional/grouping.py as an example; a sketch of that wrapper follows.
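A minimal sketch of such a wrapper, following the same Function pattern as GatherOperation above. The exact argument lists of _backend.grouping_forward and _backend.grouping_backward are assumptions inferred from the bindings listed above, so treat this as illustrative rather than the verbatim PVCNN code:

from torch.autograd import Function
from modules.functional.backend import _backend

class Grouping(Function):
    @staticmethod
    def forward(ctx, features, indices):
        # features: (B, C, N), indices: (B, M, U); shapes are assumptions
        features = features.contiguous()
        indices = indices.contiguous()
        # save what backward needs: the indices and the original point count
        ctx.save_for_backward(indices)
        ctx.num_points = features.size(-1)
        return _backend.grouping_forward(features, indices)

    @staticmethod
    def backward(ctx, grad_output):
        indices, = ctx.saved_tensors
        grad_features = _backend.grouping_backward(grad_output.contiguous(),
                                                   indices, ctx.num_points)
        # one return value per forward input (excluding ctx)
        return grad_features, None

grouping = Grouping.apply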
