Deep Learning (5): Running, Testing, and Predicting with FastFCN
Contents
0 Preface
1 Environment setup
1.1 Installing Python packages
1.2 Downloading detail-api
1.3 Running prepare_pcontext.py
1.4 Running prepare_ade20k.py
2 Training the model
3 Testing the model
3.1 Downloading the models
3.2 Testing encnet_jpu_res50_pcontext.pth.tar
3.2.1 test [single-scale] (single scale: pixAcc=0.7898, mIoU=0.5105)
3.2.2 test [multi-scale] (multi scale: pixAcc=0.7964, mIoU=0.5210)
3.2.3 predict [single-scale] (single scale)
4 Errors and fixes
4.1 detail-api compilation error
4.2 Missing model files
4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'
References
0 Preface
Full title: FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation, from a team at the Shenyang Institute of Automation.
Paper: .11816
GitHub:
My machine: RTX 3070, CUDA 11.0, torch 1.7.1+cu110, Python 3.7
Next FastFCN post: Deep Learning (8): Running, Testing, and Predicting with FastFCN, Part 2 (biter0088's CSDN blog)
1 Environment setup
Environment tested by the authors:
PyTorch >= 1.1.0 (Note: The code is tested in the environment with python=3.6, cuda=9.0)
#master branch; I cloned the March 2022 version, so the author may have changed things since
git clone .git
cd FastFCN
1.1 Installing Python packages
Create a requirements.txt file with the packages below, then install them.
Note: activate the Python environment first: source activate yolov5py37
nose
tqdm
scipy
cython
requests
scikit-image
python3-dev
libevent-dev
cPython
pip install -r requirements.txt
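Note that python3-dev and libevent-dev look like system (apt) package names rather than pip packages; if pip fails on them, install them via apt and drop them from requirements.txt. As a quick sanity check (my own addition, not part of the repo), this snippet verifies that the key imports resolve in the activated environment:
# check_env.py: minimal import check for the packages FastFCN needs (my own sketch)
import importlib

for pkg in ["tqdm", "scipy", "cython", "requests", "skimage"]:  # skimage == scikit-image
    try:
        importlib.import_module(pkg)
        print(pkg, "OK")
    except ImportError as err:
        print(pkg, "MISSING:", err)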
1.2 Downloading detail-api
Download it into the FastFCN directory:
git clone
Then edit /xx/FastFCN/scripts/prepare_pcontext.py so the clone step is commented out, as follows:
def install_pcontext_api():
    #repo_url = ""
    #os.system("git clone " + repo_url)
    os.system("cd detail-api/PythonAPI/ && python setup.py install")
    shutil.rmtree('detail-api')
    try:
        import detail
    except Exception:
        print("Installing PASCAL Context API failed, please install it manually %s"%(repo_url))
Note: when prepare_pcontext.py runs, detail-api gets installed, and the detail-api folder cloned above is then deleted (by the shutil.rmtree call).
1.3 Running prepare_pcontext.py
The script lives at /xx/FastFCN/scripts/prepare_pcontext.py and prepares the VOC2010 dataset.
python -m scripts.prepare_pcontext
It downloads the VOC2010 data into the following layout:
#VOC2010 dataset
Official website: .html.
└── VOCdevkit                   # root directory
    └── VOC2010                 # datasets are organized by year; only VOC2010 is downloaded here (2007 and other years also exist)
        ├── Annotations         # XML files, one per image in JPEGImages, describing the image contents
        ├── ImageSets           # txt files; each line holds an image name, with ±1 appended to mark positive/negative samples
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages          # source images
        ├── SegmentationClass   # semantic segmentation masks: the class of every pixel
        └── SegmentationObject  # instance segmentation masks: which object instance every pixel belongs to
After the download finishes, detail-api is compiled and installed, and the files downloaded in 1.2 are deleted. So once the terminal reports that detail-api installed successfully, the following lines no longer do anything useful and can be commented out:
os.system("cd detail-api/PythonAPI/ && python setup.py install")
shutil.rmtree('detail-api')
Note: if prepare_pcontext.py is run again, it re-downloads the entire VOC2010 dataset (a bug; once the packages are installed and the data downloaded, don't run it again). If the first download succeeded but some other error forces you to re-run prepare_pcontext.py, comment out the lines below:
if __name__ == '__main__':
    args = parse_args()
    #mkdir(os.path.expanduser('~/.encoding/data'))
    #if args.download_dir is not None:
    #    if os.path.isdir(_TARGET_DIR):
    #        os.remove(_TARGET_DIR)
    #    # make symlink
    #    os.symlink(args.download_dir, _TARGET_DIR)
    #else:
    #    download_ade(_TARGET_DIR, overwrite=False)
    install_pcontext_api()
1.4 Running prepare_ade20k.py
The script lives at /xxx/FastFCN/scripts/prepare_ade20k.py and prepares the ADEChallengeData2016 dataset.
python -m scripts.prepare_ade20k
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ python -m scripts.prepare_ade20k
Downloading /home/meng/.encoding/data/downloads/ADEChallengeData2016.zip from .zip...
944710KB [05:23, 2923.61KB/s]
Downloading /home/meng/.encoding/data/downloads/release_test.zip from .zip...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 206856/206856 [04:29<00:00, 766.68KB/s]
(yolov5py37) meng@meng:~/deeplearning/FastFCN$
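A quick check (my own sketch, assuming the default ~/.encoding/data root visible in the download paths above) that both datasets landed where the training scripts expect them:
import os

# default data root used by the prepare scripts (see the download logs above)
data_root = os.path.expanduser("~/.encoding/data")
for sub in ["VOCdevkit/VOC2010", "ADEChallengeData2016"]:
    path = os.path.join(data_root, sub)
    print(path, "->", "exists" if os.path.isdir(path) else "MISSING")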
2 Training the model
Before training, apply the fixes in 4.2 and 4.3.
Reference: FastFCN/encnet_res50_pcontext.sh at master · wuhuikai/FastFCN · GitHub
The reference command for training the encnet_res50 model:
#train
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --checkname encnet_res50_pcontext
Here I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext --model encnet --jpu JPU --aux --se-loss --backbone resnet50 --checkname encnet_res50_pcontext
Training starts, but fails with RuntimeError: CUDA out of memory.
So I set training aside for now.
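If you do want to train on a single 8 GB card like the RTX 3070, the usual workaround is to shrink the batch size and crop size (both appear in the Namespace dump in 3.2.3; the flag spellings below are my assumption from those argument names) and to apply the single-GPU BN change from 4.3:
#train with a reduced memory footprint (sketch; flag names assumed from the Namespace output)
CUDA_VISIBLE_DEVICES=0 python -m experiments.segmentation.train --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --checkname encnet_res50_pcontext \
    --batch-size 4 --crop-size 384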
3 Testing the model
3.1 Downloading the models
Download the author's pretrained model files (see 4.2 for the download links the author provided via GitHub). The accompanying bash file contains the commands for training, prediction, and FPS measurement.
Store the downloaded files in /home/meng/.encoding/models/, the folder that --resume points to in the commands below.
3.2 Testing encnet_jpu_res50_pcontext.pth.tar
3.2.1 test [single-scale] (single scale: pixAcc=0.7898, mIoU=0.5105)
#reference command from GitHub
#test [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode testval
Here I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval
Pixel accuracy pixAcc=0.7898, mean intersection-over-union mIoU=0.5105; the run took about 10 minutes.
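For reference, pixAcc is the fraction of correctly labeled pixels, and mIoU averages the per-class intersection-over-union. A minimal NumPy sketch of the two metrics (my own illustration, not the repo's implementation):
import numpy as np

def pixacc_miou(pred, target, nclass):
    """pred, target: integer label maps of identical shape."""
    pixacc = (pred == target).mean()        # fraction of correctly labeled pixels
    ious = []
    for c in range(nclass):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                       # skip classes absent from both maps
            ious.append(inter / union)
    return pixacc, float(np.mean(ious))     # mIoU: mean IoU over the present classes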
3.2.2 test [multi-scale] (multi scale: pixAcc=0.7964, mIoU=0.5210)
#reference command from GitHub
#test [multi-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode testval --ms
Here I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval --ms
This run took 1 hour 19 minutes: pixel accuracy pixAcc=0.7964, mean IoU=0.5210.
test [multi-scale] differs from test [single-scale] only by the extra --ms option. In test.py, --ms first switches the list of evaluation scales;
the scales value is then passed into base.py,
where the following computation runs for every scale:
for scale in self.scales:
    long_size = int(math.ceil(self.base_size * scale))  # math.ceil(): smallest integer >= the float
    if h > w:
        height = long_size
        width = int(1.0 * w * long_size / h + 0.5)  # new height/width preserve the original h:w aspect ratio
        short_size = width
    else:
        width = long_size
        height = int(1.0 * h * long_size / w + 0.5)
        short_size = height
    # resize image to current size
    cur_img = resize_image(image, height, width, **self.module._up_kwargs)
    if long_size <= crop_size:  # the if/else ensures both sides of pad_img are at least crop_size
        pad_img = pad_image(cur_img, self.module.mean,
                            self.module.std, crop_size)
        outputs = module_inference(self.module, pad_img, self.flip)
        outputs = crop_image(outputs, 0, height, 0, width)
    else:
        if short_size < crop_size:
            # pad if needed
            pad_img = pad_image(cur_img, self.module.mean,
                                self.module.std, crop_size)
        else:
            pad_img = cur_img
        _, _, ph, pw = pad_img.size()
        assert(ph >= height and pw >= width)
        # grid forward and normalize
        h_grids = int(math.ceil(1.0 * (ph-crop_size)/stride)) + 1
        w_grids = int(math.ceil(1.0 * (pw-crop_size)/stride)) + 1
        with torch.cuda.device_of(image):
            outputs = image.new().resize_(batch,self.nclass,ph,pw).zero_().cuda()
            count_norm = image.new().resize_(batch,1,ph,pw).zero_().cuda()
        # grid evaluation
        for idh in range(h_grids):
            for idw in range(w_grids):
                h0 = idh * stride
                w0 = idw * stride
                h1 = min(h0 + crop_size, ph)
                w1 = min(w0 + crop_size, pw)
                crop_img = crop_image(pad_img, h0, h1, w0, w1)
                # pad if needed
                pad_crop_img = pad_image(crop_img, self.module.mean,
                                         self.module.std, crop_size)
                output = module_inference(self.module, pad_crop_img, self.flip)
                outputs[:,:,h0:h1,w0:w1] += crop_image(output,
                                                       0, h1-h0, 0, w1-w0)
                count_norm[:,:,h0:h1,w0:w1] += 1
        assert((count_norm==0).sum()==0)
        outputs = outputs / count_norm
        outputs = outputs[:,:,:height,:width]
    score = resize_image(outputs, h, w, **self.module._up_kwargs)
    scores += score
return scores
Note that scores is defined earlier in base.py:
with torch.cuda.device_of(image):
    scores = image.new().resize_(batch,self.nclass,h,w).zero_().cuda()
So the MultiEvalModule invoked in test.py resizes each image to multiple scales, runs inference at each scale, and accumulates the scores for evaluation.
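Conceptually, multi-scale testing just averages per-pixel class scores over several resized copies of the input. A stripped-down sketch of that idea (my own illustration, omitting the sliding-window cropping shown above):
import torch
import torch.nn.functional as F

def multiscale_scores(model, image, nclass, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """image: (B, 3, H, W) tensor; returns (B, nclass, H, W) averaged scores."""
    b, _, h, w = image.shape
    scores = image.new_zeros(b, nclass, h, w)
    for s in scales:
        # resize the input to the current scale, run the model, resize scores back
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=True)
        out = model(resized)
        scores += F.interpolate(out, size=(h, w), mode='bilinear',
                                align_corners=True)
    return scores / len(scales)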
A Zhihu answer on how to understand multi-scale versus single-scale is linked in the references at the end.
3.2.3 predict [single-scale] (single scale)
#reference command from GitHub
#predict [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu [JPU|JPU_X] --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode test
Here I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
    --model encnet --jpu JPU --aux --se-loss \
    --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test
The output:
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
> --model encnet --jpu JPU --aux --se-loss \
> --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test
Namespace(aux=True, aux_weight=0.2, backbone='resnet50', base_size=520, batch_size=16, checkname='default', crop_size=480, cuda=True, dataset='pcontext', dilated=False, epochs=80, ft=False, jpu='JPU', lateral=False, lr=0.001, lr_scheduler='poly', mode='test', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume='/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar', save_folder='experiments/segmentation/results', se_loss=True, se_weight=0.2, seed=1, split='val', start_epoch=0, test_batch_size=16, train_split='train', weight_decay=0.0001, workers=16)
loading annotations into memory...
JSON root keys:dict_keys(['info', 'images', 'annos_segmentation', 'annos_occlusion', 'annos_boundary', 'categories', 'parts'])
Done (t=3.22s)
creating index...
index created! (t=2.42s)
mask_file: /home/meng/.encoding/data/VOCdevkit/VOC2010/val.pth
=> loaded checkpoint '/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar' (epoch 79)
Check the save_folder printed above to find the predicted masks and compare them with the originals (the source images are in /home/meng/.encoding/data/VOCdevkit/VOC2010/JPEGImages).
As an example, compare the image 2008_000064.
Its annotation file, 2008_000064.xml:
<annotation>
    <folder>VOC2010</folder>
    <filename>2008_000064.jpg</filename>
    <source>
        <database>The VOC2008 Database</database>
        <annotation>PASCAL VOC2008</annotation>
        <image>flickr</image>
    </source>
    <size>
        <width>375</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>aeroplane</name>
        <pose>Frontal</pose>
        <truncated>1</truncated>
        <occluded>0</occluded>
        <bndbox>
            <xmin>1</xmin>
            <ymin>152</ymin>
            <xmax>375</xmax>
            <ymax>461</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
</annotation>
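To pull the class and bounding box out of such an annotation programmatically, a small standard-library sketch (my own addition):
import xml.etree.ElementTree as ET

root = ET.parse("2008_000064.xml").getroot()
print(root.findtext("filename"))                      # 2008_000064.jpg
for obj in root.findall("object"):
    name = obj.findtext("name")                       # e.g. aeroplane
    box = [int(obj.find("bndbox").findtext(k))
           for k in ("xmin", "ymin", "xmax", "ymax")]
    print(name, box)                                  # aeroplane [1, 152, 375, 461]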
4 Errors and fixes
4.1 detail-api compilation error
error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually
The first time I ran prepare_pcontext.py, compiling detail-api failed with the log below. Following the steps in 1.1 and 1.2 fixed it (these errors are the classic symptom of C code generated by an older Cython being compiled against Python 3.7, whose thread-state exception fields changed; with cython installed, rebuilding regenerates the file).
gcc -pthread -B /home/meng/anaconda3/envs/yolov5py37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include -I../common -I/home/meng/anaconda3/envs/yolov5py37/include/python3.7m -c detail/_mask.c -o build/temp.linux-x86_64-3.7/detail/_mask.o
In file included from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1969:0,
                 from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from detail/_mask.c:461:
/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
 ^~~~~~~
detail/_mask.c: In function ‘__Pyx_PyCFunction_FastCall’:
detail/_mask.c:12772:13: error: too many arguments to function ‘(PyObject * (*)(PyObject *, PyObject * const*, Py_ssize_t))meth’
     return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
            ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
detail/_mask.c: In function ‘__Pyx__ExceptionSave’:
detail/_mask.c:14254:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     *type = tstate->exc_type;
                     ^~~~~~~~
                     curexc_type
detail/_mask.c:14255:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     *value = tstate->exc_value;
                      ^~~~~~~~~
                      curexc_value
detail/_mask.c:14256:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     *tb = tstate->exc_traceback;
                   ^~~~~~~~~~~~~
                   curexc_traceback
detail/_mask.c: In function ‘__Pyx__ExceptionReset’:
detail/_mask.c:14263:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tmp_type = tstate->exc_type;
                        ^~~~~~~~
                        curexc_type
detail/_mask.c:14264:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tmp_value = tstate->exc_value;
                         ^~~~~~~~~
                         curexc_value
detail/_mask.c:14265:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tmp_tb = tstate->exc_traceback;
                      ^~~~~~~~~~~~~
                      curexc_traceback
detail/_mask.c:14266:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tstate->exc_type = type;
             ^~~~~~~~
             curexc_type
detail/_mask.c:14267:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tstate->exc_value = value;
             ^~~~~~~~~
             curexc_value
detail/_mask.c:14268:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tstate->exc_traceback = tb;
             ^~~~~~~~~~~~~
             curexc_traceback
detail/_mask.c: In function ‘__Pyx__GetException’:
detail/_mask.c:14323:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tmp_type = tstate->exc_type;
                        ^~~~~~~~
                        curexc_type
detail/_mask.c:14324:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tmp_value = tstate->exc_value;
                         ^~~~~~~~~
                         curexc_value
detail/_mask.c:14325:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tmp_tb = tstate->exc_traceback;
                      ^~~~~~~~~~~~~
                      curexc_traceback
detail/_mask.c:14326:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     tstate->exc_type = local_type;
             ^~~~~~~~
             curexc_type
detail/_mask.c:14327:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     tstate->exc_value = local_value;
             ^~~~~~~~~
             curexc_value
detail/_mask.c:14328:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     tstate->exc_traceback = local_tb;
             ^~~~~~~~~~~~~
             curexc_traceback
error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually
4.2 Missing model files
Error: RuntimeError: Failed downloading url .zip
Opening the link from the error message: https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip
It was unreachable through every network route I tried; presumably the author removed the model files.
I asked on GitHub, and the author supplied download links for the three models (see issue #108 in the references).
Place the downloaded files in /home/meng/.encoding/models/ (the same folder that --resume points to in section 3).
4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'
Cause, from the author's reply:
The reason is that you're not using multiple GPUs. Change SynBN to regular BN if you want to train on one GPU.
That is, training was not using multiple GPUs; to train on a single GPU, change SyncBN to regular BN. Three edits are needed (a sketch of the idea follows this list):
(1) Modify line 54 of /FastFCN/experiments/segmentation/train.py
(2) Modify line 111 of /FastFCN/experiments/segmentation/train.py
(3) Remove line 132 of /FastFCN/experiments/segmentation/train.py
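The original post showed these three edits as screenshots. As a rough sketch of the idea (the identifiers below are my assumptions, not FastFCN's exact code; check the actual lines 54, 111, and 132 in your copy of train.py):
# Hypothetical single-GPU sketch: identifiers are assumptions, not FastFCN's exact code.
import torch.nn as nn

# (1), (2): wherever train.py passes a synchronized norm layer into the model,
# e.g. norm_layer=SyncBatchNorm, substitute the regular single-GPU batch norm:
norm_layer = nn.BatchNorm2d

# (3): delete the line that patches DataParallel for SyncBN master/slave
# communication; that callback machinery is what raises the
# "'NoneType' object has no attribute 'run_slave'" error on a single GPU.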
References
A blogger's roundup of some official pretrained PyTorch ResNet models:
My questions on GitHub:
RuntimeError: Failed downloading url .zip · Issue #108 · wuhuikai/FastFCN · GitHub
Switching from multi-GPU to single-GPU: how to Change SynBN to regular BN? · Issue #12 · wuhuikai/FastFCN · GitHub
Pascal VOC dataset analysis:
Detailed analysis of the Pascal VOC dataset (持久决心's CSDN blog)
Zhihu, on understanding multi-scale vs. single-scale:
How to understand multi scale and single scale in deep learning? (Zhihu)