We use the VGG-based Fully Convolution Network approach for per-pixel scene segmentation. The model supports 21 classes. The model is quite time consuming and inference might take up a significant amount of CPU time, so be patient.
The following result is expected: