The performance of a CNN network can be improved to a certain extent by adopting proper tuning and setup mechanisms such as: data pre-processing, batch normalization, optimal pre-initialization of weights; choosing the correct activation function; using techniques such as regularization to avoid overfitting; using an optimal optimization function; and training with plenty of (quality) data.
Beyond these training and architecture-related decisions, there are image-related nuances because of which the performance of visual models may be impacted. Even after controlling the aforementioned training and architectural factors, the conventional CNN-based image classifier does not work well ...