Debugging code written for vector processors can be either complicated or very simple, depending on your toolset. The first thing to remember is to use a good IDE (integrated development environment). With this and the proper processor package, you can dump and examine the packed-data vector registers immediately to verify that the data is as expected.
I do not want to bog you down or lecture you, as I have done enough of that already, but here are some suggestions for developing assembly code:
Always write your functions in C first.
Vectorize them in C if possible.
Debug the C. Single step ...
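As a rough illustration of these suggestions, here is a minimal sketch in portable C. It assumes a simple element-wise add and a four-lane layout (as if for a register holding four 32-bit floats); the names add_ref, add_vec4, and add_matches_reference are hypothetical and chosen only for this example. The scalar version is the reference, the second version is the same logic reshaped into four-wide steps, and the last function compares the two so that any mismatch shows up before a line of assembly is written.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: the plain scalar C version. This is the reference. */
void add_ref(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Step 2: the same function "vectorized" in C. Each pass handles four
   elements, mirroring one packed operation on an assumed four-lane
   register. Leftover elements are handled one at a time. */
void add_vec4(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst && a && b);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = a[i + 0] + b[i + 0];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)          /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* Step 3: debug the C. Run both versions on the same input and compare.
   Returns 1 when every element matches, 0 otherwise. */
int add_matches_reference(void)
{
    const float a[7] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
    const float b[7] = { 10.0f, 20.0f, 30.0f, 40.0f, 50.0f, 60.0f, 70.0f };
    float want[7];
    float got[7];

    add_ref(want, a, b, 7);
    add_vec4(got, a, b, 7);

    for (size_t i = 0; i < 7; i++)
        if (want[i] != got[i])
            return 0;
    return 1;
}
```

The input length of seven is deliberate: it exercises one full four-wide pass plus a three-element tail. Keeping the reference function around also gives you something to compare against later, when the four-wide loop body is replaced by assembly.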