As you can see, the vectorized UDFs provide ~100x performance improvements! Don't get too excited, as such speedups are only expected for more complex queries, such as the one we used previously.


Both measurements seem to be the same. This is an error in the book.