Even algorithms that may seem very simple might contain optimizations you wouldn't consider. Let's have a look at std::find(), for example. At a glance, it seems that the obvious implementation couldn't be optimized further. Here is a possible implementation of the std::find() algorithm:
template <typename It, typename Value>
auto find_slow(It first, It last, const Value& value) {
  for (auto it = first; it != last; ++it)
    if (*it == value)
      return it;
  return last;
}
However, looking through the libstdc++ implementation, when std::find() is used with a RandomAccessIterator (in other words, with iterators into std::vector, std::string, std::deque, and std::array), the libstdc++ implementers have unrolled the for-loop ...