Breadth-first crawling gives priority to finding new domains and spreading out as far as possible, as opposed to working through a single domain in a depth-first manner.
Writing a breadth-first crawler will be left as an exercise for the reader based on the information provided in this chapter. It is not very different from the depth-first crawler in the previous section, except that it should prioritize URLs that point to domains that have not been seen before.
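As a starting point for that exercise, here is a minimal sketch of the prioritization idea in Python. It is one possible approach, not the only one: the class name `DomainFirstFrontier` and the `example` URLs are hypothetical, and no pages are actually fetched. The first URL discovered on each host gets top priority, and every later URL on an already-known host waits behind it.

```python
import heapq
from itertools import count
from urllib.parse import urlsplit


class DomainFirstFrontier:
    """URL queue that serves the first URL seen on each new host before others."""

    def __init__(self):
        self._heap = []          # entries: (priority, insertion order, url)
        self._order = count()    # tie-breaker keeps FIFO order within a priority
        self._seen_hosts = set()
        self._seen_urls = set()

    def push(self, url):
        host = urlsplit(url).hostname
        if host is None or url in self._seen_urls:
            return  # skip links without a host, and duplicates
        self._seen_urls.add(url)
        # Priority 0 for the first URL on a host, 1 for everything after it.
        priority = 0 if host not in self._seen_hosts else 1
        self._seen_hosts.add(host)
        heapq.heappush(self._heap, (priority, next(self._order), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


frontier = DomainFirstFrontier()
for link in [
    "https://a.example/",
    "https://a.example/about",
    "https://b.example/",
    "https://a.example/contact",
    "https://c.example/",
]:
    frontier.push(link)

# New hosts come out first, then the remaining pages in discovery order.
while frontier:
    print(frontier.pop())
```

In a real crawler, the loop at the bottom would fetch each popped URL and push the links found on that page back into the frontier.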
There are a couple of things to keep in mind. If you don't set a maximum limit, you could end up crawling petabytes of data! You might also choose to ignore subdomains; otherwise, you can enter a site that has infinite subdomains ...
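Both cautions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete solution: `MAX_PAGES`, `site_key`, and `select_urls` are hypothetical names, and collapsing a host to its last two labels is a deliberately naive way to ignore subdomains. It gives the wrong answer for suffixes such as `.co.uk` and for IP addresses, where a public suffix list would be the more careful choice.

```python
from urllib.parse import urlsplit

MAX_PAGES = 3  # hypothetical cap; pick a value that suits your crawl


def site_key(url):
    """Collapse subdomains by keeping only the last two host labels.

    Naive assumption: wrong for suffixes such as .co.uk and for IP addresses.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def select_urls(urls, max_pages=MAX_PAGES):
    """Return at most max_pages URLs, one per site key, in input order."""
    chosen, seen_sites = [], set()
    for url in urls:
        if len(chosen) >= max_pages:
            break  # hard stop so the crawl cannot grow without bound
        key = site_key(url)
        if not key or key in seen_sites:
            continue  # same site under another subdomain, or no host at all
        seen_sites.add(key)
        chosen.append(url)
    return chosen


candidates = [
    "https://a.example.com/",
    "https://b.example.com/",   # skipped: same site as the URL above
    "https://example.org/",
    "https://example.net/",
    "https://example.edu/",     # never reached: the cap is hit first
]
print(select_urls(candidates))
```

With the cap set to 3, only the first URL from each of the first three sites is kept, no matter how many subdomains a single site generates.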