Cover by Donald Bruce Stewart, Bryan O'Sullivan, John Goerzen

Safari, the world’s most comprehensive technology and business learning platform.

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required

O'Reilly logo

A Concurrent Web Link Checker

As a practical example of using STM, we will develop a program that checks an HTML file for broken links—that is, URLs that either point to bad web pages or dead servers. This is a good problem to address via concurrency: if we try to talk to a dead server, it will take up to two minutes before our connection attempt times out. If we use multiple threads, we can still get useful work done while one or two are stuck talking to slow or dead servers.

We can’t simply create one thread per URL, because that may overburden either our CPU or our network connection if (as we expect) most of the links are live and responsive. Instead, we use a fixed number of worker threads, which fetch URLs to download from a queue:

-- file: ch28/Check.hs {-# LANGUAGE FlexibleContexts, GeneralizedNewtypeDeriving, PatternGuards #-} import Control.Concurrent (forkIO) import Control.Concurrent.STM import Control.Exception (catch, finally) import Control.Monad.Error import Control.Monad.State import Data.Char (isControl) import Data.List (nub) import Network.URI import Prelude hiding (catch) import System.Console.GetOpt import System.Environment (getArgs) import System.Exit (ExitCode(..), exitWith) import System.IO (hFlush, hPutStrLn, stderr, stdout) import Text.Printf (printf) import qualified Data.ByteString.Lazy.Char8 as B import qualified Data.Set as S -- This requires the HTTP package, which is not bundled with GHC import Network.HTTP type URL = B.ByteString data Task = Check ...

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required