Cover by Leonard Richardson, Lucas Carlson

1.19. Validating an Email Address

Problem

You need to see whether an email address is valid.

Solution

Here's a sampling of valid email addresses you might encounter:

	test_addresses = [ #The following are valid addresses according to RFC822.
	                   'joe@example.com', 'joe.bloggs@mail.example.com',
	                   'joe+ruby-mail@example.com', 'joe(and-mary)@example.museum',
	                   'joe@localhost',

Here are some invalid email addresses you might encounter:

	                   # Complete the list with some invalid addresses
	                   'joe', 'joe@', '@example.com',
	                   'joe@example@example.com',
	                   'joe and mary@example.com' ]

And here are some regular expressions that do an okay job of filtering out bad email addresses. The first one does very basic checking for ill-formed addresses:

	valid = '[^ @]+' # Exclude characters always invalid in email addresses
	# \A and \z anchor to the whole string; ^ and $ would only anchor to a line.
	username_and_machine = /\A#{valid}@#{valid}\z/

	test_addresses.collect { |i| i =~ username_and_machine }
	# => [0, 0, 0, 0, 0, nil, nil, nil, nil, nil]

The second one prohibits the use of local-network addresses like "joe@localhost". Most applications should prohibit such addresses.

	# \A and \z anchor to the whole string; ^ and $ would only anchor to a line.
	username_and_machine_with_tld = /\A#{valid}@#{valid}\.#{valid}\z/

	test_addresses.collect { |i| i =~ username_and_machine_with_tld }
	# => [0, 0, 0, 0, nil, nil, nil, nil, nil, nil]

However, the odds are good that you're solving the wrong problem.

Discussion

Most email address validation is done with naive regular expressions like the ones given above. Unfortunately, these regular expressions are usually written too strictly, and reject many valid email addresses. This is a common source of frustration for people with unusual email addresses like joe(and-mary)@example.museum, or people taking advantage of special features of email, as in joe+ruby-mail@example.com. The regular expressions given above err on the opposite side: they'll accept some syntactically invalid email addresses, but they won't reject valid addresses.
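
To make the difference concrete, here is a minimal sketch that runs three of the valid test addresses through a deliberately over-strict pattern and through the permissive pattern from the Solution. The `too_strict` pattern is a hypothetical one invented for illustration, not taken from any particular library:

```ruby
# Hypothetical over-strict pattern (an assumption for illustration): only
# letters, digits, and dots in the username, and a 2-4 letter top-level domain.
too_strict = /\A[a-z\d.]+@[a-z\d.-]+\.[a-z]{2,4}\z/i

# The permissive pattern from the Solution, anchored to the whole string.
permissive = /\A[^ @]+@[^ @]+\z/

valid_addresses = ['joe@example.com',
                   'joe+ruby-mail@example.com',
                   'joe(and-mary)@example.museum']

strict_results     = valid_addresses.collect { |a| too_strict.match?(a) }
permissive_results = valid_addresses.collect { |a| permissive.match?(a) }

strict_results      # => [true, false, false]
permissive_results  # => [true, true, true]
```

The strict pattern turns away two addresses that are perfectly legal, which is exactly the failure mode described above.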

Why not give a simple regular expression that always works? Because there's no such thing. The definition of the syntax is anything but simple. Perl hacker Paul Warren defined a 6343-character regular expression for Perl's Mail::RFC822::Address module, and even it needs some preprocessing to accept absolutely every allowable email address. Warren's regular expression will work unaltered in Ruby, but if you really want it, you should go online and find it, because it would be foolish to try to type it in.

Check validity, not correctness

Even given a regular expression or other tool that infallibly separates the RFC822-compliant email addresses from the others, you can't check the validity of an email address just by looking at it; you can only check its syntactic correctness.

It's easy to mistype your username or domain name, giving out a perfectly valid email address that belongs to someone else. It's trivial for a malicious user to make up a valid email address that doesn't work at all—I did it earlier with the nonsense. !@ is a valid email address according to the regexp test, but no one in this universe uses it. You can't even compare the top-level domain of an address against a static list, because new top-level domains are always being added. Syntactic validation of email addresses is an enormous amount of work that only solves a small portion of the problem.

The only way to be certain that an email address is valid is to successfully send email to it. The only way to be certain that an email address is the right one is to send email to it and get the recipient to respond. You need to weigh this additional work (yours and the user's) against the real value of a verified email address.
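
The send-and-respond round trip can be sketched roughly as follows. This is only an outline under stated assumptions: the `issue_token`, `confirm_token`, and `verification_url` helpers, the in-memory hash, and the `https://www.example.com/verify` URL are all hypothetical, and actually delivering the message is left out.

```ruby
require 'securerandom'
require 'uri'

# Remember an unconfirmed address under a random one-time token.
# `pending` is a plain Hash here; a real application would use its database.
def issue_token(pending, email)
  token = SecureRandom.hex(16)   # 32 hexadecimal characters
  pending[token] = email
  token
end

# Return the address the token was issued for, or nil if the token is
# unknown or has already been used.
def confirm_token(pending, token)
  pending.delete(token)
end

# Build the link the user is asked to click (the base URL is an assumption).
def verification_url(token, base = 'https://www.example.com/verify')
  "#{base}?#{URI.encode_www_form('token' => token)}"
end

pending = {}
token = issue_token(pending, 'joe@example.com')
url = verification_url(token)

# Mail `url` to joe@example.com here. When the user visits the link:
first_visit  = confirm_token(pending, token)   # => "joe@example.com"
second_visit = confirm_token(pending, token)   # => nil (tokens are single-use)
```

Only someone who can read mail sent to the address ever sees the token, so a successful confirmation shows both that the address works and that the user controls it.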

It used to be that a user's email address was closely associated with their online identity: most people had only the email address their ISP gave them. Thanks to today's free web-based email, that's no longer true. Email verification no longer works to prevent duplicate accounts or to stop antisocial behavior online—if it ever did.

This is not to say that it's never useful to have a user's working email address, or that there's no problem if people mistype their email addresses. To improve the quality of the addresses your users enter, without rejecting valid addresses, you can do three things beyond verifying with the permissive regular expressions given above:

  1. Use a second naive regular expression, more restrictive than the ones given above, but don't prohibit addresses that don't match. Only use the second regular expression to advise the user that they may have mistyped their email address. This is not as useful as it seems, because most typos involve changing one letter for another, rather than introducing nonalphanumerics where they don't belong.

    	def probably_valid?(email)
    	 valid = '[A-Za-z\d.+-]+' #Commonly encountered email address characters
    	 # Anchor both ends so that trailing junk can't ride along with a match.
    	 (email =~ /\A#{valid}@#{valid}\.#{valid}\z/) == 0
    	end
    
    	#These give the correct result.
    	probably_valid? 'joe@example.com'                # => true
    	probably_valid? 'joe+ruby-mail@example.com'      # => true
    	probably_valid? 'joe.bloggs@mail.example.com'    # => true
    	probably_valid? 'joe@examplecom'                 # => false
    	probably_valid? 'joe@localhost'                  # => false
    
    	# This address is valid, but probably_valid? thinks it's not.
    	probably_valid? 'joe(and-mary)@example.museum'   # => false
    
    	# This address is valid, but certainly wrong.
    	probably_valid? 'joe@example.cpm'                # => true

  2. Extract from the alleged email address the hostname (the "example.com" of joe@example.com), and do a DNS lookup to see if that hostname accepts email. A hostname that has an MX DNS record is set up to receive mail. The following code will catch most domain name misspellings, but it won't catch any username misspellings. It's also not guaranteed to parse the hostname correctly, again because of the complexity of RFC822.

    	require 'resolv'
    	def valid_email_host?(email)
    	  # Take whatever follows the last "@"; with no "@" there is no host to check.
    	  hostname = email[/@([^@]*)\z/, 1]
    	  return false if hostname.nil? || hostname.empty?
    	  begin
    	    Resolv::DNS.new.getresource(hostname, Resolv::DNS::Resource::IN::MX)
    	    true
    	  rescue Resolv::ResolvError
    	    false
    	  end
    	end
    
    	#example.com is a real domain, but it won't accept mail
    	valid_email_host?('joe@example.com')         # => false
    
    	#lcqkxjvoem.mil is not a real domain.
    	valid_email_host?('joe@lcqkxjvoem.mil')      # => false
    
    	#oreilly.com exists and accepts mail, though there might not be a 'joe' there.
    	valid_email_host?('joe@oreilly.com')         # => true

  3. Send email to the address the user input, and ask the user to verify receipt. For instance, the email might contain a verification URL for the user to click on. This is the only way to guarantee that the user entered a valid email address that they control. See Recipes 14.5 and 15.19 for this.

    This is overkill much of the time. It requires that you add special workflow to your application, it significantly raises the barriers to use of your application, and it won't always work. Some users have spam filters that will treat your test mail as junk, or whitelist email systems that reject all email from unknown sources. Unless you really need a user's working email address for your application to work, very simple email validation should suffice.
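
As a rough sketch of how the first suggestion might sit alongside the permissive check, the hypothetical `check_email` method below rejects only addresses that fail the permissive pattern, and merely warns about ones that fail the more restrictive pattern. The method name and the `:ok`, `:warn`, and `:reject` symbols are assumptions made for illustration:

```ruby
# Returns :reject for addresses that fail even the permissive pattern,
# :warn for unusual-looking addresses the user should double-check,
# and :ok for everything else.
def check_email(email)
  permissive = /\A[^ @]+@[^ @]+\z/
  common = '[A-Za-z\d.+-]+'   # commonly encountered address characters
  restrictive = /\A#{common}@#{common}\.#{common}\z/

  return :reject unless permissive.match?(email)
  return :warn unless restrictive.match?(email)
  :ok
end

check_email('joe@')                          # => :reject
check_email('joe(and-mary)@example.museum')  # => :warn
check_email('joe@localhost')                 # => :warn
check_email('joe@example.com')               # => :ok
```

An application using this would refuse only the `:reject` case, and for `:warn` would ask the user to confirm what they typed before accepting it.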

See Also
