7.10. Extracting the Host from a URL
Problem
You want to extract the host from a string that holds a URL.
For example, you want to extract www.regexcookbook.com
from http://www.regexcookbook.com/
.
Solution
Extract the host from a URL known to be valid
\A [a-z][a-z0-9+\-.]*:// # Scheme ([a-z0-9\-._~%!$&'()*+,;=]+@)? # User ([a-z0-9\-._~%]+ # Named or IPv4 host |\[[a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPv6+ host
Regex options: Free-spacing, case insensitive |
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|↵ \[[a-z0-9\-._~%!$&'()*+,;=:]+\])
Regex options: Case insensitive |
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Extract the host while validating the URL
\A [a-z][a-z0-9+\-.]*:// # Scheme ([a-z0-9\-._~%!$&'()*+,;=]+@)? # User ([a-z0-9\-._~%]+ # Named host |\[[a-f0-9:.]+\] # IPv6 host |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPvFuture host (:[0-9]+)? # Port (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Path (\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)? # Query (\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)? # Fragment \Z
Regex options: Case insensitive |
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|↵ \[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])(:[0-9]+)?↵ (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?↵ (#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?$
Regex options: Case insensitive |
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, ... |
Get Regular Expressions Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.