7.11. Extracting the Port from a URL

Problem

You want to extract the port number from a string that holds a URL. For example, you want to extract 80 from http://www.regexcookbook.com:80/.

Solution

Extract the port from a URL known to be valid

\A
[a-z][a-z0-9+\-.]*://               # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?      # User
([a-z0-9\-._~%]+                    # Named or IPv4 host
|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPv6+ host
:(?<port>[0-9]+)                    # Port number
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?↵
([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\]):([0-9]+)
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
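
As an illustration, the free-spacing regex above could be applied from Python roughly as follows. This is a minimal sketch, not part of the recipe itself: the helper name extract_port is hypothetical, and the named group is written (?P<port>...) because Python's re module uses that spelling instead of (?<port>...).

```python
import re

# The "known to be valid" regex in free-spacing form. The only change is
# the named group, written with Python's (?P<name>...) syntax.
PORT_RE = re.compile(r"""
    \A
    [a-z][a-z0-9+\-.]*://               # Scheme
    ([a-z0-9\-._~%!$&'()*+,;=]+@)?      # User
    ([a-z0-9\-._~%]+                    # Named or IPv4 host
    |\[[a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPv6+ host
    :(?P<port>[0-9]+)                   # Port number
""", re.VERBOSE | re.IGNORECASE)


def extract_port(url):
    """Return the explicit port as an int, or None if the regex finds none.

    Hypothetical helper for illustration; it assumes the URL is already
    known to be valid, as the regex itself does.
    """
    match = PORT_RE.match(url)
    return int(match.group("port")) if match else None


print(extract_port("http://www.regexcookbook.com:80/"))  # 80
print(extract_port("http://[::1]:8080/"))                # 8080
print(extract_port("http://www.regexcookbook.com/"))     # None
```

A URL without an explicit port yields None here because the regex requires the colon and at least one digit after the host.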

Extract the port while validating the URL

\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
:([0-9]+)                                   # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Query
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Fragment
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[a-z][a-z0-9+\-.]*:\/\/([a-z0-9\-._~%!$&'()*+,;=]+@)?↵
([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]↵
+\]):([0-9]+)(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)*\/?↵
(\?[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?$
Regex options: Case insensitive
Regex flavors: ...
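
The validating variant could be used from Python along these lines. This is again a sketch under stated assumptions: the function name validated_port is hypothetical, and the port is read from the third numbered group because the user and host groups come before it.

```python
import re

# Free-spacing form of the validating regex. Groups: 1 = user, 2 = host,
# 3 = port, 4 = last path segment, 5 = query, 6 = fragment.
VALID_URL_WITH_PORT = re.compile(r"""
    \A
    [a-z][a-z0-9+\-.]*://                       # Scheme
    ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
    ([a-z0-9\-._~%]+                            # Named host
    |\[[a-f0-9:.]+\]                            # IPv6 host
    |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
    :([0-9]+)                                   # Port
    (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
    (\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Query
    (\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Fragment
    \Z
""", re.VERBOSE | re.IGNORECASE)


def validated_port(url):
    """Return the port as an int only if the whole URL matches the regex.

    Hypothetical helper for illustration. Returns None for URLs that are
    rejected or that carry no explicit port.
    """
    match = VALID_URL_WITH_PORT.match(url)
    return int(match.group(3)) if match else None


print(validated_port("http://www.regexcookbook.com:80/"))              # 80
print(validated_port("http://[::1]:8080/index.html?x=1#top"))          # 8080
print(validated_port("http://www.regexcookbook.com:80/not valid"))     # None
```

Unlike the first regex, this one rejects the whole string when anything after the port is malformed, so a None result can mean either "no port" or "not a valid URL".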
