Cover by Steven Levithan, Jan Goyvaerts

8.12. Extracting the Path from a URL

Problem

You want to extract the path from a string that holds a URL. For example, you want to extract /index.html from http://www.regexcookbook.com/index.html or from /index.html#fragment.

Solution

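Extract the path from a string known to hold a valid URL. The following finds a match for all URLs, even for URLs that have no path, in which case the path is captured as an empty string. The path is held by the third capturing group: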

\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
([a-z0-9\-._~%!$&'()*+,;=:@/]*)
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&'()*+,;=:@/]*)
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
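
As a minimal sketch of how the single-line regex above might be applied, the following Python snippet compiles it with case-insensitive matching and reads the path from the third capturing group. The names PATH_RE and extract_path are illustrative assumptions, not part of the recipe.

```python
import re

# Single-line regex from the solution above; the path is capturing group 3.
PATH_RE = re.compile(
    r"^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&'()*+,;=:@/]*)",
    re.IGNORECASE,
)


def extract_path(url):
    """Return the path of a URL assumed to be valid (may be an empty string)."""
    # Every part of the pattern is optional, so match() always succeeds here.
    return PATH_RE.match(url).group(3)


print(extract_path("http://www.regexcookbook.com/index.html"))  # /index.html
print(extract_path("/index.html#fragment"))                     # /index.html
print(repr(extract_path("http://www.regexcookbook.com")))       # ''
```

Because the scheme, authority, and path are all optional in this pattern, the sketch does not check for a failed match; it relies on the input already being a valid URL, as the recipe assumes.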

Extract the path from a string known to hold a valid URL. Only match URLs that actually have a path. As before, the path is held by the third capturing group:

\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)
# Query, fragment, or end of URL
([#?]|\Z)
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)([#?]|$)
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
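
The path-required regex above could be used along the following lines in Python. This is only a sketch: the names NONEMPTY_PATH_RE and extract_nonempty_path are hypothetical, and the pattern is split across adjacent string literals purely for readability.

```python
import re

# Single-line regex from the solution above; the path is capturing group 3.
NONEMPTY_PATH_RE = re.compile(
    r"^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?"                                  # scheme and authority
    r"(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)"  # path
    r"([#?]|$)",                                                            # query, fragment, or end
    re.IGNORECASE,
)


def extract_nonempty_path(url):
    """Return the path, or None when the regex finds no match."""
    match = NONEMPTY_PATH_RE.match(url)
    return match.group(3) if match else None


print(extract_nonempty_path("http://www.regexcookbook.com/index.html"))  # /index.html
print(extract_nonempty_path("/index.html#fragment"))                     # /index.html
print(extract_nonempty_path("http://www.regexcookbook.com/"))            # /
print(extract_nonempty_path("#fragment"))                                # None
```

One caveat worth checking in your own regex flavor: tracing this pattern by hand suggests that an absolute URL with no path, such as http://www.regexcookbook.com, can still match, because the authority part backtracks and gives up its last character, which is then captured as the path (m). Preventing that kind of backtracking is presumably what the atomic-grouping variant is for.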

Extract the path from a string known to hold a valid URL. Use atomic grouping to match only those URLs that actually have a path:

\A # Skip over scheme and authority, if ...
