8.23. Extract the Filename from a Windows Path
Problem
You have a string that holds a (syntactically) valid path
to a file or folder on a Windows PC or network, and you want to extract
the filename, if any, from the path. For example, you want to extract
file.ext from c:\folder\file.ext.
Solution
[^\\/:*?"<>|\r\n]+$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
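As a minimal sketch of how this regex might be applied, the following Python snippet wraps it in a helper. The names FILENAME_RE and extract_filename are hypothetical, chosen for illustration, and the second sample path is made up. The case-insensitive flag mirrors the option listed above; it has no effect here because the pattern contains no letters.

```python
import re

# The recipe's regex. The raw string passes the backslashes through to the
# re module unchanged. FILENAME_RE is an illustrative name, not from the recipe.
FILENAME_RE = re.compile(r'[^\\/:*?"<>|\r\n]+$', re.IGNORECASE)

def extract_filename(path):
    """Return the trailing filename in `path`, or None if there is none."""
    match = FILENAME_RE.search(path)
    return match.group(0) if match else None

print(extract_filename(r'c:\folder\file.ext'))                # file.ext
print(extract_filename(r'\\server\share\folder\report.txt'))  # report.txt
```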
Discussion
Extracting the filename from a string known to hold a valid path is trivial, even if you don’t know whether the path actually ends with a filename.
The filename always occurs at the end of the string. It can’t contain any colons or backslashes, so it cannot be confused with folders, drive letters, or network shares, which all use backslashes and/or colons.
The anchor ‹$› matches at the end of the string (Recipe 2.5). The fact
that the dollar also matches at embedded line breaks in Ruby doesn’t
matter, because valid Windows paths don’t include line breaks. The
negated character class ‹[^\\/:*?"<>|\r\n]+› (Recipe 2.3) matches the
characters that can occur in filenames. Though the regex engine scans
the string from left to right, the anchor at the end of the regex makes
sure that only the last run of filename characters in the string will be
matched, giving us our filename.
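The effect of the anchor can be illustrated by comparing the regex with and without it. This is a sketch in Python; the variable names are chosen for illustration only.

```python
import re

path = r'c:\folder\file.ext'

# Without the anchor, every run of filename characters is a candidate match.
runs = re.findall(r'[^\\/:*?"<>|\r\n]+', path)
print(runs)  # ['c', 'folder', 'file.ext']

# With the anchor, only the run that reaches the end of the string can match.
last = re.search(r'[^\\/:*?"<>|\r\n]+$', path)
print(last.group(0))  # file.ext
```

The unanchored pattern would return the drive letter as its first match. The anchored one rejects each earlier run because a colon or backslash follows it.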
If the string ends with a backslash, as it will for paths that don’t specify a filename, the regex won’t match at all. When it does match, it will match only the filename, so we don’t need to use any capturing ...
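The no-match case for paths ending in a backslash can be checked with a short Python sketch. The sample paths and names below are illustrative assumptions.

```python
import re

filename_re = re.compile(r'[^\\/:*?"<>|\r\n]+$')

# Ordinary (non-raw) string literals are used here because a Python raw
# string cannot end in a single backslash.
paths = ['c:\\folder\\', '\\\\server\\share\\', 'c:\\folder\\file.ext']

results = {}
for path in paths:
    match = filename_re.search(path)
    # The overall match is the filename itself, so group(0) is all we need.
    results[path] = match.group(0) if match else None
    print(repr(path), '->', results[path])
```

The first two paths yield None, and only the third yields a filename.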