Numeric Constants

Problem

You need a regular expression that matches a decimal integer without a leading zero, an octal integer with a leading zero, a hexadecimal integer prefixed with 0x, or a binary integer prefixed with 0b. The integer may have the suffix L to denote it is a long rather than an int.

The regular expression should have separate (named) capturing groups for decimal, octal, hexadecimal, and binary numbers without any prefix or suffix, so the procedural code that will use this regex can easily determine the base of the number and convert the text into an actual number. The suffix L should also have its own capturing group, so the type of the integer can be easily identified.

Solution

\b(?:(?<dec>[1-9][0-9]*)
   | (?<oct>0[0-7]*)
   | 0x(?<hex>[0-9A-F]+)
   | 0b(?<bin>[01]+)
  )(?<L>L)?\b
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9
\b(?:(?P<dec>[1-9][0-9]*)
   | (?P<oct>0[0-7]*)
   | 0x(?P<hex>[0-9A-F]+)
   | 0b(?P<bin>[01]+)
  )(?P<L>L)?\b
Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4, Perl 5.10, Python
\b(?:([1-9][0-9]*)|(0[0-7]*)|0x([0-9A-F]+)|0b([01]+))(L)?\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

This regular expression is essentially the combination of the solutions presented in Recipe 6.5 (decimal), Recipe 6.4 (octal), Recipe 6.2 (hexadecimal), and Recipe 6.3 (binary). The digit zero all by itself can be either a decimal or an octal number. ...

Get Regular Expressions Cookbook, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.