The split Operator

Another operator that uses regular expressions is split, which breaks up a string according to a pattern. This is useful for tab-separated, colon-separated, whitespace-separated, or anything-separated data.[219] Anywhere you can specify the separator with a regular expression (generally, it’s a simple regular expression), you can use split. It looks like this:

    @fields = split /separator/, $string;

The split operator[220] drags the pattern through a string and returns a list of fields (substrings) that were separated by the separators. Whenever the pattern matches, that’s the end of one field and the start of the next. So, anything that matches the pattern will never show up in the returned fields. Here’s a typical split pattern, splitting on colons:

    @fields = split /:/, "abc:def:g:h";  # gives ("abc", "def", "g", "h")

You could even have an empty field if there were two delimiters together:

    @fields = split /:/, "abc:def::g:h";  # gives ("abc", "def", "", "g", "h")
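As a small sketch of the empty-field case, the following splits a made-up colon-separated record and prints each field with its index. The record contents and the variable names are assumptions chosen for illustration, not data from the text.

```perl
use strict;
use warnings;

# Hypothetical record: two colons in a row leave the third field empty.
my $record = "fred:x::flintstone";
my @fields = split /:/, $record;

# Report how many fields came back, then show each one in quotes
# so the empty field is visible.
print scalar(@fields), " fields\n";
for my $i (0 .. $#fields) {
    print "field $i: '$fields[$i]'\n";
}
```

The empty string at index 2 keeps the later fields in their expected positions, which matters when each position has a fixed meaning.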

Here’s a rule that seems odd at first, but it rarely causes problems: leading empty fields are always returned, but trailing empty fields are discarded by default:[221]

    @fields = split /:/, ":::a:b:c:::";  # gives ("", "", "", "a", "b", "c")
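If the trailing empty fields matter, Perl's split accepts an optional third argument, a limit, and a negative limit keeps them. The sketch below compares the two behaviors on the same string; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = ":::a:b:c:::";

# Default behavior: trailing empty fields are dropped.
my @default = split /:/, $line;

# A negative limit asks for as many fields as possible,
# including the trailing empty ones.
my @keep_all = split /:/, $line, -1;

print scalar(@default),  "\n";   # 6
print scalar(@keep_all), "\n";   # 9
```

Eight colons separate nine fields, so the second call returns three empty strings on each side of "a", "b", "c".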

It’s common to split on whitespace using /\s+/ as the pattern. Under that pattern, all whitespace runs are equivalent to a single space:

    my $some_input = "This  is a \t        test.\n";
    my @args = split /\s+/, $some_input;  # ("This", "is", "a", "test.")
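The rule about leading empty fields also applies here: when the input starts with whitespace, /\s+/ matches at the very beginning and produces an empty first field. This sketch shows that case and one possible way to drop the empty field afterward; the input string and the grep filter are illustrative choices, not something the text prescribes.

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace and a trailing newline.
my $indented = "   alpha beta\n";

# The leading run of spaces is a separator, so the first field is empty.
# The trailing newline also matches, but the empty field it would create
# is a trailing one and is discarded.
my @words = split /\s+/, $indented;   # ("", "alpha", "beta")

# Keep only the fields that have a nonzero length.
my @nonempty = grep { length } @words;   # ("alpha", "beta")

print scalar(@words), "\n";      # 3
print scalar(@nonempty), "\n";   # 2
```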

The default for split is to break up ...
