Chapter 4. Strings

In the simplest terms, a string in a programming language is a sequence of zero or more characters and usually represents some human language, whether written or spoken. You are probably more likely to use methods from the String class than from any other class in Ruby. Manipulating strings is one of the biggest chores a programmer has to manage. Fortunately, Ruby offers a lot of convenience in this department.

For more information on string methods, go to http://www.ruby-doc.org/core/classes/String.html. You can also use the command line to get information on a method. For example, to get information on the String instance method chop, type:

ri String#chop [or] ri String.chop

You can use # or . between the class and method names when looking up a method with ri. This, of course, assumes that you have the Ruby documentation package installed and that it is in the path (see “Installing Ruby,” in Chapter 1).

Creating Strings

You can create strings with the new method. For example, this line creates a new, empty string called title:

title = String.new # => ""

Now you have a new string, but it is only filled with virtual air. You can test a string to see if it is empty with empty?:

title.empty? # => true

You might want to test a string to see if it is empty before you process it, or to end processing when you run into an empty string. You can also test its length or size:

title.length [or] title.size # => 0

The length and size methods do the same thing: they both return an integer indicating how many characters a string holds.

The new method can take a string argument:

title = String.new( "Much Ado about Nothing" )

Now check title:

title.empty? # => false
title.length # => 22

There we go. Not quite so vacuous as before.

Another way to create a string is with Kernel’s String method:

title = String( "Much Ado about Nothing" )
puts title # => Much Ado about Nothing

But there is an even easier way. You don’t have to use the new or String methods to generate a new string. Just an assignment operator and a pair of double quotes will do fine:

sad_love_story = "Romeo and Juliet"

You can also use single quotes:

sad_love_story = 'Romeo and Juliet'

The difference between using double quotes versus single quotes is that double quotes interpret escaped characters and single quotes preserve them. I’ll show you what that means. Here’s what you get with double quotes (interprets \n as a newline):

lear = "King Lear\nA Tragedy\nby William Shakespeare"
puts lear # => King Lear
          #    A Tragedy
          #    by William Shakespeare

And here’s what you get with single quotes (preserves \n in context):

lear = 'King Lear\nA Tragedy\nby William Shakespeare'
puts lear # => King Lear\nA Tragedy\nby William Shakespeare

For a complete list of escape characters, see Table A-1 in Appendix A.

General Delimited Strings

Another way to create strings is with general delimited strings, which are all preceded by a % and then followed by a matched pair of delimiter characters, such as !, {, or [ (must be nonalphanumeric). The string is embedded between the delimiters. All of the following examples are delimited by different characters (you can even use quote characters):

comedy = %!As You Like It!
history = %[Henry V]
tragedy = %(Julius Caesar)

You can also use %Q, which is the equivalent of a double-quoted string; %q, which is equivalent to a single-quoted string; or %x for a back-quoted string (`) for command output.
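
To see %Q and %q side by side, here is a small sketch. The method name delimiter_demo is made up for this example. The only point is that %Q interprets \t as a single tab character, while %q keeps the backslash and the t as two separate characters, so the two strings differ in length by one.

```ruby
# delimiter_demo is a hypothetical name used only in this sketch.
def delimiter_demo
  interpreted = %Q(Act I\tScene I)  # like "Act I\tScene I": \t becomes a tab
  preserved   = %q(Act I\tScene I)  # like 'Act I\tScene I': backslash and t stay
  [interpreted.length, preserved.length]
end

p delimiter_demo # => [13, 14]
```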

Here Documents

A here document allows you to build strings from multiple lines on the fly, while preserving newlines. A here document is formed with a << and a delimiting character or string of your choice. I’ll save Shakespeare’s 29th sonnet as a here document, with 29 as the delimiter:

sonnet = <<29
When in disgrace with fortune and men's eyes
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself, and curse my fate,
Wishing me like to one more rich in hope,
Featured like him, like him with friends possessed,
Desiring this man's art, and that man's scope,
With what I most enjoy contented least;
Yet in these thoughts my self almost despising,
Haply I think on thee, and then my state,
Like to the lark at break of day arising
From sullen earth, sings hymns at heaven's gate;
For thy sweet love remembered such wealth brings
That then I scorn to change my state with kings.
29

This document is stored in the string sonnet, but you can also use a here document without assigning it to a variable. Wherever the line breaks, a record separator (such as \n) is inserted at that place. Now use:

puts sonnet

You’ll see for yourself how the lines break.

You can also “delimit the delimiter” for various effects:

sonnet = <<hamlet # same as double-quoted string
O my prophetic soul! My uncle!
hamlet

sonnet = <<"hamlet" # again as double-quoted string
O my prophetic soul! My uncle!
hamlet

sonnet = <<'ghost' # same as single-quoted string
Pity me not, but lend thy serious hearing
To what I shall unfold.
ghost

my_dir = <<`dir` # same as back ticks
ls -l
dir

ind = <<-hello # <<- lets the closing delimiter be indented
    Hello, Matz!
    hello
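
The <<- form is mostly useful inside indented code, where you would rather not push the closing delimiter back to the left margin. A minimal sketch, with a made-up method name (indented_greeting): note that <<- only relaxes the rule for the closing delimiter, and the leading spaces of the text itself are kept.

```ruby
# indented_greeting is a hypothetical name used only in this sketch.
def indented_greeting
  text = <<-hello
    Hello, Matz!
  hello
  text
end

p indented_greeting # => "    Hello, Matz!\n"
```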

Concatenating Strings

In Ruby, you can add on to an existing string with various concatenation techniques. With Ruby, you don’t have to jump through the hoops that you might if you were using a language with immutable strings.

Adjacent strings can be concatenated simply because they are next to each other:

"Hello," " " "Matz" "!" # => "Hello, Matz!"

You can also use the + method:

"Hello," + " " + "Matz" + "!" # => "Hello, Matz!"

You can even mix double and single quotes, as long as they are properly paired.

Another way to do this is with the << method. You can add a single string:

"Hello, " << "Matz!" # => "Hello, Matz!"

Or you can chain them together with multiple calls to <<:

"Hello," << " " << "Matz" << "!" # => "Hello, Matz!"

An alternative to << is the concat method (written without parentheses, as here, it does not chain the way << does):

"Hello, ".concat "Matz!"

Or you can do it this way:

h = "Hello, "
m = "Matz!"
h.concat(m)
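
One difference among these techniques is worth seeing in code: + builds a new string, while << and concat add on to the string you already have. The sketch below checks this with object_id. The method name concat_demo is invented for the example, and dup is used only to be sure the starting string can be modified.

```ruby
# concat_demo is a hypothetical name used only in this sketch.
def concat_demo
  greeting = "Hello, ".dup            # dup: hold a copy we are free to modify
  original_id = greeting.object_id

  sum = greeting + "Matz!"            # + returns a brand new string
  plus_made_new_object = (sum.object_id != original_id)

  greeting << "Matz"                  # << appends to the receiver itself
  greeting.concat("!")                # and so does concat
  same_object = (greeting.object_id == original_id)

  [sum, greeting, plus_made_new_object, same_object]
end

p concat_demo # => ["Hello, Matz!", "Hello, Matz!", true, true]
```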

You can make a string immutable with Object’s freeze method:

greet = "Hello, Matz!"
greet.freeze

# try to append something
greet.concat("!") # => TypeError: can't modify frozen string

# is the object frozen?
greet.frozen? # => true
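
If you need to add to a string that might be frozen, one option is to work on a copy, because dup returns an unfrozen duplicate. Here is a minimal sketch with a made-up helper, append_safely. It checks frozen? rather than rescuing the error, partly because the class of the exception raised for a frozen string has changed between Ruby versions.

```ruby
# append_safely is a hypothetical helper used only in this sketch.
def append_safely(str, suffix)
  target = str.frozen? ? str.dup : str   # dup gives back an unfrozen copy
  target << suffix
end

greet = "Hello, Matz".freeze
louder = append_safely(greet, "!")

p louder        # => "Hello, Matz!"
p greet         # => "Hello, Matz"
p greet.frozen? # => true
```

Be aware that when the string is not frozen, this helper modifies the string it was given.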

Accessing Strings

You can extract and manipulate segments of a string using the String method []. It’s an alias of the slice method: any place you use [], you can use slice, with the same arguments. slice! takes the same arguments but works in place: it deletes the selected segment from the string and returns it. To replace a segment in place, use []=.

We’ll access several strings in the examples that follow:

line = "A horse! a horse! my kingdom for a horse!"
cite = "Act V, Scene IV"
speaker = "King Richard III"

If you enter a string as the argument to [], it will return that string, if found:

speaker['King'] # => "King"

Otherwise, it will return nil—in other words, it’s trying to break the news to you: “I didn’t find the string you were looking for.” If you specify a Fixnum (integer) as an index, it returns the decimal character code for the character found at the index location:

line[7] # => 33

At the location 7, [] found the character 33 (!). If you add the chr method (from the Integer class), you’ll get the actual character:

line[7].chr # => "!"

You can use an offset and length (two Fixnums) to tell [] the index location where you want to start, and then how many characters you want to retrieve:

line[18, 23] # => "my kingdom for a horse!"

You started at index location 18, and then scooped up 23 characters from there, inclusive. You can capitalize the result with the capitalize method, if you want:

line[18, 23].capitalize # => "My kingdom for a horse!"

(More on capitalize and other similar methods later in the chapter.)

Enter a range to grab a range of characters. Two dots (..) means include the last character:

cite[0..4] # => "Act V"

Three dots (...) means exclude the last value:

cite[0...4] # => "Act "

You can also use regular expressions (see the end of the chapter), as shown here:

line[/horse!$/] # => "horse!"

The regular expression /horse!$/ asks, “Does the word horse, followed by ! come at the end of the line ($)?” If this is true, this call returns horse!; nil if not. Adding another argument, a Fixnum, returns that portion of the matched data, starting at 0 in this instance:

line[/^A horse/, 0] # => "A horse"

The index method returns the index location of a matching substring. So if you use index like this:

line.index("k") # => 21

21 refers to the index location where the letter k occurs in line.

See if you get what is going on in the following examples:

line[line.index("k")] # => 107
line[line.index("k")].chr # => "k"

If you figured out these statements, you are starting to catch on! It doesn’t take long, does it? If you didn’t understand what happened, here it is: when line.index("k") was called, it returned the value 21, which was fed as a numeric argument to []; this, in effect, called line[21].
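
Here is a short sketch that pulls the pieces together: index finds the starting point, and then [], slice, and a range all pick out the same segment. The method name slice_demo is invented for the example. One caveat worth knowing: from Ruby 1.9 on, a single integer index such as line[7] returns a one-character string rather than a character code, so this sketch sticks to forms that behave the same either way.

```ruby
# slice_demo is a hypothetical name used only in this sketch.
def slice_demo(line)
  start = line.index("my")               # where the phrase begins
  by_brackets = line[start, 10]          # offset and length
  by_slice    = line.slice(start, 10)    # the same call, spelled out
  by_range    = line[start..start + 9]   # inclusive range over the same span
  [start, by_brackets, by_slice == by_brackets, by_range == by_brackets]
end

p slice_demo("A horse! a horse! my kingdom for a horse!")
# => [18, "my kingdom", true, true]
```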

Comparing Strings

Sometimes you need to test two strings to see if they are the same or not. You can do that with the == method. For example, you might want to test a string before printing something:

print "What was the question again?" if question == ""

Also, here are two versions of the opening paragraph of Abraham Lincoln’s Gettysburg Address, one from the so-called Hay manuscript, the other from the Nicolay (see http://www.loc.gov/exhibits/gadd/gadrft.html):

hay = "Four score and seven years ago our fathers brought forth, upon this continent,
a new nation, conceived in Liberty, and dedicated to the proposition that all men are
created equal."

nicolay = "Four score and seven years ago our fathers brought forth, upon this
continent, a new nation, conceived in liberty, and dedicated to the proposition that
\"all men are created equal\""

The strings are only slightly different (for example, Liberty is capitalized in the Hay version). Let’s compare these strings:

hay == nicolay # => false

The result is false, because they must match exactly. (We’ll let the historians figure out how to match them up.) You could also apply the eql? method and get the same results, though eql? and == are slightly different:

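  • == returns true if the other object is a String with the same length and content. If the other object is not a String, == returns false, unless that object responds to to_str, in which case the object itself decides the comparison.

  • eql? returns true only if both objects are Strings that are equal in length and content, false otherwise.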

Here eql? returns false:

hay.eql? nicolay # => false
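
The difference between == and eql? only shows up when the other object is not a String. The sketch below invents a small class, Title, that is not a String but offers to_str and its own == method. Everything about Title is hypothetical and exists only to make the difference visible.

```ruby
# Title is a hypothetical class, invented for this example: it is not a
# String, but it responds to to_str and defines its own ==.
class Title
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end

  def ==(other)
    @text == other.to_str
  end
end

def equality_demo
  play = Title.new("Hamlet")
  [
    "Hamlet" == "Hamlet",       # two equal strings
    "Hamlet".eql?("Hamlet"),    # eql? agrees
    "Hamlet" == play,           # == lets the Title object decide
    "Hamlet".eql?(play)         # eql? insists on a real String
  ]
end

p equality_demo # => [true, true, true, false]
```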

Yet another way to compare strings is with the <=> method, commonly called the spaceship operator. It compares the character code values of the strings, returning −1 (less than), 0 (equals), or 1 (greater than), depending on the comparison, which is case-sensitive:

"a" <=> "a" # => 0
"a" <=> 97.chr # => 0
"a" <=> "b" # => -1
"a" <=> "`" # => 1

A case-insensitive comparison is possible with casecmp, which has the same possible results as <=> (−1, 0, 1) but doesn’t care about case:

"a" <=> "A" # => 1
"a".casecmp "A" # => 0
"ferlin husky".casecmp "Ferlin Husky" # => 0
"Ferlin Husky".casecmp "Lefty Frizzell" # => -1

Manipulating Strings

Here’s a fun one to get started with. The * method repeats a string by an integer factor:

"A horse! " * 2 # => "A horse! A horse! "

You can concatenate a string to the result:

taf = "That's ".downcase * 3 + "all folks!" # => "that's that's that's all folks!"
taf.capitalize # => "That's that's that's all folks!"

Inserting a String in a String

The insert method lets you insert another string at a given index in a string. For example, you can correct spelling:

"Be carful.".insert 6, "e" # => "Be careful."

or add a word (plus a space):

"Be careful!".insert 3, "very " # => "Be very careful!"

or even throw the * method in just to prove that you can:

"Be careful!".insert 3, "very " * 5 # => "Be very very very very very careful!"

Changing All or Part of a String

You can alter all or part of a string, in place, with the []= method. ([]= accepts the same kinds of arguments as [] and slice. It is not an alias of slice!, though: slice! deletes the selected portion of the string and returns it, whereas []= replaces that portion with a new value.)

Given the following strings (some scoundrel has been editing our Shakespeare text):

line = "A Porsche! a Porsche! my kingdom for a Porsche!"
cite = "Act V, Scene V"
speaker = "King Richard, 2007"

enter a string as the argument to []=, and the matching substring is replaced. The expression returns the replacement string; if the substring is not found, Ruby raises an IndexError.

speaker[", 2007"]= " III" # => " III"
p speaker # => "King Richard III"

That’s looking better.

If you specify a Fixnum (integer) as an index, it returns the corrected string you placed at the index location. (String lengths are automatically adjusted by Ruby if the replacement string is a different length than the original.)

cite[13]= "IV" # => "IV"
p cite # => "Act V, Scene IV"

At the index 13, []= found the substring V and replaced it with IV.

You can use an offset and length (two Fixnums) to tell []= the index of the substring where you want to start, and then how many characters you want to replace:

line[39,8]= "Porsche 911 Turbo!" # => "Porsche 911 Turbo!"
p line # => "A Porsche! a Porsche! my kingdom for a Porsche 911 Turbo!"

You started at index 39, and went 8 characters from there (inclusive).

You can also enter a range to indicate a range of characters you want to change. Include the last character with two dots (..):

speaker[13..15]= "the Third" # => "the Third"
p speaker # => "King Richard the Third"

You can also use regular expressions (see “Regular Expressions,” later in this chapter), as shown here:

line = "A Porsche! a Porsche! my kingdom for a Porsche!" # start again from the original line
line[/Porsche!$/]= "Targa!" # => "Targa!"
p line # => "A Porsche! a Porsche! my kingdom for a Targa!"

The regular expression /Porsche!$/ matches if Porsche! appears at the end of the line ($). If this is true, the call to []= exchanges Porsche! with Targa!.
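
When the string or pattern you hand to []= is not found, Ruby raises an IndexError instead of quietly doing nothing. If you would rather get the original text back in that case, you could wrap the assignment, as in this sketch. The helper name replace_if_present is made up for the example, and it works on a copy so the caller’s string is left alone.

```ruby
# replace_if_present is a hypothetical helper used only in this sketch.
def replace_if_present(text, target, replacement)
  copy = text.dup
  copy[target] = replacement   # raises IndexError when target is absent
  copy
rescue IndexError
  text                         # nothing matched: hand back the original
end

p replace_if_present("King Richard, 2007", ", 2007", " III") # => "King Richard III"
p replace_if_present("King Richard III", ", 2007", " III")   # => "King Richard III"
p replace_if_present("a Porsche!", /Porsche!$/, "horse!")    # => "a horse!"
```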

The chomp and chop Methods

The chop (or chop!) method chops off the last character of a string, and the chomp (chomp!) method chomps off the record separator ($/)—usually just a newline—from a string. Consider the string joe, a limerick created as a here document:

joe = <<limerick
There once was a fellow named Joe
quite fond of Edgar Allen Poe
   He read with delight
   Nearly half the night
When his wife said "Get up!" he said "No."
limerick
# => "There once was a fellow named Joe\nquite fond of Edgar Allen
# Poe\n   He read with delight\n   Nearly half the night\nWhen his wife said \"Get up!\"
# he said \"No.\"\n"

Apply chomp! to remove the last record separator (\n):

joe.chomp! # => "There once was a fellow named Joe\nquite
fond of Edgar Allen Poe\n   He read with delight\n   Nearly half the
night\nWhen his wife said \"Get up!\" he said \"No.\""

Now apply it again, and chomp! returns nil without altering the string because there is no record separator at the end of the string:

joe.chomp! # => nil

chop, chomp’s greedy twin, shows no mercy on the string, removing the last character (a quote) with abandon:

joe.chop! # => "There once was a fellow named Joe\nquite fond of
Edgar Allen Poe\n   He read with delight\n   Nearly half the
night\nWhen his wife said \"Get up!\" he said \"No."
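
To compare the two methods on something smaller than a limerick, here is a sketch that applies chomp and chop to the same short text, once with a trailing newline and once without. The method name trim_demo is invented for the example.

```ruby
# trim_demo is a hypothetical name used only in this sketch.
def trim_demo
  with_newline    = "No.\n"
  without_newline = "No."
  [
    with_newline.chomp,          # removes the trailing newline
    without_newline.chomp,       # nothing to remove: same text comes back
    with_newline.chop,           # removes the last character, here the newline
    without_newline.chop,        # removes the last character, here the period
    without_newline.dup.chomp!   # the ! version reports "no change" as nil
  ]
end

p trim_demo # => ["No.", "No.", "No.", "No", nil]
```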

The delete Method

With delete or delete!, you can delete characters from a string:

"That's call folks!".delete "c" # => "That's all folks!"

That looks easy, because there is only one occurrence of the letter c in the string, so you don’t see any interesting side effects, as you would in the next example. Let’s say you want to get rid of that extra l in alll:

"That's alll folks".delete "l" # => "That's a foks"

Oh, boy. It cleaned me out of all ls. I can’t use delete the way I want, so how do I fix alll? What if I use two ls instead of one?

"That's alll folks".delete "ll" # => "That's a foks"

I got the same thing. (I knew I would.) That’s because delete uses the intersection (what intersects or is the same in both) of its arguments to decide what part of the string to take out. The nifty thing about this, though, is you can also negate all or part of an argument with the caret (^), similar to its use in regular expressions:

"That's all folks".delete "abcdefghijklmnopqrstuvwxyz", "^ha" # => "Tha' a "

The caret negates both the characters in the argument, not just the first one (you can do "^h^a", too, and get the same answer).
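
Arguments to delete can also contain ranges such as a-z, which saves typing out the alphabet. The sketch below runs three variations on the same string. The method name delete_demo is made up for the example.

```ruby
# delete_demo is a hypothetical name used only in this sketch.
def delete_demo(text)
  [
    text.delete("l"),            # every l goes
    text.delete("a-z", "^ha"),   # lowercase letters, except h and a
    text.delete("^a-z")          # everything that is not a lowercase letter
  ]
end

p delete_demo("That's all folks")
# => ["That's a foks", "Tha' a ", "hatsallfolks"]
```

The uppercase T, the apostrophe, and the spaces survive the second call because they are not in the range a-z to begin with.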

Substitute the Substring

Try gsub (or gsub!). This method replaces a substring (first argument) with a replacement string (second argument):

"That's alll folks".gsub "alll", "all" # => "That's all folks"

Or you might do it this way:

"That's alll folks".gsub "lll", "ll" # => "That's all folks"

The replace method replaces a string wholesale. Not just a substring, the whole thing.

call = "All hands on deck!"
call.replace "All feet on deck!" # => "All feet on deck!"

So why wouldn’t you just do it this way?

call = "All hands on deck!"
call = "All feet on deck!"

Wouldn’t you get the same result? Not exactly. When you use replace, call remains the same object, with the same object ID, but when you assign the string to call twice, the object and its ID will change. Just a subtlety you ought to know.

# same object
call = "All hands on deck!" # => "All hands on deck!"
call.object_id # => 1624370
call.replace "All feet on deck!" # => "All feet on deck!"
call.object_id # => 1624370

# different object
call = "All hands on deck!" # => "All hands on deck!"
call.object_id # => 1600420
call = "All feet on deck!" # => "All feet on deck!"
call.object_id # => 1009410
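
The subtlety matters as soon as two variables refer to the same string. In this sketch (alias_demo is an invented name), echo is a second name for the object that call refers to. replace changes that shared object, so echo sees the new text, while plain assignment only points call somewhere else.

```ruby
# alias_demo is a hypothetical name used only in this sketch.
def alias_demo
  call = "All hands on deck!".dup
  echo = call                         # a second name for the same object
  call.replace("All feet on deck!")   # changes the shared object
  seen_through_echo = echo.dup        # snapshot of what echo shows now

  call = "Abandon ship!"              # call now names a different object
  [seen_through_echo, echo, call]
end

p alias_demo # => ["All feet on deck!", "All feet on deck!", "Abandon ship!"]
```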

Turn It Around

To reverse the characters means to alter the characters so they read in the opposite direction. You can do this with the reverse method (or reverse! for permanent damage). Say you want to reverse the order of the English alphabet:

"abcdefghijklmnopqrstuvwxyz".reverse # => "zyxwvutsrqponmlkjihgfedcba"

Or, maybe you’d like to reverse a palindrome:

palindrome = "dennis sinned"
palindrome.reverse! # => "dennis sinned"
p palindrome

Not much harm done, even though reverse! changed the string in place. Think about that one for a while.

From a String to an Array

Conveniently, split converts a string to an array. The first call to split is without an argument:

"0123456789".split # => ["0123456789"]

That was easy, but what about splitting up all the individual values and converting them into elements? Do that with a regular expression (//) that cuts up the original string at the junction of characters.

"0123456789".split( // ) # => ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

In the next example, the regular expression matches a comma and a space (/, /):

c_w = "George Jones, Conway Twitty, Lefty Frizzell, Ferlin Husky"
# => "George Jones, Conway Twitty, Lefty Frizzell, Ferlin Husky"
c_w.split(/, /) # => ["George Jones", "Conway Twitty",
"Lefty Frizzell", "Ferlin Husky"]

Case Conversion

You can capitalize a word, sentence, or phrase with capitalize or capitalize!. (By now you should know the difference between the two.) Here is a pair of sentences that are under the influence of capitalize:

"Ruby finally has a killer app. It's Ruby on Rails.".capitalize
# => "Ruby finally has a killer app. it's ruby on rails."

Notice that the second sentence is not capitalized, which doesn’t look so good. Now you can see that capitalize only capitalizes the first letter of the string, not the beginning of succeeding sentences. Plan accordingly.
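
If you do want each sentence to start with a capital letter, one possible approach is gsub with a regular expression and a block. This is only a sketch: the method name capitalize_sentences is made up, and the pattern assumes sentences end with a period, exclamation point, or question mark followed by whitespace. Unlike capitalize, it leaves the rest of the text untouched.

```ruby
# capitalize_sentences is a hypothetical helper used only in this sketch.
def capitalize_sentences(text)
  # Group 1: start of the string, or sentence-ending punctuation plus whitespace.
  # Group 2: the lowercase letter that follows.
  text.gsub(/(\A|[.!?]\s+)([a-z])/) { $1 + $2.upcase }
end

p capitalize_sentences("ruby finally has a killer app. it's ruby on rails.")
# => "Ruby finally has a killer app. It's ruby on rails."
```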

Iterating Over a String

To get the effect you want, you may have to split strings up. Here is a list of menu items, stored in a string. They are separated by \n. The each method (or its synonym each_line) iterates over each separate item, not just the first word in the overall string, and capitalizes it:

"new\nopen\nclose\nprint".each { |item| puts item.capitalize }# =>
# New
# Open
# Close
# Print

By the way, there is one other each method: each_byte. It takes a string apart byte by byte, returning the decimal value for the character at each index location. Print each character as a decimal, separated by /:

"matz".each_byte { |b| print b, "/" } # => 109/97/116/122/

Tip

This example assumes that a character is represented by a single byte, which is not always the case. The default character set for Ruby is ASCII, whose characters may be represented by bytes. However, if you use UTF-8, characters may be represented in one to four bytes. You can change your character set from ASCII to UTF-8 by specifying $KCODE = 'u' at the beginning of your program.

Convert each decimal to its character equivalent with Integer’s chr method:

"matz".each_byte { |b| print b.chr, "/" } # => m/a/t/z/

Or append the output to an array—out:

out = [] # create an empty array
"matz".each_byte { |b| p out << b} # =>
[109]
[109, 97]
[109, 97, 116]
[109, 97, 116, 122]
p out # => [109, 97, 116, 122]

You’ll learn more about arrays in Chapter 6.
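
To see why bytes and characters are not always the same thing, compare their counts. This sketch assumes Ruby 1.9 or later, where every string carries its own encoding and length counts characters rather than bytes. The method name byte_and_char_counts is invented, and \u00e9 is the escape for an accented e, which takes two bytes in UTF-8.

```ruby
# byte_and_char_counts is a hypothetical helper used only in this sketch.
# Assumes Ruby 1.9 or later (length counts characters, not bytes).
def byte_and_char_counts(text)
  [text.length, text.bytes.to_a.length]   # [characters, bytes]
end

p byte_and_char_counts("matz")        # => [4, 4]
p byte_and_char_counts("caf\u00e9")   # => [4, 5]
```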

downcase, upcase, and swapcase

YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE LETTERS! It’s distracting to read. That’s one reason it’s nice that Ruby has the downcase and downcase! methods.

"YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE
LETTERS!".downcase # => "you know it can be annoying to
read something that is all in uppercase letters!"

There, that’s better. But now the first letter is lowercase, too. The grammar police will be on our case. Fix this by adding a call to capitalize onto the statement.

"YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE
LETTERS!".downcase.capitalize # =>
"You know it can be annoying to read something that is all in uppercase letters!"

Good. That took care of it.

What if you want to go the other way and change lowercase letters to uppercase? For example, you may want to get someone’s attention by turning warning text to all uppercase. You can do that with upcase or upcase!.

"warning! keyboard may be hot!".upcase # => "WARNING! KEYBOARD MAY BE HOT!"

Sometimes you may want to swap uppercase letters with lowercase. Use swapcase or swapcase!. For example, you can switch an English alphabet list that starts with lowercase first to a string that starts with uppercase first:

"aAbBcCdDeEfFgGhHiI".swapcase # => "AaBbCcDdEeFfGgHhIi"

Managing Whitespace, etc.

You can adjust whitespace (or other characters) on the left or right of a string, center a string in whitespace (or other characters), and strip whitespace away using the following methods. First, create a string—the title of a Shakespeare play:

title = "Love's Labours Lost"

How long is the string? This will be important to you (length and size are synonyms).

title.size # => 19

The string title is 19 characters long. With that information in tow, we can start making some changes. The ljust and rjust methods pad a string with whitespace or, if specified, some other character. ljust left-justifies the string, so the padding goes on the right; rjust right-justifies it, so the padding goes on the left. The argument is the total width of the result, and it must be greater than the length of the string for any padding to be added. Let’s go over an example or two.

Let’s call these two methods with an argument (an integer) that is less than or equal to the length of the string.

title.ljust 10 # => "Love's Labours Lost"
title.rjust 19 # => "Love's Labours Lost"

What happened? Nothing! That’s because the argument must be greater than the length of the string in order to do anything. The amount of added whitespace is the value of the argument minus the length of the string. Watch:

title.ljust 20 # => "Love's Labours Lost "
title.rjust 25 # => "      Love's Labours Lost"

See how it works now? In the call to ljust, one space character is added on the right (20 − 19 = 1), and the call to rjust adds six characters to the left (25 − 19 = 6). If it seems backward, remember that the method name says where the string ends up, not where the padding goes: ljust pushes the string to the left, and rjust pushes it to the right. You can use another character besides the default space character if you’d like:

title.rjust( 21, "-" ) # => "--Love's Labours Lost"

or use more than one character—the sequence will be repeated:

title.rjust 25, "->" # => "->->->Love's Labours Lost"

OK, now let’s really mess with your head:

title.rjust(20, "-").ljust(21, "-") # => "-Love's Labours Lost-"

You might want to do something like that someday.

If you want to play both ends to the middle, you are better off using center instead:

title.center 23 # => "  Love's Labours Lost  "
title.center 23, "-" # => "--Love's Labours Lost--"

With one more tip of the hat, I’ll use center to create a comment:

filename = "hack.rb" # => "hack.rb"
filename.size # => 7
filename.center 40-7, "#" # => "#############hack.rb#############"
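
These three methods are handy for lining up plain-text output. The sketch below builds a banner with center and a row with ljust and rjust. The helper names (banner, format_row) and the widths are made up for the example.

```ruby
# banner and format_row are hypothetical helpers used only in this sketch.
def banner(text, width = 30)
  text.center(width, "=")                 # text in the middle, = on both sides
end

def format_row(label, value)
  # Label padded on the right to 20 columns, value padded on the left to 10.
  label.ljust(20, ".") + value.to_s.rjust(10, ".")
end

puts banner(" Plays ")
puts format_row("Hamlet", 1601)
puts format_row("King Lear", 1606)
```

Each row comes out 30 characters wide, with the label flush left and the number flush right.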

We’ve been adding whitespace and other characters. What if you just want to get rid of it? Use lstrip, rstrip, and strip (lstrip!, rstrip!, and strip!). Suppose you have a string surrounded by whitespace:

fear = "             Fear is the little darkroom where negatives develop. -- Michael Pritchard                  "

Oops. Fell asleep with my thumb on the space bar, twice! I can fix it easily now, starting with the left side (make the change stick to the original string with lstrip!):

fear.lstrip! # => "Fear is the little darkroom where
negatives develop. -- Michael Pritchard                  "

Now the right side:

fear.rstrip! # => "Fear is the little darkroom where
negatives develop. -- Michael Pritchard"

Or do the whole thing at once with strip!. (If you call it on fear now, it returns nil, because no whitespace is left to remove; the result below is what you get from the original, unstripped string.)

fear.strip! # => "Fear is the little darkroom where
negatives develop. -- Michael Pritchard"

strip removes other kinds of whitespace, too:

"\t\tBye, tabs and line endings!\r\n".strip # => "Bye, tabs and line endings!"

Incrementing Strings

The Ruby String class has several methods that let you produce successive strings—that is, strings that increment, starting at the rightmost character. You can increment strings with next and next! (or succ and succ!). I prefer to use next. (The methods ending in ! make in-place changes.) For example:

"a".next [or] "a".succ # => "b"

Remember, next increments the rightmost character:

"aa".next # => "ab"

It adds a character when it reaches a boundary, or adds a digit or decimal place when appropriate, as shown in these lines:

"z".next # => "aa" # two a's after one z
"zzzz".next # => "aaaaa" # five a's after four z's
"999.0".next # => "999.1" # increment by .1
"999".next # => "1000" # increment from 999 to 1000

We’re not just talking letters here, but any character, based on the character set in use (ASCII in these examples):

" ".next # => "!"

Chain calls of next together—let’s try three:

"0".next.next.next # => "3"

As you saw earlier, next works for numbers represented as strings as well:

"2007".next # => "2008"

Or you can get it to work when numbers are not represented as strings, though the method will come from a different class, not String. For example:

2008.next # => 2009

Instead of from String, this call actually uses the next method from Integer. (The Date, Generator, Integer, and String classes all have next methods.)

You can even use a character code via chr with next:

120.chr # => "x"
120.chr.next # => "y"

The upto method from String, which uses a block, makes it easy to increment. For example, this call to upto prints the English alphabet:

"a".upto("z") { |i| print i } # => abcdefghijklmnopqrstuvwxyz

You could also do this with a for loop and an inclusive range:

for i in "a".."z"
  print i
end

You decide what’s simpler. The for loop takes only slightly more keystrokes (31 versus 29, including whitespace). But I like upto.
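
Because next carries over from one character to the one on its left, it can generate simple label sequences. Here is a minimal sketch with a made-up helper, successors, that collects a starting string and the strings that follow it.

```ruby
# successors is a hypothetical helper used only in this sketch.
def successors(start, count)
  labels = [start]
  (count - 1).times { labels << labels.last.next }  # next returns a new string
  labels
end

p successors("a8", 4)   # => ["a8", "a9", "b0", "b1"]
p successors("Zy", 3)   # => ["Zy", "Zz", "AAa"]
```

Digits roll over into letters to their left, and a new character is added only when the leftmost one overflows.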

Converting Strings

You can convert a string into a float (Float) or integer (Fixnum). To convert a string into a float, or, more precisely, an instance of the String class into an instance of Float, use the to_f method:

"200".class # => String
"200".to_f # => 200.0
"200".to_f.class # => Float

Likewise, to convert a string to an integer, use to_i:

"100".class # => String
"100".to_i # => 100
"100".to_i.class # => Fixnum

To convert a string into a symbol (Symbol class), you can use either the to_sym or intern methods.

"name".intern # => :name
"name".to_sym # => :name

The value of the string, not its name, becomes the symbol:

play = "The Merchant of Venice".intern # => :"The Merchant of Venice"

Convert an object to a string with to_s. Ruby calls the to_s method from the class of the object, not the String class (parentheses are optional).

(256.0).class # => Float
(256.0).to_s # => "256.0"
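
One thing to watch with to_i and to_f is that they are forgiving: they read as much of a number as they can from the start of the string and return 0 or 0.0 if there is none. When you would rather hear about bad input, Kernel’s Integer method raises an ArgumentError instead. The sketch below compares the two; the method name lenient_and_strict is invented for the example.

```ruby
# lenient_and_strict is a hypothetical helper used only in this sketch.
def lenient_and_strict(text)
  strict = begin
    Integer(text)          # raises ArgumentError on anything but a clean integer
  rescue ArgumentError
    nil
  end
  [text.to_i, text.to_f, strict]
end

p lenient_and_strict("100")          # => [100, 100.0, 100]
p lenient_and_strict("12 monkeys")   # => [12, 12.0, nil]
p lenient_and_strict("abc")          # => [0, 0.0, nil]
```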

Regular Expressions

You have already seen regular expressions in action. A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. The syntax for regular expressions was invented by mathematician Stephen Kleene in the 1950s.

I’ll spend a little time demonstrating some patterns to search for strings. In this little discussion, you’ll learn the fundamentals: how to use basic string patterns, square brackets, alternation, grouping, anchors, shortcuts, repetition operators, and braces. Table 4-1 lists the syntax for regular expressions in Ruby.

We need a little text to munch on. Here are the opening lines of Shakespeare’s 29th sonnet:

opening = "When in disgrace with fortune and men's eyes\nI all alone beweep my
outcast state,\n"

Note that this string contains two lines, set off by the newline character \n.

You can match the first line just by using a word in the pattern:

opening.grep(/men/) # => ["When in disgrace with fortune and men's eyes\n"]

By the way, grep is not a String method; it comes from the Enumerable module, which the String class includes, so it is available for processing strings. grep takes a pattern as an argument, and can also take a block (see http://www.ruby-doc.org/core/classes/Enumerable.html).

When you use a pair of square brackets ([]), you can match any character in the brackets. Let’s try to match the word man or men using []:

opening.grep(/m[ae]n/) # => ["When in disgrace with fortune and men's eyes\n"]

It would also match a line with the word man in it.

Alternation lets you match alternate forms of a pattern using the pipe character (|):

opening.grep(/men|man/) # => ["When in disgrace with fortune and men's eyes\n"]

Grouping uses parentheses to group a subexpression, like this one that contains an alternation:

opening.grep(/m(e|a)n/) # => ["When in disgrace with fortune and men's eyes\n"]

Anchors anchor a pattern to the beginning (^) or end ($) of a line:

opening.grep(/^When in/) # => ["When in disgrace with fortune and men's eyes\n"]
opening.grep(/outcast state,$/) # => ["I all alone beweep my outcast state,\n"]

The ^ means that a match is found when the text When in is at the beginning of a line, and $ will only match outcast state if it is found at the end of a line.

One way to specify the beginning and ending of strings in a pattern is with shortcuts. Shortcut syntax is brief—a single character preceded by a backslash. For example, the \d shortcut represents a digit; it is the same as using [0-9] but, well, shorter. Similarly to ^, the shortcut \A matches the beginning of a string, not a line:

opening.grep(/\AWhen in/) # => ["When in disgrace with fortune and men's eyes\n"]

Similar to $, the shortcut \z matches the end of a string, not a line. Because the second line of opening ends with a newline character, \z finds nothing here:

opening.grep(/outcast state,\z/) # => []

The shortcut \Z matches at the end of a string, or just before a newline character (\n) that ends the string, so it does find the line:

opening.grep(/outcast state,\Z/) # => ["I all alone beweep my outcast state,\n"]
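
The difference between line anchors and string anchors is easiest to see on a whole two-line string. This sketch uses =~, which returns the offset of the match or nil, instead of grep. The method name anchor_offsets and the sample text are made up for the example.

```ruby
# anchor_offsets is a hypothetical helper used only in this sketch.
def anchor_offsets(text)
  [
    text =~ /^second/,    # ^ matches at the start of any line
    text =~ /\Asecond/,   # \A matches only at the start of the string
    text =~ /line$/,      # $ matches at the end of any line
    text =~ /line\z/,     # \z matches only at the very end of the string
    text =~ /line\Z/      # \Z also matches just before a final newline
  ]
end

p anchor_offsets("first line\nsecond line\n") # => [11, nil, 6, nil, 18]
```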

Let’s figure out how to match a phone number in the form (555)123-4567. Supposing that the string phone contains a phone number like this, the following pattern will find it:

phone.grep(/(\(\d\d\d\))?\d\d\d-\d\d\d\d/) # => ["(555)123-4567"]

The backslash precedes the inner parentheses (\(...\)) to let the regexp engine know that these are literal characters. Otherwise, the engine will see the parentheses as enclosing a subexpression. The three \ds in the parentheses represent three digits. The hyphen (-) is just an unambiguous character, so you can use it in the pattern as is.

The question mark (?) is a repetition operator. It indicates zero or one occurrence of the previous pattern. So the phone number you are looking for can have an area code in parentheses, or not. The area-code pattern is surrounded by an unescaped ( and ), which group it into a subexpression so that the ? operator applies to the entire area code. Either form of the phone number, with or without the area code, will work. Here is a way to use ? with just a single character, u:

color.grep(/colou?r/) # => ["I think that colour is just right for your office."]

The plus sign (+) operator indicates one or more of the previous pattern, in this case digits:

phone.grep(/(\(\d+\))?\d+-\d+/) # => ["(555)123-4567"]

Braces ({}) let you specify the exact number of digits, such as \d{3} or \d{4}:

phone.grep(/(\(\d{3}\))?\d{3}-\d{4}/) # => ["(555)123-4567"]
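The following sketch pulls the ?, +, and {} operators together. The list of numbers is hypothetical sample data, and the \A and \z anchors are an addition so that each pattern must cover the whole string. It also shows why the optional area code has to be wrapped in a parenthesized group: a character class in square brackets stands for one character only.

```ruby
# Hypothetical sample data, not taken from the chapter.
numbers = ["(555)123-4567", "123-4567", "(55)123-4567", "555)123-4567"]

# The unescaped outer parentheses group the area code, so ? applies to all of it.
strict = /\A(\(\d{3}\))?\d{3}-\d{4}\z/
strict_matches = numbers.grep(strict)
# => ["(555)123-4567", "123-4567"]

# With + instead of {n}, any number of digits (at least one) is accepted.
loose = /\A(\(\d+\))?\d+-\d+\z/
loose_matches = numbers.grep(loose)
# => ["(555)123-4567", "123-4567", "(55)123-4567"]

# A character class matches a single character: "(", a digit, or ")".
one_char = /\A[\(\d\)]?\d{3}-\d{4}\z/
one_char_matches = numbers.grep(one_char)
# => ["123-4567"]
```

The last pattern rejects the number with a full area code, because at most one character may come before the three digits and the hyphen.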

Tip

It is also possible to indicate an “at least” amount with {m,}, and a minimum/maximum number with {m,n}.
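As a brief illustration of the two forms mentioned in this tip, the sketch below filters a hypothetical list of digit strings. The \A and \z anchors are there so that the count applies to the whole string.

```ruby
# Hypothetical sample data.
codes = ["7", "42", "123", "1234", "12345"]

at_least_three = codes.grep(/\A\d{3,}\z/)   # three or more digits
two_to_four    = codes.grep(/\A\d{2,4}\z/)  # at least two, at most four digits

p at_least_three # => ["123", "1234", "12345"]
p two_to_four    # => ["42", "123", "1234"]
```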

The String class also has the =~ and !~ operators. If =~ finds a match, it returns the offset position where the match starts in the string; if it finds none, it returns nil:

color =~ /colou?r/ # => 13

The !~ operator returns true if it does not match the string, false otherwise:

color !~ /colou?r/ # => false
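Because =~ returns either an offset or nil, it fits naturally in a conditional. Here is a small sketch; the sentence assigned to color is an assumed sample, and the pattern /purple/ is a made-up pattern that does not occur in it.

```ruby
# Assumed sample sentence.
color = "I think that colour is just right for your office."

offset  = color =~ /colou?r/   # 13: the match starts at this position
missing = color =~ /purple/    # nil: no match anywhere in the string

# nil is false in a condition, and any offset (even 0) is true.
message = if color =~ /colou?r/
  "found at #{offset}"
else
  "not found"
end

no_purple = color !~ /purple/    # true: the pattern does not match
no_colour = color !~ /colou?r/   # false: the pattern does match

puts message # prints "found at 13"
```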

Also of interest are the Regexp and MatchData classes. The Regexp class (http://www.ruby-doc.org/core/classes/Regexp.html) lets you create a regular expression object. The MatchData class (http://www.ruby-doc.org/core/classes/MatchData.html) is the type of the special $~ variable, which encapsulates all search results from the most recent pattern match.
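A short sketch of both classes at work follows. The sentence is an assumed sample; the methods shown (Regexp.new, match, pre_match, post_match, and begin) are part of Ruby's core library.

```ruby
# Assumed sample sentence.
color = "I think that colour is just right for your office."

pattern = Regexp.new("colou?r")   # builds the same object as the literal /colou?r/
match = pattern.match(color)      # a MatchData object, or nil if nothing matches

matched_text = match[0]           # "colour"
before = match.pre_match          # "I think that "
after  = match.post_match         # " is just right for your office."
start  = match.begin(0)           # 13

# After a successful =~, the same information is available through $~.
color =~ pattern
last_match = $~
puts last_match[0] # prints "colour"
```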

This discussion has given you a decent foundation in regular expressions (see Table 4-1 for a listing). With these fundamentals, you can define almost any pattern.

Table 4-1. Regular expressions in Ruby

Pattern           Description
/pattern/options  Pattern pattern in slashes, followed by optional options, i.e., one or more of: i for case-insensitive; o for interpolate #{...} only once; x for ignore whitespace, allow comments; m for match multiple lines, newlines as normal characters
%r!pattern!       General delimited string for a regular expression, where ! can be an arbitrary character
^                 Matches beginning of line
$                 Matches end of line
.                 Matches any character
\1...\9           Matches nth grouped subexpression
\10               Matches nth grouped subexpression, if already matched; otherwise, refers to octal representation of a character code
\n, \r, \t, etc.  Matches character in backslash notation
\w                Matches word character, as in [0-9A-Za-z_]
\W                Matches nonword character
\s                Matches whitespace character, as in [ \t\n\r\f]
\S                Matches nonwhitespace character
\d                Matches digit, same as [0-9]
\D                Matches nondigit
\A                Matches beginning of a string
\Z                Matches end of a string, or before newline at the end
\z                Matches end of a string
\b                Matches word boundary outside [], or backspace (0x08) inside []
\B                Matches nonword boundary
\G                Matches point where last match finished
[..]              Matches any single character in brackets, such as [ch]at
[^..]             Matches any single character not in brackets
*                 Matches zero or more of previous regular expressions
*?                Matches zero or more of previous regular expressions (nongreedy)
+                 Matches one or more of previous regular expressions
+?                Matches one or more of previous regular expressions (nongreedy)
{m}               Matches exactly m number of previous regular expressions
{m,}              Matches at least m number of previous regular expressions
{m,n}             Matches at least m but at most n number of previous regular expressions
{m,n}?            Matches at least m but at most n number of previous regular expressions (nongreedy)
?                 Matches zero or one of previous regular expressions
|                 Alternation, such as color|colour
( )               Grouping regular expressions or subexpression, such as col(o|ou)r
(?#..)            Comment
(?:..)            Grouping without back-references (without remembering matched text)
(?=..)            Specifies position with pattern (lookahead), without consuming the matched text
(?!..)            Specifies position with pattern negation (negative lookahead)
(?>..)            Matches independent pattern without backtracking
(?imx)            Toggles i, m, or x options on
(?-imx)           Toggles i, m, or x options off
(?imx:..)         Toggles i, m, or x options on within parentheses
(?-imx:..)        Toggles i, m, or x options off within parentheses
(?ix-ix: )        Turns on (or off) i and x options within this noncapturing group
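Several entries in Table 4-1 were not covered in the discussion, so here is a brief sketch of a few of them. All of the sample strings are made up for illustration.

```ruby
html = "<b>bold</b> and <i>italic</i>"

# Greedy + takes as much as it can; nongreedy +? stops at the first chance.
greedy    = html[/<.+>/]    # the whole string
nongreedy = html[/<.+?>/]   # "<b>"

# \1 refers back to whatever the first group matched: a doubled word.
doubled = "it was was late"[/\b(\w+) \1\b/]   # "was was"

# (?:..) groups without capturing, so (o|ou) is still group number 1.
spelling = /(?:col)(o|ou)r/.match("colour")[1]   # "ou"

# (?=..) and (?!..) test what follows without making it part of the match.
q_before_u  = "quit"[/q(?=u)/]   # "q"
q_without_u = "Iraq"[/q(?!u)/]   # "q"
q_rejected  = "quit"[/q(?!u)/]   # nil

# The i option ignores case.
ignore_case = "RUBY" =~ /ruby/i   # 0

# The x option ignores whitespace and allows comments inside the pattern.
phone = /
  \A
  (\(\d{3}\))?   # optional area code in literal parentheses
  \d{3}-\d{4}    # local number
  \z
/x
with_area_code = "(555)123-4567" =~ phone   # 0
```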

1.9 and Beyond

In the versions of Ruby that follow, String will likely:

  • Add the start_with? and end_with? methods, which will return true if a string starts with a given prefix or ends with a given suffix.

  • Add a clear method that will turn a string of any length into an empty string.

  • Add an ord method that will return a character code.

  • Add the partition and rpartition methods to partition a string at a given separator.

  • Add a bytes method that will return the bytes of a string, one by one.

  • Return a single character string instead of a character code when a string is indexed with [].

  • Consider characters to be more than one byte in length.
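For readers on Ruby 1.9 or later, where these methods are available, the following sketch shows how the items in the list behave. The title string comes from earlier in this chapter; the other values are made up for illustration.

```ruby
title = "Much Ado about Nothing"

starts = title.start_with?("Much")      # true
ends   = title.end_with?("Nothing")     # true

# partition splits at the first separator, rpartition at the last.
parts  = title.partition(" about ")     # ["Much Ado", " about ", "Nothing"]
rparts = title.rpartition(" ")          # ["Much Ado about", " ", "Nothing"]

code  = "A".ord                         # 65
bytes = "Ado".bytes.to_a                # [65, 100, 111]

# Indexing returns a one-character string, not a character code.
first = title[0]                        # "M"

# One character can take more than one byte (this literal is UTF-8).
accented   = "\u00e9"
char_count = accented.length            # 1
byte_count = accented.bytesize          # 2

# clear empties a string in place, so work on a copy here.
scratch = title.dup
scratch.clear
emptied = scratch.empty?                # true
```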

Review Questions

  1. How do chop and chomp differ?

  2. Name two ways to concatenate strings.

  3. What happens when you reverse a palindrome?

  4. How do you iterate over a string?

  5. Name two or more case conversion methods.

  6. What methods would you use to adjust space in a string?

  7. Describe alternation in a regular expression pattern.

  8. What does /\d{3}/ match?

  9. How do you convert a string to an array?

  10. What do you think is the easiest way to create a string?
