You want to partition a `Set`

or array based on some attribute of its elements. All elements that go
"together" in some code-specific sense should be grouped together in
distinct data structures. For instance, if you're partitioning by
color, all the green objects in a `Set`

should be grouped together, separate
from the group of all the red objects in the `Set`

.

Use `Set#divide`

, passing in a
code block that returns the partition of the object it's passed. The
result will be a new `Set`

containing
a number of partitioned subsets of your original `Set`

.

The code block can accept either a single argument or two
arguments.^{[3]} The single-argument version examines each object to see
which subset it should go into.

require 'set' s = Set.new((1..10).collect) # => #<Set: {5, 6, 1, 7, 2, 8, 3, 9, 4, 10}> # Divide the set into the "true" subset and the "false" subset: that # is, the "less than 5" subset and the "not less than 5" subset. s.divide { |x| x < 5 } # => #<Set: {#<Set: {5, 6, 7, 8, 9, 10}>, #<Set: {1, 2, 3, 4}>}> # Divide the set into the "0" subset and the "1" subset: that is, the # "even" subset and the "odd" subset. s.divide { |x| x % 2 } # => #<Set: {#<Set: {6, 2, 8, 4, 10}>, #<Set: {5, 1, 7, 3, 9}>}> s = Set.new([1, 2, 3, 'a', 'b', 'c', -1.0, -2.0, -3.0]) # Divide the set into the "String subset, the "Fixnum" subset, and the # "Float" subset. s.divide { |x| x.class } # => #<Set: {#<Set: {"a", "b", "c"}>, # => #<Set: {1, 2, 3}>, # => #<Set: {-1.0, -3.0, -2.0}>}>

For the two-argument code block version of `Set#divide`

, the code block should return
true if both the arguments it has been passed should be put into the
same subset.

s = [1, 2, 3, -1, -2, -4].to_set # Divide the set into sets of numbers with the same absolute value. s.divide { |x,y| x.abs == y.abs } # => #<Set: {#<Set: {-1, 1}>, # => #<Set: {2, -2}>, # => #<Set: {-4}>, # => #<Set: {3}>}> # Divide the set into sets of adjacent numbers s.divide { |x,y| (x-y).abs == 1 } # => #<Set: {#<Set: {1, 2, 3}>, # => #<Set: {-1}>, # => #<Set: {-4, -3}>}>

If you want to classify the subsets by the values they have in
common, use `Set#classify`

instead of
`Set#divide`

. It works like `Set#divide`

, but it returns a hash that maps
the names of the subsets to the subsets themselves.

s.classify { |x| x.class } # => {String=>#<Set: {"a", "b", "c"}>, # => Fixnum=>#<Set: {1, 2, 3}>, # => Float=>#<Set: {-1.0, -3.0, -2.0}>}

The version of `Set#divide`

that takes a two-argument code block uses the `tsort`

library to turn the `Set`

into a directed graph. The nodes in the
graph are the items in the `Set`

. Two
nodes `x`

and `y`

in the graph are connected with a vertex
(one-way arrow) if the code block returns true when passed `|x,y|`

. For the `Set`

and the two-argument code block given in
the example above, the graph looks like Figure 4-1.

The `Set`

partitions returned by `Set#divide`

are the *strongly
connected components* of this graph, obtained by iterating
over `TSort#each_strongly_connected_component`

. A
strongly connected component is a set of nodes such that, starting
from any node in the component, you can follow the one-way arrows and
get to any other node in the component.

Visually speaking, the strongly connected components are the
"clumps" in the graph. 1 and 3 are in the same strongly connected
component as 2, because starting from 3 you can follow one-way arrows
through 2 and get to 1. Starting from 1, you can follow one-way arrows
through 2 and get to 3. This makes 1, 2, and 3 part of the same
`Set`

partition, even though there
are no direct connections between 1 and 3.

In most real-world scenarios (including all the examples above),
the one-way arrows will be symmetrical: if the code returns true for
`|x,y|`

, it will also return true for
`|y,x|`

. `Set#divide`

will work even if this isn't
true. Consider a `Set`

and a `divide`

code block like the following:

connections = { 1 => 2, 2 => 3, 3 => 1, 4 => 1 } [1,2,3,4].to_set.divide { |x,y| connections[x] == y } # => #<Set: {#<Set: {1, 2, 3}>, #<Set: {4}>}>

The corresponding graph looks like Figure 4-2.

You can get to any other node from 4 by following one-way
arrows, but you can't get to 4 from any of the other nodes. This puts
4 is in a strongly connected component—and a `Set`

partition—all by itself. 1, 2, and 3
form a second strongly connected component—and a second `Set`

partition—because you can get from any
of them to any of them by following one-way arrows.

If you're starting with an array instead of a `Set`

, it's easy to simulate `Set#classify`

(and the single-argument
block form of `Set#divide`

)with a
hash. In fact, the code below is almost identical to the current
Ruby implementation of `Set#classify`

.

class Array def classify require 'set' h = {} each do |i| x = yield(i) (h[x] ||= self.class.new) << i end h end def divide(&block) Set.new(classify(&block).values) end end [1,1,2,6,6,7,101].divide { |x| x % 2 } # => #<Set: {[2, 6, 6], [1, 1, 7, 101]}>

There's no simple way to implement a version of `Array#divide`

that takes a two-argument
block. The `TSort`

class is
`Set`

-like, in that it won't create
two different nodes for the same object. The simplest solution is to
convert the array into a `Set`

to
remove any duplicate values, `divide`

the `Set`

normally, then convert the partitioned
subsets into arrays, adding back the duplicate values as you
go:

class Array def divide(&block) if block.arity == 2 counts = inject({}) { |h, x| h[x] ||= 0; h[x] += 1; h} to_set.divide(&block).inject([]) do |divided, set| divided << set.inject([]) do |partition, e| counts[e].times { partition << e } partition end end else Set.new(classify(&block).values) end end end [1,1,2,6,6,7,101].divide { |x,y| (x-y).abs == 1 } # => [[101], [1, 1, 2], [6, 6, 7]]

Is it worth it? You decide.

^{[3] }This is analogous to the one-argument code block passed into
`Enumerable#sort_by`

and the
two-argument code block passed into `Array#sort`

.

Start Free Trial

No credit card required