Cover by Leonard Richardson, Lucas Carlson

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

O'Reilly logo

4.15. Partitioning or Classifying a Set

Problem

You want to partition a Set or array based on some attribute of its elements. All elements that go "together" in some code-specific sense should be grouped together in distinct data structures. For instance, if you're partitioning by color, all the green objects in a Set should be grouped together, separate from the group of all the red objects in the Set.

Solution

Use Set#divide, passing in a code block that returns the partition of the object it's passed. The result will be a new Set containing a number of partitioned subsets of your original Set.

The code block can accept either a single argument or two arguments.[3] The single-argument version examines each object to see which subset it should go into.

	require 'set' 
	s = Set.new((1..10).collect)
	# => #<Set: {5, 6, 1, 7, 2, 8, 3, 9, 4, 10}>

	# Divide the set into the "true" subset and the "false" subset: that
	# is, the "less than 5" subset and the "not less than 5" subset.
	s.divide { |x| x < 5 }
	# => #<Set: {#<Set: {5, 6, 7, 8, 9, 10}>, #<Set: {1, 2, 3, 4}>}>

	# Divide the set into the "0" subset and the "1" subset: that is, the
	# "even" subset and the "odd" subset.
	s.divide { |x| x % 2 }
	# => #<Set: {#<Set: {6, 2, 8, 4, 10}>, #<Set: {5, 1, 7, 3, 9}>}>

	s = Set.new([1, 2, 3, 'a', 'b', 'c', -1.0, -2.0, -3.0])
	# Divide the set into the "String subset, the "Fixnum" subset, and the
	# "Float" subset.
	s.divide { |x| x.class } 
	# => #<Set: {#<Set: {"a", "b", "c"}>,
	# =>       #<Set: {1, 2, 3}>,
	# =>       #<Set: {-1.0, -3.0, -2.0}>}>

For the two-argument code block version of Set#divide, the code block should return true if both the arguments it has been passed should be put into the same subset.

	s = [1, 2, 3, -1, -2, -4].to_set

	# Divide the set into sets of numbers with the same absolute value.
	s.divide { |x,y| x.abs == y.abs }
	# => #<Set: {#<Set: {-1, 1}>, 
	# =>         #<Set: {2, -2}>, 
	# =>         #<Set: {-4}>,
	# =>         #<Set: {3}>}>

	# Divide the set into sets of adjacent numbers 
	s.divide { |x,y| (x-y).abs == 1 }
	# => #<Set: {#<Set: {1, 2, 3}>,
	# =>         #<Set: {-1}>,
	# =>         #<Set: {-4, -3}>}>

If you want to classify the subsets by the values they have in common, use Set#classify instead of Set#divide. It works like Set#divide, but it returns a hash that maps the names of the subsets to the subsets themselves.

	s.classify { |x| x.class }
	# => {String=>#<Set: {"a", "b", "c"}>,
	# =>  Fixnum=>#<Set: {1, 2, 3}>,
	# =>  Float=>#<Set: {-1.0, -3.0, -2.0}>}

Discussion

The version of Set#divide that takes a two-argument code block uses the tsort library to turn the Set into a directed graph. The nodes in the graph are the items in the Set. Two nodes x and y in the graph are connected with a vertex (one-way arrow) if the code block returns true when passed |x,y|. For the Set and the two-argument code block given in the example above, the graph looks like Figure 4-1.

The set {1, 2, 3, -1, -2, -4} graphed according to the code block that checks adjacency

Figure 4-1. The set {1, 2, 3, -1, -2, -4} graphed according to the code block that checks adjacency

The Set partitions returned by Set#divide are the strongly connected components of this graph, obtained by iterating over TSort#each_strongly_connected_component. A strongly connected component is a set of nodes such that, starting from any node in the component, you can follow the one-way arrows and get to any other node in the component.

Visually speaking, the strongly connected components are the "clumps" in the graph. 1 and 3 are in the same strongly connected component as 2, because starting from 3 you can follow one-way arrows through 2 and get to 1. Starting from 1, you can follow one-way arrows through 2 and get to 3. This makes 1, 2, and 3 part of the same Set partition, even though there are no direct connections between 1 and 3.

In most real-world scenarios (including all the examples above), the one-way arrows will be symmetrical: if the code returns true for |x,y|, it will also return true for |y,x|. Set#divide will work even if this isn't true. Consider a Set and a divide code block like the following:

	connections = { 1 => 2, 2 => 3, 3 => 1, 4 => 1 }
	[1,2,3,4].to_set.divide { |x,y| connections[x] == y }
	# => #<Set: {#<Set: {1, 2, 3}>, #<Set: {4}>}>

The corresponding graph looks like Figure 4-2.

The set {1,2,3,4} graphed according to the connection hash

Figure 4-2. The set {1,2,3,4} graphed according to the connection hash

You can get to any other node from 4 by following one-way arrows, but you can't get to 4 from any of the other nodes. This puts 4 is in a strongly connected component—and a Set partition—all by itself. 1, 2, and 3 form a second strongly connected component—and a second Set partition—because you can get from any of them to any of them by following one-way arrows.

Implementation for arrays

If you're starting with an array instead of a Set, it's easy to simulate Set#classify (and the single-argument block form of Set#divide)with a hash. In fact, the code below is almost identical to the current Ruby implementation of Set#classify.

	class Array
	  def classify
	    require 'set'
	    h = {}
	    each do |i|
	      x = yield(i)
	      (h[x] ||= self.class.new) << i
	  end
	  h
	end

	  def divide(&block)
	    Set.new(classify(&block).values) 
	  end
	end

	[1,1,2,6,6,7,101].divide { |x| x % 2 }
	# => #<Set: {[2, 6, 6], [1, 1, 7, 101]}>

There's no simple way to implement a version of Array#divide that takes a two-argument block. The TSort class is Set-like, in that it won't create two different nodes for the same object. The simplest solution is to convert the array into a Set to remove any duplicate values, divide the Set normally, then convert the partitioned subsets into arrays, adding back the duplicate values as you go:

	class Array
	  def divide(&block)
	    if block.arity == 2
	     counts = inject({}) { |h, x| h[x] ||= 0; h[x] += 1; h}
	     to_set.divide(&block).inject([]) do |divided, set|
	       divided << set.inject([]) do |partition, e|
	         counts[e].times { partition << e }  
	         partition
	       end
	     end
	   else
	     Set.new(classify(&block).values) 
	   end
	  end
	end

	[1,1,2,6,6,7,101].divide { |x,y| (x-y).abs == 1 }
	# => [[101], [1, 1, 2], [6, 6, 7]]

Is it worth it? You decide.



[3] This is analogous to the one-argument code block passed into Enumerable#sort_by and the two-argument code block passed into Array#sort.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required