[LRUG] An interesting Ruby hash semantics gotcha

Wed May 7 06:50:46 PDT 2008

Thought this might amuse some on here (or be useful to know - it's  
been the source of a couple of hard-to-track-down bugs in the past).

 >> {{} => true}[{}]
=> nil

 >> {{} => true, {} => false}
=> {{}=>true, {}=>false}

but yet,

 >> {} == {}
=> true

What's going on here?

Ruby's Hashes behave very strangely when you try to use a Hash itself  
as a key of a Hash.

This acts as a subtle gotcha when you try to memoize a function which  
takes hash arguments - and so a tricky-to-address bug in libraries  
like this:
http://raa.ruby-lang.org/project/memoize/

Why?

Ruby calls Object#hash on each key of a Hash, using that numeric hash  
(small h) to allocate that object to a bucket of the hash table data  
structure. Equality, when it comes to Hash lookups and unique keys of  
a Hash, will only work if the keys generate the same numeric hash as a  
result of their hash methods.

For most ruby data structures, x.hash == y.hash  is implied by   x ==  
y, and everything works fine.
But, not for Hashes themselves!

(NB. this also affects data structures like Arrays which themselves  
contain a Hash, since Array#hash must call hash recursively on its  
contents).

(Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails.  
Note, x.hash == y.hash is always implied by x.eql?(y), but this  
equality isn't a desperately useful one, and seems to have been  
constructed artificially as an equality for use with Hash which is  
consistent with .hash)

Why might it have been implemented this way?

Hashes are insensitive to the order of their keys - so, for example,  
we have:
{:a => true, :b => true} == {:b => true, :a => true}

When you're actually being given two concrete objects to compare, you  
can just check that each key from the one has an equal corresponding  
value in the other, and vice versa.

But, when you're asked to generate a numeric hash which uniquely  
identifies the equivalence class, you'd have to do something to ensure  
the hashing isn't order-sensitive. Like ordering the key/value pairs  
by their individual hashes before feeding into the hash function.

Some attempts at an answer in the form of an overridden Hash#hash:

(1) sort key/value pairs by the numeric hash of the pair first:

class Hash
   def hash
     sort_by {|pair| pair.hash}.hash
   end

   def eql?(other)
     self == other   # hash is now consistent with ==, so we need to  
make eql? consistent too
   end
end

(2) Use an XOR of the hashes of the key/value pairs (XOR is order- 
insensitive)

class Hash
   def hash
     inject(0) {|hash,pair| hash ^ pair.hash}
   end

   def eql?(other)
     self == other
   end
end

These then fix, eg:
 >> {}.hash == {}.hash
=> true

 >> {{} => true}[{}]
=> true

 >> {{} => true, {} => false}
=> {{}=>false}

(Note, overriding eql? is required to make the last two work)

Now, I'm sure there's a reason Matz didn't do it this way - perhaps a  
performance reason, perhaps a gotcha that I haven't noticed with my  
approach. Perhaps it'll be fixed in 1.9.
But at any rate, it's useful to be aware of the issue.

-Matt