[LRUG] An interesting Ruby hash semantics gotcha

Wed May 7 07:11:54 PDT 2008

Really nice post Matthew - it's good to know the quirks of our
rarely-fickle mistress :)

On Wed, May 7, 2008 at 2:50 PM, Matthew Willson <matthew at playlouder.com> wrote:
> Thought this might amuse some on here (or be useful to know - it's been the
> source of a couple of hard-to-track-down bugs in the past).
>
>  >> {{} => true}[{}]
>  => nil
>
>  >> {{} => true, {} => false}
>  => {{}=>true, {}=>false}
>
>  but yet,
>
>  >> {} == {}
>  => true
>
>  What's going on here?
>
>  Ruby's Hashes behave very strangely when you try to use a Hash itself as a
> key of a Hash.
>
>  This acts as a subtle gotcha when you try to memoize a function which takes
> hash arguments - and so a tricky-to-address bug in libraries like this:
>  http://raa.ruby-lang.org/project/memoize/
>
>
>  Why?
>
>  Ruby calls Object#hash on each key of a Hash, using that numeric hash
> (small h) to allocate that object to a bucket of the hash table data
> structure. Equality, when it comes to Hash lookups and unique keys of a
> Hash, will only work if the keys generate the same numeric hash as a result
> of their hash methods.
>
>  For most ruby data structures, x.hash == y.hash  is implied by   x == y,
> and everything works fine.
>  But, not for Hashes themselves!
>
>  (NB. this also affects data structures like Arrays which themselves contain
> a Hash, since Array#hash must call hash recursively on its contents).
>
>  (Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails.
> Note, x.hash == y.hash is always implied by x.eql?(y), but this equality
> isn't a desperately useful one, and seems to have been constructed
> artificially as an equality for use with Hash which is consistent with
> .hash)
>
>
>
>  Why might it have been implemented this way?
>
>  Hashes are insensitive to the order of their keys - so, for example, we
> have:
>  {:a => true, :b => true} == {:b => true, :a => true}
>
>  When you're actually being given two concrete objects to compare, you can
> just check that each key from the one has an equal corresponding value in
> the other, and vice versa.
>
>  But, when you're asked to generate a numeric hash which uniquely identifies
> the equivalence class, you'd have to do something to ensure the hashing
> isn't order-sensitive. Like ordering the key/value pairs by their individual
> hashes before feeding into the hash function.
>
>
>  Some attempts at an answer in the form of an overridden Hash#hash:
>
>  (1) sort key/value pairs by the numeric hash of the pair first:
>
>  class Hash
>   def hash
>     sort_by {|pair| pair.hash}.hash
>   end
>
>   def eql?(other)
>     self == other   # hash is now consistent with ==, so we need to make
> eql? consistent too
>   end
>  end
>
>  (2) Use an XOR of the hashes of the key/value pairs (XOR is
> order-insensitive)
>
>  class Hash
>   def hash
>     inject(0) {|hash,pair| hash ^ pair.hash}
>   end
>
>   def eql?(other)
>     self == other
>   end
>  end
>
>  These then fix, eg:
>  >> {}.hash == {}.hash
>  => true
>
>  >> {{} => true}[{}]
>  => true
>
>  >> {{} => true, {} => false}
>  => {{}=>false}
>
>  (Note, overriding eql? is required to make the last two work)
>
>  Now, I'm sure there's a reason Matz didn't do it this way - perhaps a
> performance reason, perhaps a gotcha that I haven't noticed with my
> approach. Perhaps it'll be fixed in 1.9.
>  But at any rate, it's useful to be aware of the issue.
>
>  -Matt
>
>  _______________________________________________
>  Chat mailing list
>  Chat at lists.lrug.org
>  http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>


-- 
* J *
 ~