[LRUG] An interesting Ruby hash semantics gotcha
James Adam
james.adam at gmail.com
Wed May 7 07:11:54 PDT 2008
Really nice post Matthew - it's good to know the quirks of our
rarely-fickle mistress :)
On Wed, May 7, 2008 at 2:50 PM, Matthew Willson <matthew at playlouder.com> wrote:
> Thought this might amuse some on here (or be useful to know - it's been the
> source of a couple of hard-to-track-down bugs in the past).
>
> >> {{} => true}[{}]
> => nil
>
> >> {{} => true, {} => false}
> => {{}=>true, {}=>false}
>
> but yet,
>
> >> {} == {}
> => true
>
> What's going on here?
>
> Ruby's Hashes behave very strangely when you try to use a Hash itself as a
> key of a Hash.
>
> This acts as a subtle gotcha when you try to memoize a function which takes
> hash arguments - and so a tricky-to-address bug in libraries like this:
> http://raa.ruby-lang.org/project/memoize/
>
>
> Why?
>
> Ruby calls Object#hash on each key of a Hash, using that numeric hash
> (small h) to allocate that object to a bucket of the hash table data
> structure. Equality, when it comes to Hash lookups and unique keys of a
> Hash, will only work if the keys generate the same numeric hash as a result
> of their hash methods.
>
> For most ruby data structures, x.hash == y.hash is implied by x == y,
> and everything works fine.
> But, not for Hashes themselves!
>
> (NB. this also affects data structures like Arrays which themselves contain
> a Hash, since Array#hash must call hash recursively on its contents).
>
> (Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails.
> Note, x.hash == y.hash is always implied by x.eql?(y), but this equality
> isn't a desperately useful one, and seems to have been constructed
> artificially as an equality for use with Hash which is consistent with
> .hash)
>
>
>
> Why might it have been implemented this way?
>
> Hashes are insensitive to the order of their keys - so, for example, we
> have:
> {:a => true, :b => true} == {:b => true, :a => true}
>
> When you're actually being given two concrete objects to compare, you can
> just check that each key from the one has an equal corresponding value in
> the other, and vice versa.
>
> But, when you're asked to generate a numeric hash which uniquely identifies
> the equivalence class, you'd have to do something to ensure the hashing
> isn't order-sensitive. Like ordering the key/value pairs by their individual
> hashes before feeding into the hash function.
>
>
> Some attempts at an answer in the form of an overridden Hash#hash:
>
> (1) sort key/value pairs by the numeric hash of the pair first:
>
> class Hash
> def hash
> sort_by {|pair| pair.hash}.hash
> end
>
> def eql?(other)
> self == other # hash is now consistent with ==, so we need to make
> eql? consistent too
> end
> end
>
> (2) Use an XOR of the hashes of the key/value pairs (XOR is
> order-insensitive)
>
> class Hash
> def hash
> inject(0) {|hash,pair| hash ^ pair.hash}
> end
>
> def eql?(other)
> self == other
> end
> end
>
> These then fix, eg:
> >> {}.hash == {}.hash
> => true
>
> >> {{} => true}[{}]
> => true
>
> >> {{} => true, {} => false}
> => {{}=>false}
>
> (Note, overriding eql? is required to make the last two work)
>
> Now, I'm sure there's a reason Matz didn't do it this way - perhaps a
> performance reason, perhaps a gotcha that I haven't noticed with my
> approach. Perhaps it'll be fixed in 1.9.
> But at any rate, it's useful to be aware of the issue.
>
> -Matt
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>
--
* J *
~
More information about the Chat
mailing list