[LRUG] An interesting Ruby hash semantics gotcha
Murray Steele
murray.steele at gmail.com
Wed May 7 09:16:46 PDT 2008
Give google a day or two and
http://lists.lrug.org/pipermail/chat-lrug.org/2008-May/002439.html should
start appearing if you search for it.
Admittedly the lrug mailing list archive is likely to have little or no
google juice, so that page is not likely to be up the rankings, but I
thought I'd just point out that these messages don't disappear into the
non-indexed ether of our inboxes.
Muz
2008/5/7 Daniel Tenner <daniel.ruby at tenner.org>:
> Agreed.
>
> This is quite blog-worthy...anyone willing to post it up for
> posterity/non-londoners? :-)
>
> Daniel
>
>
> On 7 May 2008, at 15:117 May 2008, James Adam wrote:
>
> Really nice post Matthew - it's good to know the quirks of our
> > rarely-fickle mistress :)
> >
> > On Wed, May 7, 2008 at 2:50 PM, Matthew Willson <matthew at playlouder.com>
> > wrote:
> >
> > > Thought this might amuse some on here (or be useful to know - it's
> > > been the
> > > source of a couple of hard-to-track-down bugs in the past).
> > >
> > > {{} => true}[{}]
> > > > >
> > > > => nil
> > >
> > > {{} => true, {} => false}
> > > > >
> > > > => {{}=>true, {}=>false}
> > >
> > > but yet,
> > >
> > > {} == {}
> > > > >
> > > > => true
> > >
> > > What's going on here?
> > >
> > > Ruby's Hashes behave very strangely when you try to use a Hash itself
> > > as a
> > > key of a Hash.
> > >
> > > This acts as a subtle gotcha when you try to memoize a function which
> > > takes
> > > hash arguments - and so a tricky-to-address bug in libraries like
> > > this:
> > > http://raa.ruby-lang.org/project/memoize/
> > >
> > >
> > > Why?
> > >
> > > Ruby calls Object#hash on each key of a Hash, using that numeric hash
> > > (small h) to allocate that object to a bucket of the hash table data
> > > structure. Equality, when it comes to Hash lookups and unique keys of
> > > a
> > > Hash, will only work if the keys generate the same numeric hash as a
> > > result
> > > of their hash methods.
> > >
> > > For most ruby data structures, x.hash == y.hash is implied by x ==
> > > y,
> > > and everything works fine.
> > > But, not for Hashes themselves!
> > >
> > > (NB. this also affects data structures like Arrays which themselves
> > > contain
> > > a Hash, since Array#hash must call hash recursively on its contents).
> > >
> > > (Interestingly, for things like 1.0 == 1, x.hash == y.hash also fails.
> > > Note, x.hash == y.hash is always implied by x.eql?(y), but this
> > > equality
> > > isn't a desperately useful one, and seems to have been constructed
> > > artificially as an equality for use with Hash which is consistent with
> > > .hash)
> > >
> > >
> > >
> > > Why might it have been implemented this way?
> > >
> > > Hashes are insensitive to the order of their keys - so, for example,
> > > we
> > > have:
> > > {:a => true, :b => true} == {:b => true, :a => true}
> > >
> > > When you're actually being given two concrete objects to compare, you
> > > can
> > > just check that each key from the one has an equal corresponding value
> > > in
> > > the other, and vice versa.
> > >
> > > But, when you're asked to generate a numeric hash which uniquely
> > > identifies
> > > the equivalence class, you'd have to do something to ensure the
> > > hashing
> > > isn't order-sensitive. Like ordering the key/value pairs by their
> > > individual
> > > hashes before feeding into the hash function.
> > >
> > >
> > > Some attempts at an answer in the form of an overridden Hash#hash:
> > >
> > > (1) sort key/value pairs by the numeric hash of the pair first:
> > >
> > > class Hash
> > > def hash
> > > sort_by {|pair| pair.hash}.hash
> > > end
> > >
> > > def eql?(other)
> > > self == other # hash is now consistent with ==, so we need to make
> > > eql? consistent too
> > > end
> > > end
> > >
> > > (2) Use an XOR of the hashes of the key/value pairs (XOR is
> > > order-insensitive)
> > >
> > > class Hash
> > > def hash
> > > inject(0) {|hash,pair| hash ^ pair.hash}
> > > end
> > >
> > > def eql?(other)
> > > self == other
> > > end
> > > end
> > >
> > > These then fix, eg:
> > >
> > > > {}.hash == {}.hash
> > > > >
> > > > => true
> > >
> > > {{} => true}[{}]
> > > > >
> > > > => true
> > >
> > > {{} => true, {} => false}
> > > > >
> > > > => {{}=>false}
> > >
> > > (Note, overriding eql? is required to make the last two work)
> > >
> > > Now, I'm sure there's a reason Matz didn't do it this way - perhaps a
> > > performance reason, perhaps a gotcha that I haven't noticed with my
> > > approach. Perhaps it'll be fixed in 1.9.
> > > But at any rate, it's useful to be aware of the issue.
> > >
> > > -Matt
> > >
> > > _______________________________________________
> > > Chat mailing list
> > > Chat at lists.lrug.org
> > > http://lists.lrug.org/listinfo.cgi/chat-lrug.org
> > >
> > >
> >
> >
> > --
> > * J *
> > ~
> > _______________________________________________
> > Chat mailing list
> > Chat at lists.lrug.org
> > http://lists.lrug.org/listinfo.cgi/chat-lrug.org
> >
>
> _______________________________________________
> Chat mailing list
> Chat at lists.lrug.org
> http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lrug.org/pipermail/chat-lrug.org/attachments/20080507/6cad59dd/attachment.html>
More information about the Chat
mailing list