[LRUG] Testing PDFs

Roland Swingler roland.swingler at gmail.com
Tue Aug 1 15:31:52 PDT 2017


I don't know if any of the tools already mentioned do something like this
internally, but if not, you could investigate perceptual hashing. I think
pHash is a well-known one: http://www.phash.org/

Something like this probably isn't really the right fit, because perceptual
hashes are designed to be robust against transformations such as rotation,
which are exactly the kind of differences you probably care about in this
context. It would also probably be a lot of work to implement in a Ruby
test suite.

However, throwing it out there because you may find it an interesting
approach/something to google more about - even if for its own sake and it
turns out to be useless for your problem.
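
For what it's worth, the basic idea can be sketched in a few lines of Ruby.
This is only a rough illustration, and it is an "average hash", a much
simpler relative of pHash rather than pHash itself. The method names and the
input format (a 2D array of grayscale values, which you would have to produce
yourself by rasterising a page) are assumptions made up for the example.

```ruby
# Average hash (aHash): a simple perceptual hash, NOT the pHash algorithm.
# Input (an assumption for this sketch): a 2D array of grayscale values
# 0..255, for example one rasterised PDF page. Rasterising is not shown.

HASH_SIZE = 8 # 8x8 grid, giving a 64-bit fingerprint

# Shrink the image to size x size by averaging rectangular blocks of pixels.
def downsample(pixels, size = HASH_SIZE)
  height = pixels.length
  width  = pixels.first.length
  raise ArgumentError, "image smaller than #{size}x#{size}" if height < size || width < size

  Array.new(size) do |gy|
    rows = pixels[(gy * height / size)...((gy + 1) * height / size)]
    Array.new(size) do |gx|
      cells = rows.flat_map { |row| row[(gx * width / size)...((gx + 1) * width / size)] }
      cells.sum.to_f / cells.length
    end
  end
end

# One bit per grid cell: set when the cell is brighter than the overall mean.
def average_hash(pixels)
  cells = downsample(pixels).flatten
  mean  = cells.sum / cells.length
  cells.each_with_index.sum { |value, i| value > mean ? (1 << i) : 0 }
end

# Number of differing bits between two hashes; small means "looks similar".
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Usage: a page that is dark on the left and light on the right.
base     = Array.new(16) { Array.new(8, 0) + Array.new(8, 255) }
tweaked  = base.map(&:dup).tap { |img| img[0][0] = 10 } # tiny local change
inverted = base.map { |row| row.map { |v| 255 - v } }   # completely different

puts hamming_distance(average_hash(base), average_hash(tweaked))
puts hamming_distance(average_hash(base), average_hash(inverted))
```

The tiny change leaves the hash untouched while the inverted page flips every
bit, which shows both the appeal and the catch: small rendering regressions
can slip through unnoticed.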

R


On Tue, Aug 1, 2017 at 8:39 PM, Josh McMillan <josh at joshmcmillan.co.uk>
wrote:

> I'm currently writing a bunch of smoke tests that involve checking the
> validity of machine-generated PDFs. We use a multi-layer approach depending
> on how fast we want the suite to run:
>
>    - Basic tests for the content in the document – checking the right
>    text boxes etc are rendered in the right place. This is done with Prawn's
>    pdf-inspector package: https://github.com/prawnpdf/pdf-inspector
>    - Pixel-by-pixel comparison tests using ImageMagick (the `convert`
>    tool can handle PDFs as if they were images) with a level of tolerance:
>    https://www.imagemagick.org/script/compare.php
>
> In the event that there's a major difference between two PDFs as flagged
> by ImageMagick, we output a load of visual diffs (which can be done via
> `compare -verbose -metric RMSE -highlight-color <color> <actual> <expected>
> <diff>`, see the above link) for validation by a human.
>
> The validity of these PDFs is "mission critical" though (they get printed
> and sent to customers as a physical product that they've paid money for) so
> this is probably overkill for most scenarios.
>
> On Tue, Aug 1, 2017 at 8:29 PM, Mark Burns <markthedeveloper at gmail.com>
> wrote:
>
>> Yeah that's the kind of thing I was thinking.
>>
>> I guess I may have been a bit too hopeful. No magical silver bullet
>> shortcuts.
>>
>> Just about getting as close as possible to automating the actual
>> eyeballing of the doc.
>>
>> diff-pdf sounded promising then:
>>
>> ```
>> $ diff-pdf book-1.pdf book-2.pdf
>> $ diff-pdf book-1.pdf book-2.pdf --verbose
>> page 1 differs
>> page 4 differs
>> ```
>>
>> Much better than nothing though :)
>>
>> On Tue, Aug 1, 2017 at 8:21 PM Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>>> A visual diff sounds most reasonable. Never used it myself, but
>>> https://github.com/vslavik/diff-pdf is worth a try. And guess what?
>>> `brew install diff-pdf`
>>>
>>> On Tue, Aug 1, 2017 at 8:00 PM, Mark Burns <markthedeveloper at gmail.com>
>>> wrote:
>>>
>>>> Has anyone any recommendations or suggestions for testing PDF
>>>> generation?
>>>>
>>>> I'm working on a side project and using Prawn. Which is great. I can
>>>> programmatically generate large aspects of the content I want.
>>>>
>>>> But so far I've been tweaking then looking at the result in the browser.
>>>> It's not an absolute nightmare - a few seconds to render. But it's hard
>>>> to know whether the result is working without actually looking at it.
>>>>
>>>> The DSL is nice, but very imperative. Mocking method calls out would be
>>>> insane.
>>>>
>>>> I'm managing to refactor into small objects to represent the components
>>>> and layout, pages, typography aspects etc of the document. Which brings the
>>>> complexity back down to manageable chunks.
>>>>
>>>> But ultimately everything just calls underlying prawn DSL methods. So I
>>>> can test little bits of logic that I have in my objects, but ultimately
>>>> whether it works or not comes down to "have a look and see".
>>>>
>>>> Perhaps the best I can hope for is screenshotting when I'm happy and
>>>> using approvals to verify each major change hasn't radically borked
>>>> everything.
>>>>
>>>> It seems like there are tools to test which strings get into the
>>>> document, but that seems like the easiest part. And probably the only part
>>>> I'd be happy with test doubles for prawn and setting expectations on the
>>>> text generating methods.
>>>>
>>>> _______________________________________________
>>>> Chat mailing list
>>>> Chat at lists.lrug.org
>>>> Archives: http://lists.lrug.org/pipermail/chat-lrug.org
>>>> Manage your subscription: http://lists.lrug.org/options.cgi/chat-lrug.org
>>>> List info: http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>>>>