"In coverage.py, I have a class for computing the fingerprint of a data structure. It's used to avoid doing duplicate work when re-processing the same data won't add to the outcome. It's designed to work for nested data, and to canonicalize things like set ordering. The slightly simplified code looks like this: To test this, I had some basic tests like: The last line in the update() method adds a dot to the running hash. That was to solve a problem covered by this test:"
"The Hypothesis library would let me generate data, and I could check that the desired properties of the hash held true. It took me a while to get the Hypothesis strategies wired up correctly. I ended up with this, but there might be a simpler way: This doesn't make completely arbitrary nested Python data: sets are forced to have elements all of the same type or I wouldn't be able to sort them."
In summary: coverage.py's Hasher class computes a fingerprint for nested Python data structures so that duplicate work can be skipped when re-processing the same data would not change the outcome. It canonicalizes unordered containers by sorting set elements and dict keys, so equal values yield identical hashes regardless of iteration order. Basic tests exercise its behavior, and the update() method appends a dot after each item so that adjacent items cannot run together and collide. Property-based testing with Hypothesis generates nested data, constraining sets to a single element type (so they can be sorted) and dict keys to strings. The Hypothesis checks confirm that hashing the same data repeatedly is stable, though the generated data stops short of completely arbitrary nested Python structures.
Read at Nedbatchelder