Oh, hey - that's my post. I have a blazingly fast re-implementation of these algorithms in Rust, which will soon make an appearance as a set of command-line tools. This will be a companion to the interactive in-browser implementation at https://binvis.io.
If this sort of thing floats your boat, please get in touch. The code is not public yet because it's not ready yet, but I'd welcome like-minded collaborators.
Space-filling curves have some super important applications for indexing and information retrieval. I recently stumbled on a fascinating library called Uzaygezen for multidimensional Hilbert space-filling curves. Extremely high quality code and documentation:
What a nice way to go from 1D to 2D. I wrote my MSc thesis on data visualization, and the thing I found useful was excess entropy: how much better you can predict the next bit if you take one more bit into the sliding window you use for the prediction. That usually depends heavily on the sliding window size. Imagine what happens with text written in 8-bit characters. With that trick one could make a 3D visualization.
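One possible reading of that measure, sketched in Python: estimate how uncertain the next byte is given the previous k bytes, then see how much that uncertainty drops when the window grows by one. This is an illustration under assumptions, not necessarily the definition used in the thesis. It works on bytes rather than bits, and the function names are made up.

```python
from collections import Counter
from math import log2


def block_entropy(data: bytes, k: int) -> float:
    """Shannon entropy (in bits) of the length-k sliding windows of data."""
    if k == 0:
        return 0.0
    n = len(data) - k + 1
    if n <= 0:
        raise ValueError("data is shorter than the window")
    counts = Counter(data[i:i + k] for i in range(n))
    return -sum(c / n * log2(c / n) for c in counts.values())


def conditional_entropy(data: bytes, k: int) -> float:
    """Estimated uncertainty of the next byte given the previous k bytes."""
    return block_entropy(data, k + 1) - block_entropy(data, k)


def gain_from_longer_window(data: bytes, k: int) -> float:
    """Bits of uncertainty removed by growing the context from k to k+1 bytes."""
    return conditional_entropy(data, k) - conditional_entropy(data, k + 1)


if __name__ == "__main__":
    sample = b"ab" * 512  # strictly alternating, so one byte of context is enough
    for k in range(3):
        print(k, conditional_entropy(sample, k), gain_from_longer_window(sample, k))
```

With short inputs and large windows the estimates become unreliable, because most windows are seen only once.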
I love this! Years ago I toyed with the idea of trying to map arbitrary files to some sort of 2D "hash" thumbnail image, as people are visually oriented and remember things visually. I wanted some sort of continuity, so that small changes in the file show up immediately in the image. This seems to solve those problems!
That's quite possible! Though after staring at thousands of these across many application domains, I do think the space-filling curves perform legitimately better.
Another factor is that the large discontinuities in the zig-zag curve mean that a contiguous area in the data is not always contiguous in the visualisation, which makes things like the region selection I do for https://binvis.io impossible.
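To make the discontinuity point concrete, here is a minimal Python sketch (not the binvis.io code) comparing a row-by-row scan with a Hilbert curve on a small grid. It assumes "zig-zag" means a plain raster scan that jumps back to the start of each row, and all function names are made up for illustration.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map offset d to (x, y) on a Hilbert curve over a side x side grid.

    side must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant built so far.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def raster_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map offset d to (x, y) by filling the grid row by row."""
    return d % side, d // side


def max_step(layout, side: int) -> int:
    """Largest Manhattan distance between two consecutive offsets."""
    pts = [layout(side, d) for d in range(side * side)]
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def bounding_box(layout, side: int, start: int, length: int) -> tuple[int, int]:
    """Width and height of the area covered by a contiguous run of offsets."""
    pts = [layout(side, d) for d in range(start, start + length)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


if __name__ == "__main__":
    side = 16
    for name, layout in (("raster", raster_d2xy), ("hilbert", hilbert_d2xy)):
        print(name, max_step(layout, side), bounding_box(layout, side, 32, 16))
```

On the Hilbert layout every consecutive pair of offsets lands on neighbouring pixels and an aligned run of 16 offsets stays inside a 4x4 block, which is what makes selecting a region of the image map back to a compact range of the file.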
That is, given an image generated this way, can you get the original binary back?
If so, it could be useful in glitching audio in interesting ways, using image editing tools.
It would also be interesting to hear how particular executable binaries sound when converted to audio from this representation. Would differences between types of binaries be distinguishable by human audio pattern recognition?
Also useful for this would be maximally permissive image representation requirements for the trip back to binary. Of course, this would be difficult for binaries meant to be executed as code, as arbitrary binaries are unlikely to be executable, but for transformation of images to audio it should be much simpler to ensure that any image makes a valid, playable audio file.
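On the round-trip question: if each pixel stores one byte value exactly (an assumption; colour schemes that bucket bytes into classes throw information away), the layout is just a reordering of offsets and can be inverted. A minimal Python sketch with made-up function names, using a nested list instead of a real image format:

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map offset d to (x, y) on a Hilbert curve; side must be a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def to_grid(data: bytes, side: int) -> list[list[int]]:
    """Lay bytes out along the curve, one byte per pixel; unused pixels stay 0."""
    if len(data) > side * side:
        raise ValueError("data does not fit in the grid")
    grid = [[0] * side for _ in range(side)]
    for d, value in enumerate(data):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


def from_grid(grid: list[list[int]], length: int) -> bytes:
    """Read the first `length` bytes back by walking the same curve."""
    side = len(grid)
    out = bytearray(length)
    for d in range(length):
        x, y = hilbert_d2xy(side, d)
        out[d] = grid[y][x]
    return bytes(out)


if __name__ == "__main__":
    original = bytes(range(200))
    grid = to_grid(original, 16)
    print(from_grid(grid, len(original)) == original)
```

The original length has to be carried alongside the image, because padding pixels are indistinguishable from zero bytes. Headerless raw PCM audio is the easy case for glitching, since there is no header for an edit to corrupt.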
How does this help you compare binaries? I am trying to understand what questions the visual helps you answer that you don’t get from summary statistics like entropy. Also, how does this work for binaries with very different sizes? This looks neat and I have seen similar concepts applied to other data sets: https://xkcd.com/195/
Battelle's CantorDust combines the visualization concept with a convenient UX for selecting blocks of code graphically and zooming in on the corresponding hex, or vice-versa. The "devil is in the details" with respect to the UX for these kinds of tools. The visualization or 2D image by itself is somewhat less useful without being able to snap to the corresponding part of the hex or IDA/Ghidra disassembly.
I do think adding a 3rd dimension to the visualization probably adds somewhat more utility as well. The recently released open source package for CantorDust seems to omit the 3D visualizations which were shown in the demo linked here.