I am looking for a Windows command-line utility that can compare two files starting from different offsets and report the offsets of the first difference found. For instance, suppose file A has a size of 500MB and file B has a size of 100MB. It has been determined that the beginning of file B matches file A at offset 104857600, but also (based on a checksum comparison between file B and a 100MB block of file A starting from 104857600) that file B is not entirely contained within file A. So now I need something that does a byte-by-byte comparison between file A starting from offset 104857600 and file B starting from offset 0, then reports the offset in each file of the first mismatched byte.
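To make the requirement concrete, here is a minimal Python sketch of the comparison described above. It is only an illustration of the expected behaviour, not an existing tool: the function name `first_mismatch`, its parameters, and the 1 MiB chunk size are all assumptions made for this example.

```python
def first_mismatch(path_a, offset_a, path_b, offset_b=0, chunk_size=1 << 20):
    """Compare two files byte by byte, each starting from its own offset.

    Returns (position_in_a, position_in_b) for the first differing byte,
    or None if no difference is found before either file ends.
    All names here are hypothetical, chosen for this sketch only.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        fa.seek(offset_a)
        fb.seek(offset_b)
        compared = 0  # number of bytes already found identical
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            n = min(len(block_a), len(block_b))
            if n == 0:
                return None  # at least one file is exhausted
            if block_a[:n] != block_b[:n]:
                # Locate the exact byte inside the differing chunk.
                for i in range(n):
                    if block_a[i] != block_b[i]:
                        return (offset_a + compared + i,
                                offset_b + compared + i)
            compared += n
            if len(block_a) != len(block_b):
                return None  # one file ended before the other
```

With the example above, the call would be `first_mismatch("A", 104857600, "B", 0)`, and the returned pair would give the absolute offset of the first mismatched byte in each file. Reading in chunks keeps memory use constant, which matters for files of several hundred megabytes.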
Apparently, the native Windows CLI tools comp and fc do not allow setting start offsets for the comparison (comp doesn’t even allow comparing files of different sizes). Based on this thread, I tested diffutils, which doesn’t seem to meet those requirements either. I know that this can be done with a hex editor like WinHex, or with dedicated compare/merge GUI utilities like WinMerge, but here a command-line utility is required so that hundreds of files can be processed at once with a script.
Goal: I performed a complete data recovery from a 4TB HDD, both in filesystem analysis mode and in so-called “raw file carving” mode (through file signature search); most of the files recovered through the second method are actually duplicates or fragments of files which could be fully recovered through the first method. Full duplicates are easy to identify, as there are many dedicated tools for that purpose. I had a harder time identifying file fragments which were entirely contained within another file; I managed to do that with WinHex’s “simultaneous search” and a PowerShell script to calculate checksums. Now I’m left with files for which a match could be found but which are not perfect matches, meaning that they contain parts of different original files (most likely because the source drive was fragmented). In the end, the only files remaining should be fragments which do not have any counterpart within the main recovery directory.
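For reference, the containment check mentioned above (checksum of a fragment versus checksum of an equally sized block of the candidate file) can be sketched in Python as follows. This is not the PowerShell script that was actually used; the helper names `sha256_of_range` and `is_contained_at`, and the choice of SHA-256, are assumptions made for illustration.

```python
import hashlib
import os

def sha256_of_range(path, offset, length, chunk_size=1 << 20):
    """Hash `length` bytes of `path` starting at `offset` (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            block = f.read(min(chunk_size, remaining))
            if not block:
                break  # file ended earlier than requested
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()

def is_contained_at(path_big, offset, path_fragment):
    """True if the whole fragment matches path_big starting at `offset`."""
    size = os.path.getsize(path_fragment)
    if offset + size > os.path.getsize(path_big):
        return False  # the fragment would run past the end of the big file
    return (sha256_of_range(path_big, offset, size)
            == sha256_of_range(path_fragment, 0, size))
```

A fragment for which `is_contained_at` returns False even though its beginning matches is exactly the kind of file that then needs the byte-by-byte comparison to find where the match stops.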