I’m building a reverse proxy with Apache so people can use a certain service that they otherwise couldn’t. The website as a whole works fine, but when people download a docx file, it arrives as a corrupted zip file. With a very minimal docx file (a document containing only the word “foobar”), unzip on macOS reports the following:
Archive: foobar.docx
warning (foobar.docx): 16167 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error (foobar.docx): start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
No idea what’s causing this. The file still looks like a docx/zip file when opened with a hex editor. Of course Word complains as well, but its error is useless. I can both unzip the file and open it in Word no problem if I download it directly (i.e. avoiding the reverse proxy).
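One way to narrow this down would be to compare the proxied download byte-for-byte against the direct one. The following is only a rough sketch: the helper names `first_difference` and `summarize` are made up for illustration, and it assumes both copies have been saved locally (for example as orig.docx and foobar.docx). It reports the two sizes, the offset of the first differing byte, and whether each copy still contains the zip "end of central directory" signature that unzip says it cannot find.

```python
import sys

# Marker that starts the zip "end of central directory" record.
EOCD_SIGNATURE = b"PK\x05\x06"


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def summarize(good: bytes, bad: bytes) -> dict:
    """Collect a few facts that hint at where and how the copy was damaged."""
    return {
        "good_size": len(good),
        "bad_size": len(bad),
        "first_diff": first_difference(good, bad),
        "good_has_eocd": good.rfind(EOCD_SIGNATURE) != -1,
        "bad_has_eocd": bad.rfind(EOCD_SIGNATURE) != -1,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python compare.py orig.docx foobar.docx
    with open(sys.argv[1], "rb") as f:
        good = f.read()
    with open(sys.argv[2], "rb") as f:
        bad = f.read()
    print(summarize(good, bad))
```

If the proxied copy is shorter and lacks the signature, that would point towards truncation; if the sizes match but the bytes differ from some offset onwards, that would point towards something rewriting the body in transit.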
file agrees it looks like a zip archive:
$ file foobar.docx
foobar.docx: Zip archive data, at least v2.0 to extract
…however, since it’s a docx, the output should actually look like this:
$ file orig.docx
orig.docx: Microsoft Word 2007+
Here’s the Apache reverse proxy config that I’m currently using:
Before someone points at the load balancer mentioned in comments above – the issue persists if I circumvent the load balancer.
- I was able to confirm that only zip-based files (such as docx and xlsx) are affected; doc, xls and pdf files are fine.
- Also, removing DEFLATE doesn’t change anything
- RequestHeader unset Accept-Encoding and removing INFLATE also doesn’t change anything
- I’m wondering whether the issue is somehow that the file is transferred ‘chunked’, but even when setting SetEnv proxy-sendcl it’s still sent ‘chunked’
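To check the experiments in the list above more systematically, the response headers of the direct and the proxied download could be compared side by side. This is a minimal sketch using only the Python standard library; the function names are invented for illustration and the URLs in the usage comment are placeholders, not the real service. It only looks at the headers that describe how the body is framed or encoded.

```python
from urllib.request import urlopen

# Headers that describe how the response body is framed or encoded.
INTERESTING = ("content-length", "content-encoding",
               "transfer-encoding", "content-type")


def pick_headers(headers) -> dict:
    """Keep only the interesting headers, with lower-cased names."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered.get(name) for name in INTERESTING}


def diff_headers(direct: dict, proxied: dict) -> dict:
    """Return {header: (direct_value, proxied_value)} for headers that differ."""
    return {
        name: (direct.get(name), proxied.get(name))
        for name in INTERESTING
        if direct.get(name) != proxied.get(name)
    }


def fetch_headers(url: str) -> dict:
    """Request the URL and return the interesting response headers."""
    with urlopen(url) as resp:
        return pick_headers(resp.headers)


# Usage with placeholder URLs (assumptions, replace with the real ones):
# direct = fetch_headers("http://backend.example/foobar.docx")
# proxied = fetch_headers("http://proxy.example/foobar.docx")
# print(diff_headers(direct, proxied))
```

A difference in Content-Length or Content-Encoding between the two responses would be a hint at which module is touching the body; chunked transfer on its own should be decoded transparently by the client.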