Apache 2.4 – Zip-based file (e.g. docx) is corrupt after downloading through reverse proxy


I’m building a reverse proxy with Apache so that people can use a certain service they otherwise couldn’t reach. The website itself works fine, but when people download a docx file, they get a corrupted zip file. For a minimal docx file (a document containing only the word “foobar”), unzip on macOS reports the following:

Archive:  foobar.docx
warning (foobar.docx):  16167 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error (foobar.docx):  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

I have no idea what’s causing this. The file still looks like a docx/zip file when opened in a hex editor. Word complains as well, but its error message is not helpful. If I download the file directly (i.e. bypassing the reverse proxy), I can both unzip it and open it in Word without problems.
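
To narrow down where the corruption starts, the two copies can be compared byte by byte. A minimal sketch, assuming the direct download is saved as orig.docx and the proxied one as foobar.docx; the function name compare_downloads is made up for illustration:

```shell
#!/bin/sh
# Compare a directly downloaded file with the copy fetched through the proxy.
# compare_downloads is a hypothetical helper name, not an existing tool.
compare_downloads() {
    direct=$1
    proxied=$2

    # Sizes in bytes; a difference hints at bytes being added or dropped.
    echo "direct:  $(wc -c < "$direct" | tr -d ' ') bytes"
    echo "proxied: $(wc -c < "$proxied" | tr -d ' ') bytes"

    # cmp -s is silent and exits 0 only when the files are identical.
    if cmp -s "$direct" "$proxied"; then
        echo "identical"
    else
        # Without -s, cmp reports the offset of the first differing byte.
        cmp "$direct" "$proxied"
        return 1
    fi
}
```

It would be called as compare_downloads orig.docx foobar.docx. The reported offset shows whether the damage begins in the local file headers or only near the end of the archive, where the central directory lives.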

Also, file agrees it looks like a zip archive:

$ file foobar.docx 
foobar.docx: Zip archive data, at least v2.0 to extract

…however, since it’s a docx, the output should actually look like this:

$ file orig.docx 
orig.docx: Microsoft Word 2007+

Here’s the apache reverse proxy config that I’m currently using:

<VirtualHost *:80>
        # that's the outside url of the load balancer in front of us
        ServerName https://outside.url.com
        UseCanonicalName on

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

        # enable SSL to talk to the remote server
        SSLProxyEngine on
        # disable forward proxy
        ProxyRequests off

        # proxy all /*.someurl.com requests to their https://*.someurl.com
        ProxyPassMatch "/(.*.someurl.com.*)" "https://$1"

        # proxy all other requests to https://www.someurl.com
        ProxyPass "/" "https://www.someurl.com/"
        ProxyPassReverse "/" "https://www.someurl.com/"

        ProxyHTMLEnable on
#       ProxyHTMLExtended on

        SetOutputFilter INFLATE;proxy-html;DEFLATE

        # turn all subdomains in the html/css/js code into relative paths instead
        # e.g. https://static.someurl.com becomes /static.someurl.com
        ProxyHTMLURLMap "http(s?)://(.*.someurl.com)" "/$2" Rni

        AddOutputFilterByType SUBSTITUTE text/html text/css application/x-javascript application/javascript application/json
        Substitute "s|http(s?)://(.*.someurl.com)|/$2|i"

</VirtualHost>

Before someone points at the load balancer mentioned in comments above – the issue persists if I circumvent the load balancer.

Some notes:

  • I was able to confirm that only zip-based files (such as docx and xlsx) are affected; doc, xls and pdf files are fine.
  • Also, removing DEFLATE doesn’t change anything.

EDIT:

  • Adding RequestHeader unset Accept-Encoding and removing INFLATE also doesn’t change anything.
  • I’m wondering whether the issue is somehow that the file is transferred ‘chunked’, but even when setting SetEnv proxy-sendcl it’s still sent ‘chunked’.
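
For anyone who wants to check the chunked and encoding behaviour themselves, the response headers with and without the proxy can be compared. A rough sketch using curl; the helper name transfer_headers and the URLs in the comments are placeholders, not the real ones:

```shell
#!/bin/sh
# Read raw HTTP response headers on stdin and keep only the ones that
# describe how the body is typed, encoded and framed.
# transfer_headers is a hypothetical helper name.
transfer_headers() {
    grep -iE '^(content-type|content-encoding|content-length|transfer-encoding):' \
        | tr -d '\r'
}

# Example calls (placeholder URLs, adjust to the real paths):
#   curl -s -D - -o /dev/null "http://proxy.example/files/foobar.docx" | transfer_headers
#   curl -s -D - -o /dev/null "https://www.someurl.com/files/foobar.docx" | transfer_headers
```

Here -D - dumps the response headers to stdout and -o /dev/null discards the body. A difference in Content-Type, Content-Encoding or Transfer-Encoding between the two calls would show which output filter is touching the download.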