design – How to architect big file downloads?

I believe you are overthinking this. The slowest link in the chain, whether it’s disk or network, is going to be the only real limiting factor. You have a choice: delegate the downloads to another service, or serve them yourself. If you serve them yourself, stage the files on SSDs where possible, since their throughput typically exceeds your network bandwidth. Otherwise, have another service host the files and simply redirect to the file location.

Offload the problem

For example, AWS S3 buckets can serve large files to millions of concurrent users without breaking a sweat. If you let S3 serve your downloads, you don’t have to do anything fancy on your own servers.
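As a rough illustration of the redirect approach, here is a minimal sketch using only the JDK's built-in HTTP server. The class name `DownloadRedirect`, the `/download/` path, and the storage base URL passed in are all hypothetical; in a real setup the target would be wherever your files are actually hosted, and for non-public files probably a time-limited signed URL rather than a plain one.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

// Hypothetical sketch: the application never touches the file bytes,
// it only tells the client where to fetch them from.
final class DownloadRedirect {

    private static final String PREFIX = "/download/";

    // Starts a small HTTP server that answers /download/<name> with a
    // 302 redirect to storageBaseUrl + <name>. Port 0 picks a free port.
    static HttpServer start(int port, String storageBaseUrl) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(PREFIX, exchange -> {
            // Use the raw (still percent-encoded) path so the Location stays a valid URL.
            String name = exchange.getRequestURI().getRawPath().substring(PREFIX.length());
            exchange.getResponseHeaders().set("Location", storageBaseUrl + name);
            // -1 means "no response body": the redirect is only headers.
            exchange.sendResponseHeaders(302, -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Calling `DownloadRedirect.start(8080, "https://files.example.com/")` would send a request for `/download/big.iso` on to `https://files.example.com/big.iso`, so the heavy transfer happens between the client and the storage service.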

Do it yourself

The main thing you have to do is stream the data. The braindead examples in most tutorials have you read the entire file into memory and then copy it to the output. For a 500 MB file, that requires at least 500 MB of free RAM for each concurrent request.

By streaming the data, you only need a couple of megabytes of buffer for each concurrent request. In Spring, you would return a StreamingResponseBody to send the data back. Here is another article to help you out.

The concept is to read a small chunk of bytes and write it to the response, then read the next chunk and write that, and so on until the file is done. With a small buffer, you can easily max out the Tomcat server hosting your Spring application with roughly 1 GB of memory usage. That makes your service a whole lot more economical to host, and scaling is also easy.
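The read-a-chunk, write-a-chunk loop can be sketched in plain Java as follows. The class name `StreamCopy` and the 64 KB buffer size are assumptions chosen for illustration, not tuned values. In a Spring application, a loop like this could form the body of a `StreamingResponseBody`, whose `writeTo(OutputStream)` method receives the response stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream in fixed-size chunks so that memory
// use per request is bounded by the buffer, not by the size of the file.
final class StreamCopy {

    // Assumed buffer size; anything from a few KB to a few MB follows the same idea.
    static final int BUFFER_SIZE = 64 * 1024;

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; otherwise the number of bytes filled.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

On Java 9 and later, `InputStream.transferTo(OutputStream)` does the same job in one call; the explicit loop is shown only to make the buffering visible.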