I need a design for multiple clients to process an entire database table, and processing each row independently and without overlap.
If there are 10,000 rows and 5 clients, each client will eventually process approximately 2,000 rows each. That is, the idea is to process all of the rows, using 1 to N workers, by having the workers each do only part of the work.
- Each client will temporarily reserve a small set of rows, eventually marking one row in the working set as completed, and then release them all back to the pool of available rows. Rows are reserved in sets for performance reasons, but processed one row at a time. Processing one row is resource intensive and consumes significant clock time.
- Rows in the table must be processed approximately in decreasing row number order (it’s the primary key). I say approximately because if it makes things simpler, a bit of “randomness” or temporary gaps is OK.
- Every row must eventually be selected, and completed by one of the clients.
Additional details, if they matter to proposed designs:
- Client requests arrive via HTTP.
- There is private session data per client which can preserve client state.
- Database is MySQL.
- There is shared memory caching available, e.g. memcached and Redis.
This feels like a relatively common problem, but I wasn’t able to find any applicable writing in my web searches.
I’m looking for suggestions.