It's a question of ease of programming and deployment.
Say you have services A and B. When you run your application/solution as a whole, A is under heavy load and B is under light load.
To prevent A becoming a bottleneck you want to add more CPU resources to A, but not to B, where they would just go unused.
If you are using powerful multicore boxes you might as well put both A and B on all the boxes, and if you are good at multiprocessor programming, even have them in the same application. The OS will divide the available processing power according to the needs of each service. Some B's will never get any load, but it won't matter, as the overhead of running an idle B is insignificant.
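As a toy illustration of the "both in one application" setup (the worker/queue structure here is entirely hypothetical, not a real framework), here are A and B as worker threads in a single process. An idle B just blocks on its empty queue, so it costs next to nothing while the OS gives the CPU to A:

```python
import threading
import queue

def make_worker(name, jobs, results):
    """Build a worker loop for one service; exits on a None sentinel."""
    def loop():
        while True:
            job = jobs.get()       # an idle service just blocks here cheaply
            if job is None:
                return
            results.append((name, job))
    return loop

a_jobs, b_jobs = queue.Queue(), queue.Queue()
results = []

workers = [
    threading.Thread(target=make_worker("A", a_jobs, results)),
    threading.Thread(target=make_worker("B", b_jobs, results)),
]
for w in workers:
    w.start()

# A is under heavy load; B gets no traffic at all and just waits.
for i in range(5):
    a_jobs.put(i)

# Shut both workers down cleanly.
for q in (a_jobs, b_jobs):
    q.put(None)
for w in workers:
    w.join()

print(len(results))  # 5 -- all jobs handled by A, B stayed idle
```

The point is only that B's presence in the process is nearly free when it has no work; the scheduler hands the cores to whichever service actually has jobs.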
Your deployment is pretty simple because you just put all your microservices on every box and scale the number of boxes. If load unexpectedly increases on B, you can absorb it with the slack in your A capacity without having to worry.
If however you are using tiny containers that can barely handle a single A, you might want to consider the overhead of running a B that is not going to get any traffic.
Or maybe each instance has only a single processor, which will have to switch between working on A and B, causing delays.
In this case you might find it better to separate A and B into different programs. Then you can have boxes dedicated to A or B and scale them independently.
Your deployment is more complex, but your code is simplified, and it's arguably a more efficient use of resources.
If load on B increases or load on A decreases, you have to worry about adjusting the scaling level of each pool, rather than just having one big pool.
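The trade-off can be sketched with some back-of-envelope arithmetic (all the numbers here are made up for illustration):

```python
import math

# Load expressed in "boxes worth" of work -- purely hypothetical figures.
load_a, load_b = 9.0, 2.0
headroom = 1.25  # provision 25% above expected load

# One big pool: every box runs both A and B, so load shifts between
# the two are absorbed and you provision the combined total once.
pooled = math.ceil((load_a + load_b) * headroom)

# Dedicated pools: each service needs its own headroom, and slack in
# one pool can't help the other.
dedicated = math.ceil(load_a * headroom) + math.ceil(load_b * headroom)

print(pooled, dedicated)  # 14 vs 15
```

The gap is small here, but the dedicated pools also need their scaling levels watched separately, which is the extra operational worry.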
I’ve seen both done and I don’t think there is a huge difference either way, tbh. I don’t think I would go as far as combining the two services in a single application though, unless you were querying the live job being processed or something.
A central database T adds an extra wrinkle, as you obviously want to avoid it becoming a bottleneck in itself. But I think this is a common solution.