Are there resources / best practices for running a highly available / high speed Lightning node?
On the Lightning Network, downtime is riskier than for most services. If your node stays offline for long, or its channel database is lost or corrupted (power outage, disk or server failure), a channel counterparty can broadcast an outdated channel state and you may permanently lose funds. If you cannot provide highly reliable servers yourself, run the node in a managed environment such as a cloud provider or a colocation facility. This matters even more in your case because you plan to run several nodes that route payments, so any server malfunction can translate directly into monetary loss.
To achieve high throughput (that is, to route more payments), you can use the autopilot feature to help decide which nodes to open channels with and how much liquidity to commit to each channel. The main things to aim for are increasing the centrality of your node in the channel graph, increasing the probability that routes through your node are well funded, and improving your presence in your own geographic region.
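To make "increasing the centrality of your node" more concrete, here is a minimal sketch that ranks candidate peers by how much a new channel to each would shorten the average hop count from your node to the rest of the graph. The toy graph, the node names and the helper functions are all made up for illustration. This is not how any particular autopilot implementation scores peers, and it ignores channel capacity, fees and liquidity.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for peer in graph.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def mean_hops(graph, source):
    """Average hop count from source to the other reachable nodes."""
    dist = hop_distances(graph, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def with_channel(graph, a, b):
    """Return a copy of the graph with one extra channel between a and b."""
    new_graph = {node: set(peers) for node, peers in graph.items()}
    new_graph.setdefault(a, set()).add(b)
    new_graph.setdefault(b, set()).add(a)
    return new_graph


def rank_candidates(graph, me, candidates):
    """Sort candidates by the mean hop count we would have after connecting."""
    scored = [(mean_hops(with_channel(graph, me, c), me), c) for c in candidates]
    return sorted(scored)


# Hypothetical channel graph: "me" currently has a single channel to "a".
GRAPH = {
    "me": {"a"},
    "a": {"me", "b"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}

if __name__ == "__main__":
    print("current mean hops:", mean_hops(GRAPH, "me"))
    for score, peer in rank_candidates(GRAPH, "me", ["b", "c", "d"]):
        print(f"channel to {peer}: mean hops {score:.2f}")
```

In this toy graph the well-connected node "c" comes out on top, which matches the intuition that connecting to hubs raises your own centrality. A real decision would also weigh how well funded the resulting routes are.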
If my Lightning service needs to take off, do I need to run multiple Lightning nodes or a single Lightning node?
It depends on the service you want to provide. If you want to offer a custodial solution, where you hold funds on behalf of your users, a single Lightning node can be enough. However, running more nodes increases your routing capacity and helps prevent bottlenecks, which in turn means better service for your customers.
If you want users to be in charge of their funds, you have no choice but to use multiple Lightning nodes, one for each user.
Maybe each channel could be managed on a different machine?
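There is no need to run each node on a different machine. You can create a separate data directory for each node on the same machine and run one Lightning node instance per directory, giving each instance its own listening ports so they do not conflict.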
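As a rough sketch of the "separate directories on the same machine" setup, the snippet below builds one command line per node, each with its own data directory and its own ports. It assumes the lnd implementation (the answer above does not name one) and uses lnd's `--lnddir`, `--listen`, `--rpclisten` and `--restlisten` options. The `nodes/nodeN` layout and the port offsets are arbitrary choices for illustration, and the chain backend options a real node needs are left out.

```python
from pathlib import Path


def node_command(index, base_dir="nodes", base_p2p=9735, base_rpc=10009, base_rest=8080):
    """Build the command line for the index-th node on this machine.

    Each node gets its own data directory and its own P2P, gRPC and REST
    ports, offset from the base values so that instances do not collide.
    """
    node_dir = (Path(base_dir) / f"node{index}").as_posix()
    return [
        "lnd",
        f"--lnddir={node_dir}",
        f"--listen=127.0.0.1:{base_p2p + index}",
        f"--rpclisten=127.0.0.1:{base_rpc + index}",
        f"--restlisten=127.0.0.1:{base_rest + index}",
    ]


if __name__ == "__main__":
    # Print the commands instead of launching anything, so they can be reviewed first.
    for i in range(3):
        print(" ".join(node_command(i)))
```

The commands are only printed, not executed. To route payments from outside the machine, the P2P listen address would need to be reachable rather than bound to 127.0.0.1.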