(I originally posted my question on StackOverflow and figured this might be a better place to ask.)
I provide details on the libraries I'm working with because it helps to understand my issue, but this is really a general question about how to do a migration while still being able to use previously saved data, so it could apply to other projects as well.
We are currently using TensorFlow 1.15 for some MLP models and Stable Baselines (SB2) for our Reinforcement Learning models. I have been tasked with migrating the code to TensorFlow 2 and, because Stable Baselines uses TensorFlow 1.x as its backend and is not compatible with TensorFlow 2 (and won't be), with also migrating to Stable Baselines3 (SB3), which uses PyTorch as its backend.
My issue is the following: we want to keep using our old trained models, which are in production (both the MLP and RL ones), until they are no longer relevant, while at the same time starting to train new models on TensorFlow 2 and SB3. This means keeping TF1 around so we can run SB2 and our previously trained models, while also adding TF2 and SB3 support for future use.
Having both TF1 and TF2 installed at the same time is likely to cause issues (it doesn't seem to be possible in Python without messing with the installed packages).
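One pattern that may sidestep the "two TensorFlows in one environment" problem (a sketch, not a recommendation specific to SB2/SB3): keep the TF1/SB2 stack in its own virtualenv or container and talk to it from the main process over stdin/stdout, so the two stacks never share an interpreter or a site-packages. `LEGACY_PYTHON` and the inline worker body below are placeholder assumptions; a real worker script would import `stable_baselines` and load the saved model.

```python
import json
import subprocess
import sys

# Assumption: path to the python binary of a dedicated TF1/SB2 virtualenv.
# It falls back to the current interpreter here only so the sketch is runnable.
LEGACY_PYTHON = sys.executable

# Stand-in worker body; a real worker inside the legacy env would import
# stable_baselines, load the SB2 model, and print a JSON prediction.
WORKER = (
    "import json, sys; "
    "req = json.load(sys.stdin); "
    "print(json.dumps({'action': 0, 'model': req['model']}))"
)

def predict_with_legacy_model(model_path, observation):
    """Run one prediction in the legacy interpreter and parse its JSON reply."""
    request = json.dumps({"model": model_path, "obs": observation})
    result = subprocess.run(
        [LEGACY_PYTHON, "-c", WORKER],
        input=request, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Spawning a process per prediction is obviously too slow for continuous inference; a long-lived worker process (or a small HTTP endpoint inside the legacy container) with the same JSON protocol would be the realistic variant.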
What is the best way to do this?
- Should I migrate by developing the same functionality in the newer versions (TF2 and SB3) and offer the choice between the previous and the new implementations, with a deprecation warning for TF1 and SB2?
- Another solution might be to develop a new branch with TF2 and SB3 support only, then run both services at the same time (each handling the models of its corresponding version) until, at some point, I stop running the old one. It might be tricky to maintain both at the same time, though; any thoughts on that?
Thanks in advance for any help,
P.S. I don’t know if this is the right StackExchange forum for this kind of question so let me know if not 😉
EDIT: More details
For now, clients all access the same site, which is hosted on a single machine. They can train, define and use models. There are 3 main services, which are Docker images running as services on a Linux host. One service is in charge of training and evaluating models, one makes predictions continuously with said models, and the last one uses the results to control some other software/hardware. So for now they all rely on SB2 and TensorFlow 1.x models.