I use deep learning in my research, and I added a second 1080 Ti to my computer because a single card did not let me finish my experiments on time. TensorFlow uses almost 100% of the GPU and some CPU.
Problem: If I run TensorFlow on both GPUs, the system shuts down after about 30 seconds and will not power back on. I have to remove the first GPU to get it to turn on again (afterwards I can put the first GPU back in).
- 1x be quiet! Dark Power Pro 11 750W ATX 2.4 (BN252)
- 1x ASUS Prime X370-Pro (90MB0TD0-M0EAY0)
- 1x AMD Ryzen 5 1600 (TDP: 65W), 6x 3.20GHz, boxed (YD1600BBAEBOX)
- 2x MSI GeForce GTX 1080 Ti (250W) Gaming X 11G, 11GB GDDR5X, DVI, 2x HDMI, 2x DP
- 1x Samsung SSD 850 EVO 250GB, SATA (MZ-75E250B)
- 3x Seagate IronWolf NAS HDD 10TB, SATA 6Gb/s (ST10000VN0004)
- 1x G.Skill Aegis 16GB DIMM Kit, DDR4-3000, CL16-18-18-38 (F4-3000C16D-16GISB)
I am using PCIEX16_1 and PCIEX16_2 for GPUs.
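For reference, here is a rough tally of the rated figures from the parts list against the 750W power supply. It is only a sketch: the GPU and CPU numbers are the rated values listed above, while the allowance for drives, RAM, board and fans (`OTHER_WATTS_ASSUMED`) is my own guess and not a measurement. Rated figures also say nothing about short load spikes, which may be higher.

```python
# Rated figures from the parts list; anything marked "assumed" is a guess.
PSU_WATTS = 750
GPU_WATTS = 250        # per card, as listed
GPU_COUNT = 2
CPU_TDP_WATTS = 65     # as listed

# Assumed allowance for drives, RAM, mainboard and fans (not measured).
OTHER_WATTS_ASSUMED = 75


def rated_load(other_watts=OTHER_WATTS_ASSUMED):
    """Sum of the rated GPU and CPU power plus an allowance for the rest."""
    return GPU_WATTS * GPU_COUNT + CPU_TDP_WATTS + other_watts


def headroom(other_watts=OTHER_WATTS_ASSUMED):
    """Power supply rating minus the estimated rated load."""
    return PSU_WATTS - rated_load(other_watts)


if __name__ == "__main__":
    print(f"estimated rated load: {rated_load()} W")
    print(f"headroom on the PSU:  {headroom()} W")
```

With the assumed 75W allowance this gives 640W of rated load and 110W of headroom, so on paper the supply is close to its limit but not over it.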
What I have tried so far:
- Ran TensorFlow on each GPU individually (100% GPU utilization) -> OK for both GPUs
- Checked the temperature of both GPUs while running on both in parallel -> OK, max temp < 80 °C
- Checked that nothing is overclocked -> OK
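To see what the cards are doing in the seconds before the shutdown, something like the following logger could be left running next to the training job. It is a minimal sketch that assumes `nvidia-smi` is on the PATH; the log file name `gpu_log.txt` and the one-second interval are arbitrary choices of mine. Each sample is flushed and synced to disk so that the last lines have a chance of surviving a sudden power-off.

```python
import os
import subprocess
import time

# Fields requested from nvidia-smi, in this order.
QUERY = "index,temperature.gpu,power.draw,utilization.gpu"
LOG_PATH = "gpu_log.txt"  # hypothetical file name


def to_float(text):
    """Return a float, or None for values such as '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_line(line):
    """Parse one CSV line of nvidia-smi output into a dict."""
    idx, temp, power, util = [field.strip() for field in line.split(",")]
    return {
        "index": int(idx),
        "temp_c": to_float(temp),
        "power_w": to_float(power),
        "util_pct": to_float(util),
    }


def sample():
    """Query all GPUs once and return a list of parsed readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    with open(LOG_PATH, "a") as log:
        while True:
            stamp = time.strftime("%H:%M:%S")
            for gpu in sample():
                log.write(f"{stamp} {gpu}\n")
            log.flush()
            os.fsync(log.fileno())  # push the data to disk before the next sample
            time.sleep(1)
```

After the next shutdown, the last entries in the log would show the reported temperature, power draw and utilization of each card shortly before the system went down.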
Can someone guide me through the next steps to understand the problem?
Thank you all for your help.