Server constantly crashing under load: are my SSDs failing?

My system is locking up constantly under load, and this pops up in the journal during each reboot:

May 02 21:24:34 fred smartd(1590): Device: /dev/sdb (SAT), SMART Prefailure Attribute: 3 Spin_Up_Time changed from 200 to 196
May 02 21:24:34 fred smartd(1590): Device: /dev/nvme0, number of Error Log entries increased from 80 to 81
May 02 21:24:34 fred smartd(1590): Device: /dev/nvme1, number of Error Log entries increased from 69 to 70
May 02 21:24:34 fred smartd(1590): Device: /dev/sda (SAT), state written to /var/lib/smartmontools/smartd.SanDisk_SSD_PLUS_120GB-202401A009EE.ata.state
May 02 21:24:34 fred smartd(1590): Device: /dev/sdb (SAT), state written to /var/lib/smartmontools/smartd.WDC_WD120EDAZ_11F3RA0-5PJVAK5F.ata.state
May 02 21:24:34 fred smartd(1590): Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Force_MP600-21028230000128565120.nvme.state
May 02 21:24:34 fred smartd(1590): Device: /dev/nvme1, state written to /var/lib/smartmontools/smartd.Force_MP600-2102823000012856516D.nvme.state
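To see whether the error-log counter really goes up by one on every reboot, the relevant smartd lines can be pulled out of the journal text. The following is only a minimal sketch: the function name `error_log_increases` is made up for illustration, and it assumes the message wording is exactly as in the lines pasted above.

```python
import re

# Assumed wording, copied from the journal lines above:
# "Device: /dev/nvme0, number of Error Log entries increased from 80 to 81"
PATTERN = re.compile(
    r"Device: (?P<dev>/dev/\w+), number of Error Log entries "
    r"increased from (?P<old>\d+) to (?P<new>\d+)"
)

def error_log_increases(lines):
    """Return (device, old_count, new_count) for each matching smartd line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(
                (match.group("dev"), int(match.group("old")), int(match.group("new")))
            )
    return found

# Three of the lines from the journal excerpt above.
SAMPLE_JOURNAL = [
    "May 02 21:24:34 fred smartd(1590): Device: /dev/sdb (SAT), SMART Prefailure Attribute: 3 Spin_Up_Time changed from 200 to 196",
    "May 02 21:24:34 fred smartd(1590): Device: /dev/nvme0, number of Error Log entries increased from 80 to 81",
    "May 02 21:24:34 fred smartd(1590): Device: /dev/nvme1, number of Error Log entries increased from 69 to 70",
]

for dev, old, new in error_log_increases(SAMPLE_JOURNAL):
    print(f"{dev}: {old} -> {new} (+{new - old})")
```

Feeding it the journal from several boots would show whether both drives gain exactly one entry per boot or whether the jumps vary.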

Does this mean the crash was caused by an SSD failure, or that the SSD was affected by the crash? When I check the drives' error logs, there is no information other than every entry reporting that a command completed “successfully”:

Error Log Entries for device:nvme0 entries:63
.................
 Entry( 0)
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
 Entry( 1)
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................

( lots of the exact same log here )

.................
 Entry(62)
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
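Rather than eyeballing all 63 entries, a dump like this can be checked mechanically for any entry that carries a non-zero field. This is a rough sketch, not a definitive parser: `count_real_errors` is a hypothetical helper, and it assumes the text layout is exactly the one shown above (an `Entry(n)` header followed by `name : value` lines).

```python
import re

def count_real_errors(dump: str) -> tuple[int, int]:
    """Return (entries_seen, entries_with_any_nonzero_field)."""
    # Everything before the first "Entry(n)" header is the dump title; drop it.
    entries = re.split(r"Entry\(\s*\d+\)", dump)[1:]
    nonzero = 0
    for entry in entries:
        # Grab the leading number of each "name : value" line,
        # e.g. "status_field : 0(SUCCESS: ...)" yields "0".
        raw = re.findall(
            r"^\s*\w+\s*:\s*(0[xX][0-9a-fA-F]+|\d+)", entry, flags=re.M
        )
        values = [
            int(v, 16) if v.lower().startswith("0x") else int(v) for v in raw
        ]
        if any(values):
            nonzero += 1
    return len(entries), nonzero

# First entry of the dump above, repeated twice for illustration.
SAMPLE_DUMP = """Error Log Entries for device:nvme0 entries:63
.................
 Entry( 0)
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
 Entry( 1)
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
"""

seen, bad = count_real_errors(SAMPLE_DUMP)
print(f"{seen} entries parsed, {bad} with non-zero fields")
```

If the second number comes back as zero for the full dump, every slot in the log is blank, which matches what I see by hand.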

And this is what the SMART log says:

critical_warning                    : 0
temperature                         : 35 C
available_spare                     : 100%
available_spare_threshold           : 5%
percentage_used                     : 1%
data_units_read                     : 82,459,944
data_units_written                  : 96,773,163
host_read_commands                  : 350,279,974
host_write_commands                 : 934,553,230
controller_busy_time                : 1,617
power_cycles                        : 55
power_on_hours                      : 110
unsafe_shutdowns                    : 35
media_errors                        : 0
num_err_log_entries                 : 81
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
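To put those counters in perspective, here is a small worked calculation. It relies on the NVMe convention that one data unit is 1000 blocks of 512 bytes (512,000 bytes); the variable and function names are just illustrative, and the values are copied from the output above.

```python
# Values copied from the SMART log above.
data_units_read = 82_459_944
data_units_written = 96_773_163
power_cycles = 55
unsafe_shutdowns = 35

# NVMe reports data units in thousands of 512-byte blocks.
BYTES_PER_DATA_UNIT = 512 * 1000

def data_units_to_tb(units: int) -> float:
    """Convert NVMe data units to decimal terabytes (1 TB = 1e12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 1e12

read_tb = data_units_to_tb(data_units_read)
written_tb = data_units_to_tb(data_units_written)

# Share of power cycles that ended without a clean shutdown.
unsafe_fraction = unsafe_shutdowns / power_cycles

print(f"read:    {read_tb:.2f} TB")
print(f"written: {written_tb:.2f} TB")
print(f"unsafe shutdowns: {unsafe_shutdowns}/{power_cycles} ({unsafe_fraction:.0%})")
```

So in 110 power-on hours the drive has written roughly 50 TB, and about two thirds of its power cycles ended in an unsafe shutdown, which fits the constant lockups.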

The error and SMART logs are pretty much the same for the other SSD. Both drives are on the latest firmware, EGFM13.0, as far as I can tell. Corsair not only makes it a pain to update the firmware but also doesn’t actually tell you whether a new version is out.

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     21028230000128565120 Force MP600                              1           2.00  TB /   2.00  TB    512   B +  0 B   EGFM13.0
/dev/nvme1n1     2102823000012856516D Force MP600                              1           2.00  TB /   2.00  TB    512   B +  0 B   EGFM13.0

OS : Ubuntu Server 20.04
CPU : AMD Ryzen 9 5950X
RAM : 2x 32GB Corsair Vengeance LPX DDR4-3600 CL18-22-22-42
MBD : Asus Prime B550-Plus
SSD : 2x 2TB Corsair MP600 in software RAID 0
PSU : Corsair HX750