I want to set up a database in a high-reliability setup on Azure. I’ve previously relied on DB-as-a-service offerings, but can’t do that in this case, so I’d like your feedback on the plan below. Is this enough to ensure reliable storage of data?
- An Azure Web App takes in metric data from the web, does some minor processing and sampling, and sends the data in batches to VM2.
- VM2 runs the ClickHouse database and stores its data on an Azure Managed Disk.
- A periodic job takes snapshots of the disk and stores them in cold storage.
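To make the batching step concrete, here is a minimal sketch of how the Web App side could look, assuming rows are only dropped from the buffer after VM2 confirms the insert. `Batcher` and `send` are hypothetical names for illustration, and `send` stands in for whatever client call actually writes to ClickHouse.

```python
from typing import Callable


class Batcher:
    """Buffers rows and sends them in batches.

    Rows leave the buffer only after `send` reports success, so a failed
    send (e.g. VM2 is down) keeps them for a later retry.
    """

    def __init__(self, send: Callable[[list[dict]], bool], batch_size: int = 1000):
        # Hypothetical callback: returns True once VM2 has acknowledged the insert.
        self.send = send
        self.batch_size = batch_size
        self.pending: list[dict] = []

    def add(self, row: dict) -> None:
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Try to send one batch; return True if nothing is left unsent from it."""
        if not self.pending:
            return True
        batch = self.pending[: self.batch_size]
        if self.send(batch):
            del self.pending[: len(batch)]
            return True
        return False  # keep the rows and retry on the next flush
```

Note that in this sketch the buffer lives in the Web App's memory, so anything still pending is lost if the Web App itself restarts, and a retry after a lost acknowledgement could insert the same batch twice.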
My understanding is that managed disks are enough to ensure reliable storage of data, and that they should take away any concerns about data loss due to hardware failure. Is this correct?
Another concern is data loss due to human error, e.g. accidentally running “DROP TABLE xx” on the wrong table. I think storing periodic backups takes away this concern (i.e. it allows recovery to the last backup). Do you agree?
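As a rough illustration of what "recovery to the last backup" implies, the sketch below picks the most recent snapshot taken before an incident and computes how much written data would fall outside it. The function names and the timestamps in the usage note are made up for the example; nothing here talks to Azure.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_snapshot_before(
    snapshots: list[datetime], incident: datetime
) -> Optional[datetime]:
    """Return the newest snapshot strictly older than the incident, if any."""
    candidates = [s for s in snapshots if s < incident]
    return max(candidates) if candidates else None


def data_loss_window(
    snapshots: list[datetime], incident: datetime
) -> Optional[timedelta]:
    """Time span of writes that a restore would not bring back."""
    restore_point = latest_snapshot_before(snapshots, incident)
    if restore_point is None:
        return None  # no usable snapshot exists
    return incident - restore_point
```

For example, with hypothetical snapshots at 00:00, 06:00 and 12:00 and a bad `DROP TABLE` at 10:30, the restore point is the 06:00 snapshot and four and a half hours of data are not covered by it. The snapshot interval therefore bounds the worst-case loss from this kind of mistake.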
The recovery plan is that if VM2 fails, some monitoring process catches this and spins up a new VM2 instance attached to the same managed disk. The Web App similarly restarts if it fails.
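The monitoring part of this plan could be sketched roughly as follows, assuming ClickHouse's HTTP interface is reachable from the monitor (it serves a `/ping` endpoint that answers `Ok.`, by default on port 8123). The function names and the three-failures threshold are assumptions for illustration, and the step that actually creates the new VM and attaches the disk is left out.

```python
import http.client
import urllib.request


def clickhouse_is_alive(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the ClickHouse HTTP /ping endpoint answers 'Ok.'."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False


def should_replace_vm(recent_checks: list[bool], failures_needed: int = 3) -> bool:
    """Trigger recovery only after `failures_needed` consecutive failed checks.

    Requiring several failures in a row avoids replacing VM2 on a single
    transient network blip.
    """
    if len(recent_checks) < failures_needed:
        return False
    return not any(recent_checks[-failures_needed:])
```

A monitor would call something like `clickhouse_is_alive("http://vm2.internal:8123")` on a timer (the hostname is a placeholder), append each result to a list, and start the replacement once `should_replace_vm` returns `True`. The old VM2 has to be fully stopped and the disk detached from it before the disk can be attached to the new instance.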
I understand that this setup isn’t high-availability: if a VM fails, there will be a window of time before it can store new data again. This is acceptable to me. But I want to ensure that data that has been stored will not be lost, i.e. that it is durably stored with very high probability. Is this enough to ensure that? Do you see any problems?