Andrew,
Reliability has been a concern in systems engineering for decades (think Apollo 13 and "Failure is not an option"). In systems design, the key numbers to look at have been MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair/Replace).
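To make those two numbers concrete, here is a minimal sketch in Python of how they are commonly computed and combined into a steady-state availability figure, availability = MTBF / (MTBF + MTTR). The 30-day window, the failure count, and the downtime total are made-up values for illustration only, and the function names are my own.

```python
# Minimal sketch: MTBF, MTTR and availability from simple totals.
# All input figures below are hypothetical, chosen only for illustration.

def mtbf(total_uptime_hours: float, failures: int) -> float:
    """Mean Time Between Failures: operating time divided by failure count."""
    return total_uptime_hours / failures


def mttr(total_downtime_hours: float, failures: int) -> float:
    """Mean Time To Repair: total repair (down) time divided by failure count."""
    return total_downtime_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical 30-day observation window (720 hours) with 3 outages
# that added up to 6 hours of downtime.
window_hours = 720.0
failure_count = 3
downtime_hours = 6.0

mtbf_hours = mtbf(window_hours - downtime_hours, failure_count)
mttr_hours = mttr(downtime_hours, failure_count)
avail = availability(mtbf_hours, mttr_hours)

print(f"MTBF: {mtbf_hours} h, MTTR: {mttr_hours} h, availability: {avail:.4%}")
```

With these made-up inputs, MTBF comes out at 238 hours and MTTR at 2 hours, for an availability of roughly 99.17%. The formula also shows the two levers you have: make failures rarer (raise MTBF) or make recovery faster (lower MTTR).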
These days, reliability often becomes a concern in DevOps approaches once you have an efficient means of deploying to production (e.g. your CI/CD pipeline has stabilized). At that point you can observe how resilient your production system is and design for improving your MTBF/MTTR, i.e. a longer time between failures and a shorter time to repair. Netflix is a famous example of doing this with their "Chaos Monkey" suite of tools, which deliberately injects failures into production.
An EXCELLENT resource on this is a series of blog articles by Sanjeev Sharma on DevOps and reliability. Links to the articles are below.
https://sdarchitect.blog/2017/06/26/cloud-service-reliability-part-i-apollo-13-to-google-sre/
https://sdarchitect.blog/2017/07/19/cloud-service-reliability-part-2-houston-we-have-an-outage/
https://sdarchitect.blog/2017/08/18/cloud-service-reliability-part-3-antifragile-when-devops-met-sre/