- Build and maintain a monitoring and alerting system, and use it to track system reliability.
- Investigate and fix infrastructure problems quickly to reduce downtime and continuously improve system performance.
- Document every action so your findings turn into repeatable procedures, and then into automation.
- Work with SA or the Product engineering team to solve common deployment, system, network, and reliability problems.
- Manage capacity and resources.
- Administer and manage SRE tooling.
- Conduct security assessments and vulnerability scans of our applications and infrastructure.