Job Description
- Drive the development and adoption of automation tools and frameworks to improve efficiency and reduce toil. Identify opportunities to automate manual tasks.
- Design and implement comprehensive monitoring and alerting systems.
- Analyze performance data to identify bottlenecks and optimize system performance.
- Promote SRE best practices and principles within the organization. Influence architectural decisions to improve system reliability.
- Contribute to the development of the SRE roadmap and strategy. Identify and prioritize key initiatives.
- Stay up-to-date on the latest SRE trends and technologies. Research and evaluate new tools and techniques.
Qualifications
- At least 4 years of proven experience as a DevOps, IT Automation, Application Support, or System Engineer, or in a similar position.
- Bachelor's degree in Information Technology
- Strong knowledge and understanding of Linux, Kubernetes, Docker Swarm and/or other microservices technologies, and GCP / AWS cloud technologies.
- Experience with automation tools such as Terraform and Ansible.
- Experience with AI / LLMs (Large Language Models)
Technical
- Linux administration skills
- Ansible / Terraform skills
- Bash / Python / other automation code skills
- AWS / GCP skills
- Docker / Kubernetes skills
- Troubleshooting skills
- CI / CD
- Elasticsearch, Grafana or other observability skills
Knowledge
- Cloud Technology
- REST API
- Networking
- Monitoring Tools
- Automation / Scripting
- Postman / API Testing
- TCP/IP
- Microservices
- AI Models