Role: Private Cloud Site Reliability Specialist
Location: Montreal, Quebec (hybrid, on-site 3 days/week)
Responsibilities:
· Provide L3 support for a private cloud, including on-call rotation
· Work closely with the internal engineering team and provide input on testing of new component releases and infrastructure upgrades, as well as performance, capacity, and monitoring
· Create and improve processes for support, including training, documentation, customer engagement, incident, problem, and change management
· Contribute to internally developed CLIs and APIs that automate SRE activities and platform operations
· Collaborate with L2 teams and other L3 team members internationally
Qualifications:
· 5 to 10 years of relevant experience in platform maintenance/development
· Experience in at least one programming language
· Experience with maintaining complex production systems with cloud and legacy technologies
· Proven Kubernetes and Docker experience
· Knowledge of monitoring stack usage (Grafana, Prometheus, Splunk)
· Strong organizational skills and ability to manage multiple tasks and handle high-pressure situations during outage resolution
· Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members