Site Reliability Engineer - Verona, Italy - WESTHOUSE ITALIA SRL


    Found in: Talent IT 2 C2 - 2 weeks ago

    Description

    Westhouse is a leading company operating globally in research, selection, recruitment, and project management, and is authorized on a permanent basis for the administration of labor under Ministerial Authorization Prot. n of 03/08/2018.

    For our American client based in Verona, operating in the medical sector and specializing in Software Development, we are currently searching for a

    Site Reliability Engineer

    As a Site Reliability Engineer, you will work closely with the key stakeholders in Software Engineering to drive adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership.

    Key Accountabilities:

    • Take responsibility for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning; set and maintain SLOs, SLIs, and error budgets; and create dashboards.
    • Manage site stability, performance, reliability, and maintain uptime for production environments.
    • Develop a fully automated multi-environment observability stack based on the existing system and extend it to predict capacity needs based on the usage patterns.
    • Strive for automation to reduce toil and increase development velocity.
    • Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
    • Identify changes to the product architecture from a reliability, performance, and availability perspective, using a data-driven approach.
    • Document resolution runbooks and standard operating procedures.
    • Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
    • Collaborate with software development teams on the release management process, help shape the future roadmap, and establish strong operational readiness across teams.
    • Implement reliability and observability tools (such as New Relic, Prometheus, Grafana, etc.).

    Skills

    • Strong background as an SRE supporting a 24x7 highly available production environment for a SaaS or cloud service provider.
    • Solid experience with monitoring/APM/observability tools (Splunk, New Relic, Prometheus, Grafana, etc.).
    • Experience implementing observability plans around logs, metrics, and traces.
    • Experience with cloud infrastructure environments, preferably AWS, and Infrastructure as code (Terraform, CloudFormation).
    • Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
    • Experience with release automation, system administration, and configuration management.
    • Experience with programming languages (Java, Python, Go, etc.).
    • Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.

    Benefits

    • Permanent employment with the company
    • Benefits package including health and dental insurance, life insurance, and disability coverage
    • Onboarding, ongoing training, mentoring, and career pathing
    • Hybrid working mode (1 day a week based in Verona)

    Workplace: Verona (Hybrid)

    Candidates residing in Verona are expected to be in the office one day per week, while those residing outside Verona are required to be present in the office once a month.

    Candidates of both sexes (Legislative Decree no. 198/2006) are invited to read the privacy policy in accordance with Articles 13 and 14 of EU Reg. 679/2016 at the following address: ***/privacy-pro-eng /
    Please also note that CVs may be considered for other vacancies and may also be managed and shared using our own tools and/or those of the client company.