About the Role
The systems reliability engineer (SRE) will apply software development skills to infrastructure and operations work. Since the main goal of this role is to create scalable and highly reliable systems, the SRE will spend up to half of their time on operations-related activities such as supporting issues, writing documentation, and system management.
The SRE should spend the rest of their time delivering development tasks such as new features, scaling, and automation.
Roles and Responsibilities
Provide technical leadership on large/complex systems and platform opportunities.
Build tooling to support the automation, management, and reliability of applicable systems.
Build and support release pipelines for applicable systems.
Work as an integrated part of a development/delivery team to include SRE practices as part of solution design.
Manage the system lifecycle from design and implementation to turn-down and decommissioning.
Write documentation for peers and business partners supporting applicable systems.
Work with business partners to define SLOs and SLIs and build robust monitoring solutions supporting agreed-upon metrics.
Resolve P1 and P2 tickets using a proven systematic approach that focuses on safely returning to full system capability and creating plans to fix root-cause issues.
Lead communications efforts regarding both system issues/activities and blameless post-mortems.
Provide mentorship to junior team members.
Perform other related duties as assigned.
Strong ability to multitask and prioritize work effectively, juggling daily support responsibilities with multiple product/project-driven activities.
Strong troubleshooting and problem-solving skills.
Work independently and with a delivery team toward releasing features on agreed-upon timelines.
Strong ability to support SQL database technologies including SQL Server, Azure SQL PaaS, and Azure SQL Managed Instances.
Provide comprehensive administration of core platforms including backups, recovery, monitoring, maintenance, and upgrades.
Scripting and automation experience with Infrastructure as Code tools such as Azure Resource Manager, AWS CloudFormation, Ansible, or Terraform.
Experience writing YAML code to build and manage Azure DevOps or GitHub Actions pipelines.
Proficiency with Azure resource management and operations.
Familiarity with Linux Server administration and operations.
Familiarity with Azure Kubernetes Service (AKS) or any other managed Kubernetes service.
Requirements
Bachelor’s Degree in Computer Science or related field and 5+ years of experience or equivalent combination of education and experience.
Minimum 4 years in an enterprise-level system engineering or reliability engineering role.
Strong base knowledge of operating systems, networking basics, and security best practices.
Working knowledge of Agile delivery and DevOps principles.
Proficiency in one or more programming or scripting languages (PowerShell, C#, Go, or Python).
Technical certifications are a plus.
Benefits
Private Health Insurance
Paid Time Off
Pension Plan
Work From Home
Training & Development
Apply via:
apply.workable.com