Site Reliability Engineer

Forefront Technologies International Inc.

Key Responsibilities:

– Maintain and enhance the reliability, availability, and performance of large-scale distributed systems.

– Automate deployment, monitoring, and management of production systems.

– Implement and manage CI/CD pipelines for software delivery.

– Collaborate with software engineers to design, build, and manage scalable and resilient infrastructure.

– Troubleshoot complex system issues, identify root causes, and implement long-term solutions.

– Monitor system performance and optimize configurations for speed and cost efficiency.

– Implement security best practices and ensure compliance with industry standards.

Requirements

Required Skills:

– Proficiency in cloud platforms (AWS, Google Cloud, or Azure) and containerization technologies like Docker and Kubernetes.

– Strong scripting and automation skills using Python, Bash, or similar languages.

– Experience with infrastructure as code (IaC) tools such as Terraform or Ansible.

– Deep understanding of monitoring and logging tools (Prometheus, Grafana, ELK Stack).

– Knowledge of database management (SQL/NoSQL) and networking fundamentals.

– Experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.

– Strong problem-solving skills and experience in troubleshooting large-scale systems.

Education and Experience:

– A degree in Computer Science, Engineering, or a related field from a recognized institution.

– Ideally, 5-10 years of experience in a similar role at a product company.

Source
remotive.com
