As a Site Reliability Engineer focused on Big Data technologies, your role covers the entire life cycle of multiple products, from helping developers with architecture and delivery to on-call incident response and assessment. Your primary focus will be automation and continuous integration/delivery, with an emphasis on solving operations issues using software. You will report to the SRE/DevOps Manager.

Responsibilities:
You will help build and operate a unified platform across EA, extract and process massive data spanning 20+ game studios, and use the insights to serve massive online requests.
Incident management: Use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection and resolution (MTTD, MTTR), and repair services. Perform root cause analysis and post-mortems with an eye towards future prevention.
Infrastructure Automation: Maintain automation scripts and tools using Python to improve infrastructure provisioning, configuration, and management.
Continuous Integration/Continuous Deployment (CI/CD): Implement and improve CI/CD pipelines to automate software delivery, testing, and deployment processes.
Monitoring: Design and set up monitoring systems to track the health and performance of applications and infrastructure components. Configure alerts and respond to incidents promptly.
Infrastructure as Code (IaC): Leverage IaC tools (e.g., Terraform, Ansible) to define and manage infrastructure configurations, ensuring consistency and reproducibility.
Containerization and Orchestration: Work with containerization technologies like Docker and orchestration tools like Kubernetes to manage and scale containerized applications.
Security and Compliance: Collaborate with security teams to maintain security best practices, including access controls, vulnerability management, and compliance monitoring.
Performance Optimization: Identify bottlenecks and performance issues in the infrastructure and applications, and implement optimizations to enhance system efficiency.
Disaster Recovery and Backup: Maintain disaster recovery plans and backup strategies to ensure data integrity and business continuity.
Documentation: Create clear and comprehensive documentation for infrastructure configurations, procedures, and troubleshooting guides.
Collaboration: Collaborate with teams, including software developers, system administrators, and QA engineers, to address operational challenges and improve system reliability.

Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
7+ years of experience as a DevOps/SRE engineer, along with expertise in Python/Golang programming.
Knowledge of Linux/Unix systems and administration.
Proficiency in configuration management tools such as Ansible, Puppet, or Chef.
Experience with cloud computing platforms (e.g., AWS or Google Cloud).
Familiarity with containerization and orchestration technologies like Docker and Kubernetes.
Understanding of CI/CD concepts and experience with CI/CD tools (e.g., Argo CD, Jenkins, and GitLab CI/CD).
Knowledge of version control systems (e.g., Git).
Certification in relevant DevOps/SRE technologies.
