As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting, to deploying the latest software updates & fixes.
We are looking for a dynamic Site Reliability Engineer to join our Cloud IaaS Team in Bengaluru, India: someone who is responsive to market needs and can deliver value to our clients in a fast-changing cloud landscape. The SRE team is dedicated to ensuring that IBM Cloud is at the forefront of cloud technology, from data centre design, storage and network architecture, and compute clusters to flexible infrastructure services. We are building IBM's next generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leadership efficiency, resiliency and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
Role and Responsibilities:
Manage and maintain Linux-based systems across multiple environments.
Automate provisioning, configuration, and deployment tasks using tools like Ansible and Jenkins.
Design, implement, and manage deployment of containerized applications using Kubernetes and Docker.
Monitor and troubleshoot system performance, network issues, and applications to ensure optimal uptime and efficiency.
Harden servers from scratch using baseboard management controllers (BMCs).
Implement and maintain security best practices, ensuring compliance with company policies.
Proactively identify potential improvements to processes and systems.
Analyze and fix network and DNS issues in the environment.
Upgrade Kubernetes worker nodes and packages without interrupting the cluster.
Maintain benchmarking standards on systems to ensure continuous compliance.
Participate in on-call rotation to support critical infrastructure issues.