How can we effectively design and deliver observability platforms and services for IBM Cloud IaaS offerings, a large-scale, highly distributed cloud infrastructure? We are looking for an individual who will join a team to design, develop and implement automated platforms and services, and who will lead the AI mission for IBM Cloud Observability. This work will enable SREs and developers to manage their services, reduce operational costs, identify anomalies and reduce MTTR. The role provides the opportunity to collaborate with experienced and knowledgeable technical leaders, grow your career and build expertise as a platform engineer in the as-a-Service (aaS) model.
Experienced in the conceptualization, analysis, architecture, solution design and development of software products and services. This involves engaging in and improving the lifecycle of an observability service built for monitoring, eventing, logging, dashboarding, etc., from inception and design through deployment, operation and refinement.
Experienced in using ML or generative AI to build analytics and AI use cases, and in demonstrating their value from inception to implementation, including collaboration with the appropriate stakeholders.
Experienced in building, training and deploying ML models, and in automating these activities.
Adopt and build on automation solutions governed by SRE principles, including CI/CD pipelines, configuration management, immutable infrastructure deployment, auto-healing systems, etc.
Ensure compliance and security integrity of the environment and build secure practices. Have a deep understanding of how security impacts each stage of the development pipeline and the final product or service. Identify gaps and embed secure practices into our processes.
Support services before they go live through activities such as system design consulting, developing, testing and identifying software platforms and frameworks, capacity planning and launch reviews.
Work with and adopt open source technologies, and participate in new IBM innovations across IaaS.
Bring a self-driven attitude to propose, test and implement solutions and improvements for review and consideration with your peers.
Practice sustainable incident response and blameless postmortems.
In addition, you will mentor others, share expertise and help build a self-sustaining team.
You will also work with wider teams to enable consumption of the service, identify gaps and provide thought leadership.