Sr. Site Reliability Engineer

Details of the offer

About Zeta
Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was founded by Bhavin Turakhia and Ramki Gaddipati in 2015.
Our flagship processing platform - Zeta Tachyon - is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.
Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.
Zeta has over 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from Softbank, Mastercard, and other investors in 2021.
Learn more at www.zeta.tech, careers.zeta.tech, LinkedIn, Twitter
Responsibilities

System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
Security: Collaborating with security teams to implement and maintain security best practices across infrastructure and applications.
Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement, and implementing changes to enhance overall system resilience.
Skills

Programming Languages: Proficiency in one or more programming or scripting languages, such as Python, Go, or Shell/Bash.
Automation and Scripting: Strong automation skills using tools such as Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools such as Terraform.
Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
Cloud Computing: Proficiency in at least one cloud platform, such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
Networking: Understanding of networking concepts and protocols, along with strong network troubleshooting skills.
Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
Incident Management: Experience in incident response, troubleshooting, and resolution.
Version Control: Proficient use of version control systems like Git.
Experience and Qualifications

Minimum of 5 years of experience in site reliability engineering.
B.Tech/M.Tech in computer science, information technology, or a related field.
Experience working in a product organization is a plus.
Certifications from cloud service providers, such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or a Microsoft certification, are a plus.


Nominal Salary: To be agreed

Source: Lever_Co
