About Zeta
Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was founded by Bhavin Turakhia and Ramki Gaddipati in 2015.
Our flagship processing platform - Zeta Tachyon - is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.
Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.
Zeta has over 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from SoftBank, Mastercard, and other investors in 2021.
Learn more at www.zeta.tech, careers.zeta.tech, LinkedIn, Twitter
Responsibilities
System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement and implementing changes to enhance overall system resilience.
Skills
Programming Languages: Proficiency in one or more programming or scripting languages, such as Python, Go, or shell scripting (e.g., Bash).
Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
Cloud Computing: Proficiency in any of the cloud platforms such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
Networking: Understanding of networking concepts, protocols, and troubleshooting skills.
Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
Incident Response: Experience in incident response, troubleshooting, and resolution.
Version Control: Proficient use of version control systems like Git.
Experience and Qualifications
5+ years of experience in site reliability engineering.
B.Tech/M.Tech in computer science, information technology, or a related field.
Having experience working for a product organization is a plus.
Certifications from cloud service providers like AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or Microsoft Certified are a plus.