The Platform Engineering team is responsible for enabling the organization by developing tooling and architectural patterns to leverage the public cloud in a reliable and scalable manner. As a Site Reliability Engineer, you will serve as a technical leader for the team and the wider organization, help evolve our technology through automation and reliable architecture, and help increase velocity by driving adoption of those implementations.
What you'll do:
Drive solutions and implement systems that propel the organization as we leverage the capabilities of the cloud to provide a seamless platform.
Compute Platform architecture and development (IaaS, PaaS)
Core infrastructure tools and frameworks (Configuration Management, IAM, WAF, CI/CD, Infrastructure as Code, Monitoring, HA, Kubernetes, Helm, Flux, Service Mesh, etc.)
Provide expertise in the use of Google Cloud Platform (GCP) and container orchestration systems like Kubernetes.
Collaborate across Engineering and Product teams to translate application requirements to infrastructure capabilities
Maintain an automation-centric vision and incorporate SRE methodologies to increase reliability and decrease toil
Participate in technical design and architecture discussions and decisions, and contribute to technical troubleshooting across various parts of the stack
What's great about the role:
You will have the opportunity to contribute greatly to an extremely engineering-focused organization.
Your contributions will have a noticeable impact on our members as well as your fellow Karmanauts (that's what we call ourselves).
You will be involved in organizational efforts of continuous improvement to increase and ensure the reliability of Credit Karma.
You will get broad exposure to our full stack, which consists of in-demand, modern technologies such as Docker, Kubernetes, Terraform, Google Cloud Platform, and Jenkins.
You will grow and learn and have fun doing it--it's part of our culture.
And, of course, all those awesome company perks that you have probably already read about.
Minimum Basic Requirements:
6+ years of experience in a web-centric Linux production environment building and architecting infrastructure to scale modern web applications and supporting components.
Strong understanding of Computer Engineering with a focus on Infrastructure, Platform, and Application (Cloud, Containerization, Container orchestration, Network, Application Reliability, Database Architecture).
Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
Experience running infrastructure at scale, using configuration management and automation to ensure idempotency, ephemerality, and reliability.
Strong experience in scripting or relevant programming skills (Python, Go or other higher-level OOP languages) for automating repetitive tasks.
Deep understanding of container workload deployments and related orchestration technologies (e.g., Kubernetes), as well as experience securing Terraform Infrastructure as Code deployments.
Familiarity with SRE methodologies and a passion for solving operational problems through automation and software.
Ability to communicate effectively vertically and horizontally within the organization via demonstrated written and verbal communication skills.
Extensive experience securing GCP/AWS workloads in an enterprise-scale, mixed-vendor environment.
Preferred Qualifications:
Eagerness to challenge the status quo, balanced with a reasonable and methodical approach to effecting change
Self-starting attitude and fearless ascent up the learning curve. Thrives in a distributed workplace environment
Advanced experience with container orchestration systems like Kubernetes, Mesos, etc.
Experience working in hybrid environments as well as public cloud (e.g., Google Cloud Platform, AWS, Azure).