Cloud LLMOps Engineer
Job Summary:
We are seeking a passionate and experienced Cloud LLMOps Engineer to join our growing team. You will play a critical role in building, automating, and maintaining our infrastructure on a leading cloud platform (Microsoft Azure, AWS, or GCP). Beyond applying core DevOps principles, you will be an expert in containerization and orchestration using Kubernetes and AKS, enabling seamless deployments of our Large Language Models (LLMs) and GenAI solutions.
Responsibilities:
Utilize Infrastructure as Code (IaC) tools like Terraform to automate infrastructure provisioning and configuration for microservices deployments.
Implement and maintain Continuous Integration and Continuous Delivery (CI/CD) pipelines using tools like Azure DevOps to ensure smooth deployments of APIs and microservices.
Manage containerized applications using Docker and orchestrate deployments using Kubernetes and AKS.
Design, implement, and manage robust and scalable infrastructure on a leading cloud platform (Microsoft Azure, AWS, or GCP), using the platform's message routing services (e.g., Azure Service Bus, Amazon SQS, or Google Cloud Pub/Sub) for efficient communication within microservices architectures.
Design and implement source code management systems, branching strategies, versioning, and release management.
Implement best practices for LLMOps, ensuring efficient and reliable deployments of Large Language Models.
Collaborate with developers, data scientists, and researchers to understand application requirements for microservices and translate them into infrastructure solutions.
Monitor and maintain system health and performance of microservices and APIs, identifying and resolving potential issues proactively.
Implement security best practices using cloud platform services for authentication and authorization to ensure compliance with organizational security policies.
Automate routine tasks using scripting languages (Bash, Python, etc.) to improve efficiency and scalability across deployments.
Stay up to date on the latest advancements in cloud technologies, LLMOps, and GenAI solutions.
Configure cloud-based load balancing and API management solutions, drawing on familiarity with the configuration options on leading cloud platforms (e.g., Azure Application Gateway, Amazon API Gateway, Apigee on GCP).
Participate in knowledge sharing and provide technical guidance to the team on DevOps best practices and emerging technologies.
Qualifications:
Proven experience as a DevOps Engineer with a strong understanding of cloud computing principles and microservices architectures.
Expertise in a leading cloud platform (Microsoft Azure, AWS, or GCP), including virtual machines, networking, storage, security services, and message routing.
In-depth knowledge of containerization technologies like Docker and container orchestration using Kubernetes and AKS (or an equivalent platform).
Solid understanding of Infrastructure as Code (IaC) tools like Terraform.
Experience with CI/CD methodologies and tools for microservices deployments.
Familiarity with Large Language Models (LLMs) and GenAI solutions deployment best practices.
Excellent scripting skills (Bash, Python, etc.) for automation tasks.
Strong problem-solving and analytical skills.
Excellent communication and collaboration skills to work effectively with cross-functional teams.
Ability to work independently and manage multiple priorities effectively.
Passion for learning and staying current with new technologies.
Preferred Qualifications:
Experience working in a fast-paced, dynamic environment.
Certifications in cloud platform technologies (e.g., Azure Solutions Architect Expert, AWS Certified Solutions Architect - Associate).
Experience with the Git version control system.