Senior Staff Machine Learning Engineer, GenAI Platform
Job Description:
- Lead and execute the vision, strategy, and roadmap for Reddit’s large-scale GenAI Platform.
- Define the platform architecture and operating model that enable teams to build, deploy, and scale GenAI products reliably.
- Drive the strategy for a unified LLM Gateway supporting internally and externally hosted LLMs through consistent APIs and abstractions.
- Set the direction for core platform capabilities such as rate and token limit management, intelligent failover, and production resilience.
- Shape Reddit’s approach to an enterprise-grade RAG system.
- Establish the strategic direction for agentic AI workflows and tool-use patterns across the platform.
- Own the end-to-end platform strategy from concept through production adoption and long-term evolution.
- Drive MLOps and LLMOps standards across CI/CD, testing, versioning, evaluation, and lifecycle management.
- Define best practices for observability, monitoring, governance, and operational excellence across GenAI systems.
- Partner across engineering, product, and leadership to align platform investments with company priorities and user needs.
- Champion platform thinking with a strong focus on scalability, reliability, performance, and developer experience.
- Influence technical direction across teams by turning emerging AI capabilities into a scalable platform strategy.
Requirements:
- 10+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Have a track record of leading technical strategy and delivering AI platforms in cloud-based production environments at scale.
- Demonstrate strong execution by turning strategy into action, driving complex initiatives end to end, and consistently delivering high-quality platform outcomes.
- Bring deep experience operating Kubernetes and other orchestration systems in large-scale production environments.
- Deep experience with the cloud-based technologies that support an ML platform, such as AWS, Google Cloud Storage, and infrastructure-as-code tools (Terraform).
- Proficiency with the programming languages and frameworks commonly used in ML, such as Go and Python.
- Excellent communication skills, with the ability to articulate technical AI concepts to non-technical stakeholders.
- Strong focus on scalability, reliability, performance, and developer experience. You are an undying advocate for platform users and have a deep intuition for the GenAI product development lifecycle.
- Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
Benefits:
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k with Employer Match
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave