[Remote] Staff Site Reliability Operations Engineer

Work from home · Full-time role

Note: This is a remote job open to candidates in the USA. Calix is a company focused on enabling Communication Service Providers to transform and future-proof their businesses through a cloud-first, AI-powered platform. It is seeking a Staff Site Reliability Engineer to lead its global platform reliability and observability strategy on Google Cloud Platform, leveraging advanced technologies to build intelligent infrastructure and provide technical leadership.

Responsibilities

Full-Stack Network Architecture: Architect, optimize, and troubleshoot complex networking infrastructure spanning Layer 1 through Layer 7, ensuring low-latency data transport, secure edge routing, and seamless service mesh integration
Grafana Stack Architecture: Design, scale, and optimize our unified observability platform using the Grafana Labs suite (Grafana, Mimir, Loki, Tempo, and Beyla)
AIOps & Intelligent Alerting: Deploy machine learning models and automated anomaly detection to cut through telemetry noise, reduce alert fatigue, and predict network or data pipeline bottlenecks
GKE Platform Engineering: Drive the architecture, scaling, security, and networking of production Google Kubernetes Engine (GKE) clusters
Data & Event Streaming Reliability: Tune and maintain high-throughput Apache Kafka clusters to guarantee low-latency event delivery and high availability
Large-Scale Database Management: Ensure the performance, scalability, and disaster recovery readiness of our transactional and analytical data tiers across PostgreSQL, AlloyDB, and BigQuery
Automated Incident Response: Integrate AIOps insights with Grafana workflows to automate triage, accelerate root-cause analysis, and trigger auto-remediation scripts
Technical Leadership: Champion the long-term technical roadmap for distributed infrastructure engineering and GCP cloud-native observability standards
Mentorship: Coach senior and junior engineers on advanced debugging techniques, distributed systems thinking, and intelligent operations across a distributed workforce

Skills

Proven track record of high autonomy and successful delivery in a 100% remote engineering environment
8+ years in SRE, Production Engineering, or Distributed Systems infrastructure roles
Deep technical knowledge and debugging mastery across all OSI layers, including:
  L1-L3: Physical/fiber infrastructure awareness, switching, and advanced routing protocols (BGP, OSPF)
  L4: Transport layer tuning (TCP congestion control algorithms, UDP, QUIC)
  L5-L7: Session management, TLS termination, DNS architecture, and advanced application protocols (HTTP/3, gRPC)
Expert-level mastery of Google Kubernetes Engine (GKE) internals, custom controllers, multi-cluster networking, and GitOps workflows
Proven track record managing high-throughput Apache Kafka pipelines and large-scale data environments across PostgreSQL, AlloyDB, and BigQuery
Deep, hands-on experience deploying and managing Grafana Enterprise/Cloud, Prometheus/Mimir, Loki, and Tempo at scale
Track record applying AI/ML techniques for time-series anomaly detection, log clustering, and correlation (e.g., Grafana Adaptive Metrics, BigPanda)
Advanced, production-scale expertise using HashiCorp Terraform exclusively to provision and manage multi-region GCP cloud architectures
High proficiency in Go and Python for building custom infrastructure tooling, Kubernetes operators, and data integration scripts
Exceptional written and verbal communication skills, with an emphasis on creating clear documentation for asynchronous alignment
Deep knowledge of Google Cloud architectural best practices, Cloud SDN, Cloud Armor, Interconnect, Identity and Access Management (IAM), and cost optimization
Deep understanding of Linux internals, eBPF-based monitoring, kernel-level networking, and packet analysis tools (Wireshark, tcpdump)

Benefits

As a part of the total compensation package, this role may be eligible for a bonus.

Company Overview

Calix provides the cloud, software, systems and services for service providers to simplify business, excite subscribers and grow value. It was founded in 1999 and is headquartered in San Jose, California, USA, with a workforce of 1,001-5,000 employees. Its website is http://www.calix.com.

Company H1B Sponsorship

Calix has a track record of offering H1B sponsorships, with 11 in 2026, 36 in 2025, 22 in 2024, 24 in 2023, 31 in 2022, 19 in 2021, 7 in 2020. Please note that this does not guarantee sponsorship for this specific role.
