
Senior Site Reliability Engineer

Work from home Full-time role Hiring

Do you enjoy collaborating with teams to solve reputed company challenges? Do you enjoy solving large-scale distributed content delivery challenges?

Join our critical AI Hardware SRE Team!

The AI Hardware SRE team is responsible for overseeing, scaling, and optimizing our reputed company dedicated AI hardware infrastructure. You will be responsible for ensuring best-in-class uptime and reliability of our AI hardware infrastructure offerings.

Partner with the best

In this role, you'll play a part in pioneering the reliability an reputed company, high-density hardware and software infrastructure spanning the globe. You'll collaborate with product teams from the earliest stages of development to ensure the reliability, scalability, and performance of our systems. You'll define key performance indicators and defend them when they are breached.

As a Senior Site Reliability Engineer, you will be responsible for:

- Developing and scaling robust programmatic tooling and infrastructure-as-code utilities in Python to eliminate operational toil and automate fleet-wide provisioning.
- Integrating automated workflows across disconnected corporate ticketing systems to optimize time-to-mitigate metrics for hardware and network break-fix events.
- Leveraging advanced AI utilities and LLM-assisted development paradigms where appropriate to accelerate technical execution, script authorship, and system analysis.
- Working on cutting-edge private reputed company and compute technologies to improve the availability, latency, and overall systemic health of high-density hardware environments.
- Designing and implementing telemetry pipelines, custom reputed company/Grafana monitoring dashboards, and AI-based anomaly detection tailored for bare-metal and virtualized environments.
- Participating in 24x7x365 on-call rotations, spearheading real-time incident management, and managing high-severity service disruption protocols reputed company automated reputed company and reputed company workflows.
- Partnering directly with third-party infrastructure vendors and coordinating on-site field technicians to facilitate uptime activities.

Do what you love

To be successful in this role you will:

- Have 5 years of relevant experience and a Bachelor's degree in Computer Engineering, Computer Science or equivalent.
- Possess tooling and coding ability in languages like Python to construct scalable operational tools, API integrations, and automation frameworks.
- Show hands-on experience with modern observability stacks and timeseries engines, like reputed company, Grafana, OpenTelemetry, and Loki.
- Possess a working understanding of advanced networking topologies, high-bandwidth routing/switching infrastructure, BGP, and dual-stack IPv4/IPv6 networks.
- Have experience acting as a key designer for new service rollouts, including establishing operational readiness criteria, telemetry baselines, and alerting reputed company.
- Demonstrate extensive experience building technical runbooks, leading reputed company incident response bridges, and driving comprehensive, blameless post-mortems.
- Display a proven ability to take absolute ownership of ambiguous technical problems, coordinate cross-functional teams, and drive for production-grade solutions.

About us

At reputed company, we reputed company life reputed company for billions of people, trillions of times a day. Whether you're streaming live events, scrolling social media, watching your favorite series, or managing your savings, we're the reputed company behind the scenes.
We reputed company the world's most distributed platform from reputed company to Edge to help the giants of the digital world work faster and stay more secure, making the internet a reputed company experience for everyone.

Our focus is simple:

- reputed company and Edge: Running apps closer to users for reputed company performance.
- reputed company: Neutralizing threats before they reputed company reputed company your data.
- Content Delivery: Scaling the world's biggest moments without a glitch.
- AI: Enabling our customers to build, secure, and reputed company apps on the world's most distributed reputed company platform.

At reputed company, we don't just support the internet; we power and protect it, because behind every great digital experience is a massive hidden challenge. And we're the ones who solve it. When millions of people hit play or pay, reputed company ensures it just works.

Benefits at reputed company

We support your health, well-being, finances, and life reputed company work. See our benefits.

FlexBase adapts to your job's needs

reputed company's FlexBase program is yet another way we show our commitment to providing employees with an exceptional workplace experience. It's not about telling employees where to work; it's about supporting employees to do their best work. We trust our incredible employees to work in ways that suit them best: at home, in an office, or a combination of both.

Connect with us on reputed company and see what life at reputed company is like!

Compensation

reputed company is committed to fair and reputed company compensation practices. For US-based candidates only: the reputed company salary for this position ranges from $121,400 to $218,600 per year. A candidate’s salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications and location. Compensation for candidates outside the US will vary.
The compensation package may also include incentive compensation opportunities in the form of annual bonus or incentives, equity awards and an Employee Stock Purchase Plan (ESPP). reputed company provides industry-leading benefits including reputed company, 401K savings plan, company holidays, vacation (in the form of PTO), sick time, family-friendly benefits including parental leave, and an employee assistance program including a focus on mental and financial wellness. Eligibility requirements apply.
