[Remote] Senior Systems Engineer, Storage - DGX Cloud

Work from home · Full-time role · Hiring

Note: This is a remote role open to candidates in the USA. NVIDIA is a leading technology company known for its innovative GPU cloud services. The Senior Systems Engineer will design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, ensuring reliability and performance through automation and observability.

Responsibilities

  • Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them
  • Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations
  • Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable
  • Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure
  • Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement
  • Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity
  • Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews
  • Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems

Skills

  • BS degree (or equivalent experience) in Computer Science or related technical field involving coding
  • 12+ years of practical experience
  • Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production
  • Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems
  • Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack
  • Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems
  • Proficiency in one or more of the following: Python, Go, or Java
  • Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform
  • Customer-first mindset with a focus on customer satisfaction and a passion for ensuring customer success
  • Experience with Git, code review, pipelines, and CI/CD
  • Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker
  • Interest in crafting, analyzing, and fixing large-scale distributed systems, with strong debugging skills and a systematic problem-solving approach
  • Experience designing storage- or data-focused tooling and automating its operation at scale
  • Ability to thrive in collaborative environments, work well with a variety of teams, and adapt to different working styles

Benefits

  • You will be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

  • NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of more than 10,000 employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

  • NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.