[Remote] Senior Linux Kernel & Driver Engineer – HPC/AI Fabrics
Note: This is a remote position open to candidates in the USA. The company is building the future of AI and HPC networking with an AI-first approach to silicon and software development. It is seeking a talented Linux Kernel and Driver Developer to architect and optimize its HPC and AI software stack, focusing on developing and optimizing host driver software and collaborating with silicon architects and the open-source community.
Responsibilities
- Design & Optimize Device Drivers: Develop, maintain, and upstream the open-source `hfi1` kernel driver and related subsystems (such as InfiniBand verbs and RDMA core)
- Hardware-Software Co-Design: Partner closely with silicon architects and hardware developers to define register interfaces, MMIO, queues, and the hardware-software interface
- Zero-Copy Data Paths: Design and optimize low-latency, high-throughput DMA and RDMA transport engines, minimizing copies and maximizing CPU-bypass capabilities
- Debug Kernel Concurrency: Identify and resolve intricate kernel-space race conditions, deadlocks, and memory issues under heavy multi-threaded, asynchronous networking workloads
- Upstream & Community Engagement: Actively submit patches, participate in code reviews, and represent Cornelis on the Linux Kernel Mailing List (LKML) and in open-source networking communities
- Package & Build Automation: Maintain and optimize system build environments, kernel-module packages (DKMS, RPM, Kbuild), and automated integration tests
Skills
- Education: BS, MS, or Ph.D. in Computer Science, Computer Engineering, or a related field (or equivalent practical experience)
- Kernel-Space Mastery: 3+ years of professional experience writing production-grade C code inside the Linux kernel (kernel modules, LKM, memory management, or interrupt handlers)
- High-Speed Networking Protocol Knowledge: Direct experience with RDMA, InfiniBand (IB) Verbs, RoCE, or high-performance user-space bypass frameworks (such as libfabric/OFI or DPDK)
- Hardware Fundamentals: Strong understanding of PCIe architectures, DMA engines, memory mapping (`mmap`), and MMIO
- Advanced Kernel Debugging: Hands-on proficiency with kernel analysis tools including `KASAN`, `kmemleak`, `ftrace`, `tracepoints`, `kprobes`, and core crash dump analysis
- Scripting & Automation: Proficiency in scripting languages (e.g., Python, Bash) for automated testing and performance profiling
- Active track record of contributions to upstream `kernel.org` (specifically under `drivers/infiniband/` or `drivers/net/`)
- Familiarity with kernel storage protocols (e.g., NFS, SRP)
- Experience with GPU-direct communication technologies (e.g., GPUDirect RDMA, DMA-buf)
Company Overview