[Remote] Backend Infrastructure & Agentic AI Platforms Engineer
Note: This is a remote position open to candidates in the USA. A reputed company is seeking a Backend Infrastructure & Agentic AI Platforms Engineer to support a federal program focused on autonomous AI capabilities. Working in a remote environment, this role is responsible for building and maintaining backend infrastructure and agentic AI systems supporting the mission of ARPA-H (Advanced Research Projects Agency for Health).
Responsibilities
- Own the end-to-end backend infrastructure for GRACE on Microsoft Azure, including Azure Functions, API Management, Container Apps, and Azure OpenAI Service
- Manage data storage, retrieval pipelines, vector databases, and document indexing for internal knowledge search
- Implement and maintain infrastructure as code for all environments
- Build and manage CI/CD pipelines, deployment automation, and release processes
- Ensure monitoring, alerting, logging, distributed tracing, and incident response are in place for production systems
- Manage secrets, API keys, and credential rotation
- Track and optimize cost and token economics across LLM providers
- Build and maintain the backend implementation of the Model Context Protocol (MCP), including server hosting, tool registration, and versioning
- Design and implement communication patterns for agent interoperability and external integrations
- Build and operate RAG pipelines for document ingestion, embedding, and semantic search
- Implement fallback, retry, and degradation patterns for AI service dependencies
- Manage tool-calling infrastructure, including registration, execution, and observability
- Build and maintain observability for agent workflows, including latency, throughput, and error rates
- Implement evaluation pipelines for safety, regression, and grounding assessment
- Define and enforce system-level SLOs and alerting procedures
- Establish and improve coding standards, design reviews, and testing practices
- Mentor team members and communicate technical decisions clearly
- Ensure privacy, security, and compliance in AI systems and data handling
Required Skills
- 7+ years of professional software engineering experience building and operating production systems
- Proven experience in high-velocity environments with end-to-end product ownership
- Strong proficiency in Python and at least one other backend language
- Experience with distributed systems, APIs, data pipelines, and software design patterns
- Hands-on experience with Microsoft Azure: Azure Functions, API Management, Container Apps, and Azure OpenAI Service
- Experience with containerization, CI/CD, and infrastructure as code
- Knowledge of authentication and identity systems (OAuth2, OIDC, Azure Entra ID)
- Demonstrated ability to own production systems, including on-call support and incident debugging
- Experience with AI/LLM platform engineering and orchestration
- Familiarity with vector search, RAG pipelines, and semantic search infrastructure
- Strong understanding of security, privacy, and compliance standards
Benefits
- Medical, Dental, and Vision coverage through national providers
- 401(k) with company match (eligible after 6 months; vesting applies)
- Company-paid Life and AD&D insurance with additional voluntary options
- Short-term and long-term disability options
- Employee Assistance Program (EAP) with 24/7 confidential support and resources
- Telehealth and virtual care options
- Pet insurance, legal services, and identity theft protection options
- Paid Time Off (PTO) and 11 paid federal holidays
- Access to wellness programs, discounts, and lifestyle benefits
Company Overview