[Remote] Senior AI Agent & Evaluations Engineer

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. The company is building the future of vacation ownership, focusing on transforming the industry through AI. It is seeking a Senior AI Agent & Evaluations Engineer to design and improve AI agents that directly impact customer experiences and operational efficiency, while owning the intelligence layer behind these systems.

Responsibilities

  • Design, refine, and optimize prompts, tool definitions, routing logic, and decision-making behavior across the company's AI agent ecosystem
  • Build and maintain evaluation frameworks, golden datasets, grading systems, and regression testing pipelines that measure agent quality and reliability
  • Implement guardrails and safe-failure mechanisms that ensure agents operate responsibly in customer-facing and financially sensitive workflows
  • Monitor production performance, investigate failures, identify edge cases, and continuously improve agent performance through data-driven iteration
  • Partner with business stakeholders to translate policies, operational requirements, and domain expertise into measurable agent behavior
  • Collaborate with engineering teams to define context requirements, tool definitions, and integration specifications that support agent performance
  • Create scalable frameworks and reusable patterns for deploying AI agents across new business workflows and use cases
  • Establish best practices for prompt engineering, evaluation methodologies, observability, and agent operations

Skills

  • Proven experience shipping and owning production AI agents or LLM-powered systems beyond proof-of-concept environments
  • Deep expertise in prompt engineering, including system prompts, tool usage, context management, output constraints, and agent behavior design
  • Hands-on experience building evaluation frameworks using golden datasets, scoring rubrics, LLM-as-judge methodologies, and regression testing
  • Strong familiarity with modern AI development tools such as Claude Code or similar coding agents
  • Experience with agent observability and evaluation platforms such as LangSmith, Langfuse, Arize, or comparable solutions
  • Ability to distinguish prompt issues from data, tooling, model, or evaluation failures and systematically improve agent performance
  • Strong written and verbal communication skills with the ability to work effectively across engineering and business teams
  • Demonstrated sense of ownership, with a passion for building reliable, measurable, and continuously improving AI systems
  • Experience building agents that process communication-based workflows including emails, support tickets, chat interactions, or transcripts
  • Experience with multiple agent frameworks and a practical understanding of their tradeoffs
  • Familiarity with the evolving LLM landscape and model selection strategies
  • Experience designing and implementing end-to-end evaluation pipelines and agent operations workflows
  • Production experience with online evaluation systems and automated scoring of live traffic
  • Experience integrating AI systems with AWS Connect or customer engagement platforms
  • Background in customer-facing industries where accuracy, compliance, and communication quality are critical
  • Contributions to open-source projects, technical writing, or public thought leadership in AI, prompt engineering, or agent development

Company Overview

  • The company is a resort marketplace for vacationing families, with a mission centered on family vacations. It was founded in 2013 and is headquartered in Mill Valley, California, USA, with a workforce of 1001-5000 employees.

Company H1B Sponsorship

  • The company has a track record of offering H1B sponsorships, with 2 in 2025 and 1 in 2022. Please note that this does not guarantee sponsorship for this specific role.