
Machine Learning Engineer - LLM Evaluation & Automation

Work from home, Full-time role

We are seeking a highly skilled Machine Learning Engineer who specializes in leveraging Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvement. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.

Essential functions

Responsibilities:

  • Design and implement automated systems and pipelines for evaluating LLM outputs.
  • Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
  • Collaborate with Engineering teams to create automated logic checks and validation tools.
  • Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
  • Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
  • Investigate how LLM-derived evaluations can enhance product reliability and user experience.
  • Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
  • Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
  • Continuously improve and expand automated evaluation processes based on industry best practices.

Qualifications

  • 5+ years of experience in ML engineering, NLP, or AI/ML automation.
  • Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
  • Hands-on experience in prompt engineering and designing LLM-based evaluation systems is preferred.
  • Strong understanding of machine learning principles with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
  • Expertise in building automated evaluation or QA pipelines.
  • Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
  • Proven project management and cross-functional collaboration experience.
  • Excellent communication skills to convey complex insights to technical and non-technical audiences.
  • Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
  • Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
  • Strong programming skills in Python and SQL.
  • Experience with big data technologies such as PySpark for data aggregation and sampling is a strong plus.

We offer

  • Opportunity to work on cutting-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package - medical insurance, vision, dental, etc.
  • Corporate social events
  • Professional development opportunities
  • Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.
