
AI Evaluator for Software Engineering - US and Canada only

Work from home Full-time role Hiring

Senior AI Interaction Evaluator (Codex / Claude Code)

Contract | $100–$200/hour | 10–20 hrs/week | Start ASAP (through early May)

Check out this Loom video for more details!

We’re looking for a highly experienced software engineer (senior level or above) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code.

This is not a traditional engineering role. You won’t be writing production code. You’ll be evaluating something harder: whether the model thinks like a great engineer.

What This Role Actually Is

You will assess how AI coding agents behave in real-world scenarios, focusing on:

  • Whether the response makes sense
  • Whether the preamble and reasoning are useful
  • Whether the output reflects strong engineering judgment
  • Whether the interaction feels right to an experienced developer

This role is about engineering taste, not syntax correctness.

What You’ll Be Doing

  • Evaluate AI-generated coding interactions end-to-end
  • Judge whether outputs are:
      • Useful
      • Correct (at a high level)
      • Aligned with how a strong engineer would think
  • Assess the quality of explanations and reasoning, not just code
  • Distinguish between different levels of response quality (e.g. what makes a response a 2 rather than a 4)
  • Provide clear, opinionated feedback on:
      • What worked
      • What didn’t
      • What felt “off” or misleading
  • Help define what great looks like when interacting with tools like Cursor

What We Mean by “Taste”

We’re specifically looking for engineers who can answer questions like:

  • Does this feel like something a strong engineer would actually say?
  • Is this explanation helpful, or just technically correct?
  • Is the model guiding the user well, or just dumping output?
  • Would this interaction build or erode trust?

You should be comfortable making subjective but rigorous judgments.

Who You Are

  • Staff / Principal-level engineer (or equivalent experience)
  • Strong background in at least one of the following:
      • TypeScript / JavaScript
      • Python
  • Hands-on experience using:
      • OpenAI Codex
      • Claude Code
      • Cursor
  • Deep familiarity with modern AI-assisted dev workflows
  • Able to evaluate code without needing to fully execute or deeply review every line
  • Comfortable giving direct, opinionated feedback
  • High bar for what “good engineering” looks like

Nice to Have

  • Experience with Cursor or similar AI-first IDEs
  • Prior exposure to prompt design or evaluation workflows
  • Experience mentoring senior engineers or defining engineering standards

Engagement Details

  • Rate: $100–$200/hour
  • Hours: ~10–20 hours/week
  • Duration: Through early May (with possible extension)
  • Start: ASAP
  • Process:
      • Take-home evaluation exercise
      • One behavioral interview

