LLM - AI Quality Analyst (Personalization) - Spanish
What You’ll Be Doing
We’re looking for a sharp, detail-oriented AI Quality Analyst to help evaluate a new personalization feature for Gemini. In this role, you’ll assess how well the model draws on your past Gemini conversations, Gmail, Google Search, and YouTube activity to make its responses more relevant and genuinely useful. This isn’t a passive review job: you’ll both design prompts and judge the quality of AI responses. You’ll need to think creatively about how to stress-test the model and analytically about whether it’s actually doing what it’s supposed to do.

A typical day involves:
- Designing multi-turn conversational prompts (usually 1-5 turns) that push the AI to use your real personal information and experiences.
- Evaluating whether personalization was appropriately applied, based on what you were actually asking for.
- Checking for Grounding issues: making sure any claims the model makes about you are backed by evidence, not guesses or hallucinations.
- Assessing Integration quality: does the model weave in personal data naturally, or does it feel robotic and forced?
- Stack-ranking two model responses side-by-side (SxS) based on helpfulness, usability, and overall quality.
- Writing clear, well-reasoned rationales that reference specific turns in the conversation.
- Verifying “Debug Info” to confirm that chat summaries and data sources were actually used.
- Keeping your evaluation data clean by deleting test conversations after each session.

What We’re Looking For
You’ll thrive here if you’re someone who notices the subtle stuff: the difference between a response that’s technically correct and one that actually understands the person asking. Here’s what we need:
- Strong English reading and writing skills; this project is conducted in English.
- Experience in data annotation, AI quality evaluation, content moderation, or a related field.
- A BS/BA degree or equivalent experience in Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a similar analytical field.
- Ability to work independently and manage your own time in a remote environment.
- Reliable desktop or laptop with a stable internet connection.

What Will Set You Apart
- You can evaluate nuanced, ambiguous AI responses and articulate what’s working and what isn’t.
- You have a good instinct for personalization. You can spot when the AI is making bad inferences or forcing connections that don’t make sense.
- You’re methodical about reviewing side-by-side responses and picking up on subtle differences in tone, naturalness, or overexplaining.
- Your written feedback is clear and specific. You reference actual moments in the conversation rather than speaking in generalities.
- You’ve designed prompts before or have experience testing AI systems in some capacity.

Commitment & Availability
This is a contractor engagement starting immediately. We’re building a 24-hour global team, so full-time availability in your local time zone is required. We offer two tracks:
- 30 hours/week: minimum 4 hours per day, with at least 4 hours of overlap with PST.
- 40 hours/week: same overlap requirement.

How to Apply
Our vetting process has three steps, all of which need to be completed for consideration:
- Screener
- Three assessments
- Language vetting

Shortlisted candidates will receive a Job Interest Form. After profile review, you’ll be sent an assessment to complete within 24 hours. From there, we’ll reach out to finalists to discuss next steps and pre-onboarding requirements.