Offer published on 2026-01-12
Functional AI Tester - GenAI
Location: Pune, India
Contract Type: Regular
Open positions
About the Role
You will be involved in QA for GenAI features, including Retrieval-Augmented Generation (RAG), conversational AI, and agentic evaluation. The role centers on:
Systematic GenAI evaluation (qualitative and quantitative metrics)
ETL and data quality testing for the data flows that feed AI systems
Python-driven automated testing
This is a hands-on, collaborative role: you will partner with AI engineers, data engineers, and product teams to define measurable acceptance criteria and ship high-quality AI features.
Key Responsibilities
Test strategy and planning
Define risk-based test strategies and detailed test plans for GenAI features.
Establish clear acceptance criteria with stakeholders for functional, safety, and data quality aspects.
Python test automation
Build and maintain automated test suites using Python (e.g., PyTest, requests).
Implement reusable utilities for prompt/response validation, dataset management, and result scoring.
Create regression baselines and golden test sets to detect quality drift.
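To illustrate the golden-set regression idea above, here is a minimal sketch. `fake_model`, the prompts, and the pass criteria are all hypothetical stand-ins for a real model endpoint and a curated golden dataset; in practice the check would run under PyTest against the actual service.

```python
# Hypothetical golden-set regression check. The model stub and test cases
# below are illustrative assumptions, not a real system.

GOLDEN_SET = [
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
]

def fake_model(prompt: str) -> str:
    # Placeholder for a real GenAI call (e.g., an HTTP request via `requests`).
    answers = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return answers.get(prompt, "")

def run_golden_set(model, golden_set) -> list[str]:
    """Return the prompts whose responses fail the golden-set check."""
    failures = []
    for case in golden_set:
        response = model(case["prompt"])
        if case["must_contain"] not in response:
            failures.append(case["prompt"])
    return failures

failures = run_golden_set(fake_model, GOLDEN_SET)
```

Running the same golden set against each release and diffing the failure list is one simple way to surface quality drift.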
GenAI evaluation
Develop evaluation harnesses covering factuality, coherence, helpfulness, safety, bias, and toxicity.
Design prompt suites, scenario-based tests, and golden datasets for reproducible measurements.
Implement guardrail tests including prompt-injection resilience, unsafe content detection, and PII redaction checks.
Track quality metrics over time.
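A guardrail test from the list above might look like the following sketch. The marker patterns are illustrative assumptions; a production harness would use a curated attack corpus and possibly a classifier rather than regexes.

```python
import re

# Hypothetical guardrail check: flag responses that leak a system prompt
# or echo known injection phrases. Patterns here are illustrative only.
INJECTION_MARKERS = [
    r"ignore (all )?previous instructions",
    r"system prompt",
]

def violates_guardrail(response: str) -> bool:
    """Return True if the response matches any known injection marker."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_MARKERS)
```

Checks like this can run inside the same automated suite as functional tests, so a guardrail regression fails the build.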
RAG and semantic retrieval testing
Verify alignment between retrieved sources and generated answers.
Design and run adversarial retrieval tests (e.g., misleading, conflicting, or out-of-scope source content).
Measure retrieval relevance, precision/recall, grounding quality, and hallucination reduction.
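Retrieval precision and recall can be computed with a small helper like this sketch; the document ids and the labeled relevance set are illustrative, not real data.

```python
# Hypothetical helper for scoring one query's retrieval quality against
# labeled relevant document ids.

def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> dict:
    """Precision and recall of a retrieved list against labeled relevant ids."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return {"precision": precision, "recall": recall}

# Example: 2 of 4 retrieved chunks are relevant; 2 of 3 relevant chunks found.
scores = retrieval_metrics(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Aggregating these per-query scores across a test set gives the trendable retrieval metrics mentioned above.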
API and application testing
Test REST endpoints supporting GenAI features (request/response contracts, error handling, timeouts).
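A contract check for such an endpoint could be sketched as below. The field names and schema are assumptions for illustration, not the actual API; a real suite might use pydantic models instead.

```python
# Hypothetical response-contract check for a GenAI endpoint.
# EXPECTED_FIELDS is an assumed schema, not a real API contract.

EXPECTED_FIELDS = {"answer": str, "sources": list, "latency_ms": (int, float)}

def check_contract(payload: dict) -> list[str]:
    """Return a list of contract violations (empty means the payload passes)."""
    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors
```

Negative tests then assert that malformed payloads produce the expected violation list rather than passing silently.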
ETL and data quality validation
Test ingestion and transformation logic; validate schema, constraints, and field-level rules.
Implement data profiling, reconciliation between sources and targets, and lineage checks.
Verify data privacy controls, masking, and retention policies across pipelines.
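Source-to-target reconciliation, one of the checks above, can be sketched as a key-set comparison; the row data and key name here are illustrative assumptions.

```python
# Hypothetical reconciliation between an extract and its loaded target:
# report keys missing from the target and keys that appear unexpectedly.

def reconcile(source_rows: list[dict], target_rows: list[dict], key: str) -> dict:
    """Compare source and target row sets on a business key."""
    source_keys = {row[key] for row in source_rows}
    target_keys = {row[key] for row in target_rows}
    return {
        "missing_in_target": sorted(source_keys - target_keys),
        "unexpected_in_target": sorted(target_keys - source_keys),
    }
```

Real pipelines would add row counts, checksums, and field-level comparisons on top of this key-level diff.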
Non-functional testing
Performance and load testing focused on latency, throughput, concurrency, and rate limits for LLM calls.
Cost-aware testing (token usage, caching effectiveness) and timeout/retry behavior validation.
Reliability and resilience checks including error recovery and fallback behavior.
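Fallback and retry behavior can be exercised with a harness like this sketch; `call_with_retries`, the simulated flaky call, and the retry count are all hypothetical, standing in for the real client's timeout/retry configuration.

```python
# Hypothetical resilience check: retry a call that times out, and fall
# back after the retry budget is exhausted. All names are illustrative.

def call_with_retries(fn, retries: int = 3, fallback: str = "FALLBACK"):
    """Invoke fn up to `retries` times; return the fallback if all time out."""
    for _ in range(retries):
        try:
            return fn()
        except TimeoutError:
            continue
    return fallback

attempts = {"n": 0}

def flaky():
    """Simulated LLM call that times out twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError
    return "ok"

def always_timeout():
    """Simulated call that never succeeds."""
    raise TimeoutError
```

Tests would assert both that transient failures recover within the retry budget and that the fallback engages when they do not.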
Reporting
Share results and insights; recommend remediation and preventive actions.
Required Qualifications
Experience
5+ years in software QA, including test strategy, automation, and defect management.
2+ years testing AI/ML or GenAI features, with hands-on evaluation design.
4+ years testing ETL/data pipelines and data quality.
Technical skills
Python: Strong proficiency building automated tests and tooling (PyTest, requests, pydantic or similar).
API testing: REST contract testing, schema validation, negative testing.
GenAI evaluation: crafting prompt suites, golden datasets, rubric-based scoring, and automated evaluation pipelines.
RAG testing: retrieval relevance, grounding validation, chunking/indexing verification, and embedding checks.
ETL/data quality: schema and constraint validation, reconciliation, lineage awareness, data profiling.
Quality and governance
Understanding of LLM limitations and methods to detect/reduce hallucinations.
Safety and compliance testing including PII handling and prompt-injection resilience.
Strong analytical and debugging skills across services and data flows.
Soft skills
Excellent written and verbal communication; ability to translate quality goals into measurable criteria.
Collaboration with AI engineers, data engineers, and product stakeholders.
Organized, detail-oriented, and outcomes-focused.
Nice to Have
Experience with evaluation frameworks or tooling for LLMs and RAG quality measurement.
Experience creating synthetic datasets to stress specific behaviors.