Offer published on 2026-01-12
Functional AI Tester - GenAI
Location: Pune, India
Contract Type: Regular
Open positions
About the Role
You will be involved in QA for GenAI features, including Retrieval-Augmented Generation (RAG), conversational AI, and agentic evaluation. The role centers on:
Systematic GenAI evaluation (qualitative and quantitative metrics)
ETL and data quality testing for the data flows that feed AI systems
Python-driven automated testing
This is a hands-on, collaborative role: you will partner with AI engineers, data engineers, and product teams to define measurable acceptance criteria and ship high-quality AI features.
Key Responsibilities
Test strategy and planning
Define risk-based test strategies and detailed test plans for GenAI features.
Establish clear acceptance criteria with stakeholders for functional, safety, and data quality aspects.
Python test automation
Build and maintain automated test suites using Python (e.g., PyTest, requests).
Implement reusable utilities for prompt/response validation, dataset management, and result scoring.
Create regression baselines and golden test sets to detect quality drift.
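To illustrate the golden-set regression idea above, here is a minimal sketch. `fake_model`, the prompts, and the pass criteria are all hypothetical stand-ins for a real model endpoint and a curated golden dataset; in practice the check would run under PyTest against the actual service.

```python
# Hypothetical golden-set regression check. The model stub and test cases
# below are illustrative assumptions, not a real system.

GOLDEN_SET = [
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
]

def fake_model(prompt: str) -> str:
    # Placeholder for a real GenAI call (e.g., an HTTP request via `requests`).
    answers = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return answers.get(prompt, "")

def run_golden_set(model, golden_set) -> list[str]:
    """Return the prompts whose responses fail the golden-set check."""
    failures = []
    for case in golden_set:
        response = model(case["prompt"])
        if case["must_contain"] not in response:
            failures.append(case["prompt"])
    return failures

failures = run_golden_set(fake_model, GOLDEN_SET)
```

Running the same golden set against each release and diffing the failure list is one simple way to surface quality drift.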
GenAI evaluation
Develop evaluation harnesses covering factuality, coherence, helpfulness, safety, bias, and toxicity.
Design prompt suites, scenario-based tests, and golden datasets for reproducible measurements.
Implement guardrail tests including prompt-injection resilience, unsafe content detection, and PII redaction checks.
Track quality metrics over time.
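A guardrail test from the list above might look like the following sketch. The marker patterns are illustrative assumptions; a production harness would use a curated attack corpus and possibly a classifier rather than regexes.

```python
import re

# Hypothetical guardrail check: flag responses that leak a system prompt
# or echo known injection phrases. Patterns here are illustrative only.
INJECTION_MARKERS = [
    r"ignore (all )?previous instructions",
    r"system prompt",
]

def violates_guardrail(response: str) -> bool:
    """Return True if the response matches any known injection marker."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_MARKERS)
```

Checks like this can run inside the same automated suite as functional tests, so a guardrail regression fails the build.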
RAG and semantic retrieval testing
Verify alignment between retrieved sources and generated answers.
Design and run adversarial retrieval tests (e.g., misleading, conflicting, or out-of-scope source content).
Measure retrieval relevance, precision/recall, grounding quality, and hallucination reduction.
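Retrieval precision and recall can be computed with a small helper like this sketch; the document ids and the labeled relevance set are illustrative, not real data.

```python
# Hypothetical helper for scoring one query's retrieval quality against
# labeled relevant document ids.

def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> dict:
    """Precision and recall of a retrieved list against labeled relevant ids."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return {"precision": precision, "recall": recall}

# Example: 2 of 4 retrieved chunks are relevant; 2 of 3 relevant chunks found.
scores = retrieval_metrics(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Aggregating these per-query scores across a test set gives the trendable retrieval metrics mentioned above.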
API and application testing
Test REST endpoints supporting GenAI features (request/response contracts, error handling, timeouts).
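A contract check for such an endpoint could be sketched as below. The field names and schema are assumptions for illustration, not the actual API; a real suite might use pydantic models instead.

```python
# Hypothetical response-contract check for a GenAI endpoint.
# EXPECTED_FIELDS is an assumed schema, not a real API contract.

EXPECTED_FIELDS = {"answer": str, "sources": list, "latency_ms": (int, float)}

def check_contract(payload: dict) -> list[str]:
    """Return a list of contract violations (empty means the payload passes)."""
    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors
```

Negative tests then assert that malformed payloads produce the expected violation list rather than passing silently.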
ETL and data quality validation
Test ingestion and transformation logic; validate schema, constraints, and field-level rules.
Implement data profiling, reconciliation between sources and targets, and lineage checks.
Verify data privacy controls, masking, and retention policies across pipelines.
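Source-to-target reconciliation, one of the checks above, can be sketched as a key-set comparison; the row data and key name here are illustrative assumptions.

```python
# Hypothetical reconciliation between an extract and its loaded target:
# report keys missing from the target and keys that appear unexpectedly.

def reconcile(source_rows: list[dict], target_rows: list[dict], key: str) -> dict:
    """Compare source and target row sets on a business key."""
    source_keys = {row[key] for row in source_rows}
    target_keys = {row[key] for row in target_rows}
    return {
        "missing_in_target": sorted(source_keys - target_keys),
        "unexpected_in_target": sorted(target_keys - source_keys),
    }
```

Real pipelines would add row counts, checksums, and field-level comparisons on top of this key-level diff.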
Non-functional testing
Performance and load testing focused on latency, throughput, concurrency, and rate limits for LLM calls.
Cost-aware testing (token usage, caching effectiveness) and timeout/retry behavior validation.
Reliability and resilience checks including error recovery and fallback behavior.
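Fallback and retry behavior can be exercised with a harness like this sketch; `call_with_retries`, the simulated flaky call, and the retry count are all hypothetical, standing in for the real client's timeout/retry configuration.

```python
# Hypothetical resilience check: retry a call that times out, and fall
# back after the retry budget is exhausted. All names are illustrative.

def call_with_retries(fn, retries: int = 3, fallback: str = "FALLBACK"):
    """Invoke fn up to `retries` times; return the fallback if all time out."""
    for _ in range(retries):
        try:
            return fn()
        except TimeoutError:
            continue
    return fallback

attempts = {"n": 0}

def flaky():
    """Simulated LLM call that times out twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError
    return "ok"

def always_timeout():
    """Simulated call that never succeeds."""
    raise TimeoutError
```

Tests would assert both that transient failures recover within the retry budget and that the fallback engages when they do not.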
Reporting
Share results and insights; recommend remediation and preventive actions.
Required Qualifications
Experience
5+ years in software QA, including test strategy, automation, and defect management.
2+ years testing AI/ML or GenAI features, with hands-on evaluation design.
4+ years testing ETL/data pipelines and data quality.
Technical skills
Python: Strong proficiency building automated tests and tooling (PyTest, requests, pydantic or similar).
API testing: REST contract testing, schema validation, negative testing.
GenAI evaluation: crafting prompt suites, golden datasets, rubric-based scoring, and automated evaluation pipelines.
RAG testing: retrieval relevance, grounding validation, chunking/indexing verification, and embedding checks.
ETL/data quality: schema and constraint validation, reconciliation, lineage awareness, data profiling.
Quality and governance
Understanding of LLM limitations and methods to detect/reduce hallucinations.
Safety and compliance testing including PII handling and prompt-injection resilience.
Strong analytical and debugging skills across services and data flows.
Soft skills
Excellent written and verbal communication; ability to translate quality goals into measurable criteria.
Collaboration with AI engineers, data engineers, and product stakeholders.
Organized, detail-oriented, and outcomes-focused.
Nice to Have
Experience with evaluation frameworks or tooling for LLMs and RAG quality measurement.
Experience creating synthetic datasets to stress specific behaviors.