IS&Digital

Offer published on 2026-01-12

Functional AI Tester - GenAI

  • Location: Pune, India
  • Contract Type: Regular


About the Role

You will provide hands-on QA for GenAI features, including Retrieval-Augmented Generation (RAG), conversational AI, and agentic evaluations. The role centers on:

  • Systematic GenAI evaluation (qualitative and quantitative metrics)

  • ETL and data quality testing for the data flows that feed AI systems

  • Python-driven automated testing

This position is hands-on and collaborative, partnering with AI engineers, data engineers, and product teams to define measurable acceptance criteria and ship high-quality AI features.

Key Responsibilities

  • Test strategy and planning

    • Define risk-based test strategies and detailed test plans for GenAI features.

    • Establish clear acceptance criteria with stakeholders for functional, safety, and data quality aspects.

  • Python test automation

    • Build and maintain automated test suites using Python (e.g., PyTest, requests).

    • Implement reusable utilities for prompt/response validation, dataset management, and result scoring.

    • Create regression baselines and golden test sets to detect quality drift (see the first sketch after this list).

  • GenAI evaluation

    • Develop evaluation harnesses covering factuality, coherence, helpfulness, safety, bias, and toxicity.

    • Design prompt suites, scenario-based tests, and golden datasets for reproducible measurements.

    • Implement guardrail tests including prompt-injection resilience, unsafe content detection, and PII redaction checks.

    • Track evaluation metrics across releases to surface regressions and quality drift.

  • RAG and semantic retrieval testing

    • Verify alignment between retrieved sources and generated answers.

    • Design and run adversarial retrieval tests.

    • Measure retrieval relevance, precision/recall, grounding quality, and hallucination reduction (see the retrieval/grounding sketch after this list).

  • API and application testing

    • Test REST endpoints supporting GenAI features (request/response contracts, error handling, timeouts), as sketched after this list.

  • ETL and data quality validation

    • Test ingestion and transformation logic; validate schema, constraints, and field-level rules (see the data-quality sketch after this list).

    • Implement data profiling, reconciliation between sources and targets, and lineage checks.

    • Verify data privacy controls, masking, and retention policies across pipelines.

  • Non-functional testing

    • Performance and load testing focused on latency, throughput, concurrency, and rate limits for LLM calls.

    • Cost-aware testing (token usage, caching effectiveness) and timeout/retry behavior validation.

    • Reliability and resilience checks including error recovery and fallback behavior.

  • Share results and insights; recommend remediation and preventive actions.
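
As an illustration of the Python test automation and golden-set evaluation described above, here is a minimal PyTest sketch. The golden-set entries, the keyword-coverage rubric, and the call_model stub are illustrative assumptions, not a prescribed framework; a production harness would use the team's agreed datasets and scoring rubrics.

    # Minimal PyTest sketch: score model answers against a small golden test set.
    # call_model is a placeholder for the real GenAI endpoint under test, and the
    # keyword-coverage rubric is an illustrative metric, not a mandated one.
    import pytest

    GOLDEN_SET = [
        {
            "prompt": "What is our refund window?",
            "required_keywords": ["30 days", "original receipt"],
            "forbidden_phrases": ["i don't know"],
        },
    ]

    def call_model(prompt: str) -> str:
        # Placeholder: wire this to the system under test (API client, SDK, etc.).
        return "Refunds are accepted within 30 days with the original receipt."

    def keyword_coverage(answer: str, keywords: list) -> float:
        hits = sum(1 for kw in keywords if kw.lower() in answer.lower())
        return hits / len(keywords) if keywords else 1.0

    @pytest.mark.parametrize("case", GOLDEN_SET, ids=lambda c: c["prompt"][:40])
    def test_golden_answers(case):
        answer = call_model(case["prompt"])
        # Rubric-style checks: required content present, unwanted content absent.
        assert keyword_coverage(answer, case["required_keywords"]) >= 0.8
        for phrase in case["forbidden_phrases"]:
            assert phrase.lower() not in answer.lower()

Golden cases like these double as the regression baseline: re-running the suite after a model or prompt change surfaces quality drift early.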

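For the RAG and semantic retrieval responsibilities, retrieval quality and grounding can be checked with small, deterministic metrics. The labelled relevant IDs and the token-overlap grounding heuristic below are assumptions for illustration only.

    # Minimal sketch: retrieval precision/recall plus a crude grounding heuristic.
    def precision_recall(retrieved_ids, relevant_ids):
        retrieved, relevant = set(retrieved_ids), set(relevant_ids)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    def is_grounded(answer: str, sources, min_overlap: float = 0.5) -> bool:
        # Grounding heuristic: enough of the answer's tokens appear in the sources.
        answer_tokens = set(answer.lower().split())
        source_tokens = set(" ".join(sources).lower().split())
        if not answer_tokens:
            return False
        return len(answer_tokens & source_tokens) / len(answer_tokens) >= min_overlap

    def test_retrieval_and_grounding():
        precision, recall = precision_recall(
            retrieved_ids=["doc-3", "doc-7", "doc-9"],  # what the retriever returned
            relevant_ids=["doc-3", "doc-7"],            # labelled ground truth
        )
        assert precision >= 0.6 and recall == 1.0
        sources = ["Refunds are accepted within 30 days with the original receipt."]
        assert is_grounded("Refunds are accepted within 30 days.", sources)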
 
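For API and application testing, contract and negative tests can be written directly against the service with requests. The base URL, the /generate path, and the response fields are hypothetical here.

    # Minimal sketch: contract and negative tests for a GenAI REST endpoint.
    # Assumes a locally reachable service; adjust BASE_URL for the real deployment.
    import requests

    BASE_URL = "http://localhost:8000"  # hypothetical service under test
    TIMEOUT_S = 10                      # fail fast instead of hanging on slow LLM calls

    def test_generate_contract():
        resp = requests.post(
            f"{BASE_URL}/generate",
            json={"prompt": "Summarize the refund policy."},
            timeout=TIMEOUT_S,
        )
        assert resp.status_code == 200
        body = resp.json()
        # Contract checks: required fields present with the expected types.
        assert isinstance(body.get("answer"), str) and body["answer"]
        assert isinstance(body.get("sources"), list)

    def test_generate_rejects_empty_prompt():
        # Negative test: an empty prompt should be rejected, not silently accepted.
        resp = requests.post(
            f"{BASE_URL}/generate", json={"prompt": ""}, timeout=TIMEOUT_S
        )
        assert resp.status_code in (400, 422)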

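For ETL and data quality validation, schema/constraint checks and source-to-target reconciliation can be packaged as reusable utilities. The column names and rules below are illustrative assumptions about the pipeline's contract.

    # Minimal sketch: field-level validation and source-to-target reconciliation.
    EXPECTED_SCHEMA = {"doc_id": str, "title": str, "word_count": int}

    def validate_row(row: dict) -> list:
        # Return the list of schema/constraint violations for one record.
        errors = []
        for column, expected_type in EXPECTED_SCHEMA.items():
            if column not in row:
                errors.append(f"missing column: {column}")
            elif not isinstance(row[column], expected_type):
                errors.append(f"bad type for {column}: {type(row[column]).__name__}")
        if isinstance(row.get("word_count"), int) and row["word_count"] < 0:
            errors.append("word_count must be non-negative")
        return errors

    def reconcile(source_rows, target_rows, key="doc_id"):
        # Reconciliation: keys lost or duplicated between source and target.
        source_keys = {r[key] for r in source_rows}
        target_keys = [r[key] for r in target_rows]
        missing = source_keys - set(target_keys)
        duplicated = {k for k in target_keys if target_keys.count(k) > 1}
        return missing, duplicated

    def test_schema_and_reconciliation():
        source = [{"doc_id": "a1", "title": "Refund policy", "word_count": 420}]
        target = [{"doc_id": "a1", "title": "Refund policy", "word_count": 420}]
        assert all(not validate_row(r) for r in target)
        missing, duplicated = reconcile(source, target)
        assert not missing and not duplicated
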
Required Qualifications

  • Experience

    • 5+ years in software QA, including test strategy, automation, and defect management.

    • 2+ years testing AI/ML or GenAI features, with hands-on evaluation design.

    • 4+ years testing ETL/data pipelines and data quality.

  • Technical skills

    • Python: strong proficiency in building automated tests and tooling (PyTest, requests, pydantic, or similar).

    • API testing: REST contract testing, schema validation, negative testing.

    • GenAI evaluation: crafting prompt suites, golden datasets, rubric-based scoring, and automated evaluation pipelines.

    • RAG testing: retrieval relevance, grounding validation, chunking/indexing verification, and embedding checks.

    • ETL/data quality: schema and constraint validation, reconciliation, lineage awareness, data profiling.

  • Quality and governance

    • Understanding of LLM limitations and methods to detect/reduce hallucinations.

    • Safety and compliance testing including PII handling and prompt-injection resilience.

    • Strong analytical and debugging skills across services and data flows.

  • Soft skills

    • Excellent written and verbal communication; ability to translate quality goals into measurable criteria.

    • Collaboration with AI engineers, data engineers, and product stakeholders.

    • Organized, detail-oriented, and outcomes-focused.

 

Nice to Have

  • Experience with evaluation frameworks or tooling for LLMs and RAG quality measurement.

  • Experience creating synthetic datasets to stress specific behaviors.
