2+ years of manual QA experience, ideally in exploratory or context-driven testing environments
Practical understanding of LLMs and AI tools (ChatGPT, Claude, etc.)
Basic to intermediate knowledge of SQL (joins, filters, aggregations, subqueries)
Experience validating data pipelines, audit logs, or relational integrity
Able to detect both UI anomalies and backend data discrepancies
Clear written and verbal communication for reporting behavior-based bugs
Familiarity with testing non-deterministic or AI-powered systems
English proficiency at minimum B2 level
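The SQL skills listed above (joins, filters, aggregations, subqueries) look roughly like the following sketch; the `users` and `orders` tables and all column names are hypothetical, invented only for illustration.

```python
import sqlite3

# Minimal in-memory schema; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
""")

# Join + filter + aggregation: total spend per user above a threshold.
rows = conn.execute("""
SELECT u.name, SUM(o.total) AS spend
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.name
HAVING SUM(o.total) > 30
ORDER BY spend DESC
""").fetchall()
print(rows)  # -> [('Ada', 100.0), ('Ben', 40.0)]

# Subquery: users with at least one order above the average order total.
above_avg = conn.execute("""
SELECT name FROM users
WHERE id IN (SELECT user_id FROM orders
             WHERE total > (SELECT AVG(total) FROM orders))
""").fetchall()
print(above_avg)  # -> [('Ada',)]
```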
Optional
Understanding of prompt engineering and how LLM behavior can shift with input changes
Familiarity with AI agent architectures (e.g., LangChain, ReAct, RAG systems)
Experience working with BI tools (e.g., Metabase, Redash) or data validation frameworks
Background in content moderation, safety testing, or AI/UX evaluation
Your responsibilities
Manually test AI-driven workflows that generate content, complete tasks, or make decisions
Assess AI behavior by checking for consistency and repeatability; hallucinations, inaccuracies, or bias; and relevance and task alignment
Evaluate data integrity: trace AI-generated data from the interface to the backend, use SQL to validate how outputs are stored, structured, and logged, and compare AI intent/output against the resulting database records
Reproduce and report subtle, fuzzy, or probabilistic issues with structured documentation
Collaborate with engineers, AI designers, and product owners to define quality criteria across system layers
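The responsibilities above — checking repeatability and tracing an AI output into the backend — can be sketched as follows. This is a rough illustration, not a description of any real system: the `generate` stub, its `seed` parameter, and the `audit_log` table are all assumptions.

```python
import random
import sqlite3

def generate(prompt: str, seed: int) -> str:
    # Stand-in for a model call; a fixed seed makes this stub repeatable.
    rng = random.Random(seed)
    templates = ["Summary: {p}", "Answer: {p}", "Result: {p}"]
    return rng.choice(templates).format(p=prompt)

# Consistency/repeatability check: same input and seed should match exactly.
out1 = generate("refund policy", seed=7)
out2 = generate("refund policy", seed=7)
assert out1 == out2, "non-repeatable output"

# Trace the output into a (hypothetical) audit_log table and validate that
# what was stored matches what the model actually produced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (prompt TEXT, output TEXT)")
conn.execute("INSERT INTO audit_log VALUES (?, ?)", ("refund policy", out1))
stored = conn.execute(
    "SELECT output FROM audit_log WHERE prompt = ?", ("refund policy",)
).fetchone()[0]
assert stored == out1, "stored record diverges from model output"
print("repeatability and storage checks passed")
```

In a real workflow the repeatability check would be run at a fixed temperature/seed where the product exposes one, and the storage check would compare the UI-visible output against the actual backend row rather than an in-memory table.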