
AI Agent Bites #3 – Test Task Submission AI Evaluator for HR

The problem we solve

Evaluating task submissions—especially when they involve multiple components and nuanced criteria—can be slow, inconsistent, and manually exhausting. Without a structured system, grading often becomes subjective and difficult to scale.

To bring structure, speed, and consistency to this process, we built a custom grading engine using Hunch Tools as the foundation. Third-party hiring tools didn’t quite offer the flexibility or depth we needed, so we took matters into our own hands.

How it works

Context for LLM

  • Grading materials – Rubric, benchmark examples, and the task document
  • Submission files – Slideshow and video transcript
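
To make the context bundle concrete, here is a minimal Python sketch of what the agents receive. The class, folder, and file names are hypothetical placeholders for illustration, not the workflow's actual inputs:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class GradingContext:
    """The full context handed to every evaluator agent (illustrative)."""
    rubric: str              # grading criteria and score scale
    benchmark_examples: str  # reference submissions with known grades
    task_document: str       # the original task brief
    slideshow_text: str      # extracted text of the submitted slides
    video_transcript: str    # transcript of the submission video

def load_context(folder: str = "submission") -> GradingContext:
    """File names below are hypothetical, not the real workflow inputs."""
    read = lambda name: (Path(folder) / name).read_text(encoding="utf-8")
    return GradingContext(
        rubric=read("rubric.md"),
        benchmark_examples=read("benchmarks.md"),
        task_document=read("task.md"),
        slideshow_text=read("slides.txt"),
        video_transcript=read("transcript.txt"),
    )
```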

Processing

The submission enters a multi-step evaluation pipeline in Hunch Tools, powered by a blend of eight AI agents. Here’s how it’s structured:

  • Multi-Agent Evaluator – Built on Gemini 2.5 Pro, this system uses 5 agents to independently score each criterion. Their scores are then aggregated to deliver a balanced, transparent evaluation
  • Single-Agent Evaluators – 3 standalone LLMs (Gemini 2.5 Pro, GPT-4.1, and GPT-3.5-turbo) assess the full submission holistically. These provide broad, top-level insights to complement the detailed criteria scores (see the sketch below)

All models run within a single Hunch Tools workflow, making multi-agent orchestration seamless and accessible.
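
Hunch Tools handles this orchestration visually on a canvas, so there is no code to lift from the workflow itself. Purely as a sketch of the same logic in plain Python, assuming a hypothetical call_llm(model, prompt) helper and illustrative criterion names:

```python
import json
import statistics

# Illustrative criterion names; the real rubric's criteria differ.
CRITERIA = ["structure", "analysis", "presentation"]
HOLISTIC_MODELS = ["gemini-2.5-pro", "gpt-4.1", "gpt-3.5-turbo"]

def call_llm(model: str, prompt: str) -> str:
    """Hypothetical helper; in the real setup each call is a node in Hunch Tools."""
    raise NotImplementedError

def multi_agent_scores(submission: str, rubric: str, n_agents: int = 5) -> dict:
    """5 independent agents score every criterion; scores are averaged per criterion."""
    runs = []
    for _ in range(n_agents):
        reply = call_llm(
            "gemini-2.5-pro",
            f"Rubric:\n{rubric}\n\nSubmission:\n{submission}\n\n"
            f"Score each criterion in {CRITERIA} from 1 to 10. "
            "Reply with JSON mapping criterion to score.",
        )
        runs.append(json.loads(reply))
    return {c: statistics.mean(run[c] for run in runs) for c in CRITERIA}

def holistic_reviews(submission: str, rubric: str) -> dict:
    """3 standalone models each assess the whole submission."""
    prompt = f"Rubric:\n{rubric}\n\nAssess this submission holistically:\n{submission}"
    return {model: call_llm(model, prompt) for model in HOLISTIC_MODELS}
```

In the actual workflow, each of these model calls is simply a node on the Hunch canvas, and the aggregation is another node downstream.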

Output

Each submission produces 4 outputs:

  • An aggregated, criterion-by-criterion evaluation from the multi-agent system
  • 3 complete submission assessments from individual LLMs

All results are rendered on the canvas interface via the “output” feature, which lets specific nodes be marked as outputs. This plays into a broader convenience of Hunch: any workflow (canvas) can be converted into a “tool”, effectively a shareable interface for running the whole workflow in the most accessible way.
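
Continuing the sketch from the processing step, the 4 results could be merged into a single report before being surfaced through output nodes; this format is purely illustrative:

```python
def build_report(agg_scores: dict, reviews: dict) -> str:
    """Merge the aggregated per-criterion scores and the three holistic
    assessments into one report (format is illustrative only)."""
    lines = ["Evaluation report", "", "Aggregated criterion scores:"]
    lines += [f"  {criterion}: {score:.1f}/10" for criterion, score in agg_scores.items()]
    for model, review in reviews.items():
        lines += ["", f"Holistic assessment ({model}):", review]
    return "\n".join(lines)
```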

Example walkthrough

[Screenshot: our Hunch Tools workflow]
[Screenshot: the tool interface, shareable via link]
[Screenshot: the outputs]

Impact

The evaluation process now takes minutes, without sacrificing quality or depth. It ensures fairness, reduces workload, and makes grading high-volume submissions scalable.

Built with

  • Hunch Tools – Workflow engine and logic
  • Gemini 2.5 Pro – Criteria-based agent evaluator + holistic scorer
  • GPT-4.1 – Holistic evaluator
  • GPT-3.5-turbo – Holistic evaluator

Complexity

  • Low

Looking to leverage modern AI tools within your company? Get in touch and explore next steps.
