title: "Auto-grading assistant for an RTO with 12,000 learners"
dek: "A 2025 implementation that cut written-assessment grading from 18 minutes to 5 minutes per submission. Trainer satisfaction up; ASQA audit clean."
sector: "Education & Training"
client: "Registered Training Organisation · 12,000 learners · ASQA-regulated"
engagement: "Audit → Pilot"
duration: "14 weeks"
year: "2025"
outcome: "Grading time: 18 min → 5 min per submission · 0 ASQA audit findings on AI-assisted grading"
solution: "Rubric-aligned GPT-4 grader against unit-of-competency mapping, with trainer-finaliser review queue and ASQA audit trail."
timeSaved: "~13 minutes per submission · ~A$0.06 per assessment graded"
visual: "none"
cardFigure: "compliance"
timeMetric: "13 min"
timeMetricLabel: "saved / submission"
costMetric: "A$0.06"
costMetricLabel: "cost per assessment"
speedMetric: "3.6×"
speedMetricLabel: "faster grading"
publishedAt: "2025-05-12"
keywords:
- RTO AI Australia
- ASQA compliant grading
- VET sector automation
- auto-grading assistant
The problem
A Registered Training Organisation with 12,000 learners across Certificate III, IV and Diploma-level qualifications was grading written assessments by hand. Trainers spent roughly 18 minutes per submission: reading the learner response, matching it against the rubric and unit-of-competency criteria, drafting feedback, and lodging the grade in the LMS.
Volume had grown 31% over two years. Trainer headcount had grown 9%. The gap was being absorbed by overtime. Trainer-satisfaction scores were the leading-indicator metric on the CEO's dashboard and they were declining.
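To put a number on that gap, here is a rough back-of-envelope calculation. It uses only the two growth figures above and assumes, for illustration, that submissions are spread evenly across trainers and that time per submission did not change; neither assumption comes from the engagement data.

```python
# Illustrative arithmetic only. The growth figures are from the text above;
# the even-spread assumption is ours, not a measured fact.
volume_growth = 0.31      # growth in submissions over two years
headcount_growth = 0.09   # growth in trainer headcount over the same period

# Relative change in submissions handled per trainer.
load_ratio = (1 + volume_growth) / (1 + headcount_growth)
load_increase_pct = round((load_ratio - 1) * 100, 1)

print(f"Per-trainer grading load: +{load_increase_pct}%")
```

Under those assumptions each trainer was carrying roughly a fifth more grading than two years earlier, which is consistent with the gap showing up as overtime.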
The CEO didn't want to "grade with AI". The ASQA standards are clear that competency decisions are professional judgements made by qualified trainers. What he wanted was for his trainers to spend their time on the judgement part of grading, not the rubric-reading and feedback-typing part.
What we did
Three weeks of scoping (including a thorough review of the AQF mapping for each qualification), eight weeks of build, three weeks of pilot. The deployed system:
- Read the learner's submitted response
- Mapped the response against the unit-of-competency criteria and the rubric
- Produced a draft grade with paragraph-level evidence pointers ("this section of the learner's submission addresses criterion 2.1.b")
- Drafted feedback in the trainer's voice — calibrated on five sample-graded submissions from each trainer
- Routed everything to a trainer-finaliser queue. The trainer was the competency-decision authority. Always.
- Wrote every draft, every trainer adjustment, every confidence score and every model version into an ASQA-aligned audit log
The system never made the competency decision. The trainer did. The system did the typing.
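The split between "system drafts" and "trainer decides" can be sketched as a small data model. This is a minimal illustration, not the deployed code: every class, field and function name below is hypothetical, and the grade labels are placeholders.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidencePointer:
    criterion: str   # e.g. "2.1.b"
    paragraph: int   # index of the paragraph in the learner's submission


@dataclass(frozen=True)
class DraftGrade:
    submission_id: str
    suggested_grade: str                 # a suggestion only, never a decision
    feedback: str
    evidence: tuple[EvidencePointer, ...]
    confidence: float
    model_version: str


@dataclass(frozen=True)
class FinalGrade:
    submission_id: str
    grade: str
    feedback: str
    trainer_id: str                      # the competency-decision authority
    adjusted: bool                       # True if the trainer changed the draft


def finalise(draft: DraftGrade, trainer_id: str, grade: str, feedback: str) -> FinalGrade:
    """The only path to a FinalGrade, and it requires a named trainer."""
    if not trainer_id:
        raise ValueError("a trainer must finalise every submission")
    adjusted = grade != draft.suggested_grade or feedback != draft.feedback
    return FinalGrade(draft.submission_id, grade, feedback, trainer_id, adjusted)


def audit_record(draft: DraftGrade, final: FinalGrade) -> str:
    """Serialise the draft, the trainer's decision and the model version as one JSON line."""
    return json.dumps({
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "draft": asdict(draft),
        "final": asdict(final),
    })


draft = DraftGrade(
    submission_id="sub-001",
    suggested_grade="Satisfactory",
    feedback="Addresses criterion 2.1.b.",
    evidence=(EvidencePointer("2.1.b", 3),),
    confidence=0.82,
    model_version="model-v1",
)
final = finalise(draft, "trainer-17", "Satisfactory",
                 "Addresses criterion 2.1.b; expand on the risk controls.")
line = audit_record(draft, final)
```

The design point is that nothing in the sketch can produce a `FinalGrade` without a trainer identifier, and the audit line always carries the draft and the trainer's version together.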
The outcome — at 6 months across all qualifications
| Metric | Before (FY24 baseline) | After (6 months in production) |
|---|---|---|
| Submissions graded per week | ~3,400 | ~3,600 (within enrolment growth) |
| Trainer time per submission | ~18 min | ~5 min |
| Trainers added in period | n/a | 0 |
| Trainer-satisfaction score (internal survey) | 6.2 / 10 | 8.4 / 10 |
| Trainer-adjustment-to-draft rate | n/a | 23% (system surfaces a starting point, trainer adjusts) |
| Cost per assessment graded (model + infra) | n/a | A$0.06 |
| ASQA audit findings on AI-assisted grading | n/a | 0 (audit conducted month 5) |
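The headline figures follow directly from the table. The short calculation below reproduces them and, as a derived illustration only, scales them to a week of grading at the approximate post-deployment volume. The weekly totals are our arithmetic on rounded inputs, not figures reported by the client.

```python
# Inputs from the table above (both volumes and times are approximate).
before_min, after_min = 18, 5
submissions_per_week = 3600
cost_per_assessment_aud = 0.06

saved_min = before_min - after_min                               # minutes saved per submission
speedup = before_min / after_min                                 # "x times faster"
time_reduction_pct = round(saved_min / before_min * 100, 1)      # share of grading time removed

# Derived, not reported: weekly totals at ~3,600 submissions.
weekly_hours_saved = submissions_per_week * saved_min / 60
weekly_model_cost_aud = round(submissions_per_week * cost_per_assessment_aud, 2)

print(saved_min, speedup, time_reduction_pct)        # 13 3.6 72.2
print(weekly_hours_saved, weekly_model_cost_aud)     # 780.0 216.0
```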
The 23% trainer-adjustment rate is the metric the CEO most often cites. The trainers were not rubber-stamping the AI. They changed the system's draft on roughly one in four submissions, exercising professional judgement much as they did before, but now without the typing overhead.
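A rate like this can be computed straight from the audit log. The sketch below assumes, hypothetically, that each finalised record stores both the draft and the final grade and feedback, and that any difference in either counts as an adjustment. The case study does not specify how an adjustment was defined, so treat the rule and the field names as illustrative.

```python
def adjustment_rate(records: list[dict]) -> float:
    """Share of finalised submissions where the trainer changed the draft grade or feedback."""
    if not records:
        raise ValueError("no finalised records to measure")
    adjusted = sum(
        1
        for r in records
        if r["final_grade"] != r["draft_grade"]
        or r["final_feedback"] != r["draft_feedback"]
    )
    return adjusted / len(records)


# Four made-up records: one feedback edit, three accepted as drafted.
sample = [
    {"draft_grade": "S", "final_grade": "S", "draft_feedback": "a", "final_feedback": "a"},
    {"draft_grade": "S", "final_grade": "S", "draft_feedback": "b", "final_feedback": "b, plus detail"},
    {"draft_grade": "NS", "final_grade": "NS", "draft_feedback": "c", "final_feedback": "c"},
    {"draft_grade": "S", "final_grade": "S", "draft_feedback": "d", "final_feedback": "d"},
]
rate = adjustment_rate(sample)  # 0.25
```

A stricter definition (grade changes only) would give a lower figure, so the definition should be fixed before the number is reported to an auditor.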
"The thing my trainers told me they wanted back was time to write good feedback. They had been writing rubric-shaped feedback because that was all they had time for. Now they write useful feedback."
— CEO, Registered Training Organisation
What we'd do differently
Per-trainer voice calibration earlier. We calibrated trainer voice in week ten using five sample submissions per trainer. We should have done this in week one. The early-pilot trainers found the feedback drafts impersonal until calibration landed.
Map AQF before unit-of-competency. We mapped unit-of-competency first, AQF level second. AQF should have been the framing — it's the scaffolding that ASQA audits against, and it would have made the assessment-criteria mapping cleaner.
What we didn't do
We didn't make any competency decision. We didn't process any submission without the trainer-finaliser step. We didn't deploy the system on summative assessments before the formative-assessment pilot had completed.
The most consequential design decision was the routing-queue UI: the trainer sees the system-drafted grade and feedback alongside the raw learner submission, with side-by-side scrolling. The trainer never sees the AI draft alone. This was the artefact ASQA spent the longest reviewing during the audit. They signed it off in writing.
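The "never sees the AI draft alone" rule can be enforced in the data model rather than left to the UI. The following is a minimal text-only sketch under that assumption. `ReviewItem` and `render_side_by_side` are hypothetical names, and the real queue was a scrolling web interface, not a plain-text renderer.

```python
import textwrap
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class ReviewItem:
    """One queue entry: the draft and the raw submission always travel together."""
    submission_text: str
    draft_grade: str
    draft_feedback: str

    def __post_init__(self) -> None:
        # Refuse to build a queue entry that would show the draft on its own.
        if not self.submission_text.strip():
            raise ValueError("a draft cannot be queued without the learner's submission")


def render_side_by_side(item: ReviewItem, width: int = 40) -> str:
    """Lay out the submission (left) and the system draft (right) in two columns."""
    left = textwrap.wrap(item.submission_text, width)
    right = textwrap.wrap(f"[{item.draft_grade}] {item.draft_feedback}", width)
    rows = zip_longest(left, right, fillvalue="")
    return "\n".join(f"{l:<{width}} | {r}" for l, r in rows)


item = ReviewItem(
    submission_text="The learner describes the hazard, the control measure and the review step.",
    draft_grade="Satisfactory",
    draft_feedback="Covers criterion 2.1.b; the review step could be more specific.",
)
view = render_side_by_side(item)
```

Because the constructor rejects an empty submission, any view built from a `ReviewItem` has the learner's own words on screen next to the draft.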
