CLERC · THE DATA LAYER FOR SIGN LANGUAGE AI
Everyone chases the model.
The data is the bottleneck.
3 · Native Deaf signers
300 · ASL videos
128 · Keypoints / frame
Open · On Hugging Face
HUGGING FACE · DATASET
epee-v01
The first parallel multi-signer ASL corpus · pilot release
LIVE / POSE KEYPOINTS
[ ALPHA ]
[ BRAVO ]
[ CHARLIE ]
Three native Deaf signers · real motion keypoints · one corpus.
WHAT THE DATA PROVES — PICK ONE
USE CASE 01 / ROBUSTNESS PROBE
It looks solved.
Then it meets a new signer.
The same simple recognizer, trained on one signer over 12 shared signs. High accuracy — until it is tested on a body it never saw.
Same signer · train & test on one person · 69%
New signer · a body the model never saw · 14%
Dashed line = random chance (about 8%, 1 of 12 signs).
69% → 14%. A model that looks production-ready on its own signer collapses to near-chance on a new one. Signer variation is not an edge case — it is the core problem. The fix is not a bigger model. It is structured data from many native signers. That is what CLERC builds.
METHOD — 1-nearest-neighbour on normalised hand keypoints · 12 glosses shared across all signers · held-out signer = CHARLIE · reproducible from epee-v01 · illustrative pilot.
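The probe described above can be sketched in a few lines. This is a minimal illustration, not the released evaluation script: the normalisation (centre on the centroid, scale to unit norm), the 21-point hand layout, the synthetic "signer style" offsets, and the helper names `normalise`, `predict_1nn`, `accuracy` and `make_signer` are all assumptions made for the example. It runs on random data, so its printed numbers are not the 69% and 14% reported here; to reproduce those, swap in the actual keypoints from epee-v01.

```python
import numpy as np


def normalise(points):
    """Centre a (n_keypoints, 2) array on its centroid and scale to unit norm.

    Assumed normalisation; the source only says "normalised hand keypoints".
    """
    centred = points - points.mean(axis=0)
    scale = np.linalg.norm(centred)
    return centred / scale if scale > 0 else centred


def predict_1nn(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(dists)]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test vectors whose nearest training neighbour has the right label."""
    preds = np.array([predict_1nn(train_x, train_y, q) for q in test_x])
    return float((preds == test_y).mean())


def make_signer(rng, templates, n_reps, style_strength):
    """Synthetic stand-in for one signer: every sign gets the same per-signer offset."""
    style = rng.normal(0.0, style_strength, size=templates.shape[1:])
    xs, ys = [], []
    for gloss, template in enumerate(templates):
        for _ in range(n_reps):
            sample = template + style + rng.normal(0.0, 0.05, size=template.shape)
            xs.append(normalise(sample).ravel())
            ys.append(gloss)
    return np.array(xs), np.array(ys)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    templates = rng.normal(size=(12, 21, 2))  # 12 glosses, 21 hand keypoints (assumed)
    seen_x, seen_y = make_signer(rng, templates, n_reps=10, style_strength=1.0)
    new_x, new_y = make_signer(rng, templates, n_reps=10, style_strength=1.0)

    # Same signer: train on even-indexed samples, test on odd-indexed ones.
    same = accuracy(seen_x[::2], seen_y[::2], seen_x[1::2], seen_y[1::2])
    # New signer: train on all of one signer, test on a signer never seen.
    held_out = accuracy(seen_x, seen_y, new_x, new_y)

    print(f"same signer: {same:.0%}")
    print(f"new signer:  {held_out:.0%}")
    print(f"chance:      {1 / 12:.0%}")
```

The key design choice is the split: the held-out signer contributes no samples to training, so the second score measures transfer across bodies rather than memorisation of one signer's style.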