CLERC · THE DATA LAYER FOR SIGN LANGUAGE AI

Everyone chases the model.
The data is the bottleneck.

3 native Deaf signers

300 ASL videos

128 keypoints / frame

Open on Hugging Face

HUGGING FACE · DATASET

epee-v01

The first parallel multi-signer ASL corpus · pilot release

Explore the data →

LIVE / POSE KEYPOINTS

[ ALPHA ]

[ BRAVO ]

[ CHARLIE ]

Three native Deaf signers · real motion keypoints · one corpus.

WHAT THE DATA PROVES

USE CASE 01 / ROBUSTNESS PROBE

It looks solved.
Then it meets a new signer.

The same simple recognizer, trained on one signer over 12 shared signs, scores far above chance until it is tested on a body it never saw.

Same signer · train & test on one person · 69%
New signer · a body the model never saw · 14%

Dashed line = random chance (8%, 1 of 12 signs).

69% → 14%, against a chance level of 8%. A model that looks production-ready on its own signer collapses to near-chance on a new one. Signer variation is not an edge case; it is the core problem. The fix is not a bigger model. It is structured data from many native signers. That is what CLERC builds.

METHOD — 1-nearest-neighbour on normalised hand keypoints · 12 glosses shared across all signers · held-out signer = CHARLIE · reproducible from epee-v01 · illustrative pilot.
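
SKETCH · The method line above can be approximated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code behind the 69% and 14% figures. The normalisation (centre on the first keypoint, scale to unit radius), the array layout `(n_keypoints, 2)`, the 21 keypoints per hand, and all function names are hypothetical. The demo runs on synthetic data, so its numbers say nothing about epee-v01.

```python
import numpy as np


def normalise(kp):
    """Flatten one (n_keypoints, 2) array after removing position and size.

    Assumption: keypoint 0 is the reference point (e.g. the wrist).
    """
    centred = kp - kp[0]                           # translation invariance
    scale = np.linalg.norm(centred, axis=1).max()  # largest distance from reference
    if scale > 0:
        centred = centred / scale                  # scale invariance
    return centred.ravel()


def predict_1nn(train_x, train_y, test_x):
    """Label each test row with the label of its nearest training row."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[dists.argmin(axis=1)]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test rows whose 1-NN prediction matches the true label."""
    return float((predict_1nn(train_x, train_y, test_x) == test_y).mean())


def chance_level(n_classes):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / n_classes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_signs, n_kp = 12, 21  # 21 keypoints per hand is an assumption
    templates = rng.normal(size=(n_signs, n_kp, 2))
    labels = np.arange(n_signs)

    def synthetic_signer(noise):
        # One noisy sample per sign; larger noise stands in for a different body.
        return np.stack([
            normalise(t + rng.normal(scale=noise, size=t.shape))
            for t in templates
        ])

    train = synthetic_signer(0.05)
    print("same-style signer:", accuracy(train, labels, synthetic_signer(0.05), labels))
    print("different signer: ", accuracy(train, labels, synthetic_signer(1.0), labels))
    print("chance:           ", chance_level(n_signs))
```

To mirror the held-out-signer setup, `train_x` would hold samples from the training signer only and `test_x` samples from CHARLIE, restricted to the 12 shared glosses. `chance_level(12)` gives 1/12, which rounds to the 8% quoted above.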