300-dimensional vectors, 100K whole-word vocabulary, trained for 1,000,000 steps on a DFSG-compliant mix (~2B tokens).
The standard word2vec evaluation: 19,544 analogy questions across 14 categories. Format: A:B :: C:? — find D such that the A-to-B relationship mirrors C-to-D.
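A minimal sketch of how an A:B :: C:? question can be scored, assuming the common vector-offset method: pick the word whose vector is closest, by cosine similarity, to B - A + C, excluding the three question words. The source does not state the exact scoring rule, so this is an illustration only. `solve_analogy`, `evaluate`, and the toy vectors are hypothetical, and plain lists are used just to keep the example dependency-free.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def solve_analogy(vectors, a, b, c):
    """Answer a:b :: c:? as the word nearest to b - a + c, excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def evaluate(vectors, questions):
    """Return (correct, attempted); questions with out-of-vocabulary words are skipped."""
    correct = attempted = 0
    for a, b, c, expected in questions:
        if not all(w in vectors for w in (a, b, c, expected)):
            continue  # reduces coverage, is not counted as a wrong answer
        attempted += 1
        correct += solve_analogy(vectors, a, b, c) == expected
    return correct, attempted

# Toy 2-d vectors (illustrative only, not taken from the model).
toy = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, -2.0],
}
questions = [
    ("man", "woman", "king", "queen"),
    ("man", "woman", "emperor", "empress"),  # skipped: not in the toy vocabulary
]
correct, attempted = evaluate(toy, questions)
```

With real embeddings, `attempted` divided by the total number of questions gives coverage, and `correct / attempted` gives accuracy over the covered questions.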
All models are evaluated on the same Google analogy test set (questions-words.txt, 19,544 questions).
| Model | Corpus | Vocab | Dim | Overall | Semantic | Syntactic |
|---|---|---|---|---|---|---|
| V33 (ours) | DFSG-compliant mix | 100K | 300 | 59.2% | 47.5% | 65.4% |
| word2vec (Mikolov 2013) | Google News 100B | 3M | 300 | 61.0% | ~65% | ~57% |
| GloVe (Pennington 2014) | Common Crawl 42B | 1.9M | 300 | 75.0% | ~81% | ~70% |
| GloVe (Pennington 2014) | Wikipedia 6B | 400K | 300 | 71.7% | ~77% | ~67% |
| FastText (Bojanowski 2017) | Wikipedia 16B | 2.5M | 300 | 77.8% | ~77% | ~78% |
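Note: the published models were trained on 3-50x more data (6B to 100B tokens, vs. ~2B for V33). V33 trains only on DFSG-compliant sources (Wikipedia, Gutenberg, Stack Exchange, arXiv, etc.). Vocabulary coverage also matters: the 100K vocab covers 80% of the test questions (15,705 of 19,544), vs. near-100% for the larger vocabularies, and V33's accuracy is computed over the covered questions only.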
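
Per-category results for V33. Score is correct answers over covered questions; Coverage is the share of the category's questions whose words are all in the 100K vocabulary.

| Category | Score | Accuracy | Coverage | Type |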
|---|---|---|---|---|
| capital-common-countries | 374/506 | 73.9% | 100% | semantic |
| capital-world | 796/1,927 | 41.3% | 43% | semantic |
| currency | 5/338 | 1.5% | 39% | semantic |
| city-in-state | 1,111/2,261 | 49.1% | 92% | semantic |
| family | 304/420 | 72.4% | 83% | semantic |
| gram1-adjective-to-adverb | 309/992 | 31.1% | 100% | syntactic |
| gram2-opposite | 237/756 | 31.3% | 93% | syntactic |
| gram3-comparative | 1,208/1,332 | 90.7% | 100% | syntactic |
| gram4-superlative | 734/1,056 | 69.5% | 94% | syntactic |
| gram5-present-participle | 649/1,056 | 61.5% | 100% | syntactic |
| gram6-nationality-adjective | 847/1,299 | 65.2% | 81% | syntactic |
| gram7-past-tense | 1,089/1,560 | 69.8% | 100% | syntactic |
| gram8-plural | 1,122/1,332 | 84.2% | 100% | syntactic |
| gram9-plural-verbs | 509/870 | 58.5% | 100% | syntactic |
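
The Semantic, Syntactic, and Overall figures reported for V33 follow from the per-category counts by pooling correct and attempted questions (a micro-average) rather than averaging the category percentages. A short check, with the counts copied from the table above and the 19,544 total taken from the test-set description; the variable and function names are illustrative:

```python
# (correct, attempted, type) for each category, copied from the table above.
results = {
    "capital-common-countries": (374, 506, "semantic"),
    "capital-world": (796, 1927, "semantic"),
    "currency": (5, 338, "semantic"),
    "city-in-state": (1111, 2261, "semantic"),
    "family": (304, 420, "semantic"),
    "gram1-adjective-to-adverb": (309, 992, "syntactic"),
    "gram2-opposite": (237, 756, "syntactic"),
    "gram3-comparative": (1208, 1332, "syntactic"),
    "gram4-superlative": (734, 1056, "syntactic"),
    "gram5-present-participle": (649, 1056, "syntactic"),
    "gram6-nationality-adjective": (847, 1299, "syntactic"),
    "gram7-past-tense": (1089, 1560, "syntactic"),
    "gram8-plural": (1122, 1332, "syntactic"),
    "gram9-plural-verbs": (509, 870, "syntactic"),
}
TOTAL_QUESTIONS = 19_544  # size of the full test set

def accuracy(kind=None):
    """Pooled accuracy in percent over all categories, or over one type."""
    rows = [r for r in results.values() if kind is None or r[2] == kind]
    correct = sum(r[0] for r in rows)
    attempted = sum(r[1] for r in rows)
    return round(100 * correct / attempted, 1)

semantic = accuracy("semantic")
syntactic = accuracy("syntactic")
overall = accuracy()
attempted_total = sum(r[1] for r in results.values())
coverage = round(100 * attempted_total / TOTAL_QUESTIONS, 1)

print(semantic, syntactic, overall, coverage)  # 47.5 65.4 59.2 80.4
```

Low-coverage categories such as currency and capital-world contribute fewer questions to the pooled totals, so they weigh less in the overall score than they would on the full test set.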
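
Vector arithmetic examples. Result is the top-ranked word; Top Matches lists the six nearest words with their similarity scores.

| Expression | Result | Top Matches |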
|---|---|---|
| king - man + woman | queen | queen(0.75), princess(0.69), prince(0.58), infanta(0.55), empress(0.54), monarch(0.54) |
| paris - france + germany | berlin | berlin(0.78), munich(0.77), vienna(0.73), dresden(0.64), prague(0.63), stuttgart(0.63) |
| tokyo - japan + italy | bologna | bologna(0.58), turin(0.58), pisa(0.54), milan(0.54), perugia(0.54), rome(0.53) |
| bigger - big + small | smaller | smaller(0.69), larger(0.68), large(0.47), shorter(0.41), inconsiderable(0.41), insignificant(0.41) |
| went - go + come | came | came(0.89), brought(0.64), walked(0.63), coming(0.61), hurried(0.60), hastened(0.60) |
| queen - woman + man | king | king(0.68), majesty(0.57), regent(0.56), queen's(0.56), king's(0.53), monarch(0.53) |
| swimming - swim + run | running | running(0.75), runs(0.52), ran(0.48), racing(0.44), jumping(0.41), raced(0.40) |
| dogs - dog + cat | cats | cats(0.75), kittens(0.52), mice(0.52), rats(0.48), rabbits(0.48), pussy(0.46) |
| french - france + spain | spanish | spanish(0.85), portuguese(0.72), english(0.65), dutch(0.63), italian(0.62), castilian(0.59) |
| brother - man + woman | sister | sister(0.87), daughter(0.77), mother(0.75), cousin(0.72), husband(0.71), niece(0.71) |
| worst - bad + good | best | best(0.57), better(0.41), truest(0.36), hardest(0.35), well(0.33), greatest(0.33) |
| happy - good + bad | unhappy | unhappy(0.59), miserable(0.52), sad(0.49), wretched(0.47), happiest(0.45), glad(0.44) |
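
Rows like those above can be produced by a small query helper that sums the signed word vectors and ranks the rest of the vocabulary. The sketch below assumes cosine similarity as the score and assumes the query words themselves are excluded from the results; neither is stated in the source. `query` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def query(vectors, expression, k=3):
    """Rank words by similarity to an expression such as 'king - man + woman'."""
    tokens = expression.split()
    # First word is added; every later word carries the sign that precedes it.
    terms = [(1.0, tokens[0])]
    for op, word in zip(tokens[1::2], tokens[2::2]):
        terms.append((1.0 if op == "+" else -1.0, word))
    dim = len(vectors[tokens[0]])
    target = [sum(sign * vectors[w][i] for sign, w in terms) for i in range(dim)]
    used = {w for _, w in terms}
    scored = [(w, cosine(v, target)) for w, v in vectors.items() if w not in used]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 2-d vectors (illustrative only, not taken from the model).
toy = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "princess": [2.0, 1.0], "apple": [0.0, -2.0],
}
matches = query(toy, "king - man + woman")
ranked_words = [word for word, _ in matches]
```

Here `ranked_words` starts with `queen`, followed by `princess`, which mirrors the shape of the Top Matches column.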
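
Analogy probes in A:B :: C:? form, showing the expected answer, the model's top answer, and its five nearest candidates. The answer is computed as B - A + C, so the king:man and prince:man rows are posed with A and B swapped relative to their expected answers (man:king :: woman:? would target queen); their misses reflect the question, not the model.

| Analogy | Expected | Got | Top 5 |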
|---|---|---|---|
| king:man :: woman:? | queen | girl | girl(0.62), man's(0.60), creature(0.58), woman's(0.53), gentleman(0.51) |
| king:queen :: man:? | woman | woman | woman(0.73), girl(0.56), creature(0.55), lady(0.54), man's(0.52) |
| prince:man :: woman:? | princess | man's | man's(0.60), girl(0.58), creature(0.54), woman's(0.52), gentleman(0.49) |
| man:woman :: boy:? | girl | girl | girl(0.84), baby(0.70), mother(0.68), child(0.67), girls(0.65) |
| father:mother :: son:? | daughter | daughter | daughter(0.80), sister(0.72), brother(0.68), wife(0.67), grandson(0.65) |
| husband:wife :: brother:? | sister | son | son(0.76), sister(0.73), nephew(0.73), father(0.71), daughter(0.68) |
| he:she :: his:? | her | her | her(0.90), girl's(0.70), my(0.68), husband's(0.66), sister's(0.66) |
| big:bigger :: small:? | smaller | smaller | smaller(0.69), larger(0.68), large(0.47), shorter(0.41), inconsiderable(0.41) |
| good:better :: bad:? | worse | worse | worse(0.67), worst(0.45), easier(0.43), safer(0.42), wiser(0.41) |
| slow:slower :: fast:? | faster | faster | faster(0.70), swifter(0.42), quicker(0.40), fastest(0.40), thicker(0.40) |
| tall:taller :: short:? | shorter | shorter | shorter(0.63), long(0.40), shortened(0.39), broader(0.38), longest(0.37) |
| good:best :: bad:? | worst | worst | worst(0.59), easiest(0.44), cheapest(0.40), finest(0.39), safest(0.39) |
| big:biggest :: small:? | smallest | largest | largest(0.55), large(0.45), smaller(0.44), larger(0.42), smallest(0.41) |
| go:went :: come:? | came | came | came(0.89), brought(0.64), walked(0.63), coming(0.61), hurried(0.60) |
| see:saw :: hear:? | heard | heard | heard(0.80), came(0.67), knew(0.63), listened(0.62), spoke(0.62) |
| run:ran :: swim:? | swam | swam | swam(0.74), waded(0.58), dived(0.57), rowed(0.56), swimming(0.55) |
| eat:ate :: drink:? | drank | drank | drank(0.84), sipped(0.62), quaffed(0.60), drinking(0.58), wine(0.55) |
| take:took :: give:? | gave | gave | gave(0.89), giving(0.61), drew(0.51), came(0.51), gives(0.50) |
| france:paris :: germany:? | berlin | berlin | berlin(0.78), munich(0.77), vienna(0.73), dresden(0.64), prague(0.63) |
| france:paris :: italy:? | rome | bologna | bologna(0.69), rome(0.64), turin(0.63), milan(0.62), lucca(0.62) |
| japan:tokyo :: china:? | beijing | shanghai | shanghai(0.63), beijing(0.61), nanjing(0.55), peking(0.54), chinese(0.53) |
| france:paris :: england:? | london | london | london(0.79), edinburgh(0.56), philadelphia(0.55), york(0.55), vienna(0.54) |
| france:french :: spain:? | spanish | spanish | spanish(0.85), portuguese(0.72), english(0.65), dutch(0.63), italian(0.62) |
| france:french :: germany:? | german | german | german(0.84), russian(0.65), austrian(0.64), italian(0.63), dutch(0.62) |
| car:cars :: dog:? | dogs | dogs | dogs(0.71), cats(0.56), puppy(0.52), puppies(0.52), terrier(0.51) |
| child:children :: man:? | men | men | men(0.78), women(0.59), people(0.55), fellows(0.49), folks(0.49) |
How consistently word pairs share the same direction vector (1.0 = perfect, 0.0 = random).
| Direction | Consistency | Pairs | Examples |
|---|---|---|---|
| Gender (M→F) | 0.380 | 11 | king→queen, man→woman, boy→girl, father→mother, brother→sister, he→she |
| Tense (present→past) | 0.480 | 11 | go→went, run→ran, see→saw, come→came, eat→ate, take→took |
| Singular→Plural | 0.209 | 9 | car→cars, dog→dogs, cat→cats, house→houses, tree→trees, city→cities |
| Positive→Negative | 0.106 | 9 | happy→sad, good→bad, love→hate, beautiful→ugly, rich→poor, strong→weak |
| Country→Capital | 0.410 | 8 | france→paris, germany→berlin, italy→rome, japan→tokyo, spain→madrid, england→london |
| Country→Language | 0.661 | 8 | france→french, germany→german, spain→spanish, italy→italian, japan→japanese, china→chinese |
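
The source does not define the consistency score. One plausible reading, sketched here purely as an assumption, is the mean pairwise cosine similarity between the offset vectors (target minus source) of the word pairs: parallel offsets give 1.0 and unrelated offsets average near 0.0. `direction_consistency` and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def direction_consistency(vectors, pairs):
    """Mean cosine similarity over all pairs of offset vectors (dst - src)."""
    offsets = [
        [d - s for s, d in zip(vectors[src], vectors[dst])]
        for src, dst in pairs
    ]
    sims = [cosine(u, v) for u, v in combinations(offsets, 2)]
    return sum(sims) / len(sims)

# Toy 2-d vectors (illustrative only, not taken from the model).
toy = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "boy": [0.0, 0.0], "girl": [1.0, 0.0],  # offset deliberately orthogonal
}
parallel = direction_consistency(toy, [("man", "woman"), ("king", "queen")])
mixed = direction_consistency(
    toy, [("man", "woman"), ("king", "queen"), ("boy", "girl")]
)
```

In the toy data the two parallel offsets score 1.0, and adding one orthogonal offset drops the mean to one third, since only one of the three offset pairs agrees.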
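
Mean similarity among hand-picked words within each category (higher means a tighter cluster). Words is the group size; Members lists up to ten of them.

| Category | Within-Sim | Words | Members |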
|---|---|---|---|
| Colors | 0.564 | 11 | red, blue, green, yellow, black, white, purple, orange, brown, pink |
| Countries | 0.465 | 12 | france, germany, england, spain, italy, japan, china, russia, india, brazil |
| Music | 0.441 | 10 | song, music, piano, guitar, drum, violin, orchestra, melody, rhythm, concert |
| Food | 0.436 | 12 | bread, cheese, meat, fish, rice, fruit, cake, soup, milk, butter |
| Emotions | 0.385 | 12 | happy, sad, angry, afraid, surprised, love, hate, joy, fear, hope |
| Weather | 0.380 | 10 | rain, snow, wind, storm, sun, cloud, thunder, fog, frost, ice |
| Animals | 0.353 | 13 | dog, cat, horse, fish, bird, wolf, bear, lion, tiger, elephant |
| Professions | 0.316 | 10 | doctor, teacher, lawyer, engineer, scientist, artist, soldier, farmer, priest, judge |
| Body parts | 0.291 | 12 | head, hand, foot, arm, leg, eye, ear, nose, mouth, heart |
| Math | 0.245 | 10 | number, equation, formula, theorem, proof, algebra, geometry, calculus, function, variable |

Diagonal entries give within-group similarity (the Within-Sim values above, rounded to two decimals); off-diagonal entries give between-group similarity.

| | Animals | Body parts | Colors | Countries | Emotions | Food | Math | Music | Professions | Weather |
|---|---|---|---|---|---|---|---|---|---|---|
| Animals | 0.35 | 0.16 | 0.16 | 0.02 | 0.09 | 0.16 | -0.03 | 0.08 | 0.10 | 0.14 |
| Body parts | 0.16 | 0.29 | 0.12 | -0.05 | 0.10 | 0.09 | -0.03 | 0.10 | 0.08 | 0.10 |
| Colors | 0.16 | 0.12 | 0.56 | 0.02 | 0.03 | 0.15 | -0.02 | 0.05 | -0.02 | 0.18 |
| Countries | 0.02 | -0.05 | 0.02 | 0.47 | 0.01 | 0.04 | -0.03 | 0.01 | 0.03 | 0.02 |
| Emotions | 0.09 | 0.10 | 0.03 | 0.01 | 0.39 | 0.03 | -0.08 | 0.11 | 0.14 | 0.09 |
| Food | 0.16 | 0.09 | 0.15 | 0.04 | 0.03 | 0.44 | 0.01 | 0.06 | 0.04 | 0.12 |
| Math | -0.03 | -0.03 | -0.02 | -0.03 | -0.08 | 0.01 | 0.24 | 0.01 | -0.03 | -0.02 |
| Music | 0.08 | 0.10 | 0.05 | 0.01 | 0.11 | 0.06 | 0.01 | 0.44 | 0.11 | 0.09 |
| Professions | 0.10 | 0.08 | -0.02 | 0.03 | 0.14 | 0.04 | -0.03 | 0.11 | 0.32 | 0.02 |
| Weather | 0.14 | 0.10 | 0.18 | 0.02 | 0.09 | 0.12 | -0.02 | 0.09 | 0.02 | 0.38 |
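
A matrix of this shape can be built by averaging pairwise similarities. The sketch below assumes cosine similarity, averaged over distinct word pairs within a group (diagonal) or over all cross pairs between two groups (off-diagonal); the source does not spell out the exact averaging. `group_similarity` and the toy vectors are hypothetical.

```python
import math
from itertools import combinations, product

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_similarity(vectors, group_a, group_b=None):
    """Mean cosine similarity within one group, or between two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))  # distinct pairs, no self-pairs
    else:
        pairs = list(product(group_a, group_b))
    sims = [cosine(vectors[x], vectors[y]) for x, y in pairs]
    return sum(sims) / len(sims)

# Toy 2-d vectors (illustrative only, not taken from the model).
toy = {
    "red": [1.0, 0.0], "blue": [1.0, 0.2],
    "dog": [0.0, 1.0], "cat": [0.2, 1.0],
}
groups = {"Colors": ["red", "blue"], "Animals": ["dog", "cat"]}
names = sorted(groups)
matrix = {
    (a, b): group_similarity(toy, groups[a], None if a == b else groups[b])
    for a in names
    for b in names
}
```

Under this definition the matrix is symmetric, because cosine similarity does not depend on argument order, and a well-separated group shows a diagonal value clearly above the rest of its row.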