Skip-gram with negative sampling, 300d, 100K whole-word vocabulary, trained 500,000 steps on OpenWebText subset (~2B tokens).
The standard word2vec evaluation: 19,544 analogy questions across 14 categories. Format: A:B :: C:? — find D such that the A-to-B relationship mirrors C-to-D.
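The analogy format can be sketched in a few lines. This is a minimal illustration, not the evaluation code used for V28: it assumes candidates are ranked by cosine similarity to B - A + C and that the three query words are excluded from the candidates. The `toy` vectors and the `solve_analogy` helper are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c, topn=5):
    """Rank candidates for A:B :: C:? by cosine similarity to (B - A + C)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    scored = [
        (word, cosine(vec, target))
        for word, vec in emb.items()
        if word not in (a, b, c)  # assumption: query words are not valid answers
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 2-d vectors (axis 0 ~ royalty, axis 1 ~ gender), not V28 values.
toy = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "prince": [0.8, 0.9],
}

ranking = solve_analogy(toy, "man", "king", "woman", topn=2)
print(ranking)
```

With these toy vectors, `man:king :: woman:?` ranks `queen` first, because `king - man + woman` lands exactly on the `queen` vector.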
All models are evaluated on the same Google analogy test set (questions-words.txt, ~19.5K questions).

| Model | Corpus | Vocab | Dim | Overall | Semantic | Syntactic |
|---|---|---|---|---|---|---|
| V28 (ours) | OpenWebText subset | 100K | 300 | 43.9% | 27.3% | 52.7% |
| word2vec (Mikolov 2013) | Google News 100B | 3M | 300 | 61.0% | ~65% | ~57% |
| GloVe (Pennington 2014) | Common Crawl 42B | 1.9M | 300 | 75.0% | ~81% | ~70% |
| GloVe (Pennington 2014) | Wikipedia 6B | 400K | 300 | 71.7% | ~77% | ~67% |
| FastText (Bojanowski 2017) | Wikipedia 16B | 2.5M | 300 | 77.8% | ~77% | ~78% |

Note: The published models were trained on roughly 3-50x more data (6B to 100B tokens, versus ~2B tokens for V28's OpenWebText subset). Vocabulary coverage also matters: our 100K vocabulary covers about 80% of the test questions (15,705 of 19,544), versus nearly 100% for the larger vocabularies. V28 accuracy is computed over covered questions only.

| Category | Correct/Attempted | Accuracy | Coverage | Type |
|---|---|---|---|---|
| capital-common-countries | 253/506 | 50.0% | 100% | semantic |
| capital-world | 519/1,927 | 26.9% | 43% | semantic |
| currency | 0/338 | 0.0% | 39% | semantic |
| city-in-state | 489/2,261 | 21.6% | 92% | semantic |
| family | 230/420 | 54.8% | 83% | semantic |
| gram1-adjective-to-adverb | 331/992 | 33.4% | 100% | syntactic |
| gram2-opposite | 105/756 | 13.9% | 93% | syntactic |
| gram3-comparative | 988/1,332 | 74.2% | 100% | syntactic |
| gram4-superlative | 389/1,056 | 36.8% | 94% | syntactic |
| gram5-present-participle | 499/1,056 | 47.3% | 100% | syntactic |
| gram6-nationality-adjective | 714/1,299 | 55.0% | 81% | syntactic |
| gram7-past-tense | 1,067/1,560 | 68.4% | 100% | syntactic |
| gram8-plural | 989/1,332 | 74.2% | 100% | syntactic |
| gram9-plural-verbs | 325/870 | 37.4% | 100% | syntactic |
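The headline V28 numbers can be reproduced from the per-category counts in the table above. The sketch below pools correct and attempted counts (a micro-average over covered questions) rather than averaging the per-category percentages. The pooling method is an assumption, but it matches the reported 43.9% / 27.3% / 52.7%.

```python
# (correct, attempted, type) per category, copied from the table above.
results = {
    "capital-common-countries": (253, 506, "semantic"),
    "capital-world": (519, 1927, "semantic"),
    "currency": (0, 338, "semantic"),
    "city-in-state": (489, 2261, "semantic"),
    "family": (230, 420, "semantic"),
    "gram1-adjective-to-adverb": (331, 992, "syntactic"),
    "gram2-opposite": (105, 756, "syntactic"),
    "gram3-comparative": (988, 1332, "syntactic"),
    "gram4-superlative": (389, 1056, "syntactic"),
    "gram5-present-participle": (499, 1056, "syntactic"),
    "gram6-nationality-adjective": (714, 1299, "syntactic"),
    "gram7-past-tense": (1067, 1560, "syntactic"),
    "gram8-plural": (989, 1332, "syntactic"),
    "gram9-plural-verbs": (325, 870, "syntactic"),
}

TOTAL_QUESTIONS = 19544  # size of the full analogy test set

def accuracy(rows):
    """Pooled accuracy in percent: total correct / total attempted."""
    correct = sum(c for c, _, _ in rows)
    attempted = sum(n for _, n, _ in rows)
    return 100.0 * correct / attempted

rows = list(results.values())
overall = accuracy(rows)
semantic = accuracy([r for r in rows if r[2] == "semantic"])
syntactic = accuracy([r for r in rows if r[2] == "syntactic"])
attempted = sum(n for _, n, _ in rows)
coverage = 100.0 * attempted / TOTAL_QUESTIONS

print(f"overall   {overall:.1f}%")    # overall   43.9%
print(f"semantic  {semantic:.1f}%")   # semantic  27.3%
print(f"syntactic {syntactic:.1f}%")  # syntactic 52.7%
print(f"coverage  {coverage:.1f}%")   # coverage  80.4%
```

Because uncovered questions are dropped from the denominator, these accuracies are not directly comparable to scores that count out-of-vocabulary questions as errors.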
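
Vector arithmetic queries. Result is the top-ranked word for the computed vector; Top Matches lists the six highest-scoring words with their similarity scores.

| Expression | Result | Top Matches |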
|---|---|---|
| king - man + woman | queen | queen(0.74), princess(0.63), prince(0.60), daughter(0.58), elizabeth(0.57), navarre(0.56) |
| paris - france + germany | berlin | berlin(0.68), munich(0.65), leipzig(0.62), vienna(0.61), bonn(0.58), dresden(0.55) |
| tokyo - japan + italy | rome | rome(0.53), pisa(0.52), turin(0.49), bologna(0.48), milan(0.47), della(0.45) |
| bigger - big + small | larger | larger(0.66), smaller(0.62), large(0.52), thicker(0.43), largest(0.43), fewer(0.41) |
| went - go + come | came | came(0.89), coming(0.71), gone(0.70), saw(0.69), walked(0.69), brought(0.67) |
| queen - woman + man | king | king(0.73), prince(0.63), charles(0.59), james(0.58), george(0.56), majesty(0.56) |
| swimming - swim + run | running | running(0.70), ran(0.49), runs(0.46), started(0.46), racing(0.43), raced(0.42) |
| dogs - dog + cat | cats | cats(0.54), rabbits(0.51), squirrels(0.50), mice(0.48), kittens(0.48), beasts(0.47) |
| french - france + spain | spanish | spanish(0.79), english(0.66), italian(0.66), portuguese(0.63), german(0.61), dutch(0.57) |
| brother - man + woman | sister | sister(0.84), daughter(0.79), mother(0.78), husband(0.75), wife(0.74), father(0.71) |
| worst - bad + good | best | best(0.56), wisest(0.44), happiest(0.40), kindest(0.40), what(0.39), greatest(0.39) |
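| happy - good + bad | unhappy | unhappy(0.60), sad(0.54), glad(0.54), foolish(0.53), miserable(0.52), dreadful(0.51) |

Individual analogy queries. Each query A:B :: C:? is scored as B - A + C, so go:went :: come:? is the same computation as went - go + come in the previous table. The king:man and prince:man queries state the pair in reverse (the intended form is man:king and man:prince), so they compute man - king + woman and cannot be expected to return queen or princess.

| Analogy | Expected | Got | Top 5 |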
|---|---|---|---|
| king:man :: woman:? | queen | girl | girl(0.69), man's(0.60), creature(0.60), woman's(0.56), boy(0.53) |
| king:queen :: man:? | woman | woman | woman(0.75), girl(0.66), creature(0.61), lady(0.59), man's(0.58) |
| prince:man :: woman:? | princess | girl | girl(0.65), man's(0.59), creature(0.56), woman's(0.53), thing(0.53) |
| man:woman :: boy:? | girl | girl | girl(0.82), baby(0.75), mother(0.72), she(0.69), aunt(0.69) |
| father:mother :: son:? | daughter | daughter | daughter(0.81), sister(0.74), wife(0.73), eldest(0.71), brother(0.70) |
| husband:wife :: brother:? | sister | son | son(0.81), father(0.77), daughter(0.74), nephew(0.73), sister(0.72) |
| he:she :: his:? | her | her | her(0.89), husband's(0.71), girl's(0.69), mother's(0.69), sister's(0.67) |
| big:bigger :: small:? | smaller | larger | larger(0.66), smaller(0.62), large(0.52), thicker(0.43), largest(0.43) |
| good:better :: bad:? | worse | worse | worse(0.69), easier(0.53), worst(0.48), rather(0.48), maybe(0.47) |
| slow:slower :: fast:? | faster | faster | faster(0.60), bigger(0.41), speed(0.39), run(0.37), fastest(0.37) |
| tall:taller :: short:? | shorter | shorter | shorter(0.53), long(0.49), longer(0.38), shortened(0.37), lengthened(0.37) |
| good:best :: bad:? | worst | worst | worst(0.59), better(0.47), worse(0.41), likely(0.39), newest(0.39) |
| big:biggest :: small:? | smallest | largest | largest(0.51), large(0.48), larger(0.41), considerable(0.40), smaller(0.39) |
| go:went :: come:? | came | came | came(0.89), coming(0.71), gone(0.70), saw(0.69), walked(0.69) |
| see:saw :: hear:? | heard | heard | heard(0.78), knew(0.68), came(0.68), spoke(0.64), listened(0.64) |
| run:ran :: swim:? | swam | swam | swam(0.65), leaped(0.55), sank(0.53), swimming(0.53), jumped(0.53) |
| eat:ate :: drink:? | drank | drank | drank(0.83), wine(0.65), brandy(0.62), drinking(0.60), tasted(0.59) |
| take:took :: give:? | gave | gave | gave(0.86), giving(0.65), drew(0.53), offered(0.52), came(0.51) |
| france:paris :: germany:? | berlin | berlin | berlin(0.68), munich(0.65), leipzig(0.62), vienna(0.61), bonn(0.58) |
| france:paris :: italy:? | rome | bologna | bologna(0.62), rome(0.61), turin(0.59), milan(0.57), vienna(0.55) |
| japan:tokyo :: china:? | beijing | shanghai | shanghai(0.57), beijing(0.56), taiwan(0.45), peking(0.45), university(0.45) |
| france:paris :: england:? | london | london | london(0.77), york(0.59), edinburgh(0.58), philadelphia(0.57), worcester(0.56) |
| france:french :: spain:? | spanish | spanish | spanish(0.79), english(0.66), italian(0.66), portuguese(0.63), german(0.61) |
| france:french :: germany:? | german | german | german(0.78), italian(0.65), english(0.61), dutch(0.59), russian(0.59) |
| car:cars :: dog:? | dogs | dogs | dogs(0.65), cats(0.54), pigs(0.47), animals(0.46), mongrel(0.46) |
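| child:children :: man:? | men | men | men(0.74), people(0.60), women(0.59), who(0.55), woman(0.55) |

Direction consistency: how closely the offset vectors (B - A) of word pairs in the same relation align with one another (1.0 = all offsets point the same way, 0.0 = no more aligned than random directions). Pairs is the number of word pairs tested; Examples lists six of them.

| Direction | Consistency | Pairs | Examples |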
|---|---|---|---|
| Gender (M→F) | 0.311 | 11 | king→queen, man→woman, boy→girl, father→mother, brother→sister, he→she |
| Tense (present→past) | 0.505 | 11 | go→went, run→ran, see→saw, come→came, eat→ate, take→took |
| Singular→Plural | 0.228 | 9 | car→cars, dog→dogs, cat→cats, house→houses, tree→trees, city→cities |
| Positive→Negative | 0.115 | 9 | happy→sad, good→bad, love→hate, beautiful→ugly, rich→poor, strong→weak |
| Country→Capital | 0.347 | 8 | france→paris, germany→berlin, italy→rome, japan→tokyo, spain→madrid, england→london |
| Country→Language | 0.540 | 8 | france→french, germany→german, spain→spanish, italy→italian, japan→japanese, china→chinese |
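The exact formula behind the consistency score is not stated here. One plausible reading, assumed in the sketch below, is the mean pairwise cosine similarity between the offset vectors of all word pairs in a relation. The `direction_consistency` helper and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def direction_consistency(emb, pairs):
    """Mean cosine similarity over all pairs of offset vectors (dst - src)."""
    offsets = [
        [d - s for s, d in zip(emb[src], emb[dst])]
        for src, dst in pairs
    ]
    sims = [cosine(u, v) for u, v in combinations(offsets, 2)]
    return sum(sims) / len(sims)

# Hypothetical 2-d vectors, not V28 values.
toy = {
    "go": [1.0, 0.0], "went": [1.0, 1.0],   # offset (0, 1)
    "see": [2.0, 0.5], "saw": [2.0, 1.5],   # offset (0, 1)
    "eat": [0.0, 1.0], "ate": [1.0, 1.0],   # offset (1, 0)
}

aligned = direction_consistency(toy, [("go", "went"), ("see", "saw")])
mixed = direction_consistency(toy, [("go", "went"), ("see", "saw"), ("eat", "ate")])
print(aligned, mixed)
```

Two pairs with identical offsets score 1.0. Adding a third pair whose offset is orthogonal to the other two pulls the mean down to 1/3, which shows how a single off-direction pair lowers the score.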
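
Within-group similarity for each word category, sorted from most to least cohesive. Words is the category size; Members lists at most ten words per category.

| Category | Within-Sim | Words | Members |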
|---|---|---|---|
| Colors | 0.631 | 11 | red, blue, green, yellow, black, white, purple, orange, brown, pink |
| Food | 0.531 | 12 | bread, cheese, meat, fish, rice, fruit, cake, soup, milk, butter |
| Countries | 0.503 | 12 | france, germany, england, spain, italy, japan, china, russia, india, brazil |
| Weather | 0.472 | 10 | rain, snow, wind, storm, sun, cloud, thunder, fog, frost, ice |
| Emotions | 0.470 | 12 | happy, sad, angry, afraid, surprised, love, hate, joy, fear, hope |
| Music | 0.462 | 10 | song, music, piano, guitar, drum, violin, orchestra, melody, rhythm, concert |
| Math | 0.452 | 10 | number, equation, formula, theorem, proof, algebra, geometry, calculus, function, variable |
| Animals | 0.409 | 13 | dog, cat, horse, fish, bird, wolf, bear, lion, tiger, elephant |
| Body parts | 0.356 | 12 | head, hand, foot, arm, leg, eye, ear, nose, mouth, heart |
| Professions | 0.345 | 10 | doctor, teacher, lawyer, engineer, scientist, artist, soldier, farmer, priest, judge |

Diagonal = within-group similarity. Off-diagonal = between-group similarity.

| | Animals | Body parts | Colors | Countries | Emotions | Food | Math | Music | Professions | Weather |
|---|---|---|---|---|---|---|---|---|---|---|
| Animals | 0.41 | 0.24 | 0.23 | 0.06 | 0.16 | 0.22 | -0.05 | 0.11 | 0.14 | 0.19 |
| Body parts | 0.24 | 0.36 | 0.19 | -0.04 | 0.18 | 0.14 | -0.04 | 0.14 | 0.13 | 0.15 |
| Colors | 0.23 | 0.19 | 0.63 | 0.04 | 0.07 | 0.21 | -0.01 | 0.10 | 0.02 | 0.26 |
| Countries | 0.06 | -0.04 | 0.04 | 0.50 | 0.01 | 0.08 | -0.03 | 0.02 | 0.06 | 0.03 |
| Emotions | 0.16 | 0.18 | 0.07 | 0.01 | 0.47 | 0.08 | -0.12 | 0.14 | 0.19 | 0.14 |
| Food | 0.22 | 0.14 | 0.21 | 0.08 | 0.08 | 0.53 | -0.01 | 0.08 | 0.06 | 0.17 |
| Math | -0.05 | -0.04 | -0.01 | -0.03 | -0.12 | -0.01 | 0.45 | -0.02 | -0.06 | -0.04 |
| Music | 0.11 | 0.14 | 0.10 | 0.02 | 0.14 | 0.08 | -0.02 | 0.46 | 0.13 | 0.11 |
| Professions | 0.14 | 0.13 | 0.02 | 0.06 | 0.19 | 0.06 | -0.06 | 0.13 | 0.34 | 0.03 |
| Weather | 0.19 | 0.15 | 0.26 | 0.03 | 0.14 | 0.17 | -0.04 | 0.11 | 0.03 | 0.47 |
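A matrix of this shape can be computed as follows. The sketch assumes that within-group similarity is the mean cosine similarity over all distinct word pairs inside a group, and that between-group similarity is the mean over all cross-group pairs; the source does not spell out either definition. The `similarity_matrix` helper and the toy vectors are hypothetical.

```python
import math
from itertools import combinations, product

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(emb, groups):
    """Return {(group_a, group_b): mean cosine similarity} for all group pairs."""
    matrix = {}
    for g, h in product(sorted(groups), repeat=2):
        if g == h:
            # Diagonal: distinct pairs within one group (no self-pairs).
            pairs = list(combinations(groups[g], 2))
        else:
            # Off-diagonal: every word of g against every word of h.
            pairs = list(product(groups[g], groups[h]))
        sims = [cosine(emb[x], emb[y]) for x, y in pairs]
        matrix[g, h] = sum(sims) / len(sims)
    return matrix

# Hypothetical 2-d vectors, not V28 values.
toy = {
    "red": [1.0, 0.1], "blue": [0.9, 0.2],
    "dog": [0.1, 1.0], "cat": [0.2, 0.9],
}
groups = {"Colors": ["red", "blue"], "Animals": ["dog", "cat"]}

matrix = similarity_matrix(toy, groups)
for (g, h), value in matrix.items():
    print(f"{g:8s} {h:8s} {value:.2f}")
```

Under this definition the matrix is symmetric, since cosine similarity is symmetric, and a well-separated category shows a diagonal entry clearly above every off-diagonal entry in its row.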