Word2vec V28 — Embedding Geometry Probe

Skip-gram with negative sampling (SGNS): 300-dimensional embeddings, a 100K whole-word vocabulary, trained for 500,000 steps on an OpenWebText subset (~2B tokens).
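
For reference, the per-pair objective that skip-gram with negative sampling minimizes can be sketched in plain Python. This is a minimal sketch of the standard SGNS loss, not V28's training code: it assumes the usual word2vec layout (separate input and output vectors per word), and the number of negative samples `k` is illustrative because it is not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center, context, negatives):
    """Negative-sampling loss for one (center, context) training pair.

    center:    input vector of the centre word
    context:   output vector of the observed context word
    negatives: output vectors of k sampled noise words (k is an assumption)
    """
    # Reward a high score for the context word that actually occurred ...
    loss = -log_sigmoid(dot(center, context))
    # ... and a low score for each sampled noise word.
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss
```

With all-zero 300-d vectors and k = 5 every sigmoid is 0.5, so `sgns_loss` returns 6 · log 2; training pushes the centre vector toward its true context vectors and away from the noise samples.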

Custom analogies: 23/26
Custom accuracy: 88% (expected answer in the top 5)
Google benchmark: 43.9% (covered questions)
Vocabulary visualized: 500 words
Embedding dimension: 300

Google Analogy Benchmark (Standard Test)

The standard word2vec evaluation: 19,544 analogy questions across 14 categories. Format: A:B :: C:? — find D such that the A-to-B relationship mirrors C-to-D.
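
The A:B :: C:? query is commonly answered with vector offset arithmetic (often called 3CosAdd): normalize the vectors, form B - A + C, and return the nearest vocabulary word by cosine similarity, excluding the three query words. A minimal sketch over a toy dictionary; the vectors and the `analogy` helper are hypothetical, and whether V28's evaluator excludes query words exactly this way is an assumption.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy(a, b, c, emb):
    """Answer a:b :: c:? by maximizing cos(d, b - a + c) over the vocabulary."""
    unit = {w: normalize(v) for w, v in emb.items()}
    target = normalize([vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])])
    best_word, best_score = None, -math.inf
    for word, vec in unit.items():
        if word in (a, b, c):
            continue  # query words are not allowed as answers
        score = sum(x * y for x, y in zip(target, vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical 3-d toy vectors: axis 0 ~ royalty, axis 1 ~ gender.
TOY = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
}
```

Here `analogy("man", "king", "woman", TOY)` returns `"queen"` (cosine about 0.96), i.e. king - man + woman.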

Overall accuracy (covered questions): 43.9%
Semantic: 27.3%
Syntactic: 52.7%
Coverage: 80% (15,705 of 19,544 questions)
Correct/covered: 6,898/15,705

Comparison with Published Models

All models evaluated on the same Google analogy test set (questions-words.txt, ~19.5K questions).

Model | Corpus (tokens) | Vocab | Dim | Overall | Semantic | Syntactic
V28 (ours) | OpenWebText subset (~2B) | 100K | 300 | 43.9% | 27.3% | 52.7%
word2vec (Mikolov 2013) | Google News (100B) | 3M | 300 | 61.0% | ~65% | ~57%
GloVe (Pennington 2014) | Common Crawl (42B) | 1.9M | 300 | 75.0% | ~81% | ~70%
GloVe (Pennington 2014) | Wikipedia 2014 + Gigaword 5 (6B) | 400K | 300 | 71.7% | ~77% | ~67%
FastText (Bojanowski 2017) | Wikipedia (16B) | 2.5M | 300 | 77.8% | ~77% | ~78%

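Note: the published models were trained on 3-50x more data (6B to 100B tokens, vs ~2B for V28's OpenWebText subset). Vocabulary coverage also matters: V28's 100K vocabulary covers 80% of the test questions, versus near-100% for the larger vocabularies, and its 43.9% is computed over covered questions only. Scoring the 3,839 uncovered questions as wrong would give 6,898/19,544 = 35.3%.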
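
Because V28's accuracy is measured on covered questions only, the denominator matters when comparing against the published rows. A minimal sketch of tallying per-category coverage from a file in the questions-words.txt layout (`: category` header lines followed by four-word questions); lower-casing every word to match a lowercase vocabulary is an assumption, and the sample lines and vocabulary are illustrative only.

```python
from collections import defaultdict


def coverage_by_category(lines, vocab):
    """Return {category: (covered, total)} for questions-words.txt-style lines.

    A question counts as covered only if all four words are in `vocab`.
    Words are lower-cased first (assumes a lowercase vocabulary).
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [covered, total]
    category = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            category = line[1:].strip()  # e.g. ": capital-common-countries"
            continue
        words = line.lower().split()
        if len(words) != 4:
            continue  # skip malformed lines
        counts[category][1] += 1
        if all(w in vocab for w in words):
            counts[category][0] += 1
    return {cat: (cov, tot) for cat, (cov, tot) in counts.items()}


SAMPLE = """\
: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand
: family
boy girl brother sister
""".splitlines()

VOCAB = {"athens", "greece", "baghdad", "iraq", "boy", "girl", "brother", "sister"}

# V28's headline numbers under the two possible denominators.
correct, covered, total = 6898, 15705, 19544
accuracy_on_covered = 100 * correct / covered  # the reported 43.9%
accuracy_on_all = 100 * correct / total        # uncovered questions scored as wrong
```

On the sample, capital-common-countries is 1 of 2 covered (Bangkok and Thailand are missing from the toy vocabulary) and family is 1 of 1.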

Per-Category Breakdown

Category | Correct/Covered | Accuracy | Coverage | Type
capital-common-countries | 253/506 | 50.0% | 100% | semantic
capital-world | 519/1,927 | 26.9% | 43% | semantic
currency | 0/338 | 0.0% | 39% | semantic
city-in-state | 489/2,261 | 21.6% | 92% | semantic
family | 230/420 | 54.8% | 83% | semantic
gram1-adjective-to-adverb | 331/992 | 33.4% | 100% | syntactic
gram2-opposite | 105/756 | 13.9% | 93% | syntactic
gram3-comparative | 988/1,332 | 74.2% | 100% | syntactic
gram4-superlative | 389/1,056 | 36.8% | 94% | syntactic
gram5-present-participle | 499/1,056 | 47.3% | 100% | syntactic
gram6-nationality-adjective | 714/1,299 | 55.0% | 81% | syntactic
gram7-past-tense | 1,067/1,560 | 68.4% | 100% | syntactic
gram8-plural | 989/1,332 | 74.2% | 100% | syntactic
gram9-plural-verbs | 325/870 | 37.4% | 100% | syntactic
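
The semantic, syntactic and overall figures are pooled over questions (total correct divided by total covered), not averages of the per-category percentages. Re-deriving them from the rows above is a quick consistency check; the dictionary simply transcribes the table, and `pooled_accuracy` is a hypothetical helper.

```python
# (correct, covered, type) per category, transcribed from the per-category table.
CATEGORY_RESULTS = {
    "capital-common-countries": (253, 506, "semantic"),
    "capital-world": (519, 1927, "semantic"),
    "currency": (0, 338, "semantic"),
    "city-in-state": (489, 2261, "semantic"),
    "family": (230, 420, "semantic"),
    "gram1-adjective-to-adverb": (331, 992, "syntactic"),
    "gram2-opposite": (105, 756, "syntactic"),
    "gram3-comparative": (988, 1332, "syntactic"),
    "gram4-superlative": (389, 1056, "syntactic"),
    "gram5-present-participle": (499, 1056, "syntactic"),
    "gram6-nationality-adjective": (714, 1299, "syntactic"),
    "gram7-past-tense": (1067, 1560, "syntactic"),
    "gram8-plural": (989, 1332, "syntactic"),
    "gram9-plural-verbs": (325, 870, "syntactic"),
}


def pooled_accuracy(results):
    """Micro-averaged accuracy (%) per question type and overall."""
    totals = {}  # type -> [correct, covered]
    for correct, covered, kind in results.values():
        bucket = totals.setdefault(kind, [0, 0])
        bucket[0] += correct
        bucket[1] += covered
    summary = {kind: 100 * c / n for kind, (c, n) in totals.items()}
    all_correct = sum(c for c, _ in totals.values())
    all_covered = sum(n for _, n in totals.values())
    summary["overall"] = 100 * all_correct / all_covered
    return summary
```

Rounded to one decimal, `pooled_accuracy(CATEGORY_RESULTS)` gives 27.3 (semantic), 52.7 (syntactic) and 43.9 (overall), matching the headline figures.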

t-SNE Visualization (Top 500 Words)

[Figure: t-SNE projection of the top 500 words. Legend groups: royalty, gender_m, gender_f, country, capital, animal, emotion, color, nature, food.]

Vector Arithmetic

Expression | Result | Top matches (similarity)
king - man + woman | queen | queen(0.74), princess(0.63), prince(0.60), daughter(0.58), elizabeth(0.57), navarre(0.56)
paris - france + germany | berlin | berlin(0.68), munich(0.65), leipzig(0.62), vienna(0.61), bonn(0.58), dresden(0.55)
tokyo - japan + italy | rome | rome(0.53), pisa(0.52), turin(0.49), bologna(0.48), milan(0.47), della(0.45)
bigger - big + small | larger | larger(0.66), smaller(0.62), large(0.52), thicker(0.43), largest(0.43), fewer(0.41)
went - go + come | came | came(0.89), coming(0.71), gone(0.70), saw(0.69), walked(0.69), brought(0.67)
queen - woman + man | king | king(0.73), prince(0.63), charles(0.59), james(0.58), george(0.56), majesty(0.56)
swimming - swim + run | running | running(0.70), ran(0.49), runs(0.46), started(0.46), racing(0.43), raced(0.42)
dogs - dog + cat | cats | cats(0.54), rabbits(0.51), squirrels(0.50), mice(0.48), kittens(0.48), beasts(0.47)
french - france + spain | spanish | spanish(0.79), english(0.66), italian(0.66), portuguese(0.63), german(0.61), dutch(0.57)
brother - man + woman | sister | sister(0.84), daughter(0.79), mother(0.78), husband(0.75), wife(0.74), father(0.71)
worst - bad + good | best | best(0.56), wisest(0.44), happiest(0.40), kindest(0.40), what(0.39), greatest(0.39)
happy - good + bad | unhappy | unhappy(0.60), sad(0.54), glad(0.54), foolish(0.53), miserable(0.52), dreadful(0.51)
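
The expressions above can be evaluated by summing signed, length-normalized word vectors and ranking the rest of the vocabulary by cosine similarity. A sketch under stated assumptions: the parser accepts only space-separated `word +/- word ...` strings, input words are excluded from the results (no input word ever appears in the table, which suggests this), and the toy vectors and helper names are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def parse_expression(expr):
    """Parse 'king - man + woman' into [(1.0, 'king'), (-1.0, 'man'), (1.0, 'woman')]."""
    tokens = expr.split()
    if len(tokens) % 2 == 0:
        raise ValueError(f"malformed expression: {expr!r}")
    terms = [(1.0, tokens[0])]
    # Remaining tokens alternate: operator, word, operator, word, ...
    for op, word in zip(tokens[1::2], tokens[2::2]):
        if op not in ("+", "-"):
            raise ValueError(f"unexpected operator: {op!r}")
        terms.append((1.0 if op == "+" else -1.0, word))
    return terms


def evaluate(expr, emb, topn=5):
    """Rank words by cosine similarity to the signed sum of unit input vectors."""
    terms = parse_expression(expr)
    dim = len(next(iter(emb.values())))
    query = [0.0] * dim
    for sign, word in terms:
        for i, x in enumerate(normalize(emb[word])):
            query[i] += sign * x
    query = normalize(query)
    inputs = {word for _, word in terms}
    scored = [
        (word, sum(q * x for q, x in zip(query, normalize(vec))))
        for word, vec in emb.items()
        if word not in inputs  # input words are never returned
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical 3-d toy vectors: axis 0 ~ royalty, axis 1 ~ gender.
TOY = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
}
```

`evaluate("king - man + woman", TOY)` ranks `queen` first; with real 300-d vectors the same call would produce a ranked list like the Top matches column.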

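Analogy Tests (23/26 = 88%)

A row counts as correct when the expected word appears anywhere in the top 5; scored on the top-1 match alone, 19/26 (73%) are correct. Two rows are also mis-ordered for the A:B :: C:? convention: king:man :: woman:? asks for man - king + woman, not king - man + woman, so queen is not its expected answer (the intended test is man:king :: woman:?). The same holds for prince:man :: woman:? (intended: man:prince :: woman:?). The returned neighbours (girl, boy, creature) suggest both queries were run as written, so two of the three top-5 misses likely reflect the test definitions rather than the embedding.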

Analogy | Expected | Got | Top 5
king:man :: woman:? | queen | girl | girl(0.69), man's(0.60), creature(0.60), woman's(0.56), boy(0.53)
king:queen :: man:? | woman | woman | woman(0.75), girl(0.66), creature(0.61), lady(0.59), man's(0.58)
prince:man :: woman:? | princess | girl | girl(0.65), man's(0.59), creature(0.56), woman's(0.53), thing(0.53)
man:woman :: boy:? | girl | girl | girl(0.82), baby(0.75), mother(0.72), she(0.69), aunt(0.69)
father:mother :: son:? | daughter | daughter | daughter(0.81), sister(0.74), wife(0.73), eldest(0.71), brother(0.70)
husband:wife :: brother:? | sister | son | son(0.81), father(0.77), daughter(0.74), nephew(0.73), sister(0.72)
he:she :: his:? | her | her | her(0.89), husband's(0.71), girl's(0.69), mother's(0.69), sister's(0.67)
big:bigger :: small:? | smaller | larger | larger(0.66), smaller(0.62), large(0.52), thicker(0.43), largest(0.43)
good:better :: bad:? | worse | worse | worse(0.69), easier(0.53), worst(0.48), rather(0.48), maybe(0.47)
slow:slower :: fast:? | faster | faster | faster(0.60), bigger(0.41), speed(0.39), run(0.37), fastest(0.37)
tall:taller :: short:? | shorter | shorter | shorter(0.53), long(0.49), longer(0.38), shortened(0.37), lengthened(0.37)
good:best :: bad:? | worst | worst | worst(0.59), better(0.47), worse(0.41), likely(0.39), newest(0.39)
big:biggest :: small:? | smallest | largest | largest(0.51), large(0.48), larger(0.41), considerable(0.40), smaller(0.39)
go:went :: come:? | came | came | came(0.89), coming(0.71), gone(0.70), saw(0.69), walked(0.69)
see:saw :: hear:? | heard | heard | heard(0.78), knew(0.68), came(0.68), spoke(0.64), listened(0.64)
run:ran :: swim:? | swam | swam | swam(0.65), leaped(0.55), sank(0.53), swimming(0.53), jumped(0.53)
eat:ate :: drink:? | drank | drank | drank(0.83), wine(0.65), brandy(0.62), drinking(0.60), tasted(0.59)
take:took :: give:? | gave | gave | gave(0.86), giving(0.65), drew(0.53), offered(0.52), came(0.51)
france:paris :: germany:? | berlin | berlin | berlin(0.68), munich(0.65), leipzig(0.62), vienna(0.61), bonn(0.58)
france:paris :: italy:? | rome | bologna | bologna(0.62), rome(0.61), turin(0.59), milan(0.57), vienna(0.55)
japan:tokyo :: china:? | beijing | shanghai | shanghai(0.57), beijing(0.56), taiwan(0.45), peking(0.45), university(0.45)
france:paris :: england:? | london | london | london(0.77), york(0.59), edinburgh(0.58), philadelphia(0.57), worcester(0.56)
france:french :: spain:? | spanish | spanish | spanish(0.79), english(0.66), italian(0.66), portuguese(0.63), german(0.61)
france:french :: germany:? | german | german | german(0.78), italian(0.65), english(0.61), dutch(0.59), russian(0.59)
car:cars :: dog:? | dogs | dogs | dogs(0.65), cats(0.54), pigs(0.47), animals(0.46), mongrel(0.46)
child:children :: man:? | men | men | men(0.74), people(0.60), women(0.59), who(0.55), woman(0.55)
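
The 23/26 headline counts a row as correct when the expected word appears anywhere in the top 5. Re-scoring the rows at k = 1 and k = 5 makes the difference explicit; the rows below are transcribed from the table, and `hits_at_k` is a hypothetical helper, not part of the evaluation code.

```python
# (expected answer, top-5 neighbours) for each row of the analogy table.
ROWS = [
    ("queen", "girl man's creature woman's boy"),
    ("woman", "woman girl creature lady man's"),
    ("princess", "girl man's creature woman's thing"),
    ("girl", "girl baby mother she aunt"),
    ("daughter", "daughter sister wife eldest brother"),
    ("sister", "son father daughter nephew sister"),
    ("her", "her husband's girl's mother's sister's"),
    ("smaller", "larger smaller large thicker largest"),
    ("worse", "worse easier worst rather maybe"),
    ("faster", "faster bigger speed run fastest"),
    ("shorter", "shorter long longer shortened lengthened"),
    ("worst", "worst better worse likely newest"),
    ("smallest", "largest large larger considerable smaller"),
    ("came", "came coming gone saw walked"),
    ("heard", "heard knew came spoke listened"),
    ("swam", "swam leaped sank swimming jumped"),
    ("drank", "drank wine brandy drinking tasted"),
    ("gave", "gave giving drew offered came"),
    ("berlin", "berlin munich leipzig vienna bonn"),
    ("rome", "bologna rome turin milan vienna"),
    ("beijing", "shanghai beijing taiwan peking university"),
    ("london", "london york edinburgh philadelphia worcester"),
    ("spanish", "spanish english italian portuguese german"),
    ("german", "german italian english dutch russian"),
    ("dogs", "dogs cats pigs animals mongrel"),
    ("men", "men people women who woman"),
]


def hits_at_k(rows, k):
    """Count rows whose expected word is among the first k neighbours."""
    return sum(expected in ranked.split()[:k] for expected, ranked in rows)
```

`hits_at_k(ROWS, 5)` returns 23 (88%) and `hits_at_k(ROWS, 1)` returns 19 (73%); the four rows that move between the two scores have the expected word at rank 2 or 5.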

Directional Consistency

How consistently the offset vectors of word pairs (e.g. queen - king, woman - man) point in the same direction (1.0 = identical direction, 0.0 = no more aligned than random vectors).

Direction | Consistency | Pairs | Example pairs
Gender (M→F) | 0.311 | 11 | king→queen, man→woman, boy→girl, father→mother, brother→sister, he→she
Tense (present→past) | 0.505 | 11 | go→went, run→ran, see→saw, come→came, eat→ate, take→took
Singular→Plural | 0.228 | 9 | car→cars, dog→dogs, cat→cats, house→houses, tree→trees, city→cities
Positive→Negative | 0.115 | 9 | happy→sad, good→bad, love→hate, beautiful→ugly, rich→poor, strong→weak
Country→Capital | 0.347 | 8 | france→paris, germany→berlin, italy→rome, japan→tokyo, spain→madrid, england→london
Country→Language | 0.540 | 8 | france→french, germany→german, spain→spanish, italy→italian, japan→japanese, china→chinese
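
The consistency score is not defined on this page. One common choice, assumed here, is the mean pairwise cosine similarity between the offset vectors (b - a) of all pairs in a direction set: identical offsets score 1.0, and unrelated offsets in a high-dimensional space score near 0.0. A sketch with hypothetical toy vectors; `direction_consistency` is an illustrative name and needs at least two pairs.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def direction_consistency(pairs, emb):
    """Mean pairwise cosine between offsets emb[b] - emb[a] (assumed metric)."""
    offsets = [[y - x for x, y in zip(emb[a], emb[b])] for a, b in pairs]
    sims = [cosine(u, v) for u, v in combinations(offsets, 2)]
    return sum(sims) / len(sims)


# Hypothetical toy vectors: both gender pairs share one offset, go->went is orthogonal to it.
TOY = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "go": [0.0, 0.0, 1.0],
    "went": [2.0, 0.0, 1.0],
}
```

Under this definition, a Positive→Negative score of 0.115 would mean the nine antonym offsets are only weakly aligned, while Country→Language (0.540) shares a fairly stable direction.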

Semantic Clusters

Category | Within-group similarity | Words | Members (up to 10 shown)
Colors | 0.631 | 11 | red, blue, green, yellow, black, white, purple, orange, brown, pink
Food | 0.531 | 12 | bread, cheese, meat, fish, rice, fruit, cake, soup, milk, butter
Countries | 0.503 | 12 | france, germany, england, spain, italy, japan, china, russia, india, brazil
Weather | 0.472 | 10 | rain, snow, wind, storm, sun, cloud, thunder, fog, frost, ice
Emotions | 0.470 | 12 | happy, sad, angry, afraid, surprised, love, hate, joy, fear, hope
Music | 0.462 | 10 | song, music, piano, guitar, drum, violin, orchestra, melody, rhythm, concert
Math | 0.452 | 10 | number, equation, formula, theorem, proof, algebra, geometry, calculus, function, variable
Animals | 0.409 | 13 | dog, cat, horse, fish, bird, wolf, bear, lion, tiger, elephant
Body parts | 0.356 | 12 | head, hand, foot, arm, leg, eye, ear, nose, mouth, heart
Professions | 0.345 | 10 | doctor, teacher, lawyer, engineer, scientist, artist, soldier, farmer, priest, judge
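
Within-group similarity is presumably the mean cosine similarity over all pairs of members; that definition is an assumption, as is skipping members missing from the vocabulary. With 10 to 13 words per group, each score averages 45 to 78 pairs. A minimal sketch; `within_group_similarity` is a hypothetical helper.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def within_group_similarity(words, emb):
    """Mean pairwise cosine among the members of one group (assumed metric).

    Returns (score, number_of_words_used); out-of-vocabulary members are skipped.
    """
    present = [w for w in words if w in emb]
    sims = [cosine(emb[a], emb[b]) for a, b in combinations(present, 2)]
    return sum(sims) / len(sims), len(present)
```

For example, with `red` and `blue` mapped to the same toy vector and `green` to an orthogonal one, the group `["red", "blue", "green"]` scores 1/3: one identical pair and two unrelated pairs.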

Inter-Group Similarity Matrix

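Diagonal = within-group similarity. Off-diagonal = between-group similarity. Between-group similarity should be symmetric, but ten cells below the diagonal read 0.00 where the mirrored cell above it is non-zero (e.g. row Colors/column Body parts 0.00 vs row Body parts/column Colors 0.19; row Professions/column Music 0.00 vs row Music/column Professions 0.13). Those cells appear to be missing values rather than measurements; the upper triangle is the complete half.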

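 | Animals | Body parts | Colors | Countries | Emotions | Food | Math | Music | Professions | Weather
Animals | 0.41 | 0.24 | 0.23 | 0.06 | 0.16 | 0.22 | -0.05 | 0.11 | 0.14 | 0.19
Body parts | 0.24 | 0.36 | 0.19 | -0.04 | 0.18 | 0.14 | -0.04 | 0.14 | 0.13 | 0.15
Colors | 0.23 | 0.00 | 0.63 | 0.04 | 0.07 | 0.21 | -0.01 | 0.10 | 0.02 | 0.26
Countries | 0.06 | 0.00 | 0.00 | 0.50 | 0.01 | 0.08 | -0.03 | 0.02 | 0.06 | 0.03
Emotions | 0.16 | 0.00 | 0.07 | 0.01 | 0.47 | 0.08 | -0.12 | 0.14 | 0.19 | 0.14
Food | 0.22 | 0.14 | 0.21 | 0.08 | 0.08 | 0.53 | -0.01 | 0.08 | 0.06 | 0.17
Math | -0.05 | -0.04 | -0.01 | -0.03 | -0.12 | -0.01 | 0.45 | -0.02 | -0.06 | -0.04
Music | 0.11 | 0.14 | 0.10 | 0.02 | 0.14 | 0.08 | 0.00 | 0.46 | 0.13 | 0.11
Professions | 0.14 | 0.13 | 0.02 | 0.06 | 0.19 | 0.00 | 0.00 | 0.00 | 0.34 | 0.03
Weather | 0.19 | 0.15 | 0.26 | 0.03 | 0.14 | 0.17 | 0.00 | 0.00 | 0.03 | 0.47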
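
A sketch of how such a matrix can be assembled, assuming each cell is a mean cosine similarity: diagonal cells average over pairs within one group, off-diagonal cells over every cross-group pair. Under that definition the matrix is symmetric by construction, which gives a cheap sanity check on exported values. The groups, vectors and helper names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_cosine(pairs, emb):
    sims = [cosine(emb[x], emb[y]) for x, y in pairs]
    return sum(sims) / len(sims)


def similarity_matrix(groups, emb):
    """{(row, col): mean cosine}; diagonal = within-group, off-diagonal = cross-group."""
    matrix = {}
    for r, words_r in groups.items():
        for c, words_c in groups.items():
            if r == c:
                pairs = combinations(words_r, 2)  # pairs inside one group
            else:
                pairs = [(x, y) for x in words_r for y in words_c]  # every cross pair
            matrix[(r, c)] = mean_cosine(pairs, emb)
    return matrix


def is_symmetric(matrix, tol=1e-9):
    """True if every cell matches its mirror across the diagonal."""
    return all(abs(v - matrix[(c, r)]) <= tol for (r, c), v in matrix.items())


# Hypothetical toy groups and vectors.
GROUPS = {"colors": ["red", "blue"], "animals": ["cat", "dog"]}
TOY = {
    "red": [1.0, 0.0, 0.2],
    "blue": [1.0, 0.1, 0.0],
    "cat": [0.0, 1.0, 0.3],
    "dog": [0.2, 1.0, 0.0],
}
```

Running `is_symmetric` over a full exported matrix would flag cells like the 0.00 entries below the diagonal whose mirrors are non-zero.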