Languages

Language Families

Explore all language families, their proposed homelands, and the migrations that spread them.

Homeland: Northeast Africa (most supported) or the Levant

Afroasiatic

The Afroasiatic language family (also called Afrasian or Hamito-Semitic) encompasses approximately 400 languages spoken by around 500 million people across North Africa, the Horn of Africa, the Sahel, West Africa, and the Middle East. Its principal branches are Semitic, Egyptian (extinct), Berber, Cushitic, Chadic, and Omotic. The family's deep time depth — estimates range from 10,000 to 18,000 years — makes it one of the oldest demonstrably related language families on Earth. The homeland debate remains unresolved. The Northeast African hypothesis, supported by researchers including Christopher Ehret and Igor Diakonoff, places the family's origin in or near the Horn of Africa or the eastern Sahara around 15,000–10,000 BCE, when those regions were wetter and more hospitable during the African Humid Period. The Levantine hypothesis, favoured by some Semitists, proposes an origin in the Near East, with the Afroasiatic family spreading into Africa via early agricultural dispersals. The distribution of the most basal branches (Omotic and Cushitic) in Ethiopia and the Horn of Africa lends support to an African origin. The Semitic branch achieved its remarkable spread through successive historical expansions: Akkadian became the lingua franca of the ancient Near East by the 3rd millennium BCE; Aramaic replaced it across much of the region by the 1st millennium BCE; and Arabic, carried by the Islamic conquests of the 7th–8th centuries CE, displaced most of the earlier Semitic languages across the Fertile Crescent, Egypt, and North Africa. Ancient Egyptian, attested for over 4,000 years from the 32nd century BCE to the 17th century CE in its Coptic form, represents an entirely separate and extinct branch. The Chadic branch, with over 150 languages including Hausa (the most widely spoken sub-Saharan African language by native speakers), is distributed across the Lake Chad basin. The Berber languages (collectively called Tamazight) are spoken by Amazigh populations across North Africa, where they represent a pre-Arabic stratum displaced but never fully replaced by the Arab expansion. The Cushitic languages — including Oromo and Somali, among Africa's most widely spoken languages — are distributed across the Horn of Africa and parts of East Africa.

Homeland: Taiwan (Out of Taiwan hypothesis)

Austronesian

The Austronesian language family is the world's most geographically dispersed, spanning from Madagascar in the west to Rapa Nui (Easter Island) in the east — a range of over 14,000 kilometres — and from Taiwan and Hawaii in the north to New Zealand in the south. It contains approximately 1,200–1,300 languages spoken by around 380 million people, and represents one of the most remarkable episodes of maritime dispersal in human prehistory. The Out of Taiwan hypothesis, now supported by a convergence of linguistic, genetic, and archaeological evidence, proposes that Austronesian languages originated among farming communities in Taiwan around 4000–3500 BCE. The first stage of dispersal, around 2000–1500 BCE, carried Austronesian speakers southward into the Philippines and the Indonesian archipelago. A subsequent wave pushed westward: ancestors of Malagasy speakers crossed the Indian Ocean to Madagascar around 500–1000 CE, a voyage of over 6,000 kilometres that ranks among the most extraordinary maritime migrations in history. The Polynesian expansion into Remote Oceania represents the final phase. Ancestral Polynesian culture (the Lapita culture, distinguished by a characteristic geometric pottery tradition) emerged in Island Melanesia around 1400–1000 BCE. From there, Polynesian voyagers progressively settled Tonga, Samoa, and the Marquesas, before radiating outward to Hawaii (c. 300–600 CE), New Zealand (c. 1280 CE), and Rapa Nui. Ancient DNA has confirmed that this expansion was demographically distinct from earlier Melanesian populations, with Polynesian populations showing minimal Melanesian admixture, suggesting a rapid "leap-frog" dispersal through already-inhabited islands. The Austronesian expansion is linguistically remarkable for its speed and homogeneity. Despite spanning 4,000 years of separation, many Austronesian languages retain cognate vocabularies and grammatical structures that attest to a common origin. The family includes the world's most widely spoken Austronesian language, Malay/Indonesian (with approximately 270 million speakers as a first or second language), which served as the trade lingua franca of island Southeast Asia for centuries before becoming the national language of Indonesia and Malaysia.

Homeland: Nigeria–Cameroon border region (Benue-Congo heartland)

Bantu

The Bantu languages form a subgroup of the Niger-Congo family comprising approximately 500 languages spoken by around 350 million people across the southern two-thirds of Africa. The term "Bantu" refers to the shared word for "people" (variants of *bantʊ) across the family, coined by Wilhelm Bleek in 1862. It is one of the most intensively studied language groups in historical linguistics because its expansion is both linguistically traceable and, unusually, genetically and archaeologically documented in high resolution. Proto-Bantu is reconstructed as having been spoken in or near the Nigeria-Cameroon border region, in what linguists call the Benue-Congo heartland, around 3000–2500 BCE. The earliest Bantu speakers were farmers growing yams and oil palms in the West African rainforest belt. As they moved southward and eastward, the expansion bifurcated: a western stream moved down the Atlantic coast toward the Congo Basin, while an eastern stream skirted the north of the equatorial rainforest and entered East Africa, where Bantu farmers encountered existing populations of Cushitic pastoralists and Khoisan hunter-gatherers. Ancient DNA from sites across sub-Saharan Africa has transformed understanding of this expansion. Studies of ancient remains from the Congo Basin, East Africa, and southern Africa consistently show that pre-Bantu hunter-gatherer populations (related to the western Rainforest Hunter-Gatherers and the southern African Khoisan) were largely replaced by Bantu-ancestry populations over the past 2,000 years, though with variable admixture. The timing of this replacement closely follows the linguistic and archaeological evidence for Bantu arrival in each region. The eastern Bantu expansion produced some of Africa's most widely spoken languages. Swahili, a coastal Bantu language enriched by centuries of Indian Ocean trade, became East Africa's principal trade language and is now spoken by approximately 200 million people. Zulu and Xhosa, southernmost Bantu languages with distinctive click consonants borrowed from contact with Khoisan languages, are among South Africa's official languages. The Bantu expansion's demographic impact on Africa rivals the Indo-European expansion into Europe and the Austronesian dispersal across the Pacific as one of the defining demographic events of the Holocene.

Homeland: South or Northwest India (Baluchistan/Sindh region proposed for early Proto-Dravidian)

Dravidian

The Dravidian language family comprises approximately 80 languages spoken by around 250 million people, predominantly in South India and Sri Lanka. The four major literary languages — Tamil, Telugu, Kannada, and Malayalam — together account for the bulk of speakers, with Tamil being the oldest attested and arguably the world's longest-continuously-spoken classical language still in everyday use, with inscriptions dating to around 300 BCE and a literary tradition stretching back to the ancient Sangam anthologies. The origins and ancient history of the Dravidian family are deeply contested. The most provocative hypothesis links Dravidian to the Indus Valley Civilisation (c. 3300–1300 BCE), the largest urban civilisation of the ancient world, which produced an undeciphered script and left no directly readable texts. The distribution of early Dravidian speakers, the presence of the Dravidian language Brahui in Baluchistan (in the heart of the former Indus zone), and lexical reconstructions suggesting Proto-Dravidian speakers were familiar with urban and agricultural practices all support a possible connection. If correct, Dravidian languages were the administrative language of the world's first planned cities. The hypothesis remains unproven because the Indus script cannot be read. Ancient DNA from South Asia has clarified the broader population history. Modern South Indians carry ancestry from two principal sources: the Ancient Ancestral South Indians (AASI), a deeply divergent population related to the earliest inhabitants of South and Southeast Asia; and a component related to Iranian farmers from the Zagros highlands, who arrived in South Asia around 4000–3000 BCE and whose descendants were likely early Dravidian speakers. A third layer — the steppe-derived ancestry linked to Indo-Aryan languages — arrived later, around 2000–1500 BCE, and predominantly affected northern India while leaving South India with lower steppe proportions, consistent with the geographic distribution of Dravidian languages. The displacement of Dravidian languages northward by Indo-Aryan languages following the steppe migrations has left Dravidian languages as an island in South Asia, surrounded by Indo-European languages to the north and Tibeto-Burman languages to the northeast. The isolated survival of Brahui in Baluchistan may represent a northern remnant of what was once a more widely distributed Dravidian-speaking zone.

Homeland: Pontic-Caspian steppe (Yamnaya hypothesis, most supported) or Anatolia (farming dispersal hypothesis)

Indo-European

The Indo-European language family is the world's largest by number of speakers, encompassing around 3.2 billion people across languages as diverse as English, Hindi, Russian, Persian, and Bengali. It originated in a reconstructed ancestor language known as Proto-Indo-European (PIE), which linguists have painstakingly assembled from systematic sound correspondences across its daughter languages — a process pioneered by Sir William Jones's observation in 1786 that Sanskrit, Greek, and Latin shared structural similarities too regular to be coincidental. The homeland question has been one of the most contested in historical linguistics and archaeogenetics. The dominant model, the Kurgan hypothesis advanced by Marija Gimbutas and refined by David Anthony, proposes that PIE was spoken by pastoral nomadic populations on the Pontic-Caspian steppe (modern Ukraine and southern Russia) around 4500–3500 BCE, associated with the Yamnaya archaeological culture. Ancient DNA evidence published from 2015 onward has provided strong support: massive westward migrations of Yamnaya-related individuals into Europe around 3000–2500 BCE correlate precisely with the introduction of Indo-European languages into Europe and the near-replacement of earlier Neolithic farmer genetics in many regions. The spread of Indo-European languages into South Asia was driven by a separate but related expansion. Ancient DNA from sites in the Swat Valley of Pakistan and the Gangetic plain shows a significant influx of steppe-ancestry individuals around 2000–1500 BCE, contemporaneous with the Rigvedic period and consistent with the introduction of Sanskrit. The competing Anatolian hypothesis, championed by Colin Renfrew, proposes that PIE spread earlier with the first farmers from Anatolia around 8000–6000 BCE — a model that fits poorly with the ancient DNA evidence as currently understood, though a hybrid model incorporating both farming dispersal and later steppe migration has gained some traction. The Anatolian branch (Hittite, Luwian, Palaic) is the earliest attested and diverged first from PIE. The subsequent spread of Indo-European languages followed multiple routes: westward into Europe (Celtic, Italic, Germanic, Balto-Slavic, Greek, Albanian); southward into Iran and South Asia (Indo-Iranian); and into Armenia and the Caucasus. European colonial expansion from the 15th century CE carried Indo-European languages — particularly English, Spanish, Portuguese, and French — to every inhabited continent, making the family globally dominant by the 21st century.

Homeland: Korean Peninsula / Northern Kyushu, Japan (Yayoi migration hypothesis)

Japonic

The Japonic language family comprises Japanese and the Ryukyuan languages spoken in the Okinawa and Amami island chains of Japan. Together they are spoken by approximately 130 million people, making Japonic one of the world's major language families despite being among the geographically most confined. The family's position within the broader linguistic landscape of East Asia remains debated: proposals for a genetic relationship with Korean (forming a Japano-Koreanic macro-family), or with the Altaic languages, have been advanced but remain controversial. The population history behind Japonic is now substantially clarified by ancient DNA. Two major ancestral layers are documented in the Japanese population. The Jōmon people, hunter-gatherers who inhabited the Japanese archipelago from at least 16,000 years ago and possibly earlier, spoke languages of unknown affiliation. From around 900 BCE onward, Yayoi migrants from continental East Asia — genomically similar to populations from the Korean Peninsula and the Liaodong region of northeastern China — introduced wet rice agriculture to Kyushu. Most researchers now believe that the Yayoi migrants carried the ancestral Japonic language, and that the subsequent replacement of Jōmon culture across most of Japan was accompanied by a language shift, though the Jōmon-descended Ainu people retained their non-Japonic language in Hokkaido. The Ryukyuan languages, spoken across the islands stretching from Kyushu to Taiwan, are the sister languages of Japanese within the Japonic family. They diverged from the Japanese main islands' ancestor language between approximately 1,500 and 2,000 years ago, as Japonic-speaking populations spread southward through the island chain. Some Ryukyuan varieties are mutually unintelligible with Japanese and with each other, attesting to considerable time depth. The Ryukyuan kingdoms maintained an independent cultural and linguistic identity until the Satsuma invasion of 1609, and the languages remain under demographic threat from Standard Japanese. Japanese is notable for its highly elaborate system of grammatical honorifics (keigo), which encodes social hierarchy in verb forms, nouns, and pronouns, reflecting the centuries of rigid social stratification in Japanese society. Its writing system uniquely combines Chinese-derived logographic kanji (used for content words) with two phonetic syllabaries — hiragana (for grammatical elements and native vocabulary) and katakana (for loanwords and technical terms) — alongside the Western Latin alphabet (rōmaji). This plurality of scripts in a single text reflects over 1,500 years of contact with Chinese civilisation.

Homeland: Southern Africa (current distribution largely reflects original range)

Khoisan

The Khoisan languages are spoken by the San (Bushmen) and Khoekhoe (Khoikhoi) peoples of southern Africa and by two isolated languages — Sandawe and Hadza — in Tanzania. The term "Khoisan" is a convenience grouping based on two shared features: the use of click consonants (produced by the tongue or lips creating negative pressure in the mouth) and association with pre-agricultural populations of southern and eastern Africa. However, most linguists today consider Khoisan to comprise several distinct language families with no demonstrated common ancestor, rather than a single genetic unit. Despite their relatively small number of speakers (estimated at 300,000–400,000 total), the Khoisan languages are of extraordinary scientific importance. Population genetic studies have consistently identified the southern African San as the most genetically divergent of all living human populations, bearing the earliest-diverging lineages in the human family tree. The divergence between San populations and all other modern humans may date to 250,000–100,000 years ago, deep in the Middle Stone Age. This extreme genetic antiquity suggests that the ancestors of the Khoisan have occupied southern Africa for an enormous span of human prehistory, making them the custodians of one of the oldest continuous cultural and linguistic traditions on Earth. The click consonant systems of the Khoisan languages are among the most complex phonological inventories documented in any human language. The Taa language (|Xam), now nearly extinct, contains up to 164 distinct phonemes — the largest documented phoneme inventory of any language — including five distinct click types, each with multiple phonemic variations. Clicks have been borrowed into neighbouring languages: Zulu and Xhosa, Bantu languages that absorbed Khoisan contact during the southward Bantu expansion, both contain click consonants derived from this contact. Hadza and Sandawe, the two Tanzanian languages sometimes grouped with Khoisan, appear to be language isolates unrelated to each other or to the southern African Khoisan families. Hadza speakers are particularly notable: genetically, Hadza people carry one of the most divergent non-African-derived lineages and represent a very ancient East African population. Their click-rich language may preserve features of the phonological systems of early modern humans in eastern Africa, though this hypothesis requires caution — click consonants can evolve independently.

Homeland: Eastern Eurasian Steppe, eastern Mongolia / Manchuria border region

Mongolic

The Mongolic language family comprises approximately 13 languages and dialects spoken by around 6–7 million people across Mongolia, Inner Mongolia (China), Russia (Buryatia and Kalmykia), and parts of Central Asia. Despite its relatively small number of speakers, Mongolic achieved extraordinary historical reach through the Mongol Empire of the 13th–14th centuries CE — the largest contiguous land empire in history — which temporarily spread the Mongolian language as a prestige administrative language from Korea to Persia. Proto-Mongolic is thought to have originated on the Eastern Eurasian steppe, in the region of modern eastern Mongolia and the Manchuria-Mongolia border area. Ancient DNA from the Mongolian steppe confirms that the populations associated with Mongolic cultural traditions had a primarily East Asian (or Eastern Eurasian) genetic profile, distinct from the more mixed East-West Eurasian profile of earlier steppe populations such as the Xiongnu. The Mongols and their Turkic-speaking predecessors on the steppe were genetically and linguistically distinct, despite the cultural similarities imposed by a shared pastoral nomadic lifestyle. The Mongol conquests of the 13th century — Genghis Khan's unification of the steppe tribes and his descendants' campaigns across Eurasia — were among the most demographically destructive events of the medieval period. Ancient DNA from post-conquest Central Asia, Iran, and China shows increased East Asian ancestry consistent with Mongolian admixture in conquered populations. The short-lived Pax Mongolica also created the conditions for massive cultural and biological exchange across Eurasia, potentially accelerating the spread of plague (Yersinia pestis) that caused the Black Death. The Kalmyks represent a western extension of Mongolic: descendants of the Oirat Mongols who migrated to the lower Volga steppe in the 17th century, they maintain a Mongolic language and Tibetan Buddhist culture as a minority population in the Russian Federation. Buryat, spoken in Siberian Russia around Lake Baikal, is the most northern Mongolic variety and preserves archaic features. The relationship of Mongolic to Turkic and Tungusic (forming the proposed Altaic macro-family) remains debated and is rejected by most mainstream historical linguists.

Homeland: West Africa, Nigeria–Cameroon border region (broad) or the Niger Delta

Niger-Congo

The Niger-Congo language family is Africa's largest language family by number of languages (approximately 1,500–2,000) and by number of speakers (around 700 million), and one of the world's largest language families overall. It encompasses most of the languages of sub-Saharan Africa, including the Bantu languages (spoken across the southern two-thirds of the continent), the Volta-Niger languages of West Africa (including Yoruba and Igbo), the Atlantic languages (including Wolof and Fula), and many others. The family's deepest relationships remain contested. Some linguists treat Niger-Congo as a valid genetic unit; others, most notably John Williamson, have argued that the supposed family is a convenience grouping of African languages that share some features but may not all descend from a single ancestor. The Bantu subgroup — which alone contains around 500 languages and is spoken by approximately 350 million people across Central, Eastern, and Southern Africa — is uncontroversially a coherent genetic unit, with a well-reconstructed common ancestor (Proto-Bantu) and a clear homeland in the Nigeria-Cameroon border area. The Bantu expansion is one of the most significant demographic and linguistic events of the past 5,000 years. Beginning around 3000 BCE, Bantu-speaking farmers — equipped with iron tools, cattle, and cultivated sorghum and millet — spread southward and eastward from the Nigeria-Cameroon border region. Ancient DNA from East and Southern Africa confirms that this expansion involved substantial population movement: pre-Bantu hunter-gatherer populations (related to the Khoisan and forest-dwelling groups) were largely replaced or absorbed. The expansion reached southern Africa by approximately 300–500 CE and fundamentally transformed the genetic, cultural, and linguistic landscape of the continent's southern half. The Fula (Fulani) people, speakers of a Niger-Congo language (Fula belongs to the Atlantic branch), represent a distinct case: a predominantly pastoral people who spread across the Sahel from West Africa to East Africa over the past millennium, carrying their language as a trade and pastoral network language rather than as the language of farming colonisers. Yoruba and Igbo, spoken by approximately 40 million and 30 million people respectively in Nigeria, are among Africa's most numerically significant languages and have been carried to the Caribbean and Americas through the transatlantic slave trade.

Homeland: Central Sudan / Lake Chad area or the Nile Valley (Upper Nile hypothesis)

Nilo-Saharan

Nilo-Saharan is a proposed language family of approximately 200 languages spoken by roughly 50–60 million people across a broad band of northeastern and north-central Africa, from Mali and Niger in the west to Tanzania and Kenya in the east, with concentrations in Sudan, South Sudan, Chad, Uganda, and Ethiopia. The family's internal coherence is among the most contested in African linguistics: while scholars broadly agree on a core grouping, the inclusion of some branches (such as Songhay) remains disputed, and the family may prove to be a genealogical unit, an areal grouping, or several unrelated families. The most securely established subgroup within Nilo-Saharan is the Nilotic branch, which includes Dinka, Nuer, Maasai, Luo, and Acholi. The Nilotes are associated with a cattle-herding and semi-pastoral way of life and have been historically dominant across the upper Nile basin and the East African Rift Valley. Ancient DNA from East African pastoral sites dated to around 3000–1000 BCE shows a population with substantial ancestry from the middle Nile corridor, consistent with a southward expansion of Nilotic or related populations into eastern Africa. The Maasai, one of the most recognisable pastoral peoples of East Africa, speak an Eastern Nilotic language (Maa) and genetically derive from a mixture of Nilo-Saharan-related pastoralists and pre-existing East African populations. Their expansion southward into the East African Rift Valley occurred within approximately the past 1,000 years, displacing or absorbing earlier agropastoral populations. The Songhay languages of the Niger Bend (spoken in Mali and Niger, including Zarma) have a complex history: the Songhay Empire (c. 1464–1591 CE) was one of the largest pre-colonial African states, and its language spread as a trade and administrative medium across a wide region. The Nubian languages of Sudan and southern Egypt represent an ancient stratum of Nilo-Saharan, possibly related to the ancient Meroitic language of the kingdom of Kush and Meroe, though Meroitic itself remains only partially deciphered. The relationship between the modern Nubian languages and the ancient populations of the Nile corridor — who interacted directly with Pharaonic Egypt for millennia — is a subject of active research combining linguistics, archaeology, and ancient genomics.

Homeland: Levant or Northeast Africa (Northern Horn)

Semitic

The Semitic languages are a branch of the Afroasiatic family, comprising around 70–80 languages spoken by approximately 330–400 million people. They include Arabic, the world's most widely spoken Semitic language and an official language of 22 countries; Hebrew, uniquely revived as a spoken vernacular in the 19th–20th centuries after centuries as a liturgical language; and Amharic, the official language of Ethiopia and the world's second most widely spoken Semitic language by native speakers. The reconstructed homeland of Proto-Semitic remains debated. The Levantine hypothesis, supported by the distribution of early Semitic speakers in the ancient Near East (Akkadian in Mesopotamia, Canaanite-Phoenician in the Levant, Ugaritic on the Syrian coast), places its origin in or near the Fertile Crescent. The African hypothesis, supported by the presence of the most diverse and archaic Semitic languages in Ethiopia and the Horn of Africa (South Semitic subgroup), proposes a Northeastern African origin with subsequent spread into Asia. The Semitic languages have been uniquely important in human history through their association with three of the world's major religions. Hebrew is the language of the Torah and the ancient Israelite religious and literary tradition. Aramaic was the spoken language of Jesus and the lingua franca of the ancient Near East for over a millennium; it survives today in isolated communities in Syria, Iraq, and the diaspora. Arabic is the language of the Quran and became, through the Islamic conquests of the 7th–8th centuries CE, the prestige language of an empire stretching from Spain to Central Asia and a vehicle for the transmission of ancient Greek philosophy, medicine, and mathematics to the medieval world. The Semitic writing systems hold special significance in the history of literacy. The Proto-Sinaitic script (c. 1900–1800 BCE), developed by Semitic-speaking workers in Egyptian turquoise mines, was the ancestor of the Phoenician alphabet — from which virtually all alphabets in use today, including Greek, Latin, Arabic, Hebrew, and Ethiopic, descend. This single invention, originating in a Semitic-language context, transformed literacy from an elite pictographic technology to a widely learnable phonetic system.

Homeland: Yellow River basin, northern China (Sinitic) or the Tibetan Plateau / Yunnan (Tibeto-Burman)

Sino-Tibetan

The Sino-Tibetan language family is the world's second largest by number of speakers, with over 1.4 billion speakers — the vast majority of them speakers of Chinese languages, particularly Mandarin. The family divides into two principal branches: the Sinitic languages (the Chinese languages, including Mandarin, Cantonese, Wu, Min, and Hakka) and the Tibeto-Burman languages, a highly diverse group encompassing Tibetan, Burmese, and hundreds of smaller languages across the Himalayan region and mainland Southeast Asia. The family's origin and internal structure remain actively debated. Phylogenetic analyses suggest that Sino-Tibetan languages diverged from a common ancestor (Proto-Sino-Tibetan) around 7,000–6,000 years ago, associated with early millet-farming populations in the Yellow River basin of northern China. The Bayesian phylogenetic study by Zhang et al. (2019) placed the family's origin in the Yellow River region, consistent with the hypothesis that the spread of Sino-Tibetan languages was driven by the agricultural expansion of millet farmers who subsequently diversified as they moved southward and westward across China's highly varied terrain. Ancient DNA evidence supports a significant demographic expansion from northern Chinese farming populations into central and southern China, Southeast Asia, and eventually the Tibetan Plateau. The peopling of Tibet involved a distinct genetic and linguistic event: Tibetans carry a high frequency of a EPAS1 gene variant associated with high-altitude adaptation, derived from introgression from archaic Denisovan-related populations, suggesting that the earliest human inhabitants of the Tibetan Plateau were not the ancestors of Tibetan speakers but were absorbed and partially replaced by incoming Sino-Tibetan-speaking farmers. The Chinese languages, despite their enormous internal diversity in phonology (Cantonese has up to nine tones; Mandarin four), are unified by a shared logographic writing system that has served as the medium of elite literacy and administration in China for over three millennia. This writing system preserves enormous amounts of historical linguistic information — ancient pronunciation can be reconstructed from rhyme dictionaries and classical poetry — and served as the vehicle for Chinese cultural and technological transmission across East and Southeast Asia.

Homeland: Altai Mountains region / southern Siberia–Mongolia border

Turkic

The Turkic language family comprises approximately 35–40 languages spoken by around 200 million people across a vast arc from Turkey and the Balkans through Central Asia to Siberia and northwestern China. The family is notable for its extraordinary geographic range, its high degree of mutual intelligibility across many of its members (particularly the Oghuz branch including Turkish, Azerbaijani, and Turkmen), and its association with some of the most consequential steppe empires of the 1st–2nd millennia CE. Proto-Turkic is reconstructed as having been spoken in or near the Altai Mountains region of southern Siberia and Mongolia, probably in the first few centuries CE or slightly earlier. Ancient DNA from the Eastern Eurasian steppe shows that the peoples associated with early Turkic cultural expansion had a genetic profile mixing East Asian (Mongolian-related) ancestry with smaller components of earlier steppe populations (Saka and Xiongnu-related). The rapid expansion of Turkic languages across Eurasia from the 6th century CE onward was driven primarily by the political and military dominance of Turkic-speaking confederacies — the Göktürks, Uyghur Khaganate, Khazars, Pechenegs, Cumans, and eventually the Seljuk and Ottoman Empires. The Seljuk and Ottoman expansions carried the Oghuz branch of Turkic into Anatolia, Iran, and Azerbaijan from the 11th century CE onward. In Anatolia, Turkification was a linguistic replacement of a predominantly Greek and Armenian-speaking region, driven by elite takeover, urbanisation, and gradual population movement over several centuries rather than wholesale replacement — ancient DNA from Ottoman-period Anatolia confirms substantial continuity with pre-Turkic Byzantine-era populations. The Ottoman Empire made Turkish the language of administration, law, and prestige across a vast territory from Hungary to Iraq for five centuries. The Siberian Turkic languages — including Sakha (Yakut), spoken as far north as the Arctic Circle, and Tuvan — document a remarkable northward expansion across some of the world's most extreme environments. Sakha speakers arrived in the Lena River basin around 1000–1300 CE and rapidly adapted to subarctic conditions, developing a distinctive pastoral and horse-herding culture. Their genetic profile combines Turkic steppe ancestry with substantial admixture from Siberian forager populations absorbed during this northward expansion.

Homeland: Ural Mountains region (Siberia–Europe boundary), possibly Western Siberia

Uralic

The Uralic language family comprises approximately 40 languages spoken by around 25 million people, primarily in northern Europe and Siberia. Its two largest languages — Finnish and Hungarian — are spoken in Europe but are conspicuously unrelated to the surrounding Indo-European languages, a fact that puzzled European scholars for centuries before the demonstration of a Uralic family relationship in the 18th century. Other Uralic languages include Estonian, the Sami languages of northern Scandinavia, and a range of smaller languages spoken across the Ural Mountains and western Siberia. Proto-Uralic is reconstructed as having been spoken in the vicinity of the Ural Mountains — the geographical boundary between Europe and Asia — around 7,000–4,000 BCE, making it roughly contemporary with Proto-Indo-European. The early Uralic speakers were likely hunter-gatherers or mixed forager-herder communities adapted to the boreal forest and forest-steppe environments of the Ural region. The reconstructed vocabulary of Proto-Uralic includes terms for fishing, forest-dwelling, and cold-climate adaptation, consistent with this environment. Hungarian (Magyar) represents the most dramatic example of linguistic long-distance migration in European history. The Magyars were a nomadic people of the Eurasian steppe who spoke a Finno-Ugric language (distantly related to Finnish and Estonian within Uralic). Following a migration westward from the Ural region that brought them successively through the Pontic steppe, they settled the Carpathian Basin around 895 CE — an intrusion of a Uralic-speaking steppe people into the heart of Slavic and Romance-speaking Europe. Ancient DNA from Magyar burials of the Conquest Period shows substantial East Asian (steppe) ancestry in the elite burials, confirming the migration, while modern Hungarians show much lower East Asian proportions, indicating later admixture with surrounding European populations. The Sami languages, spoken by the Sámi people of Lapland across northern Norway, Sweden, Finland, and Russia's Kola Peninsula, represent a distinct branch of Uralic adapted to subarctic conditions. The Sámi are genetically and historically associated with the pre-farming populations of northern Europe: ancient DNA from Scandinavian Mesolithic hunter-gatherers shows genetic affinity with modern Sámi, suggesting that the Sámi preserve, albeit with later admixture, a genetic signal from Europe's earliest post-glacial inhabitants. Nenets and other Samoyedic languages of Siberia, spoken by reindeer herders of the Arctic, represent the most linguistically archaic branch of Uralic and are distributed closest to the reconstructed Proto-Uralic homeland.