The “South Semitic” sprachbund

In linguistics, a sprachbund (German for ‘language league’) is any group of languages which are not necessarily closely related but which nevertheless have many similar features as a result of proximity and language contact. A well-known example is the Balkan sprachbund, which comprises such languages as Albanian, Greek, Romanian, and Bulgarian. Although these languages come from separate branches of the Indo-European language family, the long history of contact between speakers within a relatively small geographic region has resulted in convergence on a number of features, such as the formation of the future tense, the loss of the infinitive, and common vocabulary.

Some scholars have argued that within the Semitic language family, there existed a South Semitic sprachbund, which consisted of Arabic, Ṣayhadic (Sabaean, Qatabānian, etc.), the Modern South Arabian languages (Soqotri, Mehri, etc.), and the Ethiopic languages (Gǝ‘ǝz, Amharic, Gurage, etc.). Throughout most of the 20th century, it was assumed that these languages shared their unique features because they descended from a recent common ancestor. However, the prevailing assumption today is that the features are the result of extensive and prolonged interaction between their respective speakers.

Usually, three features are highlighted: 1) the universal change of Proto-Semitic *p > f; 2) broken/internal plurals; and 3) the L-stem form of the verb. I will discuss these below.


The Proto-Semitic language is assumed to have had a *p consonant; however, within the South Semitic sprachbund, this consonant changed, or spirantized, to ‹f›. Thus, Hebrew ‹pēḥām› ‘coal’ corresponds to Arabic ‹faḥm›, Tigrinya ‹fäḥam›, Amharic ‹fǝm›, and Soqotri ‹fḥam›.

The *p > f shift does occur in other Semitic languages as well. In Hebrew and Aramaic when ‹p› follows a vowel, it becomes ‹f›. For example, contrast Hebrew ‹pārastî› ‘I spread’ with ‹efrôs› ‘I will spread.’ In the South Semitic sprachbund, by contrast, this change was unconditional; that is, it affected every occurrence of Proto-Semitic *p regardless of its position within the word.
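The difference between the two kinds of change can be sketched in a few lines of Python. This is only a toy model: the vowel inventory, the function names, and the input forms ‹eprôs› and ‹paḥm› (hypothetical pre-shift shapes of ‹efrôs› and ‹faḥm›) are assumptions made for illustration, and real Hebrew spirantization involves more conditions than "after a vowel."

```python
import unicodedata

# Assumed vowel inventory for the transliterations used here (illustrative only).
VOWELS = set("aeiouāēīōūâêîôûǝä")

def spirantize_after_vowel(word: str) -> str:
    """Conditional change: p becomes f only when a vowel immediately precedes it."""
    word = unicodedata.normalize("NFC", word)
    out = []
    for i, ch in enumerate(word):
        if ch == "p" and i > 0 and word[i - 1] in VOWELS:
            out.append("f")
        else:
            out.append(ch)
    return "".join(out)

def shift_everywhere(word: str) -> str:
    """Unconditional change: every p becomes f, whatever its position."""
    return unicodedata.normalize("NFC", word).replace("p", "f")

print(spirantize_after_vowel("pārastî"))  # word-initial p is left alone
print(spirantize_after_vowel("eprôs"))    # p after a vowel becomes f
print(shift_everywhere("paḥm"))           # hypothetical pre-shift form of Arabic faḥm
```

The conditional rule has to look at the neighbouring sound, while the unconditional rule is a plain substitution; that is the whole distinction the paragraph draws.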

Incidentally, many Ethiopic languages regained the ‹p› and additionally acquired an emphatic (i.e. ejective) form of the consonant ‹ṗ›, but the words in which these two sounds occur are mostly borrowings from other languages. The letters in the Gǝ‘ǝz script used to represent ‹p› and ‹ṗ› are modified from the ‹t› and the ‹ṣ› letters, respectively. For example, ተ ‹tä› versus ፐ ‹pä› and ጸ ‹ṣä› versus ጰ ‹ṗä›.


Across the Semitic languages, the most common method of pluralization is suffixation. For example, in Akkadian ‹šarr-um› ‘king’ is pluralized by dropping the singular ‹-um› and adding the plural ‹-ū› for ‹šarr-ū› ‘kings.’ The members of the South Semitic sprachbund also exhibit this form of pluralization. For example, some Tigrinya nouns add the plural suffix ‹-at›; thus ‹säb› ‘person’ > ‹säb-at› ‘persons.’ This is known as the external plural.

Within the sprachbund, there is a second type of pluralization called broken or internal plurals. Recall from my previous posts that Semitic morphology, which is the formation of words and other grammatical units, is based on a root-pattern system. For example, the Gǝ‘ǝz root ‹ṣḥf› means “write.” Applying this root to the pattern |maC₁C₂aC₃| resulted in ‹maṣḥaf›, which means ‘book.’ However, this word was not pluralized by the addition of a suffix. Instead, an entirely different pattern was applied to the root of the noun, specifically |maC₁āC₂ǝC₃t|, which resulted in ‹maṣāḥǝft› ‘books.’ The name for this type of plural comes from the fact that the singular form of the noun is “broken” apart by the insertion of additional vowels and consonants.
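The root-pattern mechanism just described can be illustrated with a short Python sketch. The function name `apply_pattern` and the hyphen-separated way of writing the root are my own conventions for the example, not a standard notation; the root and the two patterns are the Gǝ‘ǝz ones quoted above.

```python
import re

# Map the subscript digits used in the pattern notation to list positions.
SUBSCRIPTS = {"₁": 0, "₂": 1, "₃": 2, "₄": 3}

def apply_pattern(root: str, pattern: str) -> str:
    """Fill each C-slot (C₁, C₂, ...) of a pattern with the matching root consonant."""
    consonants = root.split("-")  # "ṣ-ḥ-f" -> ["ṣ", "ḥ", "f"]
    return re.sub(
        r"C([₁₂₃₄])",
        lambda m: consonants[SUBSCRIPTS[m.group(1)]],
        pattern,
    )

singular = apply_pattern("ṣ-ḥ-f", "maC₁C₂aC₃")    # 'book'
plural = apply_pattern("ṣ-ḥ-f", "maC₁āC₂ǝC₃t")    # 'books'
print(singular, plural)
```

Note that the plural is built from the root, not from the singular: nothing is appended to ‹maṣḥaf›; the same three consonants are simply poured into a different mould.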

Although there is residual evidence of the broken plural in other Semitic languages, only the South Semitic sprachbund exhibits its extensive use. Moreover, the sprachbund members not only utilize the broken plural but also share a number of identical pluralization patterns. Perhaps the most common such pattern is |’VCCV̄C| (where V stands for a short vowel and V̄ for a long vowel). Examples include the following:

  • Arabic ‹wazn› ‘weight’ > ‹’awzān›
  • Gǝ‘ǝz ‹faras› ‘horse’ > ‹’afrās›
  • Tigre ‹mǝdǝr› ‘land’ > ‹’amdār›
  • Ṣayhadic * ‹hgr› ‘town’ > ‹’hgr›
  • Ḥarsusi ‹ḥamθ› ‘lower belly’ > ‹eḥmōθ›
  • Shehri ‹ḥarf› ‘gold coin’ > ‹ɔḥrɔf›
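As a rough illustration of how the first three items in the list share one template, the sketch below strips the vowels from each singular to recover its consonants and then fills the pattern |’aC₁C₂āC₃|. The vowel list and function names are assumptions for the example, and naive vowel-stripping would not work for every noun (nor does this particular vowelling fit the Ḥarsusi and Shehri forms, which have their own vowels).

```python
# Assumed vowel inventory for these transliterations (illustrative only).
VOWELS = set("aeiouǝāēīōū")

def extract_root(noun: str) -> list[str]:
    """Strip the vowels from a singular noun, leaving its root consonants."""
    return [ch for ch in noun if ch not in VOWELS]

def broken_plural(noun: str) -> str:
    """Apply the shared |’aC₁C₂āC₃| plural pattern to a three-consonant noun."""
    c1, c2, c3 = extract_root(noun)
    return f"’a{c1}{c2}ā{c3}"

for language, noun in [("Arabic", "wazn"), ("Gǝ‘ǝz", "faras"), ("Tigre", "mǝdǝr")]:
    print(language, noun, ">", broken_plural(noun))
```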

Despite the preceding examples, the situation in the Ethiopic languages has changed over the centuries. This language group is divided into two subgroups, North (Gǝ‘ǝz, Tigrinya, and Tigre) and South (Amharic, Gurage, Argobba, and so on). Only the North group maintains extensive use of the broken plural. The Gurage languages have almost entirely lost it, and in Amharic what broken plurals exist have been directly borrowed from Gǝ‘ǝz. The loss of this feature is most likely a result of a separate sprachbund with the surrounding Cushitic and Omotic languages, which generally have external plurals.


Like the nouns above, verbs are also formed through the root-pattern system. The Semitic languages all have different templates that provide nuanced changes to the meaning of the verb root. For example, in Ḥarsusi and Mehri, two Modern South Arabian languages, applying the root ḳ-f-d to the basic verb pattern |C₁ǝC₂ōC₃| results in ‹ḳǝfōd› ‘to descend,’ while the causative pattern |aC₁C₂ōC₃| results in ‹aḳfōd› ‘to put down (i.e. to cause to descend).’ The latter is known as the C-stem (“C” for causative), and variations of it are exhibited throughout all the Semitic languages.

In the South Semitic sprachbund, there is a unique pattern known as the L-stem, which features the lengthening of the first vowel from the basic verb form: |C₁aC₂aC₃| > |C₁āC₂aC₃|. In Arabic, this pattern typically denotes the involvement of another person in some sort of reciprocal fashion: ‹qatala› ‘he killed’ vs ‹qātala› ‘he fought (i.e. he killed another)’; ‹kataba› ‘he wrote’ vs ‹kātaba› ‘he corresponded (i.e. he wrote to another).’ There is also a variant of this pattern |taC₁āC₂aC₃|, which denotes a reflexive or reciprocal meaning: ‹taqātal-ū› ‘they fought each other.’

In Gǝ‘ǝz, there is no special meaning to this pattern: ‹bāraka› ‘he blessed’ and ‹māsana› ‘he perished.’ Furthermore, the |taC₁āC₂aC₃| pattern is simply the passive form: ‹tabāraka› ‘he was blessed.’ However, some basic verb forms take on this L-stem passive to create a reciprocal: ‹ḳatala› ‘he killed’ > ‹taḳātal-u› ‘they fought/killed each other,’ identical in meaning and form to the Arabic تقاتلوا ‹taqātal-ū›.
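The formal side of the L-stem (lengthen the first vowel, optionally add ‹ta-›) is simple enough to sketch in Python. The function names and the short-to-long vowel table are assumptions for illustration; the verbs are the Arabic and Gǝ‘ǝz examples cited above, and the sketch says nothing about the differences in meaning between the two languages.

```python
# Short-to-long vowel pairs in the transliteration used here (an assumed, partial list).
LONG = {"a": "ā", "i": "ī", "u": "ū"}

def l_stem(basic: str) -> str:
    """Lengthen the first short vowel of a basic verb form: C₁aC₂aC₃ > C₁āC₂aC₃."""
    for i, ch in enumerate(basic):
        if ch in LONG:
            return basic[:i] + LONG[ch] + basic[i + 1:]
    return basic  # no short vowel found; return the form unchanged

def t_prefixed(stem: str) -> str:
    """Add the ta- prefix that yields the |taC₁āC₂aC₃| variant."""
    return "ta" + stem

print(l_stem("qatala"), l_stem("kataba"))  # Arabic examples from the text
print(t_prefixed("bāraka"))                # Gǝ‘ǝz passive from the text
```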

Based on currently available information, the L-stem can only be attributed to Arabic and Ethiopic. The Ṣayhadic scripts did not mark vowels, so it is unclear from the surviving texts if they had an L-stem, while the Modern South Arabian languages, if they ever had the L-stem at all, have merged it with another verb pattern. Nevertheless, the existence of the L-stem in both Arabic and Ethiopic and its use in similar ways points to the effects of the South Semitic sprachbund.


Originally the above features were so convincing to scholars that they placed the members of this sprachbund into one subfamily within Semitic and called it the South Semitic branch. This was the case throughout most of the 20th century, when the classification looked something like this:

However, starting in the 1970s, scholars began to reanalyze their assumptions. Some argued that despite the similarities, there were still important differences within this “South Semitic branch,” including the following:

  1. A conjugation pattern of the imperfective tense (representing both present and future) in Arabic and Ṣayhadic closely resembling the Northwest Semitic forms, which differ from all other varieties of Semitic languages;
  2. Grammatical rules governing the definite article “the” that were identical in Arabic and Northwest Semitic languages; and
  3. The formation of the tens (i.e. twenty, thirty, forty, etc.) based on a noun plural suffix ‹-îm/-īn/-ūn› as opposed to the general Semitic ‹-ā› found in Ethiopic and Akkadian.

These and several other important features together suggested a common Central Semitic subfamily. Since then, there have been many revisions of the traditional classification. The following is one current example taken from Huehnergard and Rubin (2011):

In this tree, Arabic and Ṣayhadic are moved from South Semitic into a “Central Semitic” branch along with Aramaic and Canaanite, while Modern South Arabian and Ethiopic/Ethiopian each get their own separate branches. Currently, the majority of scholars subscribe to some variation of this tree.

In any case, whether a revised classification or the traditional one is correct, it is undeniable that the Semitic languages of Arabia and East Africa interacted with and influenced each other over a significant period of time. What is most remarkable is that the evidence of those interactions can still be observed all of these centuries later.

* The Ṣayhadic script did not mark vowels, so the existence of this particular pluralization pattern, in which the second vowel is long, is based on conjecture.


The Semitic languages

The Semitic languages are part of a language family called Afro-Asiatic, which among others includes the Berber/Tamazight, Cushitic, and (ancient) Egyptian languages. The Semitic languages are assumed to have descended from a single source, which is called Proto-Semitic. There is no record of this language, but scholars have been able to piece together a lot of information about it from evidence in the daughter languages.


Proto-Semitic and its close relative Proto-Berber were most likely spoken somewhere in Northeastern Africa. Around 3500 BCE, driven by the desertification that would create the Sahara Desert, the speakers of Proto-Semitic migrated east into the Levant, where their presence led to the collapse of the indigenous cultures that existed there. It seems that the Semites didn’t emigrate all at once but rather in waves. Some of them ended up in northern Syria, some in Iraq, others in the Levant and the northern Arabian Peninsula, and still others in the southern Arabian Peninsula and across the Red Sea into East Africa.


Bronze head of an Akkadian ruler, probably Sargon the Great, c. 23rd – 22nd century BCE.

These migration patterns led to the divisions within Semitic. There are many competing theories regarding the classification of these divisions. The most common divides Semitic into East and West groups (Huehnergard and Rubin 2011). The East group, composed of Eblaite, Akkadian, and Babylonian, died out in the 8th century BCE. The West group is divided into three subgroups. The first is Central Semitic, which is further divided into Northwest Semitic (composed of Aramaic, Ugaritic, and Canaanite), Arabic, and Ṣayhadic. The second is Ethiopic, which is composed of the Semitic languages spoken in eastern Africa. The last subgroup is the Modern South Arabian languages, which are spoken in the southern Arabian Peninsula.
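For readers who find trees easier to follow than prose, here is the East/West classification just described written out as a nested Python dictionary, with a small helper that traces the path to any branch. The encoding and the helper `path_to` are my own illustrative choices; the sketch takes Arabic and Ṣayhadic to sit directly under Central Semitic, alongside Northwest Semitic.

```python
# The East/West classification described above, as a nested dict.
# Leaves are empty dicts; the layout is an illustrative encoding, not a standard one.
SEMITIC = {
    "East": {"Eblaite": {}, "Akkadian": {}, "Babylonian": {}},
    "West": {
        "Central": {
            "Northwest": {"Aramaic": {}, "Ugaritic": {}, "Canaanite": {}},
            "Arabic": {},
            "Ṣayhadic": {},
        },
        "Ethiopic": {},
        "Modern South Arabian": {},
    },
}

def path_to(name, tree=SEMITIC, trail=("Semitic",)):
    """Return the chain of branches leading to `name`, or None if it is absent."""
    for branch, subtree in tree.items():
        here = trail + (branch,)
        if branch == name:
            return list(here)
        found = path_to(name, subtree, here)
        if found:
            return found
    return None

print(" > ".join(path_to("Arabic")))
print(" > ".join(path_to("Ugaritic")))
```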

Other scholars propose theories that significantly deviate from this model. Lipiński (1997), for example, argues that there are four, not two, macro-divisions. According to him, the Semitic that was spoken in northern Syria developed into the North Semitic branch (composed of Ugaritic and Amorite), in Iraq into the East Semitic branch (Akkadian and Babylonian), in the Levant and northern Arabia into the West Semitic branch (e.g. Arabic, Aramaic, and Canaanite), and finally in southern Arabia and East Africa into the South Semitic branch (Ṣayhadic, Ethiopic, and Modern South Arabian). It should be noted that this is a highly idiosyncratic view that is not widely accepted.

Whichever is the correct division, the largest number of living Semitic languages can be found in East Africa, including Amharic, Gurage, Tigre, and Tigrinya. Outside of that region, the most common Semitic language is Arabic and its highly diverse spoken dialects. Additionally, there are Modern Hebrew; the Neo-Aramaic languages, like Assyro-Chaldean, Turoyo, and Neo-Mandaic; and the Modern South Arabian languages, like Soqotri, Mehri, and Shehri.


In order for a group of languages to constitute a “family,” they must share a large number of unique linguistic features that cannot be attributed to mere borrowings or simultaneous development through contact between speakers. The following is a sampling of the unique features that define Semitic languages.



Maimonides’ autograph draft of his legal code, Mishneh Torah (from the Cairo Genizah), in cursive Sephardic script (Egypt, c. 1180).

All Semitic languages have or had a series of “emphatic” consonants. Proto-Semitic had at least five: ‹ṭ, ḳ, ṱ, ṣ, ṣ́›. Only (standard) Arabic has maintained this series. The Canaanite languages, like Phoenician and Hebrew, had only three, having merged ‹ṱ› and ‹ṣ́› with ‹ṣ›. Ethiopic languages also merged these consonants, but many of them also developed new emphatics, such as ‹ṗ› and ‹č̣›.

The term “emphatic” is necessarily imprecise because these consonants are realized differently in the daughter languages. Originally, they were most likely ejective consonants. Only the Ethiopic and Modern South Arabian languages preserve this pronunciation today. In Arabic and most Neo-Aramaic languages, they are pharyngealized. In Maltese and Modern Hebrew, the emphatic consonants have been lost under the influence of European languages.


Every Semitic language has two genders, masculine and feminine. The masculine is usually the base form, while the feminine is indicated with a suffix.



Gospel of Luke in Ge‘ez, from the Church of Gännätä Maryam, c. 1500.

The feminine is marked by the suffix ‹-t›. Examples include Akkadian ‹šarr-at-› “queen,” Arabic ‹bint› “daughter,” Gǝ‘ǝz ‹barakat› “blessing,” and Hebrew ‹rē’šīṯ› “beginning.” Within the Afro-Asiatic family, this is not unique to Semitic languages. The Berber languages, for example, also mark the feminine with ‹t›, but there it is a circumfix (appearing at the beginning and end of the word). Thus, ‹amaziɣ› ‘Amazigh man’ is masculine, and ‹tamaziɣt› ‘Amazigh woman’ is feminine.

In a number of Central Semitic languages, like Arabic and Hebrew, this suffix was deleted in isolated words but reappeared if the word was part of a phrase. For example, in Arabic ‘writing,’ a feminine noun, is ‹kitāba›; however, ‘a boy’s writing’ is ‹kitābat walad›. Similarly, in Modern Hebrew these are ‹ktiva› and ‹ktivat yéled›.


Semitic languages characteristically divide the second person pronoun into masculine and feminine forms. Examples of the singular forms of “you,” respectively, include Akkadian ‹atta, atti›, Arabic ‹’anta, ’anti›, Gǝ‘ǝz ‹’ānta, ’ānti›, and Hebrew ‹’attā, ’at›. Separate forms also exist in the plural pronouns.

However, in some modern languages and dialects this distinction has been lost or reduced. In many Arabic dialects and in other languages like Harari, spoken in Ethiopia, the second person plural no longer distinguishes between genders. Others, such as Maltese and Tunisian Arabic, have lost the distinction in the singular as well.


The vast majority of Semitic lexicons are composed of abstract roots of three, or sometimes four, consonants. Words are formed by applying these roots to different patterns of vowels and consonants.


Folio from the “Blue Qur’ān.” Second half 9th–mid-10th century CE.

For example, in Arabic the root k-t-b denotes ‘write.’ By itself, it cannot be used in a sentence. However, applying it to the pattern C₁āC₂iC₃, which means ‘doer of [root],’ results in ‹kātib› ‘writer.’ Applying it to the pattern maC₁C₂aC₃, ‘place of [root],’ results in ‹maktab› ‘desk, office’ (literally, a place where one writes). Other words formed from this root include ‹maktūb› ‘letter’; ‹kitāba› ‘writing’; ‹kātaba› ‘he corresponded (with)’; and ‹istiktāb› ‘dictation.’

This system is very flexible, and it is possible to create new roots from existing words and even from foreign languages. For example, the root ’-m-r-k originates from “America” and means “Americanize.” Thus, applying it to an existing verb pattern for four-consonant roots, ‹taC₁aC₂C₃aC₄a›, results in ‹ta’amraka› “he became American.”
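To make the flexibility concrete, the sketch below runs the Arabic examples from this section through one small Python function, including the four-consonant root borrowed from “America.” The function name `derive`, the gloss labels used as dictionary keys, and the hyphenated root notation are assumptions for the example; the patterns themselves are the ones quoted in the text.

```python
import re

SUBSCRIPTS = {"₁": 0, "₂": 1, "₃": 2, "₄": 3}

# Patterns quoted in the text, keyed by a rough gloss of what they derive.
PATTERNS = {
    "doer": "C₁āC₂iC₃",
    "place": "maC₁C₂aC₃",
    "four-consonant verb": "taC₁aC₂C₃aC₄a",
}

def derive(root: str, gloss: str) -> str:
    """Fill the C-slots of the named pattern with the consonants of a root."""
    consonants = root.split("-")  # "k-t-b" -> ["k", "t", "b"]
    return re.sub(
        r"C([₁₂₃₄])",
        lambda m: consonants[SUBSCRIPTS[m.group(1)]],
        PATTERNS[gloss],
    )

print(derive("k-t-b", "doer"))                    # 'writer'
print(derive("k-t-b", "place"))                   # 'desk, office'
print(derive("’-m-r-k", "four-consonant verb"))   # 'he became American'
```

The same function handles three- and four-consonant roots without any special casing, which is roughly why the system absorbs loanwords so easily: once a word has been reduced to a consonantal root, every existing pattern is available to it.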

These are only a few of the features that distinguish Semitic languages. There are many others, such as a verb conjugation system originally centered around aspect rather than tense; object and possessive pronouns as suffixes; and the dual number in verbs, nouns, and adjectives. These subjects are for another day perhaps.

I leave you with a side-by-side comparison of hypothesized Proto-Semitic words and their attested forms in four daughter languages:

Gloss     Proto-Semitic  Akkadian   Arabic   Hebrew       Gǝ‘ǝz
FATHER    *’ab-          ab-        ’ab-     ’āḇ          ’ab
MOTHER    *’imm          umm-       ’umm-    ’imma        ǝmm
GOD       *’il(-āh-)-    il-        ’ilāh-   ’ēl(ōh)
HOUSE     *bayt          bīt-       bayt-    bayiṯ, bēṯ   bet
ROPE      *ḥabl-         ebl-       ḥabl-    ḥeḇel        ḥabl
PEACE     *s₁alām-       šalām-     salām-   šālōm        salām
WATER     *māy-          mū-        mā’-     mayim        māy
BLOOD     *dam-          dam-       dam-     dām          dam
EAR       *’uḏn-         azn-/uzn-  ’uḏn-    ’ōzen        ’ǝzn
EYE       *‘ayn-         īn-        ‘ayn-    ‘ayin, ‘ēn   ‘ayn
HAND      *yad-          id-        yad-     yāḏ          ’ǝd
TONGUE    *lis₁ān-       lišān-     lisān-   lāšōn        lǝssān
TOOTH     *s₁inn-        šinn-      sinn-    šēn          sǝnn
BULL/OX   *ṯawr-         šūr-       ṯawr-    šōr          sor
HORN      *ḳarn-         qarn-      qarn-    qeren        ḳarn


