The “South Semitic” sprachbund

In linguistics, a sprachbund (German for ‘language league’) is a group of languages that are not necessarily closely related but that nevertheless share many features as a result of proximity and language contact. A well-known example is the Balkan sprachbund, which includes Albanian, Greek, Romanian, and Bulgarian. Although these languages come from separate branches of the Indo-European family, a long history of contact among their speakers within a relatively small geographic region has led them to converge on a number of features, such as the way the future tense is formed, the loss of the infinitive, and shared vocabulary.

Some scholars have argued that within the Semitic language family there existed a South Semitic sprachbund, consisting of Arabic, Ṣayhadic (Sabaean, Qatabānian, etc.), the Modern South Arabian languages (Soqotri, Mehri, etc.), and the Ethiopic languages (Gǝ‘ǝz, Amharic, Gurage, etc.). Throughout most of the 20th century, it was assumed that the distinctive features these languages shared were inherited from a recent common ancestor. The prevailing view today, however, is that they are the result of extensive and prolonged interaction between their speakers.

Three features are usually highlighted: 1) the unconditional change of Proto-Semitic *p > f; 2) broken (internal) plurals; and 3) the L-stem form of the verb. I discuss each of these below.


The Proto-Semitic language is assumed to have had a *p consonant; however, within the South Semitic sprachbund, this consonant changed, or spirantized, to ‹f›. Thus, Hebrew ‹pēḥām› ‘coal’ corresponds to Arabic ‹faḥm›, Tigrinya ‹fäḥam›, Amharic ‹fǝm›, and Soqotri ‹fḥam›.

The *p > f shift occurs in other Semitic languages as well, but only under certain conditions. In Hebrew and Aramaic, for instance, ‹p› becomes ‹f› when it follows a vowel. Contrast Hebrew ‹pārastî› ‘I spread,’ where the ‹p› is word-initial, with ‹efrôs› ‘I will spread,’ where it follows a vowel. In the South Semitic sprachbund, by contrast, the change was unconditional: it affected every occurrence of Proto-Semitic *p regardless of its position in the word.
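The difference between a conditioned and an unconditioned sound change can be sketched in a few lines of Python. This is a deliberately simplified model of the two rules as stated above, not a phonological analyzer: the function names are hypothetical, words are treated as plain transliterations with one character per sound, the real Hebrew and Aramaic rule has further conditions (doubled consonants, for instance, do not spirantize), and ‹eprôs› is a hypothetical pre-spirantization form rather than an attested word.

```python
# Simplified model of *p > f. Assumes plain transliterations with one
# character per sound. Function names are hypothetical illustrations.
VOWELS = set("aeiouāēīōūâêîôûǝ")

def spirantize_conditional(word):
    """Hebrew/Aramaic-style rule (simplified): p becomes f only after a vowel."""
    out = []
    for i, ch in enumerate(word):
        # Look at the preceding sound in the input word.
        if ch == "p" and i > 0 and word[i - 1] in VOWELS:
            out.append("f")
        else:
            out.append(ch)
    return "".join(out)

def spirantize_unconditional(word):
    """South Semitic rule: every p becomes f, whatever its position."""
    return word.replace("p", "f")

print(spirantize_conditional("pārastî"))    # pārastî: word-initial p is untouched
print(spirantize_conditional("eprôs"))      # efrôs: p after a vowel becomes f
print(spirantize_unconditional("pārastî"))  # fārastî: schematic, every p becomes f
```

The last line produces a form no language actually has; it only shows that under the unconditional rule no position in the word protects *p.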

Incidentally, many Ethiopic languages regained ‹p› and also acquired an emphatic (i.e. ejective) counterpart, ‹ṗ›, but the words containing these two sounds are mostly borrowings from other languages. In the Gǝ‘ǝz script, the letters for ‹p› and ‹ṗ› are modified forms of the ‹t› and ‹ṣ› letters, respectively: compare ተ ‹tä› with ፐ ‹pä›, and ጸ ‹ṣä› with ጰ ‹ṗä›.


Across the Semitic languages, the most common method of pluralization is suffixation. In Akkadian, for example, ‹šarr-um› ‘king’ is pluralized by replacing the singular ending ‹-um› with the plural ‹-ū›, giving ‹šarr-ū› ‘kings.’ The members of the South Semitic sprachbund also use this type of pluralization; some Tigrinya nouns, for instance, add the plural suffix ‹-at›, so ‹säb› ‘person’ becomes ‹säb-at› ‘persons.’ This is known as the external plural.

Within the sprachbund, there is a second type of pluralization, called the broken or internal plural. Recall from my previous posts that Semitic morphology, that is, the formation of words and other grammatical units, is based on a root-pattern system. For example, the Gǝ‘ǝz root ‹ṣḥf› means ‘write.’ Inserting this root into the pattern |maC₁C₂aC₃| yields ‹maṣḥaf› ‘book.’ This word, however, is not pluralized by adding a suffix. Instead, an entirely different pattern, |maC₁āC₂ǝC₃t|, is applied to the root itself, yielding ‹maṣāḥǝft› ‘books.’ The name of this type of plural comes from the fact that the singular form of the noun is “broken” apart by the insertion of additional vowels and consonants.
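The root-pattern mechanism described above amounts to slot-filling: each root consonant drops into its numbered position in a template. The following is a minimal sketch under that assumption, not a morphological analyzer; the function name `apply_pattern` is hypothetical, the templates use this post's C₁/C₂/C₃ notation, and it ignores the sound changes real words often undergo.

```python
# Minimal root-and-pattern sketch. `apply_pattern` is a hypothetical helper;
# slots use the C₁, C₂, C₃ notation from the text.
SLOTS = ("C₁", "C₂", "C₃", "C₄")

def apply_pattern(root, pattern):
    """Substitute each root consonant into its numbered slot in the pattern."""
    if len(root) > len(SLOTS):
        raise ValueError("roots longer than four consonants are not handled")
    word = pattern
    for slot, consonant in zip(SLOTS, root):
        word = word.replace(slot, consonant)
    return word

root = ("ṣ", "ḥ", "f")  # Gǝ‘ǝz root 'write'
print(apply_pattern(root, "maC₁C₂aC₃"))    # maṣḥaf 'book' (singular)
print(apply_pattern(root, "maC₁āC₂ǝC₃t"))  # maṣāḥǝft 'books' (broken plural)
```

The same root yields both forms; only the template changes, which is exactly what distinguishes a broken plural from a suffixed one.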

Although residual traces of the broken plural exist in other Semitic languages, only the South Semitic sprachbund makes extensive use of it. Moreover, the sprachbund members not only use the broken plural but also share a number of identical pluralization patterns. Perhaps the most common is |’VCCV̄C| (where V stands for a short vowel and V̄ for a long vowel). Examples include the following:

  • Arabic ‹wazn› ‘weight’ > ‹’awzān›
  • Gǝ‘ǝz ‹faras› ‘horse’ > ‹’afrās›
  • Tigre ‹mǝdǝr› ‘land’ > ‹’amdār›
  • Ṣayhadic* ‹hgr› ‘town’ > ‹’hgr›
  • Ḥarsusi ‹ḥamθ› ‘lower belly’ > ‹eḥmōθ›
  • Shehri ‹ḥarf› ‘gold coin’ > ‹ɔḥrɔf›

That said, the situation in the Ethiopic languages has changed over the centuries. This language group is divided into two subgroups: North (Gǝ‘ǝz, Tigrinya, and Tigre) and South (Amharic, Gurage, Argobba, and so on). Only the northern group maintains extensive use of the broken plural. The Gurage languages have lost it almost entirely, and the few broken plurals found in Amharic were borrowed directly from Gǝ‘ǝz. The loss of this feature is most likely the result of a separate sprachbund formed with the surrounding Cushitic and Omotic languages, which generally have external plurals.


Like nouns, verbs are formed through the root-pattern system. Each Semitic language has a set of templates that make nuanced changes to the meaning of a verb root. In Ḥarsusi and Mehri, two Modern South Arabian languages, for example, applying the root ḳ-f-d to the basic verb pattern |C₁ǝC₂ōC₃| yields ‹ḳǝfōd› ‘to descend,’ while the causative pattern |aC₁C₂ōC₃| yields ‹aḳfōd› ‘to put down’ (i.e. ‘to cause to descend’). The latter is known as the C-stem (“C” for causative), and variants of it are found in all the Semitic languages.

In the South Semitic sprachbund, there is a unique pattern known as the L-stem, which lengthens the first vowel of the basic verb form: |C₁aC₂aC₃| > |C₁āC₂aC₃|. In Arabic, this pattern typically denotes the involvement of another person, often in a reciprocal fashion: ‹qatala› ‘he killed’ vs. ‹qātala› ‘he fought’ (i.e. ‘he killed another’); ‹kataba› ‘he wrote’ vs. ‹kātaba› ‘he corresponded’ (i.e. ‘he wrote to another’). There is also a variant of this pattern, |taC₁āC₂aC₃|, which denotes a reflexive or reciprocal meaning: ‹taqātal-ū› ‘they fought each other.’
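The L-stem derivation as described above, lengthening the first vowel of the basic form and optionally prefixing ta-, can be sketched in Python. This is a minimal illustration under simplifying assumptions: it operates on third-person perfect forms like the ones cited here, the helper names are hypothetical, and it ignores suffixes such as ‹-ū› as well as any sound changes a real language might apply.

```python
# Simplified L-stem derivation. Helper names are hypothetical; only the
# first short vowel is lengthened, as in |C₁aC₂aC₃| > |C₁āC₂aC₃|.
LENGTHEN = {"a": "ā", "i": "ī", "u": "ū"}

def to_l_stem(basic):
    """Lengthen the first short vowel of a basic-stem form."""
    for i, ch in enumerate(basic):
        if ch in LENGTHEN:
            return basic[:i] + LENGTHEN[ch] + basic[i + 1:]
    raise ValueError(f"no short vowel found in {basic!r}")

def to_tl_stem(basic):
    """Prefix ta- to the L-stem, giving the |taC₁āC₂aC₃| variant."""
    return "ta" + to_l_stem(basic)

for verb in ["qatala", "kataba"]:
    print(verb, to_l_stem(verb), to_tl_stem(verb))
# qatala qātala taqātala
# kataba kātaba takātaba
```

Because only the vowel changes, the root consonants and the final vowel pass through untouched, mirroring the |C₁aC₂aC₃| > |C₁āC₂aC₃| schema.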

In Gǝ‘ǝz, this pattern carries no special meaning: ‹bāraka› ‘he blessed’ and ‹māsana› ‘he perished.’ Furthermore, the |taC₁āC₂aC₃| pattern is simply its passive: ‹tabāraka› ‘he was blessed.’ However, some basic verbs take this L-stem passive to form a reciprocal: ‹ḳatala› ‘he killed’ > ‹taḳātal-u› ‘they fought/killed each other,’ identical in meaning and form to Arabic تقاتلوا ‹taqātal-ū›.

Based on the evidence currently available, the L-stem can be securely identified only in Arabic and Ethiopic. The Ṣayhadic scripts did not mark vowels, so the surviving texts cannot show whether Ṣayhadic had an L-stem, while the Modern South Arabian languages, if they ever had one, have merged it with another verb pattern. Nevertheless, the existence of the L-stem in both Arabic and Ethiopic, and its use in similar ways, points to the effects of the South Semitic sprachbund.


Originally, the features above were so convincing that scholars placed the members of this sprachbund into a single subfamily within Semitic, which they called the South Semitic branch. This was the case throughout most of the 20th century, when the classification looked something like this:

However, starting in the 1970s, scholars began to reanalyze their assumptions. Some argued that despite the similarities, there were still important differences within this “South Semitic branch,” including the following:

  1. A conjugation of the imperfective tense (used for both present and future) in Arabic and Ṣayhadic that closely resembles the Northwest Semitic forms, which differ from those of all other Semitic languages;
  2. Rules governing the definite article “the” that are identical in Arabic and the Northwest Semitic languages; and
  3. The formation of the tens (i.e. twenty, thirty, forty, etc.) with a noun plural suffix ‹-îm/-īn/-ūn›, as opposed to the general Semitic ‹-ā› found in Ethiopic and Akkadian.

These and several other important features together suggested a common Central Semitic subfamily. Since then, the traditional classification has been revised many times. The following is one current example, taken from Huehnergard and Rubin (2011):

In this tree, Arabic and Ṣayhadic are moved from South Semitic into a “Central Semitic” branch along with Aramaic and Canaanite, while Modern South Arabian and Ethiopic/Ethiopian each form a separate branch. Today, most scholars subscribe to some variation of this tree.

In any case, whichever classification is correct, it is undeniable that the Semitic languages of Arabia and East Africa interacted with and influenced one another over a long period. What is most remarkable is that evidence of those interactions can still be observed so many centuries later.

* The Ṣayhadic script did not mark vowels, so the existence of this particular pluralization pattern, in which the second vowel is long, is based on conjecture.


Huehnergard, J. (2005). “Features of Central Semitic.” In A. Gianto (ed.), Biblical and Oriental Essays in Memory of William L. Moran.

Huehnergard, J. and Rubin, A. (2011). “Phyla and Waves: Models of Classification of the Semitic Languages.” In S. Weninger (ed.), The Semitic Languages: An International Handbook.

Lipiński, E. (1997). Semitic Languages: Outline of a Comparative Grammar.

Ratcliffe, R. R. (1998). “Defining Morphological Isoglosses: The ‘Broken’ Plural and Semitic Subclassification.” Journal of Near Eastern Studies.

Simeone-Senelle, M.-C. (1997). “The Modern South Arabian Languages.” In R. Hetzron (ed.), The Semitic Languages.