On the etymology of شيطان ‘Satan’ in Arabic


An illustration of Archangel Michael defeating the Devil from an 18th century Ethiopian psalter. Source.

The common assumption about the Arabic word for Satan شَيْطان ‹šayṭān› is that it ultimately comes from the Hebrew שָׂטָן ‹śāṭān›. Although the difference between Arabic ش ‹š› and Hebrew שׂ ‹ś› can be explained as a product of regular sound change, the Arabic form nevertheless presents a conundrum because the same ‹ā› vowel in the Hebrew form is realized as two separate forms: ‹ay› and ‹ā›. Unfortunately, there is no rule of regular vowel correspondence that would explain this discrepancy, so we must look to other factors.

It has been noted that very early Arabic orthography had a tradition of indicating the word-medial ‹ā› with a “tooth” ـىـ (essentially a dotless yā’ ـيـ). However, the theory goes, this tradition was eventually lost, and when dots were introduced into the Arabic script, the “tooth” was reinterpreted as the letter yā’. This would explain why in Arabic Abraham is pronounced, uniquely out of all the Semitic languages, as ‹’ibrahīm›: the long a sound was misinterpreted as a long i sound because of the presence of the reanalyzed “tooth.” However, scholars such as Donner (2008:37-38) have cast doubt on this theory.

Whatever its truth, it does not adequately explain the form ‹šayṭān› because it cannot account for why, although both long vowels occur word-medially, only one of them underwent reanalysis. Kropp (2007) suggests that we should look to the southern branch of Semitic, specifically Ge‘ez, as the (at least partial) source of the Arabic form. The Gospels were first translated into Ge‘ez from Greek in the fourth century CE, and in these texts Satan was transcribed as ሠይጣን ‹šäyṭan› (and later ሰይጣን ‹säyṭan›). Kropp does not satisfactorily explain why the first vowel diphthongized to ‹ay›, but he cites one source that suggests it was a pejorative form. In any case, the Ge‘ez phonological explanation is secondary to the Arabic etymology, and it is sufficient to say that this was the Ge‘ez form and that Arabs, being in close physical and cultural contact with the Semitic-speaking populations of East Africa, were aware of this.


An Islamic depiction of the Devil by Mehmed ibn Emīr Hasan al-Su’ūdī, 1582. Source.

At the same time, Arabic had an unrelated, native word شطن ‹šaṭan› (from root ‹šṭn›), which meant ‘rope.’ By metaphorical extension, it also had a secondary meaning of ‘snake’ in the form of ‹šayṭān›. It should be noted that the root pattern for this word, C1ayC2āC3, is a regular one in Arabic; e.g., compare بيداء ‹baydā› ‘desert’ to بدء ‹bad’› ‘beginning.’ ‹šayṭān› was not an uncommon word and was even attested as a tribal name.

With the advent of Islam, the concept of Satan, the prime adversary of God, was for the first time widely introduced into the Arabic language. Based on the influence of Ge‘ez where ሠይጣን ‹šäyṭan› as Satan already existed and because of its Arabic meaning as ‘snake,’ Kropp argues that شيطان ‹šayṭān› was a convenient candidate for the name of this entity, and the original meaning eventually gave way to the new one.

There is further evidence that Ge‘ez was essential to the formation of this word in the Qur’ānic phrase الشيطان الرجيم ‹l-šayṭān l-rajīm› ‘stoned Satan’ — i.e. ‘stoned’ in the sense of ‘pelted with stones’. The root of ‹rajīm› — ‹rjm› — is common throughout the Semitic languages, and it exhibits some semantic diversity. The sense of ‘stone (verb)’ is present in Arabic, Hebrew, and Aramaic, for example, but in Akkadian it meant ‘call, sue in court’ and in Ge‘ez ‘curse.’ Indeed the Ge‘ez Bible refers to the snake in the Garden of Eden as ርግምት ‹rǝgǝmt› ‘cursed,’ and there is a reconstructed phrase ሠይጣነ ረጊሞ ‹šäyṭanä rägimo› ‘by cursing Satan.’ Thus the cultural exchange between the Arabian Peninsula and East Africa may have facilitated the calquing of the phrase ‹l-šayṭān l-rajīm› from Ge‘ez into Arabic.


“The Garšūnī language”


My involvement in cataloging Syriac and Arabic manuscripts over the last few years has impressed upon me how often and actively Syriac Orthodox and Chaldean scribes (and presumably, readers) used Garšūnī: it is anything but an isolated occurrence in these collections. This brings to the fore questions of how these scribes and readers thought about Garšūnī. Did they consider it simply a writing system, a certain kind of Arabic, or something else? At least a few specific references to “Garšūnī” in colophons may help us answer them. Scribes sometimes make reference to their transcriptions from Arabic script into Syriac script, and elsewhere a scribe mentions translation “from Garšūnī into Syriac” (CFMM 256, p. 344; after another text in the same manuscript, p. 349, we have in Arabic script “…who transcribed and copied [naqala wa-kataba] from Arabic into Garšūnī”). Such statements show that scribes certainly considered Arabic and Garšūnī to be distinct.

The development of verbal negation in Arabic


Different versions of the ligature lām-alif لا, meaning “no,” in the Kufic calligraphic style. Source.

Standard Arabic, both Modern and Classical/Qur’ānic forms, is well-known for having very complex synthetic grammar, consisting of a noun case system, verbal moods indicated by a suffix vowel, a passive form indicated by internal vowel alterations, and so on. In modern spoken Arabic, on the other hand, the grammar is in many instances simpler and less dependent on these types of grammatical constructions.

To illustrate the difference between Standard and modern spoken Arabic, we can look to how they indicate the negative ‘not.’ Standard Arabic has five ways to negate a verb (‹lays-›, ‹mā›, ‹lā›, ‹lam›, and ‹lan›), each with its own grammatical rules. Modern spoken Arabic, on the other hand, has only one, ‹mā›, although it should be noted that the rules governing it vary from dialect to dialect.

Below I will provide a brief description of the grammar of verbal negation in Standard Arabic followed by the development and grammar of ‹mā› in the spoken dialects.


‹lays-› ‘to not be’

Arabic is a zero-copula language, i.e., there is no explicit signal to indicate the relationship between the subject of a sentence and its predicate. In plain language, this means that the verb “to be” is not used in the present tense. Because there is no verb, sentences with a zero copula are referred to as nominal sentences (as opposed to verbal sentences). For example, ‘the elephant is big’ is literally ‘the elephant big’:

l-fīl-u kabīr-∅-un
DEF-elephant-NOM big-M-NOM.INDEF

the elephant is big

Note that both the subject and predicate are in the nominative case (indicated by the definite and indefinite suffixes ‹-u› and ‹-un›).

To negate a nominal sentence, a special verb ‹lays-›, which means ‘to not be,’ is inserted before the predicate. ‹lays› is an irregular and defective verb: it exists only in the imperfective aspect (essentially the non-past tense) but is conjugated with perfective markers. Additionally, it forces the predicate to take on accusative case (marked below by ‹-an›):

l-fīl-u lays-a kabīr-∅-an
DEF-elephant-NOM not.be-3.M.SG big-M-ACC.INDEF

the elephant is not big

‹mā› + perfective

The negative marker ‹mā› is used only to negate verbs in the perfective aspect (for purposes of this post, perfective aspect is basically the past tense). In comparison to the other negative markers, the rules governing its use are very simple, requiring only that it be placed before the verb and causing no other alterations:

’akal-a l-fīl-u
eat.PFV-3.M.SG DEF-elephant-NOM

the elephant ate

mā ’akal-a l-fīl-u
not.PFV eat.PFV-3.M.SG DEF-elephant-NOM

the elephant did not eat

‹lā› + imperfective indicative

In Standard Arabic, the negative marker ‹lā› holds many functions. In addition to being the word for ‘no’ and indicating exception (e.g. ‹lā ’ilāh-a ’illā llāh-u› ‘there is no god but God’), it also negates a verb in the imperfective aspect in the indicative mood (essentially, the non-past tense). Like ‹mā›, it does not alter the verb:

ya-’kul-u l-fīl-u
3.M.SG-eat.IMPV-IND DEF-elephant-NOM

the elephant eats/is eating

lā ya-’kul-u l-fīl-u
not.IMPV 3.M.SG-eat.IMPV-IND DEF-elephant-NOM

the elephant does not eat/is not eating

‹lam› + imperfective jussive

We have already established that ‹mā› negates verbs in the perfective aspect/past tense. There is a second way to negate such verbs, with ‹lam›, which in modern usage is far more common than simple ‹mā›. ‹lam› is unusual in that, despite having a past tense meaning, it requires the verb to be in the imperfective aspect/non-past tense, and it also causes the verb to take the jussive mood, which is marked by a null suffix in the example below:

’akal-a l-fīl-u
eat.PFV-3.M.SG DEF-elephant-NOM

the elephant ate

lam ya-’kul-∅ l-fīl-u
not.PFV 3.M.SG-eat.IMPV-JUS DEF-elephant-NOM

the elephant did not eat

لن ‹lan› + imperfective subjunctive

The final negative form is ‹lan›, which indicates negation in the future, i.e., ‘will not.’ ‹lan› also takes the imperfective/present form of the verb but causes the verb to take the subjunctive mood, which is marked by the ‹-a› suffix below:

sa-ya-’kul-u l-fīl-u
FUT-3.M.SG-eat.IMPV-IND DEF-elephant-NOM

the elephant will eat

lan ya-’kul-a l-fīl-u
not.FUT 3.M.SG-eat.IMPV-SUBJ DEF-elephant-NOM

the elephant will not eat
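The four verbal negators described above can be summarized as a small lookup table. The following is only a minimal sketch: the function name `negate_standard` and the labels "past," "non-past," and "future" are hypothetical choices made for illustration, and only the third person masculine singular forms of ‘eat’ from the examples are covered. ‹lays-› is left out because it negates nominal rather than verbal sentences.

```python
# Verbal negators of Standard Arabic, keyed by the meaning being negated.
# Each entry: (negative particle, aspect required of the verb, mood suffix).
# The keys are illustrative labels, not established grammatical terms.
NEGATORS = {
    "past": [("mā", "perfective", None), ("lam", "imperfective", "")],
    "non-past": [("lā", "imperfective", "u")],
    "future": [("lan", "imperfective", "a")],
}

# Only the stems used in the examples: 3rd person masculine singular of 'eat'.
STEMS = {"perfective": "’akal-a", "imperfective": "ya-’kul"}


def negate_standard(meaning):
    """Return the negated verb phrases available for the given meaning."""
    phrases = []
    for particle, aspect, mood_suffix in NEGATORS[meaning]:
        verb = STEMS[aspect]
        if mood_suffix is not None:
            # The jussive has a null suffix, written here as ∅.
            verb += "-" + (mood_suffix or "∅")
        phrases.append(f"{particle} {verb}")
    return phrases


for meaning in NEGATORS:
    print(meaning, negate_standard(meaning))
```

The table makes the asymmetry visible: a past meaning can be negated with either the perfective (‹mā›) or the jussive imperfective (‹lam›), while the other two meanings each have a single negator.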


In modern spoken Arabic, the negation rules are much simpler. There is only one basic negative form used for all verbal sentences, ‹mā›, and another for nominal sentences, which differs from one spoken variety to another. In some varieties, the verbal negative has a variant, a circumfix ‹ma-…-(i)š› inserted before and after the verb.

To illustrate the way this negative works, I will describe the rules in my own dialect of urban Palestinian, as that is the one I know best. However, other spoken dialects share the same basic principles, so much of what follows applies to them as well.

Verbal sentences

To negate a verbal sentence, there are two options. The first is to insert ‹mā› by itself immediately before the verb (I will only provide past tense forms since the rules are the same for the present tense):

l-fīl ’akal-∅
DEF-elephant eat.PFV-3.M.SG

the elephant ate

l-fīl mā ’akal-∅
DEF-elephant not.VB eat.PFV-3.M.SG

the elephant did not eat

The second method of negation is to insert the circumfix ‹ma-…-(i)š› around the verb. In this construction, ‘the elephant did not eat’ is as follows:

l-fīl ma-’akal-∅-š
DEF-elephant not-eat.PFV-3.M.SG-not

the elephant did not eat

In my estimation, the circumfix method is slightly more common. It originates from the reduction of ‹mā› and the noun شيء ‹šay’› ‘thing’ to bound affixes. The original use of ‹šay’› was probably for emphasis, as in ‘he did not eat a thing.’ This usage eventually lost its emphatic meaning and was generalized to all verbs and contracted into a suffix. This is very similar to the process that occurred in French, where a phrase like il ne marche pas ‘he does not walk a step’ was generalized to all verbs, including verbs where pas ‘step’ would not normally fit, such as il ne mange pas ‘he does not eat.’

Nominal sentences

Recall from above that a nominal sentence has no verb but expresses the relationship between the subject and the predicate. The example I used above was ‘the elephant is big.’ In urban Palestinian Arabic, this sentence is translated as follows:

l-fīl kbīr-∅
DEF-elephant big-M

the elephant is big

To negate this sentence, the word ‹miš› (taking the place of ‹lays-› in Standard Arabic) is inserted before the predicate. This word is derived from the circumfix described above. Since there is no verb in this sentence, there is nothing to separate the two parts of the circumfix from each other, so the result is ‹ma-∅-iš› > ‹m-∅-iš› > ‹miš›:

l-fīl miš kbīr-∅
DEF-elephant not.N big-M

the elephant is not big
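The spoken-Arabic rules fit in a few lines. This is a rough sketch under stated assumptions: the function names are hypothetical, the verb is passed in already conjugated, and the optional ‹i› of the suffix ‹-(i)š› is ignored, so the suffix is always written ‹-š›.

```python
def negate_verbal(subject, verb, circumfix=True):
    """Negate a verbal sentence with either ma-...-š or plain mā."""
    if circumfix:
        # Assumption: the suffix is always -š; the optional i is not modeled.
        return f"{subject} ma-{verb}-š"
    return f"{subject} mā {verb}"


def negate_nominal(subject, predicate):
    """Negate a nominal sentence by placing miš before the predicate."""
    return f"{subject} miš {predicate}"


print(negate_verbal("l-fīl", "’akal"))
print(negate_verbal("l-fīl", "’akal", circumfix=False))
print(negate_nominal("l-fīl", "kbīr"))
```

Unlike in Standard Arabic, nothing about the verb or the predicate changes: the negator is simply added, and the only choice is between the verbal and the nominal form.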

Implications of the simplified negation

In Standard Arabic ‹mā›, in addition to being a negative marker, is also the word for ‘what’ in nominal sentences (the counterpart for verbal sentences is ماذا ‹māḏā›). However, in modern spoken Arabic, ‹mā› developed such a strong association with negation that it caused this usage to become obsolete. Instead, to distinguish ‘what’ from the negative particle, spoken Arabic varieties devised new words from the construction ‹’ayy-u šay’› or ‹’ayy-u šay’-in huwa› ‘lit. which thing (it is).’ In Moroccan, this was reduced to ‹aš›; in Levantine Arabic, to ‹’ēš› or ‹šū›; in Iraqi Arabic, to ‹šinū›. The Standard Arabic words for ‘why’ are derived from ‘what’: ‹li-mā› and ‹li-māḏā› (lit. ‘for what’). In the same way, a new ‘why’ was developed from the new ‘what’ word. For example, in Iraqi and Levantine Arabic, it is ‹l-ēš› ‘lit. for what (mirroring the form in Standard Arabic)’ and in Moroccan Arabic, it is ‹‘alaš› ‘lit. on what.’

As you may have noticed in the examples I have given, there are many differences between Standard Arabic and modern spoken Arabic, not only in the interrogative pronouns and the negative construction but also in the conjugation of the verbs themselves. To the unfamiliar eye, they may even appear to be entirely separate languages. However, the situation is more complex than that, and in an upcoming post, I will describe the relationship between these two varieties of Arabic and their history in more detail. But for now, this small peek behind the curtain will have to suffice.

Persian loanwords in Arabic


Pahlavi inscription from the mid-Sassanian period. The Pahlavi script was based on Aramaic and is a testament to the cultural exchange between Iranian and Semitic languages. Source.

Much has been said about the influence of Arabic, as the language of Islam, on the speech of majority Muslim peoples. However, not nearly as much attention has been paid to how other languages have shaped and changed Arabic. Persian, mostly through word borrowings, has been one of the most important — if not the most important — of these languages, and what follows is a brief description of the nature of that Persian influence and how Arabic adapted to it.


The first thing to note is that not all Persian words entered Arabic directly. Many of them were borrowed through Aramaic, which was a major lingua franca and trade language of the region (as well as serving as the official language of the Achaemenid dynasty of ancient Persia). As such, many of the Arabic words for spices, plants, precious stones, and other common goods are Persian in origin. Examples include the following:

  • Arab. ‹ballūr› ‘crystal’ < Pers. ‹belūr›
  • ‹fayrūz› ‘turquoise’ < ‹pīrūze›
  • ‹hāl›/‹hayl› ‘cardamom’ < ‹hel›
  • ‹ˀibrīq› ‘water jug’ < ultimately from ‹āb› ‘water’ + ‹rīxtan› ‘to pour’
  • ‹kanz› ‘treasure’ < Mid. Pers. ‹ganj› via Aram.
  • ‹lāzaward› ‘lapis lazuli’ < ‹lāj(a)vard›
  • ‹līmūn› ‘lemon’ < ‹līmū›
  • ‹marjān› ‘coral’ < Mid. Pers. ‹murvārīt› ‘pearl’ via Aram. ‹margānītā›
  • ‹mawz› ‘banana’ < Mid. Pers. ‹mōz›
  • ‹misk› ‘musk’ < Mid. Pers. ‹mušk›
  • ‹nisrīn› ‘dog rose’ < ‹nasrīn›
  • ‹sabānix› ‘spinach’ < ‹aspanāx›
  • ‹sunbul› ‘hyacinth’ < ‹sonbol›
  • ‹šabat›/‹šibitt› ‘dill’ < ‹ševīd›
  • ‹xiyār› ‘cucumber’ < ‹xiyār›
  • ‹yāqūt› ‘ruby’ < ‹yāqūt›
  • ‹yašb› ‘jasper’ < ‹yashp›
  • ‹yasmīn› ‘jasmine’ < ‹yāsamīn›
  • ‹zanjabīl› ‘ginger’ < Mid. Pers. ‹singavēr› via Aram.
  • ‹zumurrud› ‘emerald’ < ‹zomorrod›

The defining characteristic of the majority of these loanwords is that they diverge from the typical native Arabic word structure, which is usually composed of only three root consonants that are modified according to specific, productive patterns. Words like ‹lāzaward› and ‹zanjabīl› immediately stand out in this regard.

Some of the words that do look like Arabic words conflict in meaning with existing roots. For example, ‹misk› seems to have the root m-s-k, but this is an already existing root meaning ‘touch, grasp.’ Similarly, ‹xiyār› conflicts with the root x-y-r meaning ‘choice.’


Persian and Arabic, although having a long history of interaction and mutual influence, are not related languages. The former is a member of the Indo-European language family and is genetically related to Greek, Latin, Armenian, Sanskrit, English, etc. Arabic, on the other hand, is a member of the Afro-Asiatic language family, like Aramaic, Ge‘ez, Tamazight, Hausa, Somali, etc. There are significant differences between the two, including in their respective sound systems, which means that when words are borrowed, they must undergo a process of (sometimes drastic) adaptation to the sound, syllable, and word structure of the borrowing language.

This happens in Arabic borrowings into Persian; for example, Arabic ‹ádab› ‘discipline, politeness’ (stress on the first syllable in accordance with Arabic stress rules) becomes ‹adáb› in Persian (stress on the second syllable in accordance with Persian stress rules). And certainly the reverse is true in that Arabic also adapted the pronunciation and structure of Persian words into a format compatible with its grammar. The following three sections highlight some of these changes.


Many words in Middle Persian ended in ‹-g›; for example, the name of the language was ‹pārsīg›, ‘plan’ was ‹barnāmag›, and ‘pistachio’ was ‹pistag›. Middle Persian was spoken up until the 9th century, meaning that Arabic borrowed many words from this stage of Persian, including ones ending in ‹-g›. Modern Standard Arabic lacks a ‹g›, so the modern reflex of the Persian is usually ‹j› or ‹q›. Thus ‹barnāmag› became Arabic ‹barnāmij› and ‹pistag› became ‹fustuq›. However, it should be noted that ‹j› developed from an original ‹g› and that ‹q› is pronounced as ‹g› in many modern spoken varieties of Arabic. So these adaptations are not random substitutions.

As with all languages, Middle Persian experienced changes and eventually developed into New (i.e. modern) Persian. Among the changes was that the final ‹-g› was dropped. In modern Persian, the name of the language is ‹pārsī› or ‹fārsī›, ‘plan’ is ‹barnāme›, and ‘pistachio’ is ‹peste›. But this sound change only occurred in Persian, meaning it had no effect on any Persian loanwords in Arabic. The result is that Arabic contains “fossilized” forms of many Persian words. Other examples include the following:

  • Arab. ‹banafsaj› ‘violet’ versus Pers. ‹banafše›
  • ‹baydaq› ‘pawn (chess)’ versus ‹piyāde›
  • ‹dībāj› ‘silk brocade’ versus ‹dībā›
  • ‹namūḏaj› ‘example’ versus ‹namūne›
  • ‹ṭāzaj› ‘fresh’ versus ‹tāze›

Interestingly, in some spoken Arabic varieties, the word for ‘fresh’ is ‹ṭāza›, lacking any evidence of the Middle Persian final ‹-g›. Ostensibly, this would suggest that Persian loans in Arabic were indeed affected by the sound change. However, this is actually an instance in which Arabic re-borrowed the word, this second time via Turkish, which had adopted it from Persian after the final ‹-g› had disappeared.

CHANGE OF ‹č› TO ‹ṣ› OR ‹s›

Persian has a ‹č› sound (as in ‘chair’) which Standard Arabic and many spoken varieties lack. In borrowings from other languages, this sound is usually changed to ‹š› (as in ‘share’), such as Arabic ‹šekk› from English ‘check.’ However, in words of Persian origin, ‹č› tends to correspond to Arabic ‹ṣ› or occasionally ‹s›.

At first glance, this is strange because ‹ṣ› is a pharyngealized sound; that is, it is pronounced by creating a simultaneous constriction in the pharynx as the sound is produced in the mouth. Compare the plain ‹s› of ‹sūs› (‘licorice’) and its pharyngealized counterpart ‹ṣ› in ‹ṣūṣ› (‘chick’). The latter sounds very much like a deep ‹s› but nothing at all like ‹č›. The correspondence is likely due to the tradition, in the writing of Iranian languages such as Sogdian and Pahlavi, of using the Aramaic letter representing ‹ṣ› for ‹č›. Aramaic speakers would likely have read the letter as ‹ṣ›, and this reading was then passed on to Arabic speakers. There are a number of loanwords that exhibit this change, including the following:

  • Arab. ‹jaṣṣ› ‘plaster, gypsum’ < Pers. ‹gač›
  • ‹raṣāṣ› ‘lead’ < Mid. Pers. ‹arčīč›
  • ‹ṣandal› ‘sandal, sandalwood’ < ‹čandal›
  • ‹ṣārūj› ‘mortar’ < Mid. Pers. ‹čārūg›
  • ‹ṣihrīj› ‘cistern’ < Mid. Pers. ‹čahrēg›
  • ‹ṣīn› ‘China’ < ‹čīn›
  • ‹sirāj› ‘lamp’ < ‹čerāġ›


Arabic, being a Semitic language, has a root-and-pattern system of morphology, meaning that roots composed of consonants are applied to pre-existing patterns to form words. An example of this is ‹kitāb› ‘book’ and ‹maktab› ‘desk,’ both from the root k-t-b ‘write.’ The plurals of many words are also determined by pre-existing patterns, so that ‘books’ is ‹kutub› and ‘desks’ is ‹makātib›. Notice that the consonants stay the same, but the vowels are altered to indicate pluralization.
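The root-and-pattern system just described can be sketched as simple template filling. The notation is an assumption made for this example: a root is written as hyphen-separated consonants, and the digits 1 to 3 in a pattern stand for the first, second, and third root consonant. The function name `apply_pattern` is likewise hypothetical.

```python
def apply_pattern(root, pattern):
    """Fill the digits 1-3 in `pattern` with the consonants of `root`."""
    consonants = root.split("-")
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch
        for ch in pattern
    )


# The four forms of k-t-b 'write' mentioned in the text.
for gloss, pattern in [
    ("book", "1i2ā3"),
    ("desk", "ma12a3"),
    ("books", "1u2u3"),
    ("desks", "ma1ā2i3"),
]:
    print(gloss, apply_pattern("k-t-b", pattern))
```

The consonants never change; only the template around them does, which is why the singular and plural of the same noun can look so different on the surface.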

Some Persian words that were borrowed into Arabic very strongly resembled certain plural patterns, and indeed Arabic speakers interpreted these words as plurals. Recall from my post on the etymology of ‹tājir› ‘merchant’ that this is not unheard of in Arabic. But by reinterpreting originally singular words as plurals, speakers created a lexical ‘gap’ where a singular ought to have been. The solution was to back-form new singular forms from plurals based on the forms that exist for native Arabic words.

To illustrate, the Middle Persian word for ‘pawn’ (as in the chess piece) was ‹payādag›, which was borrowed as Arabic ‹bayādiq›. Comparing ‹bayādiq› to ‹makātib› ‘desks,’ one can immediately recognize that while the consonants differ, the vowels are identical. Thus Arabic speakers interpreted ‹bayādiq› not as ‘pawn’ but as ‘pawns.’ Based on this interpretation, if ‹bayādiq› is a plural whose form matches ‹makātib›, then it would follow that the singular form of the former would match that of the latter. Thus, ‹baydaq› ‘pawn,’ having the same vowels and structure as ‹maktab› ‘desk,’ was back-formed as the new singular form.
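The analogy can be made explicit as a two-step procedure: read the consonants off a word that fits the plural shape, then pour them into the matching singular shape. The sketch below is a simplified illustration with hypothetical names. The numbered slots are purely positional, so the ‹m-› of ‹makātib› counts as slot 1 even though it is a prefix rather than a root consonant, which is exactly how the surface analogy works.

```python
import unicodedata

PLURAL = "1a2ā3i4"   # surface shape shared by makātib and bayādiq
SINGULAR = "1a23a4"  # surface shape shared by maktab and baydaq


def extract_slots(word, pattern):
    """Return the consonants sitting in the numbered slots, or None if no fit."""
    word = unicodedata.normalize("NFC", word)
    pattern = unicodedata.normalize("NFC", pattern)
    if len(word) != len(pattern):
        return None
    slots = {}
    for w, p in zip(word, pattern):
        if p.isdigit():
            slots[p] = w
        elif w != p:
            return None  # a fixed vowel of the pattern does not match
    return slots


def back_form(word):
    """Back-form a singular from a word that looks like a plural."""
    slots = extract_slots(word, PLURAL)
    if slots is None:
        return None
    return "".join(slots[p] if p.isdigit() else p for p in SINGULAR)


print(back_form("makātib"))
print(back_form("bayādiq"))
```

A word that does not fit the plural shape, such as ‹kitāb›, yields no result, which mirrors the point that only loanwords resembling an existing plural pattern were reinterpreted.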

Some other words that exhibit this back-formation include:

  • ‹firdaws› ‘paradise’ from Old Iranian *‹paridaiza›, which was borrowed as ‹farādīs› ‘paradises.’
  • ‹jāmūs› ‘water buffalo’ from Middle Persian ‹gāwmeš›, which was borrowed as ‹jawāmīs› ‘water buffalos.’
  • ‹nibr› ‘warehouse’ from Middle Persian ‹anbār›, which was borrowed as ‹ˀanbār› ‘warehouses.’

Persian nasta‘līq calligraphy by Maqsud ibn Mahmud, 1708. Nasta‘līq is a calligraphic style invented by Persians for their adapted Arabic script in the 14th-15th century. Source.


Many if not most of the loanwords discussed above were clearly borrowed in pre-Islamic or early Islamic times. This makes sense, given that Arabic was not an established language of prestige in that era unlike Persian, which was the language of one of the two most powerful empires in the region at the time. As Arabic ascended in influence through association with Islam, Persian borrowings into Arabic decreased.

However, they did not stop, and many spoken Arabic varieties continued to borrow Persian words. Obviously, those varieties spoken near or in Iran, such as Iraqi Arabic, contain more Persian loanwords than those that are not. Two examples from my own dialect (urban Palestinian) are ‹bābūj› ‘slipper’ from ‹pāpūč› and ‹šākūš› ‘hammer’ from ‹čakoš›. Neither of these words exists in standard Arabic, but they are both widely used in many spoken varieties of Arabic. The existence of two words for ‘fresh’ borrowed from two different eras of Persian discussed above also demonstrates the continued influence of the language on Arabic.


The “South Semitic” sprachbund

In linguistics, a sprachbund (German for ‘language league’) is any group of languages which are not necessarily closely related but which nevertheless have many similar features as a result of proximity and language contact. A very well-known example is the Balkan sprachbund, comprised of such languages as Albanian, Greek, Romanian, and Bulgarian. Although these languages come from separate branches of the Indo-European language family, the long history of contact between speakers within a relatively small geographic region has resulted in a convergence on a number of features such as the formation of the future tense, the loss of the infinitive verb, and common vocabulary.

Some scholars have argued that within the Semitic language family, there existed a South Semitic sprachbund, which consisted of Arabic, Ṣayhadic (Sabaean, Qatabānian, etc.), the Modern South Arabian languages (Soqotri, Mehri, etc.), and the Ethiopic languages (Gǝ‘ǝz, Amharic, Gurage, etc.). Throughout most of the 20th century, it was assumed that the unique features that these languages shared were inherited from a recent common ancestor. However, currently the prevailing assumption is that they are a result of extensive and prolonged interaction between their respective speakers.

Usually, three features are highlighted: 1) the universal change of Proto-Semitic *p > f; 2) broken/internal plurals; and 3) the L-stem form of the verb. I will discuss these below.


The Proto-Semitic language is assumed to have had a *p consonant; however, within the South Semitic sprachbund, this consonant changed, or spirantized, to ‹f›. Thus, Hebrew ‹pēḥām› ‘coal’ corresponds to Arabic ‹faḥm›, Tigrinya ‹fäḥam›, Amharic ‹fǝm›, and Soqotri ‹fḥam›.

The *p > f shift does occur in other Semitic languages as well. In Hebrew and Aramaic when ‹p› follows a vowel, it becomes ‹f›. For example, contrast Hebrew ‹pārastî› ‘I spread’ with ‹efrôs› ‘I will spread.’ In the South Semitic sprachbund, by contrast, this change was unconditional; that is, it affected every occurrence of Proto-Semitic *p regardless of its position within the word.
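The difference between a conditioned and an unconditional sound change can be shown with two small functions. This is a toy sketch: the function names are hypothetical, the vowel inventory is a rough assumption, and the inputs ‹paḥm› and ‹eprôs› are hypothetical pre-shift forms chosen for illustration, not attested words.

```python
# Rough vowel inventory for the transliterations used here (an assumption).
VOWELS = set("aeiouāēīōūâêîôûǝä")


def spirantize_unconditional(word):
    """South Semitic type: every p becomes f, whatever its position."""
    return word.replace("p", "f")


def spirantize_postvocalic(word):
    """Hebrew/Aramaic type: p becomes f only when it follows a vowel."""
    out = []
    for i, ch in enumerate(word):
        if ch == "p" and i > 0 and word[i - 1] in VOWELS:
            out.append("f")
        else:
            out.append(ch)
    return "".join(out)


print(spirantize_unconditional("paḥm"))
print(spirantize_postvocalic("paḥm"))
print(spirantize_postvocalic("eprôs"))
```

Under the conditioned rule the word-initial ‹p› of ‹pārastî› survives, while under the unconditional rule no ‹p› survives anywhere, which is the situation found across the sprachbund.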

Incidentally, many Ethiopic languages regained the ‹p› and additionally acquired an emphatic (i.e. ejective) form of the consonant ‹ṗ›, but the words in which these two sounds occur are mostly borrowings from other languages. The letters in the Gǝ‘ǝz script used to represent ‹p› and ‹ṗ› are modified from the ‹t› and the ‹ṣ› letters, respectively. For example, ተ ‹tä› versus ፐ ‹pä› and ጸ ‹ṣä› versus ጰ ‹ṗä›.


Across the Semitic languages, the most common method of pluralization is suffixation. For example, in Akkadian ‹šarr-um› ‘king’ is pluralized by dropping the singular ‹-um› and adding the plural ‹-ū› for ‹šarr-ū› ‘kings.’ The members of the South Semitic sprachbund also exhibit this form of pluralization. For example, some Tigrinya nouns add the plural suffix ‹-at›; thus ‹säb› ‘person’ > ‹säb-at› ‘persons.’ This is known as the external plural.

Within the sprachbund, there is a second type of pluralization called broken or internal plurals. Recall from my previous posts that Semitic morphology, which is the formation of words and other grammatical units, is based on a root-pattern system. For example, the Gǝ‘ǝz root ‹ṣḥf› means “write.” Applying the pattern |maC₁C₂aC₃| to this root resulted in ‹maṣḥaf›, which means ‘book.’ However, this word was not pluralized by the addition of a suffix. Instead an entirely different pattern was applied to the root of the noun itself, specifically |maC₁āC₂ǝC₃t|, which resulted in ‹maṣāḥǝft› ‘books.’ The name for this type of plural comes from the fact that the singular form of the noun is “broken” apart by the insertion of additional vowels and consonants.

Although there is residual evidence of the broken plural in other Semitic languages, only the South Semitic sprachbund exhibits its extensive use. Additionally, not only do the sprachbund members utilize the broken plural, they also share a number of identical pluralization patterns. Perhaps, the most common such pattern is |’VCCV̄C| (where V stands for a short vowel and V̄ for a long vowel). Examples include the following:

  • Arabic ‹wazn› ‘weight’ > ‹’awzān›
  • Gǝ‘ǝz ‹faras› ‘horse’ > ‹’afrās›
  • Tigre ‹mǝdǝr› ‘land’ > ‹’amdār›
  • Ṣayhadic * ‹hgr› ‘town’ > ‹’hgr›
  • Ḥarsusi ‹ḥamθ› ‘lower belly’ > ‹eḥmōθ›
  • Shehri ‹ḥarf› ‘gold coin’ > ‹ɔḥrɔf›
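For the Arabic, Gǝ‘ǝz, and Tigre items in the list, the shared pattern can be sketched as stripping the vowels from the singular and pouring the three remaining consonants into ‹’aC₁C₂āC₃›. This is only an illustration under assumptions: the function names and the vowel inventory are hypothetical, the Modern South Arabian items with their different vowels are not modeled, and in the real languages the plural pattern of a given noun is lexically determined rather than predictable.

```python
# Rough vowel inventory for the transliterations used here (an assumption).
VOWELS = set("aeiouǝäāēīōū")


def consonants(word):
    """Return the consonants of a transliterated word, in order."""
    return [ch for ch in word if ch not in VOWELS]


def broken_plural(singular):
    """Apply the ’aC1C2āC3 pattern to a three-consonant noun."""
    c = consonants(singular)
    if len(c) != 3:
        raise ValueError("expected a noun with exactly three consonants")
    return f"’a{c[0]}{c[1]}ā{c[2]}"


for noun in ["wazn", "faras", "mǝdǝr"]:
    print(noun, ">", broken_plural(noun))
```

That one function reproduces plurals in three different languages is the point of the comparison: the pattern itself, not just the strategy of internal pluralization, is shared.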

Despite the preceding examples, the situation in the Ethiopic languages has changed over the centuries. This language group is divided into two subgroups, North (Gǝ‘ǝz, Tigrinya, and Tigre) and South (Amharic, Gurage, Argobba, and so on). Only the North group maintains extensive use of the broken plural. The Gurage languages have almost entirely lost it, and in Amharic what broken plurals exist have been directly borrowed from Gǝ‘ǝz. The loss of this feature is most likely a result of a separate sprachbund with the surrounding Cushitic and Omotic languages, which generally have external plurals.


Like the nouns above, verbs are also formed through the root-pattern system. The Semitic languages all have different templates that provide nuanced changes to the meaning of the verb root. For example in Ḥarsusi and Mehri, two Modern South Arabian languages, applying the root ḳ-f-d to the basic verb pattern |C₁ǝC₂ōC₃| results in ‹ḳǝfōd› ‘to descend,’ while the causative pattern |aC₁C₂ōC₃| results in ‹aḳfōd› ‘to put down (i.e. to cause to descend).’ The latter is known as the C-stem (“C” for causative), and variations of it are exhibited throughout all the Semitic languages. 

In the South Semitic sprachbund, there is a unique pattern known as the L-stem, which features the lengthening of the first vowel from the basic verb form: |C₁aC₂aC₃| > |C₁āC₂aC₃|. In Arabic, this pattern typically denotes the involvement of another person in some sort of reciprocal fashion: ‹qatala› ‘he killed’ vs ‹qātala› ‘he fought (i.e. he killed another)’; ‹kataba› ‘he wrote’ vs ‹kātaba› ‘he corresponded (i.e. he wrote to another).’ There is also a variant of this pattern |taC₁āC₂aC₃|, which denotes a reflexive or reciprocal meaning: ‹taqātal-ū› ‘they fought each other.’
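The L-stem derivation amounts to lengthening the first vowel of the basic form, optionally with a ‹ta-› prefix. The sketch below uses hypothetical function names and assumes a basic verb of the shape |C₁aC₂aC₃| cited in the third person masculine singular, so the prefixed output is ‹taqātala›, the singular counterpart of the plural ‹taqātal-ū› given above.

```python
def l_stem(basic):
    """Lengthen the first vowel of a C1aC2aC3 verb: qatala -> qātala."""
    i = basic.find("a")
    if i == -1:
        raise ValueError("expected a verb containing the vowel a")
    return basic[:i] + "ā" + basic[i + 1:]


def t_l_stem(basic):
    """Prefix ta- to the L-stem: qatala -> taqātala."""
    return "ta" + l_stem(basic)


print(l_stem("qatala"))
print(l_stem("kataba"))
print(t_l_stem("qatala"))
```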

In Gǝ‘ǝz, there is no special meaning to this pattern: ‹bāraka› ‘he blessed’ and ‹māsana› ‘he perished.’ Furthermore, the |taC₁āC₂aC₃| pattern is simply the passive form: ‹tabāraka› ‘he was blessed.’ However, some basic verb forms take on this L-stem passive to create a reciprocal: ‹ḳatala› ‘he killed’ > ‹taḳātal-u› ‘they fought/killed each other,’ identical in meaning and form to the Arabic تقاتلوا ‹taqātal-ū›.

Based on currently available information, the L-stem can only be attributed to Arabic and Ethiopic. The Ṣayhadic scripts did not mark vowels, so it is unclear from the surviving texts if they had an L-stem, while the Modern South Arabian languages, if they ever had the L-stem at all, have merged it with another verb pattern. Nevertheless, the existence of the L-stem in both Arabic and Ethiopic and its use in similar ways points to the effects of the South Semitic sprachbund.


Originally the above features were so convincing to scholars that they placed the members of this sprachbund into one subfamily within Semitic and called it the South Semitic branch. This was the case throughout most of the 20th century, when the classification looked something like this:

However, starting in the 1970s, scholars began to reanalyze their assumptions. Some argued that despite the similarities, there were still important differences within this “South Semitic branch,” including the following:

  1. A conjugation pattern of the imperfective tense (representing both present and future) in Arabic and Ṣayhadic closely resembling the Northwest Semitic forms, which differ from all other varieties of Semitic languages;
  2. Grammatical rules governing the definite article “the” that were identical in Arabic and Northwest Semitic languages; and
  3. The formation of the tens (i.e. twenty, thirty, forty, etc.) based on a noun plural suffix ‹-îm/-īn/-ūn› as opposed to the general Semitic ‹-ā› found in Ethiopic and Akkadian.

These and several other important features together suggested a common Central Semitic subfamily. Since then, there have been many revisions of the traditional classification. The following is one current example taken from Huehnergard and Rubin (2011):

In this tree, Arabic and Ṣayhadic are moved from South Semitic into a “Central Semitic” branch along with Aramaic and Canaanite, while Modern South Arabian and Ethiopic/Ethiopian each get their own separate branches. Currently, the majority of scholars subscribe to some variation of this tree.

In any case, whether a revised classification or the traditional one is correct, it is undeniable that the Semitic languages of Arabia and East Africa interacted with and influenced each other over a significant period of time. What is most remarkable is that the evidence of those interactions can still be observed all of these centuries later.

* The Ṣayhadic script did not mark vowels, so the existence of this particular pluralization pattern, in which the second vowel is long, is based on conjecture.


On the etymology of شرموطة ‹šarmūṭa›

شرموطة ‹šarmūṭa› is one of the most taboo curse words in the Arabic language. It means ‘whore,’ but it is much more potent as an insult. Speaking from personal experience, I very rarely hear it.

Linguistically, it has an unusual structure for an Arabic word. Its root is composed of four consonants š-r-m-ṭ, while most roots are composed of three. Its word pattern |C₁aC₂C₃ūC₄a| is also relatively rare. Perhaps because of how taboo and structurally unusual it is, ‹šarmūṭa› has been the subject of some highly imaginative and very wrong etymology theories.

The entry in the English Wiktionary claims, “Most Arab linguists agree that it is of non-Semitic origin.” The Arabic Wiktionary entry, after giving the correct explanation, provides the most common folk etymology: that it comes from the French charmante, meaning “charming, delightful.” The story is that during the colonial period, French soldiers would call Arab girls who flirted with them charmante. The local population, who could not pronounce it correctly, mistook the word for something dirty given the improper behavior of these women and therefore interpreted it as ‘whore.’

The Arabic Wiktionary entry goes on to claim that charmante is also the origin of Sharm, as in Sharm el Sheikh, the Egyptian resort town. That claim is indisputably false because “sharm” is a perfectly ordinary Arabic word meaning “bay.” Even leaving that aside, there is no plausible explanation for how charmante, with the nasal /ɑ̃/ vowel in the second syllable, became ‹šarmūṭa› with a long /uː/ vowel. Compare the colloquial Arabic word for “elevator,” ‹aṣansēr›, which is a direct borrowing from French ascenseur. The en of the second syllable, identical in pronunciation to the an in charmante, does not transform into a long /uː/ vowel, even among non-French-literate Arabic speakers. The charmante explanation is nonsense, most likely the result of someone noticing that it sounds somewhat similar to ‹šarmūṭa› and deciding that both words must have a common (French) origin.

The actual etymology is so fantastically convoluted that the charmante theory is almost an insult by comparison. The original root of ‹šarmūṭa› is actually triliteral š-r-ṭ, which means ‘slice, tear off.’ Recall from my post on Semitic languages that Arabic, being a Semitic language, has a root-pattern system of word formation. Basic verbs are formed with the pattern C₁aC₂aC₃-; applying the root š-r-ṭ to this pattern results in ‹šaraṭ-› ‘to slice, tear off.’ The passive participle of that verb form has the pattern maC₁C₂ūC₃-, and the resulting word is ‹mašrūṭ-›, meaning ‘sliced, torn off.’ At some point, through a type of sound change called metathesis¹, where the sounds of a word are rearranged, ‹mašrūṭ-› became ‹šarmūṭ-›.
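The metathesis step can be made concrete with a small Python sketch. It is only an illustration, and it rests on a simplifying assumption that is mine rather than the post's: the vowels stay where they are and only the consonants trade places. The function names and the index tuples are hypothetical. The second reordering reproduces the Saudi variant ‹šamṭūr› reported in the second footnote.

```python
import unicodedata

def nfc(text):
    """Normalize to composed Unicode so each transliterated letter is one character."""
    return unicodedata.normalize("NFC", text)

# Vowels used in the transliterations in this post (an illustrative list, not exhaustive).
VOWELS = set(nfc("aiuāīūēō"))

def split_skeleton(word):
    """Split a word into its consonants and a vowel template with 'C' marking each consonant slot."""
    word = nfc(word)
    consonants = [ch for ch in word if ch not in VOWELS]
    template = "".join(ch if ch in VOWELS else "C" for ch in word)
    return consonants, template

def metathesize(word, order):
    """Reorder the consonants of `word` by the index tuple `order`, keeping the vowels in place."""
    consonants, template = split_skeleton(word)
    if sorted(order) != list(range(len(consonants))):
        raise ValueError("order must be a permutation of the consonant positions")
    reordered = iter(consonants[i] for i in order)
    return "".join(next(reordered) if ch == "C" else ch for ch in template)

# m-š-r-ṭ becomes š-r-m-ṭ: the m moves from first to third position.
print(metathesize("mašrūṭ", (1, 2, 0, 3)))   # šarmūṭ
# š-r-m-ṭ becomes š-m-ṭ-r: the variant reported in the second footnote.
print(metathesize("šarmūṭ", (0, 2, 3, 1)))   # šamṭūr
```

In both steps the vowel template CaCCūC is untouched, which is why the metathesized forms still look like well-formed Arabic words.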

In Arabic, passive participles are adjectives, and adjectives can be nouns. So ‹mašrūṭ-›/‹šarmūṭ-› meant not only ‘sliced, torn off’ but also ‘a thing that is sliced or torn off.’ The feminine form of this word ‹šarmūṭa› came to mean ‘rag’ (as in a torn piece of cloth), specifically one used to wipe up dirt². By metaphor, this word then morphed into an insult against women. Calling a woman a ‹šarmūṭa› was saying that she was as “dirty” (i.e. in terms of morals and reputation) as a dish rag. Thus was born one of the worst insults in the Arabic language.

‹šarmūṭa› is neither French nor Ancient Egyptian, as someone hilariously tried to claim. No, it is a purely Arabic word, although one with a very exceptional journey.

¹ This rearranging of sounds happens in all languages. For example, in English, “aks” is a nonstandard variant of the word “ask,” and both forms go back to Anglo-Saxon; in Spanish, milagro “miracle” comes from Latin miraculum.

² A Saudi Twitter user reports that in certain parts of Saudi Arabia, the word for rag is ‹šamṭūr›, which is a metathesis of the already metathetic ‹šarmūṭ›! This second metathesis may well have resulted from a need to separate the original meaning of the word from the now vulgar usage.

The Semitic languages

The Semitic languages are part of a language family called Afro-Asiatic, which among others includes the Berber/Tamazight, Cushitic, and (ancient) Egyptian languages. The Semitic languages are assumed to have descended from a single source, which is called Proto-Semitic. There is no record of this language, but scholars have been able to piece together a lot of information about it from evidence in the daughter languages.


Proto-Semitic and its close relative Proto-Berber were most likely spoken somewhere in Northeastern Africa. Around 3500 BCE, driven by the desertification that would create the Sahara Desert, the speakers of Proto-Semitic migrated east into the Levant, where their presence led to the collapse of the indigenous cultures that existed there. It seems that the Semites didn’t emigrate all at once but rather in waves. Some of them ended up in northern Syria, some in Iraq, others in the Levant and the northern Arabian Peninsula, and still others in the southern Arabian Peninsula and across the Red Sea into East Africa.


Bronze head of an Akkadian ruler, probably Sargon the Great, c. 23rd – 22nd century BCE. Source.

These migration patterns led to the divisions within Semitic. There are many competing theories regarding the classification of these divisions. The most common divides Semitic into East and West groups (Huehnergard and Rubin 2011). The East group, composed of Eblaite, Akkadian, and Babylonian, died out in the 8th century BCE. The West group is divided into three subgroups. The first is Central Semitic, which is further divided into Northwest Semitic (composed of Aramaic, Ugaritic, and Canaanite), Arabic, and Ṣayhadic. The second is Ethiopic, which is composed of the Semitic languages spoken in eastern Africa. The last subgrouping is the Modern South Arabian languages, which are spoken in the southern Arabian Peninsula.

Other scholars propose theories that significantly deviate from this model. Lipiński (1997), for example, argues that there are four, not two, macro-divisions. According to him, the Semitic that was spoken in northern Syria developed into the North Semitic branch (composed of Ugaritic and Amorite), in Iraq into the East Semitic branch (Akkadian and Babylonian), in the Levant and northern Arabia into the West Semitic branch (e.g. Arabic, Aramaic, and Canaanite), and finally in southern Arabia and East Africa into the South Semitic branch (Ṣayhadic, Ethiopic, and Modern South Arabian). It should be noted that this is a highly idiosyncratic view that is not widely accepted.

Whichever is the correct division, the largest number of living Semitic languages can be found in East Africa, including Amharic, Gurage, Tigre, and Tigrinya. Outside of that region, the most common Semitic language is Arabic and its highly diverse spoken dialects. Additionally, there are Modern Hebrew; the Neo-Aramaic languages, like Assyro-Chaldean, Turoyo, and Neo-Mandaic; and the Modern South Arabian languages, like Soqotri, Mehri, and Shehri.


In order for a group of languages to constitute a “family,” they must share a large number of unique linguistic features that cannot be attributed to mere borrowings or simultaneous development through contact between speakers. The following is a sampling of the unique features that define Semitic languages.



Maimonides’ autograph draft of his legal code, Mishneh Torah (from the Cairo Genizah), in cursive Sephardic script (Egypt, c. 1180). Source.

All Semitic languages have or had a series of “emphatic” consonants. In Proto-Semitic there were at least five: ‹ṭ, ḳ, ṱ, ṣ, ṣ́›. Only (standard) Arabic has maintained this series. The Canaanite languages, like Phoenician and Hebrew, had only three, having merged ‹ṱ› and ‹ṣ́› with ‹ṣ›. Ethiopic languages also merged these consonants, but many of them also developed new emphatics, such as ‹ṗ› and ‹č̣›.

The term “emphatic” is necessarily imprecise because these consonants are realized differently in the daughter languages. Originally, they were most likely ejective consonants. Only the Ethiopic and Modern South Arabian languages preserve this pronunciation today. In Arabic and most Neo-Aramaic languages, they are pharyngealized. In Maltese and Modern Hebrew, the emphatic consonants have been lost under the influence of European languages.


Every Semitic language has two genders, masculine and feminine. The masculine is usually the base form, while the feminine is indicated with a suffix.



Gospel of Luke in Ge‘ez, from the Church of Gännätä Maryam, c. 1500. Source.

The feminine is marked by the suffix ‹-t›. Examples include Akkadian ‹šarr-at-› “queen,” Arabic ‹bint› “daughter,” Ge‘ez ‹barakat› “blessing,” and Hebrew ‹rē’šīṯ› “beginning.” Within the Afro-Asiatic family, this is not unique to Semitic languages. The Berber languages, for example, also mark the feminine with ‹t›, but there it is a circumfix (appearing at the beginning and end of the word). Thus, ‹amaziɣ› ‘Amazigh man’ is masculine, and ‹tamaziɣt› ‘Amazigh woman’ is feminine.

In a number of Central Semitic languages, like Arabic and Hebrew, this suffix was deleted in isolated words but reappeared if the word was part of a phrase. For example, in Arabic ‘writing,’ a feminine noun, is ‹kitāba›; however, ‘a boy’s writing’ is ‹kitābat walad›. Similarly, in Modern Hebrew these are ‹ktiva› and ‹ktivat yéled›.


Semitic languages characteristically divide the second person pronoun into masculine and feminine forms. Examples of the singular forms of “you,” respectively, include Akkadian ‹atta, atti›, Arabic ‹’anta, ’anti›, Ge‘ez ‹’ānta, ’ānti›, and Hebrew ‹’attā, ’at›. Separate forms also exist in the plural pronouns.

However, in some modern languages and dialects this distinction has been lost or reduced. In many Arabic dialects and other languages like Harari, spoken in Ethiopia, the second person plural no longer distinguishes between genders. Others, such as Maltese and Tunisian Arabic, have lost the distinction in the singular as well.


The vast majority of Semitic lexicons are composed of abstract roots of three, or sometimes four, consonants. Words are formed by applying these roots to different patterns of vowels and consonants.


Folio from the “Blue Qur’ān.” Second half 9th–mid-10th century CE. Source.

For example, in Arabic the root k-t-b denotes ‘write.’ By itself, it cannot be used in a sentence. However, applying it to the pattern C₁āC₂iC₃, which means ‘doer of [root],’ results in ‹kātib› ‘writer.’ Applying it to the pattern maC₁C₂aC₃, ‘place of [root],’ results in ‹maktab› ‘desk, office’ (literally, a place where one writes). Other words formed from this root include ‹maktūb› ‘letter’; ‹kitāba› ‘writing’; ‹kātaba› ‘he corresponded (with)’; and ‹istiktāb› ‘dictation.’

This system is very flexible, and it is possible to create new roots from existing words and even from foreign languages. For example, the root ’-m-r-k originates from “America” and means “Americanize.” Applying it to an existing verb pattern for four-consonant roots, ‹taC₁aC₂C₃aC₄a›, results in ‹ta’amraka› “he became American.”
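For readers who think in code, the root-and-pattern mechanism can be sketched as simple slot filling. The Python below is a toy model, not a morphological analyzer: the function name is hypothetical, every consonant slot is numbered explicitly (C₁ through C₄), and the pattern for ‹kitāba› is simply read off the word rather than quoted from a grammar. The glottal stop of the root ’-m-r-k is written out as ’ in the result.

```python
import re

# Subscript slot markers as written in the post: C₁, C₂, C₃, C₄.
SLOT_INDEX = {"₁": 0, "₂": 1, "₃": 2, "₄": 3}

def apply_pattern(root, pattern):
    """Fill the numbered consonant slots of `pattern` with the consonants of `root`.

    `root` is written with hyphens (e.g. "k-t-b"); `pattern` marks slots as C₁..C₄.
    """
    consonants = root.split("-")

    def fill_slot(match):
        index = SLOT_INDEX[match.group(1)]
        if index >= len(consonants):
            raise ValueError(f"pattern {pattern!r} needs more consonants than root {root!r} has")
        return consonants[index]

    return re.sub(r"C([₁₂₃₄])", fill_slot, pattern)

print(apply_pattern("k-t-b", "C₁āC₂iC₃"))          # kātib   'writer'
print(apply_pattern("k-t-b", "maC₁C₂aC₃"))         # maktab  'desk, office'
print(apply_pattern("k-t-b", "maC₁C₂ūC₃"))         # maktūb  'letter'
print(apply_pattern("k-t-b", "C₁iC₂āC₃a"))         # kitāba  'writing' (pattern read off the word)
print(apply_pattern("’-m-r-k", "taC₁aC₂C₃aC₄a"))   # ta’amraka 'he became American'
```

The same function handles three- and four-consonant roots, so a newly coined root behaves exactly like an inherited one, which is the flexibility described above.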

These are only a few of the features that distinguish Semitic languages. There are many others, such as a verb conjugation system originally centered around aspect rather than tense; object and possessive pronouns as suffixes; and the dual number in verbs, nouns, and adjectives. These subjects are for another day perhaps.

I leave you with a side-by-side comparison of hypothesized Proto-Semitic words and their attested forms in four daughter languages (Akkadian, Arabic, Hebrew, and Ge‘ez, in that order):

GLOSS    PROTO-SEMITIC  AKKADIAN   ARABIC  HEBREW      GE‘EZ
FATHER   *’ab-          ab-        ’ab-    ’āḇ         ’ab
MOTHER   *’imm          umm-       ’umm-   ’imma       ǝmm
GOD      *’il(-āh-)-    il-        ’ilāh-  ’ēl(ōh)
HOUSE    *bayt          bīt-       bayt-   bayiṯ, bēṯ  bet
ROPE     *ḥabl-         ebl-       ḥabl-   ḥeḇel       ḥabl
PEACE    *s₁alām-       šalām-     salām-  šālōm       salām
WATER    *māy-          mū-        mā’-    mayim       māy
BLOOD    *dam-          dam-       dam-    dām         dam
EAR      *’uḏn-         azn-/uzn-  ’uḏn-   ’ōzen       ’ǝzn
EYE      *‘ayn-         īn-        ‘ayn-   ‘ayin, ‘ēn  ‘ayn
HAND     *yad-          id-        yad-    yāḏ         ’ǝd
TONGUE   *lis₁ān-       lišān-     lisān-  lāšōn       lǝssān
TOOTH    *s₁inn-        šinn-      sinn-   šēn         sǝnn
BULL/OX  *ṯawr-         šūr-       ṯawr-   šōr         sor
HORN     *ḳarn-         qarn-      qarn-   qeren       ḳarn


