Morphological Features


Arabic has a rich morphology, and a single word can correspond to an entire sentence in English. For example, the Arabic word fajaʿalnāhum (فَجَعَلْنَٰهُمُ), found in verse (23:41), can be translated into the English sentence "and We made them". Such compact syntax is possible because the single word can be divided into 4 distinct morphological segments:

and We made them

Fig 1. Morphological segmentation for word (23:41:4).

  • fa - a prefixed conjunction ("and")
  • jaʿal - the stem, a perfect past tense verb ("made") inflected as first person masculine plural
  • nā - a suffixed subject pronoun ("We")
  • hum - a suffixed object pronoun ("them")

This single-word sentence has VSO (verb-subject-object) order. In general, Arabic is rather flexible with regard to word order, since case endings can be used to determine the role of each word in a sentence. Word order is typically used to emphasize different parts of a sentence. In the Quranic Arabic corpus, a part-of-speech tag has been assigned to each morphological segment that makes up a word. For example, the word above has 4 part-of-speech tags, with one tag for each of its 4 segments:

  • CONJ - conjunction
  • V - verb
  • PRON - pronoun (for the attached subject pronoun)
  • PRON - pronoun (a second pronoun segment for the attached object pronoun)
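
The segmentation and tagging described above can be pictured as a small data structure. The following is a minimal sketch in Python, not the corpus's own storage format; the Segment class, its field names and the two helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    form: str   # transliterated segment
    tag: str    # part-of-speech tag assigned to the segment
    gloss: str  # English gloss taken from the list above


# Word (23:41:4), fajaʿalnāhum, as four segments in surface order.
WORD_23_41_4 = [
    Segment("fa", "CONJ", "and"),
    Segment("jaʿal", "V", "made"),
    Segment("nā", "PRON", "We"),
    Segment("hum", "PRON", "them"),
]


def surface_form(segments: List[Segment]) -> str:
    """Fuse the segments back into a single transliterated word."""
    return "".join(s.form for s in segments)


def tag_sequence(segments: List[Segment]) -> List[str]:
    """Return one part-of-speech tag per segment, in order."""
    return [s.tag for s in segments]


print(surface_form(WORD_23_41_4))
print(tag_sequence(WORD_23_41_4))
```

Keeping one tag per segment, rather than one tag per word, is what allows the same word to carry both a verb tag and two pronoun tags.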

Although multiple segments can be fused together into a single word, usually only one segment will be identified as the stem. Any segments preceding the stem are prefixes and any segments following the stem are suffixes. Prefix and suffix segments are optional, while the stem segment is the unmodified form of the word. Occasionally a word will have two stems, such as the contraction عَن + مَا = عَمَّ:

About what

Fig 2. A contraction of two stems in word (78:1:1).


As well as part-of-speech tags, multiple inflection features are assigned to each morphological segment, for example features for person, gender and number. The features for prefixes end in + and are shown in figures 3 to 7 below. In contrast, features for suffixes start with +.

Feature | Name | Segment part-of-speech / description
Al+ | determiner (al) | DET – determiner prefix ("the")
bi+ | preposition (bi) | P – preposition prefix ("by", "with", "in")
ka+ | preposition (ka) | P – preposition prefix ("like" or "thus")
ta+ | preposition (ta) | P – particle of oath prefix used as a preposition ("by Allah")
sa+ | future particle (sa) | P – prefixed particle indicating the future ("they will")
ya+ | vocative particle (yā) | VOC – a vocative prefix usually translated as "O"
ha+ | vocative particle (hā) | VOC – a vocative prefix usually translated as "Lo!"

Fig 3. Features identifying prefixed segments.

Feature | Name | Segment part-of-speech / description
A:INTG+ | interrogative particle (alif) | INTG – prefixed interrogative particle ("is?", "did?", "do?")
A:EQ+ | equalization particle (alif) | EQ – prefixed equalization particle ("whether")

Fig 4. Features identifying the particle alif as a prefix.

Feature | Name | Segment part-of-speech / description
w:CONJ+ | conjunction (wa) | CONJ – conjunction prefix ("and")
w:REM+ | resumption (wa) | REM – resumption prefix ("then" or "so")
w:CIRC+ | circumstantial (wa) | CIRC – circumstantial prefix ("while")
w:SUP+ | supplemental (wa) | SUP – supplemental prefix ("then" or "so")
w:P+ | preposition (wa) | P – particle of oath prefix used as a preposition ("by the pen")
w:COM+ | comitative (wa) | COM – comitative prefix ("with")

Fig 5. Features identifying the particle wāw as a prefix.

Feature | Name | Segment part-of-speech / description
f:REM+ | resumption (fa) | REM – resumption prefix ("then" or "so")
f:CONJ+ | conjunction (fa) | CONJ – conjunction prefix ("and")
f:RSLT+ | result (fa) | RSLT – result prefix ("then")
f:SUP+ | supplemental (fa) | SUP – supplemental prefix ("then" or "so")
f:CAUS+ | cause (fa) | CAUS – cause prefix ("then" or "so")

Fig 6. Features identifying the particle fa as a prefix.

Feature | Name | Segment part-of-speech / description
l:P+ | preposition (lām) | P – the letter lām as a prefixed preposition
l:EMPH+ | emphasis (lām) | P – the letter lām as a prefixed particle used to give emphasis
l:PRP+ | purpose (lām) | P – the letter lām as a prefixed particle used to indicate purpose
l:IMPV+ | imperative (lām) | P – the letter lām as a prefixed particle used to form an imperative

Fig 7. Features identifying the particle lām as a prefix.
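
The naming convention in figures 3 to 7 is regular enough to be read mechanically: a prefix feature ends in +, and where one letter can play several roles the letter and the tag are joined by a colon. The sketch below is illustrative only; the function name parse_prefix_feature and the shape of its return value are assumptions, not part of any corpus tooling.

```python
from typing import Optional, Tuple


def parse_prefix_feature(feature: str) -> Tuple[str, Optional[str]]:
    """Split a prefix feature such as 'w:CONJ+' into (letter, tag).

    Features without a colon, such as 'Al+' or 'bi+', carry no explicit
    tag in their name, so the tag is returned as None.
    """
    if not feature.endswith("+") or feature.startswith("+"):
        raise ValueError(f"not a prefix feature: {feature!r}")
    body = feature[:-1]  # drop the trailing '+'
    letter, sep, tag = body.partition(":")
    return (letter, tag if sep else None)


for name in ["Al+", "w:CONJ+", "f:RSLT+", "l:EMPH+", "A:INTG+"]:
    print(name, parse_prefix_feature(name))
```

Features that start with + are rejected here, since by the convention described above they name suffixes rather than prefixes.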

Roots and Lemmas

In Arabic and other Semitic languages such as Hebrew, similar words may be grouped together according to a root. This is a sequence of typically 3 or 4 consonants (known as radicals) which together form a triliteral or quadriliteral root. From a single root a wide variety of words may be formed, with distinct yet related meanings. For example from the triliteral root kāf tā bā (ك ت ب) the verb "write" may be formed, as well as its derivatives in Arabic including "writing", "book", "author", "library" and "office".

The concept of a lemma is also used to group similar words together at a finer level of granularity than a root. The lemma groups word-forms that differ only by inflectional (as opposed to derivational) morphology, and do not vary in meaning. Unlike the root, the lemma is an actual word selected to represent the group and is typically the same word as used in dictionary headings. A third feature used to group words together is the SP (special) feature. Certain groups of verbs and particles have special rules in Arabic grammar with regard to case endings and syntactic roles.

Both roots and lemmas are used in the Quranic Arabic corpus so that words may be easily grouped together to form an electronic lexicon of the Quran in classical Arabic. For verbs, only the root (not lemma) is indicated, since the remaining morphological features are sufficient to determine the final form of the verb. Nouns, proper nouns and adjectives have both a root and a lemma. Other parts of speech such as particles only have lemmas (not roots) indicated, since these fall outside of the root + template paradigm. The following table lists the morphological features used to group similar words together. These features make use of extended Buckwalter transliteration:

Feature | Name | Description
ROOT: | root | Indicates the (usually triliteral) root of a word, for example ROOT:ktb
LEM: | lemma | Specifies the common lemma for a group of words, for example LEM:kitaAb
SP: | special | Indicates that the word belongs to a special group, for example SP:<in~

Fig 8. Root and lemma features.
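
To illustrate how root and lemma features could be used to group words into a lexicon, here is a minimal sketch. It assumes each word arrives as a list of feature tokens. Apart from ROOT:ktb and LEM:kitaAb, the sample tokens are hypothetical, and the function names are invented for this example.

```python
from collections import defaultdict


def root_and_lemma(features):
    """Return (root, lemma) from a list of feature tokens; either may be None."""
    root = lemma = None
    for token in features:
        if token.startswith("ROOT:"):
            root = token[len("ROOT:"):]
        elif token.startswith("LEM:"):
            lemma = token[len("LEM:"):]
    return root, lemma


def build_lexicon(words):
    """Map each root to the set of lemmas seen under it."""
    lexicon = defaultdict(set)
    for features in words:
        root, lemma = root_and_lemma(features)
        if root is not None and lemma is not None:
            lexicon[root].add(lemma)
    return dict(lexicon)


# Hypothetical input: only ROOT:ktb and LEM:kitaAb come from the table above.
sample = [
    ["POS:N", "ROOT:ktb", "LEM:kitaAb"],
    ["POS:N", "ROOT:ktb", "LEM:kaAtib"],  # hypothetical second lemma
    ["POS:V", "ROOT:ktb"],                # verbs carry a root only
    ["POS:P", "LEM:min"],                 # particles carry a lemma only
]

print(build_lexicon(sample))
```

Words that have only a root or only a lemma are skipped by build_lexicon, which mirrors the point that verbs and particles each carry just one of the two features.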

Person, Gender and Number

In Arabic, words may inflect for person, gender and number. Unlike in English, words inflect not only for singular and plural but also for the dual. For example, there is a distinct word-form to represent "two books". In the Quranic Arabic corpus, the features for person, gender and number are combined using a concatenative notation. For example, 3MS represents third person, masculine, singular. Similarly, 2D represents second person, dual. The concept of gender in Arabic grammar may refer to semantic, morphemic or grammatical gender (see the grammar of gender).

Feature | Arabic Name | Values | Description
person | الاسناد | 1, 2, 3 | first person, second person, third person
gender | الجنس | M, F | masculine, feminine
number | العدد | S, D, P | singular, dual, plural

Fig 9. Features for person, gender and number.
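
The concatenative notation can be decoded with a short pattern in which each of the three parts is optional. This is a sketch under the assumption that the parts always appear in the order person, gender, number, as in 3MS and 2D; the PGN type and the parse_pgn function are names invented for this example.

```python
import re
from typing import NamedTuple, Optional


class PGN(NamedTuple):
    person: Optional[int]   # 1, 2 or 3
    gender: Optional[str]   # "M" or "F"
    number: Optional[str]   # "S", "D" or "P"


# Each part is optional, but the order person-gender-number is fixed.
_PGN_PATTERN = re.compile(r"([123])?([MF])?([SDP])?")


def parse_pgn(tag: str) -> PGN:
    """Decode a tag such as '3MS' or '2D' into its three components."""
    match = _PGN_PATTERN.fullmatch(tag)
    if not tag or match is None:
        raise ValueError(f"not a person/gender/number tag: {tag!r}")
    person, gender, number = match.groups()
    return PGN(int(person) if person else None, gender, number)


print(parse_pgn("3MS"))  # third person, masculine, singular
print(parse_pgn("2D"))   # second person, dual (no gender given)
```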

Verb Features

The morphological features discussed in this section apply to verbs as well as to their derivatives: the active participle, passive participle and the verbal noun. An important verb feature is the aspect. This is closely related to but distinct from the concept of tense. In Quranic Arabic the aspect of a verb is either perfect, imperfect, or imperative. The perfect roughly corresponds to the past tense in English although there is a distinction: the perfect refers to actions which have been completed. In addition to aspect, verbs in Quranic Arabic are conjugated for mood. Imperfect verbs may be found in the indicative, subjunctive and jussive moods. The indicative mood is the normal "default" mood so that if the mood feature is not tagged then the verb should be considered to be in the indicative mood.

The two other features used for verbs and their derivatives are voice (active or passive) and form. The active voice is the default: if voice is not indicated, a verb should be considered to be in the active voice. Verb forms are indicated using Roman numerals, as found in Arabic dictionaries, so that (IV) represents a fourth form verb.

Feature | Arabic Name | Description
PERF | فعل ماض | Perfect verb
IMPF | فعل مضارع | Imperfect verb
IMPV | فعل أمر | Imperative verb

Fig 10. Aspect features.

Feature | Arabic Name | Description
IND | مرفوع | Indicative mood (default)
SUBJ | منصوب | Subjunctive mood
JUS | مجزوم | Jussive mood

Fig 11. Mood features.

Feature | Arabic Name | Description
ACT | مبني للمعلوم | Active voice (default)
PASS | مبني للمجهول | Passive voice

Fig 12. Voice features.

Feature | Description
I | First form (default)
II | Second form
III | Third form
IV | Fourth form
V | Fifth form
VI | Sixth form
VII | Seventh form
VIII | Eighth form
IX | Ninth form
X | Tenth form
XI | Eleventh form
XII | Twelfth form

Fig 13. Verb form features.
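
Because mood, voice and form each have a default, a reader of the tags has to fill in the values that are left unstated. The following sketch shows one way to do that, assuming the features of a verb arrive as a list of tokens and that the form is written in parentheses as in (IV). The function name and the dictionary layout are assumptions made for illustration.

```python
ASPECTS = {"PERF", "IMPF", "IMPV"}
MOODS = {"IND", "SUBJ", "JUS"}
VOICES = {"ACT", "PASS"}
FORMS = ["I", "II", "III", "IV", "V", "VI",
         "VII", "VIII", "IX", "X", "XI", "XII"]


def verb_features(tokens):
    """Resolve a verb's features, applying the documented defaults.

    Untagged mood defaults to IND, untagged voice to ACT and untagged
    form to I. Aspect has no default and stays None if absent.
    """
    features = {"aspect": None, "mood": "IND", "voice": "ACT", "form": "I"}
    for token in tokens:
        if token in ASPECTS:
            features["aspect"] = token
        elif token in MOODS:
            features["mood"] = token
        elif token in VOICES:
            features["voice"] = token
        elif (token.startswith("(") and token.endswith(")")
              and token[1:-1] in FORMS):
            features["form"] = token[1:-1]  # strip the parentheses
    return features


# Hypothetical tag lists, not taken from the corpus.
print(verb_features(["PERF", "(IV)", "PASS"]))
print(verb_features(["IMPF", "JUS"]))
```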

Derived Nouns

In Quranic Arabic, the active participle, passive participle and verbal noun are three types of nominals which are derived directly from verbs. In the Quranic Arabic corpus these are tagged with the noun or adjective part-of-speech tag and include one of three possible derivation features. For example, active participles are tagged in the corpus as POS:N ACT PCPL. The verbal features above (aspect, mood, voice and form) also apply to derived nouns and are used to indicate the morphology of the original verb that the noun was derived from. Figure 14 below shows the derivation features used to indicate the type of a derived noun:

Feature | Arabic Name | Description
ACT PCPL | اسم فاعل | Active participle
PASS PCPL | اسم مفعول | Passive participle
VN | مصدر | Verbal noun

Fig 14. Derivation features.
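
One detail worth noting is that ACT and PASS on their own are voice features, while ACT PCPL and PASS PCPL are derivation features. A reader therefore has to look at the token before PCPL to tell which participle is meant. The sketch below assumes features are separated by whitespace, as in the example POS:N ACT PCPL; the function name is an assumption, and all sample strings other than that example are hypothetical.

```python
from typing import Optional


def derivation_type(feature_string: str) -> Optional[str]:
    """Return the kind of derived noun named in a feature string, if any."""
    tokens = feature_string.split()
    for i, token in enumerate(tokens):
        if token == "VN":
            return "verbal noun"
        if token == "PCPL" and i > 0:
            # The token before PCPL says which participle this is.
            if tokens[i - 1] == "ACT":
                return "active participle"
            if tokens[i - 1] == "PASS":
                return "passive participle"
    return None


print(derivation_type("POS:N ACT PCPL"))   # example from the text above
print(derivation_type("POS:N VN"))         # hypothetical
print(derivation_type("POS:V PERF PASS"))  # hypothetical: voice only
```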

Nominal Features

The feature Al+ is used to denote the prefixed determiner al ("the") attached to nominals (nouns, proper nouns and adjectives). In Arabic there is no indefinite article ("a"/"an" in English). Instead tanwīn is used and diacritics are attached to the end of a word to mark it as indefinite. The features DEF and INDEF are used to indicate the state of a noun as definite or as indefinite respectively (see figure 15 below). Nominals may be found in one of three grammatical cases: the nominative case, the accusative case, and the genitive case (see figure 16):

Feature | Arabic Name | Description
DEF | معرفة | Definite state
INDEF | نكرة | Indefinite state

Fig 15. State features.

Feature | Arabic Name | Description
NOM | مرفوع | Nominative case
ACC | منصوب | Accusative case
GEN | مجرور | Genitive case

Fig 16. Case features.
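
Since a nominal has at most one case and at most one state, a simple consistency check can be layered over the tags in figures 15 and 16. This is a minimal sketch that assumes features arrive as a list of tokens; the function name and the sample tag lists are hypothetical.

```python
CASES = {"NOM": "nominative", "ACC": "accusative", "GEN": "genitive"}
STATES = {"DEF": "definite", "INDEF": "indefinite"}


def nominal_features(tokens):
    """Extract case and state from a nominal's tags, rejecting conflicts."""
    cases = [t for t in tokens if t in CASES]
    states = [t for t in tokens if t in STATES]
    if len(cases) > 1:
        raise ValueError(f"conflicting case features: {cases}")
    if len(states) > 1:
        raise ValueError(f"conflicting state features: {states}")
    return {
        "case": CASES[cases[0]] if cases else None,
        "state": STATES[states[0]] if states else None,
    }


# Hypothetical tag lists, not taken from the corpus.
print(nominal_features(["POS:N", "INDEF", "NOM"]))
print(nominal_features(["POS:N", "GEN"]))
```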


In the Quranic Arabic Corpus, three features are used to indicate suffixes. These are attached pronouns, the vocative suffix and the nūn of emphasis. The vocative suffix is denoted by the morphological feature +VOC and is used only with the word allāh to produce the vocative word-form allāhumma. The morphological feature +n:EMPH is used to denote the emphatic usage of nūn as an attached suffix.

Attached pronoun suffixes are identified using the PRON: compound morphological feature. Pronouns attached to nouns are possessive pronouns, and when attached to verbs they are either subject or object pronouns. An attached pronoun may inflect for person, gender and number. A concatenative notation is used with the PRON: tag. For example PRON:3MS represents a third person masculine singular suffixed pronoun. Similarly PRON:2D represents a second person dual suffixed pronoun. See figure 9 above for person, gender and number features.
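
The three kinds of suffix feature described above can be told apart by their spelling alone. The following sketch is one possible reading of them, assuming the PRON: tag is followed by person, gender and number in that order; the function name describe_suffix and the wording of the descriptions it returns are assumptions.

```python
import re

PERSON = {"1": "first person", "2": "second person", "3": "third person"}
GENDER = {"M": "masculine", "F": "feminine"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}

_PRON_PATTERN = re.compile(r"PRON:([123])?([MF])?([SDP])?")


def describe_suffix(feature: str) -> str:
    """Give a plain-English description of a suffix feature."""
    if feature == "+VOC":
        return "vocative suffix"
    if feature == "+n:EMPH":
        return "emphatic nūn"
    match = _PRON_PATTERN.fullmatch(feature)
    if match and any(match.groups()):
        person, gender, number = match.groups()
        parts = [table[value]
                 for table, value in ((PERSON, person),
                                      (GENDER, gender),
                                      (NUMBER, number))
                 if value]
        return "attached pronoun (" + ", ".join(parts) + ")"
    raise ValueError(f"not a suffix feature: {feature!r}")


print(describe_suffix("PRON:3MS"))
print(describe_suffix("PRON:2D"))
print(describe_suffix("+VOC"))
```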
