Traditional Arabic grammar defines a detailed part-of-speech hierarchy which applies to both words and morphological segments. Fundamentally, a word may be classified as a nominal ism (اسم), a verb fiʿl (فعل) or a particle ḥarf (حرف). The set of nominals includes nouns, pronouns, adjectives and adverbs. The particles include prepositions, conjunctions and interrogatives, as well as many others. Morphological annotation in the Quranic Arabic corpus divides words into segments. Each segment is assigned a part-of-speech tag. These tags are detailed in the following sections. In addition to its part-of-speech tag, each segment is annotated with a set of morphological features.
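As a rough illustration of the annotation model described above, a word can be thought of as an ordered list of segments, each carrying a tag and a set of features. The sketch below is not the corpus's own data format: the class names `Segment` and `Word`, the `features` dictionary and the `case` feature key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# The three basic classes named in the text, keyed by their Arabic terms.
BASIC_CLASSES = {"ism": "nominal", "fiʿl": "verb", "ḥarf": "particle"}


@dataclass
class Segment:
    form: str                                     # transliterated segment text
    pos: str                                      # part-of-speech tag
    features: dict = field(default_factory=dict)  # hypothetical feature names


@dataclass
class Word:
    segments: list

    def tags(self):
        """Return the part-of-speech tag of each segment, in order."""
        return [segment.pos for segment in self.segments]


# Illustrative word: bi-smi ("in the name of"), a prefixed preposition
# followed by a noun. The feature key and value are assumptions.
bismi = Word([
    Segment("bi", "P"),
    Segment("smi", "N", {"case": "genitive"}),
])

print(bismi.tags())  # ['P', 'N']
```

Keeping the tag on the segment rather than on the word is what allows a single written word to carry several parts of speech at once.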


The first of the three basic parts of speech is the nominal, ism (literally "name" in Arabic). The tags for nominals in the Quranic corpus are shown in Figure 1 below:

Category | Tag | Arabic Name | Description
Nouns | N | اسم | Noun
Nouns | PN | اسم علم | Proper noun
Derived nominals | ADJ | صفة | Adjective
Derived nominals | IMPN | اسم فعل أمر | Imperative verbal noun
Pronouns | PRON | ضمير | Personal pronoun
Pronouns | DEM | اسم اشارة | Demonstrative pronoun
Pronouns | REL | اسم موصول | Relative pronoun
Adverbs | T | ظرف زمان | Time adverb
Adverbs | LOC | ظرف مكان | Location adverb

Fig 1. Part-of-speech tagset for nominals.

Proper Nouns

Proper nouns are annotated using the PN tag in the Quranic corpus. Arabic orthography does not distinguish a proper noun from a common noun, whereas English writes proper nouns with a capitalized first letter. Proper nouns in Arabic are known by convention and through the fact that they are grammatically definite even though they do not carry the al- determiner prefix. The set of proper nouns includes personal names such as "the prophet ibrāhīm". In Arabic, proper nouns are known as اسم علم.


Three types of pronoun are identified in the corpus using the tags PRON, DEM and REL. The personal pronouns (PRON) include those which have English counterparts ("I", "we", "you", "them", "us") together with pronouns that English lacks, such as those inflected for the dual or feminine (for example antumā, "you two"). When segmenting words for morphological annotation, the PRON tag is also used to identify attached pronoun segments, which are suffixes that appear at the end of words. When attached to nouns, these are possessive pronouns. For example, "his book" is fused into a single Arabic word-form (kitābuhu). Suffixed pronouns attached to verbs are either subject pronouns or object pronouns.
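The rule for attached pronouns can be sketched in a few lines. This is only an illustration of the paragraph above: the `(form, tag)` pair representation, the function names, and the split of kitābuhu into kitābu + hu are assumptions made for the example, not the corpus's file format.

```python
# Hypothetical segmentation of kitābuhu ("his book") as (form, tag) pairs.
KITABUHU = [("kitābu", "N"), ("hu", "PRON")]


def attached_pronoun_role(host_tag):
    """Role of a suffixed pronoun, given the tag of the segment it attaches to."""
    if host_tag == "N":
        return "possessive"
    if host_tag == "V":
        return "subject or object"
    # Other hosts are not covered by the description above.
    return "unspecified"


def describe_suffix_pronouns(segments):
    """Pair each suffixed PRON segment with the role implied by its host."""
    roles = []
    for host, suffix in zip(segments, segments[1:]):
        if suffix[1] == "PRON":
            roles.append((suffix[0], attached_pronoun_role(host[1])))
    return roles


print(describe_suffix_pronouns(KITABUHU))  # [('hu', 'possessive')]
```

For a verb host, deciding between subject and object needs more context than the tag alone, which is why the sketch returns both possibilities.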

The DEM tag is used to identify demonstrative pronouns ("this", "that", "these", "those"). In Quranic Arabic, these are termed ism ishāra (literally, "the names of pointing"). The REL tag is used to identify relative pronouns which connect a relative clause to its main clause (for example "the book that you bought"). In Arabic grammar, relative pronouns are known as ism mawṣūl ("the names of connection").


Adjectives (صفة) are closely related to nouns in Quranic Arabic, and it is sometimes not straightforward to distinguish between the two as both carry the same morphological features. For example both nouns and adjectives accept the determiner al- ("the"). The rule followed in the Quranic corpus is to mark a word as an adjective if it is considered to be one according to its syntactic role in the sentence. A nominal tagged as an adjective will directly follow the noun that it describes.


The second of the three basic parts-of-speech is the verb. All verbs in the Quranic corpus are tagged using the V (verb) tag. Each verb is also annotated using multiple morphological features to specify conjugation. In Quranic Arabic, verbs can be conjugated according to three different grammatical aspects (perfect, imperfect and imperative) as well as moods of the imperfect (indicative, subjunctive and jussive). Nouns derived from verbs – such as active and passive participles – are tagged as N (noun) and are annotated using the "derivation" feature.

Category | Tag | Arabic Name | Description
Verbs | V | فعل | Verb

Fig 2. Verb part-of-speech tag.
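The constraint that mood applies only to the imperfect can be expressed as a small check. The sketch below assumes feature values spelled as plain English strings and a hypothetical helper `is_valid_verb`; the corpus may encode these features differently.

```python
ASPECTS = {"perfect", "imperfect", "imperative"}
MOODS = {"indicative", "subjunctive", "jussive"}


def is_valid_verb(aspect, mood=None):
    """Check an (aspect, mood) pair against the description above."""
    if aspect not in ASPECTS:
        return False
    if mood is None:
        return True
    # Moods are described only for the imperfect.
    return aspect == "imperfect" and mood in MOODS


# A participle is tagged N, not V, and carries a "derivation" feature.
# The dictionary layout and the feature value are assumptions.
active_participle = {"pos": "N", "derivation": "active participle"}

print(is_valid_verb("imperfect", "jussive"))  # True
print(is_valid_verb("perfect", "jussive"))    # False
```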


The third of the three basic parts-of-speech is the particle. Particles include prepositions, lām prefixes, conjunctions and others. Interrogative particles are tagged using INTG, which includes the independent particle hal and the prefixed interrogative alif. Negative particles in the Quranic Arabic corpus are tagged as NEG. Certain negative particles may place a following imperfect verb into the subjunctive or jussive mood. The VOC tag is used to identify vocative particles and prefixes such as in yā-rabbi. In English this would be roughly translated using the archaic vocative particle "O", as in "O my Lord". Part-of-speech tags for particles and Quranic initials are shown in Figure 3 below.

Category | Tag | Arabic Name | Description
Prepositions | P | حرف جر | Preposition
lām Prefixes | EMPH | لام التوكيد | Emphatic lām prefix
lām Prefixes | IMPV | لام الامر | Imperative lām prefix
lām Prefixes | PRP | لام التعليل | Purpose lām prefix
Conjunctions | CONJ | حرف عطف | Coordinating conjunction
Conjunctions | SUB | حرف مصدري | Subordinating conjunction
Particles | ACC | حرف نصب | Accusative particle
Particles | AMD | حرف استدراك | Amendment particle
Particles | ANS | حرف جواب | Answer particle
Particles | AVR | حرف ردع | Aversion particle
Particles | CAUS | حرف سببية | Particle of cause
Particles | CERT | حرف تحقيق | Particle of certainty
Particles | CIRC | حرف حال | Circumstantial particle
Particles | COM | واو المعية | Comitative particle
Particles | COND | حرف شرط | Conditional particle
Particles | EQ | حرف تسوية | Equalization particle
Particles | EXH | حرف تحضيض | Exhortation particle
Particles | EXL | حرف تفصيل | Explanation particle
Particles | EXP | أداة استثناء | Exceptive particle
Particles | FUT | حرف استقبال | Future particle
Particles | INC | حرف ابتداء | Inceptive particle
Particles | INT | حرف تفسير | Particle of interpretation
Particles | INTG | حرف استفهام | Interrogative particle
Particles | NEG | حرف نفي | Negative particle
Particles | PREV | حرف كاف | Preventive particle
Particles | PRO | حرف نهي | Prohibition particle
Particles | REM | حرف استئنافية | Resumption particle
Particles | RES | أداة حصر | Restriction particle
Particles | RET | حرف اضراب | Retraction particle
Particles | RSLT | حرف واقع في جواب الشرط | Result particle
Particles | SUP | حرف زائد | Supplemental particle
Particles | SUR | حرف فجاءة | Surprise particle
Particles | VOC | حرف نداء | Vocative particle
Disconnected Letters | INL | حروف مقطعة | Quranic initials

Fig 3. Part-of-speech tagset for particles and the Quranic initials.
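Taken together, Figures 1 to 3 partition the tagset into the three basic parts of speech plus the Quranic initials. A minimal lookup along those lines might look as follows; the function name `basic_class` and the returned labels are hypothetical, while the tags themselves are copied from the figures.

```python
NOMINAL_TAGS = {"N", "PN", "ADJ", "IMPN", "PRON", "DEM", "REL", "T", "LOC"}
VERB_TAGS = {"V"}
PARTICLE_TAGS = {
    "P", "EMPH", "IMPV", "PRP", "CONJ", "SUB",
    "ACC", "AMD", "ANS", "AVR", "CAUS", "CERT", "CIRC", "COM", "COND",
    "EQ", "EXH", "EXL", "EXP", "FUT", "INC", "INT", "INTG", "NEG",
    "PREV", "PRO", "REM", "RES", "RET", "RSLT", "SUP", "SUR", "VOC",
}
INITIALS_TAGS = {"INL"}


def basic_class(tag):
    """Map a part-of-speech tag to its top-level class."""
    if tag in NOMINAL_TAGS:
        return "nominal"
    if tag in VERB_TAGS:
        return "verb"
    if tag in PARTICLE_TAGS:
        return "particle"
    if tag in INITIALS_TAGS:
        return "initials"
    raise ValueError(f"unknown tag: {tag}")


print(basic_class("VOC"))  # particle
print(basic_class("DEM"))  # nominal
```

Raising an error for unknown tags, rather than returning a default, makes a misspelled tag visible immediately.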


In the case of attached prefixes, the prepositions are straightforward. Certain single-letter prepositions may be fused to a word as a prefix. These include bi, ka, ta, wa, and one of the senses of lām. The prefixed prepositions ta and wa occur in Quranic Arabic but are not typically found in the modern standard form of the language. They are used in the Quran as particles of oath, for example ta-allah, "by Allah". In addition, independent words may be prepositions. In the Quranic Arabic corpus, prepositions are identified by the P tag. A word is tagged as a preposition if and only if it is considered to be a genitive preposition, ḥarf jarr (حرف جر).
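The prefix facts in this paragraph can be collected into a few small sets. The sketch below is illustrative only: the set and function names are hypothetical, and "li" is an assumed transliteration standing in for the prepositional sense of lām. The set of lām tags combines the P tag with the three lām prefix tags listed in Figure 3.

```python
# Single-letter prepositions that may be fused to a word as a prefix.
PREFIXED_PREPOSITIONS = {"bi", "ka", "ta", "wa", "li"}

# Prefixed prepositions used in the Quran as particles of oath.
OATH_PREPOSITIONS = {"ta", "wa"}

# Tags that a prefixed lām may receive, depending on its sense.
LAM_TAGS = {"P", "EMPH", "IMPV", "PRP"}


def may_be_preposition(prefix):
    """True if the prefix is one of the fused prepositions listed above."""
    return prefix in PREFIXED_PREPOSITIONS


def may_mark_oath(prefix):
    """True if the prefix can serve as a particle of oath."""
    return prefix in OATH_PREPOSITIONS


print(may_be_preposition("bi"))  # True
print(may_mark_oath("ta"))       # True
print(may_mark_oath("bi"))       # False
```

Because lām has several senses, membership in `PREFIXED_PREPOSITIONS` only says that a P reading is possible; choosing the actual tag depends on context.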

Quranic Initials (Disconnected Letters)

The Quranic initials (muqaṭṭaʿāt in Arabic) are sequences of mysterious letters, such as alif lām mīm (ا ل م), which make up the first verses of several chapters in the Holy Quran but do not combine to form words. These are also known as the disconnected letters (حروف مقطعة). There are numerous suggestions as to the meaning of these letters, so they are given their own part-of-speech tag (INL). This avoids the assumptions about their meaning that tagging them as proper nouns or as abbreviations would imply (see http://en.wikipedia.org/wiki/muqatta'at).

In the Quran, 30 verses in 29 chapters begin with initials. The initials always occur in the first verse, except in chapter 42, which has a pair of initials at verses (42:1) and (42:2). The meaning of the Quranic initials is not yet clearly understood, and opinions differ as to their interpretation. One observation is that the initials are almost always followed by a description of Quranic revelation itself. The only occurrence of double initials, at the first two verses of chapter 42, is followed by two mentions of revelation at verse (42:3).
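The relationship between the two counts can be checked with a short calculation. The chapter numbers below are not given in the text above; they are the commonly cited list of chapters that open with initials and should be read as an assumption of this sketch. The variable names are hypothetical.

```python
# Commonly cited chapters that open with initials (assumed, not from the text).
CHAPTERS_WITH_INITIALS = [
    2, 3, 7, 10, 11, 12, 13, 14, 15, 19, 20, 26, 27, 28, 29,
    30, 31, 32, 36, 38, 40, 41, 42, 43, 44, 45, 46, 50, 68,
]

# Every listed chapter has initials in its first verse ...
verses_with_initials = [(chapter, 1) for chapter in CHAPTERS_WITH_INITIALS]
# ... and chapter 42 has a second set of initials at verse 2.
verses_with_initials.append((42, 2))

print(len(CHAPTERS_WITH_INITIALS))  # 29
print(len(verses_with_initials))    # 30
```

The extra verse contributed by chapter 42 is what makes the verse count one higher than the chapter count.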

