Traditional Arabic grammar defines a detailed part-of-speech hierarchy which applies to both words and morphological segments. Fundamentally, a word may be classified as a nominal ism (اسم), a verb fiʿl (فعل) or a particle ḥarf (حرف). The set of nominals includes nouns, pronouns, adjectives and adverbs. The particles include prepositions, conjunctions and interrogatives, as well as many others. Morphological annotation in the Quranic Arabic corpus divides words into multiple segments. Each segment is assigned a part-of-speech tag. These tags are detailed in the following sections. In addition to its part-of-speech tag, each segment is annotated with a set of morphological features.
The first of the three basic parts-of-speech is the nominal, ism (literally "name" in Arabic). The tags for nominals in the Quranic corpus are shown in Figure 1 below:
|PN||اسم علم||Proper noun|
|IMPN||اسم فعل أمر||Imperative verbal noun|
|DEM||اسم اشارة||Demonstrative pronoun|
|REL||اسم موصول||Relative pronoun|
|Adverbs||T||ظرف زمان||Time adverb|
|LOC||ظرف مكان||Location adverb|
Fig 1. Part-of-speech tagset for nominals.
Proper nouns are annotated using the PN tag in the Quranic corpus. Arabic orthography does not distinguish a proper noun from a common noun, whereas English writes proper nouns with the first letter capitalized. Proper nouns in Arabic are known by convention and through the fact that they have the grammatical property of being definite even though they do not carry the al- determiner prefix. The set of proper nouns includes personal names such as "the prophet ibrāhīm". In Arabic, proper nouns are known as اسم علم.
Three types of pronoun are identified in the corpus using the tags PRON, DEM and REL. The personal pronouns (PRON) are those which are found in English ("I", "we", "you", "them", "us") together with pronouns found only in Quranic Arabic, such as those inflected for the dual or feminine (for example antumā, "you two"). When segmenting words for morphological annotation, the PRON tag is also used to identify attached pronoun segments, which are suffixes that appear at the end of words. In the case of nouns these are possessive pronouns. For example "his book" is fused into a single Arabic word-form (kitābuhu). Suffixed pronouns attached to verbs will be either subject pronouns or object pronouns.
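The segmentation described above can be sketched as a small data structure. This is only an illustration, not the corpus's own file format: the class names `Segment` and `Word`, the feature keys, and the exact split point of kitābuhu are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    form: str   # transliterated text of the segment
    tag: str    # part-of-speech tag, e.g. "N" or "PRON"
    # Feature names below are hypothetical, chosen for illustration only.
    features: dict = field(default_factory=dict)


@dataclass
class Word:
    segments: list

    def form(self) -> str:
        """Rejoin the segments into the full word-form."""
        return "".join(segment.form for segment in self.segments)

    def tags(self) -> list:
        """Part-of-speech tags in segment order."""
        return [segment.tag for segment in self.segments]


# "his book": a noun stem followed by an attached possessive pronoun suffix.
kitabuhu = Word([
    Segment("kitābu", "N"),
    Segment("hu", "PRON", {"attached": True, "role": "possessive"}),
])

print(kitabuhu.form(), kitabuhu.tags())
```

Keeping the tag on the segment rather than on the word is what allows a single word-form to carry both a noun tag and a pronoun tag.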
The DEM tag is used to identify demonstrative pronouns ("this", "that", "these", "those"). In Quranic Arabic, these are termed ism ishāra (literally, "the names of pointing"). The REL tag is used to identify relative pronouns which connect a relative clause to its main clause (for example "the book that you bought"). In Arabic grammar, relative pronouns are known as ism mawṣūl ("the names of connection").
Adjectives (صفة) are closely related to nouns in Quranic Arabic, and it is sometimes not straightforward to distinguish between the two as both carry the same morphological features. For example both nouns and adjectives accept the determiner al- ("the"). The rule followed in the Quranic corpus is to mark a word as an adjective if it is considered to be one according to its syntactic role in the sentence. A nominal tagged as an adjective will directly follow the noun that it describes.
The second of the three basic parts-of-speech is the verb. All verbs in the Quranic corpus are tagged using the V (verb) tag. Each verb is also annotated using multiple morphological features to specify conjugation. In Quranic Arabic, verbs can be conjugated according to three different grammatical aspects (perfect, imperfect and imperative) as well as moods of the imperfect (indicative, subjunctive and jussive). Nouns derived from verbs – such as active and passive participles – are tagged as N (noun) and are annotated using the "derivation" feature.
Fig 2. Verb part-of-speech tag.
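The relationship between aspect and mood can be written as a small validation rule. The sketch below assumes a hypothetical `VerbFeatures` class and plain English feature values. The corpus's actual feature names are not given here, so treat these as placeholders.

```python
from dataclasses import dataclass
from typing import Optional

ASPECTS = {"perfect", "imperfect", "imperative"}
MOODS = {"indicative", "subjunctive", "jussive"}


@dataclass(frozen=True)
class VerbFeatures:
    """Hypothetical container for the conjugation features of a V segment."""
    aspect: str
    mood: Optional[str] = None

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")
        if self.mood is not None:
            if self.mood not in MOODS:
                raise ValueError(f"unknown mood: {self.mood}")
            # Mood is described as a property of the imperfect only.
            if self.aspect != "imperfect":
                raise ValueError("mood applies only to imperfect verbs")


jussive_verb = VerbFeatures(aspect="imperfect", mood="jussive")
perfect_verb = VerbFeatures(aspect="perfect")
```

Under these assumptions, constructing `VerbFeatures("perfect", "jussive")` raises a `ValueError`, which reflects the restriction of moods to the imperfect.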
The third of the three basic parts-of-speech is the particle. Particles include prepositions, lām prefixes, conjunctions and others. Interrogative particles are tagged using INTG, which includes the independent particle hal and the prefixed interrogative alif. Negative particles in the Quranic Arabic corpus are tagged as NEG. Certain negative particles may place a following imperfect verb into the subjunctive or jussive mood. The VOC tag is used to identify vocative particles and prefixes such as in yā-rabbi. In English this would be roughly translated using the archaic vocative particle "O", as in "O my Lord". Part-of-speech tags for particles and Quranic initials are shown in Figure 3 below.
|lām Prefixes||EMPH||لام التوكيد||Emphatic lām prefix|
|IMPV||لام الامر||Imperative lām prefix|
|PRP||لام التعليل||Purpose lām prefix|
|Conjunctions||CONJ||حرف عطف||Coordinating conjunction|
|SUB||حرف مصدري||Subordinating conjunction|
|Particles||ACC||حرف نصب||Accusative particle|
|AMD||حرف استدراك||Amendment particle|
|ANS||حرف جواب||Answer particle|
|AVR||حرف ردع||Aversion particle|
|CAUS||حرف سببية||Particle of cause|
|CERT||حرف تحقيق||Particle of certainty|
|CIRC||حرف حال||Circumstantial particle|
|COM||واو المعية||Comitative particle|
|COND||حرف شرط||Conditional particle|
|EQ||حرف تسوية||Equalization particle|
|EXH||حرف تحضيض||Exhortation particle|
|EXL||حرف تفصيل||Explanation particle|
|EXP||أداة استثناء||Exceptive particle|
|FUT||حرف استقبال||Future particle|
|INC||حرف ابتداء||Inceptive particle|
|INT||حرف تفسير||Particle of interpretation|
|INTG||حرف استفهام||Interrogative particle|
|NEG||حرف نفي||Negative particle|
|PREV||حرف كاف||Preventive particle|
|PRO||حرف نهي||Prohibition particle|
|REM||حرف استئنافية||Resumption particle|
|RES||أداة حصر||Restriction particle|
|RET||حرف اضراب||Retraction particle|
|RSLT||حرف واقع في جواب الشرط||Result particle|
|SUP||حرف زائد||Supplemental particle|
|SUR||حرف فجاءة||Surprise particle|
|VOC||حرف نداء||Vocative particle|
|Disconnected Letters||INL||حروف مقطعة||Quranic initials|
Fig 3. Part-of-speech tagset for particles and the Quranic initials.
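As a rough illustration of how the tags roll up into the three basic parts-of-speech, the sketch below maps each tag to its top-level category. The function name `basic_category` and the category labels are assumptions. The nominal set lists only the tags named in this text, so it may be incomplete, and INL is kept apart because the initials are deliberately given their own tag.

```python
# Only tags named in this document are listed; the sets may be incomplete.
NOMINAL_TAGS = {"N", "PN", "IMPN", "PRON", "DEM", "REL", "T", "LOC"}
VERB_TAGS = {"V"}
PARTICLE_TAGS = {
    "P",                                   # prepositions
    "EMPH", "IMPV", "PRP",                 # lām prefixes
    "CONJ", "SUB",                         # conjunctions
    "ACC", "AMD", "ANS", "AVR", "CAUS", "CERT", "CIRC", "COM", "COND",
    "EQ", "EXH", "EXL", "EXP", "FUT", "INC", "INT", "INTG", "NEG",
    "PREV", "PRO", "REM", "RES", "RET", "RSLT", "SUP", "SUR", "VOC",
}
INITIALS_TAGS = {"INL"}


def basic_category(tag: str) -> str:
    """Return the top-level category for a part-of-speech tag."""
    if tag in NOMINAL_TAGS:
        return "nominal"
    if tag in VERB_TAGS:
        return "verb"
    if tag in PARTICLE_TAGS:
        return "particle"
    if tag in INITIALS_TAGS:
        return "initials"
    raise KeyError(f"unknown tag: {tag}")


for tag in ("PN", "V", "VOC", "INL"):
    print(tag, basic_category(tag))
```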
In the case of attached prefixes, the prepositions are straightforward. Certain single-letter prepositions may be fused to a word as a prefix. These include bi, ka, ta, wa, and one of the senses of lām. The prefixed prepositions ta and wa occur in Quranic Arabic but are not typically found in the modern standard form of the language. They are used in the Quran as particles of oath, for example ta-allah, "by Allah". In addition, independent words may be prepositions. In the Quranic Arabic corpus, prepositions are identified by the P tag. A word is tagged as a preposition if and only if it is considered to be a genitive preposition ḥarf jar (حرف جر).
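A minimal sketch of spotting a prefixed preposition in a hyphenated transliteration such as ta-allah is given below. The function name `split_prefix`, the returned dictionary keys, and the spelling "li" for the lām prefix are assumptions. A match marks only a candidate, since wa and lām have other senses that take different tags.

```python
# Single-letter prepositions that may be fused to a word as a prefix.
# "li" stands in for the lām prefix (an assumed transliteration).
PREFIXED_PREPOSITIONS = {"bi", "ka", "ta", "wa", "li"}

# Prefixes described as particles of oath in the Quran.
OATH_PREFIXES = {"ta", "wa"}


def split_prefix(transliteration: str):
    """Split 'prefix-stem' and report a candidate P segment, or return None."""
    prefix, separator, stem = transliteration.partition("-")
    if not separator or prefix not in PREFIXED_PREPOSITIONS:
        return None
    return {
        "prefix": prefix,
        "stem": stem,
        "tag": "P",  # candidate only; other senses of the prefix are possible
        "oath_candidate": prefix in OATH_PREFIXES,
    }


print(split_prefix("ta-allah"))
print(split_prefix("yā-rabbi"))  # vocative prefix, not a preposition
```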
Quranic Initials (Disconnected Letters)
The Quranic initials (or muqattaʿāt in Arabic) are sequences of mysterious letters, such as alif lām mīm (ا ل م), which make up the first verses of several chapters in the Holy Quran but do not combine to form words. These are also known as the disconnected letters (حروف مقطعة). There are numerous suggestions as to the meaning of these letters, and so they are given their own part-of-speech tag (INL). This avoids making any assumption about their meaning, as would happen if they were tagged as proper nouns or as abbreviations (see http://en.wikipedia.org/wiki/muqatta'at).
In the Quran, 30 verses in 29 chapters begin with initials. All initials occur in the first verse of a chapter, except for those in chapter 42, which has a pair of initials at verses (42:1) and (42:2). The full meaning behind the Quranic initials is not yet clearly understood, and there are differing opinions as to their interpretation. One observation is that the initials are almost always followed by a description of Quranic revelation itself. The only occurrence of double initials, at the first two verses of chapter 42, is followed by two mentions of revelation, at verse (42:3).
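The counts above can be checked with a short sketch. The chapter list is the commonly cited set of chapters that open with initials. It is not taken from this text, so treat it as an assumption, and the variable names are illustrative.

```python
# Chapters commonly listed as opening with initials (assumed, not from this text).
INITIAL_CHAPTERS = [
    2, 3, 7, 10, 11, 12, 13, 14, 15, 19, 20, 26, 27, 28, 29,
    30, 31, 32, 36, 38, 40, 41, 42, 43, 44, 45, 46, 50, 68,
]

# Every such chapter has initials at verse 1; chapter 42 also has them at verse 2.
initial_verses = [(chapter, 1) for chapter in INITIAL_CHAPTERS] + [(42, 2)]

chapter_count = len(set(INITIAL_CHAPTERS))
verse_count = len(initial_verses)
print(chapter_count, "chapters,", verse_count, "verses")
```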