This example generates a frequency table showing how many times each character occurs in the Quranic text. An analysis table is used to tabulate and group the results.
Java Example
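```java
// Package paths below are assumed from the JQuranTree distribution;
// check them against the library's Javadoc before compiling.
import org.jqurantree.analysis.AnalysisTable;
import org.jqurantree.analysis.SortOrder;
import org.jqurantree.arabic.ArabicCharacter;
import org.jqurantree.orthography.Document;
import org.jqurantree.orthography.Token;

public class CharacterFrequencyExample {

    public static void main(String[] args) {

        // Create a new analysis table with a single "Character" column.
        AnalysisTable table = new AnalysisTable("Character");

        // Add the type of each character in every token of the text.
        for (Token token : Document.getTokens()) {
            for (ArabicCharacter character : token) {
                // Skip characters that have no character type.
                if (character.getType() != null) {
                    table.add(character.getType());
                }
            }
        }

        // Group identical characters, sort by count and display the results.
        AnalysisTable groupTable = table.group("Character");
        groupTable.sort("Count", SortOrder.Descending);
        System.out.println(groupTable);
    }
}
```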
Program Output
```
Character                        Count
---------                        -----
Alif                             62381
Lam                              38102
Noon                             27268
Meem                             26735
Waw                              25676
Ya                               19143
Ha                               14850
Ra                               12403
Ba                               11491
Ta                               10520
Kaf                              10497
Ain                              9405
Fa                               8747
Qaf                              7034
AlifMaksura                      6605
Seen                             6010
Dal                              5991
Thal                             4932
HHa                              4140
SmallHighRoundedZero             3988
Jeem                             3317
Hamza                            3059
Kha                              2497
TaMarbuta                        2344
Sheen                            2124
Sad                              2074
DDad                             1686
Zain                             1599
Tha                              1414
TTa                              1273
SmallWaw                         1257
Ghain                            1221
SmallYa                          995
DTha                             853
SmallHighMeemIsolatedForm        510
Tatweel                          495
SmallLowMeem                     99
SmallHighUprightRectangularZero  66
SmallHighSeen                    2
SmallLowSeen                     1
SmallHighNoon                    1
RoundedHighStopWithFilledCentre  1
EmptyCentreLowStop               1
EmptyCentreHighStop              1
```
Discussion
The results list each character in the Quranic text together with its frequency count, ordered from most to least frequent. The character types are defined by the orthography model, and the text itself is loaded from the Tanzil XML data. The table shows 62381 occurrences of alif, a total that includes occurrences with hamza and alif khanjarīya. Within the orthography model, the four most frequent letters are alif, lām, nūn and mīm, in that order.
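The tabulate, group and sort steps that the analysis table performs can be approximated with the Java standard library alone. The sketch below is not JQuranTree code: it assumes the input is a plain Java `String` rather than the Tanzil-backed `Document`, counts Unicode code points instead of the orthography model's character types, and uses a hypothetical class name, `CharacterFrequencySketch`. Whitespace is skipped, loosely mirroring the original example's check for a null character type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CharacterFrequencySketch {

    // Counts each non-whitespace code point in the text, then returns the
    // entries sorted by descending count. Ties keep ascending code-point
    // order because the TreeMap iterates in key order and List.sort is stable.
    public static List<Map.Entry<Integer, Integer>> frequencies(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> counts.merge(cp, 1, Integer::sum));

        List<Map.Entry<Integer, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: "bism allah" in Arabic script, written
        // as Unicode escapes so the file compiles under any source encoding.
        String sample = "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647";

        System.out.printf("%-10s %s%n", "Character", "Count");
        for (Map.Entry<Integer, Integer> row : frequencies(sample)) {
            String character = new String(Character.toChars(row.getKey()));
            System.out.printf("%-10s %d%n", character, row.getValue());
        }
    }
}
```

Because this sketch counts raw code points, any diacritic that Unicode encodes as a separate combining mark (such as fatha) would appear as its own row, whereas in the program output above diacritics do not appear as separate rows.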