
Java API - Character Frequency Example


This example generates a frequency table showing the number of occurrences of each character type within the Quranic text. An analysis table is used to tabulate and group the results.
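Tabulating and grouping comes down to counting how often each label occurs, then sorting the counts in descending order. The sketch below shows that idea with plain Java collections. It does not use JQuranTree and makes no claim about how `AnalysisTable` works internally. The class name `FrequencySketch`, the method `tally` and the sample labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration of "count, group, sort descending".
public class FrequencySketch {

    // Count how often each label occurs, then order the entries by
    // descending count (ties broken alphabetically for a stable result).
    public static List<Map.Entry<String, Integer>> tally(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample labels standing in for character types.
        List<String> labels = List.of("Lam", "Alif", "Meem", "Alif", "Lam", "Alif");
        for (Map.Entry<String, Integer> row : tally(labels)) {
            System.out.printf("%-10s %d%n", row.getKey(), row.getValue());
        }
    }
}
```

The alphabetical tie-break is a design choice made here so that equal counts always print in the same order.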

Java Example

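// JQuranTree imports. The package names are assumed from the library's
// layout; check them against the Javadoc for your version.
import org.jqurantree.analysis.*;
import org.jqurantree.arabic.*;
import org.jqurantree.orthography.*;

public class CharacterFrequencyExample {

    public static void main(String[] args) {

        // Create a new analysis table with a single "Character" column.
        AnalysisTable table = new AnalysisTable("Character");

        // Add the type of each character in every token of the Quranic text.
        for (Token token : Document.getTokens()) {
            for (ArabicCharacter character : token) {
                // Skip characters whose type is null.
                if (character.getType() != null) {
                    table.add(character.getType());
                }
            }
        }

        // Group rows by character, sort by count (highest first), and print.
        AnalysisTable groupTable = table.group("Character");
        groupTable.sort("Count", SortOrder.Descending);
        System.out.println(groupTable);
    }
}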

Program Output

Character                       Count
---------                       -----
Alif                            62381
Lam                             38102
Noon                            27268
Meem                            26735
Waw                             25676
Ya                              19143
Ha                              14850
Ra                              12403
Ba                              11491
Ta                              10520
Kaf                             10497
Ain                             9405
Fa                              8747
Qaf                             7034
AlifMaksura                     6605
Seen                            6010
Dal                             5991
Thal                            4932
HHa                             4140
SmallHighRoundedZero            3988
Jeem                            3317
Hamza                           3059
Kha                             2497
TaMarbuta                       2344
Sheen                           2124
Sad                             2074
DDad                            1686
Zain                            1599
Tha                             1414
TTa                             1273
SmallWaw                        1257
Ghain                           1221
SmallYa                         995
DTha                            853
SmallHighMeemIsolatedForm       510
Tatweel                         495
SmallLowMeem                    99
SmallHighUprightRectangularZero 66
SmallHighSeen                   2
SmallLowSeen                    1
SmallHighNoon                   1
RoundedHighStopWithFilledCentre 1
EmptyCentreLowStop              1
EmptyCentreHighStop             1

Discussion

The results list each character type in the Quranic text together with its frequency count. These character types are defined in the orthography model, and the text is loaded from the Tanzil XML data. The table shows 62381 occurrences of alif, including occurrences with hamza and alif khanjarīya. Besides letters, the table also counts other character types, such as tatweel and small annotation signs like SmallHighRoundedZero. Within the orthography model, the four most frequent letters are alif, lām, nūn and mīm, in that order.
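Raw counts can also be expressed as percentages when comparing characters. The sketch below is a plain-Java illustration. The class `RelativeFrequency` and the method `percentages` are hypothetical and are not part of JQuranTree. Each percentage is relative to the sum of the counts passed in. The sample therefore uses only the top four rows of the program output and gives shares within those four; passing every row of the table would give shares of all characters counted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: convert raw counts into percentages of their total.
public class RelativeFrequency {

    // Returns each label's share of the total count, as a percentage.
    // The input map's iteration order is preserved in the result.
    public static Map<String, Double> percentages(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Top four counts copied from the program output above.
        Map<String, Integer> top = new LinkedHashMap<>();
        top.put("Alif", 62381);
        top.put("Lam", 38102);
        top.put("Noon", 27268);
        top.put("Meem", 26735);
        percentages(top).forEach((k, v) -> System.out.printf("%-5s %.1f%%%n", k, v));
    }
}
```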


Language Research Group
University of Leeds