org.jqurantree.arabic
Class ByteFormat

java.lang.Object
  extended by org.jqurantree.arabic.ByteFormat

public class ByteFormat
extends java.lang.Object

ByteFormat is a support class used to decode the internal byte format of ArabicText. Internally, Arabic text is stored in a byte buffer that uses a fixed width for each letter, including its diacritics. This class is used when the byte[] buffer is accessed through ArabicText and ArabicCharacter method calls.

Each ArabicCharacter is represented by 3 bytes in the buffer. The first byte encodes the character type. The second and third bytes together form a 16-bit vector in which each diacritic type has a fixed position; a set bit means that diacritic is present. The scheme can therefore represent at most 256 character types and any combination of up to 16 diacritic types. In practice, only 44 CharacterTypes and 13 DiacriticTypes are used.
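
The 3-byte layout described above can be illustrated with a small sketch. This is not the library's implementation: the class name ByteLayoutSketch, the helper method names, and the byte order of the bit vector (low bits in the second byte) are all assumptions made for the example.

```java
// Hypothetical sketch of a 3-byte character record: one type byte
// followed by a 16-bit diacritic vector. Not the library's code.
final class ByteLayoutSketch {

    // Bytes per character, as stated in the class description.
    static final int CHARACTER_WIDTH = 3;

    // Writes one record at the given byte offset.
    // Assumption: bits 0..7 go in the second byte, bits 8..15 in the third.
    static void write(byte[] buffer, int offset, int characterType, int diacriticBits) {
        buffer[offset] = (byte) characterType;                       // one of 256 values
        buffer[offset + 1] = (byte) (diacriticBits & 0xFF);          // low half of the vector
        buffer[offset + 2] = (byte) ((diacriticBits >>> 8) & 0xFF);  // high half of the vector
    }

    // Reads the character type as an unsigned value in 0..255.
    static int readCharacterType(byte[] buffer, int offset) {
        return buffer[offset] & 0xFF;
    }

    // Reassembles the 16-bit diacritic vector from the two trailing bytes.
    static int readDiacriticBits(byte[] buffer, int offset) {
        return (buffer[offset + 1] & 0xFF) | ((buffer[offset + 2] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        // Room for two characters; write into the second slot.
        byte[] buffer = new byte[2 * CHARACTER_WIDTH];
        write(buffer, CHARACTER_WIDTH, 200, 0b1000_0000_0000_0101);
        System.out.println(readCharacterType(buffer, CHARACTER_WIDTH));
        System.out.println(Integer.toBinaryString(readDiacriticBits(buffer, CHARACTER_WIDTH)));
    }
}
```

Masking with 0xFF matters because Java bytes are signed; without it, a type value above 127 or a vector with bit 7 set would be read back as a negative number.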

Author:
Kais Dukes

Field Summary
static int CHARACTER_WIDTH
          The number of bytes representing each ArabicCharacter in the buffer.
static byte WHITESPACE
          The buffer byte value representing a space delimiter.
 
Method Summary
static int getDiacriticCount(byte[] buffer, int offset)
          Gets the number of diacritics attached to the character.
static boolean isAlifKhanjareeya(byte[] buffer, int offset)
          Determines if Alif Khanjareeya is attached.
static boolean isDamma(byte[] buffer, int offset)
          Determines if a Damma is present.
static boolean isDammatan(byte[] buffer, int offset)
          Determines if Dammatan are present.
static boolean isDiacritic(byte[] buffer, int offset, DiacriticType diacriticType)
          Determines if a diacritic is present.
static boolean isFatha(byte[] buffer, int offset)
          Determines if a Fatha is present.
static boolean isFathatan(byte[] buffer, int offset)
          Determines if Fathatan are present.
static boolean isHamzaAbove(byte[] buffer, int offset)
          Determines if a Hamza is present above the character.
static boolean isHamzaBelow(byte[] buffer, int offset)
          Determines if a Hamza is present below the character.
static boolean isHamzatWasl(byte[] buffer, int offset)
          Determines if Hamzat Wasl is attached.
static boolean isKasra(byte[] buffer, int offset)
          Determines if a Kasra is present.
static boolean isKasratan(byte[] buffer, int offset)
          Determines if Kasratan are present.
static boolean isLetter(byte[] buffer, int offset)
          Determines if the character is an Arabic letter.
static boolean isMaddah(byte[] buffer, int offset)
          Determines if a Maddah is present.
static boolean isShadda(byte[] buffer, int offset)
          Determines if a Shadda is present.
static boolean isSingleDiacritic(byte[] buffer, int offset, DiacriticType diacriticType)
          Determines if only a single diacritic is attached.
static boolean isSukun(byte[] buffer, int offset)
          Determines if a Sukun is present.
static void setDiacritic(byte[] buffer, int offset, DiacriticType diacriticType)
          Sets a diacritic as present.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHARACTER_WIDTH

public static final int CHARACTER_WIDTH
The number of bytes representing each ArabicCharacter in the buffer.

See Also:
Constant Field Values

WHITESPACE

public static final byte WHITESPACE
The buffer byte value representing a space delimiter.

See Also:
Constant Field Values

Method Detail

setDiacritic

public static void setDiacritic(byte[] buffer,
                                int offset,
                                DiacriticType diacriticType)
Sets a diacritic as present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
diacriticType - the type of diacritic to set

isDiacritic

public static boolean isDiacritic(byte[] buffer,
                                  int offset,
                                  DiacriticType diacriticType)
Determines if a diacritic is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
diacriticType - the type of diacritic
Returns:
true if the diacritic is present; false otherwise.
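
As a rough illustration of how setDiacritic and isDiacritic could operate on the bit vector, the sketch below sets and tests a single bit. The Mark enum is a stand-in for DiacriticType, and the mapping from enum ordinal to bit position, like the class and method names, is an assumption rather than the library's documented behaviour.

```java
// Hypothetical sketch of setting and testing one diacritic bit.
// Mark is an illustrative stand-in, not the library's DiacriticType.
final class DiacriticBitsSketch {

    // Illustrative subset; the order (and so the bit position) is assumed.
    enum Mark { FATHA, DAMMA, KASRA, SHADDA, SUKUN }

    // Marks the diacritic as present for the character at the byte offset.
    static void set(byte[] buffer, int offset, Mark mark) {
        int bit = mark.ordinal();
        // Bits 0..7 live in the second byte, bits 8..15 in the third.
        buffer[offset + 1 + (bit >>> 3)] |= (byte) (1 << (bit & 7));
    }

    // Returns true if the diacritic's bit is set.
    static boolean isSet(byte[] buffer, int offset, Mark mark) {
        int bit = mark.ordinal();
        return (buffer[offset + 1 + (bit >>> 3)] & (1 << (bit & 7))) != 0;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[3];
        set(buffer, 0, Mark.SHADDA);
        System.out.println(isSet(buffer, 0, Mark.SHADDA));
        System.out.println(isSet(buffer, 0, Mark.KASRA));
    }
}
```

With the real class, the corresponding calls would be ByteFormat.setDiacritic(buffer, offset, diacriticType) followed by ByteFormat.isDiacritic(buffer, offset, diacriticType), using the signatures listed on this page.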

isFatha

public static boolean isFatha(byte[] buffer,
                              int offset)
Determines if a Fatha is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Fatha is present; false otherwise.

isDamma

public static boolean isDamma(byte[] buffer,
                              int offset)
Determines if a Damma is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Damma is present; false otherwise.

isKasra

public static boolean isKasra(byte[] buffer,
                              int offset)
Determines if a Kasra is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Kasra is present; false otherwise.

isFathatan

public static boolean isFathatan(byte[] buffer,
                                 int offset)
Determines if Fathatan are present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if Fathatan are present; false otherwise.

isDammatan

public static boolean isDammatan(byte[] buffer,
                                 int offset)
Determines if Dammatan are present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if Dammatan are present; false otherwise.

isKasratan

public static boolean isKasratan(byte[] buffer,
                                 int offset)
Determines if Kasratan are present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if Kasratan are present; false otherwise.

isShadda

public static boolean isShadda(byte[] buffer,
                               int offset)
Determines if a Shadda is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Shadda is present; false otherwise.

isSukun

public static boolean isSukun(byte[] buffer,
                              int offset)
Determines if a Sukun is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Sukun is present; false otherwise.

isMaddah

public static boolean isMaddah(byte[] buffer,
                               int offset)
Determines if a Maddah is present.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Maddah is present; false otherwise.

isHamzaAbove

public static boolean isHamzaAbove(byte[] buffer,
                                   int offset)
Determines if a Hamza is present above the character.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Hamza is present above the character; false otherwise.

isHamzaBelow

public static boolean isHamzaBelow(byte[] buffer,
                                   int offset)
Determines if a Hamza is present below the character.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if a Hamza is present below the character; false otherwise.

isHamzatWasl

public static boolean isHamzatWasl(byte[] buffer,
                                   int offset)
Determines if Hamzat Wasl is attached.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if Hamzat Wasl is attached; false otherwise.

isAlifKhanjareeya

public static boolean isAlifKhanjareeya(byte[] buffer,
                                        int offset)
Determines if Alif Khanjareeya is attached.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if Alif Khanjareeya is attached; false otherwise.

isSingleDiacritic

public static boolean isSingleDiacritic(byte[] buffer,
                                        int offset,
                                        DiacriticType diacriticType)
Determines if only a single diacritic is attached.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
diacriticType - the single diacritic
Returns:
true if the character has only the specified diacritic and no others; false otherwise.
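
One way to picture the "only this diacritic" test is as an equality check between the whole bit vector and a one-bit mask. The sketch below assumes the diacritic is identified by a bit index and that the low bits sit in the second byte; the class and method names are hypothetical.

```java
// Hypothetical sketch: a character carries exactly one given diacritic
// when its 16-bit vector equals the mask for that diacritic's bit.
final class SingleDiacriticSketch {

    // Reassembles the 16-bit vector (assumed order: low byte first).
    static int readBits(byte[] buffer, int offset) {
        return (buffer[offset + 1] & 0xFF) | ((buffer[offset + 2] & 0xFF) << 8);
    }

    // True only if the bit at bitIndex is set and every other bit is clear.
    static boolean isSingle(byte[] buffer, int offset, int bitIndex) {
        return readBits(buffer, offset) == (1 << bitIndex);
    }

    public static void main(String[] args) {
        byte[] onlyBitTwo = {0, 0b100, 0};
        byte[] bitsZeroAndTwo = {0, 0b101, 0};
        System.out.println(isSingle(onlyBitTwo, 0, 2));
        System.out.println(isSingle(bitsZeroAndTwo, 0, 2));
    }
}
```

Comparing the full vector, rather than testing one bit, is what distinguishes this check from isDiacritic: a character with the requested diacritic plus any other fails the test.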

getDiacriticCount

public static int getDiacriticCount(byte[] buffer,
                                    int offset)
Gets the number of diacritics attached to the character.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
the number of diacritics attached to the character; zero if it has none.
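
Because each diacritic occupies one bit, counting diacritics amounts to counting the set bits in the second and third bytes. The sketch below does this with the standard Integer.bitCount method; the class and method names are hypothetical, and the library may compute the count differently.

```java
// Hypothetical sketch: count diacritics as the population count of the
// two vector bytes that follow the character type byte.
final class DiacriticCountSketch {

    static int count(byte[] buffer, int offset) {
        // Mask to 0..255 first so sign extension does not add spurious bits.
        return Integer.bitCount(buffer[offset + 1] & 0xFF)
             + Integer.bitCount(buffer[offset + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] buffer = {5, 0b101, (byte) 0x80};
        System.out.println(count(buffer, 0));
    }
}
```

The first byte is skipped deliberately: it holds the character type, so its bits say nothing about diacritics.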

isLetter

public static boolean isLetter(byte[] buffer,
                               int offset)
Determines if the character is an Arabic letter.

Parameters:
buffer - the byte[] buffer
offset - the offset of the character
Returns:
true if the character is an Arabic letter, and not a Quranic symbol; false otherwise.


Copyright © Kais Dukes, 2009. All Rights Reserved.