org.jqurantree.arabic
Class ArabicText

java.lang.Object
  extended by org.jqurantree.arabic.ArabicText
All Implemented Interfaces:
java.lang.Iterable<ArabicCharacter>
Direct Known Subclasses:
Token, Verse

public class ArabicText
extends java.lang.Object
implements java.lang.Iterable<ArabicCharacter>

ArabicText is an immutable sequence of ArabicCharacters. This class is analogous to the Java String class, where a String is an immutable sequence of 16-bit UTF-16 code units. The character data in ArabicText is stored in a byte[] buffer, with a fixed width for each letter, including its diacritics. ArabicCharacter instances are views on the buffer. They are created on demand, rather than stored, and are garbage collected once no longer referenced. See ByteFormat for details on the internal byte format used.
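The storage model described above can be illustrated with a small stand-alone sketch. The class FixedWidthText and its nested CharView are hypothetical names introduced here for illustration and are not part of JQuranTree. The sketch assumes 3 bytes per character, as stated for characterCount, and does not model what each byte means (see ByteFormat for that).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the storage model described above; not the library's code.
final class FixedWidthText implements Iterable<FixedWidthText.CharView> {
    // Bytes per character, matching the width stated for characterCount.
    static final int WIDTH = 3;

    private final byte[] buffer;
    private final int offset;
    private final int characterCount;

    FixedWidthText(byte[] buffer, int offset, int characterCount) {
        if (offset < 0 || characterCount < 0
                || offset + characterCount * WIDTH > buffer.length) {
            throw new IllegalArgumentException("character range exceeds buffer");
        }
        this.buffer = buffer;
        this.offset = offset;
        this.characterCount = characterCount;
    }

    int getLength() {
        return characterCount;
    }

    // Creates a lightweight view on demand; no bytes are copied.
    CharView getCharacter(int index) {
        if (index < 0 || index >= characterCount) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return new CharView(buffer, offset + index * WIDTH);
    }

    @Override
    public Iterator<CharView> iterator() {
        return new Iterator<CharView>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < characterCount;
            }

            @Override
            public CharView next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return getCharacter(next++);
            }
        };
    }

    // A view onto one character's bytes; the meaning of each byte is not modelled here.
    static final class CharView {
        private final byte[] buffer;
        private final int position;

        CharView(byte[] buffer, int position) {
            this.buffer = buffer;
            this.position = position;
        }

        byte byteAt(int i) {
            if (i < 0 || i >= WIDTH) {
                throw new IndexOutOfBoundsException("byte: " + i);
            }
            return buffer[position + i];
        }
    }
}
```

Because each character has the same width, the byte position of character i is simply offset + i * WIDTH, which is what makes on-demand views cheap to create.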

Author:
Kais Dukes

Field Summary
protected  byte[] buffer
          The byte[] buffer holding the character data.
protected  int characterCount
          The number of characters in the text.
protected  int offset
          The offset to the first character in the buffer.
 
Constructor Summary
protected ArabicText(byte[] buffer, int offset, int characterCount)
          Used internally to create a new Arabic text instance, when constructing the orthography model.
  ArabicText(java.lang.String text)
          Creates a new Arabic text instance from Unicode character data.
 
Method Summary
static ArabicText fromBuckwalter(java.lang.String text)
          Converts Buckwalter transliteration into Arabic text.
static ArabicText fromEncoding(java.lang.String text, EncodingType encodingType)
          Decodes a string into Arabic text, according to the specified encoding scheme.
static ArabicText fromUnicode(java.lang.String text)
          Converts Unicode character data into Arabic text.
 ArabicCharacter getCharacter(int index)
          Gets the ArabicCharacter at the specified index.
 CharacterType getCharacterType(int index)
          Gets the type of ArabicCharacter at the specified index.
 int getLength()
          Gets the number of characters in the text.
 int getLetterCount()
          Gets the number of Arabic letters, which is the number of ArabicCharacters excluding any Quranic symbols.
 ArabicText getSubstring(int start, int end)
          Gets a new ArabicText instance that is a substring of this instance.
 java.util.Iterator<ArabicCharacter> iterator()
          Gets an iterator which may be used to enumerate through each character in the text.
 ArabicText removeDiacritics()
          Gets a new ArabicText instance that is a copy of this instance excluding any attached diacritics.
 ArabicText removeNonLetters()
          Gets a new ArabicText instance that is a copy of this instance excluding any Quranic symbols.
 java.lang.String toBuckwalter()
          Converts the Arabic text to Buckwalter transliteration.
 java.lang.String toSimpleEncoding()
          Converts the Arabic text to simple encoding.
 java.lang.String toString()
          Converts the Arabic text to Unicode.
 java.lang.String toString(EncodingType encodingType)
          Converts the Arabic text to a string according to the specified encoding scheme.
 java.lang.String toString(EncodingType encodingType, EncodingOptions options)
          Converts the Arabic text to a string according to the specified encoding scheme and encoding options.
 java.lang.String toUnicode()
          Converts the Arabic text to Unicode.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

buffer

protected final byte[] buffer
The byte[] buffer holding the character data.


offset

protected final int offset
The offset to the first character in the buffer.


characterCount

protected final int characterCount
The number of characters in the text. Each character is represented by 3 bytes in the buffer.

Constructor Detail

ArabicText

protected ArabicText(byte[] buffer,
                     int offset,
                     int characterCount)
Used internally to create a new Arabic text instance, when constructing the orthography model.

Parameters:
buffer - the byte[] buffer holding the character data
offset - the offset of the first character in the buffer
characterCount - the number of characters in the text; each character is represented by 3 bytes in the buffer

ArabicText

public ArabicText(java.lang.String text)
Creates a new Arabic text instance from Unicode character data.

Parameters:
text - a string holding the text, which will be decoded from Unicode character data
See Also:
UnicodeDecoder
Method Detail

fromUnicode

public static ArabicText fromUnicode(java.lang.String text)
Converts Unicode character data into Arabic text.

Parameters:
text - a string holding the text, which will be decoded from Unicode character data
Returns:
a new Arabic text instance
See Also:
UnicodeDecoder

fromBuckwalter

public static ArabicText fromBuckwalter(java.lang.String text)
Converts Buckwalter transliteration into Arabic text.

Parameters:
text - a string holding the text, which will be decoded from Buckwalter transliteration
Returns:
a new Arabic text instance
See Also:
BuckwalterDecoder

fromEncoding

public static ArabicText fromEncoding(java.lang.String text,
                                      EncodingType encodingType)
Decodes a string into Arabic text, according to the specified encoding scheme.

Parameters:
text - the string to decode
encodingType - the encoding scheme to use
Returns:
a new Arabic text instance

toString

public java.lang.String toString()
Converts the Arabic text to Unicode.

Overrides:
toString in class java.lang.Object
Returns:
a string representing the Arabic text as Unicode character data
See Also:
UnicodeEncoder

toString

public java.lang.String toString(EncodingType encodingType)
Converts the Arabic text to a string according to the specified encoding scheme.

Parameters:
encodingType - the encoding scheme to use
Returns:
a string representing the Arabic text

toString

public java.lang.String toString(EncodingType encodingType,
                                 EncodingOptions options)
Converts the Arabic text to a string according to the specified encoding scheme and encoding options.

Parameters:
encodingType - the encoding scheme to use
options - the encoding options to use when encoding the text
Returns:
a string representing the Arabic text

toUnicode

public java.lang.String toUnicode()
Converts the Arabic text to Unicode.

Returns:
a string representing the Arabic text as Unicode character data
See Also:
UnicodeEncoder

toBuckwalter

public java.lang.String toBuckwalter()
Converts the Arabic text to Buckwalter transliteration.

Returns:
a string representing the Arabic text using Buckwalter transliteration
See Also:
BuckwalterEncoder

toSimpleEncoding

public java.lang.String toSimpleEncoding()
Converts the Arabic text to simple encoding.

Returns:
a string representing the Arabic text using simple encoding
See Also:
SimpleEncoder

getLength

public int getLength()
Gets the number of characters in the text. Each Arabic letter or Quranic symbol, including any attached diacritics, counts as a single character.

Returns:
the number of characters, a positive integer

getCharacter

public ArabicCharacter getCharacter(int index)
Gets the ArabicCharacter at the specified index. The index is zero-based, ranging from 0 to getLength() - 1, inclusive.

Parameters:
index - the zero-based index of the character
Returns:
the ArabicCharacter at the specified index

getCharacterType

public CharacterType getCharacterType(int index)
Gets the type of ArabicCharacter at the specified index. The index is zero-based, ranging from 0 to getLength() - 1, inclusive.

Parameters:
index - the zero-based index of the character
Returns:
the type of Arabic letter or Quranic symbol at the specified index, such as Alif or Ba

iterator

public java.util.Iterator<ArabicCharacter> iterator()
Gets an iterator which may be used to enumerate through each character in the text.

Specified by:
iterator in interface java.lang.Iterable<ArabicCharacter>
Returns:
an ArabicCharacter iterator

getSubstring

public ArabicText getSubstring(int start,
                               int end)
Gets a new ArabicText instance that is a substring of this instance. The substring begins at the specified start index, and ends before the character at the specified end index. The length of the substring will be end - start.

Parameters:
start - the zero-based start index of the substring, inclusive
end - the zero-based end index of the substring, exclusive
Returns:
the specified substring as Arabic text
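The half-open range arithmetic can be worked through in a short sketch. SubstringRange is a hypothetical helper, not a JQuranTree class. It assumes a substring is expressed as a new offset and character count over a buffer with 3 bytes per character; this page does not say whether getSubstring copies bytes or shares the buffer, so treat that part as an assumption.

```java
// Hypothetical helper: maps a half-open character range onto a fixed-width byte buffer.
final class SubstringRange {
    // Bytes per character, as documented for characterCount.
    static final int WIDTH = 3;

    final int byteOffset;     // buffer offset of the substring's first character
    final int characterCount; // always end - start

    private SubstringRange(int byteOffset, int characterCount) {
        this.byteOffset = byteOffset;
        this.characterCount = characterCount;
    }

    // offset and length describe the parent text; start is inclusive, end is exclusive.
    static SubstringRange of(int offset, int length, int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        return new SubstringRange(offset + start * WIDTH, end - start);
    }
}
```

For a parent text of 10 characters at offset 0, the range start = 2, end = 5 covers characters 2, 3 and 4, so the result has 5 - 2 = 3 characters and would begin at byte 6 under these assumptions.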

removeDiacritics

public ArabicText removeDiacritics()
Gets a new ArabicText instance that is a copy of this instance excluding any attached diacritics. The number of ArabicCharacters in both strings will be equal. The returned string will not have any attached diacritics, so calls to methods such as ArabicCharacter.isFatha() will return false.

Returns:
a new Arabic text instance without diacritics
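The contract above (same number of characters, no diacritics left) can be sketched with a simplified model. DiacriticSketch and MarkedLetter are hypothetical names used only for illustration; the model tracks a single fatha flag per character and says nothing about how JQuranTree actually stores diacritics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one base letter plus an attached-fatha flag per character.
final class DiacriticSketch {
    static final class MarkedLetter {
        final char base;
        final boolean fatha;

        MarkedLetter(char base, boolean fatha) {
            this.base = base;
            this.fatha = fatha;
        }

        boolean isFatha() {
            return fatha;
        }
    }

    // Returns a new list of the same size with every diacritic flag cleared.
    // The input list is left unchanged, mirroring the immutable original.
    static List<MarkedLetter> removeDiacritics(List<MarkedLetter> text) {
        List<MarkedLetter> result = new ArrayList<MarkedLetter>(text.size());
        for (MarkedLetter letter : text) {
            result.add(new MarkedLetter(letter.base, false));
        }
        return result;
    }
}
```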

getLetterCount

public int getLetterCount()
Gets the number of Arabic letters, which is the number of ArabicCharacters excluding any Quranic symbols. The value returned will always be less than or equal to getLength().

Returns:
the number of Arabic letters in the text

removeNonLetters

public ArabicText removeNonLetters()
Gets a new ArabicText instance that is a copy of this instance excluding any Quranic symbols. The returned string will have only Arabic letters, so that getLetterCount() == getLength().

Returns:
a new Arabic text instance with only Arabic letters
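The relationship between getLetterCount(), getLength() and removeNonLetters() can be checked on a toy model. LetterCountSketch and its Kind enum are hypothetical and reduce each character to a letter-or-symbol tag; they are a sketch of the documented invariants, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each character is tagged as a letter or a Quranic symbol.
final class LetterCountSketch {
    enum Kind { LETTER, QURANIC_SYMBOL }

    // Counts only letters, so the result can never exceed text.size().
    static int letterCount(List<Kind> text) {
        int count = 0;
        for (Kind kind : text) {
            if (kind == Kind.LETTER) {
                count++;
            }
        }
        return count;
    }

    // Returns a new list holding only the letters; the input is left unchanged.
    static List<Kind> removeNonLetters(List<Kind> text) {
        List<Kind> letters = new ArrayList<Kind>();
        for (Kind kind : text) {
            if (kind == Kind.LETTER) {
                letters.add(kind);
            }
        }
        return letters;
    }
}
```

In this model, the letter count of the filtered list equals its size, which mirrors the documented guarantee that getLetterCount() == getLength() after removeNonLetters().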


Copyright© Kais Dukes, 2009. All Rights Reserved.