java.lang.Object org.jqurantree.arabic.ArabicText
public class ArabicText
ArabicText is an immutable sequence of ArabicCharacters. This class is analogous to the Java String class, where a String is an immutable sequence of 2-byte Unicode characters. The character data in ArabicText is stored in a byte[] buffer, with a fixed width for each letter, including its diacritics. ArabicCharacter instances are a view on the buffer. They are created on demand and are garbage collected. See ByteFormat for details on the internal byte format used.
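The storage model described above (a shared byte[] buffer with fixed-width cells, and character objects created on demand as views) can be sketched roughly as follows. This is a minimal illustration, not the JQuranTree implementation: the class names FixedWidthText and CellView are hypothetical, the 3-byte width is taken from the constructor documentation, and treating offset as a byte offset is an assumption.

```java
// Hypothetical illustration of a fixed-width character buffer with
// on-demand views. Names and layout are assumptions, not JQuranTree's own.
final class FixedWidthText {
    static final int WIDTH = 3; // bytes per character, as documented for ArabicText

    private final byte[] buffer;
    private final int offset;         // assumed here to be a byte offset
    private final int characterCount;

    FixedWidthText(byte[] buffer, int offset, int characterCount) {
        this.buffer = buffer;
        this.offset = offset;
        this.characterCount = characterCount;
    }

    int getLength() {
        return characterCount;
    }

    // Creates a new view on every call; nothing is cached, so unused
    // views are simply garbage collected.
    CellView getCharacter(int index) {
        if (index < 0 || index >= characterCount) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return new CellView(buffer, offset + index * WIDTH);
    }

    public static void main(String[] args) {
        byte[] data = {1, 0, 0, 2, 5, 0, 3, 0, 0}; // three 3-byte cells
        FixedWidthText text = new FixedWidthText(data, 0, 3);
        System.out.println(text.getLength());               // 3
        System.out.println(text.getCharacter(1).byteAt(1)); // 5
    }
}

// A lightweight view: it holds a reference to the shared buffer and a
// position, and copies no character data.
final class CellView {
    private final byte[] buffer;
    private final int position; // byte index of this character's first byte

    CellView(byte[] buffer, int position) {
        this.buffer = buffer;
        this.position = position;
    }

    // Reads one of the character's bytes straight from the shared buffer.
    int byteAt(int i) {
        return buffer[position + i] & 0xFF;
    }
}
```

Because every character occupies the same number of bytes, locating a character is a single multiplication rather than a scan, which is presumably why a fixed width was chosen.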
Field Summary | |
---|---|
protected byte[] | buffer: The byte[] buffer holding the character data. |
protected int | characterCount: The number of characters in the text. |
protected int | offset: The offset to the first character in the buffer. |
Constructor Summary | |
---|---|
protected | ArabicText(byte[] buffer, int offset, int characterCount): Used internally to create a new Arabic text instance, when constructing the orthography model. |
public | ArabicText(java.lang.String text): Creates a new Arabic text instance from Unicode character data. |
Method Summary | |
---|---|
static ArabicText | fromBuckwalter(java.lang.String text): Converts Buckwalter transliteration into Arabic text. |
static ArabicText | fromEncoding(java.lang.String text, EncodingType encodingType): Decodes a string into Arabic text, according to the specified encoding scheme. |
static ArabicText | fromUnicode(java.lang.String text): Converts Unicode character data into Arabic text. |
ArabicCharacter | getCharacter(int index): Gets the ArabicCharacter at the specified index. |
CharacterType | getCharacterType(int index): Gets the type of ArabicCharacter at the specified index. |
int | getLength(): Gets the number of characters in the text. |
int | getLetterCount(): Gets the number of Arabic letters, which is the number of ArabicCharacters excluding any Quranic symbols. |
ArabicText | getSubstring(int start, int end): Gets a new ArabicText instance that is a substring of this instance. |
java.util.Iterator<ArabicCharacter> | iterator(): Gets an iterator which may be used to enumerate through each character in the text. |
ArabicText | removeDiacritics(): Gets a new ArabicText instance that is a copy of this instance excluding any attached diacritics. |
ArabicText | removeNonLetters(): Gets a new ArabicText instance that is a copy of this instance excluding any Quranic symbols. |
java.lang.String | toBuckwalter(): Converts the Arabic text to Buckwalter transliteration. |
java.lang.String | toSimpleEncoding(): Converts the Arabic text to simple encoding. |
java.lang.String | toString(): Converts the Arabic text to Unicode. |
java.lang.String | toString(EncodingType encodingType): Converts the Arabic text to a string according to the specified encoding scheme. |
java.lang.String | toString(EncodingType encodingType, EncodingOptions options): Converts the Arabic text to a string according to the specified encoding scheme and encoding options. |
java.lang.String | toUnicode(): Converts the Arabic text to Unicode. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected final byte[] buffer
The byte[] buffer holding the character data.

protected final int offset
The offset to the first character in the buffer.

protected final int characterCount
The number of characters in the text.
Constructor Detail |
---|
protected ArabicText(byte[] buffer, int offset, int characterCount)
Used internally to create a new Arabic text instance, when constructing the orthography model.
Parameters:
buffer - the byte[] buffer holding the character data
offset - the offset of the first character in the buffer
characterCount - the number of characters in the text; each character is represented by 3 bytes in the buffer

public ArabicText(java.lang.String text)
Creates a new Arabic text instance from Unicode character data.
Parameters:
text - a string holding the text, which will be decoded from Unicode character data
See Also:
UnicodeDecoder
Method Detail |
---|
public static ArabicText fromUnicode(java.lang.String text)
Converts Unicode character data into Arabic text.
Parameters:
text - a string holding the text, which will be decoded from Unicode character data
See Also:
UnicodeDecoder

public static ArabicText fromBuckwalter(java.lang.String text)
Converts Buckwalter transliteration into Arabic text.
Parameters:
text - a string holding the text, which will be decoded from Buckwalter transliteration
See Also:
BuckwalterDecoder

public static ArabicText fromEncoding(java.lang.String text, EncodingType encodingType)
Decodes a string into Arabic text, according to the specified encoding scheme.
Parameters:
text - the string to decode
encodingType - the encoding scheme to use
public java.lang.String toString()
Converts the Arabic text to Unicode.
Overrides:
toString in class java.lang.Object
Returns:
a string representing the Arabic text as Unicode character data
See Also:
UnicodeEncoder

public java.lang.String toString(EncodingType encodingType)
Converts the Arabic text to a string according to the specified encoding scheme.
Parameters:
encodingType - the encoding scheme to use
Returns:
a string representing the Arabic text

public java.lang.String toString(EncodingType encodingType, EncodingOptions options)
Converts the Arabic text to a string according to the specified encoding scheme and encoding options.
Parameters:
encodingType - the encoding scheme to use
options - the encoding options to use when encoding the text
Returns:
a string representing the Arabic text

public java.lang.String toUnicode()
Converts the Arabic text to Unicode.
Returns:
a string representing the Arabic text as Unicode character data
See Also:
UnicodeEncoder

public java.lang.String toBuckwalter()
Converts the Arabic text to Buckwalter transliteration.
Returns:
a string representing the Arabic text using Buckwalter transliteration
See Also:
BuckwalterEncoder

public java.lang.String toSimpleEncoding()
Converts the Arabic text to simple encoding.
Returns:
a string representing the Arabic text using simple encoding
See Also:
SimpleEncoder
public int getLength()
Gets the number of characters in the text.

public ArabicCharacter getCharacter(int index)
Gets the ArabicCharacter at the specified index. The index is zero-based, ranging from 0 to getLength() - 1, inclusive.
Parameters:
index - the zero-based index of the character
Returns:
the ArabicCharacter at the specified index

public CharacterType getCharacterType(int index)
Gets the type of ArabicCharacter at the specified index. The index is zero-based, ranging from 0 to getLength() - 1, inclusive.
Parameters:
index - the zero-based index of the character
public java.util.Iterator<ArabicCharacter> iterator()
Gets an iterator which may be used to enumerate through each character in the text.
Specified by:
iterator in interface java.lang.Iterable<ArabicCharacter>
Returns:
an ArabicCharacter iterator

public ArabicText getSubstring(int start, int end)
Gets a new ArabicText instance that is a substring of this instance. The substring begins at the specified start index, and ends before the character at the specified end index. The length of the substring will be end - start.
Parameters:
start - the zero-based start index of the substring, inclusive.
end - the zero-based end index of the substring, exclusive.
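The half-open range used by getSubstring (start inclusive, end exclusive) can be illustrated with a small sketch. The SharedSlice class below is hypothetical. Whether ArabicText shares its buffer between a text and its substrings is an assumption suggested by the protected (buffer, offset, characterCount) constructor, not something this page confirms, and the bounds check shown is likewise an assumption.

```java
// Hypothetical sketch of half-open substring indexing over a shared buffer.
// Not the JQuranTree implementation.
final class SharedSlice {
    static final int WIDTH = 3; // bytes per character, per the constructor documentation

    final byte[] buffer;
    final int offset;          // assumed to be a byte offset into the buffer
    final int characterCount;

    SharedSlice(byte[] buffer, int offset, int characterCount) {
        this.buffer = buffer;
        this.offset = offset;
        this.characterCount = characterCount;
    }

    // start is inclusive, end is exclusive, so the result has end - start characters.
    SharedSlice getSubstring(int start, int end) {
        if (start < 0 || end > characterCount || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // No bytes are copied: the new instance points into the same buffer.
        return new SharedSlice(buffer, offset + start * WIDTH, end - start);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice(new byte[5 * WIDTH], 0, 5);
        SharedSlice middle = whole.getSubstring(1, 4);
        System.out.println(middle.characterCount); // 3, i.e. end - start
        System.out.println(middle.offset);         // 3, i.e. start * WIDTH
    }
}
```

Sharing the buffer is only safe because the text is immutable; a mutable buffer would let one instance change the contents seen by another.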
public ArabicText removeDiacritics()
Gets a new ArabicText instance that is a copy of this instance excluding any attached diacritics. The number of ArabicCharacters in both strings will be equal. The returned string will not have any attached diacritics, so calls to methods such as ArabicCharacter.isFatha() will return false.
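To see why removing diacritics leaves the character count unchanged, consider a sketch in which each fixed-width cell stores a letter code followed by diacritic flag bits. The layout here (letter in byte 0, flags in byte 1, FATHA as bit 0x01) is purely hypothetical; the actual format is described by ByteFormat.

```java
// Hypothetical cell layout: byte 0 = letter code, byte 1 = diacritic flags.
// The real layout is defined by ByteFormat and may differ.
final class DiacriticSketch {
    static final int WIDTH = 3;    // bytes per character
    static final int FATHA = 0x01; // hypothetical flag bit

    // Returns a copy in which only the first byte of each cell is kept,
    // so every cell survives but carries no diacritic flags.
    static byte[] removeDiacritics(byte[] cells) {
        byte[] copy = new byte[cells.length];
        for (int i = 0; i < cells.length; i += WIDTH) {
            copy[i] = cells[i]; // keep the letter; flag bytes stay zero
        }
        return copy;
    }

    static boolean isFatha(byte[] cells, int index) {
        return (cells[index * WIDTH + 1] & FATHA) != 0;
    }

    public static void main(String[] args) {
        byte[] cells = {10, (byte) FATHA, 0, 11, 0, 0}; // two cells, first has a fatha
        byte[] plain = removeDiacritics(cells);
        System.out.println(plain.length / WIDTH); // 2, same count as the input
        System.out.println(isFatha(cells, 0));    // true
        System.out.println(isFatha(plain, 0));    // false
    }
}
```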
public int getLetterCount()
Gets the number of Arabic letters, which is the number of ArabicCharacters excluding any Quranic symbols. The value returned will always be less than or equal to getLength().
public ArabicText removeNonLetters()
Gets a new ArabicText instance that is a copy of this instance excluding any Quranic symbols. The returned string will have only Arabic letters, so that getLetterCount() == getLength().
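The relationship between getLength(), getLetterCount() and removeNonLetters() can be sketched with a simplified model in which a text is a list of character kinds. The Kind enum is a hypothetical stand-in for CharacterType, whose actual values are not listed on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of letter counting and filtering. Not the JQuranTree
// implementation; Kind is a hypothetical stand-in for CharacterType.
final class LetterFilterSketch {
    enum Kind { LETTER, QURANIC_SYMBOL }

    // Counts only letters, so the result can never exceed text.size().
    static int getLetterCount(List<Kind> text) {
        int count = 0;
        for (Kind kind : text) {
            if (kind == Kind.LETTER) {
                count++;
            }
        }
        return count;
    }

    // Returns a new list containing only the letters; the input is untouched.
    static List<Kind> removeNonLetters(List<Kind> text) {
        List<Kind> letters = new ArrayList<>();
        for (Kind kind : text) {
            if (kind == Kind.LETTER) {
                letters.add(kind);
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        List<Kind> text = Arrays.asList(Kind.LETTER, Kind.QURANIC_SYMBOL, Kind.LETTER);
        List<Kind> letters = removeNonLetters(text);
        System.out.println(getLetterCount(text));                      // 2
        System.out.println(text.size());                               // 3
        System.out.println(getLetterCount(letters) == letters.size()); // true
    }
}
```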