org.jqurantree.arabic.encoding
Class ArabicEncoderBase

java.lang.Object
  extended by org.jqurantree.arabic.encoding.ArabicEncoderBase
All Implemented Interfaces:
ArabicEncoder
Direct Known Subclasses:
BuckwalterEncoder, SimpleEncoder, UnicodeEncoder

public abstract class ArabicEncoderBase
extends java.lang.Object
implements ArabicEncoder

ArabicEncoderBase is an abstract base class providing a common implementation for ArabicText encoders. The class supports the ArabicText.toString(EncodingType) method by implementing table-driven encoding. An EncodingTableBase instance is used to look up the mapping for each character in the source text.

The following encoding algorithm is reversible, ensuring that round trip testing is possible. For each ArabicCharacter:

Step 1. If the letter or Quranic symbol has a diacritic that forms a well-known combination, then map the pair onto a single output character. If Hamza above was the diacritic used, then remove it from the list of diacritics still to be considered. The 6 well-known combinations are listed below (the first entry covers three letters):

- Alif/Waw/Ya + Hamza above
- Alif + Hamza below
- Alif + Hamzat wasl
- Alif + Khanjareeya (superscript Alif)

Step 2. If Step 1 did not apply, then use the EncodingTableBase instance to determine the output character to use for the letter or Quranic symbol, without its diacritics.

Step 3. Use the encoding table to form output characters out of any remaining diacritics, in the following order:

- Hamza above
- Shadda
- Fathatan
- Dammatan
- Kasratan
- Fatha
- Damma
- Kasra
- Sukun
- Maddah
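
The three steps above can be sketched in a few lines of Java. This is a minimal illustration only, not the library's implementation: the EncoderSketch class, its Letter and Mark enums, and every output symbol in the tables are assumptions made for the example, and the real encoder reads characters from the internal ByteFormat through an EncodingTableBase instead. The sketch needs Java 9 or later for List.of and Map.of.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EncoderSketch {

    // Hypothetical model types; the library itself works on its internal ByteFormat.
    enum Letter { ALIF, WAW, YA, BA }

    enum Mark {
        HAMZA_ABOVE, SHADDA, FATHATAN, DAMMATAN, KASRATAN,
        FATHA, DAMMA, KASRA, SUKUN, MADDAH,
        HAMZA_BELOW, HAMZAT_WASL, ALIF_KHANJAREEYA
    }

    // Marks that can fuse with a letter in Step 1.
    static final List<Mark> COMBINING = List.of(
            Mark.HAMZA_ABOVE, Mark.HAMZA_BELOW, Mark.HAMZAT_WASL, Mark.ALIF_KHANJAREEYA);

    // Step 3 output order, as listed in the text above.
    static final List<Mark> ORDER = List.of(
            Mark.HAMZA_ABOVE, Mark.SHADDA, Mark.FATHATAN, Mark.DAMMATAN, Mark.KASRATAN,
            Mark.FATHA, Mark.DAMMA, Mark.KASRA, Mark.SUKUN, Mark.MADDAH);

    // Illustrative placeholder tables (assumed symbols, not the library's tables).
    static final Map<String, Character> COMBINED = Map.of(
            "ALIF+HAMZA_ABOVE", '>',
            "WAW+HAMZA_ABOVE", '&',
            "YA+HAMZA_ABOVE", '}',
            "ALIF+HAMZA_BELOW", '<',
            "ALIF+HAMZAT_WASL", '{',
            "ALIF+ALIF_KHANJAREEYA", '`');

    static final Map<Letter, Character> LETTERS = Map.of(
            Letter.ALIF, 'A', Letter.WAW, 'w', Letter.YA, 'y', Letter.BA, 'b');

    static final Map<Mark, Character> MARKS = Map.of(
            Mark.HAMZA_ABOVE, '#', Mark.SHADDA, '~', Mark.FATHATAN, 'F',
            Mark.DAMMATAN, 'N', Mark.KASRATAN, 'K', Mark.FATHA, 'a',
            Mark.DAMMA, 'u', Mark.KASRA, 'i', Mark.SUKUN, 'o', Mark.MADDAH, '^');

    static String encodeCharacter(Letter letter, Set<Mark> marks) {
        StringBuilder text = new StringBuilder();
        EnumSet<Mark> remaining = EnumSet.noneOf(Mark.class);
        remaining.addAll(marks);

        // Step 1: look for a well-known letter + diacritic combination.
        Character combined = null;
        for (Mark mark : COMBINING) {
            if (remaining.contains(mark)) {
                combined = COMBINED.get(letter + "+" + mark);
                if (combined != null) {
                    // Removing the used mark keeps Hamza above from being
                    // written a second time in Step 3.
                    remaining.remove(mark);
                    break;
                }
            }
        }

        // Step 2: otherwise fall back to the plain letter mapping.
        char base = combined != null ? combined : LETTERS.get(letter);
        text.append(base);

        // Step 3: write the remaining diacritics in the fixed order.
        for (Mark mark : ORDER) {
            if (remaining.contains(mark)) {
                char symbol = MARKS.get(mark);
                text.append(symbol);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Alif + Hamza above + Fatha: the combination consumes the Hamza.
        System.out.println(encodeCharacter(Letter.ALIF, EnumSet.of(Mark.HAMZA_ABOVE, Mark.FATHA))); // prints >a
        // Ba + Kasra + Shadda: Shadda is written before Kasra.
        System.out.println(encodeCharacter(Letter.BA, EnumSet.of(Mark.KASRA, Mark.SHADDA))); // prints b~i
    }
}
```

Because the diacritics are always written in the same order, two characters carrying the same set of diacritics produce the same output, which is what makes decoding back to the original character feasible.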

Author:
Kais Dukes

Field Summary
protected  java.lang.StringBuilder text
          A string buffer used to hold the encoder's plain text output.
 
Constructor Summary
protected ArabicEncoderBase()
          Creates a new encoder.
protected ArabicEncoderBase(EncodingTableBase encodingTable)
          Creates a new encoder using the specified encoding table.
 
Method Summary
 java.lang.String encode(byte[] buffer, int offset, int characterCount, EncodingOptions options)
          Encodes the internal ByteFormat into plain text according to the encoding scheme.
protected  void encodeCharacter(byte[] buffer, int offset)
          Encodes a single ArabicCharacter in the internal ByteFormat.
protected  void writeCharacterSeperator()
          Overridden by derived encoders to write a separator between each ArabicCharacter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

text

protected final java.lang.StringBuilder text
A string buffer used to hold the encoder's plain text output.

Constructor Detail

ArabicEncoderBase

protected ArabicEncoderBase()
Creates a new encoder.


ArabicEncoderBase

protected ArabicEncoderBase(EncodingTableBase encodingTable)
Creates a new encoder using the specified encoding table.

Parameters:
encodingTable - the encoding table to use when performing table-driven encoding.

Method Detail

encode

public java.lang.String encode(byte[] buffer,
                               int offset,
                               int characterCount,
                               EncodingOptions options)
Description copied from interface: ArabicEncoder
Encodes the internal ByteFormat into plain text according to the encoding scheme.

Specified by:
encode in interface ArabicEncoder
Parameters:
buffer - the byte[] array to encode in the internal ByteFormat
offset - the starting offset in the buffer
characterCount - the number of characters to encode. Each character is represented by 3 bytes in the buffer.
options - the EncodingOptions to apply when encoding
Returns:
a plain text string
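
The relationship between encode, encodeCharacter and writeCharacterSeperator can be pictured as a loop over fixed-width 3-byte records. The sketch below is an assumption about how such a loop might look, not the library's code: EncodeLoopSketch and SpacedSketch are hypothetical classes, the offset is assumed to be a byte offset, the EncodingOptions parameter is left out, the separator method is spelled writeCharacterSeparator here, and the per-character step simply prints the three bytes as hex because the layout of the internal ByteFormat is not described on this page.

```java
public class EncodeLoopSketch {

    // From the parameter description: each character occupies 3 bytes.
    static final int BYTES_PER_CHARACTER = 3;

    // Output buffer, mirroring the protected text field of the base class.
    protected final StringBuilder text = new StringBuilder();

    public String encode(byte[] buffer, int offset, int characterCount) {
        text.setLength(0);
        for (int i = 0; i < characterCount; i++) {
            if (i > 0) {
                // Hook for derived encoders; called between characters.
                writeCharacterSeparator();
            }
            // Assumption: offset counts bytes, so step by 3 per character.
            encodeCharacter(buffer, offset + i * BYTES_PER_CHARACTER);
        }
        return text.toString();
    }

    // Default: no separator. Derived encoders may override this.
    protected void writeCharacterSeparator() {
    }

    // Placeholder for the table-driven step: dump the 3 bytes as hex.
    protected void encodeCharacter(byte[] buffer, int offset) {
        for (int i = 0; i < BYTES_PER_CHARACTER; i++) {
            text.append(String.format("%02x", buffer[offset + i] & 0xff));
        }
    }

    // A derived encoder that separates characters with a space.
    static class SpacedSketch extends EncodeLoopSketch {
        @Override
        protected void writeCharacterSeparator() {
            text.append(' ');
        }
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4, 5, 6};
        System.out.println(new EncodeLoopSketch().encode(buffer, 0, 2)); // prints 010203040506
        System.out.println(new SpacedSketch().encode(buffer, 0, 2));     // prints 010203 040506
    }
}
```

Keeping the loop in the base class and exposing only the two protected hooks lets each derived encoder change the per-character output or the separator without reimplementing the buffer traversal.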

writeCharacterSeperator

protected void writeCharacterSeperator()
Overridden by derived encoders to write a separator between each ArabicCharacter.


encodeCharacter

protected void encodeCharacter(byte[] buffer,
                               int offset)
Encodes a single ArabicCharacter in the internal ByteFormat.

Parameters:
buffer - the byte[] buffer holding the character
offset - the offset of the character within the buffer


Copyright© Kais Dukes, 2009. All Rights Reserved.