BidiFormatter
public final class BidiFormatter extends Object
Utility class for formatting text for display in a potentially opposite-directionality context
without garbling. The directionality of the context is set at formatter creation and the
directionality of the text can be either estimated or passed in when known. Provides the
following functionality:
1. Bidi Wrapping
When text in one language is mixed into a document in another, opposite-directionality language,
e.g. when an English business name is embedded in a Hebrew web page, both the inserted string
and the text surrounding it may be displayed incorrectly unless the inserted string is explicitly
separated from the surrounding text in a "wrapper" that:
- Declares its directionality so that the string is displayed correctly. This can be done with
Unicode bidi formatting codes by {@link #unicodeWrap} and similar methods.
- Isolates the string's directionality, so it does not unduly affect the surrounding content.
Currently, this can only be done using invisible Unicode characters of the same direction as
the context (LRM or RLM) in addition to the directionality declaration above, thus "resetting"
the directionality to that of the context. The "reset" may need to be done at both ends of the
string. Without "reset" after the string, the string will "stick" to a number or logically
separate opposite-direction text that happens to follow it in-line (even if separated by
neutral content like spaces and punctuation). Without "reset" before the string, the same can
happen there, but only with more opposite-direction text, not a number. One approach is to
"reset" the direction only after each string, on the theory that if the preceding opposite-
direction text is itself bidi-wrapped, the "reset" after it will prevent the sticking. (Doing
the "reset" only before each string definitely does not work because we do not want to require
bidi-wrapping numbers, and a bidi-wrapped opposite-direction string could be followed by a
number.) Still, the safest policy is to do the "reset" on both ends of each string, since RTL
message translations often contain untranslated Latin-script brand names and technical terms,
and one of these can be followed by a bidi-wrapped inserted value. On the other hand, when one
has such a message, it is best to do the "reset" manually in the message translation itself,
since the message's opposite-direction text could be followed by an inserted number, which we
would not bidi-wrap anyway. Thus, "reset" only after the string is the current default. As an
alternative to "reset", recent additions to the HTML, CSS, and Unicode standards allow the
isolation to be part of the directionality declaration. This form of isolation is better than
"reset" because it takes less space, does not require knowing the context directionality, has a
gentler effect than "reset", and protects both ends of the string. However, we do not yet allow
using it because required platforms do not yet support it.
Providing these wrapping services is the basic purpose of the bidi formatter.
2. Directionality estimation
How does one know whether a string about to be inserted into surrounding text has the same
directionality? Well, in many cases, one knows that this must be the case when writing the code
doing the insertion, e.g. when a localized message is inserted into a localized page. In such
cases there is no need to involve the bidi formatter at all. In some other cases, it need not be
the same as the context, but is either constant (e.g. URLs are always LTR) or otherwise known.
In the remaining cases, e.g. when the string is user-entered or comes from a database, the
language of the string (and thus its directionality) is not known a priori, and must be
estimated at run-time. The bidi formatter can do this automatically using the default
first-strong estimation algorithm. It can also be configured to use a custom directionality
estimation object.
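The first-strong estimation mentioned above can be illustrated with a standalone sketch. It uses only `java.lang.Character` rather than the support library's `TextDirectionHeuristicCompat`; the class name `FirstStrong` and the explicit `fallbackRtl` parameter are assumptions for illustration. How the library's default heuristic treats text with no strong character is determined by the configured heuristic and is not shown here.

```java
// Hypothetical standalone sketch of first-strong estimation; not the library's
// TextDirectionHeuristicCompat implementation.
final class FirstStrong {
    private FirstStrong() {}

    /**
     * Returns true if the first strongly directional character in text is RTL,
     * false if it is LTR, and fallbackRtl if text contains no strong character.
     */
    static boolean isRtl(CharSequence text, boolean fallbackRtl) {
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);
            switch (Character.getDirectionality(cp)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                    return false;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    return true;
                default:
                    break; // neutral, weak (e.g. digits) or formatting character: keep scanning
            }
        }
        return fallbackRtl;
    }

    public static void main(String[] args) {
        // Hebrew "shalom" after digits and a space: the first strong character is RTL.
        System.out.println(isRtl("123 \u05E9\u05DC\u05D5\u05DD", false)); // true
        // Latin comes first, so the estimate is LTR even though Hebrew follows.
        System.out.println(isRtl("Hello \u05E9\u05DC\u05D5\u05DD", false)); // false
        // No strong character at all: the fallback decides.
        System.out.println(isRtl("123 ...", true)); // true
    }
}
```

Because only the first strong character matters, a user-entered string that begins with a Latin brand name is estimated as LTR even if the rest is Hebrew; this is the trade-off of the first-strong approach.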
Fields Summary

| Modifier and type | Field | Description |
|---|---|---|
| private static TextDirectionHeuristicCompat | DEFAULT_TEXT_DIRECTION_HEURISTIC | The default text direction heuristic. |
| private static final char | LRE | Unicode "Left-To-Right Embedding" (LRE) character. |
| private static final char | RLE | Unicode "Right-To-Left Embedding" (RLE) character. |
| private static final char | PDF | Unicode "Pop Directional Formatting" (PDF) character. |
| private static final char | LRM | Unicode "Left-To-Right Mark" (LRM) character. |
| private static final char | RLM | Unicode "Right-To-Left Mark" (RLM) character. |
| private static final String | LRM_STRING | |
| private static final String | RLM_STRING | |
| private static final String | EMPTY_STRING | Empty string constant. |
| private static final int | FLAG_STEREO_RESET | |
| private static final int | DEFAULT_FLAGS | |
| private static final BidiFormatter | DEFAULT_LTR_INSTANCE | |
| private static final BidiFormatter | DEFAULT_RTL_INSTANCE | |
| private final boolean | mIsRtlContext | |
| private final int | mFlags | |
| private final TextDirectionHeuristicCompat | mDefaultTextDirectionHeuristicCompat | |
| private static final int | DIR_LTR | Enum for directionality type. |
| private static final int | DIR_UNKNOWN | |
| private static final int | DIR_RTL | |
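For reference, the control characters in the table have fixed code points in the Unicode standard. The sketch below declares them and uses `java.lang.Character.getDirectionality` to show their bidi classes. The class name `BidiControls` is hypothetical, and building the `*_STRING` constants with `String.valueOf` is an assumption of the sketch, not taken from the library.

```java
// Standalone sketch (hypothetical class name): the bidi control characters
// listed in the Fields Summary, with their standard Unicode code points.
final class BidiControls {
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING
    static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK
    static final char RLM = '\u200F'; // RIGHT-TO-LEFT MARK
    static final String LRM_STRING = String.valueOf(LRM);
    static final String RLM_STRING = String.valueOf(RLM);

    private BidiControls() {}

    public static void main(String[] args) {
        // The embedding controls have their own bidi classes...
        System.out.println(Character.getDirectionality(LRE) == Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING); // true
        System.out.println(Character.getDirectionality(RLE) == Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING); // true
        System.out.println(Character.getDirectionality(PDF) == Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT); // true
        // ...while the marks are invisible strong characters (classes L and R),
        // which is why they can "reset" the direction next to an inserted string.
        System.out.println(Character.getDirectionality(LRM) == Character.DIRECTIONALITY_LEFT_TO_RIGHT); // true
        System.out.println(Character.getDirectionality(RLM) == Character.DIRECTIONALITY_RIGHT_TO_LEFT); // true
    }
}
```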
Constructors Summary
private BidiFormatter(boolean isRtlContext, int flags, TextDirectionHeuristicCompat heuristic)
mIsRtlContext = isRtlContext;
mFlags = flags;
mDefaultTextDirectionHeuristicCompat = heuristic;
Methods Summary
private static int | getEntryDir(java.lang.String str)
Returns the directionality of the first character with strong directionality in the string,
or DIR_UNKNOWN if none was encountered. Treats a non-BN character between an
LRE/RLE/LRO/RLO and its matching PDF as a strong character, LTR after LRE/LRO, and RTL after
RLE/RLO. The results are undefined for a string containing unbalanced LRE/RLE/LRO/RLO/PDF
characters. The intended use is to check whether a logically separate item that ends with a
character of the string's entry directionality and precedes the string inline (not counting
any neutral characters in between) would "stick" to it in an opposite-directionality context,
thus being displayed in an incorrect position. An LRM or RLM character (the one of the
context's directionality) between the two will prevent such sticking.
return new DirectionalityEstimator(str, false /* isHtml */).getEntryDir();
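A simplified, standalone sketch of the entry-directionality rule as described above, using `java.lang.Character` instead of the library's `DirectionalityEstimator`. The class name and the `DIR_*` values are chosen for this sketch, and nested embeddings are resolved to the innermost open one, which may differ in detail from the library.

```java
// Hypothetical standalone sketch of the entry-directionality rule; not the library code.
final class EntryDir {
    // Directionality values chosen for this sketch.
    static final int DIR_LTR = -1;
    static final int DIR_UNKNOWN = 0;
    static final int DIR_RTL = 1;

    private EntryDir() {}

    /** First strong direction, treating any non-BN character inside an embedding as strong. */
    static int entryDir(String s) {
        int[] openDirs = new int[s.length()]; // directions of open embeddings/overrides, innermost last
        int depth = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            byte type = Character.getDirectionality(cp);
            switch (type) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING:
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE:
                    openDirs[depth++] = DIR_LTR;
                    break;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:
                    openDirs[depth++] = DIR_RTL;
                    break;
                case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT:
                    if (depth > 0) {
                        depth--; // unbalanced PDFs give undefined results anyway
                    }
                    break;
                case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL:
                    break; // BN characters never count
                default:
                    if (depth > 0) {
                        return openDirs[depth - 1]; // direction of the enclosing embedding
                    }
                    if (type == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                        return DIR_LTR;
                    }
                    if (type == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                            || type == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                        return DIR_RTL;
                    }
                    break; // neutral or weak character outside any embedding
            }
        }
        return DIR_UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(entryDir("  \u05D0bc"));           // 1 (DIR_RTL)
        System.out.println(entryDir("123 !"));               // 0 (DIR_UNKNOWN)
        System.out.println(entryDir("\u202B123\u202C abc")); // 1: digits inside RLE count as RTL
    }
}
```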
private static int | getExitDir(java.lang.String str)
Returns the directionality of the last character with strong directionality in the string, or
DIR_UNKNOWN if none was encountered. For efficiency, actually scans backwards from the end of
the string. Treats a non-BN character between an LRE/RLE/LRO/RLO and its matching PDF as a
strong character, LTR after LRE/LRO, and RTL after RLE/RLO. The results are undefined for a
string containing unbalanced LRE/RLE/LRO/RLO/PDF characters. The intended use is to check
whether a logically separate item that starts with a number or a character of the string's
exit directionality and follows this string inline (not counting any neutral characters in
between) would "stick" to it in an opposite-directionality context, thus being displayed in
an incorrect position. An LRM or RLM character (the one of the context's directionality)
between the two will prevent such sticking.
return new DirectionalityEstimator(str, false /* isHtml */).getExitDir();
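The exit rule can be sketched the same way by scanning backwards. When scanning from the end, a PDF opens an embedding and an LRE/LRO/RLE/RLO closes it, so a non-BN character found inside an embedding only gets its direction once the matching opener is reached. As before, this is a hypothetical standalone class with sketch-chosen `DIR_*` values, not the library's implementation.

```java
// Hypothetical standalone sketch of the exit-directionality rule (backward scan).
final class ExitDir {
    static final int DIR_LTR = -1;
    static final int DIR_UNKNOWN = 0;
    static final int DIR_RTL = 1;

    private ExitDir() {}

    static int exitDir(String s) {
        int depth = 0;        // embeddings entered so far while scanning right to left
        int pendingDepth = 0; // depth of the last non-BN character seen inside an embedding (0 = none yet)
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            i -= Character.charCount(cp);
            byte type = Character.getDirectionality(cp);
            switch (type) {
                case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT:
                    depth++; // scanning backwards, a PDF opens an embedding
                    break;
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING:
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE:
                    if (pendingDepth != 0 && depth == pendingDepth) {
                        return DIR_LTR; // opener of the embedding that holds the last character
                    }
                    depth--;
                    break;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:
                    if (pendingDepth != 0 && depth == pendingDepth) {
                        return DIR_RTL;
                    }
                    depth--;
                    break;
                case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL:
                    break; // BN characters never count
                default:
                    if (depth > 0) {
                        if (pendingDepth == 0) {
                            pendingDepth = depth; // direction is decided by the opener
                        }
                    } else if (type == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                        return DIR_LTR;
                    } else if (type == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                            || type == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                        return DIR_RTL;
                    }
                    break;
            }
        }
        return DIR_UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(exitDir("abc \u05D0 123"));        // 1: trailing digits are weak, Hebrew decides
        System.out.println(exitDir("abc \u202B123\u202C"));  // 1: digits inside RLE count as RTL
    }
}
```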
public static android.support.v4.text.BidiFormatter | getInstance()
Factory for creating an instance of BidiFormatter for the default locale directionality.
return new Builder().build();
public static android.support.v4.text.BidiFormatter | getInstance(boolean rtlContext)
Factory for creating an instance of BidiFormatter given the context directionality.
return new Builder(rtlContext).build();
public static android.support.v4.text.BidiFormatter | getInstance(java.util.Locale locale)
Factory for creating an instance of BidiFormatter given the context locale.
return new Builder(locale).build();
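public boolean | getStereoReset()
Returns true if the FLAG_STEREO_RESET flag is set, i.e. the formatter was built with
{@link Builder#stereoReset(boolean)} passing "true".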
return (mFlags & FLAG_STEREO_RESET) != 0;
public boolean | isRtl(java.lang.String str)
Estimates the directionality of a string using the default text direction heuristic.
return mDefaultTextDirectionHeuristicCompat.isRtl(str, 0, str.length());
public boolean | isRtlContext()
Returns true if the context directionality is RTL.
return mIsRtlContext;
private static boolean | isRtlLocale(java.util.Locale locale)
Helper method to return true if the Locale directionality is RTL.
return (TextUtilsCompat.getLayoutDirectionFromLocale(locale) == ViewCompat.LAYOUT_DIRECTION_RTL);
private java.lang.String | markAfter(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the
overall or the exit directionality of a given string is opposite to the context directionality.
Putting this after the string (including its directionality declaration wrapping) prevents it
from "sticking" to other opposite-directionality text or a number appearing after it inline
with only neutral content in between. Otherwise returns the empty string. While the exit
directionality is determined by scanning the end of the string, the overall directionality is
given explicitly by a heuristic to estimate the {@code str}'s directionality.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
// getExitDir() is called only if needed (short-circuit).
if (!mIsRtlContext && (isRtl || getExitDir(str) == DIR_RTL)) {
return LRM_STRING;
}
if (mIsRtlContext && (!isRtl || getExitDir(str) == DIR_LTR)) {
return RLM_STRING;
}
return EMPTY_STRING;
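The decision `markAfter` makes can be reproduced without the Android classes. In this sketch the `strIsRtl` parameter stands in for the heuristic's overall estimate, and exit directionality is simplified to the last strong character (explicit embeddings are ignored); both are assumptions of the sketch, and `MarkAfterSketch` is a hypothetical name.

```java
// Hypothetical standalone sketch of the markAfter decision.
final class MarkAfterSketch {
    static final int DIR_LTR = -1;
    static final int DIR_UNKNOWN = 0;
    static final int DIR_RTL = 1;
    static final String LRM_STRING = "\u200E";
    static final String RLM_STRING = "\u200F";

    private MarkAfterSketch() {}

    // Simplified exit directionality: the last strong character; embeddings are not interpreted.
    static int lastStrongDir(String s) {
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            i -= Character.charCount(cp);
            byte type = Character.getDirectionality(cp);
            if (type == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return DIR_LTR;
            }
            if (type == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || type == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return DIR_RTL;
            }
        }
        return DIR_UNKNOWN;
    }

    /** strIsRtl stands in for the heuristic's overall estimate of str. */
    static String markAfter(String str, boolean isRtlContext, boolean strIsRtl) {
        // As in the original, the exit scan runs only if the overall direction does not decide.
        if (!isRtlContext && (strIsRtl || lastStrongDir(str) == DIR_RTL)) {
            return LRM_STRING;
        }
        if (isRtlContext && (!strIsRtl || lastStrongDir(str) == DIR_LTR)) {
            return RLM_STRING;
        }
        return "";
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        // RTL string in an LTR context: LRM is appended.
        System.out.println(markAfter(hebrew, false, true).equals(LRM_STRING)); // true
        // Overall-LTR string ending in Hebrew: a following number could still stick, so LRM.
        System.out.println(markAfter("IBM " + hebrew, false, false).equals(LRM_STRING)); // true
        // LTR string in an LTR context: no mark needed.
        System.out.println(markAfter("IBM", false, false).isEmpty()); // true
    }
}
```

`markBefore` follows the same pattern with the entry (first strong) direction in place of the exit direction.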
private java.lang.String | markBefore(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the
overall or the entry directionality of a given string is opposite to the context
directionality. Putting this before the string (including its directionality declaration
wrapping) prevents it from "sticking" to other opposite-directionality text appearing before
it inline with only neutral content in between. Otherwise returns the empty string. While the
entry directionality is determined by scanning the beginning of the string, the overall
directionality is given explicitly by a heuristic to estimate the {@code str}'s directionality.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
// getEntryDir() is called only if needed (short-circuit).
if (!mIsRtlContext && (isRtl || getEntryDir(str) == DIR_RTL)) {
return LRM_STRING;
}
if (mIsRtlContext && (!isRtl || getEntryDir(str) == DIR_LTR)) {
return RLM_STRING;
}
return EMPTY_STRING;
public java.lang.String | unicodeWrap(java.lang.String str, TextDirectionHeuristicCompat heuristic, boolean isolate)
Formats a string of given directionality for use in plain-text output of the context
directionality, so an opposite-directionality string is neither garbled nor garbles its
surroundings. This makes use of Unicode bidi formatting characters.
The algorithm: If the given directionality doesn't match the context directionality, wraps
the string with Unicode bidi formatting characters: RLE+{@code str}+PDF for RTL text, or
LRE+{@code str}+PDF for LTR text.
If {@code isolate}, directionally isolates the string so that it does not garble its
surroundings. Currently, this is done by "resetting" the directionality after the string by
appending a trailing Unicode bidi mark matching the context directionality (LRM or RLM) when
either the overall directionality or the exit directionality of the string is opposite to that
of the context. If the formatter was built using {@link Builder#stereoReset(boolean)} and
passing "true" as an argument, also prepends a Unicode bidi mark matching the context
directionality when either the overall directionality or the entry directionality of the
string is opposite to that of the context. Note that as opposed to the overall
directionality, the entry and exit directionalities are determined from the string itself.
Does *not* do HTML-escaping.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
StringBuilder result = new StringBuilder();
if (getStereoReset() && isolate) {
result.append(markBefore(str,
isRtl ? TextDirectionHeuristicsCompat.RTL : TextDirectionHeuristicsCompat.LTR));
}
if (isRtl != mIsRtlContext) {
result.append(isRtl ? RLE : LRE);
result.append(str);
result.append(PDF);
} else {
result.append(str);
}
if (isolate) {
result.append(markAfter(str,
isRtl ? TextDirectionHeuristicsCompat.RTL : TextDirectionHeuristicsCompat.LTR));
}
return result.toString();
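To see the whole wrapping policy in one place, the following standalone sketch mirrors the body above using plain Java strings. The class `UnicodeWrapSketch`, its constructor flags (standing in for the context directionality and `Builder#stereoReset`), and the `isRtl` parameter (standing in for the heuristic's result) are hypothetical; entry and exit directionality are simplified to the first and last strong characters, ignoring explicit embeddings.

```java
// Hypothetical standalone sketch of the unicodeWrap(str, heuristic, isolate) policy.
final class UnicodeWrapSketch {
    static final char LRE = '\u202A', RLE = '\u202B', PDF = '\u202C';
    static final String LRM = "\u200E", RLM = "\u200F";
    static final int DIR_LTR = -1, DIR_UNKNOWN = 0, DIR_RTL = 1;

    private final boolean rtlContext;
    private final boolean stereoReset;

    UnicodeWrapSketch(boolean rtlContext, boolean stereoReset) {
        this.rtlContext = rtlContext;
        this.stereoReset = stereoReset;
    }

    /** isRtl is the caller-supplied overall direction (what the heuristic would return). */
    String unicodeWrap(String str, boolean isRtl, boolean isolate) {
        StringBuilder result = new StringBuilder();
        if (stereoReset && isolate) {
            result.append(mark(isRtl, edgeDir(str, true))); // "reset" before the string
        }
        if (isRtl != rtlContext) {
            // Declare the string's own direction: RLE/LRE + str + PDF.
            result.append(isRtl ? RLE : LRE).append(str).append(PDF);
        } else {
            result.append(str);
        }
        if (isolate) {
            result.append(mark(isRtl, edgeDir(str, false))); // "reset" after the string
        }
        return result.toString();
    }

    // Shared rule of markBefore/markAfter: emit the context's mark if the overall
    // direction or the relevant edge direction is opposite to the context.
    private String mark(boolean isRtl, int edgeDir) {
        if (!rtlContext && (isRtl || edgeDir == DIR_RTL)) {
            return LRM;
        }
        if (rtlContext && (!isRtl || edgeDir == DIR_LTR)) {
            return RLM;
        }
        return "";
    }

    // Simplified entry (fromStart) or exit direction: first or last strong character.
    static int edgeDir(String s, boolean fromStart) {
        int i = fromStart ? 0 : s.length();
        while (fromStart ? i < s.length() : i > 0) {
            int cp = fromStart ? s.codePointAt(i) : s.codePointBefore(i);
            i += fromStart ? Character.charCount(cp) : -Character.charCount(cp);
            byte type = Character.getDirectionality(cp);
            if (type == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return DIR_LTR;
            }
            if (type == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || type == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return DIR_RTL;
            }
        }
        return DIR_UNKNOWN;
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        UnicodeWrapSketch ltr = new UnicodeWrapSketch(false, false);
        // RTL text in an LTR context: RLE + text + PDF + LRM.
        System.out.println(ltr.unicodeWrap(hebrew, true, true).equals(RLE + hebrew + PDF + LRM)); // true
        UnicodeWrapSketch rtlStereo = new UnicodeWrapSketch(true, true);
        // LTR brand name in an RTL context with stereo reset: RLM + LRE + text + PDF + RLM.
        System.out.println(rtlStereo.unicodeWrap("IBM", false, true).equals(RLM + LRE + "IBM" + PDF + RLM)); // true
    }
}
```

With stereo reset off, only the trailing mark is added, which matches the "reset only after the string" default described in the class overview.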
public java.lang.String | unicodeWrap(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but assumes
{@code isolate} is true.
return unicodeWrap(str, heuristic, true /* isolate */);
public java.lang.String | unicodeWrap(java.lang.String str, boolean isolate)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but uses the
formatter's default direction estimation algorithm.
return unicodeWrap(str, mDefaultTextDirectionHeuristicCompat, isolate);
public java.lang.String | unicodeWrap(java.lang.String str)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but uses the
formatter's default direction estimation algorithm and assumes {@code isolate} is true.
return unicodeWrap(str, mDefaultTextDirectionHeuristicCompat, true /* isolate */);