File: BidiFormatter.java
Doc: API Doc
Category: Android 5.1 API
Size: 40260
Date: Thu Mar 12 22:22:56 GMT 2015
Package: android.support.v4.text

BidiFormatter

java.lang.Object

public final class BidiFormatter extends Object

Utility class for formatting text for display in a potentially opposite-directionality context without garbling. The directionality of the context is set at formatter creation and the directionality of the text can be either estimated or passed in when known. Provides the following functionality:

1. Bidi Wrapping

When text in one language is mixed into a document in another, opposite-directionality language, e.g. when an English business name is embedded in a Hebrew web page, both the inserted string and the text surrounding it may be displayed incorrectly unless the inserted string is explicitly separated from the surrounding text in a "wrapper" that:

- Declares its directionality so that the string is displayed correctly. This can be done in Unicode bidi formatting codes by {@link #unicodeWrap} and similar methods.

- Isolates the string's directionality, so it does not unduly affect the surrounding content. Currently, this can only be done using invisible Unicode characters of the same direction as the context (LRM or RLM) in addition to the directionality declaration above, thus "resetting" the directionality to that of the context. The "reset" may need to be done at both ends of the string. Without "reset" after the string, the string will "stick" to a number or logically separate opposite-direction text that happens to follow it in-line (even if separated by neutral content like spaces and punctuation). Without "reset" before the string, the same can happen there, but only with more opposite-direction text, not a number.

  One approach is to "reset" the direction only after each string, on the theory that if the preceding opposite-direction text is itself bidi-wrapped, the "reset" after it will prevent the sticking. (Doing the "reset" only before each string definitely does not work because we do not want to require bidi-wrapping numbers, and a bidi-wrapped opposite-direction string could be followed by a number.) Still, the safest policy is to do the "reset" on both ends of each string, since RTL message translations often contain untranslated Latin-script brand names and technical terms, and one of these can be followed by a bidi-wrapped inserted value. On the other hand, when one has such a message, it is best to do the "reset" manually in the message translation itself, since the message's opposite-direction text could be followed by an inserted number, which we would not bidi-wrap anyway. Thus, "reset" only after the string is the current default.

  As an alternative to "reset", recent additions to the HTML, CSS, and Unicode standards allow the isolation to be part of the directionality declaration. This form of isolation is better than "reset" because it takes less space, does not require knowing the context directionality, has a gentler effect than "reset", and protects both ends of the string. However, we do not yet allow using it because required platforms do not yet support it.

Providing these wrapping services is the basic purpose of the bidi formatter.
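
The "sticking" that the "reset" guards against can be observed with the JDK's java.text.Bidi class, which applies the Unicode Bidirectional Algorithm. The sketch below is illustrative only and is not part of BidiFormatter; the class name StickingDemo, the helper levelOfFirstDigit, and the sample strings are assumptions made for this example. It reports the resolved bidi level of a number that follows a Hebrew word in an LTR paragraph, with and without an LRM in between.

```java
import java.text.Bidi;

/** Illustrative only: observes a number "sticking" to preceding RTL text in an LTR paragraph. */
final class StickingDemo {
    private static final char LRM = '\u200E'; // Left-To-Right Mark

    /** Returns the resolved bidi level of the first ASCII digit when text is laid out as an LTR paragraph. */
    static int levelOfFirstDigit(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                return bidi.getLevelAt(i);
            }
        }
        throw new IllegalArgumentException("no digit in text");
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // a sample RTL word
        String withoutReset = "Name: " + hebrew + " 42";
        String withReset = "Name: " + hebrew + LRM + " 42";
        // A non-zero level means the number was pulled into the RTL run;
        // level 0 means it stays with the LTR context.
        System.out.println("without reset: level " + levelOfFirstDigit(withoutReset));
        System.out.println("with reset:    level " + levelOfFirstDigit(withReset));
    }
}
```

The LRM acts as a strong LTR character between the Hebrew word and the number, which is why the number resolves back to the context direction.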

2. Directionality estimation

How does one know whether a string about to be inserted into surrounding text has the same directionality? Well, in many cases, one knows that this must be the case when writing the code doing the insertion, e.g. when a localized message is inserted into a localized page. In such cases there is no need to involve the bidi formatter at all. In some other cases, it need not be the same as the context, but is either constant (e.g. URLs are always LTR) or otherwise known. In the remaining cases, e.g. when the string is user-entered or comes from a database, the language of the string (and thus its directionality) is not known a priori, and must be estimated at run-time. The bidi formatter can do this automatically using the default first-strong estimation algorithm. It can also be configured to use a custom directionality estimation object.
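
A first-strong estimate can be sketched in plain Java as follows. This is a minimal illustration, not the support library's heuristic: the class name FirstStrong is hypothetical, the scan works on UTF-16 chars rather than code points, and falling back to LTR when no strong character is found is an assumption.

```java
/** Illustrative first-strong estimator; not the support library's implementation. */
final class FirstStrong {
    /**
     * Returns true if the first strongly directional character of str is RTL.
     * Assumption: strings without any strong character are treated as LTR.
     */
    static boolean isRtl(CharSequence str) {
        for (int i = 0; i < str.length(); i++) {
            switch (Character.getDirectionality(str.charAt(i))) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                    return false;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    return true;
                default:
                    break; // digits, spaces, punctuation: keep scanning
            }
        }
        return false;
    }
}
```

Digits and punctuation are skipped, so a string that starts with a number followed by Hebrew text is still estimated as RTL.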

Fields Summary
private static TextDirectionHeuristicCompat
DEFAULT_TEXT_DIRECTION_HEURISTIC
The default text direction heuristic.
private static final char
LRE
Unicode "Left-To-Right Embedding" (LRE) character.
private static final char
RLE
Unicode "Right-To-Left Embedding" (RLE) character.
private static final char
PDF
Unicode "Pop Directional Formatting" (PDF) character.
private static final char
LRM
Unicode "Left-To-Right Mark" (LRM) character.
private static final char
RLM
Unicode "Right-To-Left Mark" (RLM) character.
private static final String
LRM_STRING
private static final String
RLM_STRING
private static final String
EMPTY_STRING
Empty string constant.
private static final int
FLAG_STEREO_RESET
private static final int
DEFAULT_FLAGS
private static final BidiFormatter
DEFAULT_LTR_INSTANCE
private static final BidiFormatter
DEFAULT_RTL_INSTANCE
private final boolean
mIsRtlContext
private final int
mFlags
private final TextDirectionHeuristicCompat
mDefaultTextDirectionHeuristicCompat
private static final int
DIR_LTR
Enum for directionality type.
private static final int
DIR_UNKNOWN
private static final int
DIR_RTL
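
For reference, the bidi control characters named in the fields above have the following code points in the Unicode Standard. The class name BidiControls and the visualize helper are hypothetical additions for this sketch; the helper only makes the invisible characters visible for logging or test output.

```java
/** Code points of the Unicode bidi control characters used for wrapping and "reset". */
final class BidiControls {
    static final char LRE = '\u202A'; // Left-To-Right Embedding
    static final char RLE = '\u202B'; // Right-To-Left Embedding
    static final char PDF = '\u202C'; // Pop Directional Formatting
    static final char LRM = '\u200E'; // Left-To-Right Mark
    static final char RLM = '\u200F'; // Right-To-Left Mark

    /** Replaces each invisible control character with a readable tag such as <RLE>. */
    static String visualize(String s) {
        return s.replace(String.valueOf(LRE), "<LRE>")
                .replace(String.valueOf(RLE), "<RLE>")
                .replace(String.valueOf(PDF), "<PDF>")
                .replace(String.valueOf(LRM), "<LRM>")
                .replace(String.valueOf(RLM), "<RLM>");
    }
}
```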
Constructors Summary
private BidiFormatter(boolean isRtlContext, int flags, TextDirectionHeuristicCompat heuristic)
param
isRtlContext Whether the context directionality is RTL or not.
param
flags The option flags.
param
heuristic The default text direction heuristic.
mIsRtlContext = isRtlContext;
mFlags = flags;
mDefaultTextDirectionHeuristicCompat = heuristic;
Methods Summary
private static int getEntryDir(java.lang.String str)
Returns the directionality of the first character with strong directionality in the string, or DIR_UNKNOWN if none was encountered. Treats a non-BN character between an LRE/RLE/LRO/RLO and its matching PDF as a strong character, LTR after LRE/LRO, and RTL after RLE/RLO. The results are undefined for a string containing unbalanced LRE/RLE/LRO/RLO/PDF characters. The intended use is to check whether a logically separate item that ends with a character of the string's entry directionality and precedes the string inline (not counting any neutral characters in between) would "stick" to it in an opposite-directionality context, thus being displayed in an incorrect position. An LRM or RLM character (the one of the context's directionality) between the two will prevent such sticking.
param
str the string to check.
return new DirectionalityEstimator(str, false /* isHtml */).getEntryDir();
private static int getExitDir(java.lang.String str)
Returns the directionality of the last character with strong directionality in the string, or DIR_UNKNOWN if none was encountered. For efficiency, actually scans backwards from the end of the string. Treats a non-BN character between an LRE/RLE/LRO/RLO and its matching PDF as a strong character, LTR after LRE/LRO, and RTL after RLE/RLO. The results are undefined for a string containing unbalanced LRE/RLE/LRO/RLO/PDF characters. The intended use is to check whether a logically separate item that starts with a number or a character of the string's exit directionality and follows this string inline (not counting any neutral characters in between) would "stick" to it in an opposite-directionality context, thus being displayed in an incorrect position. An LRM or RLM character (the one of the context's directionality) between the two will prevent such sticking.
param
str the string to check.
return new DirectionalityEstimator(str, false /* isHtml */).getExitDir();
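
The entry and exit scans can be approximated in plain Java as shown below. This is a simplified sketch, not the DirectionalityEstimator used above: it ignores LRE/RLE/LRO/RLO/PDF handling entirely, works on UTF-16 chars, and the class name EntryExitDir and the numeric values of the DIR_ constants are assumptions.

```java
/** Simplified entry/exit directionality scan; embedding characters are not handled. */
final class EntryExitDir {
    // Assumed values; only their distinctness matters for this sketch.
    static final int DIR_LTR = -1;
    static final int DIR_UNKNOWN = 0;
    static final int DIR_RTL = 1;

    /** Maps a char to DIR_LTR or DIR_RTL if it is strongly directional, else DIR_UNKNOWN. */
    private static int strongDir(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return DIR_LTR;
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return DIR_RTL;
            default:
                return DIR_UNKNOWN;
        }
    }

    /** Direction of the first strong character, scanning forward. */
    static int getEntryDir(String str) {
        for (int i = 0; i < str.length(); i++) {
            int dir = strongDir(str.charAt(i));
            if (dir != DIR_UNKNOWN) {
                return dir;
            }
        }
        return DIR_UNKNOWN;
    }

    /** Direction of the last strong character, scanning backward from the end. */
    static int getExitDir(String str) {
        for (int i = str.length() - 1; i >= 0; i--) {
            int dir = strongDir(str.charAt(i));
            if (dir != DIR_UNKNOWN) {
                return dir;
            }
        }
        return DIR_UNKNOWN;
    }
}
```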
public static android.support.v4.text.BidiFormatter getInstance()
Factory for creating an instance of BidiFormatter for the default locale directionality.
return new Builder().build();
public static android.support.v4.text.BidiFormatter getInstance(boolean rtlContext)
Factory for creating an instance of BidiFormatter given the context directionality.
param
rtlContext Whether the context directionality is RTL.
return new Builder(rtlContext).build();
public static android.support.v4.text.BidiFormatter getInstance(java.util.Locale locale)
Factory for creating an instance of BidiFormatter given the context locale.
param
locale The context locale.
return new Builder(locale).build();
public boolean getStereoReset()
return
Whether directionality "reset" should also be done before a string being bidi-wrapped, not just after it.
return (mFlags & FLAG_STEREO_RESET) != 0;
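
The flag test above is ordinary bit masking. A minimal standalone sketch, assuming a single-bit flag (the value 2 and the class and method names are assumptions for illustration):

```java
/** Illustrative bit-flag handling for the "stereo reset" option. */
final class ResetFlags {
    static final int FLAG_STEREO_RESET = 2; // assumed value; any single bit would do

    /** Returns flags with the stereo-reset bit set or cleared. */
    static int setStereoReset(int flags, boolean stereoReset) {
        return stereoReset ? (flags | FLAG_STEREO_RESET) : (flags & ~FLAG_STEREO_RESET);
    }

    /** Mirrors the check in getStereoReset(): true if the bit is set. */
    static boolean getStereoReset(int flags) {
        return (flags & FLAG_STEREO_RESET) != 0;
    }
}
```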
public boolean isRtl(java.lang.String str)
Estimates the directionality of a string using the default text direction heuristic.
param
str String whose directionality is to be estimated.
return
true if {@code str}'s estimated overall directionality is RTL. Otherwise returns false.
return mDefaultTextDirectionHeuristicCompat.isRtl(str, 0, str.length());
public boolean isRtlContext()
return
Whether the context directionality is RTL.
return mIsRtlContext;
private static boolean isRtlLocale(java.util.Locale locale)
Helper method to return true if the Locale directionality is RTL.
param
locale The Locale whose directionality will be checked to be RTL or LTR
return
true if the {@code locale} directionality is RTL. False otherwise.
return (TextUtilsCompat.getLayoutDirectionFromLocale(locale) == ViewCompat.LAYOUT_DIRECTION_RTL);
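
Outside Android, a rough stand-in for this check is a lookup in a hard-coded table of RTL language codes. The sketch below is an assumption for illustration and is not how TextUtilsCompat.getLayoutDirectionFromLocale decides; the class name and the language list are hypothetical and deliberately incomplete. Both the current and the legacy ISO 639 codes for Hebrew and Yiddish are listed because Locale.getLanguage() may return either, depending on the JDK version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative locale directionality check based on a fixed language list. */
final class LocaleDirection {
    // Assumed, incomplete list of RTL languages (includes legacy codes "iw" and "ji").
    private static final Set<String> RTL_LANGUAGES = new HashSet<>(Arrays.asList(
            "ar", "fa", "he", "iw", "ur", "yi", "ji"));

    /** Returns true if the locale's language is in the RTL list; false for null. */
    static boolean isRtlLocale(Locale locale) {
        return locale != null && RTL_LANGUAGES.contains(locale.getLanguage());
    }
}
```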
private java.lang.String markAfter(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the exit directionality of a given string is opposite to the context directionality. Putting this after the string (including its directionality declaration wrapping) prevents it from "sticking" to other opposite-directionality text or a number appearing after it inline with only neutral content in between. Otherwise returns the empty string. The exit directionality is determined by scanning the end of the string, whereas the overall directionality is estimated by the supplied {@code heuristic}.
param
str String after which the mark may need to appear.
param
heuristic The text direction heuristic that will be used to estimate the {@code str}'s directionality.
return
LRM for RTL text in LTR context; RLM for LTR text in RTL context; else, the empty string.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
// getExitDir() is called only if needed (short-circuit).
if (!mIsRtlContext && (isRtl || getExitDir(str) == DIR_RTL)) {
    return LRM_STRING;
}
if (mIsRtlContext && (!isRtl || getExitDir(str) == DIR_LTR)) {
    return RLM_STRING;
}
return EMPTY_STRING;
private java.lang.String markBefore(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the entry directionality of a given string is opposite to the context directionality. Putting this before the string (including its directionality declaration wrapping) prevents it from "sticking" to other opposite-directionality text appearing before it inline with only neutral content in between. Otherwise returns the empty string. The entry directionality is determined by scanning the beginning of the string, whereas the overall directionality is estimated by the supplied {@code heuristic}.
param
str String before which the mark may need to appear.
param
heuristic The text direction heuristic that will be used to estimate the {@code str}'s directionality.
return
LRM for RTL text in LTR context; RLM for LTR text in RTL context; else, the empty string.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
// getEntryDir() is called only if needed (short-circuit).
if (!mIsRtlContext && (isRtl || getEntryDir(str) == DIR_RTL)) {
    return LRM_STRING;
}
if (mIsRtlContext && (!isRtl || getEntryDir(str) == DIR_LTR)) {
    return RLM_STRING;
}
return EMPTY_STRING;
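
markAfter and markBefore share one decision rule and differ only in which edge of the string they inspect. The rule can be isolated as a pure function, sketched below with hypothetical names (ResetMark, mark, edgeDir) and assumed values for the DIR_ constants; edgeDir stands for the exit directionality in the markAfter case and the entry directionality in the markBefore case.

```java
/** Illustrative decision table for the "reset" mark; not the library's code. */
final class ResetMark {
    // Assumed values; only their distinctness matters here.
    static final int DIR_LTR = -1;
    static final int DIR_UNKNOWN = 0;
    static final int DIR_RTL = 1;

    static final String LRM_STRING = "\u200E";
    static final String RLM_STRING = "\u200F";

    /**
     * Returns the mark matching the context direction if the string's overall
     * direction or its edge direction opposes the context, else the empty string.
     */
    static String mark(boolean rtlContext, boolean overallRtl, int edgeDir) {
        if (!rtlContext && (overallRtl || edgeDir == DIR_RTL)) {
            return LRM_STRING;
        }
        if (rtlContext && (!overallRtl || edgeDir == DIR_LTR)) {
            return RLM_STRING;
        }
        return "";
    }
}
```

Note that a string whose overall direction matches the context still gets a mark when its edge character is of the opposite direction, for example an LTR sentence that ends in a Hebrew word.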
public java.lang.String unicodeWrap(java.lang.String str, TextDirectionHeuristicCompat heuristic, boolean isolate)
Formats a string of given directionality for use in plain-text output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. This makes use of Unicode bidi formatting characters.
The algorithm: In case the given directionality doesn't match the context directionality, wraps the string with Unicode bidi formatting characters: RLE+{@code str}+PDF for RTL text, or LRE+{@code str}+PDF for LTR text.
If {@code isolate}, directionally isolates the string so that it does not garble its surroundings. Currently, this is done by "resetting" the directionality after the string by appending a trailing Unicode bidi mark matching the context directionality (LRM or RLM) when either the overall directionality or the exit directionality of the string is opposite to that of the context. If the formatter was built using {@link Builder#stereoReset(boolean)} and passing "true" as an argument, also prepends a Unicode bidi mark matching the context directionality when either the overall directionality or the entry directionality of the string is opposite to that of the context. Note that as opposed to the overall directionality, the entry and exit directionalities are determined from the string itself.
Does *not* do HTML-escaping.
param
str The input string.
param
heuristic The algorithm to be used to estimate the string's overall direction.
param
isolate Whether to directionally isolate the string to prevent it from garbling the content around it.
return
Input string after applying the above processing.
final boolean isRtl = heuristic.isRtl(str, 0, str.length());
StringBuilder result = new StringBuilder();
if (getStereoReset() && isolate) {
    result.append(markBefore(str,
            isRtl ? TextDirectionHeuristicsCompat.RTL : TextDirectionHeuristicsCompat.LTR));
}
if (isRtl != mIsRtlContext) {
    result.append(isRtl ? RLE : LRE);
    result.append(str);
    result.append(PDF);
} else {
    result.append(str);
}
if (isolate) {
    result.append(markAfter(str,
            isRtl ? TextDirectionHeuristicsCompat.RTL : TextDirectionHeuristicsCompat.LTR));
}
return result.toString();
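
To see what the wrapped output looks like without an Android dependency, the algorithm can be approximated in plain Java. The sketch below is an assumption-laden simplification, not the library's implementation: the class UnicodeWrapSketch is hypothetical, the string's overall direction is passed in as a boolean instead of being estimated by a heuristic, only the trailing "reset" is performed (no stereo reset), and the exit scan ignores embedding characters.

```java
/** Simplified, standalone approximation of unicodeWrap with isolation after the string. */
final class UnicodeWrapSketch {
    private static final char LRE = '\u202A';
    private static final char RLE = '\u202B';
    private static final char PDF = '\u202C';
    private static final char LRM = '\u200E';
    private static final char RLM = '\u200F';

    /** Returns 1 if the last strong char is RTL, -1 if LTR, 0 if there is none. */
    private static int exitDir(String str) {
        for (int i = str.length() - 1; i >= 0; i--) {
            byte d = Character.getDirectionality(str.charAt(i));
            if (d == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return -1;
            }
            if (d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return 1;
            }
        }
        return 0;
    }

    /** Wraps str (of known direction isRtl) for insertion into a context of direction rtlContext. */
    static String wrap(String str, boolean isRtl, boolean rtlContext) {
        StringBuilder result = new StringBuilder();
        if (isRtl != rtlContext) {
            // Declare the string's direction with an embedding.
            result.append(isRtl ? RLE : LRE).append(str).append(PDF);
        } else {
            result.append(str);
        }
        // "Reset" to the context direction after the string when needed.
        int exit = exitDir(str);
        if (!rtlContext && (isRtl || exit > 0)) {
            result.append(LRM);
        } else if (rtlContext && (!isRtl || exit < 0)) {
            result.append(RLM);
        }
        return result.toString();
    }
}
```

Under these assumptions, a Hebrew string placed in an LTR context comes back as RLE + string + PDF + LRM, while a string that already matches the context and ends in a same-direction character is returned unchanged.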
public java.lang.String unicodeWrap(java.lang.String str, TextDirectionHeuristicCompat heuristic)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but assumes {@code isolate} is true.
param
str The input string.
param
heuristic The algorithm to be used to estimate the string's overall direction.
return
Input string after applying the above processing.
return unicodeWrap(str, heuristic, true /* isolate */);
public java.lang.String unicodeWrap(java.lang.String str, boolean isolate)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but uses the formatter's default direction estimation algorithm.
param
str The input string.
param
isolate Whether to directionally isolate the string to prevent it from garbling the content around it.
return
Input string after applying the above processing.
return unicodeWrap(str, mDefaultTextDirectionHeuristicCompat, isolate);
public java.lang.String unicodeWrap(java.lang.String str)
Operates like {@link #unicodeWrap(String, android.support.v4.text.TextDirectionHeuristicCompat, boolean)}, but uses the formatter's default direction estimation algorithm and assumes {@code isolate} is true.
param
str The input string.
return
Input string after applying the above processing.
return unicodeWrap(str, mDefaultTextDirectionHeuristicCompat, true /* isolate */);