<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://okapiframework.org/wiki/index.php?action=history&amp;feed=atom&amp;title=SRX_and_Java</id>
	<title>SRX and Java - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://okapiframework.org/wiki/index.php?action=history&amp;feed=atom&amp;title=SRX_and_Java"/>
	<link rel="alternate" type="text/html" href="http://okapiframework.org/wiki/index.php?title=SRX_and_Java&amp;action=history"/>
	<updated>2026-05-24T09:32:39Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.2</generator>
	<entry>
		<id>http://okapiframework.org/wiki/index.php?title=SRX_and_Java&amp;diff=327&amp;oldid=prev</id>
		<title>Ysavourel: 1 revision imported</title>
		<link rel="alternate" type="text/html" href="http://okapiframework.org/wiki/index.php?title=SRX_and_Java&amp;diff=327&amp;oldid=prev"/>
		<updated>2016-06-04T23:19:59Z</updated>

		<summary type="html">&lt;p&gt;1 revision imported&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 19:19, 4 June 2016&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-notice&quot; lang=&quot;en&quot;&gt;&lt;div class=&quot;mw-diff-empty&quot;&gt;(No difference)&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</summary>
		<author><name>Ysavourel</name></author>
	</entry>
	<entry>
		<id>http://okapiframework.org/wiki/index.php?title=SRX_and_Java&amp;diff=326&amp;oldid=prev</id>
		<title>Jhargraveiii at 18:04, 15 July 2015</title>
		<link rel="alternate" type="text/html" href="http://okapiframework.org/wiki/index.php?title=SRX_and_Java&amp;diff=326&amp;oldid=prev"/>
		<updated>2015-07-15T18:04:58Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;The SRX 2.0 standard is based on the [http://www.gala-global.org/oscarStandards/srx/srx20.html#Intro_RegExp ICU regular expression notation].&lt;br /&gt;
&lt;br /&gt;
Many Java applications use Java's regular expressions to implement [[SRX]] because ICU4J (ICU for Java) does not support ICU regular expressions.&lt;br /&gt;
&lt;br /&gt;
As of version 1.7, Java supports most of the Unicode-enabled features described in ICU. For example, when the UNICODE_CHARACTER_CLASS flag is set, &amp;quot;&amp;lt;code&amp;gt;\w&amp;lt;/code&amp;gt;&amp;quot; in Java matches Unicode letters and digits, much like &amp;quot;&amp;lt;code&amp;gt;[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]&amp;lt;/code&amp;gt;&amp;quot; in ICU. Some ICU features can be replaced by an equivalent expression in Java, but others simply cannot be implemented in Java.&lt;br /&gt;
&lt;br /&gt;
The following table lists the differences between ICU and Java (assuming the UNICODE_CHARACTER_CLASS flag is set). The &amp;lt;span class=&amp;quot;hi&amp;quot;&amp;gt;yellow entries&amp;lt;/span&amp;gt; denote cases where the ICU expression needs to be mapped to a Java equivalent (sometimes a complex one), and the &amp;lt;span class=&amp;quot;red&amp;quot;&amp;gt;red entries&amp;lt;/span&amp;gt; denote cases where the ICU expression cannot be mapped in Java.&lt;br /&gt;
&lt;br /&gt;
{{NoteBox|Starting in M28, '''the Okapi implementation of SRX no longer uses the ICU Regex option by default.''' Java patterns are used, and Unicode processing is enabled via the UNICODE_CHARACTER_CLASS flag. (You can test this, for example, in [[Ratel]].)}}&lt;br /&gt;
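&lt;br /&gt;
The effect of the UNICODE_CHARACTER_CLASS flag can be illustrated with a minimal sketch. The class name UnicodeClassDemo and the helper method matches are hypothetical names chosen for this example; this is not Okapi code, only the standard java.util.regex API. It compares how \w and \d treat non-ASCII input with and without the flag.&lt;br /&gt;

```java
import java.util.regex.Pattern;

// Hypothetical demo class (not part of Okapi): compares Java regex behavior
// with and without Pattern.UNICODE_CHARACTER_CLASS.
public class UnicodeClassDemo {

    // Returns true when the whole input matches the regex under the given flags.
    public static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "\u00e9t\u00e9"; // a word containing non-ASCII letters
        String digit = "\u0663";       // ARABIC-INDIC DIGIT THREE, General Category Nd

        // Without the flag, \w and \d only cover ASCII characters.
        System.out.println(matches("\\w+", 0, word));  // false
        System.out.println(matches("\\d", 0, digit));  // false

        // With the flag, \w and \d follow the Unicode definitions.
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;
        System.out.println(matches("\\w+", unicode, word));  // true
        System.out.println(matches("\\d", unicode, digit));  // true
    }
}
```

The same behavior can also be requested inside the pattern itself with the embedded flag expression (?U), which may be convenient when the pattern is the only thing a rule file can carry.&lt;br /&gt;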
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;5&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|+&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| '''ICU Meta Character''' || '''Java Equivalent''' || '''ICU Description''' &lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \a || same || Match a BELL, \u0007&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \A || same || Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \b, outside of a set || same || Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. The UREGEX_UWORD option is assumed NOT to be set (the default).&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \b, within a set || \b is invalid when within a set.&amp;lt;br/&amp;gt;Use \u0008 instead. || Match a BACKSPACE, \u0008.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \B || same || Match if the current position is not a word boundary. The UREGEX_UWORD option is assumed NOT to be set (the default).&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \cX || same || Match a control-X character.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \d || same || Match any character with the Unicode General Category of Nd (Number, Decimal Digit.)&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \D || same || Match any character that is not a decimal digit.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \e || same || Match an ESCAPE, \u001B.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \E || same || Terminates a \Q ... \E quoted sequence.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \f || same || Match a FORM FEED, \u000C.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \G || same || Match if the current position is at the end of the previous match.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \n || same || Match a LINE FEED, \u000A.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| style=&amp;quot;background-color:red;color:white;&amp;quot;|\N{UNICODE CHARACTER NAME} || Does not exist || Match the named character.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \p{UNICODE PROPERTY NAME} || same || Match any character with the specified Unicode Property.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \P{UNICODE PROPERTY NAME} || same || Match any character not having the specified Unicode Property.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \Q || same || Quotes all following characters until \E.  &lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \r || same || Match a CARRIAGE RETURN, \u000D.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \s || same || Match a white space character. White space is defined as [\t\n\f\r\p{Z}].&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \S || same || Match a non-white space character.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \t || same ||Match a HORIZONTAL TABULATION, \u0009.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
|\uhhhh ||same || Match the character with the hex value hhhh.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| style=&amp;quot;background-color:red;color:white;&amp;quot;|\Uhhhhhhhh || Does not exist ||Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \w || same || Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \W || same || Match a non-word character.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \x{hhhh} || same || Match the character with hex value hhhh.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \xhh || same || Match the character with two digit hex value hh.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| style=&amp;quot;background-color:yellow;color:black;&amp;quot;|\X || Can approximate with complex regex (see extended grapheme support at bottom of page): [http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions Unicode Java Regex Equivalents] || Match a [http://www.unicode.org/unicode/reports/tr29/ Grapheme Cluster].&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \Z || same || Match if the current position is at the end of input, but before the final line terminator, if one exists.  &lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \z || same || Match if the current position is at the end of input.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \0nnn || same || Match the character with octal value nnn.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \n (n is a group number) || same || Back Reference. Match whatever the nth capturing group matched. n must be ≥ 1 and ≤ the total number of capture groups in the pattern.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| [pattern] || same || Match any one character from the set. See [http://icu.sourceforge.net/userguide/unicodeSet.html UnicodeSet] for a full description of what may appear in the pattern.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| . || same || Match any character.&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| ^ || same || Match at the beginning of a line. &lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| $ || same || Match at the end of a line. &lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| \ || same || &amp;lt;nowiki&amp;gt;Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
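&lt;br /&gt;
As an illustration of one mapping in the table above, the following sketch checks the row for \b within a set: the ICU form [\b] is tried as a Java pattern, then replaced by [\u0008]. The class name BackspaceInSetDemo and its methods are hypothetical names chosen for this example, and only the standard java.util.regex API is assumed.&lt;br /&gt;

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: checks the table row for "\b, within a set".
public class BackspaceInSetDemo {

    // Returns true if the regex compiles as a Java pattern, false otherwise.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if the Java replacement set matches a lone BACKSPACE character.
    public static boolean matchesBackspace() {
        // In the Java string literal "\b" is the BACKSPACE character itself.
        return Pattern.compile("[\\u0008]").matcher("\b").matches();
    }

    public static void main(String[] args) {
        // The ICU form, \b inside a set, is rejected by the Java regex compiler.
        System.out.println(compiles("[\\b]"));
        // The replacement suggested in the table compiles and matches BACKSPACE.
        System.out.println(compiles("[\\u0008]"));
        System.out.println(matchesBackspace());
    }
}
```

A converter from ICU rules to Java patterns would need to apply this kind of rewrite only inside sets, since \b outside a set keeps its word-boundary meaning in both engines.&lt;br /&gt;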
&lt;br /&gt;
[[Category:Segmentation]] [[Category:SRX]]&lt;/div&gt;</summary>
		<author><name>Jhargraveiii</name></author>
	</entry>
</feed>