Discussion:
Case insensitive Unicode Regex matching?
nikolaj lindberg
2008-09-11 09:50:34 UTC
Hi List[ScalaUser],

does anyone know if there is a way to make a scala.util.matching.Regex (the
one returned by RichString.r) case insensitive for Unicode strings?
That is, I want case insensitive matching that works for non-Latin
alphabets.

In Java, you can do

import java.util.regex.*;
Pattern p = Pattern.compile(regEx,
Pattern.CASE_INSENSITIVE|Pattern.UNICODE_CASE);

and since there appears to be a Java Pattern lurking inside Scala's Regex,
maybe there is some way to do this in Scala too (without resorting to using
the Java classes directly)?
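
For comparison, here is a minimal script-style sketch of what using the Java class directly from Scala might look like. The Cyrillic sample word and the value names are only illustrations, and the source file is assumed to be saved as UTF-8.

```scala
import java.util.regex.Pattern

// Compile with both flags, mirroring the Java snippet above.
// UNICODE_CASE makes the case-insensitive comparison Unicode-aware.
val p = Pattern.compile("спасибо", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)

// matches() requires the whole input to match the pattern.
val wholeMatch = p.matcher("СПАСИБО").matches()

// replaceAll removes every occurrence, regardless of case.
val stripped = p.matcher("СПАСИБО!").replaceAll("")
```

This works, but it bypasses scala.util.matching.Regex entirely, which is what the question is trying to avoid.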

Kind regards,
/nikolaj
Stepan Koltsov
2008-09-11 10:25:49 UTC
Use "(?ii)" in string. See java.util.regex.Pattern javadoc.

S.

Stepan Koltsov
2008-09-11 10:40:45 UTC
I've added a comment to the Regex scaladoc:

https://lampsvn.epfl.ch/trac/scala/changeset/16090/

S.
nikolaj lindberg
2008-09-11 12:37:05 UTC
On Thu, Sep 11, 2008 at 12:25 PM, Stepan Koltsov wrote:
> Use "(?ii)" in string. See java.util.regex.Pattern javadoc.
Thanks,

but I cannot make that work. This is what I get:

println("(?ii)thanks".r.replaceAllIn("THANKS!", "")) // Prints "!"
println("(?ii)спасибо".r.replaceAllIn("СПАСИБО!", "")) // Prints "СПАСИБО!", not "!"

i.e., the upper- and lower-case non-Latin UTF-8 characters don't match. I ran the
above as a script in 2.7.1.final.

Any idea of what I might be doing wrong...?

/nikolaj
nikolaj lindberg
2008-09-11 12:50:06 UTC
On Thu, Sep 11, 2008 at 12:25 PM, Stepan Koltsov wrote:
> Use "(?ii)" in string. See java.util.regex.Pattern javadoc.
Ok, it should be "(?iu)"; then it works as expected.
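
A minimal script-style sketch of the difference, assuming the source file is saved as UTF-8. The value names and the Cyrillic sample word are only illustrative. As far as I understand the java.util.regex.Pattern javadoc, the embedded flag i on its own is case insensitive for US-ASCII letters only, and u switches on Unicode-aware case folding.

```scala
import scala.util.matching.Regex

// (?i) alone: case-insensitive matching for US-ASCII letters only.
val asciiOnly: Regex = "(?i)спасибо".r

// (?iu): also enables UNICODE_CASE, so non-ASCII letters are case-folded too.
val unicodeAware: Regex = "(?iu)спасибо".r

val input = "СПАСИБО!"

// The Cyrillic letters differ in case, so nothing is replaced here.
val unchanged = asciiOnly.replaceAllIn(input, "")

// With the u flag the word is matched and removed, leaving only the "!".
val stripped = unicodeAware.replaceAllIn(input, "")

// findFirstIn returns an Option[String] holding the matched text, if any.
val found = unicodeAware.findFirstIn(input)
```

The flags can stay inside the pattern string, so no java.util.regex import is needed on the Scala side.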

Thanks,
/nikolaj