java.lang.Object
  org.apache.solr.analysis.PatternTokenizerFactory
public class PatternTokenizerFactory
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments, "pattern" and "group":

"pattern" is the regular expression.
"group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output from: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)

Using group >= 0 selects the matching group as the token. For example, if you have:

pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
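The "pattern" and "group" behaviour described above can be approximated with plain java.util.regex. The following is a minimal sketch, not the factory's actual implementation: the class name PatternTokenSketch and the method tokenize are hypothetical, it returns plain strings rather than Token objects, and the pattern is written without the backslashes because a quote needs no escaping in a Java regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Solr.
public class PatternTokenSketch {

    // Assumed helper: mimics the documented "group" argument on plain strings.
    static List<String> tokenize(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            // group = -1: the pattern acts as a delimiter, as in String.split
            tokens.addAll(Arrays.asList(pattern.split(input)));
        } else {
            // group >= 0: every match contributes the selected group as a token
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                tokens.add(matcher.group(group));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokenize(regex, 0, input));  // prints ['bbb', 'ccc']
        System.out.println(tokenize(regex, 1, input));  // prints [bbb, ccc]
        System.out.println(tokenize(regex, -1, input)); // the text between matches
    }
}
```

With group=-1 the sketch returns whatever String.split semantics give for the text between matches, whitespace included; how the real tokenizer treats such fragments is not stated here.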
Field Summary | |
---|---|
protected Map<String,String> |
args
|
protected int |
group
|
static String |
GROUP
|
protected Pattern |
pattern
|
static String |
PATTERN
|
Constructor Summary | |
---|---|
PatternTokenizerFactory()
|
Method Summary | |
---|---|
TokenStream |
create(Reader input)
Split the input using the configured pattern |
Map<String,String> |
getArgs()
The arguments passed to init() |
static List<Token> |
group(Matcher matcher,
String input,
int group)
Create tokens from the matches in a matcher |
void |
init(Map<String,String> args)
Require a configured pattern |
static List<Token> |
split(Matcher matcher,
String input)
This behaves just like String.split(), but returns a list of Tokens rather than an array of strings |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String PATTERN
public static final String GROUP
protected Map<String,String> args
protected Pattern pattern
protected int group
Constructor Detail |
---|
public PatternTokenizerFactory()
Method Detail |
---|
public void init(Map<String,String> args)
Specified by: init in interface TokenizerFactory
public Map<String,String> getArgs()
Specified by: getArgs in interface TokenizerFactory
public TokenStream create(Reader input)
Specified by: create in interface TokenizerFactory
public static List<Token> split(Matcher matcher, String input)
public static List<Token> group(Matcher matcher, String input, int group)