JAVA中如何高效的实现SQL的like语法?
阿里妹导读
本文主要介绍了一些主流的解析器是怎么实现like的语法逻辑,接着作者分析了几种实现方式的优劣,终采用状态机的方式,针对场景一步一步进行性能优化。
提及
近在优化项目的like语法,那既然谈到了SQL,我们不妨来看看一些主流的解析器是怎么实现like的语法逻辑。这里需要提一下主流的两种SQL解析器,它们分别是ANTLR和Calcite。
/** SQL {@code LIKE} function. */public static boolean like(String s,String pattern){ final String regex = Like.sqlToRegexLike(pattern, null); return Pattern.matches(regex, s);}
/** Translates a SQL LIKE pattern to Java regex pattern.*/static String sqlToRegexLike(String sqlPattern,char escapeChar) { int i; final int len = sqlPattern.length(); final StringBuilder javaPattern = new StringBuilder(len + len); for (i = ; i < len; i++) { char c = sqlPattern.charAt(i); if (JAVA_REGEX_SPECIALS.indexOf(c) >= ) { javaPattern.append('\\'); } if (c == escapeChar) { if (i == (sqlPattern.length() - 1)) { throw invalidEscapeSequence(sqlPattern, i); } char nextChar = sqlPattern.charAt(i + 1); if ((nextChar == '_') || (nextChar == '%') || (nextChar == escapeChar)) { javaPattern.append(nextChar); i++; } else { throw invalidEscapeSequence(sqlPattern, i); } } else if (c == '_') { javaPattern.append('.'); } else if (c == '%') { javaPattern.append("(?s:.*)"); } else { javaPattern.append(c); } } return javaPattern.toString();}
...try { Pattern pattern = patterns.get(buildKey(right, escTmp), new Callable<Pattern>() { public Pattern call() throws Exception { return Pattern.compile(buildPattern(right, escTmp), Pattern.CASE_INSENSITIVE); } }); Matcher m = pattern.matcher(left); return m.matches() ? 1l : 0l;} catch (ExecutionException e) { throw new FunctionException(e.getCause());}...
到此,综上来看,不少项目是基于正则表达式来完成的,接下来我整理了下我近实现的几种方式:
正则表达式实现
public static boolean like(final String dest, final String pattern) { String regex = regexParse(pattern); regex = regex.replace("_",".").replace("%",".*?"); Pattern p = Pattern.compile(regex,Pattern.CASE_INSENSITIVE | Pattern.DOTALL); return p.matcher(dest).matches();}这种方式在代码层面简单明了,但是性能非常差,多次replace的使用就已经进行了多次遍历,这里有个可以优化的点,对于单个字符做替换可以选择用replaceChars(str, searchChar, replaceChar)这个方案。
贪婪模式
懒惰模式
独占模式
简单算法实现
public static boolean like(final String dest, final String pattern) { int destPointer = , patternPointer = ; int destRecall = -1, patternRecall = -1; final int patternLen = pattern.length(); final int destLen = dest.length(); while( destPointer < destLen) { ...... ...... } while(patternPointer < patternLen && pattern.chatAt(patternPointer) == '%') { patternPointer++; } return patternPointer == patternLen;} 有个场景我们不得不去考虑,那就是回溯的情况,举个例子:
有限状态机实现
穷举法
查表法
状态模式
public void compile(final String pattern) { ... LikeStateMachine machine = LikeStateMachine.build(pattern); ...}
构建的过程就是我们把pattern解析加载的过程,我采用的方式是构建链表的方式。实现就是遍历构建的过程,compile时间复杂度O(n)
回溯场景优化
常用场景优化
public LikeParserResult compile(final String pattern) { return parseLikeExpress(pattern);}
....public boolean match(final String dest, LikeParserResult likeParserResult) { switch (likeParserResult.getMatchResult()) { case LikeMatchResult.TYPE.ALLMATCH: return true; case LikeMatchResult.TYPE.EQUALS: return doEquals(dest, likeParserResult.getFinalPattern()); case LikeMatchResult.TYPE.STARTSWITH: return doStartsWith(dest, likeParserResult.getFinalPattern()); case LikeMatchResult.TYPE.ENDSWITH: return doEndsWith(dest, likeParserResult.getFinalPattern()); case LikeMatchResult.TYPE.CONTAINS: return doContain(dest, likeParserResult.getFinalPattern()); default: //或者别的实现 return likeParserResult.getLikeStateMachine().match(dest); }}
上面给出的代码是为了清楚的看到里面运行,终根据自己擅长的代码风格(函数式Function,接口等),还可以进一步优化和精简。
...public LikeParserResult compile(final String pattern) { return parseLikeExpress(pattern);}
public boolean match(final String dest, LikeParserResult likeParserResult) { return likeParserResult.getMatcher().match(dest);}
...
public class StartsWithMatcher implements Matcher { ... public Boolean match(String dest) { return dest.startsWith(this.pattern); }}...
压测数据对比
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
后
相关文章