WordPress Chinese Word-Segmentation Search

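The screenshot at the top of this post shows the search results after the improvement. By most accounts online, WordPress's built-in search is not very good: it appears to match the query as one whole string, so searching for the keyword "ida调试器" returned nothing at all.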

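The blog doesn't get much traffic, but being somewhat obsessive about these things, I found this result hard to live with. I looked around for related articles and plugins, and none of them seemed to help, so I had to build it myself. Jieba word segmentation is fairly convenient to use in Python, and a quick search turned up a PHP version of jieba: https://github.com/jonnywang/phpjieba. That makes things simple. First install the jieba extension by following the instructions on GitHub. During installation you may run into the following error: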

configure: error: Cannot find php-config. Please use --with-php-config=PATH

This happens because configure could not locate the php-config file. Pass its path with the --with-php-config option: ./configure --with-php-config=/usr/local/php/bin/php-config
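If you are not sure where php-config lives on your machine, a small helper can check a few likely directories before falling back to PATH. This is only a sketch: the function name find_php_config is made up for illustration, and the directories in the usage comment are guesses based on the paths seen in this post.

```shell
#!/bin/sh
# Print the first executable php-config found in the given directories.
# If none of them has one, fall back to whatever is on PATH.
# (find_php_config is a hypothetical helper, not part of phpjieba.)
find_php_config() {
    for dir in "$@"; do
        if [ -x "$dir/php-config" ]; then
            printf '%s\n' "$dir/php-config"
            return 0
        fi
    done
    # Prints nothing and returns non-zero when php-config is not on PATH.
    command -v php-config
}

# Example usage, run from the phpjieba checkout after phpize:
#   PHP_CONFIG=$(find_php_config /usr/local/php/bin /usr/bin) \
#     && ./configure --with-php-config="$PHP_CONFIG"
```

Picking the php-config that belongs to the same PHP build that php-fpm runs matters here, because the extension is compiled against that build's headers and installed into its extension directory.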

The complete installation commands and their output are as follows:

root@blog:~# git clone https://github.com/jonnywang/phpjieba.git
Cloning into 'phpjieba'...
remote: Enumerating objects: 178, done.
remote: Total 178 (delta 0), reused 0 (delta 0), pack-reused 178
Receiving objects: 100% (178/178), 4.25 MiB | 220.00 KiB/s, done.
Resolving deltas: 100% (73/73), done.
root@blog:~# cd phpjieba/cjieba/
root@blog:~/phpjieba/cjieba# make
g++ -O2 -o lib/cjieba.o -c -DLOGGING_LEVEL=LL_WARNING -I./deps/ include/jieba.cpp
g++ -O2 -fPIC -o lib/libcjieba.so -c -DLOGGING_LEVEL=LL_WARNING -I./deps/ include/jieba.cpp
ar rs lib/libcjieba.a lib/cjieba.o 
ar: creating lib/libcjieba.a
gcc -O2 -o demo demo.c -L./lib -lcjieba -lstdc++ -lm
root@blog:~/phpjieba/cjieba# cd ..
root@blog:~/phpjieba# phpize 
Configuring for:
PHP Api Version:         20180731
Zend Module Api No:      20180731
Zend Extension Api No:   320180731
root@blog:~/phpjieba# ./config
-bash: ./config: No such file or directory
root@blog:~/phpjieba# ./configure 
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for a sed that does not truncate output... /usr/bin/sed
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
checking how to run the C preprocessor... cc -E
checking for icc... no
checking for suncc... no
checking whether cc understands -c and -o together... yes
checking for system library directory... lib
checking if compiler supports -R... no
checking if compiler supports -Wl,-rpath,... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
configure: error: Cannot find php-config. Please use --with-php-config=PATH
root@blog:~/phpjieba# make
make: *** No targets specified and no makefile found.  Stop.
root@blog:~/phpjieba# which php
/usr/bin/php
root@blog:~/phpjieba# ls /usr/local/php/bin/php-config 
/usr/local/php/bin/php-config
root@blog:~/phpjieba# ./configure --with-php-config=/usr/local/php/bin/php-config 
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for a sed that does not truncate output... /usr/bin/sed
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
checking how to run the C preprocessor... cc -E
checking for icc... no
checking for suncc... no
checking whether cc understands -c and -o together... yes
checking for system library directory... lib
checking if compiler supports -R... no
checking if compiler supports -Wl,-rpath,... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for PHP prefix... /usr/local/php
checking for PHP includes... -I/usr/local/php/include/php -I/usr/local/php/include/php/main -I/usr/local/php/include/php/TSRM -I/usr/local/php/include/php/Zend -I/usr/local/php/include/php/ext -I/usr/local/php/include/php/ext/date/lib
checking for PHP extension directory... /usr/local/php/lib/php/extensions/no-debug-non-zts-20180731
checking for PHP installed headers prefix... /usr/local/php/include/php
checking if debug is enabled... no
checking if zts is enabled... no
checking for re2c... re2c
checking for re2c version... 1.3 (ok)
checking for gawk... gawk
checking whether to enable jieba support... yes, shared
checking for ld used by cc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for /usr/bin/ld option to reload object files... -r
checking for BSD-compatible nm... /usr/bin/nm -B
checking whether ln -s works... yes
checking how to recognize dependent libraries... pass_all
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking the maximum length of command line arguments... 1572864
checking command to parse /usr/bin/nm -B output from cc object... ok
checking for objdir... .libs
checking for ar... ar
checking for ranlib... ranlib
checking for strip... strip
checking if cc supports -fno-rtti -fno-exceptions... no
checking for cc option to produce PIC... -fPIC
checking if cc PIC flag -fPIC works... yes
checking if cc static flag -static works... yes
checking if cc supports -c -o file.o... yes
checking whether the cc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no

creating libtool
appending configuration tag "CXX" to libtool
configure: creating ./config.status
config.status: creating config.h
root@blog:~/phpjieba# make
/bin/bash /root/phpjieba/libtool --mode=compile cc  -I. -I/root/phpjieba -DPHP_ATOM_INC -I/root/phpjieba/include -I/root/phpjieba/main -I/root/phpjieba -I/usr/local/php/include/php -I/usr/local/php/include/php/main -I/usr/local/php/include/php/TSRM -I/usr/local/php/include/php/Zend -I/usr/local/php/include/php/ext -I/usr/local/php/include/php/ext/date/lib -I/root/phpjieba/cjieba/include  -DHAVE_CONFIG_H  -g -O2   -c /root/phpjieba/jieba.c -o jieba.lo 
mkdir .libs
 cc -I. -I/root/phpjieba -DPHP_ATOM_INC -I/root/phpjieba/include -I/root/phpjieba/main -I/root/phpjieba -I/usr/local/php/include/php -I/usr/local/php/include/php/main -I/usr/local/php/include/php/TSRM -I/usr/local/php/include/php/Zend -I/usr/local/php/include/php/ext -I/usr/local/php/include/php/ext/date/lib -I/root/phpjieba/cjieba/include -DHAVE_CONFIG_H -g -O2 -c /root/phpjieba/jieba.c  -fPIC -DPIC -o .libs/jieba.o
/bin/bash /root/phpjieba/libtool --mode=link cc -DPHP_ATOM_INC -I/root/phpjieba/include -I/root/phpjieba/main -I/root/phpjieba -I/usr/local/php/include/php -I/usr/local/php/include/php/main -I/usr/local/php/include/php/TSRM -I/usr/local/php/include/php/Zend -I/usr/local/php/include/php/ext -I/usr/local/php/include/php/ext/date/lib -I/root/phpjieba/cjieba/include  -DHAVE_CONFIG_H  -g -O2    -o jieba.la -export-dynamic -avoid-version -prefer-pic -module -rpath /root/phpjieba/modules  jieba.lo -Wl,-rpath,/root/phpjieba/cjieba/lib -L/root/phpjieba/cjieba/lib -lcjieba -lstdc++
cc -shared  .libs/jieba.o  -L/root/phpjieba/cjieba/lib -lcjieba -lstdc++  -Wl,-rpath -Wl,/root/phpjieba/cjieba/lib -Wl,-soname -Wl,jieba.so -o .libs/jieba.so
creating jieba.la
(cd .libs && rm -f jieba.la && ln -s ../jieba.la jieba.la)
/bin/bash /root/phpjieba/libtool --mode=install cp ./jieba.la /root/phpjieba/modules
cp ./.libs/jieba.so /root/phpjieba/modules/jieba.so
cp ./.libs/jieba.lai /root/phpjieba/modules/jieba.la
PATH="$PATH:/sbin" ldconfig -n /root/phpjieba/modules
----------------------------------------------------------------------
Libraries have been installed in:
   /root/phpjieba/modules

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,--rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------

Build complete.
Don't forget to run 'make test'.

root@blog:~/phpjieba# make install
Installing shared extensions:     /usr/local/php/lib/php/extensions/no-debug-non-zts-20180731/
root@blog:~/phpjieba# ls /usr/local/php/lib/php/extensions/no-debug-non-zts-20180731/
jieba.so  opcache.a  opcache.so
root@blog:~/phpjieba# ls ~/phpjieba/cjieba/dict/
hmm_model.utf8  idf.utf8  jieba.dict.utf8  stop_words.utf8  user.dict.utf8
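With the output stripped away, the session above comes down to a handful of commands. The sketch below wraps them in a function so they can be previewed first. The function name build_phpjieba and the RUN switch are hypothetical additions for illustration, and the default php-config path is simply the one from this machine.

```shell
#!/bin/sh
# Condensed build steps taken from the session above.
# $1  : path to php-config (defaults to the path used in this post).
# RUN : hypothetical preview switch; set RUN=echo to print each step
#       instead of executing it.
# The body runs in a subshell, so the cd calls do not affect the caller.
build_phpjieba() (
    php_config=${1:-/usr/local/php/bin/php-config}
    run=${RUN:-}
    $run git clone https://github.com/jonnywang/phpjieba.git &&
    $run cd phpjieba/cjieba &&
    $run make &&
    $run cd .. &&
    $run phpize &&
    $run ./configure --with-php-config="$php_config" &&
    $run make &&
    $run make install
)

# Preview the steps without running anything:
#   RUN=echo build_phpjieba /usr/local/php/bin/php-config
```

Chaining the steps with && stops the build at the first failure, which avoids the situation in the log where make was run after configure had already failed.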

Once the installation finishes, edit php.ini and add the following:

; jieba
extension=jieba.so
jieba.enable=1
jieba.dict_path=/root/phpjieba/cjieba/dict
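Once those lines are in php.ini, a quick command-line check can confirm that PHP picks up the extension. The helper below is a sketch; the name jieba_loaded is made up for illustration, and it only inspects the PHP CLI binary you give it.

```shell
#!/bin/sh
# Succeed if the given PHP CLI lists a module named "jieba".
# $1 is the path to the php binary (defaults to "php" on PATH).
# (jieba_loaded is a hypothetical helper, not part of phpjieba.)
jieba_loaded() {
    php_bin=${1:-php}
    # "php -m" prints one module name per line; match the whole line.
    "$php_bin" -m 2>/dev/null | grep -qix 'jieba'
}

# Example usage:
#   jieba_loaded /usr/local/php/bin/php && echo "jieba extension is loaded"
#   /usr/local/php/bin/php --ini   # shows which php.ini the CLI reads
#   /usr/local/php/bin/php -r 'print_r(jieba("ida调试器"));'
```

Keep in mind that the CLI and php-fpm can read different php.ini files, so a passing check here does not guarantee that the web side sees the extension. It may also be worth confirming that the user php-fpm runs as can read the directory given in jieba.dict_path, since it sits under /root in this setup.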

After that, restart the php-fpm service. You can then run test_jieba.php from phpjieba's example directory to check that the extension works.

If it runs correctly, the installation succeeded and step one is done. Step two is to modify the search code.

Add the following code to your theme's functions.php:

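function custom_search( $search_result, $wp_query ) {
    global $wpdb;

    // Only touch real search queries, and only when the jieba extension is loaded.
    if ( ! $wp_query->is_search || ! function_exists( 'jieba' ) ) {
        return $search_result;
    }
    if ( ! isset( $wp_query->query_vars['s'] ) || $wp_query->query_vars['s'] === '' ) {
        return $search_result;
    }

    // Split the search phrase into words with jieba.
    $key_string = $wp_query->query_vars['s'];
    $keywords   = jieba( $key_string );
    if ( ! is_array( $keywords ) || count( $keywords ) === 0 ) {
        return $search_result;
    }

    // Build one AND clause per word: a post must match every word
    // in its title, its content, or one of its meta values.
    $clauses = '';
    foreach ( $keywords as $keyword ) {
        $keyword = trim( $keyword );
        if ( $keyword === '' ) {
            continue;
        }
        // esc_like() escapes % and _ in the word; prepare() quotes the value.
        $like     = '%' . $wpdb->esc_like( $keyword ) . '%';
        $clauses .= $wpdb->prepare(
            " AND (
                {$wpdb->posts}.post_title LIKE %s
                OR {$wpdb->posts}.post_content LIKE %s
                OR {$wpdb->posts}.ID IN (
                    SELECT DISTINCT post_id
                    FROM {$wpdb->postmeta}
                    WHERE meta_value LIKE %s
                )
            ) ",
            $like, $like, $like
        );
    }

    // If every token was blank, keep WordPress's original search clause.
    return $clauses !== '' ? $clauses : $search_result;
}
add_filter( 'posts_search', 'custom_search', 10, 2 );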

Once the code is added without errors, you can try out the new search.

If you also want the 404 page to support word segmentation, change its code to the following:

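// $result is not defined in this fragment; it is assumed to hold the
// keywords returned by jieba().
foreach ( $result as $value ) {
    // Run one search per segmented keyword.
    $the_query = new WP_Query( array( 's' => $value ) );
    if ( $the_query->have_posts() ) {
        while ( $the_query->have_posts() ) {
            $the_query->the_post(); ?>
            <?php // Assumed markup: a link to the post, then the keyword that matched. ?>
            <li><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a> -- (Keyword: <?php echo esc_html( $value ); ?>)</li>
        <?php }
    }
    // Restore the global post data after the custom loop.
    wp_reset_postdata();
}

Result after the change.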

References:

https://designsupply-web.com/media/knowledgeside/5811/
https://github.com/jonnywang/phpjieba
https://www.zhaokeli.com/article/1570.html

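☆ Copyright ☆

* Site name: obaby@mars
* URL: https://lang.ma/
* Vanity URL: https://oba.by/
* Post title: "WordPress Chinese Word-Segmentation Search" (original: 《WordPress 中文分词搜索》)
* Post link: https://lang.ma/2020/09/7555
* Short link: https://oba.by/?p=7555
* When reposting, please credit the source, the original title, and the original link, and follow the Attribution-NonCommercial-ShareAlike 2.5 China Mainland (CC BY-NC-SA 2.5 CN) license.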

