IBM webMethods Hybrid Integration

Question about using tf:containsText function on Chinese content

    Posted Mon September 16, 2013 02:51 AM

    Hi All,

    I would like to ask whether there are any techniques or tricks for using tf:containsText on UTF-8 content. I have an XML database with about 1 GB of data and a 4 GB index, containing both Chinese and English text. Each XML record ranges from 2XX KB to 9XX KB, and the database uses UTF-8 encoding.
    

    For English queries, I use a query like tf:containsText(title, 'Hong Kong') for the search. For Chinese, since there are no spaces between words, I need a query like tf:containsText(title, '*??*'), with an asterisk before and after the keyword. I have used this setup before without problems. This time, however, the database is considerably larger, and the response times differ a lot: English queries take a few seconds, while Chinese queries take 30 seconds or more.
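    A toy model can illustrate why the two query styles might behave so differently. This is a sketch under stated assumptions, not Tamino's actual index implementation: it assumes the text index splits content into words at whitespace and punctuation, so a run of Chinese characters with no spaces becomes one long term. All function names and sample documents are hypothetical. An exact word match is a single dictionary lookup per word, while a '*keyword*' search has to test every indexed term, so its cost grows with the size of the index.

```python
import re

def tokenize(text):
    # Hypothetical word-based tokenizer: split at whitespace/punctuation.
    # In Python 3, \w matches CJK characters, so "我住在香港" stays ONE token.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    # Toy inverted index: term -> set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def contains_words(index, query):
    # Exact word match (a crude stand-in for tf:containsText with a phrase):
    # one dictionary lookup per word, independent of vocabulary size.
    # This sketch only checks that every word occurs, not word order.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result

def wildcard_search(index, fragment):
    # '*fragment*': the fragment is not a dictionary key, so every
    # indexed term must be scanned; returns hits and number of terms scanned.
    hits, scanned = set(), 0
    for term, ids in index.items():
        scanned += 1
        if fragment in term:
            hits |= ids
    return hits, scanned

docs = {
    1: "Hong Kong harbour",
    2: "Kowloon and Hong Kong island",
    3: "我住在香港",
    4: "香港大學的歷史",
    5: "北京天氣",
}
index = build_index(docs)

if __name__ == "__main__":
    print(contains_words(index, "Hong Kong"))  # exact lookups
    print(contains_words(index, "香港"))        # no standalone term "香港"
    print(wildcard_search(index, "香港"))       # full scan of all terms
```

    In this model the Chinese keyword is never a standalone term, so only the scanning wildcard search finds it, and the number of terms scanned equals the whole vocabulary. That would be consistent with Chinese queries slowing down more than English ones as the index grows, though the real cause in Tamino may differ.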

    With the old X-Query, performance was acceptable, at most 20 seconds. Are there any settings I can change, or anything else I can do, to improve performance when using XQuery? I know there used to be a Chinese tokenizer, but it has been discontinued. Thank you for your help.


    #Tamino
    #API-Management
    #webMethods