java - AppEngine Approximate Partial String Matching Algorithm -


so, realize covers wide array of topics , pieces of them have been covered before on stackoverflow, such this question. similarly, partial string matching , approximate string matching popular algorithmic discussions, seems. however, using these ideas in conjunction suit problems both need discussed seems highly inefficient. i'm looking way combine 2 problems in 1 solution, efficiently.

right now, i'm using appengine java , persistent datastore. annoying, since doesn't seem have arithmetic usage in queries make things easier, i'm considering doing precalculation , storing field in database. essentially, idea friend , having on how possibly implement system matching , more or less hoping suggestions on how make more efficient. if needs scrapped in favor of better exists, can handle that, well.


let's start off basic example of i'd search for. consider following nonsense sentence:

the isolating layer rackets principal beneath hypocritical rubbish.

if user search

isalatig pri

i think starting match string, , value should returned. current method considering using assigns value test divisibility. essentially, there table following data

a: 2        b: 3        c: 5 d: 7        e: 11       f: 13 ... 

with each character being mapped prime number (multiple characters don't make difference, 1 character needed). , if query string divides string in database, value returned possible match.

after this, keywords aren't listed stopwords compared search string see if starting substrings of words in possible match under given threshold of edit distance (currently using levenshtein distance).

distance("isalatig", "isolating") == 2 distance("pri", "principal") == 0 // since principal has starting                                    // substring of pri passes 

the total distance each query ranked in ascending order , top n values returned person doing querying.


this basic idea behind algorithm, though since first time dealing such scenario, realize i'm missing important (or entire idea may wrong). best way handle current situation i'm trying implement. similarly, if there utilities appengine offers combat i'm trying do, please let me know.

first off, clarification: app engine doesn't allow arithmetic in queries because there's no efficient way query on result of arbitrary arithmetic expression. when in sql database, planner forced select inefficient query plan, involves scanning candidate records 1 one.

your scheme not work same reason: there's no way index integer such can efficiently query numbers divisible target number. other potential issues include words translate numbers large store in fixed length integer, , being unable distinguish between 'rental', 'learnt' , 'antler'.

if discard moment requirement matching arbitrary prefixes of strings, searching full-text indexing, typically implemented using inverted index , stemming. support fulltext search on app engine roadmap hasn't been released yet; in meantime best option appears searchablemodel, or using external search engine such google site search.


Comments

Popular posts from this blog

objective c - Change font of selected text in UITextView -

php - Accessing POST data in Facebook cavas app -

c# - Getting control value when switching a view as part of a multiview -