excel - Is there any way I can speed up this VBA algorithm? -
i looking implement vba trie-building algorithm able process substantial english lexicon (~50,000 words) in relatively short amount of time (less 15-20 seconds). since c++ programmer practice (and first time doing substantial vba work), built quick proof-of-concept program able complete task on computer in half second. when came time test vba port however, took 2 minutes same -- unacceptably long amount of time purposes. vba code below:
node class module:
public letter string public next_nodes new collection public is_word boolean
main module:
dim tree node sub build_trie() set tree = new node dim file, a, b, c integer dim current node dim wordlist collection set wordlist = new collection file = freefile open "c:\corncob_caps.txt" input file while not eof(file) dim line string line input #file, line wordlist.add line loop = 1 wordlist.count set current = tree b = 1 len(wordlist.item(a)) dim match boolean match = false dim char string char = mid(wordlist.item(a), b, 1) c = 1 current.next_nodes.count if char = current.next_nodes.item(c).letter set current = current.next_nodes.item(c) match = true exit end if next c if not match dim new_node node set new_node = new node new_node.letter = char current.next_nodes.add new_node set current = new_node end if next b current.is_word = true next end sub
my question simply, can algorithm sped up? saw sources vba collection
s not efficient dictionary
s , attempted dictionary
-based implementation instead took equal amount of time complete worse memory usage (500+ mb of ram used excel on computer). extremely new vba knowledge of both syntax overall features/limitations limited -- why don't believe algorithm efficient possibly be; tips/suggestions appreciated.
thanks in advance
nb: lexicon file referred code, "corncob_caps.txt", available here (download "all caps" file)
there number of small issues , few larger opportunities here. did first vba work, forgive me if i'm telling things know
small things first:
dim file, a, b, c integer
declares file, , b variants. integer
16 bit sign, there may risk of overflows, use long
instead.
dim
'ing inside loops counter-productive: unlike c++ not loop scoped.
the real opportunity is:
use for each
can iterate collections: faster indexing.
on hardware original code ran in 160s. code in 2.5s (both plus time load word file collection, 4s)
sub build_trie() dim t1 long dim wd variant dim nd node set tree = new node ' dim file, a, b, c integer : declares file, a, b variant dim file integer, long, b long, c long ' integer 16 bit signed dim current node dim wordlist collection set wordlist = new collection file = freefile open "c:\corncob_caps.txt" input file ' no point in doing inside loop, not scoped loop dim line string dim match boolean dim char string dim new_node node while not eof(file) 'dim line string line input #file, line wordlist.add line loop t1 = gettickcount each wd in wordlist ' each faster 'for = 1 wordlist.count set current = tree b = 1 len(wd) 'dim match boolean match = false 'dim char string char = mid$(wd, b, 1) each nd in current.next_nodes 'for c = 1 current.next_nodes.count if char = nd.letter 'if char = current.next_nodes.item(c).letter set current = nd 'set current = current.next_nodes.item(c) match = true exit end if next nd if not match 'dim new_node node set new_node = new node new_node.letter = char current.next_nodes.add new_node set current = new_node end if next b current.is_word = true next wd debug.print "time = " & gettickcount - t1 & " ms" end sub
edit:
loading word list dynamic array reduce load time sub second. aware redim preserve expensive, in chunks
dim long, sz long sz = 10000 dim wordlist() string redim wordlist(0 sz) file = freefile open "c:\corncob_caps.txt" input file = 0 while not eof(file) 'dim line string line input #file, line wordlist(i) = line = + 1 if > sz sz = sz + 10000 redim preserve wordlist(0 sz) end if 'wordlist.add line loop redim preserve wordlist(0 - 1)
then loop through
= 0 ubound(wordlist) wd = wordlist(i)
Comments
Post a Comment