excel - Is there any way I can speed up this VBA algorithm? -


i looking implement vba trie-building algorithm able process substantial english lexicon (~50,000 words) in relatively short amount of time (less 15-20 seconds). since c++ programmer practice (and first time doing substantial vba work), built quick proof-of-concept program able complete task on computer in half second. when came time test vba port however, took 2 minutes same -- unacceptably long amount of time purposes. vba code below:

node class module:

public letter string public next_nodes new collection public is_word boolean 

main module:

dim tree node  sub build_trie()     set tree = new node     dim file, a, b, c integer     dim current node     dim wordlist collection     set wordlist = new collection     file = freefile     open "c:\corncob_caps.txt" input file     while not eof(file)         dim line string         line input #file, line         wordlist.add line     loop     = 1 wordlist.count         set current = tree         b = 1 len(wordlist.item(a))             dim match boolean             match = false             dim char string             char = mid(wordlist.item(a), b, 1)             c = 1 current.next_nodes.count                 if char = current.next_nodes.item(c).letter                     set current = current.next_nodes.item(c)                     match = true                     exit                 end if             next c             if not match                 dim new_node node                 set new_node = new node                 new_node.letter = char                 current.next_nodes.add new_node                 set current = new_node             end if         next b         current.is_word = true     next end sub 

my question simply, can algorithm sped up? saw sources vba collections not efficient dictionarys , attempted dictionary-based implementation instead took equal amount of time complete worse memory usage (500+ mb of ram used excel on computer). extremely new vba knowledge of both syntax overall features/limitations limited -- why don't believe algorithm efficient possibly be; tips/suggestions appreciated.

thanks in advance

nb: lexicon file referred code, "corncob_caps.txt", available here (download "all caps" file)

there number of small issues , few larger opportunities here. did first vba work, forgive me if i'm telling things know

small things first:
dim file, a, b, c integer declares file, , b variants. integer 16 bit sign, there may risk of overflows, use long instead.

dim'ing inside loops counter-productive: unlike c++ not loop scoped.

the real opportunity is:

use for each can iterate collections: faster indexing.

on hardware original code ran in 160s. code in 2.5s (both plus time load word file collection, 4s)

sub build_trie()     dim t1 long     dim wd variant     dim nd node      set tree = new node     ' dim file, a, b, c integer  : declares file, a, b variant     dim file integer, long, b long, c long     ' integer 16 bit signed      dim current node     dim wordlist collection     set wordlist = new collection     file = freefile     open "c:\corncob_caps.txt" input file      ' no point in doing inside loop, not scoped loop     dim line string     dim match boolean     dim char string     dim new_node node      while not eof(file)         'dim line string         line input #file, line         wordlist.add line     loop       t1 = gettickcount     each wd in wordlist ' each faster     'for = 1 wordlist.count         set current = tree         b = 1 len(wd)             'dim match boolean             match = false             'dim char string             char = mid$(wd, b, 1)             each nd in current.next_nodes             'for c = 1 current.next_nodes.count                 if char = nd.letter                 'if char = current.next_nodes.item(c).letter                     set current = nd                     'set current = current.next_nodes.item(c)                     match = true                     exit                 end if             next nd             if not match                 'dim new_node node                 set new_node = new node                 new_node.letter = char                 current.next_nodes.add new_node                 set current = new_node             end if         next b         current.is_word = true     next wd      debug.print "time = " & gettickcount - t1 & " ms" end sub 

edit:

loading word list dynamic array reduce load time sub second. aware redim preserve expensive, in chunks

    dim long, sz long     sz = 10000     dim wordlist() string     redim wordlist(0 sz)      file = freefile     open "c:\corncob_caps.txt" input file      = 0     while not eof(file)         'dim line string         line input #file, line         wordlist(i) = line         = + 1         if > sz             sz = sz + 10000             redim preserve wordlist(0 sz)         end if         'wordlist.add line     loop     redim preserve wordlist(0 - 1) 

then loop through

    = 0 ubound(wordlist)         wd = wordlist(i) 

Comments

Popular posts from this blog

objective c - Change font of selected text in UITextView -

php - Accessing POST data in Facebook cavas app -

c# - Getting control value when switching a view as part of a multiview -