01.22.06

AVL Balanced Tree’s – Part 2

Posted in ALPHA, College, Computer Science, High School, School at 3:06 pm by Nick

Time: 2 hours
Total: 5 hours
——————-

I spent some time verifying the balance of the tree’s and everything looks pretty good. However, I have one node in -every- tree I create with a balance of 2 (which is bad). The code leads me to believe that this node used to be a root node that was later rotated out of the root spot.

I will attempt later (if my professor can give me a lead) to fix the problem, but overall the effects are extremely positive. I ran the program on a 338 meg text file named osu.medline, which contains various informations on effects of drugs, diseases, disorders, their causes, and things like that.

This is a sample output (I won’t paste all 7.2 million words it read) with the number of nodes/unique words capped at 200,000 (Note: words of length 3 or less are discareded):

Nodes w/ 0 bal: 136003
Nodes w/ 1 bal: 32057
Nodes w/-1 bal: 31939
Nodes w/ other: 1
Overal balance: 120

Number of words: 7201810
Number of nodes: 200000

Time: 32 seconds*


It took just over 32 seconds to read 7.2 million words and inset them into the tree (and balance the tree accordingly). This is very impressive. The fact that 136,000 of the 200,000 nodes are balanced perfectly (balance == 0) means the tree is in very good working order, and the only problem is the “other” node, which has a balance of two. The others, -1 and 1, are local fluxuations that exist and will exist until the tree need’s rebalancing at that particular spot.

In closing (for now), the method outlined works. It is fast (7.2 million searches and additions! in 32 seconds!) and provides a very good tree.

*This time is a little off. Sadly the timer is not very resolute, and it in no way accounts for the hundreds of thousands of small words (length < = 3) that were read, translated to lowercase (with smashed punctuation), and then discarded. Also, this test was run on a Sidhe (http://sidhe.cs.uni.edu), a very fast machine.

-- edit --
I added two lines of code to avoid even smashing/converting strings of length less than 4 already. This decreased the overall time by one second:

Nodes w/ 0 bal: 136003
Nodes w/ 1 bal: 32057
Nodes w/-1 bal: 31939
Nodes w/ other: 1
Overal balance: 120

Number of words: 7201810
Number of nodes: 200000

Time: 31 seconds

-edit-
Code has been updated. The original is main.cpp, the recent additions are under main_mod.cpp.

http://www.nick-cash.com/download/cs2/1

Leave a Comment