03.18.07

No habla espanol!

Posted in General at 2:42 pm by Nick

So, Mexico was awesome. Seriously. Amazing weather, great water, awesome resort. It was excellent. Most days involved lots of sitting and thinking about doing something, usually spending several hours in a hammock.

While in Mexico I went…
- hiking
- rappelling
- zip lining
- canoing
- kayaking
- jet skiing
- para sailing
- swimming with dolphins
- mini-golfing
- swimming in an amazing underground reservoir.

I also…
- played giant chess
- climbed the tallest Mayan temple and examined a few others
- slept and ate normal amounts of food
- sat on the pier just to listen to the waves and look at the sky
- walked on the beach just to feel the wet sand in between my toes
- spent a lot of time thinking
- played poker, hearts, bullshit, and phase 10 with Mandy and her parents

03.09.07

Isla de Cozumel

Posted in General at 11:14 am by Nick

Well, I’m gone for Spring Break. No updates until I get back. I’m off to the Mexican island of Cozumel. :)

Have a good break everyone.

03.01.07

Quicksort w/ Stacks

Posted in ALPHA, Computer Science, Computer Science III, Programming at 8:57 pm by Nick

I finally got around to working on this project/assignment. It was a very interesting way to implement quicksort, though I’m inclined to think it isn’t the fastest in many cases. The code picks the first element as the partition point (which works decently for random numbers), but sometimes it is a very poor choice (for example, if the list is nearly sorted already).

The code: http://www.nick-cash.com/download/cs3/2

—-

Here is the algorithm I followed (from Data Structure Techniques by Thomas A. Standish using C++ operators):

Quicksort sorts a sequence of n numbers in Table T into ascending order. The numbers in T are referred to as T[1], T[2], …, T[n]. The algorithm uses an auxiliary stack S, which is initialized to the empty stack. Algorithm 2.2 is used as a subroutine to partition subsequences of the numbers in T into suitable subproblems. We assume n is at least 2.

1. [Initialize] Set L = 1, R = n, and S = NULL. (L is the leftmost index of the interval (L, R) in Table T, and r is the rightmost index.)

2. [Partition interval.] Call Algorithm 2.2 with input (L, R) to partition interval (L,R) of T into two intervals: an interval (L1,R1) of numbers < T[L] and an interval (L2,R2) of numbers > T[L].

3. [Postpone larger subproblem] If (R1 – L1) > (R2 – L2), then perform S < = (L1, R1) (that is, push the interval onto the stack), set L = L2, set R = R2 and go to step 4. Otherwise, perform S <= (L2,R2), set L = L1, set R = R1 and go to step 4. (This puts the larger intervals on the stack to be done later and goes to step 4 to sort the smaller ones).

4. [Interval nontrivial?] If R > L then go to step 2. (Otherwise, R < = L so the interval (L,R) contains at most one number and need not be sorted. Thus, the stack can next be examined at step 5 to see if there are any more postponed subproblems to consider.)

5. [Done?] If S == NULL, the algorithm terminates. Otherwise, perform (L,R) <= S (that is, pop the top interval from the stack) and go back to step 4.

-- Algorithm 2.2 --
Using repeated exchanges, this algorithm partitions the distinct numbers in positions T[L], T[L+1], ..., T[R] into two subintervals: the interval (L1, R1) containing numbers less than T[L] and the interval (L2, R2) containing numbers greater than T[L]. The output consists of the ordered pairs of indices (L1, L2) and (L2, R2) for these two intervals. At termination, the numbers originally in T[L] lies between and separates the two subintervals. It is assumed initially that L < R.

1. [Initialize] Set i = L and set j = R.
2. [Decrease j] If T[j] > T[i], set j = j-1 and repeat this step. (This step finds the greatest j such that T[j] < = T[i].)
3. [Exchange or done?] If j > i, then T[i] < -> T[j] (that is, exchange T[i] and T[j]) and go to step 4. Otherwise, set (L1, R1) to (L, i-1), set (L2,R2) to (i+1, R), and terminate the algorithm.
4. [Increase i] If T[i] < T[j], set i = i -1 and repeat this step. (this step finds the least i such that T[i] >= T[j].)
5. [Exchange or done?] If j > i, then T[i] < -> T[j] (that is, exchange T[i] and T[j]) and go to step 2. Otherwise, set (L1, R1) to (L, j-1), set (L2,R2) to (j+1, R), and terminate the algorithm.

—–

The code works really well for smaller vectors. It easily sorts 10k random integers on my crappy machine in less then one second. This code also had a few quirks to it that I didn’t see initially. If you tried to sort sufficiently large vectors it would segfault and using GDB I was able to find that j and i would be corrupted in Partition. To fix this I added and statements so that j and i had to be in a realistic range (that is, j could be no less then L and i could be no greater then R).

Also, as the book says, if you have a nondistinct list of integers (that is, a list that contains two or more of the same number) then it would not work. It would find its way into a constant swapping of data. To fix this I changed the loops so that it would be >= for decrementing j and < = for incrementing i. Along with this I changed the swaps to be ( j > i && (v[j] != v[i])). Thus, if the numbers were equal it would not swap. These changes appear to allow the list to sort a nondistinct list correctly.

Lastly, I figured out the program was taking a crazy long time. It would take me 35 seconds to sort 10k elements. I added a statement to see how fast I was generating random numbers and I was surprised when it came back and showed me 35 seconds of the run time was generating random numbers. I examined my code to see how I could speed it up and I realized I had been adding elements to the front of a vector. This, of course, is a terrible method of doing things since it has to shift up all of the elements constantly. After I changed it to add to the end the time went drastically down, being able to generate 3 million random integers in 2 seconds.

All in all this code works really well and is quite fast. I would love to throw it up on Sidhe and see how fast it sorts compared to my previous qsorts, which could sort 5 million random integers in 14 seconds on Sidhe. On my home machine I ran a sort of 3 million integers (that took 2 seconds to generate) and it sorted in 2061 seconds (34.35 minutes). I ran it a second time and it sorted in 2283 seconds (38.05 minutes). I’m not sure what it chose for an index on either run, but those are not very good sort times in my opinion. I plan to test it using 499 as the partition point and then see how fast it runs. Of course my machine is several years old, so 34 minutes might not be a bad time. :P

Note: The code has two special test cases. If you enter 9 you will get the books sorting example. I did this for bug testing (using the comprehensive output) so I could see step by step what the program was doing. If you enter 11 it gives you the book example (of 9 numbers) and two of the same number at different points in the vector. This was used to test the code against nondistinct number lists.