03.01.07

Quicksort w/ Stacks

Posted in ALPHA, Computer Science, Computer Science III, Programming at 8:57 pm by Nick

I finally got around to working on this project/assignment. It was a very interesting way to implement quicksort, though I’m inclined to think it isn’t the fastest in many cases. The code picks the first element as the partition point (which works decently for random numbers), but sometimes it is a very poor choice (for example, if the list is nearly sorted already).

The code: http://www.nick-cash.com/download/cs3/2

—-

Here is the algorithm I followed (from Data Structure Techniques by Thomas A. Standish using C++ operators):

Quicksort sorts a sequence of n numbers in Table T into ascending order. The numbers in T are referred to as T[1], T[2], …, T[n]. The algorithm uses an auxiliary stack S, which is initialized to the empty stack. Algorithm 2.2 is used as a subroutine to partition subsequences of the numbers in T into suitable subproblems. We assume n is at least 2.

1. [Initialize] Set L = 1, R = n, and S = NULL. (L is the leftmost index of the interval (L, R) in Table T, and r is the rightmost index.)

2. [Partition interval.] Call Algorithm 2.2 with input (L, R) to partition interval (L,R) of T into two intervals: an interval (L1,R1) of numbers < T[L] and an interval (L2,R2) of numbers > T[L].

3. [Postpone larger subproblem] If (R1 – L1) > (R2 – L2), then perform S < = (L1, R1) (that is, push the interval onto the stack), set L = L2, set R = R2 and go to step 4. Otherwise, perform S <= (L2,R2), set L = L1, set R = R1 and go to step 4. (This puts the larger intervals on the stack to be done later and goes to step 4 to sort the smaller ones).

4. [Interval nontrivial?] If R > L then go to step 2. (Otherwise, R < = L so the interval (L,R) contains at most one number and need not be sorted. Thus, the stack can next be examined at step 5 to see if there are any more postponed subproblems to consider.)

5. [Done?] If S == NULL, the algorithm terminates. Otherwise, perform (L,R) <= S (that is, pop the top interval from the stack) and go back to step 4.

-- Algorithm 2.2 --
Using repeated exchanges, this algorithm partitions the distinct numbers in positions T[L], T[L+1], ..., T[R] into two subintervals: the interval (L1, R1) containing numbers less than T[L] and the interval (L2, R2) containing numbers greater than T[L]. The output consists of the ordered pairs of indices (L1, L2) and (L2, R2) for these two intervals. At termination, the numbers originally in T[L] lies between and separates the two subintervals. It is assumed initially that L < R.

1. [Initialize] Set i = L and set j = R.
2. [Decrease j] If T[j] > T[i], set j = j-1 and repeat this step. (This step finds the greatest j such that T[j] < = T[i].)
3. [Exchange or done?] If j > i, then T[i] < -> T[j] (that is, exchange T[i] and T[j]) and go to step 4. Otherwise, set (L1, R1) to (L, i-1), set (L2,R2) to (i+1, R), and terminate the algorithm.
4. [Increase i] If T[i] < T[j], set i = i -1 and repeat this step. (this step finds the least i such that T[i] >= T[j].)
5. [Exchange or done?] If j > i, then T[i] < -> T[j] (that is, exchange T[i] and T[j]) and go to step 2. Otherwise, set (L1, R1) to (L, j-1), set (L2,R2) to (j+1, R), and terminate the algorithm.

—–

The code works really well for smaller vectors. It easily sorts 10k random integers on my crappy machine in less then one second. This code also had a few quirks to it that I didn’t see initially. If you tried to sort sufficiently large vectors it would segfault and using GDB I was able to find that j and i would be corrupted in Partition. To fix this I added and statements so that j and i had to be in a realistic range (that is, j could be no less then L and i could be no greater then R).

Also, as the book says, if you have a nondistinct list of integers (that is, a list that contains two or more of the same number) then it would not work. It would find its way into a constant swapping of data. To fix this I changed the loops so that it would be >= for decrementing j and < = for incrementing i. Along with this I changed the swaps to be ( j > i && (v[j] != v[i])). Thus, if the numbers were equal it would not swap. These changes appear to allow the list to sort a nondistinct list correctly.

Lastly, I figured out the program was taking a crazy long time. It would take me 35 seconds to sort 10k elements. I added a statement to see how fast I was generating random numbers and I was surprised when it came back and showed me 35 seconds of the run time was generating random numbers. I examined my code to see how I could speed it up and I realized I had been adding elements to the front of a vector. This, of course, is a terrible method of doing things since it has to shift up all of the elements constantly. After I changed it to add to the end the time went drastically down, being able to generate 3 million random integers in 2 seconds.

All in all this code works really well and is quite fast. I would love to throw it up on Sidhe and see how fast it sorts compared to my previous qsorts, which could sort 5 million random integers in 14 seconds on Sidhe. On my home machine I ran a sort of 3 million integers (that took 2 seconds to generate) and it sorted in 2061 seconds (34.35 minutes). I ran it a second time and it sorted in 2283 seconds (38.05 minutes). I’m not sure what it chose for an index on either run, but those are not very good sort times in my opinion. I plan to test it using 499 as the partition point and then see how fast it runs. Of course my machine is several years old, so 34 minutes might not be a bad time. :P

Note: The code has two special test cases. If you enter 9 you will get the books sorting example. I did this for bug testing (using the comprehensive output) so I could see step by step what the program was doing. If you enter 11 it gives you the book example (of 9 numbers) and two of the same number at different points in the vector. This was used to test the code against nondistinct number lists.

02.17.07

Computer Science At Last, ISR Homework

Posted in ALPHA, College, Computer Science, Programming, School at 5:41 pm by Nick

At long last I have an update! A real update. I have been delving ever so slightly into the realm of Information Storage and Retrieval. This wasn’t my first time writing code to build a dictionary of terms, but other lessons have been interesting. Generating effective stop lists and weighting terms are all very practical techniques. And our textbook is free!

Assignment 1:

Read the computing abstracts file and generate a dictionary of terms. Convert terms to lower case remove extraneous endings and special characters. Sort the dictionary by frequency and turn in, along with the program, the first page of the sorted results.

The code, library, and related files:
http://www.nick-cash.com/download/isr/1/

Notes:
I spent more time on this then it required simply to get a workable namespace going. All classes and related functions will go into the ISR library for use in later homework, and, at the end, I’ll have a workable Info Storage and Retrieval library with tested code.


Assignment 2:

Using the dictionary from the above, select high frequency and low frequency words and from these build a file that will be a stop list. Write a program that will process the abstracts (title and abstract). Your program will read each word in each title/abstract text and ignore words that are in the stop list. It will build a new dictionary vector giving the total number of times each word occurs and a document frequency vector giving the number of documents in which each term occurs. It will then calculate, for each word, the Inverse Document Frequency weight of the word. Hint: for each document, build the dictionary directly and generate a document-term matrix. Then after all documents have been processed, create the document-frequency vector by going through the document vectors and incrementing the D-F vector for each word found

The code, library, and related files:
http://www.nick-cash.com/download/isr/2/

Notes:
This assignment looked daunting, but when I got started it was actually relatively easy. It was just a lot of work. I hit a few spots that proved troublesome, mainly due to the reading of the file (I should have written my own reading routine that processed each character instead of extracting from std::cin) and my being tired. Overall I’m fairly proud of this code as I didn’t know how to do any of it when I started. However, as with most code I look back on, I see things I definitely could improve. This code isn’t particularly efficient in my opinion, but overall it still runs fairly fast.

09.27.06

ALPHA – An Update At Last

Posted in ALPHA, Programming at 11:49 am by Nick

Time: 1 hour
Total: 2 hours
—————

So, I realize I’ve been very relaxed about updating my ALPHA project. This is soon to change, as ALPHA is as much a class as anything else. I’ll work harder on updating it.

That said, I have quite a few assignments from CS III that I will post later tonight when I am at home. The teacher has seriously been pulling at some painful nerves for many people, but I must say I’ve learned a lot of Ada syntax and methodology quite quickly. The techniques are really nothing new (I’ll be excited if we do talk about newer stuff, I like learning new programming techniques), but we are incrementally learning how to use Ada.

Ada itself is a very lengthy programming language. Printing out text to the screen is no short function call (Ada -> Ada.Text_IO.Put_Line(“Look, text!”); C++ -> cout < < "Look, text!\n"; C -> printf( “Look, text!\n”); ) and is generally harder to type because of the capital letters and underscores. However, its strong typing and extreme flexibility (and lack of pointers to keep track of) is pretty nice. Arrays in Ada allow you to specify the index! This is amazing since in C/C++ all index’s start at 0, and they must be numbers. It is nice in Ada to be able to use any discrete data type for an index.

I still have a feeling I can emulate some of the strong data type features of Ada in C++ using classes and data types. It may not be as pretty, but it will be good none the less. Many errors occur from range checking on integers, so that will be one focal point. The goal here is to have C++ programmers program, but not use -any- built-in data types. Integers, Characters, Floats, Doubles, Shorts, Longs will all be recreated classes. I expect this will slow down just about all programs using it, but I doubt the actual speed loss will be signifcant nor noticable.

There is much planning to do yet. I also need to find a large block of free time to commit to this cause. Supposing this turns out well, I will use it to finally work on Kaladea, which will be a rather large scale test of it. Then again, Kaladea is a long project. Speaking of which, it is techinically part of this project I have some design specs I still need to work out. I’m also working on updating the KaladeaMUD web site, which will contain most KaladeaMUD developments. Which also reminds me I need to work on a MySQL inteface in C++ for KaladeaMUD. Chances are I need to look into getting Ruby to work with KaladeaMUD as well, or at least design an interface to call the script interpreters.

Since KaladeaMUD will be query for -ALL- information now (very little should be kept in memory) it allows me to leave out interaction between the server and the scripts (apart from calling the scripts) and the sharing of memory. This was a huge problem before (it is incredibly hard to share data between programs), but now it should all be peachy if implemented correctly.

09.09.06

ALPHA – Fall ‘06 Project

Posted in ALPHA, Computer Science, General, Programming at 11:42 pm by Nick

It has been very difficult to determine what I wanted to do this year. I’ve neglected my computer so much recently I’m sure it should be something computer related.

On that note, two things come to mind. One, my long running interest (and chance to become even minorly famous among computer sciene nerds) , would be to attempt to speed up or break A*. This would be difficult, but I believe it can be done. Alternatively, I wanted to save such a thing for my undergraduate thesis.

Second, create a basic typing library for C++ to emulate strong typing conventions and subtypes like those used in Ada. This would certainly be helpful, and with a little work and dynamic approach could even become popular. 88% of programming errors come from integers with large domains that end up with unexpected values according to McCormick. Thus, it would be nice to find a way to limit this.

Using templates, STL, classes and inheritance I think I could create a good library to accomplish the said goals in C++. This would make for less errors and more compiler help (rather than runtime errors). Yay.

The only other option would be to create a very lightweight MySQL library in Ruby for use with Kaladea. And, for that matter, I should finish answerings Gammon’s questions on Kaladea, and finish reading the material posted in the BabbleMUD wiki. ::sigh:: So much to do, so little time. Further that note, I need to finish reading the script for the play and finding the sounds, and the robotics team starts up this tuesday.

There’s a thought. Maybe the robots programming can be my project? Chances are it will be like last year: a broad collection of programs of varying degrees of difficulty with the only point being to try new things.

We shall see.

08.22.06

Computer Science III – Post Lab #0

Posted in ALPHA, College, Computer Science, Programming, School at 2:52 pm by Nick

Time: 1 hour
Total: 1 hour
——————-

I already have homework. ::sigh:: Actually, I’m posting because I completed one of the tasks. It was an intersting refresher as I haven’t done much practical programming this crazy summer.

The first assignment was very much like the word counter we wrote back in CS I or CS II. We had to count the number of occurances of each printable character in a text file with a few other specifications relating to errors, output, etc.

Overall it was a decent way to spend an hour. I enjoy playing with STL maps, and they made this much easier than it would have been otherwise. McCormick seems to want comments on just about everything, so I think half of the code is either white space or comments.

I must admit I am a little wary posting this here. I don’t think any of the CS majors at UNI read my blog, but I don’t want to ruin their homework. Either way, if they copied it then they would all take some heavy hits on their grades.

Code -> http://www.nick-cash.com/download/cs3/postlab0/

05.23.06

Sanity Program

Posted in ALPHA, Computer Science, Programming at 10:50 pm by Nick

Time: 1 hour
Total: 77 hours
————————–

With a tad bit of spurring from Wade I turned the code in my AIM info into a real working program. This inspired a friend of mine to attempt to write a “sanity program” in Ruby. The code itself seems almost poetic, though it accomplishes nothing. Interestingly enough the program actually reflects moods, and when I lower the homework/academic levels to lower (and currently unrealistic) numbers the program stays sane longer.

The original code is still within my info, though since it looks bad I will not post it here.

You can find the sanity program (executable and code) in its entirety at http://www.nick-cash.com/download/sanity

[edit]
After reviewing the code, there is one thing that I think definitely needs changing. I am referring to these two lines:

short percent_to_help_sanity = (NICKS_CARING_FRIENDS*10) > 100 ? 100 : (NICKS_CARING_FRIENDS*10);
short percent_to_hurt_sanity = 100-(percent_to_help_sanity/2);

According to the way it is written, the higher the help_sanity, the higher the hurt_sanity as well. That doesnt really make sense, though it worked before because it checks the sanity range first. A more accurate line for hurt_sanity would be:

short percent_to_hurt_sanity = 100-(100/(percent_to_help_sanity > 0 ? percent_to_help_sanity : 5)*2);

So, best case scenario, percent_to_hurt_sanity would be 98+. Worst case, 60+ would be hurt_sanity and helpping it would be impossible.

05.14.06

KaladeaMUD – Planning Wiki

Posted in ALPHA, KaladeaMUD at 10:26 pm by Nick

Time: 1 hour
Total: 76 hours
————————–

The rest of my project will be devoted to planning out KaladeaMUD and quite possibly sticking it into a Wiki. While I have much to do (and little time for programming), I hope to read more of Bartle’s book and figure out more of the issues.

I also figured there is quite a bit more reading I need to do in general. I should check to see what introductory economic majors at UNI use and read most of it. The worst thing for an online game is a super inflated currency. Hopefully come next year I’ll begin programming on it again, and this time with a battle plan.

In the mean time I will, over the summer and perhaps sometime soon, be learning both Ada and Lua. Lua looks to be the best scripting language to tie into Kaladea, and Ada is the language of choice for CS III. Now, I have pretty much skated through CS I and CS II without much strife or personal investmet. I doubt CS III will be the same way, so I’m looking forward to learning a new language.

In other news, I have one assignment (one extra credit assign if I can muster it) before I take my last final at UNI. To get an A in the class I need a 92% on the final. However, to settle for an A- I need to get a 66% on the final, and since the class average was 71% I’m guessing I should at least get that.

05.04.06

Quick Sort and Templates

Posted in ALPHA, Computer Science, Programming at 5:01 pm by Nick

Time: 3 hours
Total: 75 hours
————————–

Recently I’ve been catching up on school, which means my Computer Org. homework. For assignment 6 we had to write up the quicksort algorithm to sort and array in MIPS. This is an annoying task, but I finally got it worked out due to my awesome professor. I decided to write up the algorithm in C++ and use template’s so it could work with many datatypes.

The algorithm is a very fast recursive way for sorting arrays. For example, on Sidhe, it took 14 seconds to sort 5 million random integers from 1 to 10,000. This is very good performance!

The code was tested sorting floats (numbers with decimal points), characters (A-Z and a-z), and integers. All elements were thrown into the array at random. The output files below are 500 random element arrays. Also, in the link found below you can find the code for both the C++ version and the MIPS (assembly) version.

Code and output -> http://www.nick-cash.com/download/qsort

I also posted on Gammon’s forum about this. The post may eventually contain responses on the code above:
http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=6526&page=999999

[update]
David “Ksilyan” Haley gave me an example of passing arguments through templates instead of using defines, which you can find at: http://david.the-haleys.org/programming/misc/quicksort-template.cpp.html

Also backed up locally for reference at: http://www.nick-cash.com/download/qsort/files/ksilyan-qsort-template.html

04.29.06

KaladeaMUD – Code Posting

Posted in ALPHA, Computer Science, KaladeaMUD, Programming at 6:14 pm by Nick

Time: 1 hour
Total: 72 hours
————————–

Alright, so I exported the code (and SQL) and threw it up for the world to see. Sadly it is in a bad shape, as I made a last ditch effort something like six weeks ago to institute the character class that avatars were to be derived from. When I came back (within the last few days) it was hard to tell what I had done to it and what the errors were exactly.

The statistics are listed at the bottom of the display page. It resulted in a net gain of 2646 lines and 9 files. Sadly, I had to gather the stats by hand, so I’m thinking I may make a program that will tally lines.

In any event, here is Scratch MUD with all I have added (movement, rooms, help files, and a bit more):

http://www.nick-cash.com/download/cs2/final

— edit —

I have updated the KaladeaMUD project page on the right with a listing of added features, the stats, and a few comments.

04.27.06

KaladeaMUD – Restart?

Posted in ALPHA, KaladeaMUD at 7:21 pm by Nick

Time: 1 hour
Total: 71 hours
————————–

Despite the listed time for this, I’ve given it some serious thought. Ultimately, I would like to finish reading Designing Virtual Worlds, reread parts of MUD Programming, and finally learn some more Lua. An alternative to Lua would be Python. There are several reasons I like Python (very clean and neat scripting language), and the book MUD Programming uses Python for all game logic, while the basic kernel is coded in C++. This is a very flexible and efficient design.

Something I would also like to do is iron out a few more issues due to the design. Ultimately the design process on Kaladea could take a year or more. I am not very satisfied with the way it has turned out from Scratch, as I think the MySQL implementation (and a few other methods) are very bulky and annoying.

Thus, I have decided to scrap this current codebase. It has made some tremendous leaps, but it is currently just to unwieldly to work in the long run. I do, however, plan to keep the code around and use it for my CS II final project.

« Previous entries Next Page » Next Page »