Thursday, June 11, 2009

Quest Log: Question Bank I/O

Occasionally, the topic of a post requires detail and precision, and consequently I feel it is necessary to take a serious tone. The topic of this post was a major learning process for me, and I feel I would be doing it an injustice to obfuscate it in any way.

The saving and loading of question banks seemed easy enough at first but, like many things, was much more complex than I had originally anticipated. A question bank consists of subjects, each subject consists of levels, and each level consists of questions. Everything except for the questions is basically a folder and therefore only contains a title, subtitle and author (type string). The meat of a question bank is the questions. Each question consists of several fields, all of which are string or int type except for the query (the actual question) and the answer. These both need to contain rich text and at first were being held in System.Windows.Documents.TextRange.
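To make the hierarchy concrete, here is a minimal sketch of it. It is written in Python purely for illustration (the real project is .NET), the class names are my own, and the query and answer are shown as raw bytes only as a placeholder for rich text. The other string and int fields of a question are not listed here, so they are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    # Rich-text fields; raw RTF bytes are an illustrative placeholder.
    query: bytes
    answer: bytes
    # The remaining string/int fields are omitted from this sketch.


@dataclass
class Level:
    title: str
    subtitle: str
    author: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class Subject:
    title: str
    subtitle: str
    author: str
    levels: List[Level] = field(default_factory=list)


@dataclass
class QuestionBank:
    title: str
    subtitle: str
    author: str
    subjects: List[Subject] = field(default_factory=list)


def count_questions(bank: QuestionBank) -> int:
    """Walk bank -> subjects -> levels and total up the questions."""
    return sum(
        len(level.questions)
        for subject in bank.subjects
        for level in subject.levels
    )
```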

The TextRange class has Load and Save methods that read from and write to a stream in a specified format, one of which is RTF. These methods were my keys to being able to save and load a Question Bank, or so I thought. The first problem I came up against was that TextRange.Load() reads from the stream's current position all the way to the end of the stream. This was a problem because each question bank file contains many RTF sections, not to mention all of the other data.

With my experience in Streams being extremely limited, I was stumped on how to solve this issue. Fortunately, at my favorite programming site, StackOverflow, I was able to get some help. The solution is pretty straightforward. I simply had to take each RTF section of the file (the query and answer of each question) and put it into its own MemoryStream. Now when using TextRange.Load() it could read to the end of the MemoryStream and everyone would be happy. Problem solved! ... right?
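The idea can be sketched outside of .NET as well. The Python below is an illustration, not the actual Quest code: io.BytesIO stands in for MemoryStream, load_to_end() is a hypothetical stand-in for a loader that (like TextRange.Load()) consumes everything up to the end of the stream, and the 4-byte length prefix in front of each section is an assumption about the file layout made only so the example is runnable.

```python
import io
import struct


def load_to_end(stream) -> bytes:
    """Hypothetical loader: reads from the current position to end of stream."""
    return stream.read()


def write_section(out, payload: bytes) -> None:
    """Write one section as a 4-byte little-endian length, then the bytes."""
    out.write(struct.pack("<I", len(payload)))
    out.write(payload)


def read_section(stream) -> io.BytesIO:
    """Copy exactly one section into its own in-memory stream."""
    (length,) = struct.unpack("<I", stream.read(4))
    return io.BytesIO(stream.read(length))


# Build a tiny "file" holding two RTF sections back to back.
bank_file = io.BytesIO()
write_section(bank_file, b"{\\rtf1 What is 2 + 2?}")
write_section(bank_file, b"{\\rtf1 4}")
bank_file.seek(0)

# Each loader call now hits the end of its own small stream,
# not the end of the whole file.
query = load_to_end(read_section(bank_file))
answer = load_to_end(read_section(bank_file))
```

Calling load_to_end(bank_file) directly would have swallowed both sections (and their length prefixes) in one go, which is the same trouble I ran into with the real file.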

I continued on with development under the assumption that the I/O had been completely taken care of. As I was completing the implementation of the Question Bank Importer, I attempted to import a real FITS Question Bank that has 1703 questions in it. It took a minute or two to import, but that would soon be improved upon, so at least it worked! I was excited... but my excitement was short-lived. Soon afterward, I found myself going to test something else. I ran the debugger and waited for the application to run... and waited... and waited some more, and then finally it opened up. It took a total of about 3 minutes!

Remember that all Question Banks are loaded at startup. Up until this point of development I had been testing everything with two test Question Banks, each of which had about 5-10 questions in it. Everything worked great with these small Question Banks; however, the reality is that FITS writes Question Banks that can have over 2000 questions in them. That is at least 200 times more than either of my test banks. In other words, I had a real problem.

The first thing that came to mind was that I needed to get my hands on a profiler. I had never actually used a profiler before and had only heard about them. After doing a little research I found RedGate's ANTS Profiler. I love this piece of software. After running the profiler I quickly found the culprit. TextRange.Load() was taking an absurd amount of the CPU time, about 83%, with another 10% being devoted to getting the RTF from the FileStream into the MemoryStream (I was parsing it...).

I quickly realized that there was no way to change the speed of TextRange.Load() and that I needed to find another way to do the I/O for the RTF sections of the file. I once again looked to StackOverflow for some guidance. A few suggestions were made, but ultimately I decided to change the type of Query and Answer from TextRange to simply byte[]. This is a MUCH better approach. In retrospect I can see that TextRange was a horrible design decision. It is designed to be used for selecting a section of a FlowDocument and not for storing data. The byte[] simply holds what would have gone into the MemoryStream, and whenever the text of a query or answer needs to be displayed, that byte[] is loaded with TextRange.Load().
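The shape of that change, keeping raw bytes at load time and paying for the rich-text parse only on display, can be sketched as follows. Again this is illustrative Python rather than the project's code: expensive_parse() is a hypothetical stand-in for the slow TextRange.Load() call, and the call counter exists only to show when the cost is paid.

```python
parse_calls = 0  # counts how often the slow path actually runs


def expensive_parse(rtf: bytes) -> str:
    """Hypothetical stand-in for the slow rich-text load."""
    global parse_calls
    parse_calls += 1
    return rtf.decode("ascii")


class Question:
    def __init__(self, query_rtf: bytes, answer_rtf: bytes):
        # Store the raw RTF bytes; no parsing happens here.
        self.query_rtf = query_rtf
        self.answer_rtf = answer_rtf


def load_bank(raw_sections):
    """Loading is now just copying bytes into objects."""
    return [Question(q, a) for q, a in raw_sections]


def display_query(question: Question) -> str:
    """Parse only the one query that is about to be shown."""
    return expensive_parse(question.query_rtf)


# Simulate a bank the size of the real one from the import test.
raw = [(b"{\\rtf1 Q%d}" % i, b"{\\rtf1 A%d}" % i) for i in range(1703)]
bank = load_bank(raw)
calls_after_load = parse_calls  # still 0: startup did no parsing

shown = display_query(bank[0])  # one parse, for the one question on screen
```

The total parsing work has not gone away. It is just spread out, one question at a time, at the moment the user actually looks at it, instead of being paid for every question at startup.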

With this alteration the two major sections of the load time were completely cut out, and loading went from 2-3 minutes to an unnoticeable load time (~1 second). Needless to say, this was a victory for me. It was quite an issue, but with it solved I was glad to have gone through it. It was, after all, a great learning experience.

Code samples can be found in the links to the StackOverflow questions.

Because this was such a huge learning process for me, I think I deserve a...

Level Up!
Jasson grows to Level 9 Programmer.
Jasson's HP increases by 20.
Jasson's INT increases by 4.
Jasson learns skill "Profiler".

That's more like it!

*saved game*

2 comments:

  1. Maybe ExamView bumped up against the same problem. Their solution is to limit a bank to 250 questions. You're smarter than they are!

  2. Dude, you are a rockin progger! Progger is a word I just made up in order to not type the whole word programmer!

    You rock!
