I was recently asked to look at a project to record and profile answers from a multiple-choice questionnaire. The idea was that the questionnaire creator would be able to specify their ideal respondent profile and that the system would be able to provide a list of best match respondents. There could potentially be many thousands of respondents. A cursory glance suggested that the project was simple but as the details of the requirements were revealed it was quickly realised that the project was going to be more involved than first thought.
Questionnaire Rules
Although the questions were multiple choice a typical question would expect multiple answers.For example, the question could be "Which of the following languages do you speak (tick all that apply)"?
The business rules were that the questionnaire creator must be able to specify which answers the respondent
- Must select.
- Must not select.
- Should select.
In addition the questionnaire creator must be able to attach scores to the "Could possibly select" answers to tune the responses. The analyser of the questionnaire also wanted to be able to downgrade respondents that had provided too many answers, i.e. were indicating that they were over-qualified. No question would have over 16 answers although it was unusual for a question to have over10 possible answers.
First design thoughts
The initial thought was to create a completely normalised structure with a record per answer in the database.
Choosing the "Must Selects" was easy because it was a straight INNER JOIN.
Choosing the "Must Not Selects" was easy because it was effectively a LEFT JOIN.
The problem came with the "Should Selects" because of the need to identify which ones had been answered where it was desirable and which ones had been answered where they weren’t.
Obviously these also required outer join queries.
Add into the mix the fact that each questionnaire (of which there would be many) could have potentially 150 questions, probably 1,000 answers and potentially 60,000 respondents and clearly the volumes of data were going to pose a challenge.
Then add the requirement to send the data to remote locations via internet connections.
Bitwise Operators
At one time programmers were expected to be fully conversant with bitwise operators. In this day and age it isn’t nearly as important as it was, but it is still a useful skill to have.
Operator | SQL Operator | Description |
---|---|---|
OR | | | Where either or both bits equals 1 then answer is 1. |
NOT | ~ | The inverse of the bit i.e. 1 becomes 0 and 0 become 1. |
AND | & | Where both bits are 1 the answer will be 1. |
XOR | ^ | When either but not both arguments are 1 the answer will be 1, otherwise zero. |
One old trick with bitwise operators was to use the XOR statement to swap two integers without using a 3rd memory location.
An example of this is shown below
DECLARE @Arg1 Int, @Arg2 Int SET @Arg1 = 5 SET @Arg2 = 7 SET @Arg1 = @Arg1 ^ @Arg2 SET @Arg2 = @Arg2 ^ @Arg1 SET @Arg1 = @Arg1 ^ @Arg2 SELECT @Arg1,@Arg2 |
Applying bits to the problem
Suppose that we stored a response to a question as a single number and used the bits to indicate a yes/no answer for the possible answers. Let us go back to our "What languages do you speak?" question. By using bits we could score them as follows
Language | Bit | Value |
---|---|---|
English | 1 | 1 |
French | 2 | 2 |
German | 3 | 4 |
Italian | 4 | 8 |
Bengali | 5 | 16 |
Farsi | 6 | 32 |
Chinese | 7 | 64 |
Welsh | 8 | 128 |
If someone responded saying that they spoke English and Welsh then their answer would be recorded as 1+ 128 = 129.
So we have a single integer for our answer and 3 integers to store our ideal profiles. "Must have", "Must not have" and "Should have".
Against the response an additional integer could hold the eventual score for the response.
This reduces the amount of data that is stored dramatically thereby making data transmission much more efficient.
Downside
OK, we’ve cut out about 80% of our storage requirements; the problem is that the normalised version allowed a more or less complete SQL solution.
Whether or not this IS a problem is open to interpretation. Just because you can do the entire thing in SQL doesn’t mean that you should do the entire thing in SQL.
However, storing our responses in bits means that now need SQL and a front end application to score the results.
Bitwise logic
Selecting the initial records
If someone must have responded in a certain way or must not have answered in a certain way then this is a black and white response. If either condition provides an unsatisfactory answer then the whole respondent can be rejected.
The comparison we would use would be
If ("Profile" AND "Must Have") ="Profile" Then accept it.
If ("Profile" AND "Must Not Have") = 0 Then accept it.
For accepted records we also need to remove the "must have" score.
Or in SQL
SELECT B.*,Score = B.Answer – A.MustHave FROM dbo.Tbl_Profile AS A INNER JOIN dbo.Response AS B ON A.QuestionnaireId = B.QuestionnaireId AND A.QuestionID= B.QuestionId WHERE (A.MustNotHave & B.Answer) = 0 AND (A.MustHave & B.Answer) = A.MustHave |
Extracting the "Should Select" options
Having brought back the records that satisfy our compulsory requirements we now have to separate out the answers that match our "Should have" profile, and those answers that are in excess of our "Should have" profile.
Step 1 is to "OR" our results together as in the table below.
Welsh | Chinese | Farsi | Bengali | Italian | German | French | English | Value | |
---|---|---|---|---|---|---|---|---|---|
Profile | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 205 |
Answer | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 85 |
OR | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 221 |
We wanted someone who could speak Welsh, Chinese, Italian, German and English
We are evaluating someone who speaks Chinese, Bengali, German and English.
To find out which elements of the profile were satisfied we take our profile AND the Answer as shown below.
Welsh | Chinese | Farsi | Bengali | Italian | German | French | English | Value | |
---|---|---|---|---|---|---|---|---|---|
Profile | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 205 |
Answer | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 85 |
AND | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 69 |
To find out which elements of the profile were not satisfied we must take our OR result and XOR it with the actual answer as shown below.
Welsh | Chinese | Farsi | Bengali | Italian | German | French | English | Value | |
---|---|---|---|---|---|---|---|---|---|
OR | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 221 |
Answer | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 85 |
XOR | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 136 |
Et voila we can see that our candidate doesn’t speak Welsh or Italian.
Conversely, to find out which answers our candidate has in excess of requirements we would take our OR result and XOR it with our profile as shown below
Welsh | Chinese | Farsi | Bengali | Italian | German | French | English | Value | |
---|---|---|---|---|---|---|---|---|---|
OR | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 221 |
Profile | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 205 |
XOR | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 16 |
And here we can see that our candidate speaks Bengali, which we did not require.
Translating this into SQL
In the real application I did the bitwise operations in the application rather than within SQL mainly because there were other bitwise operations that I needed to perform that have no direct SQL equivalents. I also wanted to keep all major operations compartmentalised.
It should be noted that bitwise operations are usually VERY fast. As far as I am aware all CPU's have these functions in their ALU (Arithmentic and Logic Unit) therefore any compiled language should implement bitwise operations efficiently.
I could have requested my two values in the SELECT statement of my original query as follows
SELECT A.*, Score = B.Answer – A.MustHave ExtraProfile= (B.Answer |A.MustHave ) ^ B.Answer , ExtraAnswer= (B.Answer | A.MustHave) ^ A.Answer FROM dbo.Tbl_Profile AS A INNER JOIN dbo.Response AS B ON A.QuestionnaireId = B.QuestionnaireId AND A.QuestionID= B.QuestionId WHERE (A.MustNotHave & B.Answer) = 0 AND (A.MustHave & B.Answer) = A.MustHave
The final bitwise piece
The final piece of the puzzle was to translate the two scores back into individual bit positions corresponding to our answers.
This was done in the front end application because the action to do this involves a bitwise shift. These numbers were then posted into a stored procedure as parameters to calculate and store the weighted score for the particular answer.
The weighting was done in two stages.
Firstly, when the questionnaire creator specified the questions and answers the answers table they filled in a weighting column to allow specific answers to be up-weighted or downgraded. As shown in the table below
Language | Weight |
---|---|
English | 2 |
French | 0 |
German | 1 |
Italian | 1 |
Bengali | -1 |
Farsi | -1 |
Chinese | 3 |
Welsh | 1 |
Here we can see that English and Chinese were the most highly prized skills, where as the ability to speak the indic languages actually detracted from the respondents score.
Secondly the questionnaire creator could set a threshold for the number of answers allowed for a given question. This means that although there are 5 languages we prize, if someone speaks all 5 this may detract from their score because they could be considered over qualified.
On the whole the respondents score would be calculated by: -
- Adding the score produced by their full response.
- Subtracting the score for the profile elements that do not appear in the response.
- Subtracting a point for every answer over a threshold amount.
End game
Once the responses were stored it became a simple matter to list the respondent details highest score first.
A refinement that was made at a later stage was the ability to score a respondents answers against a new questionnaire to use as a predictive tool.
For example, the "what languages do you speak?" question could be incorporated into another questionnaire, and the original response used to predict respondents behaviour in hypothetical scenarios.
Overall the project provided an interesting challenge with an opportunity to resurrect an old skill.