strange JOIN issue

Questions and answers regarding general SQL and backend databases

strange JOIN issue

Postby jdbruijn » Wed Jun 17, 2015 12:57 pm

In my database I store answers to survey questions, for this I have 2 tables: an answer table (1 record for each person that answered the survey) and an answeritem table (contains all the answers that that person entered into the survey)
The answeritem table looks like this:
id (integer)
answer_id (integer)
question_id (integer) -> references the question for this answer
value (varchar)

Each survey has multiple questions. One of the questions is a score from 0 to 10. one of the other questions is text area. We now want to show all the answers that were entered for this text area for persons that gave a score of 10
So I use the following sql:
Code: Select all
SELECT ai.value, CAST(score.value AS INT) AS SCORE
FROM answer a
LEFT JOIN survey s ON a.survey_id = s.survey_id
LEFT JOIN answeritem score ON a.answer_id = score.answer_id AND s.score_question = score.question_id -- This join is used to get the correct score record in the answeritems table
LEFT JOIN answeritem ai ON a.answer_id = ai.answer_id AND ai.question_id = s.text_question -- we only want to see the answers for a specific question, the id for this question is stored in the surveys table (each survey has a diffent question ID for the text area question)
WHERE CAST(SCORE.return_value AS INT) = 10)


As soon as I add the WHERE clause to my sql it fails with an error: Invalid input syntax for integer: "some text value".
The text value is in the answeritem table, but not in records that should be available for the SCORE join.
Using google I found a way around this by using the following WHERE clause:
Code: Select all
WHERE SCORE.return_value ~ '^\d+$' AND CAST(SCORE.return_value AS INT) = 10)

My questions are:
1. Why does the original WHERE clause not work? Isn't the join specific enough to only filter the records in answeritem that contain score values?
2. Is this fix the best solution for this problem, or is there a better solution I can use for this?

The problem is related to my joins, because when I modify the second join into:
LEFT JOIN answeritem score ON a.answer_id = score.answer_id AND score.question_id = 11 -- Only show answers for 1 specific survey (where the score question has id 11)
Then the error also gone. I just don't understand why using the original join results into a different dataset.

I hope my explanation is clear enough :|
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm

Re: strange JOIN issue

Postby omar » Mon Jun 22, 2015 2:19 pm

I think the problem is that your score field is of type charvar and that you are casting it to int. What if somebody does not answer a question or enters an evil response on purpose? Then you are casting null or some other invalid string to int. What happens if somebody entered 10+? Maybe you are using validation to prevent this but if the field is numeric it will be less error-prone in my opinion. If you need to store additional information for some answers then I would suggest adding a separate field for that. If you really want to leave it like this then test for return_value = '10'. Hope this helps.
Intrasoft, Founder
Omar van Galen
omar@intrasoft.nl
+31-(0)6-21234586
Servoy Developer
omar
 
Posts: 377
Joined: Sat Feb 12, 2011 4:51 pm
Location: Intrasoft, The Netherlands

Re: strange JOIN issue

Postby jdbruijn » Mon Jun 22, 2015 3:22 pm

Hi Omar,

The responses for the score field are fixed, the user has to selected it from a list. So I can tell you with 100% certainty that it are valid integers.
Like I said in my original post, I already have a workaround for my problem. But I am curious why SQL behaves like it does in this case.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm

Re: strange JOIN issue

Postby omar » Mon Jun 22, 2015 3:55 pm

I believe you ;-) However, the error message *strongly* suggests that you are casting an invalid string at some point. The empty string for example produces the exact same error:

SELECT ''::integer;

For me it's a design consideration, if you need a number, use a number field.
Intrasoft, Founder
Omar van Galen
omar@intrasoft.nl
+31-(0)6-21234586
Servoy Developer
omar
 
Posts: 377
Joined: Sat Feb 12, 2011 4:51 pm
Location: Intrasoft, The Netherlands

Re: strange JOIN issue

Postby jdbruijn » Mon Jun 22, 2015 4:11 pm

I understand your point, but I'm trying to understand why SQL behaves like this.
The data set that the join defines only returns values that can be converted to integers.

The error message Invalid input syntax for integer: "some text value". When I look in the database for "some text value" it is not in a record that should be accessable through the join, so why is SQL trying compare it in my where clause?
I'm not trying to be difficult :P , just trying to understand the SQL logic here.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm

Re: strange JOIN issue

Postby ROCLASI » Mon Jun 22, 2015 6:55 pm

Hi Jos,

I would start with looking at what the join actually returns (so without the where clause) and take it from there.
Also I would then return all the columns involved in the joins and such.
Robert Ivens
SAN Developer / Servoy Valued Professional / Servoy Certified Developer

ROCLASI Software Solutions / JBS Group, Partner
Mastodon: @roclasi
--
ServoyForge - Building Open Source Software.
PostgreSQL - The world's most advanced open source database.
User avatar
ROCLASI
Servoy Expert
 
Posts: 5438
Joined: Thu Oct 02, 2003 9:49 am
Location: Netherlands/Belgium

Re: strange JOIN issue

Postby omar » Thu Jun 25, 2015 3:00 pm

If this WHERE clause solves your problem then you would expect replacing

Code: Select all
WHERE SCORE.return_value ~ '^\d+$' AND CAST(SCORE.return_value AS INT) = 10)


by

Code: Select all
WHERE SCORE.return_value !~ '^\d+$'


to show you the problematic record(s), does it?

One other point, you use LEFT OUTER JOINS, are there answers that don't belong to questions or questions that don't belong to surveys? If not I would recommend using INNER JOINS, it might make your query a little faster and prevents casting null values.

If this does not help could you provide a small solution with sample data? I know you have a workaround but I like solving SQL puzzles :-)
Intrasoft, Founder
Omar van Galen
omar@intrasoft.nl
+31-(0)6-21234586
Servoy Developer
omar
 
Posts: 377
Joined: Sat Feb 12, 2011 4:51 pm
Location: Intrasoft, The Netherlands

Re: strange JOIN issue

Postby jdbruijn » Thu Jun 25, 2015 3:37 pm

Hi Omar,

Changing the where clause as you susgested:
Code: Select all
WHERE SCORE.return_value !~ '^\d+$'

Results in an empy result, just like I expected.

I will try to create a small database with sample data for these tables to show you the problem.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm

Re: strange JOIN issue (SOLVED)

Postby omar » Fri Jun 26, 2015 3:11 pm

Had a look at this and the problem turned out to be in records outside of the query scope that contained strings that could not be converted to integers.

The reason that this query fails is that PostgreSQL (and other databases as well) determine an executionplan that may involve applying the where clause (with the CAST) on all records of the table, even if part of those records are excluded by the join conditions.

The validity of a query should not depend on the order in which the query parts are executed.
Intrasoft, Founder
Omar van Galen
omar@intrasoft.nl
+31-(0)6-21234586
Servoy Developer
omar
 
Posts: 377
Joined: Sat Feb 12, 2011 4:51 pm
Location: Intrasoft, The Netherlands

Re: strange JOIN issue

Postby ROCLASI » Fri Jun 26, 2015 7:33 pm

So I guess when you wrap the query with the joins in another select that holds the where clause you should get the proper results like so:
Code: Select all
SELECT value, CAST(score AS INT) AS score
FROM (
    SELECT ai.value, score.value as score, score.return_value
    FROM answer a
    LEFT JOIN survey s ON (a.survey_id = s.survey_id)
    LEFT JOIN answeritem score ON (a.answer_id = score.answer_id AND s.score_question = score.question_id)
    LEFT JOIN answeritem ai ON (a.answer_id = ai.answer_id AND ai.question_id = s.text_question)
) t
WHERE CAST(return_value AS INT) = 10


Or writing it as a Common Table Expression (CTE) like so:
Code: Select all
WITH surveyanswers AS (
    SELECT ai.value, score.value as score, score.return_value
    FROM answer a
    LEFT JOIN survey s ON (a.survey_id = s.survey_id)
    LEFT JOIN answeritem score ON (a.answer_id = score.answer_id AND s.score_question = score.question_id)
    LEFT JOIN answeritem ai ON (a.answer_id = ai.answer_id AND ai.question_id = s.text_question)
)
SELECT value, CAST(score AS INT) AS score
FROM surveyanswers
WHERE CAST(return_value AS INT) = 10


Does this work for you ?
Robert Ivens
SAN Developer / Servoy Valued Professional / Servoy Certified Developer

ROCLASI Software Solutions / JBS Group, Partner
Mastodon: @roclasi
--
ServoyForge - Building Open Source Software.
PostgreSQL - The world's most advanced open source database.
User avatar
ROCLASI
Servoy Expert
 
Posts: 5438
Joined: Thu Oct 02, 2003 9:49 am
Location: Netherlands/Belgium

Re: strange JOIN issue

Postby jdbruijn » Mon Jun 29, 2015 4:27 pm

Thanks for the tip Robert.

I really should take a good look at when to use CTE in my queries.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm

Re: strange JOIN issue

Postby omar » Tue Jun 30, 2015 5:43 pm

Sorry Robert, but that does not work either, you will get the same error for the same reasons. You expect PostgreSQL to do the inner query first and apply the outer criteria to the inner results and it might... but in this case it does not. It's just not a good idea to mix datatypes into one column. To simplify and clarify the issue that we are dealing with, consider the following two tables:

answers table
Code: Select all
answer_id        return_value       question_id
=========        ==============     ==========
1                 8                    2
2                 7                    2
3                 10                   2
4                 Lorum ipsum          1
5                 Whatever             1
6                 Something            1


questions table
Code: Select all
question_id     category
==========      ========
1                  A
2                  B


Code: Select all
select * from answers where cast(return_value as int)=10

The code above obviously fails and reproduces the error that Jos gets.


Code: Select all
select return_value from answers
inner join questions on questions.question_id = answers.question_id and questions.category='B'
where cast(return_value as int)=10

This code selects the records from the correct category which it should be able to cast but it fails as described.


Code: Select all
select * from (
select return_value from answers
inner join questions on questions.question_id = answers.question_id and questions.category='B') t
where cast(return_value as int)=10

Subquery does not solve this either, error still exists no matter where you try to do the casting.


Code: Select all
select * from answers
inner join questions on questions.question_id = answers.question_id and questions.category='B'
where return_value ~ '^\d+$' and cast(return_value as int)=10

The workaround Jos uses works because an extra validation is done in the where clause so it doesn't matter which part of the query is executed first.

Moral of the story is that database design is important. If you put different datatypes into one column there are ways to get it to work but it may bite you when you least expect it.
Intrasoft, Founder
Omar van Galen
omar@intrasoft.nl
+31-(0)6-21234586
Servoy Developer
omar
 
Posts: 377
Joined: Sat Feb 12, 2011 4:51 pm
Location: Intrasoft, The Netherlands

Re: strange JOIN issue

Postby jdbruijn » Wed Jul 01, 2015 8:13 am

Omar,
You are right that the subselect option from Robert is not working, but the CTE solution is working as I expect it to.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer
Image
jdbruijn
 
Posts: 492
Joined: Sun Apr 11, 2010 6:34 pm


Return to SQL Databases

Who is online

Users browsing this forum: No registered users and 4 guests