Prepared statement with datetime value

PREPARE myplan (timestamp, timestamp) AS
SELECT glb_employee_working_time_id
FROM glb_employee_working_time
WHERE start_date <= $1 AND end_date >= $2;
EXECUTE myplan ('2015-11-03', '2015-11-03');
EXECUTE myplan ('2015-11-03', '2015-11-03');
EXECUTE myplan ('2015-11-03', '2015-11-03');
EXPLAIN ANALYZE EXECUTE myplan ('2015-11-03', '2015-11-03');

[attachment=0]prepared.png[/attachment]

SELECT glb_employee_working_time_id FROM glb_employee_working_time WHERE start_date <= '2015-11-03' AND end_date >= '2015-11-03';
SELECT glb_employee_working_time_id FROM glb_employee_working_time WHERE start_date <= '2015-11-03' AND end_date >= '2015-11-03';
SELECT glb_employee_working_time_id FROM glb_employee_working_time WHERE start_date <= '2015-11-03' AND end_date >= '2015-11-03';
EXPLAIN ANALYZE SELECT glb_employee_working_time_id FROM glb_employee_working_time WHERE start_date <= '2015-11-03' AND end_date >= '2015-11-03';

[attachment=1]normal.png[/attachment]
Differen index is chosen…

Yep, looks like a case of a bad query-plan for these input values.
The prepared statement is fetching 24874 rows from the index and then filters those to get to the result of 2 rows. The plain sql one fetches only 3 rows from the index to filter that to the resulting 2 rows. Much less work.

By the way shouldn’t you be passing the same values as in your concatenated query? I.e. including the time component.
Not sure if it will make any difference but who knows.

Edit: Nevermind, I looked at the explain output and not the query. You did pass the same values.

Hi Michel,

You always query on both date columns?
Perhaps a multi-column index will straighten things out.
Try this:

-- create the index
CREATE INDEX gb_employee_working_time_start_end_idx ON gb_employee_working_time (start_date, end_date);

-- update the statistics
ANALYZE gb_employee_working_time;

-- run all queries again

Thanks, didn’t think of that! I will give that a try.

Queries running slower because of prepared statements is a common issue. The problem is there are no sample values provided for what the arguments will be, so the query planner makes a very generic plan. However, if the real values were provided, it would know the cardinality of the particular values and build a proper query plan. I believe there is an option in the JDBC interface to do this, but Servoy doesn’t. So, in large databases, you’ll notice difference. In my experience, I’ve seen several minute difference in large sets of data. So, here is what we do.

This is a core function we use. It protects you from SQL injection, yet still runs it as a merged/flattened query:

function flattenSQL(sql, args) {
	for(var i=0; args && i<args.length; i++){
		//find ? to replace
		var q = sql.indexOf("?", 0)
		
		//determine type
		var v = args[i]
		if(typeof v == "number"){
			sql = sql.substring(0, q) + v + sql.substring(q + 1)
		}
		else if(typeof v == "string"){
			sql = sql.substring(0, q) + "'" + v + "'" + sql.substring(q + 1)
		}
		else{//then a date
			var d = utils.dateFormat(v,"yyyy-MM-dd HH:mm:ss")
			sql = sql.substring(0, q) + "'" + d + "'" + sql.substring(q + 1)
		}
	}
	
	return sql
}

So, code that used to look like this:

var sql = 'SELECT id FROM jobs WHERE name = ?'
var args = ['test']
var ds = databaseManager.getDataSetByQuery('server_name', sql, args, -1);

Now looks like this:

var sql = 'SELECT id FROM jobs WHERE name = ?'
var args = ['test']
//flatten the sql
sql = flattenSQL(sql, args)
args = null
//then run it
var ds = databaseManager.getDataSetByQuery('server_name', sql, args, -1);

With that, you’ll get much faster performance out of Postgres.

What is Servoy’s comment on this? I would say it’s shocking that users fill their foundsets not with the regular functions (find/search) but via a backdoor (getDataSetByQuery) because of performance reasons.

If it is common issue that prepared statements run slower in some cases I would like to have some kind of switch to not let Servoy use prepared statements in these cases. I’ll admit it will be hard to address these cases but the Performance Data in the application server admin pages can be a nice starting point.

goldcougar:
With that, you’ll get much faster performance out of Postgres.

Not just with Postgres.
We have seen the same behaviour with MSSQL Server.
Differences can be substantial…

m.vanklink:
What is Servoy’s comment on this? I would say it’s shocking that users fill their foundsets not with the regular functions (find/search) but via a backdoor (getDataSetByQuery) because of performance reasons.

If it is common issue that prepared statements run slower in some cases I would like to have some kind of switch to not let Servoy use prepared statements in these cases. I’ll admit it will be hard to address these cases but the Performance Data in the application server admin pages can be a nice starting point.

Problem by wrapping every query the way Scott proposes is that you will get a lot of unique queries which will ask a lot more resources from the database.
So solving your initial problem by doing a full rewrite, could create another.
Besides there’s no guarantee all queries will run faster, maybe some will, others don’t.

Over the years I’ve been working on numerous Servoy solutions with a mix of different databases (pg, mysql, ms sql server, oracle, sybase), mainly using prepared statements and never ran into issues.

I’d say, solve this particular query and keep a close eye on the performance data in the application server
If you want this to be automated there’s plenty of tools and/or examples out there that can help you do this.

mboegem:
Problem by wrapping every query the way Scott proposes is that you will get a lot of unique queries which will ask a lot more resources from the database.
So solving your initial problem by doing a full rewrite, could create another.
Besides there’s no guarantee all queries will run faster, maybe some will, others don’t.

Over the years I’ve been working on numerous Servoy solutions with a mix of different databases (pg, mysql, ms sql server, oracle, sybase), mainly using prepared statements and never ran into issues.

I’d say, solve this particular query and keep a close eye on the performance data in the application server
If you want this to be automated there’s plenty of tools and/or examples out there that can help you do this.

99% of the time your query will run slower as a Prepared Statement in Postgres. It so common, its on the FAQ page: FAQ - PostgreSQL wiki People generally use Prepared Statements to prevent SQL Injection, not for speed.

It also won’t use up more resources from the database. Using Prepared Statements lets Postgres save the query execution plan so it doesn’t have to figure it out the next time you send the same query with different parameters. So, by not using Prepared Statements, Postgres has to figure out the query execution plan for each query. However, on the vast majority of queries that is a very simple and quick step. Only on some very complex query plans will Prepared Statements have any real impact, and even then, it still might be slower. In addition, on a high transaction system, the query plan that you generate using a Prepared Statement might be fine at the time you ran the query, but what if the server was under heavy load when you sent the first Prepared Statement, and it resulted in a poor query execution plan? Now your stuck with that for the rest of the client session. There is some advantage to letting Postgres choose the best query execution plan at the point of time you want that query executed. Postgres, particularly more than other databases I’ve worked with, has more variance in the query execution plans.

The reason most people don’t run into issues is they have simple queries and/or small databases. I’m dealing with databases where there are billions of rows, and using prepared statements takes the system down to a very slow speed adding multiple minutes to each query. I can also confirm this technique as we consulted with Andrew Dunstan from PG Experts: https://pgexperts.com/team/andrew_dunstan/ one of the major contributors to the PostgreSQL project. One of the first things they did was to create some Postgres functions that we had written which take in parameters, and then flatten them in the Postgres function and send the entire query, not as a Prepared Statement, and it vastly improved speed.

PS. This shouldn’t be a problem any more in Postgres 9.2, however I haven’t tested that myself to confirm, but the docs say that speed issue with prepared statements have been improved

Hi Scott,

goldcougar:
This is a core function we use. It protects you from SQL injection, yet still runs it as a merged/flattened query:…

Actually that doesn’t protect you from SQL injection at all. All it does is replace the question mark placeholders with the parameter values. Actually it would also replace any question marks in string literal values in your SQL like the following example:

SELECT * FROM myTable WHERE textColumn='does this work?' AND otherColumn=?

I tend to use the following code for these kind of merges which also does some sanity checking:

function mergeSQL(_sQuery, _aParam) {
    var _aResult = [],
        _nParam = 0,
        _bInString = false,
        /** @type {Date} */
        _d;

    for (var i = 0; i < _sQuery.length; i++) {
        if (_sQuery[i] == "'") {
            _bInString = !_bInString;
        } else if (!_bInString && _sQuery[i] == "?") {
            if (_aParam[_nParam] === undefined) {
                throw new Error("Not enough values passed");
            }
            if (typeof _aParam[_nParam] === 'string') {
                _aResult.push("'" + _aParam[_nParam] + "'");
            } else if (typeof _aParam[_nParam] == 'date') {
                _d = _aParam[_nParam]; // just to fix build markers
                _aResult.push("'" + utils.dateFormat(_d, "yyyy-MM-dd HH:mm:ss") + "'");
            } else if (typeof _aParam[_nParam] === 'number') {
                _aResult.push(_aParam[_nParam]);
            } else {
                throw new Error("Unknown/unsupported parameter type in mergeSQL");
            }
            _nParam++;
            continue;
        }
        _aResult.push(_sQuery[i]);
    }
    if (_aParam[_nParam] !== undefined) {
        throw new Error("Not enough placeholders");
    }
    return _aResult.join('');
}

Quotes and double quotes can be escaped by repeating them: ‘’ or “” will turn into ’ and " respectively when you use them in a query. But I am not sure if this is standard across all vendors.
And then there is the option of ' and ". What do do with that? You don’t want to turn them into '’ or '“”. So sanitizing your inputs is a bit of work.
Perhaps anyone on the forum already has something in place for this?

Nice one Robert! We don’t use question marks as values, so never ran into that problem. I think the fact that we both have versions of these types of functions that flatten/merge SQL suggest this is something useful, and I would suggest the same practice to others.

Both examples are not safe when supplied values are untrusted.
What if a stringvalue = “’ || arbitrary_funtion() || '” ( || arbitrary_funtion() || ), then arbitrary_function() will be executed.
If we want to make it safe we would also somehow have to parse the supplied value.

goldcougar:
I think the fact that we both have versions of these types of functions that flatten/merge SQL suggest this is something useful…

Well, I tend to use it for debugging purposes so I have SQL ready to paste into my query editor :)

m.vanklink:
Both examples are not safe when supplied values are untrusted.

Correct, that’s what I stated in my last post. At least I thought I made that clear :)

ROCLASI:
… At least I thought I made that clear :)

I was kind of lost in quotes… :wink:

Okay, here is a quickie sanitizer function provided as-is.

It leaves ' and " alone and quotes the regular single and double quotes.
NOTE: use this on the parameters, not on the SQL itself.

/**
 * Best effort way of sanitizing input values
 *
 * @param {String} _sValue
 * @return {String}
 */
function sanitizeString(_sValue) {
    var _aResult = [];
    for (var i = 0; i < _sValue.length; i++) {
        if (_sValue[i] === '\\' && _sValue[i + 1] === "'") {
            _aResult.push(_sValue[i]);
            i = i + 1;
        } else if (_sValue[i] === '\\' && _sValue[i + 1] === '"') {
            _aResult.push(_sValue[i]);
            i = i + 1;
        } else if (_sValue[i] === "'") {
            _aResult.push("''");
            continue;
        } else if (_sValue[i] === '"') {
            _aResult.push('""');
            continue;
        }
        _aResult.push(_sValue[i]);
    }

    return _aResult.join('');
}

Usage:

var myCleanVal = sanitizeString("Bobby Tables';DROP TABLE students; --"); // https://xkcd.com/327/

Just be aware it might not prevent all cases of SQL injection. Use at your own risk.

Hope this helps.

m.vanklink:
What is Servoy’s comment on this? I would say it’s shocking that users fill their foundsets not with the regular functions (find/search) but via a backdoor (getDataSetByQuery) because of performance reasons.

If it is common issue that prepared statements run slower in some cases I would like to have some kind of switch to not let Servoy use prepared statements in these cases. I’ll admit it will be hard to address these cases but the Performance Data in the application server admin pages can be a nice starting point.

that we can’t do much about this, and no having an option to say we don’t use PreparedStatements is not an option. That is way to dangerous, all input can’t be trusted

Also about JDBC and be able to set cardinality , i can’t find that , and if that would be possible some how (so we really can give hints for a query) then the servoy developer needs to give us that, because we have no idea about data there is

After digging deeper on how prepared statements work and reading up on the many many sources about SQL injection and possible prevention I have to come to the following conclusion:

Use prepared statements.

This is why.
Prepared statements are parsed and planned by the query planner. After all this is done the parameters are added in the query-tree for execution. This means the parameters are no longer parsed and used as-is. Which means any SQL injected into these values will not be executed but used as values (as intended). This is what makes it safe.
The downside of this, as discussed at length in this thread, is that the query planner has to devise a query plan based on incomplete data. It doesn’t know what the passed values are and therefor doesn’t know the cardinality of these values. So it has to create a generic query-plan which can be efficient for some values and inefficient for others.
Now if you have a small database you will hardly notice the difference but when you have millions of rows this could be an performance issue. Depending on your use-case your mileage may vary.

So what about using plain sql where you concatenate all the values in your SQL statement?
Essentially it all comes down to your string values. Anything between 2 single quotes is considered a literal string. Anything outside these quotes are not strings and could be non-string values, SQL commands or logic.
And since we are using Java/JavaScript we have the benefit of checking what datatype a value is so we can safely assume numbers are numbers and dates are dates.
Consider the following:

SELECT * FROM users WHERE id='<id>' AND tenantid=1

And the following passed value:

' OR '1' = '1' --

Merging these together will give you the following SQL:

SELECT * FROM users WHERE id='' OR '1' = '1' --' AND tenantid=1

Mind you the – is the SQL comment so anything after – will be seen as a comment and not parsed/executed
So this type of SQL injection will now show ALL your user data.

What can be done against this? Escaping the quotes in your parameters.
So if we take the above value and escape it you get the following SQL:

SELECT * FROM users WHERE id='\' OR \'1\' = \'1\' --' AND tenantid=1

So now the SQL sees the passed value as is and the query will function as designed.

So that is simple to fix right?
Actually, no.
Each DB vendor has their own implementation of how it parses SQL. For example PostgreSQL has a feature called Dollar Quoted Strings which is a way of escaping a string by wrapping it inside $$$$. The thing is that when you have a $$ inside your string that that acts the same as doing a single quote. So everything behind the $$ will be seen as outside the string like so:

SELECT $$<value>$$

And the value:

test$$,now(),$$

Will result in:

SELECT $$test$$,now(),$$$$

This will result in ‘test’ and the current date.
So for PostgreSQL you need to escape these special characters as well. And MySQL has it’s own share of special characters. So does MSSQL and Oracle.
There seem to be other use cases as well that are exploitable like using ASCII() or CHAR() functions and the like.

I did find some proposed solutions online that encodes the values into HEX and then let it be de-encoded in the query. This might solve the issue of escaping but now you have another problem. Performance, which was the main issue of this thread. Encoding the HEX and de-encoding the HEX is not slow, it’s just that you usually don’t have indexes defined that support these. So your queries will not be using any indexes unless you created them as such (using functional indexes).

So long story short.
Be safe, use prepared statements. If it’s slow consider rewriting the query.
And if you really REALLY need to use plain sql queries make really sure your input values are valid. Perhaps check if they fall in a certain domain that is suitable for your query. Anything outside this domain should be rejected.
Even values that are already in your database can be a risk. Lets say you inserted them using a prepared statement and then later use those values in a merged SQL statement…I think you get the picture.
I guess the only safe use would be if your input values are generated by your own code like a date object or UUID.

Just my 2 cents.

Hope this helps.

Reading material:
https://www.owasp.org/index.php/SQL_Inj … heat_Sheet
https://en.wikipedia.org/wiki/SQL_injection
http://ferruh.mavituna.com/sql-injectio … sheet-oku/

Thanks Robert for this info !

What is the SQL injection risk profile on a typical server with both Servoy Application Server and PostgreSQL installed ?

Regards,

Hi Lambert,

A typical solution will probably use all Servoy objects (foundset and use controller.find() and the like). Servoy uses prepared statements out of the box so your risk for SQL injection is practically non-existent.
The moment you use your own SQL that you merge together with external values into a single string then you have to be very careful that you validate these values. Number and date values are not a threat because of the way Java/JavaScript works. A number value (of the type Number) can’t hold anything other than a number value, ditto with dates and UUID’s. String values however can hold characters and commands that can cause your SQL to misbehave.
So it’s up to you to asses the risk of where these values come from and the amount of scrutiny you need to apply to them.
But using prepared statements by way of foundset.loadRecords(SQL, Params) and getDataSetByQuery(server, SQL, Params,-1) is always the safest way. If you end up with slow queries take a good look at your query plan and perhaps rewrite your query and/or create another index that does make it faster. And if you really really need to use merged sql then scrutinize your input values.

Hope this helps.

Many thanks again Robert !

Regards,