Extracting Data Efficiently

All,

We’ve had an ongoing issue for a while now: we’ve been struggling to find a way to efficiently export a foundset as a CSV file. The use case is that the user runs a number of queries and then wants to export the resulting foundset. The foundset may contain anywhere from a handful of records to 100,000+ records, and the export needs to refer to information across several relations.

Things we’ve tried:

  • Iterating over the foundset. This performed very badly since it triggers a SQL query against the database every 200 records, so a 100,000-record foundset ends up generating 500 queries against the database, each of which ran fairly slowly. This took forever.
  • The next thing we tried was to use the databaseManager.getSQL() / databaseManager.getSQLParameter() calls to extract the required query so we could run it directly ourselves. This performed well but unfortunately we had to abandon it due to case #243603 (under certain circumstances the SQL returned by this call refers to temporary tables that don’t exist).
  • Finally we switched to writing our own custom SQL for each export, using databaseManager.getFoundSetDataProviderAsArray() to get the list of primary keys (sometimes we have to call this a couple of times to extract composite keys) and then running the query ourselves based on the list of pks. This, however, starts performing very poorly as the number of keys gets large (on the order of 15 minutes or more for 10,000 or so records).

What we’re looking at now is modifying the last option to use temporary tables to hold the list of pks and joining against the temp table when querying, to improve performance. Before we go to these lengths, though, we wanted to check whether we’re missing something obvious, since this seems like a common requirement and there may be a better solution that we’ve overlooked.

Thanks,
Corey

Have you tried the rawSQL plugin?

Hi Corey,

When doing these kinds of exports, or even transformations, it’s wise to keep it all in the database layer rather than the application layer, right up until the point you fetch the result.
So yes, use temp table(s) as a staging area. Also use INSERT INTO table (SELECT statement) so that you don’t fetch the data from the database into the application just to put it back again into the temp table(s). This makes things MUCH faster.
http://www.postgresql.org/docs/9.0/stat … nsert.html

When you start working with temp tables you should also work with database transactions, or else Servoy will use multiple connections from the connection pool and only one of those connections (sessions) will hold your temp table(s).
Using a database transaction forces Servoy to use only one dedicated connection for your session, up until the point you commit or roll back.
So if you do this a lot, with lots of clients, you might want to make sure your connection pool is large enough.
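Put concretely, this might look roughly like the following in a Servoy method. This is only a sketch under assumptions: the server name (my_server), the staging-table name, and the SELECT are invented placeholders, and it assumes the rawSQL plugin is installed.

```javascript
// Sketch only: invented names throughout (my_server, export_staging, orders).
// Start a transaction so Servoy pins one dedicated connection; the temp
// table created below is only visible on that connection/session.
databaseManager.startTransaction();
try {
    // Stage the rows entirely inside the database (no round-trip through
    // the application layer), per the INSERT INTO ... SELECT advice above.
    plugins.rawSQL.executeSQL('my_server', 'export_staging',
        'CREATE TEMP TABLE export_staging AS SELECT order_id FROM orders WHERE ...');
    // ... run the export query joining against export_staging here ...
} finally {
    // Release the dedicated connection (and the temp table with it).
    databaseManager.rollbackTransaction();
}
```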

Hope this helps.

jcarlos,

I’m not sure how the rawSQL plugin would help our situation. Or are you suggesting using it to create the temp table? In that case, yes, that’s how we were planning on implementing it.

ROCLASI,

The problem with the INSERT INTO table (SELECT …) idea is how to get the SQL from a foundset reliably. Theoretically we can use the databaseManager.getSQL() call, but as I noted in my original post this doesn’t always work.

We had been planning on using transactions so we could use real temp tables, but thanks for bringing that up. Interestingly, Servoy foundsets don’t seem to use SQL temp tables but real tables instead (at least on Oracle). It’s really too bad foundsets don’t keep the JDBC ResultSet open on the server side for a few seconds, so that when iterating over a foundset quickly it could just keep pulling records out of the ResultSet. This would be really fast and would mean we wouldn’t have to jump through all these hoops… Maybe some day…

Corey

Why not something like this:

var ds = databaseManager.convertToDataSet(foundset);
var csv = ds.getAsText(',', '\n', '"', true);
// then write to file with the File plugin

Note that in the convertToDataSet you can optionally pass in an array of column names you want from the foundset.

Hi Scott,

I agree with you that this can work very well; we use this solution in 95% of cases.
So what happened to the last 5%?

There are cases where this solution just doesn’t work, due mainly to three reasons (in order of importance):

  • the number of related columns that have been used
  • size of the foundset
  • available memory.

Unfortunately we had to ‘hard code’ these exports in plain SQL to retrieve the dataset. Which means they also became several times faster!

For the actual ‘write-to-textfile’ routine I’d recommend something which was posted by Jan Blok some time ago:
http://forum.servoy.com/viewtopic.php?f=22&t=13866&p=72676

Scott,

Just to see how it would perform (as noted by mboegem, this solution wouldn’t work for us due to the potential for running out of memory), I tested the code snippet you suggested. The big problem is that databaseManager.convertToDataSet() doesn’t really seem to be much faster than simply iterating over the foundset directly. At least from a database standpoint it seems to pull out the data in chunks rather than with a single SQL query, so we have the same issue there as we do when iterating over a foundset – too many database queries.

Thanks for your input though – for small numbers of records that’s a nice succinct way to do it.

Regards,
Corey

I guess the best and fastest way to do this is in a server-side plugin…
That takes the connection, gets the sql you want to feed it (from option number 3),
and then iterates over the resultset, streaming the contents to a csv file (not keeping everything in memory).
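Whichever way the plugin walks the ResultSet, each field still needs CSV quoting on the way out. A minimal sketch of that escaping in plain JavaScript (RFC 4180-style quoting; the function names are just illustrative):

```javascript
// Quote a single value for CSV output: wrap it in double quotes when it
// contains a comma, quote, or newline, and double any embedded quotes.
function csvEscape(value) {
    if (value === null || value === undefined) return '';
    var s = String(value);
    if (/[",\r\n]/.test(s)) {
        return '"' + s.replace(/"/g, '""') + '"';
    }
    return s;
}

// Join one row of values into a single CSV line.
function csvLine(values) {
    var parts = [];
    for (var i = 0; i < values.length; i++) {
        parts.push(csvEscape(values[i]));
    }
    return parts.join(',');
}
```

When streaming, the plugin would write one csvLine() per ResultSet row and flush periodically, so memory use stays flat regardless of foundset size.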

Johan,

Ok, I’d reached the same conclusion. As I noted, the one sub-optimal part of the solution is that for composite keys we have to make multiple calls to databaseManager.getFoundSetDataProviderAsArray(), which becomes fairly expensive. On my test foundset containing 88,000 records it takes about 15 seconds per call to retrieve each key, and some of our tables have 3 fields making up the composite key, so making those extra calls adds significant overhead. Do you think it would be possible for a future version of Servoy to optimize the databaseManager.convertToDataSet(foundset, [providers]) method so that composite PKs could be extracted in a single SQL select, in the same manner that getFoundSetDataProviderAsArray() performs today?
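Until such an optimization exists, the per-column arrays from the repeated getFoundSetDataProviderAsArray() calls have to be stitched back together into composite-key rows before building the query. The stitching itself is cheap (the cost is in the extra SELECTs); a plain-JavaScript sketch, assuming each call returned an array of the same length:

```javascript
// Combine N parallel per-column arrays into an array of row arrays,
// e.g. zipColumns([[1, 2], ['a', 'b']]) -> [[1, 'a'], [2, 'b']].
function zipColumns(columns) {
    if (columns.length === 0) return [];
    var rows = [];
    for (var i = 0; i < columns[0].length; i++) {
        var row = [];
        for (var c = 0; c < columns.length; c++) {
            row.push(columns[c][i]);
        }
        rows.push(row);
    }
    return rows;
}
```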

At any rate, implementing this all as a plugin makes extraction of large foundsets feasible. I was able to extract the 88,000-record foundset mentioned above to a CSV file in a little over a minute, which is sufficient for our purposes. Previously, it was taking over 30 minutes (!) to extract 10,000 records. My test foundset was not based on a composite key, though, so I’m not suffering from the extra overhead mentioned above in this particular example.

Corey

I guess you see the query 3 times in the admin performance pages if you have a composite key of 3?

You could make a feature request so that the second argument of databaseManager.getFoundSetDataProviderAsArray() can be an array of strings, so that we return a multi-dimensional array.

Thanks Johan.

In the case of databaseManager.getFoundSetDataProviderAsArray(), yes, we see one SELECT per call. In the case of databaseManager.convertToDataSet(foundset, [providers]), though, we see many SELECTs per call (one every 200 records).

I’ve created case #332712 based on your suggestion. I had previously created a separate enhancement request (case #332303) to improve the performance of databaseManager.convertToDataSet(foundset, [providers]) so that it issues (wherever possible) a single database query to extract the dataset rather than naively iterating over the foundset. I suppose your suggestion is more memory-efficient (a simple multi-dimensional array vs. the more complex JSDataSet), but on the other hand, optimizing databaseManager.convertToDataSet(foundset, [providers]) would require no API changes and would immediately benefit anyone already using it in their code. I leave it to you to decide which is the better course of action.

Thanks again.
Corey

Corey,

We have optimized databaseManager.convertToDataSet() to not iterate over the foundset when possible.
This is included in Servoy 5.2.3.

Rob

Thanks a lot Rob! Corey

rgansevles:
We have optimized databaseManager.convertToDataSet() to not iterate over the foundset when possible.
This is included in Servoy 5.2.3.

Nice.