Huge table - slow loading

I have a huge table (over 30 million records) that I need to connect to. I would like to filter it by date so that one days worth of records are visible at at time. The problem is, no matter how I build the foundset there is a huge delay (many minutes) in displaying the data. I have tried a number of things including using a simple SQL query and getDataSetByQuery - with a returned record limit of 100 records.

The databaseManager.getDataSetByQuery runs very quickly, but when I do controller.loadRecords(dataset), it takes many minutes to complete - even though, in theory, there were only 100 records in the dataset. The form datasource is the table in question and it seems that whenever I do anything with the form, it grinds to a halt. For instance, I originally had the following:

databaseManager.getDataSetByQuery( controller.getServerName(), query, args, maxReturnedRows);

which took many minutes to run. When I replace ‘controller.getServerName()’ with the server name - as in:

databaseManager.getDataSetByQuery( “my_server”, query, args, maxReturnedRows);

the query ran in less than a second. It seems any time I reference the form - whether or not I am actually loading records - the system grinds to a halt. So I can do the query and get the dataset, but I cannot load it into the form.

This is definately a size problem as all other forms work well even with 100s of thousands of records. . I have a slightly smaller table with about 8 million records that is also delayed but not by nearly as much.

There must be a way to deal with large tables such as this - any help would be appreciated. :?

Rob.

you need to turn on the Javascript Profiler, and also use the Performance Data Page so you can see what is going on in your app. Could be many different issues.

rtiterle:
databaseManager.getDataSetByQuery( controller.getServerName(), query, args, maxReturnedRows);

which took many minutes to run. When I replace ‘controller.getServerName()’ with the server name - as in:

databaseManager.getDataSetByQuery( “my_server”, query, args, maxReturnedRows);

the query ran in less than a second.

I’d say: the problem is in your code somewhere.
So take goldcougars advise and turn on the profiler to see where in your code this huge amount of time is taken…
It could be just 1 line like databaseManager.convertToDataset(xxxxxxxxx) that’s bugging you.

Thanks Scott,

I have turned on the Profiler - it is not telling me anything new though. I have been around Servoy for a while and know SQL well, but I recently (days ago) jumped from 3.5 to 5.2 so I still have much to learn. I have walked through the app and as I mentioned it stalls when it hits any mentioend of the form controller. Running the query itself is quick - it will return the dataset very quickly (seconds), but when I try to load the dataset into the form foundset it stalls. As I mentioend, it even stalled on the getServerName call (for the form based on the big table).

Also, If i go to SQL explorer, there is no issue pulling up records from this table.

How can I use the Profiler to reveal more info and where is the Performance Data page? A quick look in help was not very revealing.

Thanks!

Rob.

rtiterle:
I have turned on the Profiler - it is not telling me anything new though

Can’t you see where the biggest amount of time is taken? (so which method)
In this case you can track down where about the problem resides.

rtiterle:
it hits any mentioend of the form controller.

as said: I’d place my bet on code that’s being executed.

rtiterle:
where is the Performance Data page?

http://:8080/servoy-admin

Is the stall only the first time you access the controller? If so, I think that the first time you access the controller is also the first time you access the form and thus Servoy starts loading the the default foundset.

You should be able to see what queries are fired at the database by opening up the admin page, going to the Performance Data page and clearing it, then run the code that is slow and refresh the Performance page to see which queries have run.

Now, it can be that the initial foundset for the form has an initial sort on a non-indexed column, making the retrival very slow. If this is the case, you might be able to remove such sorting, or add a proper index.

You can also stop Servoy from initializing the foundset, either globally using databaseManager.setCreateEmptyFormFoundsets() or by calling foundset.clear() in the onLoad event of the form

Paul

Hi Marc,

I am sure I am doing somethign wrong, but it is so simple I can’t see where. I created a test app with one form with a few fields based on the table in question. Loading the form took many many minutes. I then added a method with the code below. You can see that any time I hit the controller, regardless of whether I am loading records (ie even with the getServerName call), it bogs down.

var maxReturnedRows = 100; 
var query = "select * from transactdata where proc_dt = '11/17/10'";   //<---- runs in about 1 second in SQL editor
var args = null;
var sname = controller.getServerName();  // <---- very slow (many minutes)
var dataset = databaseManager.getDataSetByQuery(sname, query, args, maxReturnedRows);  // <---- very fast (1 second)
controller.loadRecords(dataset);  // <--- very slow (many minutes)

I have not yet been able to use servoy to handle large tables. It works great for (relatively) smaller tables. I am only interested in dealing with a days worth of data and will always filter on date, but none of this seems to matter at this point.

Again, I really appreciate the help!

Rob.

Hi Rob,

please take a close look at what Paul pointed out.
He certainly has a point saying that you should prevent loading of all records on a foundset at first call: as you already mentioned yourself, in this case you’re always interested in just records of today.
Use one of the solutions he posted to do so.

also try using controller.loadRecords(query, args).

So reviewing the performance data is also not very revealing… there were a number of queries all of which were completed in milliseconds with the slowest showing a 5 second total time.

It does not seem to be anything happening on SQL or the time to return the result set - yet it is dependant on the size of the table. The initial query on form load (I assume) shows a simple query loading the top 200 records. This comleted in less than 3 ms.

Now I think I am even more confused. Is this a Servoy limitation/bug? Why would it be delayed when calling getServerName?

dazed and confused.

Rob.

Any chance you can post your testsolution?
Then others can take a look into it.

Hi Rob,

I did some testing myself but I can’t reproduce what you are seeing.

I will explain what I did so others could use this to create test data in PostgreSQL.
First I created a table named ‘hugetable’ with 3 columns, one ID, a number and a timestamp.

# CREATE TABLE hugetable AS SELECT i AS id, i*3 AS num, '2010-01-01'::timestamp + (i*INTERVAL '1 hour') AS dt FROM generate_series (1,30000000) i;
SELECT 30000000

So now I have a table with 30 million rows.
Of course Servoy requires a primary key so lets add one.

# ALTER TABLE hugetable ADD PRIMARY KEY (id);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "hugetable_pkey" for table "hugetable"
ALTER TABLE

Now I want to filter on the dt column which holds a timestamp, but I want to search on dates (no time) only. So to let the database use an index for such a query we add an index with the function we are gonna use in the SQL.

# CREATE INDEX hugetable_dt_idx on hugetable (date_trunc('day', dt));
CREATE INDEX

The date_trunc(‘day’,dt) function will store only dates with the time set to 00:00:00 in the index.

So now we have created the following table:

# \d hugetable
             Table "public.hugetable"
 Column |            Type             | Modifiers 
--------+-----------------------------+-----------
 id     | integer                     | not null
 num    | integer                     | 
 dt     | timestamp without time zone | 
Indexes:
    "hugetable_pkey" PRIMARY KEY, btree (id)
    "hugetable_dt_idx" btree (date_trunc('day'::text, dt))

Next is to create a test solution with a form connected to this ‘hugetable’.
In this form I created a button to load the data with the following code:

var _sQuery= "select id from hugetable where date_trunc('day', dt) = '2010-11-17 00:00:00'",
	_dStart = new Date(),
	_sServer = controller.getServerName();
application.output("getServerName : " + (new Date() - _dStart));

_dStart = new Date()
var _ds = databaseManager.getDataSetByQuery(_sServer, _sQuery, null, 100); // result is in fact 24 records
application.output("getDataSetByQuery : " + (new Date() - _dStart));

_dStart = new Date()
controller.loadRecords(_ds);
application.output("loadrecords : " + (new Date() - _dStart));

When pressing the button I get the following output:

getServerName : 1
getDataSetByQuery : 2
loadrecords : 5

That’s in milliseconds. So hardly the minutes you are seeing so I wonder what is really going on.
I am using Servoy 5.2.2 with PostgreSQL 9.0.1.

Hi Robert,

Your test app is basically identical to one of the variations of mine. Did you use databaseManager.setCreateEmptyFormFoundsets()?

The differences are I am using mssql, and my records are larger - 12 fields - although all small. As I mentioned before, however, the queries to the sql server are executing in the sorts of time frames you were getting - a few ms. the problem seems to be when i access the controller for the table, whether or not I am going out to the database.It feels as though it is doing a table scan every time I access a controller function - however, I don’t see any sql queries in the performance page related to that, nor any that take more than a handful of ms.

We have worked around this problem in the past by chopping up data on the sql server, but moving forward I really need this to work. I have tired so many things I think I’ll go back and start again with a new test app modeled directly from your code so we can do a comparison - in fact I’ll set up a test table that mirrors yours as well.

Thanks for your help - I’ll let you know how I fare…

Rob.

Hi Rob,

I created a 12 column table this time to match your setup:

CREATE TABLE hugetable AS SELECT i AS id, '2010-01-01'::timestamp + (i*INTERVAL '1 hour') AS dt, i*3 AS num1, i*4 AS num2, i/3 AS num5, i^2 AS num6,i*3 AS num7, i*4 AS num8, i/3 AS num9, i^2 AS num10 FROM generate_series (1,30000000) i;

So the table now looks like this:

# \d hugetable
             Table "public.hugetable" 
 Column |            Type             | Modifiers 
--------+-----------------------------+-----------
 id     | integer                     | not null
 dt     | timestamp without time zone | 
 num1   | integer                     | 
 num2   | integer                     | 
 num5   | integer                     | 
 num6   | double precision            | 
 num7   | integer                     | 
 num8   | integer                     | 
 num9   | integer                     | 
 num10  | double precision            | 
Indexes:
    "hugetable_pkey" PRIMARY KEY, btree (id)
    "hugetable_dt_idx" btree (date_trunc('day'::text, dt))

With 30 million rows this is about 3.9GB worth of data.

Then in the test solution I tested it with foundset.clear() in the onLoad, databaseManager.setCreateEmptyFormFoundsets() in onSolutionOpen and without any of these.
In the onShow I called the loadData method when I had the form loaded with an empty foundset. And a button on it that calls the same loadData method.
The slowest times I recorded where these:

*onShow Event*
getServerName       : 0
getDataSetByQuery   : 27
loadrecords         : 18
---------------------------------
*Button Event*
getServerName       : 1
getDataSetByQuery   : 2
loadrecords         : 1
---------------------------------

These tests were all done in the debug client and the slowest times were right after a restart of Servoy Developer (which makes sense).

Rob,

I see you do a ‘select * …’ and put the result in controller.loadRecords().
This function expects a dataset with just the pk columns, if you pass in more columns the remaing are ignored.
Try ‘select pk …’ instead so you have the pks and not maybe another column.

Rob

OK, Rob, using your code, I have finally got to set up a clean test. Here are the results:

getServerName : 0
getDataSetByQuery : 15
loadrecords : 217863

There is obviously something going on with loadRecords. After the records are loaded, the app becomes very sluggish. it takes foreever to display, select a menu etc. I changed the maxrecordsreturned from 100 to 30 and thegetDataSetByQuery time reduced from 15ms to 1ms, but the loadrecords time was almost identical.

Any idea what could possibly be going on? I am not even sure where to start looking.

Rob.

Hi Rob,

Now what happens when you load the SQL straight in the controller?

controller.loadRecords(yourSQL)

This should give the same resultset but will only load the first 200 like Servoy does.

Hi Rob,

Did you ever got this solved? Did the SQL in the controller.loadRecords() help out ?
Loading a dataset into a controller will load ALL records in the dataset. Using the SQL in the loadRecords() will only load the first 200 and load the rest as needed.