Any thoughts on optimizing this Sybase SQL query?

The below query works fine, until I run it against 300,000 records in the time_expense table. Then it takes about 5.5 seconds to return the results, which is about 5.5 seconds too long. I need to optimize this query, but am unsure how to do that.

SELECT
	ROUND(SUM(TE.hours) / TOTAL, 3) * 100 AS EFFICIENCY,
	DATEFORMAT(TE.slip_date, 'Mmm') + '-' + DATEFORMAT(TE.slip_date, 'yy') AS MyMonthName,
	MONTH(TE.slip_date) AS MyMonth,
	YEAR(TE.slip_date) AS MyYear,
	(	SELECT
			SUM(TE2.hours)
		FROM
			time_expense TE2
		WHERE
			TE2.ol_client_id = 1 AND
			MONTH(TE2.slip_date) = MyMonth AND
			YEAR(TE2.slip_date) = MyYear
	) AS TOTAL
FROM
	time_expense TE
		LEFT JOIN project P
		ON TE.project_id = P.project_id
WHERE
	TE.ol_client_id = 1 AND
	TE.charge_status_id = 1 AND
	P.flag_internal IS NULL AND
	MyMonth IS NOT NULL
GROUP BY
	MyYear,
	MyMonth,
	MyMonthName
ORDER BY
	MyYear ASC,
	MyMonth ASC

Basically, the query calculates an efficiency: the hours entered for internal projects divided by the total hours entered, for the same month and year. The problem is this subquery:

(	SELECT
		SUM(TE2.hours)
	FROM
		time_expense TE2
	WHERE
		TE2.ol_client_id = 1 AND
		MONTH(TE2.slip_date) = MyMonth AND
		YEAR(TE2.slip_date) = MyYear
) AS TOTAL

as this nested query is running multiple times. I know there must be a better way to write this query, but I cannot seem to come up with one. Here is what the output looks like:

 EFFICIENCY         MyMonthName     MyMonth     MyYear     TOTAL              
 -----------------  --------------  ----------  ---------  ------------------ 
 85.1               Jan-06          1           2006       53178.08000000002  
 85.2               Feb-06          2           2006       41087.31999999999  
 93.60000000000001  Mar-06          3           2006       65744.43           
 93.2               Apr-06          4           2006       48181.11000000001  
 91.7               May-06          5           2006       44332.75           
 89.9               Jun-06          6           2006       83862.48           
 92.9               Jul-06          7           2006       67954.55           
 96.5               Aug-06          8           2006       110806.87          
 94.39999999999999  Sep-06          9           2006       71438.16999999998  
 93.60000000000001  Oct-06          10          2006       71290.48           
 96.1               Nov-06          11          2006       103765.2           
 93.2               Dec-06          12          2006       62118.97

Any suggestions would be appreciated.

Gary,

You could try replacing the inner select with an outer join:

    SELECT
       ROUND(SUM(TE.hours) / TOTAL, 3) * 100 AS EFFICIENCY,
       DATEFORMAT(TE.slip_date, 'Mmm') + '-' + DATEFORMAT(TE.slip_date, 'yy') AS MyMonthName,
       MONTH(TE.slip_date) AS MyMonth,
       YEAR(TE.slip_date) AS MyYear,
       SUM(TE2.hours) AS TOTAL
    FROM
       time_expense TE
          LEFT JOIN project P
          ON TE.project_id = P.project_id
          LEFT JOIN time_expense TE2
          ON TE2.ol_client_id = 1 AND
             MONTH(TE2.slip_date) = MyMonth AND
             YEAR(TE2.slip_date) = MyYear
    WHERE
       TE.ol_client_id = 1 AND
       TE.charge_status_id = 1 AND
       P.flag_internal IS NULL AND
       MyMonth IS NOT NULL
    GROUP BY
       MyYear,
       MyMonth,
       MyMonthName
    ORDER BY
       MyYear ASC,
       MyMonth ASC

Hope this helps,

Rob

Nope, that takes even longer (I cancelled the query after about 30 seconds).

gdotzlaw:
The below query works fine, until I run it against 300,000 records in the time_expense table. Then it takes about 5.5 seconds to return the results, which is about 5.5 seconds too long. I need to optimize this query, but am unsure how to do that.

The fact that it is only taking 5.5 seconds is pretty amazing, considering you’re running a separate query for every row returned. There is no “optimizing” this. Well, at least not significantly.

This is where the age-old transactional-vs-reporting-data discussion needs a line drawn. This kind of output is not transactional, and expectations and approach should be adjusted accordingly.

The idea with reporting data is that you can live with a certain degree of real time lag in order to gain immediate response. To achieve this, fill a reporting table that munges together complex data at regular intervals and then run a much simpler/faster query against the reporting table to get your fast response. The data being returned is “old” by however long your refresh rate is on your reporting table.
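
As a rough sketch (assuming one row per client per month; all table and column names here are made up for illustration):

    -- Hypothetical reporting table with the sums pre-computed
    CREATE TABLE monthly_efficiency (
        ol_client_id     INTEGER NOT NULL,
        my_year          INTEGER NOT NULL,
        my_month         INTEGER NOT NULL,
        qualifying_hours NUMERIC(12,2) NOT NULL,  -- SUM(hours) under your WHERE filters
        total_hours      NUMERIC(12,2) NOT NULL,  -- SUM(hours) for the client overall
        PRIMARY KEY (ol_client_id, my_year, my_month)
    );

    -- The report itself becomes a trivial single-table query
    SELECT ROUND(qualifying_hours / total_hours, 3) * 100 AS EFFICIENCY,
           my_month AS MyMonth,
           my_year AS MyYear,
           total_hours AS TOTAL
      FROM monthly_efficiency
     WHERE ol_client_id = 1
     ORDER BY my_year, my_month;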

If the user absolutely needs a real time snapshot now, you can additionally give them the option to pull in the real time data – but now they are prepped that it will take a bit of time to fulfill their request.

Thanks David. I was beginning to think this would be the direction I would need to go. The historical data won’t change (sums for 2006, 2007, 2008 etc), and only new records for the current year (and recent months) are going to be added. So, if I did the bulk of the sum calculations and stored the results in a separate table, then I could just use a procedure to pull recent data from the time_expense data and update the sums in the results table. Might be a good use for a stored procedure that is triggered periodically (like once a day) just to keep the results for the last 6 months fresh.

Sound like the right direction?

gdotzlaw:
Thanks David. I was beginning to think this would be the direction I would need to go. The historical data won’t change (sums for 2006, 2007, 2008 etc), and only new records for the current year (and recent months) are going to be added. So, if I did the bulk of the sum calculations and stored the results in a separate table, then I could just use a procedure to pull recent data from the time_expense data and update the sums in the results table. Might be a good use for a stored procedure that is triggered periodically (like once a day) just to keep the results for the last 6 months fresh.

Sound like the right direction?

Yup.

Note 1: An alternative to a stored procedure is a Servoy batch processor. I tend to lean towards keeping all business logic in the Servoy layer and keeping the database layer as dumb as possible. But sometimes a procedure just makes the most sense.

Note 2: Even if you go with a stored procedure, triggering procedures from a batch processor can give you some additional flexibility, like a runtime user interface to manage all timed events.
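
For what it’s worth, the database-side version of the daily refresh Gary described might look roughly like this in SQL Anywhere (re-using the hypothetical monthly_efficiency table sketched above; CREATE EVENT is SQL Anywhere-specific):

    -- Re-derive only the last 6 months; older rows are never touched
    CREATE PROCEDURE refresh_monthly_efficiency()
    BEGIN
        DECLARE cutoff DATE;
        -- first day of the month 6 months back, so DELETE and INSERT cover the same rows
        SET cutoff = YMD(YEAR(DATEADD(month, -6, CURRENT DATE)),
                         MONTH(DATEADD(month, -6, CURRENT DATE)), 1);

        DELETE FROM monthly_efficiency
         WHERE YMD(my_year, my_month, 1) >= cutoff;

        INSERT INTO monthly_efficiency
            (ol_client_id, my_year, my_month, qualifying_hours, total_hours)
        SELECT TE.ol_client_id,
               YEAR(TE.slip_date),
               MONTH(TE.slip_date),
               -- numerator: hours passing the original query's filters
               SUM(CASE WHEN TE.charge_status_id = 1 AND P.flag_internal IS NULL
                        THEN TE.hours ELSE 0 END),
               -- denominator: all hours for the client, as in the original subquery
               SUM(TE.hours)
          FROM time_expense TE
               LEFT JOIN project P ON TE.project_id = P.project_id
         WHERE TE.slip_date >= cutoff
         GROUP BY TE.ol_client_id, YEAR(TE.slip_date), MONTH(TE.slip_date);
    END;

    -- Scheduled event to run it once a day
    CREATE EVENT refresh_efficiency
    SCHEDULE START TIME '2:00AM' EVERY 24 HOURS
    HANDLER BEGIN
        CALL refresh_monthly_efficiency();
    END;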

There is a way to achieve real-time results fast.
It requires you to update an additional table with cumulative results and a timestamp at every transaction. This will ultimately slow down your updates, but in many cases that delay is not significant.
This table of cumulative figures can then be queried for just two records: the one corresponding to the beginning of the timeframe and the one corresponding to the end. Simply subtracting the two yields your result; it doesn’t require reading all the intermediate records to calculate the sum, so the query runs a lot faster. You would need to keep cumulative versions of all the terms in the expressions you use in your calculations.
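
A minimal sketch of the idea, assuming SQL Anywhere’s FIRST syntax (table and column names are hypothetical):

    -- Running-total table, appended to as part of every time_expense transaction
    CREATE TABLE hours_running_total (
        ol_client_id     INTEGER NOT NULL,
        as_of            TIMESTAMP NOT NULL,
        cumulative_hours NUMERIC(14,2) NOT NULL  -- running SUM(hours) up to as_of
    );

    -- Hours booked in March 2006 = cumulative value at the end of the window
    -- minus the value at the start; two indexed lookups, no scan of the rows between
    SELECT ending.cumulative_hours - starting.cumulative_hours AS hours_in_window
      FROM (SELECT FIRST cumulative_hours
              FROM hours_running_total
             WHERE ol_client_id = 1 AND as_of < '2006-03-01'
             ORDER BY as_of DESC) AS starting,
           (SELECT FIRST cumulative_hours
              FROM hours_running_total
             WHERE ol_client_id = 1 AND as_of < '2006-04-01'
             ORDER BY as_of DESC) AS ending;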