Large Table cleanup

Questions and answers regarding general SQL and backend databases

Postby jdbruijn » Thu Mar 05, 2020 12:00 pm

I want to clean up a large PostgreSQL table (9M records). It was originally created as a 1-1 relation to another table (parent), but we now use this child table a bit differently, which means I want to delete about 2/3 of its data. Over the past years the data has also become corrupted: there are records in the child table that no longer have corresponding records in the parent table, and I want to clean those up as well.
In the new setup I only want to keep the records in the child table whose parent record has a specific flag set.
Simplified Schema:
Child table: child_id: INTEGER, parent_id: INTEGER
Parent table: id: INTEGER, flag: INTEGER

I want to do this in a way that has as little impact on the users as possible.
I cannot find a single query that does what I want in a reasonable amount of time.
What I have come up with so far is this:
- create a temp flag field in the child table
Code: Select all
alter table child add column flag integer default null;

- copy the flag value from the parent table
Code: Select all
update child as c
set flag = p.flag
from parent p
where c.parent_id = p.id and p.flag = 1;

- delete all the records in child where the flag is null
Code: Select all
delete from child where flag is null;

- remove temp flag field
Code: Select all
alter table child drop column flag;


What I have found when running this locally as a test is that adding the temp flag column takes a lot of time (still running after 1 hour). I assume that when I run this query it places a lock on the table, preventing users from accessing the other data in the table (the records I want to delete are no longer used or accessed by users).
Any suggestions on how to optimize this?
Does deleting the data in one go (after the temp flag is set) result in the table being locked for as long as the query is running?
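One thing I am considering is doing the delete in smaller batches and repeating the statement until it affects no rows, so each statement only holds its locks for a short time. This is just an untested sketch on top of the temp flag approach above; the batch size of 10000 is an arbitrary guess:
Code: Select all
-- delete a batch of flagged rows; repeat until 0 rows are affected
DELETE FROM child
WHERE child_id IN (
    SELECT child_id
    FROM child
    WHERE flag IS NULL
    LIMIT 10000
);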
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer

Re: Large Table cleanup

Postby ROCLASI » Thu Mar 05, 2020 2:44 pm

Hi Jos,

There is no need to add a field to mark records for deletion. In fact, that whole process only adds more locking contention.
With a simple LEFT JOIN you can find your orphan records, like so:

Code: Select all
-- get count of orphans
SELECT COUNT(1)
FROM child c
    LEFT JOIN parent p ON (c.parent_id = p.id)
WHERE p.id IS NULL;


And to delete them:
Code: Select all
-- start transaction
BEGIN;

-- remove the orphans
DELETE FROM child
WHERE child_id IN (
    SELECT c.child_id
    FROM child c
        LEFT JOIN parent p ON (c.parent_id = p.id)
    WHERE p.id IS NULL
);

-- uncomment the line you want to use...
-- commit changes
-- COMMIT;

-- or rollback changes
-- ROLLBACK;


Hope this helps.
Robert Ivens
SAN Developer / Servoy Valued Professional / Servoy Certified Developer

ROCLASI Software Solutions / JBS Group, Partner
Mastodon: @roclasi
--
ServoyForge - Building Open Source Software.
PostgreSQL - The world's most advanced open source database.

Re: Large Table cleanup

Postby ROCLASI » Thu Mar 05, 2020 3:04 pm

Hi Jos,

jdbruijn wrote:Over the past years the data has also become corrupted: there are records in the child table that no longer have corresponding records in the parent table, and I want to clean those up as well.


It sounds to me like you are not using foreign-key constraints in the database, or else this would never have happened. These constraints ensure the sanity of your data.
I suggest you apply these database constraints on all your relations where you can.
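For the simplified schema above that could look something like this (the constraint name is just an example, and the NOT VALID / VALIDATE CONSTRAINT split is optional; it lets you add the constraint quickly and check the existing rows afterwards):

Code: Select all
-- enforce that every child row points at an existing parent row
ALTER TABLE child
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent (id)
    NOT VALID;

-- validate the rows that were already in the table
-- (this will fail for as long as orphans remain)
ALTER TABLE child VALIDATE CONSTRAINT child_parent_fk;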

Hope this helps.
Robert Ivens
SAN Developer / Servoy Valued Professional / Servoy Certified Developer

ROCLASI Software Solutions / JBS Group, Partner
Mastodon: @roclasi
--
ServoyForge - Building Open Source Software.
PostgreSQL - The world's most advanced open source database.

Re: Large Table cleanup

Postby ROCLASI » Thu Mar 05, 2020 3:46 pm

Hi Jos,

To speed things up you could create a new child table containing only the non-orphans and swap (rename) the old table out for the new one. Of course you do need to apply your indexes on the new table (before renaming).
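Roughly along these lines, as an untested sketch (child_new/child_old are just placeholder names, and any indexes, constraints, defaults and permissions on child would have to be recreated on the new table before the rename; the WHERE clause keeps only the rows whose parent has the flag set, per your original goal, so drop it if you only want to get rid of the orphans):

Code: Select all
BEGIN;

-- build the replacement table with only the rows you want to keep
CREATE TABLE child_new AS
SELECT c.*
FROM child c
    JOIN parent p ON (c.parent_id = p.id)
WHERE p.flag = 1;

-- recreate indexes, constraints, etc. on child_new here

-- swap the tables; the renames are quick but take an exclusive lock
ALTER TABLE child RENAME TO child_old;
ALTER TABLE child_new RENAME TO child;

COMMIT;

-- and once you are happy with the result:
-- DROP TABLE child_old;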

Hope this helps.
Robert Ivens
SAN Developer / Servoy Valued Professional / Servoy Certified Developer

ROCLASI Software Solutions / JBS Group, Partner
Mastodon: @roclasi
--
ServoyForge - Building Open Source Software.
PostgreSQL - The world's most advanced open source database.

Re: Large Table cleanup

Postby jdbruijn » Thu Mar 05, 2020 4:43 pm

Hi Robert,
I tried your suggestion of using the join to delete the orphaned records, and after 1 hour and 40 minutes it still was not done (on my local database).
Maybe creating a new child table and only copying the records I want will be faster; I will try that later.
Jos de Bruijn
Focus Feedback BV
Servoy Certified Developer

