Finding duplicates

I have a quantity of records with four fields, a date field, three integer fields. Is there a way of finding duplicates without looping through all records testing each one for other records that match the current contents of the four fields?

Appreciated.

Hi Morley,

Why not use a SQL statement to select the pks where the field values equal the values that you currently have and then count the pks returned !

Doing it this way also means that you could load the duplicate record set using the pks - if you wanted to.

Cheers
Harry

Harry Catharell:
Why not use a SQL statement to select the pks where the field values equal the values that you currently have and then count the pks returned !

I’m either misunderstanding you or vice versa. I’m not looking to match a single set of standards against all records, but to find all records which match another within the foundset.

Record Date Field1 Field2 Field3
1 Today 100 200 300
2 Today 101 200 300
3 Today 100 200 300
4 Yesterday 123 234 300
5 Yesterday 124 234 300
6 Yesterday 123 234 301
7 Yesterday 123 234 300

Records 1, 3, 4 and 7 have duplicates. All others are close but no cigar.

Why don’t you try (out of my head, in case of an error you have to “fine-tune”):

SELECT primary_key FROM table t WHERE COUNT (SELECT primary_key FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1

patrick:
Why don’t you try (out of my head, in case of an error you have to “fine-tune”):

SELECT primary_key FROM table t WHERE COUNT (SELECT primary_key FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1

Patrick, how are you setting those variables “t” and “t2”? (I assume that’s what they are). Clarification, if it’s needed. There’s just one table, we’re looking for internal duplicates in a single table, in this case named “calls”.

Appreciated.

Morley, what I wrote is SQL only. No variables. The “t” and “t2” are so called table aliases. A table alias is handy:

SELECT t.column_1, t.column_2 FROM a_pretty_long_table_name t

is nicer than

SELECT a_pretty_long_table_name.column_1, a_pretty_long_table_name.column_2 FROM a_pretty_long_table_name

If you want to answer your question using SQL, you need a self join. You are asking for records of a table that have a match in the same table.

My statement wasn’t quite right, though. It should work like this:

SELECT primary_key FROM table t WHERE (SELECT COUNT(*) FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1

This means:

give me all primary_keys from table where there are more than 1 records with the same content in date, field1, field2 and field3

The second “SELECT” is a so-called subquery. I suggest that you google for subqueries, table aliases and self joins.

Fabulous, that works!! Neat solution!

Thanks for putting me onto tables and subqueries.