I have a quantity of records with four fields, a date field, three integer fields. Is there a way of finding duplicates without looping through all records testing each one for other records that match the current contents of the four fields?
Harry Catharell:
Why not use a SQL statement to select the pks where the field values equal the values that you currently have and then count the pks returned !
I’m either misunderstanding you or vice versa. I’m not looking to match a single set of standards against all records, but to find all records which match another within the foundset.
Why don’t you try (out of my head, in case of an error you have to “fine-tune”):
SELECT primary_key FROM table t WHERE COUNT (SELECT primary_key FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1
patrick:
Why don’t you try (out of my head, in case of an error you have to “fine-tune”):
SELECT primary_key FROM table t WHERE COUNT (SELECT primary_key FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1
Patrick, how are you setting those variables “t” and “t2”? (I assume that’s what they are). Clarification, if it’s needed. There’s just one table, we’re looking for internal duplicates in a single table, in this case named “calls”.
Morley, what I wrote is SQL only. No variables. The “t” and “t2” are so called table aliases. A table alias is handy:
SELECT t.column_1, t.column_2 FROM a_pretty_long_table_name t
is nicer than
SELECT a_pretty_long_table_name.column_1, a_pretty_long_table_name.column_2 FROM a_pretty_long_table_name
If you want to answer your question using SQL, you need a self join. You are asking for records of a table that have a match in the same table.
My statement wasn’t quite right, though. It should work like this:
SELECT primary_key FROM table t WHERE (SELECT COUNT(*) FROM table t2 WHERE t.date = t2.date AND t.field1 = t2.field1 AND t.field2 = t2.field2 AND t.field3 = t2.field3) > 1
This means:
give me all primary_keys from table where there are more than 1 records with the same content in date, field1, field2 and field3
The second “SELECT” is a so-called subquery. I suggest that you google for subqueries, table aliases and self joins.