Marking and finding duplicates
Example file: Phone.fp3
Oops. Another article (last month’s), another error. Fortunately it probably didn’t cause too much confusion, unless you tried to save IDs for more than about 16,000 records. I said that the global text field for saving them could hold 64,000K. That should be 64K or 65,536 characters (give or take). If you try to do more it will tell you that you’ve exceeded the text field limit. That can also happen if you don’t put a step in your script to either select the whole field or set it to “” (nothing) before pasting; after a while those numbers add up.
This month I’ll tackle the dreaded DUPLICATES. There’s little point in storing records if you have two for the same person with different addresses in them, one from a old address they moved out of a year ago. FileMaker has a good built-in duplicate search facility; you just type “!” in a field in Find mode. But it has some limits.
For one thing, you have to decide just what is a duplicate. It’s quite possible that several people could have the same name; it’s also possible that people with different names could be living at the same address, related or unrelated. Once you’ve decided, then you should create a concatenated (put together) field with the criteria. You can also use some of the text functions to keep the size down.
Duplicate Field Calculation
Here’s an example Dupe field (calculation, text) for a contact-type file:
Dupe = Left (First, 3) & Left (Last, 3) & Right (Main Phone, 4) Search with a “!” in that field and you should only get one person. Remember, no more than 20 characters without a space in an ID field, and less than 60 total (indexing limits).
One problem with the built-in Find is that it finds all the records that are the same, including the original. Then you have to look at them to decide what to do with them. You can’t write a good script that would safely delete them. If you write a looping script that finds duplicates then deletes every other record, it will screw up if it ever hits a triplicate (and if there are two, there could be three). Even if there aren’t, looping scripts can be very slow.
Another limit of Finds is that you have to run the search. It would be better if you could be notified automatically while trying to create a new record. To do that you’ll have to use a self-relationship. First, you must have an auto-entered serial ID for each record (always); I use a text field, with a letter followed by zeros in front (e.g., A0001).
Self-Relationship
Create the relationship (I’ll call it Self\Dupe) in the same file with the Dupe field on both sides; Dupe= ::Dupe (same Dupe field from earlier).
Create a new calculated field, type number, Dupe Mark=
If (SerialID = (Self\Dupe::SerialID), “”, 1)
The serial ID will only equal itself for the first record of the relationship. If you enter a new record and the same data is in the Dupe field, it will get marked with the 1.
Duplicate Warning
Now create your warning. It would be a calculated text field.
If (Dupe Mark = 1, “DUPLICATE”, “”)
Put it on the data entry layout where there’s some space. Set it to not allow entry and take it out of the tab order. Make the text red, but with transparent background and no line. It normally won’t show, but enter a duplicate and it will jump out. Since it’s based on a relationship it’s unstored, so it doesn’t even require exiting a field or record to trigger it. (Which is important. It doesn’t do much good to have a warning that only shows up after the user leaves the layout. You can’t count on them clicking outside of any fields before exiting.)
When You Want Duplicate Records
Using the same Duplicate relationship you can do a number of things. Often times you want duplicate records for a person or item in a database, and you need to track them. You may have several invoices for a person, or whatever; the principle of self-relation is still the same. Use a field that is always the same in the group of records and create a self-relationship on that field. Usually it’s a text ID field.
If you want to find out how many records, just count it with a calculated number field:
Count Dupe = Count (Self\Dupe::Serial ID)
It will instantly update. You can use similar self-relationships with an identifying key to total any number fields in its set of records. For example, let’s say you had a number field, Charges. Create a calculated field, Person’s Charges=
Sum (Self\Dupe::Charges)
You could get the same total with summary functions, but this one is more reliable, as it doesn’t depend on sort order. You can even use it in other calculations, such as, Balance Due=
Sum (Self\Dupe::Charges) – Sum (Self\Dupe::Payments)
They are unstored calculations, therefore slow, but so is sorting and going into Preview mode (another way to get this total).
Marking First or Last Entries
If you want to mark which is the first entry in the group, use this:
If (SerialID = Self\Dupe::SerialID, 1, “”)
It’s just the opposite of the duplicate calculation.
To find the latest entry you can use the Maximum function, but you have to do a little compensation for the fact that the Serial ID is a text field.
If (TextToNum(SerialID) = Max(Self\Dupe::SerialID), 1, “”)
This will also find all non-duplicates.
To find all duplicates that are not the latest entry, exchange the 1 and “”;
If(TextToNum(SerialID) = Max(Self\Dupe::SerialID),””, 1)
or use “‚” (does not equal);
If(TextToNum(SerialID) ‚ Max(Self\Dupe::SerialID), 1, “”)
If you want to use other criteria for the first entry other than when it was entered, then you can do that also, by using the minimum or maximum aggregate functions.
For example, the record with the biggest payment for a person:
If (Payment = Max(Self\Dupe::Payment, 1, “”)
Deleting Duplicates
You can now reliably find a set of only the duplicates of records, any number of them, excluding either the first or last entries. You can then add the single script step:
Delete All Records
That should do it.
Once again, I must disclaim. Make these marking fields visible on a list view layout. Experiment on test records until you are sure that they are doing what you want before committing to an automated Delete All script. Don’t worry, once you’ve got them down they will work well, with probably less error than if you did it by hand (well, than if I did it, anyway).
I’ve included a sample file, Phone.fp3, to illustrate how it works. You can modify it and add your own records. Have fun. If you have problem or specific question, please write me and I’ll try to answer.
Fenton Jones
Fenton Jones is a FileMaker database designer and consultant, based in San Diego, CA. FileMaker is a cross-platform rapid-development tool for affordable relational databases. If you have need of a FileMaker Pro expert, please be sure to visit his home page at http://www.fentonjones.com
Leave a Reply
You must be logged in to post a comment.