Using grep on text before importing
A good way to introduce FileMaker Pro database design is to use a simple real-world example, something that most people would find useful. The basic idea is to take a list of information which you’ve found on the Web and turn it into an easily searchable database (DB). I’ll use my newest project, which is a small DB of rental housing (for my imminent move to San Diego).
First you find the info on the Web site, drag down to select it, then copy/paste it into a text editor, such as BBEdit Lite (free, http://www.barebones.com
). The reason you copy/paste it is to eliminate all the [image] tags and stuff that would clutter it if you saved as text.
Once you’ve got the text into BBEdit, it’s time for a little text massaging (not very relaxing, but fun nonetheless). Look carefully at the structure of the text. Generally you’ll have lines of info separated by one or two carriage returns. Each line will have its own structure, with each piece of data followed by others in a repetitive pattern. See if there is any way to identify the different kinds of data.
For example, in the rental text, there was the name of the Area, the Price, then a long Description, ending (usually) with a phone number. There were just spaces between them, so I had to see what else was unique to each. The price had a dollar sign in front of it, then numbers. The telephone numbers all had the same area code.
OK, that’ll do. Time to break out the Grep (Global Regular Expression Pattern) tool. This is a system of text search and replacement developed on Unix. BBEdit Lite supports it, as does Word (with slightly different syntax). It’s very useful, even at the beginners’ level (my level). There is a short primer in the BBEdit documentation. Feel free to experiment. There’s also an Undo command. You’ll need it.
In BBEdit, you just click Use Grep in the Find dialog box, and any other choices, such as Start at Beginning, and fill in your “pattern.” In this case my Find was for “\$#+” (the “\” backslash allows a literal find for “$”, which is otherwise a special character, “#+” means a number followed by other numbers-but not text). In the Replace box I put “\t&\t” (“\t” is a tab, “&” is whatever the previous match was-the price in this case, “\t” is another tab).
With another Grep operation I got the phone number out. I just found the area code, then added a tab in front of it. If there were different area codes you could just identify the structure of the phone number; such as ###-###-#### or \(###\) ###-#### (using backslash once again to literally find the special character “(“). (BTW, you don’t use the quotes in the dialog box; it’s just hard to write without them.)
The basic structure of most Mac spreadsheets or databases is:
Row (SS) or Record (DB): data tab data tab data
Carriage Return
Next Row or Record: data tab data tab data
(each data is a SS column or DB Field)
I now had:
Area tab Price tab Description tab Phone
Return
Area tab Price tab Description tab Phone
To create a database out of this is easy. You can either create the above Fields (Area, Price, Desc., Phone) first in the DB (including ClarisWorks), and use the Import dialog (File Menu), or just Open the text file with FileMaker, choosing Tab-separated text files in the dialog.
In the latter case, FM will create generic fields (f1, f2, f3, etc.) to hold the data. If you had the names of the data (columns) in the first line (row) of your text file, they will appear in the fields of the first record. Just go to the Define Fields dialog (File Menu), select f1, change to “Area,” etc.. Click Save for each field, then Done at the end.
You can also open or import an existing spreadsheet, saved in one of the common formats, such as SYLK.
Back to the text for a moment:
Your original text data may be in short blocks, with one return after each piece of data, and double-returns after each block, like this:
Name
Address
City, State Zip
Then you need to use slightly different replacements, actually easier. You just change all double-returns to some weird characters, not found in the text.
Find: “\r\r” Replace: “XXXX”
Then change all remaining single returns to tabs. Find: “\r” Replace: “\t”
Change comma-space to tabs.
Find State and change to State tab.
Finally, change XXXX back to returns.
If there are many states you could use a fancier Grep on the Zip.
Find: “#####$” (means 5 numbers at the end of a line). Replace: “\t&”
Now you have the proper structure:
Name tab Address tab City tab State tab Zip
RETURN
One more way-cool Grep I just learned; [^ ]+ means literally “not space,” for runs of characters (like words). Check it out. It’s like a funny foreign language for computer text.
Oh, I almost forgot the easiest Grep of all. If you’ve downloaded a table, with figures or words separated by two or more spaces, then just replace double-spaces with a tab. Replace extra tabs, “\t+”, with a single one. If needed, strip tabs off the ends of lines by replacing “\t\r” with “\r”.
All right, enough Grep. The big question is, “Why go to so much damn trouble?”
It’s true that it’s more difficult to create a database than to just read the text. But then it’s so much easier to put to some use. One or more fields can be used to create a found set. Examples would be; find all rentals in one area, or all rentals under a set price.
Once the database is created, you can easily add new information by importing it as new records. You can use the Import dialog to line up the new data with your existing fields, even if it’s in a different order.
You can also add whatever additional fields you want, not to mention extra layouts, scripts, etc., to quickly do what you want. My little rental database has another detailed layout, with only one rental on the screen. I’ve added fields for the agent’s name, date called, when I should call back, when I have an appointment, storage space, priority, comments, etc..
Another common use of these simple databases is to create address lists for merge mailing. I’ve made another database that extends this idea further, tracking correspondence and responses. FileMaker is quite capable of creating a decent business letter all on its own, eliminating the need for merging to a word-processor. It has a tab ruler, and the ability to add graphics.
In fact, in the information management department, there’s little that FileMaker can’t do. If you need any of that, it’s the way to go. Other database applications are even harder to learn and use. So keep on file’n and I’ll see you next month, with tips on Layout.
If you know more about Grep (or any of this), please send me info or Web addresses. Questions are always welcome, though, as I’ve said, advanced beginner is all I am.
Fenton Jones
Fenton Jones is a FileMaker database designer and consultant, based in San Diego, CA. FileMaker is a cross-platform rapid-development tool for affordable relational databases. If you have need of a FileMaker Pro expert, please be sure to visit his home page at http://www.fentonjones.com
Leave a Reply
You must be logged in to post a comment.