Zip

A function to create a Zip archive from one or more containers and/or names of container fields; for the latter it collects the data from the found set or the related table.

Originally I planned to write a simpler example that merely zips the passed containers, but then realized that it would be rather impractical. It is good to be able to accept an arbitrary number of parameters, but it cannot be the only way to select files to zip. In FileMaker in most situations the files you want to zip will be stored in a related table or in a found set. Certainly, it would be very inconvenient to zip them if the only thing you can do is to pass them all one by one to a function.

Zip archives support appending files, so one could write a function that takes an existing archive (or none) and appends a file. This way the developer could run the function in a loop:

  • Start with an empty archive and add the first file; get back the archive with a single file.

  • Use the archive as a starting point and append the second file; again, get back the updated archive with two files.

  • Continue in this manner until all files are appended.

Certainly, this is both tedious for the FileMaker developer and inefficient.
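To make the cost concrete, here is a minimal sketch of that one-file-per-call approach in plain Python. It is not part of the sample: the append_file() helper and the file names are hypothetical, and it uses io.BytesIO so it runs without FileMaker (the tutorial's own code uses cStringIO).

```python
import io
import zipfile

def append_file(archive, name, data):
    """Return the archive bytes with one more file appended.

    `archive` holds the bytes of an existing Zip archive, or b'' to
    start a new one. Hypothetical helper, for illustration only.
    """
    buf = io.BytesIO(archive)
    # Mode 'a' appends to an existing archive; if the buffer does not
    # contain an archive yet, zipfile starts a new one.
    with zipfile.ZipFile(buf, 'a', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()

# The caller has to drive the loop and pass the archive back every time.
archive = b''
for name, data in [('a.txt', b'first'), ('b.txt', b'second'),
                   ('c.txt', b'third')]:
    archive = append_file(archive, name, data)

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
```

Every call reopens the archive and rewrites its table of contents, and the whole archive travels back and forth between the caller and the function on each step.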

Fortunately, the FileMaker plug-in API has the evaluate() function, which lets us call back into FileMaker to evaluate any expression. This way I can accept a field name and use it to fetch the data myself.


This example assumes you understand Python basics and focuses mostly on general logic and specific FileMaker features. The example also provides links to more detailed documentation, where applicable.

Sample code

The full code of the function has been stored in the PyFM Tools file at the onegasoft/zip.py subpath. Note that you'll need to install onegasoft/__init__.py as well, otherwise Python won't be able to recognize the onegasoft module.

Function

The function is meant to be called like this:

PyRun( "onegasoft.zip"; "zip"; name; data {; data; ... } )

Parameters

"onegasoft.zip" (Text)

Module name (as implemented in PyFM Tools).

"zip" (Text)

Function name.

name (Text)

Name of the archive. Should be a valid file name, although the function does not check this.

data (Container, or Text holding a field name)

Data to archive. Each data argument must be either a container or a name of a container field. For the latter the function will collect the data on its own; for example, if this is a name of a field from a related table, the function will get data from all related records.

At this moment all containers must store files; the function does not support containers that store only a reference to a file.

Result and side effects

The function returns a container with a Zip archive that stores data from all the passed containers. The function tries to use the stored file name, but if it is not there, it generates provisional names like ‘File 2’ (second parameter) or ‘File 3.1’ (third parameter, data from the first record). For containers that have more than one data stream (see more about this in Code below), the function stores all the streams.

The function may raise the following errors:

  • TypeError if the name is not text, if there's no data, or if any of the data parameters is not a container or does not resolve to a container.

  • ValueError if at least one of the containers stores only a reference to a file.

  • filemaker.MissingFieldError if the passed field name is not available.

The function has no side effects.

Code

I split the code into two parts: the first function parses the parameters, prepares a Zip file and then loops over the containers and asks the second function to add them to the Zip file.

Parsing parameters

def zip(name, *items):
    """Zip the passed items into a single file with the given name."""
    if not isinstance(name, filemaker.Text):
        raise TypeError("The 'name' parameter must be Text, not %s." %
                type(name).__name__)
    if not items:
        raise TypeError("Expected at least one 'data' parameter.")
    # Build the archive in memory instead of writing it to disk.
    vfile = cStringIO.StringIO()
    zfile = zipfile.ZipFile(vfile, 'w', zipfile.ZIP_DEFLATED)
    for i, item in enumerate(items, start=1):
        if isinstance(item, filemaker.Container):
            store(zfile, item, 'File %d' % i)
        elif isinstance(item, filemaker.Text):
            field = unicode(item)
            j = 1
            # Fetch the field from records 1, 2, ... until FileMaker
            # reports that the record number is out of range.
            while True:
                try:
                    container = filemaker.evaluate('GetNthRecord( '
                            'GetField( "%s" ); %d)' % (field, j))
                    if not isinstance(container, filemaker.Container):
                        raise TypeError("The passed parameters must "
                                "eventually resolve to Containers, not "
                                "%s." % type(container).__name__)
                    store(zfile, container, 'File %d.%d' % (i, j))
                    j += 1
                except filemaker.MissingFieldError:
                    # Re-raise with a message that names the field.
                    raise filemaker.MissingFieldError("The field '%s' is "
                              "missing or is not available in the current "
                              "context." % field)
                except filemaker.OutOfRangeError:
                    break
        else:
            raise TypeError("The 'data' parameters must be Containers or "
                    "field names, not %s." % type(item).__name__)
    # Closing the archive writes its table of contents.
    zfile.close()
    return filemaker.Container(FNAM=name, FILE=vfile.getvalue())

As you can see, the Python function signature is fairly different from the FileMaker signature described above. The function name matches, of course, but the parameters may have any names.

In this case the Python signature doesn't even imply that at least one data parameter is required; I just check it later myself. That is, I could've written the function to match:

def zip(name, item, *items):
    ...

and then add the first item back to items:

...
items = (item,) + items  # items is a tuple, so it has no insert()
for item in items:
   ...

But I decided it's simpler for me to have them in one place from the beginning, and that it would be more educational to highlight such differences :)

Let's go through it step by step:

  1. I start by checking the type of the name parameter: if it is not Text, I raise an error.

  2. I then check if I got at least one data item; if not, I also raise an error.

  3. Once I know I got a name and at least one item, I can start zipping them. I won't write the file to disk, so I use cStringIO to create a virtual file-like StringIO object and then zipfile to create a ZipFile instance on top of this virtual file.

  4. Now I can loop through passed parameters. I need to track parameter numbers to make generic names for files that don't have file name streams, so I use the enumerate() function.

  5. For each parameter I check whether this is a Container or Text. If neither (look at the else at the end), I raise an error.

  6. If the parameter is a Container, I just store() it right away. (I describe the store() function in the next section.) The third parameter of store() is the file name stem to use in case the container has no embedded file name.

  7. If the parameter is Text, I assume it is a field name and try to get its contents from the whole found set or the related table. To do this I use the evaluate() method, which can evaluate any FileMaker expression. I use the GetField() function to get a field by name and GetNthRecord() to get this field's data from the specified record (which can come from this or a related table).

    I do this in a never-ending loop (while True) for record numbers starting with 1 until I get the filemaker.OutOfRangeError. It turns out FileMaker actually raises this error when you use GetNthRecord() with an invalid N.

    By the way, I discovered this by experimenting with evaluate() in the console in the PyFM Tools file.

    What's good here is that this code will work unchanged for fields in the same table or related table; it only depends on the field name.

    If evaluate() didn't raise an error, I see what I've got. If it's indeed a Container, I also store() it; this time I use a modified file name stem with a two-number index.

  8. If I get the OutOfRangeError, it's OK; it just means that there are no more records to fetch data from, and I simply break out of the loop. But I may get other errors; for example, the field may not be there, in which case FileMaker will raise the MissingFieldError.

    Technically I don't need to handle it. If I don't, the code will simply abort and raise the error up to the top. But a generic FileMaker error is not especially helpful: all it tells you is that a field is missing, not which one. My code, however, knows exactly which field it asked for, so I intercept this error and re-raise it with my own message that includes the field name.

  9. When I'm out of parameters, I close the zip file (this is necessary, because in the end the zip file writes a table of contents) and return a new Container with the specified name and the zip file data.
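The core of steps 3 and 7 to 9 can be tried outside FileMaker. The following is only a sketch under stated assumptions: the OutOfRangeError class, the RECORDS list and get_nth_record() are stand-ins for filemaker.OutOfRangeError, a related table and the evaluate() call, and io.BytesIO replaces cStringIO so the sketch runs on plain Python.

```python
import io
import zipfile

class OutOfRangeError(Exception):
    """Stand-in for the error raised for an invalid record number."""

# Stand-in for a related table: one file's data per record.
RECORDS = [b'first record', b'second record', b'third record']

def get_nth_record(n):
    """Stand-in for evaluating GetNthRecord(); numbering starts at 1."""
    if not 1 <= n <= len(RECORDS):
        raise OutOfRangeError(n)
    return RECORDS[n - 1]

def zip_records(param_index):
    """Zip every record's data under provisional 'File i.j' names."""
    vfile = io.BytesIO()                 # in-memory file, nothing on disk
    zfile = zipfile.ZipFile(vfile, 'w', zipfile.ZIP_DEFLATED)
    j = 1
    while True:
        try:
            data = get_nth_record(j)
        except OutOfRangeError:
            break                        # no more records to fetch
        zfile.writestr('File %d.%d' % (param_index, j), data)
        j += 1
    zfile.close()                        # writes the table of contents
    return vfile.getvalue()

archive = zip_records(3)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
```

The loop needs no record count up front; the exception is the only stop signal, which is why the same code works for a found set and for a related table.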

Zipping containers

Now let's see what it takes to properly zip a FileMaker container. It's not as trivial as it seems.

A FileMaker container consists of multiple data streams. (See more about this in Container documentation.) For example, one stream stores file data and another stores file name. Images often have up to four streams: file name, image dimensions, original image data, plus the same data in JPEG or GIF formats if the original image format is not already JPEG or GIF (e.g. if you insert a BMP image, FileMaker will add a stream with a JPEG version).

Streams are identified by type, which is a four-character string. A container may only have one stream of each type. I don't know the full list of types, but some typical ones are:

  • FILE: file data.

  • JPEG: image data, JPEG.

  • PNGf: image data, PNG.

  • PDF : PDF (note the trailing space).

  • FNAM: file name.

  • SIZE: image width and height.

All streams store raw bytes, except two: FNAM stores a string and SIZE stores two integers. A container is not required to have all the streams; for example, pasted images won't have the file name stream.
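One way to picture this is a plain dictionary keyed by stream type. The sketch below is only a mental model, not the filemaker.Container class itself; the contents and the data_streams() helper are made up for illustration.

```python
# Hypothetical contents of a container that holds an inserted BMP image.
container = {
    'FNAM': 'file:My Bitmap.bmp',   # file name, stored as a reference
    'SIZE': (640, 480),             # image width and height
    'BMPf': b'<bmp bytes>',         # original image data
    'JPEG': b'<jpeg bytes>',        # companion JPEG version
}

def data_streams(container):
    """Return (type, data) pairs for the streams that hold file data."""
    return sorted((stream, data) for stream, data in container.items()
                  if stream not in ('FNAM', 'SIZE'))

types = [stream for stream, data in data_streams(container)]
```

Here types ends up as ['BMPf', 'JPEG']: everything except the file name and the size is data that could go into an archive.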

The stream type determines how FileMaker deals with the container. For example, if you put the data of a JPEG file into the JPEG stream and provide a valid SIZE, it will show up as an image; but if you put the same data into FILE, it will show up as a file, and FileMaker will try to show the file name from the FNAM stream instead.

To zip a container we need to decide two things:

  • How to deal with file name, if there is none?

  • How to deal with multiple streams?

Here's what I came up with for this sample:

  • If a container has a file name stream and a single data stream, then I store it under this name.

  • If a container has no file name stream or has multiple data streams, I construct names myself. If I have a file name stream, I use it as a base but change the extension; otherwise I expect to receive a hint on what name to use. (In our case the zip() function sends ‘File N’ or ‘File N.N’.)

    Then for each stream I construct the complete file name based on stream type. If the stream type is FILE (i.e. a generic file that can be anything), I'll use the .dat extension.

    For example, if I get a container with the ‘My Bitmap.bmp’ image (remember that BMP containers have a companion JPEG stream), the resulting archive will get ‘My Bitmap.bmp’ and ‘My Bitmap.jpeg’. If I get the same container without a file name and with ‘File 5’ as a hint, it will end up as ‘File 5.bmp’ and ‘File 5.jpeg’. If I get no hint, I'll produce ‘Untitled.bmp’ and ‘Untitled.jpeg’.
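The naming rules alone can be sketched as a small pure function. This is an illustration, not the sample's code: archive_names() is a hypothetical helper, it takes an already parsed file name rather than a container, and EXTENSIONS is a subset of the zip_ext table from the code.

```python
import os.path

# A few stream types and their extensions (subset of zip_ext).
EXTENSIONS = {'BMPf': '.bmp', 'FILE': '.dat', 'JPEG': '.jpeg',
              'PNGf': '.png'}

def archive_names(stream_types, name=None, stem='Untitled'):
    """Return the archive file name to use for each data stream."""
    if name and len(stream_types) == 1:
        return [name]                      # stored name, single stream
    if name:
        stem = os.path.splitext(name)[0]   # keep the base, drop extension
    return [stem + EXTENSIONS[t] for t in stream_types]

single = archive_names(['FILE'], name='Report.pdf')
named = archive_names(['BMPf', 'JPEG'], name='My Bitmap.bmp')
hinted = archive_names(['BMPf', 'JPEG'], stem='File 5')
unnamed = archive_names(['BMPf', 'JPEG'])
```

A single FILE stream with a stored name keeps that name as is; the .dat extension only appears when a generic file has no name or shares the container with other data streams.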

Now let's proceed to writing this companion function:

def store(file, container, stem='Untitled'):
    """Store the passed container in a zip file."""
    if not container:
        # No streams: either truly empty (nothing to store) or a
        # reference, which shows up when the container is read as Text.
        if filemaker.Text(container):
            raise ValueError("The function does not yet support "
                    "containers that store only a reference.")
    else:
        if 'FNAM' in container:
            name = unicode(filemaker.Text(container))
        else:
            name = None
        # Every stream except the file name and image size holds data.
        data_streams = [(stream, data) for stream, data in
                container.items() if stream not in ('FNAM', 'SIZE')]
        if name and len(data_streams) == 1:
            stream, data = data_streams[0]
            file.writestr(name, data)
        else:
            if name:
                stem = os.path.splitext(name)[0]
            for stream, data in data_streams:
                try:
                    ext = zip_ext[stream]
                except KeyError:
                    # Unknown stream type: derive an extension from the
                    # type itself and remember it for next time.
                    ext = '.' + unsafe.sub('', stream.lower())
                    zip_ext[stream] = ext
                file.writestr(stem + ext, data)

# Known stream types and the file extensions to use for them.
zip_ext = {
    'BMPf' : '.bmp' , 'EPS ' : '.eps', 'FILE' : '.dat', 'GIFf' : '.gif',
    'JPEG' : '.jpeg', 'META' : '.emf', 'PDF ' : '.pdf', 'PICT' : '.pct',
    'PNGf' : '.png' , 'snd ' : '.snd',
}
# Matches every character that is not safe to use in an extension.
unsafe = re.compile('[^a-z0-9_]')

Let's see what it does.

  1. The function starts by checking the container parameter. If it is empty, it is either completely empty or stores a reference. I can tell these two cases apart by reading the container as filemaker.Text: if the result is not empty, this is indeed a reference, else it's just an empty container.

    The function does not support references for now, but it wouldn't be right to silently ignore them, so if I spot a reference, I stop right there and raise a ValueError.

  2. If the container is not empty, I try to get the stored file name. To do this I check if the container has a FNAM stream and if yes, read it as Text to get the file name.

    This is a rather convenient way to avoid parsing the actual FNAM stream, which comes out as a FileMaker reference (i.e. not just as abc.txt, but as file:abc.txt or even as a full path). This method doesn't raise an error, though; if there's no file name, it simply returns a question mark (?).

    If FNAM is not there, I set the name to None.

  3. Then I read all data streams. Here I use a Python list comprehension that gets the data of all streams except FNAM and SIZE.

  4. Then I check if I have the name and a single stream. If so, I zip the stream data under the name.

  5. If I don't have a name or if I have multiple streams, I zip them all under generic names. First I calculate the base name: if I do have a name, then the base name is that name without an extension; I use os.path.splitext() to strip it off. If I don't have a name, I use the stem supplied by the caller (‘File N’ or ‘File N.N’ from zip(), or ‘Untitled’ by default).

  6. Finally I loop over data streams. For each stream I try to get the file extension based on stream type. I already have a dictionary of a few common types and corresponding extensions: zip_ext.

    If I meet an unknown stream type, I calculate the extension from the type itself: I convert it to lowercase and remove all characters except letters, numbers, and underscores using a regular expression from the re module. (I precompile it as unsafe to create a RegexObject instance and then call its sub() method.) I then add the extension to the dictionary, so I don't have to recalculate it next time.
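The fallback from step 6 is easy to check in isolation. In this sketch the extension() helper and the 'XYZ ' and 'Ab#1' stream types are hypothetical, and zip_ext is cut down to three entries.

```python
import re

zip_ext = {'FILE': '.dat', 'JPEG': '.jpeg', 'PDF ': '.pdf'}
unsafe = re.compile('[^a-z0-9_]')

def extension(stream):
    """Return a file extension for a stream type, caching new ones."""
    try:
        return zip_ext[stream]
    except KeyError:
        # Lowercase first, then drop everything but a-z, 0-9 and '_'.
        ext = zip_ext[stream] = '.' + unsafe.sub('', stream.lower())
        return ext

known = extension('PDF ')      # found in the table
derived = extension('XYZ ')    # derived; trailing space removed
odd = extension('Ab#1')        # derived; '#' removed
```

After these calls zip_ext also contains 'XYZ ' and 'Ab#1', so a second container with the same stream type takes the fast path through the dictionary.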

Required modules

The code requires the following modules:

import cStringIO, filemaker, os.path, re, zipfile

I already mentioned some of them, but here's a round up: cStringIO gives us a virtual file-like object for the resulting file, filemaker provides FileMaker API, os.path helps to extract the file extension, re replaces unsafe characters if we are to construct a file extension from the raw stream type, and zipfile actually creates the Zip file.

Conclusion

This example is relatively large, but it gives us a rather useful and robust function:

  • It is flexible: it can collect data from a related table, or from a found set, or just receive the containers as they are, or combine all of the above.

    It does not support reference-only containers, but only because parsing a FileMaker reference is not trivial and would double the code size. The good news is that once you have written code to handle FileMaker file references, you can reuse it in all your modules.

  • It more or less intelligently handles missing file names or multiple streams, including previously unknown stream types.

  • It has adequate error handling.

As a tutorial example it shows a few important techniques:

  • Use of multiple parameters. FileMaker functions support a very large number of parameters, so why not use them?

  • Checking parameter type to choose different actions. This is common in Python, but not in FileMaker (so far); some developers even believe FileMaker has no types at all.

  • Use of the FileMaker evaluate() function to get additional data from FileMaker.

  • Use of FileMaker exceptions: note how we use the OutOfRangeError to know when to stop calling the GetNthRecord() function.

  • Use of subroutines: the store() function simply adds a container to a Zip file, so you can easily reuse it in other modules.