Geek post: NSKeyedArchiver files – what are they, and how can I use them?

If you have spent any time investigating iOS or OS X devices you will probably have come across files (usually property lists) making reference to NSKeyedArchiver. Certainly, in my experience working with iOS, these files can often contain really interesting data (chat data for example) but the data can appear at first glance unstructured and difficult to work with.

In this blog post I aim to explain where these files come from, how they are structured and how you can get the most out of them.

Remember, remember…

NSKeyedArchiver is a class in the Mac programming API which is designed to serialise data. Data serialisation is the process through which data belonging to a program currently held in memory is written to a file on disk, so that it may be reloaded into memory at some point in the future.

This process can be achieved in a number of ways depending on the requirements of the programmer; however, NSKeyedArchiver provides a convenient way for entire “objects” in memory to be serialised, so it is a widely-used method.

Let’s take a moment to consider what is meant by an “object” in terms of programming (don’t worry; I’m not going to get too programmery). Many modern programming languages allow for (or are entirely based upon) the Object Oriented Programming paradigm. Put very generally this means that they are based around data structures (objects) which contain both data fields and built-in functionality (usually known as “methods”).

So let’s imagine that we were writing the code to define a “Person” object: we might define data fields such as: “Name”; “Age”; “Height”; and “Weight” – but we might also want to give it functionality. For example: “Speak” and “Wave”.

Obviously, a “Car” object would be different from a “Person” – it would have data fields like: “Make”; “Model”; and “Fuel-Type”. It might also have a data field for “Driver” which would hold a reference to a “Person” object.

A “Road” object might have data fields for: “Name”; “Length”; and “Speed-Limit” along with a data field containing a collection of “Car” objects (each having a reference to a “Person” object in their “Driver” data field).
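As a rough illustration only, the three object types described above might be sketched in Python like this. The class and field names mirror the examples in the text; the units, the sample values and the wording returned by the methods are made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    age: int
    height_cm: float   # unit is an assumption for the sketch
    weight_kg: float   # unit is an assumption for the sketch

    def speak(self) -> str:
        # A "method": built-in functionality that uses the object's own data.
        return f"Hello, my name is {self.name}"

    def wave(self) -> str:
        return f"{self.name} waves"


@dataclass
class Car:
    make: str
    model: str
    fuel_type: str
    driver: Optional[Person] = None  # a reference to a Person object


@dataclass
class Road:
    name: str
    length_km: float
    speed_limit: int
    cars: List[Car] = field(default_factory=list)  # a collection of Car objects


# Sample values are invented purely to show the references between objects.
alex = Person("Alex", 30, 180.0, 75.0)
road = Road("High Street", 1.2, 30, [Car("Ford", "Fiesta", "Petrol", driver=alex)])
print(road.cars[0].driver.speak())
```

Note that the “Road” does not hold a copy of the “Person”; it holds a “Car” which holds a reference to the same “Person” object.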

Similar (and often far more complicated) data structures might be represented in a chat application: a “Message-List” object containing a collection of “Message” objects containing fields for “Sent-Time”, “Message-Text” and “Sender”, which itself contains a reference to a “Contact” object which contains fields for “Nickname”, “Email-Address” and so on.

It’s these kinds of data structures that NSKeyedArchiver converts from objects in memory and stores as a file which can subsequently be loaded back into the memory to rebuild the structure.

So what must NSKeyedArchiver store in order to achieve this? Well, there are two requirements: it has to store details of the type of object it’s serialising; and it has to store the data held in the objects (and the correct structure of the data).

NSKeyedArchiver property lists

NSKeyedArchiver serialises the object data into a binary property list file, the basic layout of which is always the same:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>$archiver</key>
        <string>NSKeyedArchiver</string>
        <key>$objects</key>
        <array>
            <null/>
            <string>Alex</string>
            <dict>
                <key>Name</key>
                <dict>
                    <key>CF$UID</key>
                    <integer>1</integer>
                </dict>
            </dict>
        </array>
        <key>$top</key>
        <dict>
            <key>root</key>
            <dict>
                <key>CF$UID</key>
                <integer>2</integer>
            </dict>
        </dict>
        <key>$version</key>
        <integer>100000</integer>
    </dict>
</plist>

Listing 1: Overview of an NSKeyedArchiver XML property list file.

In Listing 1 we have an example of an NSKeyedArchiver property list (converted to XML for ease of reading).  At the top level, every NSKeyedArchiver file is made up of a dictionary with four keys.

Two of the keys simply provide metadata about the file. The “$archiver” key should always be followed by a string giving the name of the archiver used to create the file, which should obviously always be “NSKeyedArchiver”. The “$version” key should be followed by an integer giving the version of the archiver (100000 appears to be the only valid value).

The other two keys (“$objects” and “$top”) contain the data that has been serialised and its structure.
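If you want to check for these four keys programmatically, a minimal sketch using Python's standard plistlib module might look like the following. The function name is my own invention, and the XML is a cut-down copy of Listing 1 with the DOCTYPE left out. One assumption to flag: the placeholder at index 0 is written here as the string “$null” (the form you can see for the “image” field in Listing 6) rather than as a null element, because plistlib has no null element for XML property lists.

```python
import plistlib

# Cut-down copy of Listing 1. Index 0 is written as the string "$null"
# (an assumption, see the note above).
LISTING_1 = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>$archiver</key>
    <string>NSKeyedArchiver</string>
    <key>$objects</key>
    <array>
        <string>$null</string>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <dict>
                <key>CF$UID</key>
                <integer>1</integer>
            </dict>
        </dict>
    </array>
    <key>$top</key>
    <dict>
        <key>root</key>
        <dict>
            <key>CF$UID</key>
            <integer>2</integer>
        </dict>
    </dict>
    <key>$version</key>
    <integer>100000</integer>
</dict>
</plist>
"""

EXPECTED_KEYS = {"$archiver", "$objects", "$top", "$version"}


def looks_like_keyed_archive(plist) -> bool:
    """Return True if the parsed plist has the four top-level keys described above."""
    return (
        isinstance(plist, dict)
        and EXPECTED_KEYS.issubset(plist)
        and plist["$archiver"] == "NSKeyedArchiver"
    )


archive = plistlib.loads(LISTING_1)
print(looks_like_keyed_archive(archive))
print(archive["$version"], len(archive["$objects"]))
```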

The “$objects” key is followed by an array containing all the pieces of data involved in the serialisation, but stored flat with little or no structure. Each of these pieces of data can be understood as being enumerated starting from zero. Within this array of objects may be data which contains the structure shown in Listing 2:
<dict>
        <key>CF$UID</key>
        <integer>0</integer>
</dict>

Listing 2: an example of the CF$UID data type.

The CF$UID data type in Listing 2 is a dictionary with a single key (“CF$UID”) which is followed by an integer number (this layout is what you will see when the property list is represented in XML; in the raw binary format the “UID” data type is a separate entity which doesn’t require the dictionary structure).
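To see the separate “UID” type in the binary format for yourself, here is a small sketch using Python's standard plistlib module, which (from Python 3.8 onwards) has a UID wrapper class. The dictionary being written is invented for the example.

```python
import plistlib

# A tiny, made-up structure containing a single UID with the value 2.
data = {"root": plistlib.UID(2)}

# Write it out as a binary property list and read it straight back in.
binary = plistlib.dumps(data, fmt=plistlib.FMT_BINARY)
loaded = plistlib.loads(binary)

# In the binary format the UID comes back as its own type rather than
# as a dictionary with a "CF$UID" key.
uid = loaded["root"]
print(type(uid).__name__, uid.data)
```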

These data types represent a reference to another entity in the “$objects” array. The number held in the CF$UID gives the position in the array of the entity being referenced. Consider the snippet shown in Listing 3:
    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <dict>
                <key>CF$UID</key>
                <integer>1</integer>
            </dict>
        </dict>
    </array>

Listing 3: an example of a “$objects” array.

Listing 3 shows an “$objects” array containing three pieces of data. Indexing them starting from 0 we have:

  0. A null
  1. A string containing the value “Alex”
  2. A dictionary

The dictionary at index 2 contains a single key: “Name”. The following value is a “CF$UID” data type referencing index 1 in the “$objects” array so we could consider the data to be equivalent to Listing 4:
    <key>$objects</key>
    <array>
        <null/>
        <string>Alex</string>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </array>

Listing 4: the “$objects” array from Listing 3 “unpacked”.
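The single step from Listing 3 to Listing 4 can be sketched in Python as follows. The “$objects” array is written out by hand as Python values (with None standing in for the null), and the helper name is hypothetical.

```python
def uid_index(value):
    """Return the index a CF$UID reference points at, or None if `value` is not a reference."""
    if isinstance(value, dict) and set(value) == {"CF$UID"}:
        return value["CF$UID"]
    return None


# The "$objects" array from Listing 3 as Python values.
objects = [None, "Alex", {"Name": {"CF$UID": 1}}]

record = objects[2]
unpacked = {}
for key, value in record.items():
    index = uid_index(value)
    # Follow the reference if there is one, otherwise keep the value as it is.
    unpacked[key] = objects[index] if index is not None else value

print(unpacked)  # {'Name': 'Alex'}
```

The check is against None rather than a simple truth test because 0 is a perfectly valid index.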

This example is very simplistic; in reality the structure revealed by unpacking the object array can be extremely deeply nested with objects containing references to objects containing references to objects…and so on.

The observant among you may be thinking “this seems like a very inefficient way to represent the data”, and for this example you’d certainly be right! However, in most real-life cases the complex data held in these files contains many repeating values which, when arranged this way, only have to be stored once but can be referenced in the “$objects” array multiple times.

The “$top” key is our entry point to the data, so it is the data held at this key that represents the total structure of the object that has been serialised. This key will be followed by a single dictionary which itself contains a single key: “root”. The “root” key will be followed by a single CF$UID data type which will be a reference to the top-level object in the “$objects” array.

Returning to the example in Listing 1 the “root” is referencing the object at index 2 in the objects array. So expanding this, our complete data structure is shown in Listing 5:
    <key>$top</key>
    <dict>
        <key>root</key>
        <dict>
            <key>Name</key>
            <string>Alex</string>
        </dict>
    </dict>

Listing 5: Expanded “$top” object, showing complete data structure.
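Putting the pieces together, the expansion from Listing 1 to Listing 5 might be sketched as a small recursive function. This is only my illustration of the idea, not the code used by any particular tool: the function name is made up, the archive is typed in by hand as Python values, and raising an error on a circular reference is a simplification chosen to keep the sketch short.

```python
def unpack(value, objects, _seen=()):
    """Recursively replace CF$UID references with the objects they point at."""
    if isinstance(value, dict):
        if set(value) == {"CF$UID"}:
            index = value["CF$UID"]
            # _seen holds the indexes on the current path only, so an object
            # referenced from several places is fine but a loop is caught.
            if index in _seen:
                raise ValueError(f"circular reference to $objects[{index}]")
            return unpack(objects[index], objects, _seen + (index,))
        return {key: unpack(item, objects, _seen) for key, item in value.items()}
    if isinstance(value, list):
        return [unpack(item, objects, _seen) for item in value]
    return value


# Listing 1 as Python values (None stands in for the null at index 0).
archive = {
    "$archiver": "NSKeyedArchiver",
    "$objects": [None, "Alex", {"Name": {"CF$UID": 1}}],
    "$top": {"root": {"CF$UID": 2}},
    "$version": 100000,
}

top = unpack(archive["$top"], archive["$objects"])
print(top)  # {'root': {'Name': 'Alex'}}
```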

A sense of identity

So far we have only seen examples of basic data stored in this structure, where the type of data is implicit, but in most files you are likely to encounter you will see additional data relating to the type of the objects being stored.

Listing 6 shows an unpacked “$top” object from a “CHATS2.plist” file produced by the iOS application “PingChat”:
    <key>$top</key>
    <dict>
    <key>root</key>
    <dict>
        <key>$class</key>
        <dict>
            <key>$classes</key>
            <array>
                <string>NSMutableDictionary</string>
                <string>NSDictionary</string>
                <string>NSObject</string>
            </array>
            <key>$classname</key>
            <string>NSMutableDictionary</string>
        </dict>
        <key>NS.keys</key>
        <array>
            <string>pingchat</string>
        </array>
        <key>NS.objects</key>
        <array>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSMutableArray</string>
                        <string>NSArray</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSMutableArray</string>
                </dict>
                <key>NS.objects</key>
                <array>
                    <dict>
                        <key>$class</key>
                        <dict>
                            <key>$classes</key>
                            <array>
                                <string>BubbleItem</string>
                                <string>NSObject</string>
                            </array>
                            <key>$classname</key>
                            <string>BubbleItem</string>
                        </dict>
                        <key>state</key>
                        <integer>1</integer>
                        <key>image</key>
                        <string>$null</string>
                        <key>msg</key>
                        <string>Yo</string>
                        <key>author</key>
                        <string>testingtesting</string>
                        <key>time</key>
                        <dict>
                            <key>$class</key>
                            <dict>
                                <key>$classes</key>
                                <array>
                                    <string>NSDate</string>
                                    <string>NSObject</string>
                                </array>
                                <key>$classname</key>
                                <string>NSDate</string>
                            </dict>
                            <key>NS.time</key>
                            <real>307828812.649871</real>
                        </dict>
                    </dict>
                </array>
            </dict>
        </array>
    </dict>
    </dict>

Listing 6: Expanded “$top” object taken from a PingChat “CHATS2.plist” file.

In Listing 6 we can begin to see how complex the serialised data can become (and this is a simpler example). However, if you keep your cool and realise that the data is still well-structured it is possible to parse the data into something more meaningful.

One new data structure we encounter for the first time here is the “$class” structure. “$class” isn’t part of the data itself, but rather information about which type of object has been serialised. This information is obviously important when the program that serialised the data comes to deserialise it, but we can also use it to give us clues about the meaning of the data; consider the snippet in Listing 7:
<dict>
    <key>$class</key>
    <dict>
        <key>$classes</key>
        <array>
            <string>NSMutableArray</string>
            <string>NSArray</string>
            <string>NSObject</string>
        </array>
        <key>$classname</key>
        <string>NSMutableArray</string>
    </dict>
    <key>NS.objects</key>
    <array>
        <dict>
            <key>$class</key>
            <dict>
                <key>$classes</key>
                <array>
                    <string>BubbleItem</string>
                    <string>NSObject</string>
                </array>
                <key>$classname</key>
                <string>BubbleItem</string>
            </dict>
            <key>state</key>
            <integer>1</integer>
            <key>image</key>
            <string>$null</string>
            <key>msg</key>
            <string>Yo</string>
            <key>author</key>
            <string>testingtesting</string>
            <key>time</key>
            <dict>
                <key>$class</key>
                <dict>
                    <key>$classes</key>
                    <array>
                        <string>NSDate</string>
                        <string>NSObject</string>
                    </array>
                    <key>$classname</key>
                    <string>NSDate</string>
                </dict>
                <key>NS.time</key>
                <real>307828812.649871</real>
            </dict>
        </dict>
    </array>
</dict>

Listing 7: Snippet of a single object in a PingChat “CHATS2.plist” file.

Let’s take a look at the objects involved here and what the “$class” sections can tell us about the data held. The “$class” structure takes the form of a dictionary containing two keys. The “$classname” section is fairly straightforward; it simply gives us the name of the type of object we’re dealing with.

So in the case of the first “$class” structure encountered, we find that the object is of type “NSMutableArray”, which a quick Google search tells us is a “Modifiable array of objects” – so the data held in this object is going to take the form of an array or list.

The other key in the “$class” structure is “$classes”; this is a little more subtle and requires a little more explanation of one of the key concepts in most object-oriented programming languages: inheritance.

Think back to the explanation of the “Person” object. A “Person” object had the fields: “Name”; “Age”; “Height”; and “Weight” and the functionality to “Speak” and “Wave”. Now imagine that we wanted to create a new type of object: “DigitalForensicAnalyst”. We would want this new type of object to have some specialised functionality: “Image”; “Analyse”, and so on.

However, a “DigitalForensicAnalyst” is a “Person” too – they have a name, they can speak and wave. Now, programmers are stereotyped as being lazy (because they are) so it is unlikely that after spending all that time writing and debugging the code to represent a “Person” that they are going to duplicate all that hard work when it comes to creating a “DigitalForensicAnalyst”.

Instead they would have the “DigitalForensicAnalyst” object inherit the functionality from “Person”; this means that only the new functionality of “Image” and “Analyse” need be created from scratch, all of the other functionality comes free thanks to this inheritance.
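In Python terms, a bare-bones sketch of that inheritance might look like this (the method bodies are placeholders invented for the example):

```python
class Person:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} speaks"

    def wave(self):
        return f"{self.name} waves"


class DigitalForensicAnalyst(Person):
    # Only the new functionality is written here; speak() and wave()
    # come for free from Person.
    def image(self, device):
        return f"{self.name} images {device}"

    def analyse(self, device):
        return f"{self.name} analyses {device}"


analyst = DigitalForensicAnalyst("Alex")
print(analyst.wave())

# The chain of types for this object, most specific first.
print([cls.__name__ for cls in type(analyst).__mro__])
```

The last line prints the object's own type followed by everything it inherits from, which is worth keeping in mind for what comes next.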

Coming back to the “$classes” key, this will be followed by an array containing the name of the object’s own type followed by the names of all of the types that it inherits from. So in the case of our first “$class” structure we can see that the “NSMutableArray” inherits functionality from both “NSArray” and “NSObject”, which may give us further hints about what data might be held.

So, this “NSMutableArray” is going to contain a collection of other objects, and looking at the rest of this object’s structure we find a key “NS.objects” which is followed by an array containing just that collection. This array has only one item, another object containing a “$class” definition, so let’s take a look. This time our “$classname” is particularly useful: “BubbleItem” – making reference to the “speech bubbles” displayed on screen; and indeed we find message details (author, message text, timestamp) in the object’s data fields.
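Once the structure has been unpacked, pulling the message details out of a “BubbleItem” is straightforward. The sketch below works on the object from Listing 7 typed in as Python values and trimmed to the fields it uses; the helper names are my own. One fact not covered above: NSDate values are stored as a number of seconds counted from 1 January 2001 00:00:00 UTC, which is what the conversion relies on.

```python
from datetime import datetime, timedelta, timezone

# NSDate values count seconds from 1 January 2001 00:00:00 UTC.
NSDATE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def nsdate_to_datetime(ns_time):
    """Convert an "NS.time" value into a timezone-aware datetime."""
    return NSDATE_EPOCH + timedelta(seconds=ns_time)


def class_name(obj):
    """Return the "$classname" of an unpacked object, or None if it has no "$class"."""
    return obj.get("$class", {}).get("$classname")


# The unpacked BubbleItem from Listing 7, trimmed to the fields used below.
bubble = {
    "$class": {"$classes": ["BubbleItem", "NSObject"], "$classname": "BubbleItem"},
    "msg": "Yo",
    "author": "testingtesting",
    "time": {
        "$class": {"$classes": ["NSDate", "NSObject"], "$classname": "NSDate"},
        "NS.time": 307828812.649871,
    },
}

# Use the "$classname" as a guard so we only read fields we expect to exist.
if class_name(bubble) == "BubbleItem":
    sent = nsdate_to_datetime(bubble["time"]["NS.time"])
    print(bubble["author"], bubble["msg"], sent.isoformat())
```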

There’s got to be an easier way…

NSKeyedArchiver files can contain really key evidence but even a data-sadist like me is going to lose their mind converting all of these files into a usable format by hand; so how can we speed things up?

Well, our PIP tool has an “Unpack NSKeyedArchiver” feature which reveals the object structure so that you can use XPaths to parse the file (either by using one of the many already in the included XPath library or by writing your own).

Also, if you are a Python fan (if not, why not?) I have updated our ccl_bplist Python module, which you can get here, with a “deserialise_NsKeyedArchiver” function to unpack the “$top” object.

I hope you found this blog post useful. As always, if you have any questions or suggestions, leave a comment below or contact the R&D team at research@ccl-forensics.com.

Alex Caithness, Lazy Data-Sadist, CCL-Forensics

Special thanks to Arun Prasannan for assisting the BlogKeeper by rendering the code readable.