The ServerDocument Object Model

The ServerDocument object model enables you to read and write all the deployment information and cached data stored inside a customized document. This section goes through all of the "data" properties and methods in this object model, describing what they do, their purpose, and why they look the way they do. Chapter 20 describes the "deployment" portions of the object model.

Before we begin, note that the ServerDocument object model is what we like to call an "enough rope" object model. Because this object model enables you to modify all the information about the customization, it is quite possible to create documents with inconsistent cached data or nonsensical deployment information. The VSTO runtime engine does attempt to detect malformed customization information and throw the appropriate exceptions; but still, exercise caution when using this object model.

ServerDocument Class Constructors

The ServerDocument class has seven constructors, but five of them are mere "syntactic sugar" for these two:

ServerDocument(byte[] bytes, string fileType)
ServerDocument(string documentPath, bool onClient, FileAccess access)

These correspond to the two primary ServerDocument scenarios: you want to read or edit a document either in memory or on disk. Note that these two scenarios cannot be mixed; if you start off by opening a file on disk, you cannot treat it as an array of bytes in memory, and vice versa.

The in-memory version of the constructor takes a string that indicates the type of the file. Because all you are giving it is the bytes of the file, as opposed to the name of the file, the constructor does not know whether this is an .XLS, .XLT, .DOC, .DOT, or .XML. Pass in one of those strings to indicate what kind of document this is. If you pass in .XML, the document you pass must be in the WordprocessingML (WordML) format supported by Word. ServerDocument cannot read documents saved in the Excel XML format.
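When the bytes come from a file whose name you know, the file type string can be derived from that name. The following is a minimal sketch and not part of the ServerDocument object model: the class and method names are hypothetical, and normalizing the extension to uppercase is an assumption made only to match the spellings used above.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the ServerDocument object model.
public static class FileTypeHelper
{
    // The five file types named in the text above.
    private static readonly string[] Supported =
        { ".XLS", ".XLT", ".DOC", ".DOT", ".XML" };

    // Returns the extension to pass as the fileType argument, or throws
    // if the extension is not one of the five listed types.
    public static string FromFileName(string fileName)
    {
        string extension = Path.GetExtension(fileName).ToUpperInvariant();
        if (Array.IndexOf(Supported, extension) < 0)
        {
            throw new ArgumentException(
                "Unsupported document type: " + extension, "fileName");
        }
        return extension;
    }
}
```

A name such as budget.doc yields ".DOC". The helper cannot tell WordML from Excel XML, so an .XML file still has to be checked by other means.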

The byte array passed in must be an image of a customized document. The ServerDocument object model does not support in-memory manipulation of not-yet-customized documents.

The on-disk version takes the document path, from which it can deduce the file type. The onClient flag indicates whether your code is presently running in a client scenario (such as the document viewer sample above) or a server scenario (such as the customized data island generation example at the beginning of this chapter).

Why does the ServerDocument care whether it is running on a client or a server? Most of the time it does not care. However, there is one important scenario: What if you pass in a document that does not yet have a customization?

In that case, the ServerDocument object attempts to add customization information to the uncustomized document. Adding the customization information requires the ServerDocument class to start up Word or Excel, load the document into the application, and manipulate it using the Office object model. Because doing that is a very bad idea in server scenarios, the ServerDocument throws an exception if given an uncustomized document on the server.

The file access parameter can be FileAccess.Read or FileAccess.ReadWrite. If it is "read-only," attempts to change the document will fail. (Opening an uncustomized document on the client in read-only mode is not a very good idea; the attempt to customize the document will fail.)

The other "in-memory" constructor is provided for convenience; it simply reads the entire stream into a byte array for you:

ServerDocument(Stream stream, string fileType)

Finally, the three remaining "on-disk" constructors act just like the three-argument constructor above with the onClient flag defaulting to false if omitted, and the file access defaulting to ReadWrite if omitted:

ServerDocument(string documentPath, bool onClient)
ServerDocument(string documentPath, FileAccess access)
ServerDocument(string documentPath)

Saving and Closing Documents

The ServerDocument object has two important methods and one property used to shut down a document:

void Save()
byte[] Document { get; }
void Close()

If you opened the ServerDocument object with an on-disk document, the Save method writes the changes you have made to the application manifest, cached data manifest, or data island to disk. If you opened the document using a byte array or stream, the changes are saved into a memory buffer that you can access with the Document property. Note that it is an error to read the Document property if the file was opened on disk.

It is a good programming practice to explicitly close the ServerDocument object when you have finished with it. Large byte arrays and file locks are both potentially expensive resources that will not be reclaimed by the operating system until the object is closed (or, equivalently, disposed by either the garbage collector or an explicit call to IDisposable.Dispose).

Server-side users of ServerDocument are cautioned to be particularly careful when opening on-disk documents for read-write access. It is a bad idea to have multiple writers (or a single writer and one or more readers) trying to access the same file at the same time. The ServerDocument class will do its best in this situation; it will make "shadow copy" backups of the file so that readers can continue to read the file without interference while writers write. However, making shadow copies of large files can prove time-consuming.

If you do find yourself in this situation, consider doing what we did in the first example in this chapter: read the file into memory and edit it there rather than on disk. As long as the on-disk version is only read, it never needs to be shadow-copied, and there is no risk of multiple writers overwriting each other's changes.
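The in-memory pattern can be sketched as follows, using only the constructor, Save, Document, and Close members described above. The class and method names are hypothetical, the namespace in the using directive is an assumption about where ServerDocument lives, and the code needs a reference to the VSTO runtime assembly to compile.

```csharp
using System.IO;
// Assumed namespace for ServerDocument; adjust to match your VSTO version.
using Microsoft.VisualStudio.Tools.Applications.Runtime;

// Hypothetical helper illustrating the in-memory editing pattern.
public static class InMemoryEdit
{
    // Reads a customized document from disk, edits a copy in memory,
    // and returns the edited bytes. The file on disk is only ever read.
    public static byte[] EditCopy(string path, string fileType)
    {
        byte[] original = File.ReadAllBytes(path);
        ServerDocument sd = new ServerDocument(original, fileType);
        try
        {
            // ... modify the cached data or manifests here ...

            sd.Save();          // writes the changes into the memory buffer
            return sd.Document; // the updated image of the document
        }
        finally
        {
            sd.Close();         // release the buffer promptly
        }
    }
}
```

The try/finally block ensures that Close runs even if an edit throws, in line with the advice above about closing the object explicitly.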

Static Helper Methods

Developers typically want to perform a few common scenarios with the ServerDocument object model; the class exposes some handy static helper methods so that you do not have to write the boring "boilerplate" code. All of these scenarios work only with "on-disk" files, not with "in-memory" files. The following static methods are associated with ServerDocument:

static string AddCustomization(
  string documentPath,
  string assemblyPath,
  string deploymentManifestPath,
  string applicationVersion,
  bool makePathRelative,
  out string[] nonpublicCachedDataMembers)
static void RemoveCustomization(string documentPath)
static bool IsCustomized(string documentPath)
static bool IsCacheEnabled(string documentPath)

AddCustomization

AddCustomization takes an uncustomized document and adds customization information to it. It creates a new application manifest and cached data manifest. If given an already customized document, the customization information is destroyed and replaced with the new information. This allows you to create new customized documents on a machine without Visual Studio; you could create the customization assemblies on a development box, and then apply the customizations to documents on a different machine.

AddCustomization should only be called on client machines, never on servers, because it always starts up Word or Excel to embed the customization information in the uncustomized document.

The document and assembly paths are required; the deployment manifest path may be null or empty if you do not want to use a deployment manifest to manage updating your customization.

The application version string must be a standard version string of the form "1.2.3.4". Note that this is the version number of the customization itself, not the version number of the assembly. (However, it might be wise to use the version number of the assembly as the version number of your customized document application.)
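A version string can be checked before calling AddCustomization. This is a minimal sketch using System.Version from the .NET base class library; the class and method names are hypothetical, and whether AddCustomization itself rejects shorter strings is not stated above, so the four-part check is an assumption.

```csharp
using System;

// Hypothetical helper; not part of the ServerDocument object model.
public static class VersionCheck
{
    // True only for strings with four numeric parts, such as "1.2.3.4".
    public static bool IsFourPartVersion(string text)
    {
        Version parsed;
        // System.Version also accepts two- and three-part strings and
        // reports the missing parts as -1, so check Revision explicitly.
        return Version.TryParse(text, out parsed) && parsed.Revision >= 0;
    }
}
```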

If the makePathRelative flag is set to true, the assembly location written into the customization information will be relative to the document location. For instance, if the document location is a UNC path such as \\accounting\documents\budget.doc, and the assembly location is \\accounting\documents\dlls\budget.dll, the assembly location written into the document will be dlls\budget.dll, not the full path. Otherwise, if makePathRelative is false, the assembly location is written exactly as it is passed in.
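The effect of the flag in the example above can be illustrated with plain string handling. This sketch is not how AddCustomization is implemented: the names are hypothetical, and it covers only the case described here, where the assembly sits at or below the document's folder. Any other path is returned unchanged, which is an assumption rather than documented behavior.

```csharp
using System;

// Hypothetical illustration of the makePathRelative example above.
public static class RelativePathSketch
{
    public static string MakeRelative(string documentPath, string assemblyPath)
    {
        // The document's folder is everything up to the last backslash.
        int lastSeparator = documentPath.LastIndexOf('\\');
        if (lastSeparator < 0)
        {
            return assemblyPath;
        }
        string folder = documentPath.Substring(0, lastSeparator + 1);

        // Strip the folder prefix when the assembly lives beneath it.
        if (assemblyPath.StartsWith(folder, StringComparison.OrdinalIgnoreCase))
        {
            return assemblyPath.Substring(folder.Length);
        }
        return assemblyPath; // assumption: leave unrelated paths untouched
    }
}
```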

The AddCustomization method loads the assembly and scans it for document/worksheet classes that contain members marked with the Cached attribute so that it can emit information into the cached data manifest indicating that these members need to be filled when the customization starts up for the first time. Because the VSTO runtime will be unable to fill in nonpublic members of these classes, the AddCustomization method returns the names of such members to help you catch this mistake early.

RemoveCustomization

RemoveCustomization removes all customization information from a document, including all the cached data in the data island. It also starts up Word/Excel, so do not call it on a server. Calling RemoveCustomization on an uncustomized document results in an invalid operation exception.

IsCustomized and IsCacheEnabled

IsCustomized and IsCacheEnabled are similar but subtly different because of a somewhat obscure scenario. Suppose you have a customized document that contains cached data in the data island, and you use the ServerDocument object model to remove all information about what document/worksheet classes need to be started up. In this odd scenario, the document will not run any customization code when it starts up, and therefore there is no way for the document to access the data island at runtime. Essentially, the document has become an uncustomized document with no code behind it, but all the data is still sitting in the data island. The VSTO designers anticipated that someone might want to remove information about the code while keeping the data island intact for later extraction via the ServerDocument object model.

IsCustomized returns true if the document is customized and will attempt to run code when it starts up. IsCacheEnabled returns true if the document is customized at all, and therefore has a data island, regardless of whether the customization information says what classes to start up when the document is loaded. (Note that IsCacheEnabled says nothing about whether the data island actually contains any data, just whether the document supports caching.)

Cached Data Objects, Methods, and Properties

As you saw in our handy utility above, a customized document's data island contains a small XML document called the cached data manifest, which describes the classes and properties in the cache (or, if the document is being run for the first time, the properties that need to be filled). The cached data is organized hierarchically; the manifest consists of a collection of view class elements, each of which contains a collection of items corresponding to cached members of the class. For example, here is a cached data manifest that has one cached member of one view class. The cached data member contains a typed DataSet:

<cdm:cachedDataManifest cdm:revision="1">
  <cdm:view cdm:viewId="ExcelCached.Sheet1">
    <cdm:dataInstance cdm:dataId="NorthwindDataSet"
       cdm:dataType="ExcelCached.NorthwindDataSet,
       ExcelCached, Version=1.0.1854.30463, Culture=neutral,
       PublicKeyToken=null" />
  </cdm:view>
</cdm:cachedDataManifest>

Having a collection of collections is somewhat more complex than just having a collection of cached items. The cached data manifest was designed this way to avoid the ambiguity of having two host item classes (such as Sheet1 and Sheet2) each with a cached property named the same thing. Because each item is fully qualified by its class, there is no possibility of name collisions.

The actual serialized data is stored in the data island, not in the cached data manifest. However, in the object model it is more convenient to associate each data instance in the cached data manifest with its serialized state.

The Cached Data Object Model

To get at the cached data manifest and any serialized data in the data island, the place to start is the CachedData property of the ServerDocument class. The CachedData object exposes a CachedDataHostItemCollection, which contains a CachedDataHostItem for each host item in your customized document. A CachedDataHostItem is a collection of CachedDataItem objects, one for each class member variable that has been marked with the Cached attribute. Figure 18-3 shows an object model diagram for the objects returned for the example in Figure 18-1.

Figure 18-3. The cached data object model for the example in Figure 18-1.

To get to the CachedData object, use the ServerDocument object's CachedData property:

public CachedData CachedData { get; }

There are no constructors for any of the types we will be discussing. The CachedData class has four handy helper methods (Clear, FromXml, ToXml, and ClearData) and a collection of CachedDataHostItem:

void Clear()
void FromXml(string cachedDataManifest)
string ToXml()
void ClearData()
CachedDataHostItemCollection HostItems { get; }

These methods parallel those of the application manifest: the Clear method throws away all information in the cached data manifest, the FromXml method clears the manifest and repopulates it from the given XML string, and the ToXml method serializes the manifest as an XML string.

The ClearData method throws away all information in the data island, but leaves all the entries in the cached data manifest. When the document is started up in the client, all the corresponding members will be marked as needing to be filled.

The CachedDataHostItem Collection

The HostItems collection is a straightforward extension of CollectionBase that provides a simple strongly typed collection of CachedDataHostItem objects. (It is called "host items" because these always correspond to items provided by the hosting application, such as Sheet1, Sheet2, or ThisDocument.)

CachedDataHostItem Add(string id)
bool Contains(string id)
int IndexOf(CachedDataHostItem item)
void Remove(CachedDataHostItem item)
void Remove(string id)
CachedDataHostItem this[string id] {get;}
CachedDataHostItem this[int index] {get;}
void CopyTo(CachedDataHostItem[] items, int index)
void Insert(int index, CachedDataHostItem value)

The id argument corresponds to the namespace-qualified name of the host item class. Be careful when creating new items to ensure that the class identifier is fully qualified.

The CachedDataHostItem Object

Each CachedDataHostItem object corresponds to a host item in your document and is a collection of CachedDataItem objects that correspond to cached members of the customized host item class:

CachedDataItem Add(string dataId, string dataType)
bool Contains(string dataId)
void Remove(CachedDataItem data)
int IndexOf(CachedDataItem data)
void Remove(string dataId)
CachedDataItem this[int index] {get;}
CachedDataItem this[string dataId] {get;}
void CopyTo(CachedDataItem[] items, int index)
void Insert(int index, CachedDataItem item)

You might wonder why it is that you must specify the type of the property when adding a new element via the Add method. If you have a host item class like this, surely the name of the class and property is sufficient to deduce the type, right?

class Sheet1 {
  [Cached] public NorthwindDataSet myData;
}

In this case, it would be sufficient to deduce the compile-time type, but it would not be if the compile-time type were object. When the document is run in the client and the cached members are deserialized and populated, the deserialization code in the VSTO runtime needs to know whether the runtime type of the member is a dataset, datatable, or other serializable type.

The CachedDataItem Object

The identifier of a CachedDataItem is the name of the property or field on the host item class that was marked with the Cached attribute. The CachedDataItem itself exposes the type and identifier properties:

string DataType { get; set; }
string Id { get; set; }

It also exposes two other interesting properties and a helper method:

string Xml { get; set; }
string Schema { get; set; }
void SerializeDataInstance(object value)

Setting the Xml and Schema properties correctly can be slightly tricky; the SerializeDataInstance method takes an object and sets the Xml and Schema properties for you. However, if you do not have an instance of the object on the server and want to manipulate just the serialized XML strings, you must understand the rules for how to set these properties correctly.
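As a sketch of the simple case, the helper below looks up a cached item by host item class and member name and lets SerializeDataInstance fill in Xml and Schema. It uses only the members listed in this section. The helper's name, the choice of exception, and the namespace in the using directive are assumptions, and the code needs a reference to the VSTO runtime assembly to compile.

```csharp
using System;
// Assumed namespace for the cached data types; adjust to your VSTO version.
using Microsoft.VisualStudio.Tools.Applications.Runtime;

// Hypothetical helper; the caller opens, saves, and closes the ServerDocument.
public static class CacheFiller
{
    public static void Fill(ServerDocument sd, string hostItemId,
                            string dataId, object value)
    {
        // hostItemId is the namespace-qualified class name,
        // for example "ExcelCached.Sheet1".
        CachedDataHostItemCollection hostItems = sd.CachedData.HostItems;
        if (!hostItems.Contains(hostItemId))
        {
            throw new ArgumentException("No such host item: " + hostItemId);
        }

        CachedDataHostItem hostItem = hostItems[hostItemId];
        if (!hostItem.Contains(dataId))
        {
            throw new ArgumentException("No such cached member: " + dataId);
        }

        // Sets both the Xml and Schema properties from the live object.
        hostItem[dataId].SerializeDataInstance(value);
    }
}
```

For the manifest shown earlier, the call would take "ExcelCached.Sheet1" and "NorthwindDataSet" as the two identifiers, followed by a call to Save on the ServerDocument.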

The first thing to note is that the Schema property is ignored if the DataType is not a DataTable or DataSet (or subclass thereof). If you are serializing out another type via XML serialization, there is no schema, so just leave it blank. On the other hand, if you are writing out a DataSet or DataTable, you must specify the schema.

Second, the data island may contain DataSets and DataTables either in regular "raw" XML form or in "diffgram" form. The regular format that you are probably used to seeing XML-serialized DataSets in looks something like this:

<DataSet1 xmlns="http://www.foocorp.org/schemas/customers.xsd">
  <dbo_Customers>
    <Name>Maria Anders</Name>
    <Address>Obere Str. 57</Address>
  </dbo_Customers>
  <dbo_Customers>
    <Name>Ana Trujillo</Name>
    <Address>Avda. de la Constitución 2222</Address>
  </dbo_Customers>

And so on. A similar DataSet in diffgram form looks different:

<diffgr:diffgram>
  <NorthwindDataSet
    xmlns="http://www.foocorp.org/schemas/NorthwindDataSet.xsd">
    <Customers diffgr:id="Customers1" msdata:rowOrder="0">
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <ContactName>Maria Anders</ContactName>

You can store cached DataSets and DataTables by setting the Xml property to either format. By default the VSTO runtime saves them in diffgram format. Why? Because the diffgram format not only captures the current state of the DataSet or DataTable, but also records how the object has changed since it was filled in by the data adapter. That means that when the object's data is poured back into the database, the adapter can update only the rows that have changed instead of having to update all of them.
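The difference between the two forms is easy to see with ADO.NET itself. The sketch below builds a small DataSet (the table, column, and class names are made up for illustration), modifies one row after accepting the initial fill, and serializes it both ways using DataSet.WriteXml and DataSet.WriteXmlSchema. Whether the VSTO runtime accepts exactly these strings in the Xml and Schema properties is an assumption; the text above says only that both formats are allowed and that a schema is required.

```csharp
using System.Data;
using System.IO;

// Hypothetical demonstration of raw versus diffgram serialization.
public static class DataSetXmlForms
{
    // Builds a one-row DataSet and changes the row after the "fill".
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("NorthwindDataSet");
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(string));
        customers.Columns.Add("ContactName", typeof(string));
        customers.Rows.Add("ALFKI", "Maria Anders");
        ds.AcceptChanges();                           // state as originally filled
        customers.Rows[0]["ContactName"] = "M. Anders"; // a later edit
        return ds;
    }

    // XmlWriteMode.IgnoreSchema gives the raw form;
    // XmlWriteMode.DiffGram gives the diffgram form.
    public static string Serialize(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter writer = new StringWriter())
        {
            ds.WriteXml(writer, mode);
            return writer.ToString();
        }
    }

    // The XSD schema, written separately from the data.
    public static string SerializeSchema(DataSet ds)
    {
        using (StringWriter writer = new StringWriter())
        {
            ds.WriteXmlSchema(writer);
            return writer.ToString();
        }
    }
}
```

The diffgram output carries a diffgr:before section holding the original value of the modified row, which is what allows an adapter to update only the rows that changed. The raw output contains just the current values.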

Be Careful

One final caution about using the ServerDocument object model to manipulate the cache: The cache should be "all or nothing." Either no data item in the cached data manifest should have serialized XML, or every data item should. The VSTO runtime does not currently support scenarios where some cached data items need to be filled and others do not. If the client runtime detects at startup that the cache is filled inconsistently, it assumes that the data island is corrupted and starts fresh, refilling everything. If you need to remove some cached data from a document, remove the entire data item from its host item's collection; do not just set the Xml property to an empty string.
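The all-or-nothing rule can be checked before saving. The following sketch takes the Xml values collected from every CachedDataItem in the document and reports whether they are consistent. The class and method names are hypothetical, and treating a null or empty Xml value as "needs to be filled" is an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical check for the "all or nothing" cache rule.
public static class CacheConsistency
{
    // xmlValues: the Xml property of every CachedDataItem in the document.
    // Returns true when all items are filled or all items are unfilled.
    public static bool IsAllOrNothing(IEnumerable<string> xmlValues)
    {
        int filled = 0;
        int unfilled = 0;
        foreach (string xml in xmlValues)
        {
            if (string.IsNullOrEmpty(xml))
            {
                unfilled++;
            }
            else
            {
                filled++;
            }
        }
        return filled == 0 || unfilled == 0;
    }
}
```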