Archive eXchange Format

Archive eXchange Format (AXF) is an all encompassing container format for generic files which allows them to be stored, transported and preserved on any type of operating system, file system, storage media or technology. Recognizing the limitations in legacy container formats such as Tape ARchive (TAR) and the ever growing need for a long term open preservation format, AXF was designed specifically to address these needs today and into the future. Further, as storage technology also quickly evolves it is the goal of AXF to abstract the underlying technology, file and operating systems to ensure long term portability and accessibility to the files contained in the container.

Recent News
The Archive eXchange Format (AXF) was first officially launched by Front Porch Digital at the National Association of Broadcasters (NAB) tradeshow in Las Vegas on April 10, 2011 as part of their DIVArchive V7.0 Content Storage Management (CSM) solution. During a press conference Front Porch Digital publically committed to deliver their AXF specification and design work back to the community and to the Society of Motion Picture and Television Engineers (SMPTE) AXF committee for standardization.

During the same press conference, Front Porch Digital also announced the launch of openAXF.org a community website for the Archive eXchange Format providing information, details, specifications and a location for the public to participate in AXF development and standardization efforts.

Overview
The Archive eXchange Format (AXF) is based on a file and storage media agnostic encapsulation approach which abstracts the underlying file system, operating system and storage technology making the format open and non-proprietary. The AXF object contains any type, any number and any size of files as part of its payload accompanied by any amount and type of structured or unstructured metadata, checksum and provenance information, full indexing structures and more all in a single, fully self describing, encapsulated package. As the AXF object itself contains includes a complete file system, all of the complexities and limitations of the underlying storage is avoided and the same AXF object can exist on data tape, spinning disk, flash, optical media, or other storage technology now and into the future.

AXF can scale without limits. It can encapsulate any number of component files of any size providing a key advantage over the limitations of legacy container formats as well as some modern operating and file systems. As an entirely self-contained and self-describing format, AXF supports large-scale archive systems as well as simple standalone applications, facilitating encapsulation or wrapping, long term protection and content transport between systems from different vendors who also conform to the AXF specification. Fundamental features such as per-file checksums, per-structure checksums and easy replication across any storage technology help to ensure compatibility with evolving technology and a secure future.

AXF is a IT-centric implementation, supporting any type of file encapsulation including database files, binary executables, documents and image files - not just media assets. AXF offers full support for the well-known Open Archival Information System (OAIS) model as well as key preservationist features such as provenance for both media and objects, inherent GUID/UMID support, geo-location tagging, error detection down to the file/structure level and data-validity spot checking. In addition, because AXF does not rely on current tape storage functionality such as partitioning, it works with legacy as well as leading edge storage technologies such as IBM and HP LTO-5 as well as the Oracle T10000C (formally StorageTek and Sun).

AXF is currently being proposed to industry standards bodies such as the Society of Motion Picture and Television Engineers (SMPTE) with a goal of standardization.

History
In July 2006, Front Porch Digital, lead by Brian Campanotti their CTO, joined a small group of manufacturers (which also included SGL and Masstech Group) to form the Society of Motion Picture and Television Engineers (SMPTE) V16-ARC AHG initiative focused on the development of an open standard for the storage and preservation of assets on any type of storage device or technology. The SMPTE committee chair leading this initiative was Merrill Weiss.

In October 2008, the V16-ARC AHG received status as an official working committee under SMPTE and was renamed the 10E30WG-AXF or the Archive eXchange Format (AXF) working committee. In April 2008, Front Porch Digital remained committed to the SMPTE AXF initiative as the only remaining founding member in active participation. For the next six months, Front Porch Digital continued to drive the specification forward with the assistance of the SMPTE committee chair, unfortunately with very limited involvement from members from the end user or vendor community.

In October 2009, the SMPTE AXF committee ceased all meetings due to limited involvement and resources. At this point, Brian Campanotti formed a small committee within Front Porch Digital consisting of Grégory Pierre and Benoît Goupil to reevaluate the SMPTE work product and determine whether its work could be completed. Because of some fundamental flaws, the committee decided it was necessary to go back and examine the original goals of the SMPTE AXF committee and reinvent the design while maintaining some of its general concepts. This work continued for more than a year.

At the International Broadcasting Convention (IBC) in Amsterdam in September 2010, Merrill Weiss visited several companies and indicated the intent to reform the SMPTE AXF committee to pick up where the work had stopped a year previous. He was successful in gaining sufficient interest to consider reinvigorating the effort including Front Porch Digital. In December 2010, the SMPTE AXF committee restarted its weekly conference calls with little progress on the core AXF specification draft from October 2009. In parallel, Front Porch Digital continued to work on their AXF specification independently of SMPTE since significant progress had been made since the committee hiatus.

In March 2011, SMPTE reorganized their committee structure. The SMPTE 10E30WG-AXF committee previously under the TC-10E Essence parent technical committee was was placed under the TC-31FS File Formats and Systems parent committee. As a result of this reorganization, the official name of the SMPTE AXF committee was changed to TC-31FS30 WG Archive eXchange Format (AXF).

At the National Association of Broadcasters (NAB) conference in Las Vegas in April 2011, Front Porch Digital made a surprise announced regarding their release of AXF support in their DIVArchive V7.0 Content Storage Management (CSM) solution.. During a press conference on April 10th Brian Campanotti announced their AXF specification and design would be contributed back to SMPTE in hopes it will quickly be considered and ratified as a standard. At the same time, Brian Campanotti announced the launch of openAXF.org as a community portal for people interested in tracking and participating in the further development of the AXF specification.

Key Features
Scalability

AXF Objects can scale to any size, encapsulate any number of component files of any size or any type and can span across any number of storage medium devices or technologies

Resiliency

In-built features such as media cataloging and fast index recovery, fully self-describing objects and per file and structure checksums ensure long term accessibility and resiliency

Openness

Support for all storage media types (hard drive, optical, flash, etc) with no reliance on modern data tape features such as partitioning so there is no need to upgrade existing storage infrastructures

Accessibility

Community tools will soon become available to allow for generic and direct access to AXF formatted media and AXF Objects without any reliance on the applications which created them.

IT-Centric

Based on an IT-centric implementation which supports any type of file encapsulation (database files, documents, image files, etc) and is not limited to media assets alone

Universal

Designed to support all media and file types regardless of operating system, file system or platform

Open Archival Information System (OAIS)

AXF includes support for the fundamental concepts outlined in the OAIS reference model which address the needs of archivists and preservationists

Technology
AXF Embedded File System Concept
The AXF specification is based on an embedded file system. AXF offers a translation between any type of generic set of files and logical block positions on any type of storage medium being used with or without its own file system. By encapsulating a related set of files with any type and amount of ancillary metadata (structured or unstructured) into a single container, storage, tracking and management becomes a fairly trivial task. Creating and storing an AXF object on top of any storage medium provides an open and protected way to ensure long term accessibility to the encapsulated content (metadata, files, etc) irrespective of the type or generation of storage medium.

Specifically tuned for performance, AXF overcomes the limitations of other encapsulation formats such as ZIP or TAR which do not support complex file structures with millions (or more) of files nor do they handle large files particularly well. Because of its unique embedded file system approach, AXF does not have any reliance on the storage technology (file marks in the case of data tape for example) which can add significant storage and performance inefficiencies. Although optimized for the storage of large media assets, the AXF format can be applied to any environment where an open and accessible storage encapsulation format is desired for any type of file collection.
AXF Object Overview
Each AXF Object is a fully self-contained, encapsulated collection of files, metadata and any other ancillary information which adds relevancy or value to its contents. AXF is designed to handle a single file encapsulation as easily as it can for hundreds of millions of files. AXF Objects are equivalent regardless of whether they are created on data tape, spinning disk or flash and optical media with or without files systems. This fact makes the creation and handling of AXF Objects on differing media trivial once the specification has been implemented.

Each AXF Object commences with an AXF “Object Header” which is a structure containing descriptive XML metadata describing the actual contents of the AXF Object such as its unique identifier (UMID/GUID), creation date, descriptive information, file tree information, permissions, etc.

Following the AXF Object Header is any number of optional AXF “Generic Metadata” packages. These are self-contained, open metadata containers for applications to include AXF Object specific metadata. This metadata can be structured or unstructured, open or vendor specific, binary or XML and provides a flexible and dynamic space to enrich the depth of the AXF Object and permanently link it to the encapsulated file payload of the AXF Object. There are no constraints or specifications governing the type of metadata, the number of packages or their contents so it is truly an open metadata area. In the case where there is no metadata to store along with the AXF Object, this structure is simply omitted. Clearly, metadata (XML, binary or other) can also be stored in the file payload of the AXF Object as well but their context makes processing difficult for third-party applications during subsequent restore or replication operations so the Generic Metadata approach is recommended.

The next part of the AXF Object is any number of “File Data + File Padding + File Footer” triplets. This represents the actual “File Payload” of the AXF Object and is the actual byte data of the files to be stored within the AXF Object. “File Padding” is used to ensure the alignment of all AXF Object structure containers on the storage medium block boundaries. This is a fundamental feature of the AXF specification. The “File Footer” structure contains the exact size of the file along with an optional file level checksum which is designed to be processed by the application on-the-fly during restore operations to ensure data integrity.

The final portion of an AXF Object is the AXF “Object Footer” which is basically a repeat of the information contained in the AXF Object Header with some additional information captured during the creation of the AXF Object itself such as file checksums, block positions, file permissions, etc. This AXF Object Footer is fundamental to the resiliency of the AXF specification and allows efficient re-indexing of AXF media by foreign systems when the content of the media is not previously known.

Each AXF Object component described above is itself encapsulated in a universal AXF “Structure Container” which provides signature information, structure checksums, classification information, etc. and ensures efficiency during processing with little or no overhead.
Storage Medium Block Alignment
To ensure reliability, resiliency and performance across any storage device, technology or medium and ensure smooth transitions between them, each data structure and element contained within the AXF Object must be aligned on pre-defined block boundaries. Interestingly enough, these block boundaries can be independent of the storage technology or media itself as well as be different for each AXF Object contained on that media.

During each AXF Object creation, copy or movement operation, the controlling application is responsible for ensuring the data is aligned on the block boundary basis defined for the destination storage technology or media. For performance reasons, AXF Object unwrap and rewrap operations should always be performed on-the-fly by the application and as such should introduce little, if any, overhead during transfers. By enforcing this block alignment, AXF Objects can be specifically tuned to the underlying storage optimizing performance and storage efficiency. Even down to the level of tuning this on an AXF Object by AXF Object basis depending on the average, minimum and maximum file sizes contained within the File Payload.

Implementation Details
Linear Data Tape Media without a File System
This represents the most complex of the AXF implementations, but certainly was the fundamental motivation for the development of this specification. Resiliency and recoverability aspects developed as part of the AXF specification guarantee the most robust operation in the linear data tape medium environment but also drive advancements for other storage medium types.

The Linear Data Tape with No File System implementation includes three additional structures which allow AXF Objects to be transparently stored and also add to the recoverability of the overall solution. As described previously, each AXF Object is fully self-contained and self-describing, and similarly these additional structures guarantee the same self-contained and self-describing characteristics for the data tape medium containing any number of AXF Objects.

Three structures introduced to facilitate the AXF specification on linear data tape with no file system. These are the “VOL1 Label”, the “Medium Identifier” and the “Object Index” structures.

The first structure which appears on the medium is an ISO/ANSI standard “VOL1” volume label. This is included for compatibility purposes with other applications, including legacy ones. For well behaved applications, encountering this VOL1 Label with an AXF identifier will prevent any further actions if the application is not designed to specifically read or write AXF formatted media. It will also act as an immediate identifier for applications with AXF compatibility that the medium has been properly prepared for read and write activities.

The “Medium Identifier” is a structure which contains the AXF volume signature and other specific information regarding the actual storage medium itself. The implementation of the Medium Identifier differs depending on the storage medium type depending on whether it is linear or non-linear and whether it includes a file system or not.

The “Object Index” is an optional structure which assists in the recoverability of AXF formatted media. It is simply a collection of AXF Object Footer structures containing the information for all of the AXF Objects contained on the storage medium which appear previously on the linear medium and are still valid and have not been deleted or deprecated by the application. The information contained in this structure is sufficient to recover and reconstruct the entire catalog of all the AXF Objects on the storage medium which the structure references. Subsequent Object Index structures invalidate previous ones in the case of linear storage medium.
Non-Linear Media
Non-linear storage media embodies a large collection of differing technologies including optical media, flash memory or SSD media and fixed and removable hard disk drive based systems. It is generally assumed these technologies include file systems but special fringe case where this is not the case can also be easily handled by the AXF specification.

In general, these media types allow the creation of files, the optional creation of a directory hierarchy and random access to the AXF Objects (as well as random access to the AXF Object package contents) contained on the storage medium.

Limitations may exist in terms of the number of files which can be created in a single path on some file systems as well as limitations with respect to the maximum file size for an individual file. Each of the specific limitations and constraints of the storage medium and the file system should be considered prior to implementing an AXF workflow on any particular storage medium. As AXF Objects can grow in size with little or no limitation, an appropriate storage medium and file system should be chosen. Other characteristics of the storage medium under consideration should also be examined prior to any implementation including read/write lifespan as limitations do exist in current flash technologies.

In this particular implementation, AXF Objects are simply stored as single encapsulated files with or without a folder hierarchy dependant on the application. AXF Objects simply provide a file system per object on top of a file system per storage medium. As a normal practice in large scale environment, each AXF Object is assigned a filename equal to its GUID as specified in the AXF Object Header and suffix of “axf” (case insensitive) to visually indicate the file is a valid AXF Object and allow operating system level application associates where applicable. Of course, applications should not rely on this naming convention as human readable names may be more appropriate. A quick read of the first few bytes of a file will immediately indicate to the application whether it is a valid AXF Object or not.
Linear Tape File System (LTFS)
AXF and LTFS are not mutually exclusive and for all intents and purposes, they can coexist. AXF can simply layer on top of LTFS formatted medium where it is treated in a fashion similar to that of a non-linear medium with a file system. The LTFS implementation takes care of maintaining references to the AXF Objects stored on the data tape medium and abstract the application vendor from the specifics of block based storage. AXF brings the significant advantage of file encapsulation to the LTFS approach and helps remove inherent limitations in its design.

See Also
* Content Storage Management
* IBC
* NAB
* OAIS
* SMPTE

 
< Prev   Next >