5S Framework for Digital Libraries
Streams, Structures, Spaces, Scenarios, and Societies (5S) is a unified formal theory for Digital Libraries (DLs). With 5S, digital library abstractions such as digital objects, metadata, collections, and services can be rigorously and usefully described through compositions of basic and higher-level mathematical objects. 5S enables high-level specification of DLs along five complementary dimensions: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different multidimensional space representations as well as presentational properties and operations of DL components (Spatial Model); the services and behavior of the DL (Scenario Model); and the different communities of actors and users/managers of services that act together to carry out the DL behavior (Societal Model). Specifications written in 5SL, the specification language built on 5S, can be fed into digital library generators, which can draw on component pools to generate prototypes and implementations of digital libraries. The table below summarizes the 5S models in terms of their primitives, underlying formalisms, and objectives.
| Models | Primitives | Formalisms | Objectives |
|---|---|---|---|
| Stream Model | Text; video; audio; software program | Sequences; types | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data |
| Structural Model | Collection; catalog; hypertext; document; metadata; organizational tools | Graphs; nodes; links; labels; hierarchies | Specifies organizational aspects of the DL content |
| Spatial Model | User interface; index; retrieval model | Sets; operations; vector space; measure space; probability space | Defines logical and presentational views of several DL components |
| Scenarios Model | Service; event; condition; action | Sequence diagrams; collaboration diagrams | Details the behavior of DL services |
| Societies Model | Community; managers; actors; classes; relationships; attributes; operations | Object-oriented modeling constructs; design patterns | Defines managers, responsible for running DL services; actors, who use those services; and relationships among them |
Stream
Streams are sequences of elements of arbitrary types (e.g., bits, characters, pixels, frames, images, text). In this sense, they can model both static and dynamic content. Static content includes, for example, textual material, while dynamic content might be, for example, a presentation of a digital video or a sequence of time and positional data (e.g., from a GPS) for a moving object. [1]
A dynamic stream can represent an information flow: a sequence of messages encoded by the sender and communicated over a transmission channel, possibly distorted by noise, to a receiver whose goal is to reconstruct the sender's messages and interpret their semantics. Dynamic streams are thus important for representing whatever communications take place in the digital library. Examples of dynamic streams include video on demand delivered to a viewer, a timed sequence of news sent to a client, and a timed sequence of frames that allows the assembly of a virtual reality scenario. Typically, a dynamic stream is understood through its temporal nature. It can then be interpreted as a finite sequence of clock times and associated values, which can be used to define a stream algebra allowing operations on diverse kinds of multimedia streams. In the static interpretation, the temporal nature is generally ignored or irrelevant, and a stream corresponds to some information content that is interpreted as a sequence of basic elements, often of the same type. A popular type of static stream according to this view is text (a sequence of characters). The type of the stream defines its semantics and area of application. For example, any text representation can be seen as a stream of characters, so that text documents, such as scientific articles and books, can be considered structured streams. [3]
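The two interpretations can be illustrated with a minimal Python sketch. It is not part of the 5S formalism itself: the class name `DynamicStream` and the operations `window` and `merge` are hypothetical, chosen only to show how a finite sequence of clock times and values could support simple algebra-like operations.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Static interpretation: an ordered sequence of elements of one type,
# here a text seen as a stream of characters.
static_text_stream: Sequence[str] = list("digital library")


@dataclass(frozen=True)
class DynamicStream(Generic[T]):
    """Hypothetical model: a finite sequence of (clock time, value) pairs."""

    samples: Tuple[Tuple[float, T], ...]

    def __post_init__(self) -> None:
        # Reject streams whose samples are not in clock-time order.
        times = [t for t, _ in self.samples]
        if times != sorted(times):
            raise ValueError("samples must be ordered by clock time")

    def window(self, start: float, end: float) -> "DynamicStream[T]":
        """Return the sub-stream with start <= time < end."""
        kept = tuple((t, v) for t, v in self.samples if start <= t < end)
        return DynamicStream(kept)

    def merge(self, other: "DynamicStream[T]") -> "DynamicStream[T]":
        """Interleave two streams by clock time (an illustrative operation)."""
        combined = sorted(self.samples + other.samples, key=lambda s: s[0])
        return DynamicStream(tuple(combined))


# Usage: video frames and audio chunks as two timed streams.
frames = DynamicStream(((0.0, "frame0"), (0.04, "frame1"), (0.08, "frame2")))
audio = DynamicStream(((0.0, "chunk0"), (0.05, "chunk1")))
first_50_ms = frames.window(0.0, 0.05)
interleaved = frames.merge(audio)
```

Keeping the time stamps inside the stream is what separates the dynamic from the static case: the text stream above carries only element order, while the timed stream can be cut or combined by clock time.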
Activities related to the Networked Digital Library of Theses and Dissertations (NDLTD) involve a variety of streams. At the simplest level are streams of characters for text, and streams of pixels for images. Some students have included audio files, or digital video, with their Electronic Theses and Dissertations (ETDs). These present challenges regarding quality of service if played back in real time, or alternatively storage problems if downloaded and then played back from a local system. This suggests that students probably should store both types of representation. The other class of streams related to NDLTD is that of network protocols. [3]
Those involve transmissions of serialized streams over the network. Federated search, harvesting, and hybrid services, using a number of protocols, such as Dienst, Z39.50, the Harvest system, and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), have been developed in the context of NDLTD.
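As a small illustration of such a protocol stream, the sketch below builds an OAI-PMH harvesting request. The `ListRecords` verb and the `metadataPrefix` argument (with `oai_dc` for unqualified Dublin Core) come from the OAI-PMH specification; the base URL and the function name `build_list_records_url` are assumptions made for this example and do not refer to an actual NDLTD endpoint.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
request_url = build_list_records_url("https://example.org/oai")
```

The response to such a request is an XML document, that is, a serialized stream of characters that the harvester parses into metadata records.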
Below are some examples of the stream component of the 5S framework from the Digital Library (DL) Curriculum Project.
• Module 5-a: DL Architectures: Streams: The architecture or model specifies what types of digital objects the DL can store or provide.
• Module 7-a: Indexing and Searching: Streams: The web crawler finds URLs in a web page, which can itself be considered a stream of data, and follows them recursively to find more streams.
• Module 3-b: Digitization: Streams: Digitization creates a stream of data entering the digital library.
Structure
A structure can be defined as the way in which parts of a whole are arranged or organized. In digital libraries, structures can represent hypertexts, taxonomies, system connections, user relationships, and containment, to cite a few. Books, for example, can be structured logically into chapters, sections, subsections, and paragraphs, or physically into cover, pages, line groups (paragraphs), and lines. Structuring orients readers within a document's information.
Markup languages (e.g., SGML, XML, HTML) have been the primary means of exposing the internal structure of digital documents for retrieval and/or presentation purposes. Relational and object-oriented databases impose strict structures on data, typically using tables or graphs as units of structuring.
With the increase in heterogeneity of material continually being added to digital libraries, much of this material is called "semi-structured" or "unstructured". These terms refer to data that may have some structure, but where the structure is not as rigid, regular, explicit, or complete as the structure used by structured documents or traditional database management systems. Query languages and algorithms can extract structure from these data. Although most of those efforts take a "data-centric" view of semi-structured data, works with a more "document-centric" view have emerged. In general, humans and natural language processing systems can expend considerable effort to unlock the interwoven structures found in texts at syntactic, semantic, pragmatic, and discourse levels.
For example, in the context of a Web crawler, the structure may be that of the Web itself: a directed graph, called the Web graph, in which the nodes are Web pages and the edges are the hyperlinks connecting them.
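A minimal sketch of this structure, assuming a tiny in-memory Web graph rather than real pages: the page names and the helper `crawl_order` are hypothetical, and the breadth-first traversal is just one of several orders in which a crawler could follow hyperlinks.

```python
from collections import deque
from typing import Dict, List

# Directed graph: each page (node) maps to the pages it links to (edges).
WebGraph = Dict[str, List[str]]

web_graph: WebGraph = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["a.html"],  # links into the graph, but nothing links to it
}


def crawl_order(graph: WebGraph, seed: str) -> List[str]:
    """Return pages reachable from the seed, in breadth-first order."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


print(crawl_order(web_graph, "a.html"))  # ['a.html', 'b.html', 'c.html']
```

Because the edges are directed, `d.html` is never reached from `a.html` even though it links to it, which is the kind of property the structural view makes explicit.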
Space
A space defines logical and presentational views of several components. It is a set of objects together with operations on those objects that obey certain constraints. The combination of a set with such operations is what distinguishes spaces from structures and streams. A space is often the applicable construct when a part of a DL cannot be defined with any of the other S's. Despite the generality of this definition, spaces are extremely important mathematical constructs. The operations and constraints associated with a space define its properties. For example, in mathematics, affine, linear, metric, and topological spaces define the basis for algebra and analysis. In the information retrieval discipline, Salton and Lesk formulated an algebraic theory based on vector spaces and implemented it in the SMART system. "Feature spaces" are sometimes used with image and document collections and are suitable for clustering or probabilistic retrieval. Spaces also can be defined by a regular language applied to a collection of documents. Document spaces are a key concept in many digital libraries. In short, spaces are sets with operations that obey certain constraints.
Human understanding can be described using conceptual spaces. Multimedia systems must represent real as well as synthetic spaces in one or several dimensions, limited by some metric or presentational space (windows, views, projections) and transformed to other spaces to facilitate processing. Many of the synthetic spaces represented in virtual reality systems try to emulate physical spaces. Digital libraries may model traditional libraries by using virtual reality spaces or environments. Also, spaces for computer-supported cooperative work provide a context for virtual meetings and collaborations.
Again, spaces are distinguished by the operations on their elements. Digital libraries can use many types of spaces for indexing, visualizing, and other services they perform. The most prominent of these for digital libraries are measurable spaces, measure spaces, probability spaces, vector spaces, and topological spaces.
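To make the idea of "a set plus operations" concrete, the following sketch treats documents as vectors of term counts and defines cosine similarity as the operation on them. It is a simplified illustration of a vector space used for retrieval, not a reproduction of the SMART system: whitespace tokenization and raw term counts are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Counter as CounterType


def term_vector(text: str) -> CounterType[str]:
    """Map a document to a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine_similarity(u: CounterType[str], v: CounterType[str]) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # by convention, an empty document matches nothing
    return dot / (norm_u * norm_v)


query = term_vector("digital library services")
doc = term_vector("digital library collections")
score = cosine_similarity(query, doc)  # two of three terms shared: 2/3
```

The documents form the set, and the similarity function is the operation that turns the set into a space in which ranking and clustering become possible.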
Below are some of the examples for the space component of the 5S framework in the Digital Libraries Curriculum Project.
- Module 7-b: Reference Services: Spaces: Reference is provided in an information space, physical or online.
- Module 7-g: Personalization: Spaces: Mappings between different spaces (e.g., from vector space models to probabilistic ones) for interoperability, or reduction of dimensionality to provide better search services (e.g., with Latent Semantic Indexing (LSI)).
Scenario
One important type of scenario is a story that describes possible ways to use a system to accomplish some function that a user desires. Scenarios are useful as part of the process of designing information systems. They can be used to describe external system behavior from the user's point of view and to provide guidelines for building a cost-effective prototype. Developers can quickly grasp the potentials and complexities of digital libraries through scenarios. Scenarios tell what happens to the streams, in the spaces, and through the structures. Taken together, the scenarios describe services, activities, and tasks, which ultimately specify the functionality of a digital library. For example, user scenarios describe one or more users engaged in some meaningful activity with an existing or envisioned system. This approach has been used as a design model for hypermedia applications. Human information needs, and the processes of satisfying them in the context of digital libraries, are well suited to description with scenarios, including these key types: fact-finding, learning, gathering, and exploring. Additionally, scenarios can aid understanding of how digital libraries affect organizations and societies, and how challenges to support social needs relate to underlying assumptions of digital libraries. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement.
The concepts of state and event are fundamental to understanding scenarios. Broadly speaking, a state is determined by what contents are in specified locations, as, for example, in a computer memory, disk storage, a visualization, or the real world. The nature of the values and state locations related to contents in a system is granularity-dependent, and their formal definitions and interpretations are outside the scope of this article; the reader is referred to the literature for a lengthy discussion. An event denotes a transition or change between states, for example, executing a command in a program. Scenarios specify sequences of events, which involve actions that modify states of a computation and influence the occurrence and outcome of future events. Dataflow and workflow in digital libraries can be modeled using scenarios.
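The relationship between states, events, and scenarios can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the formal 5S definition: a state is modeled as a dictionary of locations and contents, an event as a function from one state to the next, and the event names and sample collection are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]           # locations mapped to their contents
Event = Callable[[State], State]  # a transition from one state to another


def submit_query(state: State) -> State:
    """Event: the user enters a query."""
    return {**state, "query": "digital libraries"}


def run_search(state: State) -> State:
    """Event: the search service matches the query against the collection."""
    query = state["query"]
    hits = [doc for doc in state["collection"] if query in doc.lower()]
    return {**state, "results": hits}


def run_scenario(initial: State, events: List[Event]) -> State:
    """Apply a sequence of events, each producing the next state."""
    state = initial
    for event in events:
        state = event(state)
    return state


initial_state: State = {"collection": ["Digital Libraries and 5S", "Cooking Basics"]}
final_state = run_scenario(initial_state, [submit_query, run_search])
print(final_state["results"])  # ['Digital Libraries and 5S']
```

The order of events matters: `run_search` depends on the state left behind by `submit_query`, which mirrors how earlier events influence the occurrence and outcome of later ones.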
Below are some of the examples for the scenario component of the 5S framework in the Digital Libraries Curriculum Project.
- Module 7-b: Reference Services: Scenarios: Reference is an information-seeking technique employed by users in specific situations, contexts, or anomalous states of knowledge.
- Module 7-g: Personalization: Scenarios: Scenario re-design, either by introducing new functions and interaction techniques (e.g., navigation by context) or by specializing existing ones (e.g., changes in syntax and parameters for searching).
Society
A society is a set of entities and the relationships between them. The entities include humans as well as hardware and software components, which either use or support digital library services. Societal relationships make connections between and among the entities and activities.
Examples of specific human societies in digital libraries include patrons, authors, publishers, editors, maintainers, developers, and the library staff. There also are societies of learners and teachers. In a human society, people have roles, purposes, and relationships. Societies follow certain rules and their members play different roles — participants, managers, leaders, contributors, or users. Members of societies have activities and relationships. During their activities, society members often create information artifacts — art, history, images, data — that can be managed by the library. Societies are holistic — substantially more than the sums of their constituents and the relationships between them. Electronic members of digital library societies, i.e., hardware and software components, are normally engaged in supporting and managing services used by humans.
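As a rough sketch of "a set of entities and the relationships between them", a society can be represented as a set of typed entities plus a set of labeled relationships. The class names, entity kinds, and relationship labels below are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # assumed kinds: "human", "software", or "hardware"


# A relationship is modeled as (source entity, label, target entity).
Relationship = Tuple[Entity, str, Entity]


@dataclass
class Society:
    entities: Set[Entity]
    relationships: Set[Relationship]

    def related(self, source: Entity, label: str) -> Set[Entity]:
        """Entities that `source` is connected to by `label`."""
        return {t for s, l, t in self.relationships if s == source and l == label}


patron = Entity("patron", "human")
librarian = Entity("librarian", "human")
search_service = Entity("search service", "software")

society = Society(
    entities={patron, librarian, search_service},
    relationships={
        (patron, "uses", search_service),
        (librarian, "manages", search_service),
    },
)
used_by_patron = society.related(patron, "uses")
```

Human and electronic members sit in the same set here, which reflects the definition above: both kinds of entity either use or support digital library services.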
A society is the highest-level component of a digital library, which exists to serve the information needs of its societies and to describe the context of its use. Digital libraries are used for collecting, preserving, and sharing information artifacts between society members. Cognitive models for information retrieval, for example, focus on a user's information-seeking behavior (i.e., formation, nature, and properties of a user's information need) and on the ways in which information retrieval systems are used in operational environments.
Several societal issues arise when we consider them in the digital library context. These include policies for information use, reuse, privacy, ownership, intellectual property rights, access management, and security. Therefore, societal governance (law and its enforcement) is a fundamental concern in digital libraries. Language barriers are also an essential concern in information systems, and internationalization of online materials is an important issue in digital libraries, given their globally distributed nature.
Below are some of the examples for the society component of the 5S framework in the Digital Libraries Curriculum Project.
- Module 7-b: Reference Services: Societies: Reference is provided by a community of answerers (usually librarians and/or subject experts) and used by communities of users.
- Module 7-g: Personalization: Societies: All other personalization dimensions would be organized or targeted for particular societies of users, e.g., incorporation and adaptation of specialized services for librarians, professors, and students in a digital library of theses and dissertations.