Big structure is any form of data structure, including data relationships and context, that can be combined with other structures to enable dataset interoperability and understanding. Big structure has been cited as helping to integrate data in such areas as cognitive science, eHealth, feature recognition, grid computing, and semantic computing. Jiawei Han has argued that Big data needs Big structure, and Michael Bergman has written extensively on Big structure in relation to data interoperability and to matching and mapping techniques. Data differences are reconciled by transforming data into common forms. These reconciliation tasks are part of data wrangling, which also includes data cleaning and vetting. Semantics is therefore a central consideration in the assembly of Big structure.

Role in data interoperability

The ability of data structures to inform interoperability is, in part, a function of the structural complexity of the source structure. Even simple lists can contribute structural understandings, and one way to leverage such structure is to map simpler structures onto more complex ones.

Semantics poses the problem of symbol grounding. In the conceptual realm, symbol grounding means that when we use a term or phrase we are referring to the same thing; that is, the referent is the same. In the data value realm, symbol grounding means that when we refer to an object or a number (say, the number 4.1) we are also referring to the same measure. Object names for set members carry the same risk of ambiguous semantics as everything else referred to by language. The "V's" of Big data and the dimensions of semantic heterogeneity are explicit recognitions of the symbol grounding challenge. Context and groundings are ways to reduce ambiguity in what is measured and recorded. Big structure thus has an implied hierarchy that places reference structures as the foundation for these groundings.
All other structures are stacked upon this foundation in order of increasing structural complexity. Existing information structures of various types can play a role in establishing reference structures. As reference structures grow, they further extend the scope of interoperability and the ability to reconcile more datasets.

Mapping and technologies

Use of Big structure, and the reduction of effort it brings to data wrangling, can benefit from an integrative software engineering approach akin to computer-aided software engineering. Particular classes of tools that support Big structure integration include build automation, parsers, performance analyzers, revision control systems, unit testers, data modeling tools, mappers (of ontologies and data), data transformers, and a variety of semantic technologies, especially in natural language processing (NLP). Since semantic reconciliation is among the most difficult of computing challenges, it is not surprising that Big structure tooling draws on many approaches, from statistical models to artificial intelligence, particularly pattern recognition and machine learning.

Applications

Big structure is most broadly applicable to data interoperability, with specific applications in the semantic Web, information retrieval, knowledge management, master data management, and any other area that requires two or more datasets to be related to one another.
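The reconciliation of data differences into common forms described above can be sketched in miniature. The following Python example is illustrative only: the field names, unit table, and mapping dictionaries are assumptions invented for this sketch, not part of any real Big structure toolkit.

```python
# Hypothetical sketch of data reconciliation: two source records describe
# the same kind of measurement but use different field names and units.
source_a = {"patient": "p-001", "height_cm": 170.0}
source_b = {"subject_id": "p-002", "height_in": 64.0}

# A tiny "reference structure": canonical field names plus unit conversions.
FIELD_MAP = {
    "patient": "subject_id",
    "subject_id": "subject_id",
    "height_cm": "height_m",
    "height_in": "height_m",
}
UNIT_FACTORS = {"height_cm": 0.01, "height_in": 0.0254}  # to metres

def to_common_form(record):
    """Map a source record onto the canonical schema (heights in metres)."""
    out = {}
    for field, value in record.items():
        canonical = FIELD_MAP[field]
        if field in UNIT_FACTORS:
            value = round(value * UNIT_FACTORS[field], 4)
        out[canonical] = value
    return out

print(to_common_form(source_a))  # {'subject_id': 'p-001', 'height_m': 1.7}
print(to_common_form(source_b))  # {'subject_id': 'p-002', 'height_m': 1.6256}
```

Once both records share one grounded schema, they can be compared or merged directly; in practice the mapping tables themselves are what reference structures such as ontologies provide at scale.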