DSIR model

DSIR stands for "Distributional Semantics based Information Retrieval". It is a retrieval model based on the vector space model, in which the contents of retrievable objects, such as words, phrases, sentences, and documents, are represented uniformly as multi-dimensional vectors. These vectors are derived from a co-occurrence matrix computed over the collection of text documents being indexed. Semantic proximity between objects is then interpreted as geometric proximity between the corresponding vectors in this multi-dimensional space, called the "meaning space".

Definitions

DSIR assumes that there is a correlation between the meaning of a word and its observable distributional characteristics in particular contexts of a given language. These distributional characteristics can be either "occurrences" of the word itself or its "co-occurrences" with other words appearing in the documents.

Word contexts are characterized by the "co-occurrence statistic", a source of distributional information that is easily extracted from a document collection. The co-occurrence statistic of a word is the number of times it co-occurs with one of its neighbours within a pre-defined boundary, the "distributional environment", such as a sentence, paragraph, section, whole document, or window of k words.

We can then build a co-occurrence matrix M, where the entry M_i,j is the co-occurrence statistic of word i with neighbour j. The vector representation of word i is the corresponding row of the matrix, v_i = (M_i,1, M_i,2, ..., M_i,n), where n is the number of neighbour words considered.
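As an illustrative sketch (not the original DSIR implementation), the matrix M can be built by scanning each document with a window of k words as the distributional environment; the toy corpus, whitespace tokenizer, and window size here are assumptions for the example:

```python
from collections import defaultdict

def cooccurrence_matrix(documents, k=2):
    """For each word i and neighbour j, count how often j appears
    within k words of i (the distributional environment)."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        tokens = doc.lower().split()
        for pos, word in enumerate(tokens):
            lo, hi = max(0, pos - k), min(len(tokens), pos + k + 1)
            # every token in the window except the word itself is a neighbour
            for other in tokens[lo:pos] + tokens[pos + 1:hi]:
                counts[word][other] += 1
    return counts

docs = ["the cat sat on the mat", "the dog sat on the rug"]
M = cooccurrence_matrix(docs, k=2)
# M["sat"]["on"] is the co-occurrence statistic of "sat" with "on"
```

Row M[i] then gives the coordinates of the word vector v_i against the neighbour vocabulary.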

A document d is in turn represented by a vector V_d built from the vectors of the words it contains. A query is represented in the same way as in the vector space model, and a similarity function such as the cosine is used to compare the document with the query.
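A minimal sketch of the retrieval step, under the assumption (not stated precisely in the text) that a document vector is the component-wise sum of its word vectors; the toy matrix M and vocabulary are illustrative:

```python
import math

# toy co-occurrence counts: rows are words, columns the neighbour vocabulary
vocab = ["the", "sat", "on", "mat", "rug"]
M = {
    "cat": {"the": 2, "sat": 1},
    "dog": {"the": 2, "sat": 1},
    "mat": {"the": 1, "on": 1},
}

def word_vector(w):
    # row of M for word w, in a fixed column order
    return [M.get(w, {}).get(j, 0) for j in vocab]

def doc_vector(tokens):
    # sum the word vectors component-wise (one common construction)
    v = [0] * len(vocab)
    for w in tokens:
        for i, x in enumerate(word_vector(w)):
            v[i] += x
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

d = doc_vector(["cat", "mat"])
q = doc_vector(["dog"])   # the query is represented exactly like a document
score = cosine(d, q)
```

Ranking then amounts to sorting documents by their cosine score against the query vector.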
