HMORN Virtual Data Warehouse
The HMO Research Network (HMORN) Virtual Data Warehouse (VDW) is a set of dataset standards and automated processes in place at implementing sites that allow SAS programs written at one HMORN site to be run at other VDW sites quickly and with a minimum of site-specific customization. It is 'virtual' in the sense that there is no central data store holding data from all sites that a program could query in a single run. It might also be called a federated or distributed data warehouse.
Implementations
Programmers at each site transform treatment and claims data elements from local data systems into a standardized VDW set of variable definitions, names, and codes. This common structure allows programming code developed at one site to be used at other sites to extract and analyze data for research. Most VDW implementations include data from all of the following sources:
- Modern, integrated electronic medical record (EMR) systems such as Epic.
- Professional and facility claims data (for services covered by, but not actually provided by, the implementing HMO).
- Legacy service capture systems that predate the modern EMR.
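As a rough illustration of the transformation step described above, the sketch below maps one record from a local system onto a standardized layout. It is written in Python for brevity rather than SAS, and every name in it (the local field names, the standardized variable names, the code values, and the date format) is a hypothetical assumption, not taken from the VDW specification.

```python
from datetime import datetime

# Hypothetical mapping from one site's local column names to
# standardized variable names (not the actual VDW specification).
LOCAL_TO_STANDARD = {
    "pat_id": "person_id",
    "dob": "birth_date",
    "sex_cd": "gender",
}

# Hypothetical recoding of local gender codes to a shared code set.
GENDER_CODES = {"1": "M", "2": "F"}


def standardize(local_record: dict) -> dict:
    """Rename local fields and recode values into the shared layout."""
    out = {}
    for local_name, std_name in LOCAL_TO_STANDARD.items():
        out[std_name] = local_record.get(local_name)
    # Unrecognized local codes become "U" (unknown) instead of failing.
    out["gender"] = GENDER_CODES.get(str(out["gender"]), "U")
    # Assumes the local system stores dates as YYYYMMDD strings.
    out["birth_date"] = datetime.strptime(out["birth_date"], "%Y%m%d").date()
    return out


record = standardize({"pat_id": "A1", "dob": "19700131", "sex_cd": "2"})
```

Once each site has applied a mapping of this kind to its own systems, a program written against the standardized names no longer needs to know how any individual site stores its data.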
Subject Areas Covered
The dataset standards cover subject areas commonly held by HMOs and of interest to epidemiological and health services researchers, including:
- Demographics contains date of birth, gender, race and ethnicity.
- Enrollment is based on health plan membership enrollment with indicators of insurance types, benefits, and effective dates of coverage.
- Encounters characterizes outpatient visits and inpatient stays, including the associated diagnosis and procedure codes, type of encounter, provider seen, facility and discharge disposition.
- Procedures consists of all performed procedures, including evaluation and management, surgery, laboratory, radiology, and immunization. Only performed procedures are currently captured, coded in various procedure coding systems (CPT-4, HCPCS, ICD-9-CM, and insurance claims revenue codes).
- Diagnoses includes dates, diagnosis codes and code types, a primary diagnosis flag, and the diagnosing provider.
- Providers includes information on providers, such as specialty, age, gender, race, and year graduated.
- Cancer/Tumor Registry is based on the Surveillance, Epidemiology and End Results (SEER) program standards, as many HMORN sites are SEER sites. The domain includes detailed stage and grade, date of diagnosis, and dates of treatment initiation, and is by far the most complex domain of the VDW.
- Pharmacy consists of pharmacy dispensing and claims and includes date of dispensing, National [...] Code or GPI code (to standardize across sites), therapeutic class, days supply, and amount dispensed. These data are widely used to assess pharmacy-based disease and co-morbidity classification systems.
- Vital Signs are collected at most in-person encounters and include height, weight, and blood pressure readings. Tobacco use and type are also included.
- Laboratory Results contains selected types of laboratory results (for example, hemoglobin A1C, creatinine, and fasting blood glucose).
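Because the subject areas share a common person identifier and standardized dates, a research program can link them. The sketch below, in Python rather than SAS, finds people who have a given diagnosis code recorded on a date when they were enrolled. The column names and sample rows are hypothetical assumptions for illustration and are not the VDW variable names.

```python
from datetime import date

# Hypothetical rows; column names are illustrative only.
enrollment = [
    {"person_id": "A1", "enr_start": date(2004, 1, 1), "enr_end": date(2004, 12, 31)},
    {"person_id": "B2", "enr_start": date(2004, 6, 1), "enr_end": date(2004, 12, 31)},
]
diagnoses = [
    {"person_id": "A1", "dx_date": date(2004, 3, 15), "dx_code": "250.00"},
    {"person_id": "B2", "dx_date": date(2004, 2, 10), "dx_code": "250.00"},
]


def enrolled_on(person_id, when, enrollment_rows):
    """True if any enrollment period for the person covers the date."""
    return any(
        row["person_id"] == person_id
        and row["enr_start"] <= when <= row["enr_end"]
        for row in enrollment_rows
    )


def diagnosed_while_enrolled(dx_code, diagnosis_rows, enrollment_rows):
    """Sorted person IDs with the code recorded during an enrollment period."""
    return sorted({
        dx["person_id"]
        for dx in diagnosis_rows
        if dx["dx_code"] == dx_code
        and enrolled_on(dx["person_id"], dx["dx_date"], enrollment_rows)
    })


cohort = diagnosed_while_enrolled("250.00", diagnoses, enrollment)
```

In this sample, person B2 is excluded because the diagnosis date falls before that person's enrollment period begins.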
Not an Automated Query System
The process of running VDW programs and collating the results is not automated. Programs are typically distributed by e-mail or by posting them to the CRN Portal. Personnel at each site must manually download and run them, and must return the results manually. Site personnel thus retain complete control over their local data.
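The collation step can be pictured as combining the aggregate results that each site returns. The sketch below is a minimal Python illustration under stated assumptions: the site labels, the result keys, and the idea that each site returns simple counts are all hypothetical, and in practice the step is carried out by hand as described above.

```python
from collections import Counter


def collate(site_results: dict) -> Counter:
    """Sum the aggregate counts returned by each site into one total."""
    total = Counter()
    for counts in site_results.values():
        # Counter.update adds counts key by key rather than replacing them.
        total.update(counts)
    return total


# Hypothetical aggregate results returned by two sites.
returned = {
    "site_a": {"cases": 10, "persons": 100},
    "site_b": {"cases": 5, "persons": 50},
}
combined = collate(returned)
```

Only aggregate counts appear in this sketch, which is consistent with each site keeping its row-level data local.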