
Data Quality and Record Linkage Techniques
by Herzog, Thomas N.; Scheuren, Fritz J.; Winkler, William E.
Table of Contents
Preface
About the Authors
Introduction
Audience and Objective
Scope
Structure
Data Quality: What It Is, Why It Is Important, and How to Achieve It
What Is Data Quality and Why Should We Care?
When Are Data of High Quality?
Why Care About Data Quality?
How Do You Obtain High-Quality Data?
Practical Tips
Where Are We Now?
Examples of Entities Using Data to their Advantage/Disadvantage
Data Quality as a Competitive Advantage
Data Quality Problems and their Consequences
How Many People Really Live to 100 and Beyond? Views from the United States, Canada, and the United Kingdom
Disabled Airplane Pilots - A Successful Application of Record Linkage
Completeness and Accuracy of a Billing Database: Why It Is Important to the Bottom Line
Where Are We Now?
Properties of Data Quality and Metrics for Measuring It
Desirable Properties of Databases/Lists
Examples of Merging Two or More Lists and the Issues that May Arise
Metrics Used when Merging Lists
Where Are We Now?
Basic Data Quality Tools
Data Elements
Requirements Document
A Dictionary of Tests
Deterministic Tests
Probabilistic Tests
Exploratory Data Analysis Techniques
Minimizing Processing Errors
Practical Tips
Where Are We Now?
Specialized Tools for Database Improvement
Mathematical Preliminaries for Specialized Data Quality Techniques
Conditional Independence
Statistical Paradigms
Capture-Recapture Procedures and Applications
Automatic Editing and Imputation of Sample Survey Data
Introduction
Early Editing Efforts
Fellegi-Holt Model for Editing
Practical Tips
Imputation
Constructing a Unified Edit/Imputation Model
Implicit Edits - A Key Construct of Editing Software
Editing Software
Is Automatic Editing Taking Up Too Much Time and Money?
Selective Editing
Tips on Automatic Editing and Imputation
Where Are We Now?
Record Linkage - Methodology
Introduction
Why Did Analysts Begin Linking Records?
Deterministic Record Linkage
Probabilistic Record Linkage - A Frequentist Perspective
Probabilistic Record Linkage - A Bayesian Perspective
Where Are We Now?
Estimating the Parameters of the Fellegi-Sunter Record Linkage Model
Basic Estimation of Parameters Under Simple Agreement/Disagreement Patterns
Parameter Estimates Obtained via Frequency-Based Matching
Parameter Estimates Obtained Using Data from Current Files
Parameter Estimates Obtained via the EM Algorithm
Advantages and Disadvantages of Using the EM Algorithm to Estimate m- and u-probabilities
General Parameter Estimation Using the EM Algorithm
Where Are We Now?
Standardization and Parsing
Obtaining and Understanding Computer Files
Standardization of Terms
Parsing of Fields
Where Are We Now?
Phonetic Coding Systems for Names
Soundex System of Names
NYSIIS Phonetic Decoder
Where Are We Now?
Blocking
Independence of Blocking Strategies
Blocking Variables
Using Blocking Strategies to Identify Duplicate List Entries
Using Blocking Strategies to Match Records Between Two Sample Surveys
Estimating the Number of Matches Missed
Where Are We Now?
String Comparator Metrics for Typographical Error
Jaro String Comparator Metric for Typographical Error
Adjusting the Matching Weight for the Jaro String Comparator
Winkler String Comparator Metric for Typographical Error
Adjusting the Weights for the Winkler Comparator Metric
Where Are We Now?
Record Linkage Case Studies
Duplicate FHA Single-Family Mortgage Records: A Case Study of Data Problems, Consequences, and Corrective Steps
Introduction
FHA Case Numbers on Single-Family Mortgages
Duplicate Mortgage Records
Mortgage Records with an Incorrect Termination Status
Estimating the Number of Duplicate Mortgage Records
Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas
Biomedical and Genetic Research Studies
Who goes to a Chiropractor?
National Master Patient Index
Provider Access to Immunization Register Securely (PAiRS) System
Studies Required by the Intermodal Surface Transportation Efficiency Act of 1991
Crash Outcome Data Evaluation System
Constructing List Frames and Administrative Lists
National Address Register of Residences in Canada
USDA List Frame of Farms in the United States
List Frame Development for the US Census of Agriculture
Post-enumeration Studies of US Decennial Census
Social Security and Related Topics
Hidden Multiple Issuance of Social Security Numbers
How Social Security Stops Benefit Payments after Death
CPS-IRS-SSA Exact Match File
Record Linkage and Terrorism
Other Topics
Confidentiality: Maximizing Access to Micro-data while Protecting Privacy
Importance of High Quality of Data in the Original File
Documenting Public-use Files
Checking Re-identifiability
Elementary Masking Methods and Statistical Agencies
Protecting Confidentiality of Medical Data
More-advanced Masking Methods - Synthetic Datasets
Where Are We Now?
Review of Record Linkage Software
Government
Commercial
Checklist for Evaluating Record Linkage Software
Summary Chapter
Bibliography
Index
Table of Contents provided by Ingram. All Rights Reserved.