Big data for tackling non-communicable diseases and health inequities

22 November 2020

This year the global Covid-19 pandemic has highlighted just how essential it is for policy makers, researchers, politicians and others to have robust health data and research evidence for making vital decisions.

Many research studies, particularly in health, rely on big data sets. Statistics NZ hosts a research database called the Integrated Data Infrastructure (or IDI), which links health data at an individual level to other data including benefits, census, justice, migration, and tax data. This data is anonymised for use in research.

New Zealand has an impressive wealth of linked health data. The IDI is a world-class resource that New Zealand researchers increasingly use to inform health improvements. So far, however, little attention has been given to the errors that may occur in the data and the potential impact of those errors on study findings.

Improving our world class IDI data for health research

Dr Andrea Teng

A new research programme funded by Healthier Lives will investigate ways to improve the IDI for health research by improving data linkage and incorporating primary care lab test data.

The research programme, led by Dr Andrea Teng from the University of Otago, Wellington, has been awarded $500,000 over three years starting in January 2021.

The project will be co-led with two other principal investigators, also based at the University of Otago, Wellington: Dr Sheree Gibb and Dr Melissa McLeod (Ngāi Tahu).

The first part of the study will examine the extent of data linkage error in the IDI and how linkage bias affects measures of ethnic inequalities in cardiovascular disease, cancer and diabetes.

The second part of the study will examine the feasibility and value of linking community laboratory information to wider health data, using Helicobacter pylori infection as a case study.

Healthier Lives Director Professor Jim Mann notes that the new project is foundational for further integrated data research in New Zealand and will inform improved measurement and prevention of non-communicable diseases and health inequities.

How do linking errors affect ethnic differences in NCD rates?

Dr Sheree Gibb

The first part of the study, led by Dr Sheree Gibb, will investigate data linkage bias for measures of non-communicable disease (NCD) inequities and specifically whether correcting for linkage errors would change our estimates of inequities in NCD rates.

Linkage of health to other data within the IDI is achieved through probabilistic linking based on names, date of birth and sex to identify links between records that belong to the same person.
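As a rough illustration of how probabilistic linking can work in general, the sketch below scores a pair of records by adding up agreement weights for each field, in the style of the widely used Fellegi-Sunter approach. It is not a description of how the IDI itself is linked: the field names, the m and u probabilities and the threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical probabilities for each field (illustrative values only).
# m = chance the field agrees when two records belong to the same person
# u = chance the field agrees when two records belong to different people
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 agreement/disagreement weights over the linking fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def is_link(rec_a, rec_b, threshold=8.0):
    """Treat the pair as the same person if the weight reaches the
    (hypothetical) threshold."""
    return match_weight(rec_a, rec_b) >= threshold


# Made-up records: the same person, before and after a change of name.
health = {"name": "aroha smith", "date_of_birth": "1980-02-14", "sex": "F"}
tax_same = {"name": "aroha smith", "date_of_birth": "1980-02-14", "sex": "F"}
tax_renamed = {"name": "aroha jones", "date_of_birth": "1980-02-14", "sex": "F"}

linked_same = is_link(health, tax_same)
linked_renamed = is_link(health, tax_renamed)
```

With these assumed values the identical records are linked, while the pair with the changed name falls below the threshold and the link is missed, even though both records belong to the same person.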

Sometimes the matching is imperfect, for a variety of reasons: the identifying information may change over time, may contain errors, or may not uniquely identify every person. This means records may be linked incorrectly, or records belonging to the same person may fail to be linked. If these errors are more common in some groups of people than others, bias can be introduced.
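A small worked example shows how uneven linkage error can distort a measure of inequity. The sketch below assumes that disease events are only counted when a health record links successfully, that population counts are unaffected by linkage, and that the two groups have different missed-link rates. All numbers are hypothetical and are not estimates from the IDI.

```python
def observed_rate(true_events, population, missed_link_rate):
    """Rate seen in linked data when a share of event records fail to link."""
    linked_events = true_events * (1 - missed_link_rate)
    return linked_events / population


def rate_ratio(rate_a, rate_b):
    """Ratio of the rate in group A to the rate in group B."""
    return rate_a / rate_b


# Hypothetical inputs: group A has twice the true rate of group B,
# but its records are missed more often during linkage.
true_rr = rate_ratio(300 / 10_000, 150 / 10_000)

observed_rr = rate_ratio(
    observed_rate(300, 10_000, missed_link_rate=0.10),  # group A
    observed_rate(150, 10_000, missed_link_rate=0.02),  # group B
)
```

Under these assumptions the true rate ratio is 2.0, but the linked data would show a ratio of about 1.84, understating the inequity. If both groups had the same missed-link rate, the ratio would be unchanged, which is why differences in error rates between groups matter more than the overall error rate.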

“Linkage errors have been observed to be correlated with ethnicity which means studies of health inequality can be particularly problematic,” says Dr Teng.

Helicobacter pylori as a model for integrated data research

Primary care data is the major gap in NZ’s health and integrated data system and is particularly valuable to health research. It is important, for example, in understanding the full clinical pathway, including cardiovascular risk assessment and mental health interactions, and also in examining potential service failure or inequitable service provision.

A second part of the study will examine the feasibility and value of linking community laboratory information to the wider health data system, by using the testing and treatment of Helicobacter pylori (H. pylori) infection as a model for future primary care integrated data research. Major differences (up to six-fold) exist in stomach cancer death rates in NZ by ethnicity and the major contributor is H. pylori infection in the stomach.

The recently produced New Zealand Cancer Action Plan specifies that a strategy should be developed to address H. pylori infection in priority populations, possibly including a coordinated programme to detect and manage the infection.

Trials have shown that one-third of stomach cancer cases may be preventable by testing and treating asymptomatic people. The value of such an approach in NZ would be informed by an analysis of whether the current level of H. pylori testing and treatment for people with stomach symptoms is meeting the needs of all ethnic groups.

The study will examine who is tested for H. pylori, who is treated, and who is retested and treated, using ten years of laboratory and pharmacy dispensing data from the four Northern DHBs in NZ (1.9 million people).

Outputs and reducing bias

This project builds on earlier Healthier Lives research through the Capitalising on New Zealand’s health data project, which used big and linked data to investigate non-communicable diseases. It found that linkage error and a lack of primary care data are two limitations on the efficient use of IDI data for health research. The project team produced a report to inform health researchers about data linkage bias within the IDI, which is available as a condensed accessible guide: Linkage error and linkage bias: a guide for IDI users (

The new project aims to recommend data quality improvements to reduce bias, as well as produce statistical code for investigating linkage bias in future health research.

“We look forward to working closely with many groups, including Statistics NZ and the Virtual Health Information Network (VHIN), who are key partners in disseminating the research findings,” says Dr Teng.
