5. What are the uncertainties regarding this study?

5.1 Uncertainties of the WHO answers, guidelines, and risk assessments
5.2 Consideration of publication bias in the review
5.3 Consistency of epidemiological and toxicological evidence in defining thresholds
- 5.3.1 General statement
- 5.3.2 Ozone
- 5.3.3 Particulate matter
5.4 Contribution of different sources to PM-related health effects
5.5 Impact of methods of analysis used in epidemiological studies
5.6 Possible regional characteristics modifying the effects of air pollution

5.1 Uncertainties of the WHO answers, guidelines, and risk assessments

The source document for this Digest states:

How could these influence the conclusions for policy-makers?

Currently, no answer can be given to Question 2 ["What are the uncertainties of the WHO answers...?"] in absolute terms that would cover all different aspects of the problem. Uncertainties linked to gaps in knowledge exist and will exist in the future. We are aware of the uncertainties and we tried to take them into account to the best of our knowledge when deriving our conclusions on the questions we received from CAFE. To address the uncertainties in a systematic way, the project followed the advice provided by the WHO guideline document “Evaluation and use of epidemiological evidence for environmental health risk assessment” (WHO, 2000b), and in particular its recommendations in section 4.2. Specifically, the process of the present project:

developed and followed the protocol for the review;

identified and assessed validity of the relevant studies;

conducted systematic overview of evidence from multiple studies, including formal meta-analysis;

based its conclusions on the critical scientific judgment of a wide range of top scientists working in various disciplines related to the assessment of impacts of air pollution on health.

The working group felt that an attempt to quantify the uncertainties linked to all answers to the first round of CAFE questions was – if at all – not feasible within this project. However, the European Commission provided some additional information relating to issues where an in-depth assessment of underlying uncertainties was felt necessary.

Proper treatment of uncertainties is an important part of all risk management. Current uncertainties related to the scientific evidence should however not be taken as a cause for not acting if the potential risks are high and measures to reduce the risks are at reasonable cost (precautionary principle). As part of the WHO review the main uncertainties should be identified and assessed either in quantitative or qualitative terms. A number of issues are related to the uncertainties such as the following.

– It appears possible that studies that have found no associations between particulate matter concentrations and mortality or morbidity have not been published. How has the expert group tackled the issue of a potential publication bias?

– In some areas there appears to be evidence pointing in different directions; thus an indication of the certainty of the conclusions would be desirable. An example would be the issue of threshold for effect due to exposure to ozone where some epi-studies have not been able to identify a threshold whereas thresholds have been found in toxicological studies. The uncertainty on the existence or non-existence of a threshold for ozone may influence the guidelines and also the setting of EC air quality standards. Hence, it is important to have an understanding of how the strength of evidence and the uncertainty influence the guidelines.

– The WHO first report put a clear emphasis on the health effects of small PM originating from combustion sources. Can these relationships be quantified giving the source contribution to health effects? How may uncertainties in the source apportionment and the particle characterization (size and composition) influence the quantitative assessment of pinpointing a source as being the contributor to health effects? Also, is there information and associated uncertainty on the health effects of specific secondary particle mass, such as the particle mass fraction due to agriculture activities leading to ammonia-containing particles?

– In the review of the guidelines a systematic assessment of the uncertainties (such as confidence intervals) of the relative risks would give a better understanding of the degree of uncertainty. This item should also include the uncertainty in the application of different models (including GAM).

– The assessment of the risks builds on a concentration response relationship based on a number of studies from the United States and Europe. However, different parts of Europe have different mixes of air pollution due to differences in sources, climate and so forth. To what extent may uncertainties of the applicability of these relations influence the risk assessment due to particles and other priority pollutants?

The working group focused its discussions on uncertainties on the subjects highlighted in this statement. These individual aspects of uncertainties are discussed in the following parts of the answer to this question on uncertainty.

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.1

5.2 Consideration of publication bias in the review

The source document for this Digest states:

Explanation provided by the European Commission to this question:
It appears possible that studies that have found no associations between particulate matter concentrations and mortality or morbidity have not been published. How has the expert group tackled the issue of a potential publication bias?

Answer:

Publication bias occurs when the publication process is influenced by the size of the effect or direction of results. The bias is usually towards statistically significant and larger effects. It can be detected and adjusted for using statistical techniques. Bias may also occur when literature is selectively ascertained and cited.

This review used a systematic approach to identify all short-term exposure studies, but it did not formally investigate publication bias. The reviewers were aware that evidence of publication bias has been identified in meta-analyses of single city time series studies, but when estimates were corrected for this bias, significant positive associations remained. Furthermore, the multi-city time series studies, which have published results from all participating cities and are free from publication bias, have reported significant positive associations.

Because of the size and experience of the review group and referees, it is unlikely that any important published long-term study has been missed. Formal assessment of a possible publication bias has not been undertaken. Every effort was made to systematically ascertain long-term exposure studies.

Rationale:

At a meta-analytic level, i.e. when the collectivity of studies is considered, various sources of bias are possible. For example, the studies reviewed may be unrepresentative of the totality of those that have been carried out. This might occur because: 1) not all published studies have been ascertained for review; 2) published studies are not representative of all work done, because some studies remain unpublished; or 3) the reviewer draws biased conclusions from the published work.

In the WHO review, several methods were used to reduce bias. The time-series studies reviewed were obtained by systematic methods of literature searching and we ensured that all relevant published studies in any language were obtained, following the WHO guideline document on the evaluation and use of epidemiological evidence for environmental health risk assessment (WHO Working Group, 2000). Studies with grossly inadequate methods were excluded. “Grey” literature (unpublished reports) was not obtained and authors of published work were not asked for additional results.

One is left, however, with the question posed by CAFE, which is whether there is an indication that published evidence is different from the totality of research findings. This is a common if not universal problem in our research culture (Sterling, 1959; Mahoney, 1977; Simes & Berlin, 1988; Begg & Berlin, 1988; Begg & Berlin, 1989; Dickersin, 1997). It arises because there are more rewards for publishing positive or at least statistically significant findings, and journals are likewise biased towards “interesting” rather than “negative” findings. Parenthetically, this is interesting in view of the prevalent Popperian view of the scientific process as one of falsification rather than confirmation.

In the case of population studies there are particular reasons why publication bias might occur. One is that the data are relatively cheap to obtain and analyse, so that there may be less determination to publish “uninteresting” findings. The other is that each study can generate a large number of results for various outcomes, pollutants and lags and there is quite possibly bias in the process of choosing among them for inclusion in a paper.

In the field of air pollution epidemiology, the question of publication bias has now begun to be formally addressed (Anderson et al., 2002; Peacock et al., 2002). An additional paper addressing this issue is currently under consideration by an epidemiological journal. This cannot be made available at the present time.

Recognizing the potential for bias was one aspect of the rationale for the Air Pollution and Health: a European Approach 2 (APHEA 2) study, which had a prior commitment to publish and an a priori approach to choice of lag and analytic strategy. Analyses for all cities were done by one centre for one group of outcome, blind to the identity of the cities being analysed. Thus, this prospective multicity study attempted to avoid analytic bias, lag selection bias and publication bias.

There are methods of detecting publication bias but it should be noted that these are not without problems. One method is the “funnel plot” in which estimates are plotted against their standard error or precision of the estimate. If there is no publication bias, the resulting scatter should be symmetrically shaped like a funnel (Light & Pillemer, 1984). Evidence of asymmetry in the funnel plot can be tested by several statistical techniques (Egger et al., 1997; Begg & Mazumdar, 1994). There may be other reasons apart from publication bias for an asymmetrical funnel plot, so while the presence of symmetry probably excludes any important degree of publication bias, the presence of asymmetry, while suggestive of publication bias does not prove it.
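The regression approach of Egger et al. (1997) mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the WHO review: each study's standardized estimate (estimate divided by its standard error) is regressed on its precision (1 / standard error), and the intercept is returned. The intercept stays near zero for a symmetric funnel and moves away from zero when less precise studies report systematically larger effects. The function name and the four-study data set are hypothetical, and the significance test on the intercept is omitted.

```python
def egger_intercept(estimates, std_errors):
    """Intercept and slope of the Egger regression (est/se on 1/se)."""
    y = [e / s for e, s in zip(estimates, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                   # precisions
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical standard errors for four studies (most to least precise).
std_errors = [0.1, 0.2, 0.4, 0.5]

# Symmetric case: every study estimates the same effect of 0.5.
symmetric = [0.5 for _ in std_errors]

# Asymmetric case: less precise studies report larger effects (0.5 + se).
asymmetric = [0.5 + s for s in std_errors]

sym_intercept, _ = egger_intercept(symmetric, std_errors)
asym_intercept, asym_slope = egger_intercept(asymmetric, std_errors)
```

In the symmetric case the intercept is zero up to rounding. In the asymmetric case it is close to 1 while the slope still recovers the underlying effect of 0.5, which is the pattern a formal asymmetry test looks for.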

An example of a funnel plot showing asymmetry is shown in Figure 1 for black smoke and daily all-cause mortality. There is clear asymmetry in the funnel plot. The formal test of bias is highly statistically significant. In contrast, when a similar plot (Figure 2) is done for the 17 PM₁₀ and daily mortality studies reviewed in the WHO Air Quality Guidelines for Europe (WHO, 2000), there was no asymmetry and the test for asymmetry was not significant (p<0.51).

As a final example, a funnel plot and test for publication bias for PM₂.₅ and daily mortality in North American studies (Figure 3) is presented. This shows some evidence of publication bias in the studies with lower power. It is not very strong and the formal test was not significant. The summary estimate was not affected, being heavily weighted by the large studies towards the left of the axis.
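Why a summary estimate dominated by large studies is robust to a few small, imprecise ones can be seen in a short sketch. It assumes a fixed-effect inverse-variance pooled estimate, which is one common choice and not necessarily the method used for Figure 3. The function name and the three-study data set are hypothetical.

```python
import math


def pooled_estimate(estimates, std_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / (s ** 2) for s in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical data: two large studies and one small, imprecise outlier.
estimates = [0.5, 0.5, 1.5]
std_errors = [0.1, 0.1, 1.0]

with_small, se_all = pooled_estimate(estimates, std_errors)
without_small, _ = pooled_estimate(estimates[:2], std_errors[:2])
```

The small study reports an effect three times larger than the others but carries about 0.5% of the total weight, so including it moves the pooled estimate by less than 0.01.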

It is important to distinguish two different implications of publication bias. The first is for science and hazard detection. Publication bias could lead to a false conclusion being drawn as to the association between air pollution and a health outcome, i.e. that there is an association when in fact there is none. In the case of black smoke and daily mortality, correction for bias using the trim and fill method reduced the estimate of effect from 0.6% to 0.5% increase in mortality for a 10 unit increase in pollution. The trim and fill technique replaces the “missing” studies and recalculates the estimate (Duval & Tweedie, 2000). However, the adjusted estimate remained significant (0.5%; 95% CI: 0.3–0.6), suggesting that an association remains after allowing for publication bias in this way. More important are the results of the multicity studies (the National Morbidity, Mortality, and Air Pollution Study, NMMAPS, from the United States of America, and APHEA), which are free from evidence of bias and provide significant, though somewhat lower, results than those from single city studies.

The other implication of publication bias is inflation of the real effect. This would clearly have implications for health impact assessment, but does not in itself affect the conclusions in the WHO review. For health impact assessment it would be necessary to recognize the possibility of publication bias and adjust for it where possible.

Bias due to preferential selection of models with positive results
Another related issue that may result in distortion of exposure-response coefficients and be revealed in funnel plots is preferential lag selection for positive effects. Often, air pollution time series studies investigate several “lags”, i.e. delays between exposure and effect. If investigators report the most significant and/or largest effect estimate in a positive direction, effect estimates as published in the literature may be inflated. If the most significant effects in either direction are reported, bias does not occur. In principle, this is not an issue in planned multicentre studies which use predefined lags. The NMMAPS and APHEA studies found significant exposure-response relationships using such a planned approach (Samet et al., 2000; Katsouyanni et al., 2001). The problem with the latter approach is that the best fitting models may not be chosen. This issue can be resolved by using all of the lags to estimate the effects associated with a distributed lag.

Bias due to use of single day rather than cumulative lags
On the other hand it is important to recognize that the use of single day lags may result in underestimation of exposure-response relationship because air pollution may exert an effect over longer periods of time. An analysis addressing the effect of using different lag structures suggested that indeed, multi-day exposures were associated with larger effect estimates than single-day exposures (Zanobetti et al., 2000; Schwartz, 2000). As shown by these authors, using single day lags can easily result in underestimation of effect estimates by a factor of two. However, it was also shown that the lag-structure might be different for different health endpoints and might vary between being immediate or cumulative over several lags. Therefore, pre-selected mis-specified lags might result in valid tests, but may underestimate the effects. Also, the recent work on “harvesting” (as discussed more fully in our answers to the previous set of CAFE questions) has suggested that estimates of air pollution effects in time series studies may increase even further when taking into account longer averaging times for exposure.

Bias due to measurement error
Measurement error in exposure often leads to underestimation of effects of exposure on health. A recent analysis has suggested that sizeable underestimation of exposure-response coefficients may occur in time series studies of air pollution and mortality for this reason (Zeger et al., 2000). However bias away from the null is possible when statistical models contain multiple possibly correlated pollutants (Zeger et al., 2000).
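The underestimation described here can be illustrated with a small deterministic sketch. It assumes classical measurement error (error added to the true exposure and uncorrelated with it), which is only one of the error types discussed in the literature, and a single-pollutant linear model. All values are hypothetical and expressed as deviations from the mean exposure.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


TRUE_SLOPE = 2.0

# Hypothetical true exposures and measurement errors, constructed so that the
# error is uncorrelated with the true exposure and has the same variance.
true_exposure = [1.0, 1.0, -1.0, -1.0]
error = [1.0, -1.0, 1.0, -1.0]

outcome = [TRUE_SLOPE * x for x in true_exposure]
measured = [x + u for x, u in zip(true_exposure, error)]

slope_true = ols_slope(true_exposure, outcome)   # exposure known exactly
slope_measured = ols_slope(measured, outcome)    # exposure measured with error
```

With equal variance in the true exposure and the error, the estimated slope is halved (1.0 instead of 2.0), matching the usual attenuation factor var(x) / (var(x) + var(u)). This sketch does not cover the multi-pollutant case, where bias away from the null is possible.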

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.2

Fig. 1: Funnel plot of black smoke and "daily all cause mortality" in 47 studies. This shows an asymmetrical distribution suggestive of publication bias

Fig. 2: Funnel plot of studies of PM₁₀ and daily mortality used in the WHO (2000). There is no evidence of bias in the test or the formal plot

Fig. 3: Funnel plot of PM₂.₅ and daily mortality in North American studies. There is moderate evidence of some bias

5.3 Consistency of epidemiological and toxicological evidence in defining thresholds

- 5.3.1 General statement
- 5.3.2 Ozone
- 5.3.3 Particulate matter

The source document for this Digest states:

Explanation provided by the European Commission to this question:
In some areas there appears to be evidence pointing in different directions thus an indication of the certainty of the conclusions would be desirable. An example would be the issue of threshold for effect due to exposure to ozone where some epi-studies have not been able to identify a threshold whereas thresholds have been found in toxicological studies. The issue of thresholds could be reassessed for different health endpoints.

5.3.1 General statement

The source document for this Digest states:

Multiple factors determine whether a threshold is seen and the level at which it can occur. Exposure-response curves depend on the age and gender of the subjects, their health status, their level of exercise (ventilation) and, especially the health effect selected. For highly uniform population groups, with a specific exposure pattern, a full range of concentrations, and a specific health outcome, one could identify a specific threshold. However, when there are different exposure-response curves for different groups, thresholds are harder to discern in population studies, and may ultimately disappear. Therefore, the evidence coming from the epidemiological and toxicological studies is not contradictory.

Rationale:

This section contains a description of the determinants of thresholds and tends to demonstrate why they are sometimes evident and at other times are difficult or impossible to detect. As summarized in the second sentence of the general statement, it is true that thresholds have sometimes been delineated in clinical studies of healthy human subjects exposed to ozone when changes in pulmonary function or bronchoalveolar lavage constituents have been selected as endpoints. It is also true that, in contrast, in epidemiologic studies where death or hospital admissions have been used in non-uniform populations, no threshold levels were identified. However, this is likely to be a consequence of different experimental designs and does not reflect inherent contradictions between the studies.

In brief, when the multiple factors controlling the health outcome measure are controlled and uniformity is achieved, thresholds will be evident. For example, human or animal dose-response curves are most likely to exhibit a threshold when:

the subjects are genetically similar

the exposure is controlled

the exposure is to a single pollutant

the subjects are healthy, have no infection or pre-existing disease

the subjects have similar ventilation rates

the subjects have the same diet

there are minimal lifestyle differences (smoking, obesity)

gender and age are controlled.

Mathematical analyses have shown that if one sums many individual exposure response curves, all of which may have thresholds, then the threshold will gradually disappear and a more linear response will take its place. In other words, (1) as the characteristics of the animal or human subjects being studied become more varied and (2) as the exposure patterns become more varied and complex and (3) as we add various susceptible groups (children or the elderly or individuals with pre-existing cardiopulmonary disease), thresholds may be harder to discern and may ultimately disappear.
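The summation argument can be made concrete with a minimal sketch. It assumes that each individual has a sharp step response (no effect below a personal threshold, full effect at or above it) and that personal thresholds are spread evenly across the concentration range. Both assumptions are illustrative choices, not taken from the source, and the concentration units are arbitrary.

```python
def individual_response(concentration, threshold):
    """Step response: 1 if the individual's threshold is reached, else 0."""
    return 1 if concentration >= threshold else 0


def population_response(concentration, thresholds):
    """Fraction of the population responding at a given concentration."""
    responders = sum(individual_response(concentration, t) for t in thresholds)
    return responders / len(thresholds)


# Hypothetical population: 100 individuals with thresholds at 1, 2, ..., 100
# (arbitrary concentration units).
thresholds = list(range(1, 101))

# Population response at concentrations 0, 25, 50, 75 and 100.
curve = [population_response(c, thresholds) for c in range(0, 101, 25)]
```

Although every individual curve has a threshold, the population fraction responding rises in direct proportion to concentration (0.0, 0.25, 0.5, 0.75, 1.0). The only remaining population-level threshold is that of the most sensitive individual.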

It should also be pointed out that the location of a threshold described in terms of levels depends on the health outcome selected. Higher exposure levels might be needed to produce mortality. Lower levels will be needed to achieve significant changes in respiratory symptoms, respiratory function, or hospital admissions and still lower concentrations will be needed to find the threshold for modest inflammatory changes (increased protein content or increased numbers of neutrophils in bronchoalveolar lavage (BAL)).

We would not assert that there is no possible threshold for any pollutant related health effect. What epidemiologic studies show, however, is that when complex populations of humans are studied, it is often not possible to identify a threshold at the range of concentrations currently being studied. Thus, in many published studies, authors feel confident in asserting that no threshold was apparent at current levels. This is not to say that no threshold exists at any level.

Toxicology and epidemiology rely on the concept that there is a dose/response relationship. This relationship can be described in uniform or more varied groups of animals and humans and with well-characterized or poorly characterized exposures. However, it is important to realize that there are susceptible individuals who show different dose-response slopes (see also answer to the question on specific population groups in this report). For example, if a condition exists in the population in the absence of environmental exposure, such as cardiovascular disease, and the pollutant of interest exacerbates or contributes to the mechanisms of disease that yield an outcome, such as sudden cardiac failure, then the concept of threshold is meaningless at the population level. In contrast, thresholds are often observed in animal studies because of the tight regulation of the exposure and of the exposed population (inbred mouse strains). However, when toxicology studies include “susceptible” animals we observe different dose/response slopes.

Studies of mechanisms of toxicity also reveal plausible underlying processes that could alter the dose/response relationship. These include adaptation of parts of the pulmonary response to ongoing oxidative stress-producing pollutants that could transiently alter the dose response curve.

Epidemiologists have grappled with the issue of thresholds. For example, the relationship between daily deaths and airborne particles in 10 United States cities has been analysed (Schwartz, 2000). Schwartz points out that there is variability in particle composition (e.g. summer/winter differences) as well as variability in the causes of death. The most frequent causes include myocardial infarctions, arrhythmias, and pneumonia. Each outcome may have a unique dose–response. Moreover, exposed humans vary widely in their susceptibility. Thus, the totality of this heterogeneity makes certain that a threshold will be difficult or impossible to define at the particle concentrations experienced.

The existence of a “threshold” implies a concentration-risk relationship with no effect until a “threshold” concentration is crossed; then risk rises. In analyzing epidemiological data to determine the existence of a threshold, comparison should be made between a statistical model incorporating a threshold and one not incorporating a threshold. Model fit can then be assessed, both descriptively and more formally. Alternatively, methods can be used to search for “break points” or inflections in the concentration-risk relationship. The data can also be restricted to lower and lower levels with repeated analyses to determine if an effect persists. Few epidemiological studies have been analysed using these approaches.
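The first approach described above, comparing a model with a threshold against one without, can be sketched as follows. This is an illustrative least-squares version under stated assumptions: the threshold model is a "hockey stick" (flat risk up to a threshold t, linear above it), t is chosen by grid search, and fit is compared using the residual sum of squares only. The data are hypothetical and noise-free; a real analysis would use noisy count data, an appropriate regression family, and a formal comparison criterion.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def fit_threshold(x, y, candidates):
    """Grid search over thresholds t for the model y = a + b*max(0, x - t)."""
    best = None
    for t in candidates:
        excess = [max(0.0, xi - t) for xi in x]
        a, b, rss = fit_line(excess, y)
        if best is None or rss < best[3]:
            best = (t, a, b, rss)
    return best


# Hypothetical noise-free data with a true threshold at 40 concentration units.
concentration = [float(c) for c in range(0, 101, 5)]
risk = [2.0 + 0.1 * max(0.0, c - 40.0) for c in concentration]

_, linear_slope, linear_rss = fit_line(concentration, risk)
threshold, _, threshold_slope, threshold_rss = fit_threshold(
    concentration, risk, candidates=range(0, 100, 10)
)
```

On these data the grid search recovers the threshold at 40 and fits better than the straight line. The slope above the threshold (0.1) is also steeper than the slope of the linear model, so a threshold model does not automatically imply a smaller estimated effect at higher concentrations.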

Most epidemiological data sets are analysed with linear models, which inherently assume no threshold. In interpreting the findings of such models, an estimate that is not statistically significant is not evidence for a threshold, even though this interpretation may be offered. Rather, the risk estimate from a linear model is indicative of a threshold if the estimate is close to the null and precisely estimated, i.e. the confidence intervals are narrow. While epidemiological data have been the primary basis for empirical determinations of dose-response relationships in humans, studies on mechanisms are the foundation for interpreting the epidemiological evidence.

The reasoning throughout this discussion is consistent with what most toxicology books say about thresholds. Rodricks et al. (2001) point out that for most non-cancer endpoints, there probably is a small dose of chemical that can be tolerated without any adverse health effects. In other words, there should be a threshold. However they go on to say, “Threshold doses generally cannot be estimated with precision even in animal studies with homogenous animals. Estimation of threshold doses for a heterogeneous human population is even more problematic”.

In conclusion, we recognize that thresholds are an appealing concept. It would be reassuring if we could define a concentration level below which there are no adverse health effects. However, contemporary data in the area of air pollution suggest that this concept is elusive. Realistic heterogeneity causes thresholds to vanish. Thus, we believe that regulators need to accept the reality that laboratory scientists and epidemiologists can help provide dose-response and concentration-response curves which will reveal the extent to which reductions in pollution levels will result in reductions in a specific health outcome. It is also the case that this concentration-response function will be best determined at concentrations that occur. Extrapolating to lower concentrations or doses where data do not exist will, however, increase the uncertainty.

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.3

5.3.2 Ozone

The source document for this Digest states:

Chamber studies may show thresholds for mean effects of ozone on lung function and airway inflammation, but a few individuals show these responses below these levels. As mentioned previously, a particular threshold in a particular experimental situation does not necessarily contradict a finding of effects below these levels in other situations.

The time-series results often have insufficient data to distinguish between a linear and non-linear model with confidence. In addition, the statistical analyses applied to investigate thresholds in datasets on particles have not been applied to the same extent to datasets on ozone. There remain uncertainties in interpreting the shape of exposure-response relationships in epidemiological studies due to different patterns of confounding by other pollutants and correlations with personal exposure across the range of ozone concentrations. Although there is evidence that associations exist below the current guideline value, our confidence in the existence of associations with health outcomes decreases as concentrations decrease.

The answer and rationale refer to acute effects of ozone, as this is most important for health impact assessment of the effects of ozone.

Rationale:

Clinical experimental studies
Experimental clinical studies have the advantage that it is possible to set experimental conditions to test a specific hypothesis. Particular subject groups can be selected (provided ethical considerations are met), ozone can be studied without the presence of other pollutants and ozone concentrations can be experimentally controlled. These studies may give clearer information about a threshold for a specific measure of effect in particular circumstances. Such results can be used to test conclusions from other studies such as ecologic air pollution studies. However, the applicability of this information to the whole of the general population is limited as the studies usually have a small sample size and only study healthy or mildly ill subjects and milder health outcomes.

Human clinical studies do not provide convincing evidence for an absolute level below which no effects are observed. There is evidence that prolonged (6.6 hours) single exposures to ozone at concentrations of 160 µg/m³ (80 ppb), with prolonged “moderate” exercise, cause decrements in pulmonary function, airway injury, and increased non-specific airways responsiveness. Although the magnitude of the mean effects at this exposure level is generally small, some individuals show clinically important responses.

McDonnell et al. (1995) developed a predictive model for changes in the forced expiratory volume in 1 second (FEV1) based on the 6.6-hour ozone exposure studies performed by the US Environmental Protection Agency. This model found that the lowest level of exposure, expressed as concentration × time for which the 90% confidence interval excluded 0, was 0.4 mg/m³-hour (0.2 ppm-hour). This model suggests that significant declines in FEV1 would be seen with exercising exposures to ozone concentrations of 400 µg/m³ (200 ppb) for one hour, or 50 µg/m³ (25 ppb) for eight hours. This result is consistent with the epidemiological and panel studies finding effects on lung function with ozone concentrations below the WHO Air Quality Guidelines of 120 µg/m³ (60 ppb) for eight hours.
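The arithmetic behind the one-hour and eight-hour figures is a simple concentration × time product, sketched below. The function and constant names are hypothetical. The factor of 2 µg/m³ per ppb is an assumption inferred from the paired values quoted in the text (for example 160 µg/m³ and 80 ppb), not an exact conversion.

```python
# Lowest cumulative exposure with a detectable FEV1 effect in the model,
# as quoted in the text: 0.4 mg/m3-hour, i.e. 400 ug/m3-hour.
THRESHOLD_DOSE_UG_M3_HOUR = 400.0

# Approximate conversion implied by the paired values in the text
# (160 ug/m3 = 80 ppb); treated here as an assumption.
UG_M3_PER_PPB = 2.0


def concentration_for_duration(hours):
    """Ozone concentration (ug/m3) that reaches the threshold dose in `hours`."""
    return THRESHOLD_DOSE_UG_M3_HOUR / hours


def to_ppb(ug_m3):
    """Convert an ozone concentration from ug/m3 to ppb."""
    return ug_m3 / UG_M3_PER_PPB


one_hour = concentration_for_duration(1)     # 400.0 ug/m3
eight_hours = concentration_for_duration(8)  # 50.0 ug/m3
```

The eight-hour value of 50 µg/m³ (25 ppb) lies well below the 120 µg/m³ eight-hour guideline mentioned in the text.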

It must be kept in mind that human subjects show highly variable responsiveness to ozone effects (see also answer to Question 3). This may be the result of genetic differences as described in the epidemiological studies section below. Clinical studies have generally used relatively small numbers of unselected subjects. Relying on mean changes for the whole subject group may underestimate the clinical significance of larger changes in a small number of subjects. If a clinical study were to be performed with pre-selected “responders” to ozone, in terms of pulmonary function, it is likely that the observed response thresholds for such groups would be lower than those for a healthy, unselected group. Thus the human clinical data on lung function changes are not sufficient to indicate a threshold below which no effects are expected to occur for all people.

With regard to indices of airway inflammation and injury, fewer data are available than for the studies of lung function effects. However, the report by Devlin et al. (1991) shows that 6.6 hours of exposure, with exercise, to 160 µg/m³ (80 ppb) ozone caused statistically significant increases in inflammatory cells in bronchoalveolar lavage fluid, and increases in indicators of epithelial injury. The degree of change was less than that generally seen with higher concentrations, and some significant changes at higher concentrations were not seen with exposure to 160 µg/m³. However, two study subjects exposed to 160 µg/m³ ozone experienced greater than 10-fold increases in polymorphonuclear leukocytes in bronchoalveolar lavage (BAL) fluid, suggesting an increased sensitivity to ozone inflammatory effects in these subjects. It is possible that the effect threshold for inflammatory changes in such sensitive subjects may be well below 160 µg/m³.

Epidemiological studies
Observational epidemiological studies examine whole populations including susceptible groups (even if these are unidentified). However, as the population is being observed in real life, it is not possible to choose perfect experimental conditions. The ideal case where only the ozone concentration is changed is not possible because, in actuality, changes in ozone concentrations occur at the same time as changes in the weather and concentrations of other pollutants. In addition, the study has to work with whatever range of ozone concentrations happens to occur in a particular place.

Time-series results often have insufficient data to distinguish between a linear and non-linear model with confidence. This can result from factors including too few data points overall, too few data points near a possible threshold and a restricted range of data. It is possible to perform a statistical test for any significant deviation from linearity but this has only been performed in a minority of studies on ozone (e.g. Schwartz et al., 1994; Hoek et al., 1997). In addition, the sophisticated statistical analyses applied to specifically address the question of thresholds in datasets on particles (e.g., Daniels et al., 2000) have not been applied to datasets on ozone to the same extent. A recent paper (Kim et al., 2004) applied a linear model, a natural spline model and a threshold model to a dataset in Seoul and found that the threshold model, with a threshold at 56 µg/m³ (28 ppb) 1 hour average, gave the best fit. However, the slope above the threshold was steeper than in the linear model so the threshold model did not necessarily predict a lower health impact. Further studies of this type are needed. Currently, many studies on ozone do not explicitly describe the shape of the exposure-response function at all.

The atmospheric chemistry of ozone has some unique features which make the interpretation of the shape of exposure-response relationships particularly complex. Formation of ozone is temperature-dependent so that the high end of the exposure-response relationship will be based on hot sunny summer days and the lower end on winter days. Unfortunately, this may mean that factors other than the ozone concentration are varying across the range of the exposure-response relationship. For example, it is known that ozone is often positively correlated with particles in the summer and negatively correlated with particles in the winter (Sarnat et al., 2001). Ozone can be particularly low in cold inversion conditions when other pollutants accumulate. As these other pollutants can have the same health effects as ozone, this can give the perverse impression that health effects increase (or fail to drop) as ozone concentrations decrease. This may appear to suggest a change in slope in a single pollutant model exposure-response relationship that does not truly reflect the effect of ozone itself. Although the use of multi-pollutant models may help to disentangle this somewhat, there may be other factors involved as well. For example, variations in the total oxidant burden in the different polluted environments in which ozone occurs may influence the health response to ozone.
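
The way a negative correlation with particles can distort a single pollutant estimate can be shown with a small sketch. It assumes linear effects, no noise, and made-up winter data; the effect sizes and variable names are hypothetical and not taken from any study cited here.

```python
# Toy illustration of confounding in a single-pollutant model.
# Assumptions: linear effects, no noise, made-up winter data in which ozone
# and particles are negatively correlated.


def centred(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def single_pollutant_slope(x, y):
    """Least-squares slope of y on x alone."""
    dx, dy = centred(x), centred(y)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def two_pollutant_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 jointly (2x2 normal equations)."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


ozone = [10, 20, 30, 40, 50]       # µg/m³, hypothetical winter days
particles = [62, 48, 41, 32, 17]   # µg/m³, high when ozone is low
TRUE_OZONE_EFFECT = 0.3            # made-up effect per µg/m³
TRUE_PARTICLE_EFFECT = 0.5         # made-up effect per µg/m³
outcome = [TRUE_OZONE_EFFECT * o + TRUE_PARTICLE_EFFECT * p
           for o, p in zip(ozone, particles)]

ozone_only = single_pollutant_slope(ozone, outcome)
ozone_adjusted, particle_adjusted = two_pollutant_slopes(ozone, particles, outcome)

if __name__ == "__main__":
    print(f"single-pollutant ozone slope: {ozone_only:.2f}")
    print(f"two-pollutant ozone slope: {ozone_adjusted:.2f}")
```

With these invented numbers the single pollutant slope for ozone is negative even though the assumed ozone effect is positive, while the two-pollutant fit recovers the assumed values. Real data contain noise and further unmeasured factors, so the adjustment is rarely this clean.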

Ozone levels are very low indoors. This means that people’s exposure to ozone varies according to how much they are outdoors. It is likely that people spend less time outdoors on the winter days contributing to the lower end of the exposure-response relationship – another factor complicating interpretation. The low level of ozone indoors means that personal exposure to ozone and ambient concentrations of ozone are not well correlated (Sarnat et al., 2001; Avol et al., 1998). Brauer et al. (2002) demonstrated, using simulations, that surrogate metrics that are not highly correlated with personal exposures obscure the presence of thresholds in epidemiological studies of larger populations. This would apply when ambient ozone concentrations are used as a surrogate for personal exposure to ozone.

Bearing in mind the above difficulties in interpretation, individual studies that examined the shape of exposure-response relationships are described below. Emphasis is given to studies on all cause mortality, respiratory hospital admissions and respiratory symptoms, the endpoints most likely to be used in health impact assessment. Panel studies that examine effects on lung function at a similar range of ozone concentrations are also considered as these may lend plausibility to the occurrence of the other health outcomes in the same range.

Several studies of ozone and all-cause mortality in single pollutant models suggest thresholds at 40 to 100 µg/m³ (20–50 ppb) 8 hour average (Anderson et al., 1996; Hong et al., 1999; Wong et al., 2001); 50 µg/m³ to less than 120 µg/m³ 1 hour average (Kim et al., 2004; Simpson et al., 1997; Morgan et al., 1998/2002) or 36 to 50 µg/m³ 24 hour average (Diaz et al., 1999; Goldberg et al., 2001). Morgan et al. (2002) found a linear association when using the GAM rather than GEE model. Galan Labacca et al. (1999) found a U-shaped relationship and Toulomi et al. (1997) found a flatter slope at high concentrations. However, for the reasons given in the paragraphs above, it may not be possible to take these shapes at face value.

Fairley et al. (2003) found a suggestion of a stronger relationship of all cause mortality with daily ozone ppb-hours above 120 µg/m³ after adjustment for PM₂.₅. Kim et al. (2004) found that there was a steeper slope above 52 to 56 µg/m³ 1 hour average in several different multi-pollutant models. Although Moolgavkar et al. (1995) only found a significant association, adjusted for SO₂ and TSP, in the highest quintile (above 96 µg/m³ 24 hour average), there was a linear increase across quintiles. Borja-Aburto et al. (1997) found no relationship after adjustment for TSP. Only Hoek et al. (1997) used both a multi-pollutant model (with TSP and 24 hour average ozone) and a formal test for non-linearity; the test for non-linearity was not significant. The relative risk remained similar even after all days above 40 µg/m³ 24 hour average were removed.

For single pollutant models of respiratory hospital admissions, Ponce de Leon et al. (1996) found a suggestion of a threshold around 100 µg/m³ 8 hour average; Thurston et al. (1994) found increased relative risks in the two upper quartiles above 90 µg/m³ 1 hour average and Schwartz et al. (1994) found an increase in risk above 50 µg/m³ 24 hour average. Other studies found a flat association (Atkinson et al., 1999, 8 hour) or a linear association (Burnett et al., 1994; Burnett et al., 2001, 1 hour). Burnett et al. (1997) found an upturn at 50 µg/m³ 12-hour average but a chi-squared test for non-linearity was not significant. None of the studies examined the shape of the exposure-response in a multi-pollutant model.

Mortimer et al. (2002) found that a significant association with lower respiratory symptoms remained below 160 µg/m³ 8 hour average. This was also found for an asthma symptom score although only in asthmatics not on medication (Delfino et al., 1998). Schwartz et al. (1994) found a flattening of the relationship with lower respiratory symptoms above 80 µg/m³ 24 hour average but considered this implausible shape was due to confounding. The relationship for cough, after control for PM₁₀, was linear (p=0.31 in test for non-linearity). Thurston et al. (1997) (1 hour) and Ostro et al. (1993) (1 hour) also found linear relationships.

Several panel studies of lung function have used censoring of days above a certain concentration to investigate thresholds. Higgins et al. (1990) found no significant effect on lung function in children after removal of days above about 240 µg/m³ 1 hour average, although there are many studies which have shown effects on lung function below this level. Spektor et al. (1988) found a significant association with lung function in active children remained after removal of all days above 120 µg/m³ 1 hour average. Brunekreef et al. (1994) found that, in vigorously exercising cyclists, significant associations with lung function remained after removal of all days above 100 µg/m³ 1 hour average but became non-significant after removal of all days above 80 µg/m³. Similarly, Brauer et al. (1996) found a significant association with lung function was maintained in active farm workers with removal of all days above 80 µg/m³ 1 hour average but not with removal of all days above 60 µg/m³. It should be noted that censoring days above a certain concentration also involves reducing the total days in the analysis and thus a loss of statistical power. This may itself result in a loss of statistical significance.
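
The loss of power from censoring can be made concrete with a small sketch. It assumes a simple linear regression with constant residual scatter, for which the standard error of the slope is the residual standard deviation divided by the square root of the sum of squared deviations of the concentrations. The ozone values and the scatter are invented for illustration.

```python
import math

# Toy illustration of why censoring high-concentration days costs power.
# Assumption: a simple linear regression with constant residual scatter, for
# which the standard error of the slope is sigma / sqrt(sum((x - mean)^2)).


def slope_standard_error(concentrations, residual_sd):
    """Standard error of the least-squares slope for the given x values."""
    n = len(concentrations)
    mean = sum(concentrations) / n
    sxx = sum((c - mean) ** 2 for c in concentrations)
    return residual_sd / math.sqrt(sxx)


def censor(concentrations, cutoff):
    """Keep only days at or below the cutoff concentration."""
    return [c for c in concentrations if c <= cutoff]


# Hypothetical 1 hour average ozone values (µg/m³) for 60 study days.
days = [20 + 2 * i for i in range(60)]   # 20, 22, ..., 138
RESIDUAL_SD = 5.0                        # made-up scatter in the outcome

if __name__ == "__main__":
    for cutoff in (140, 120, 100, 80):
        kept = censor(days, cutoff)
        se = slope_standard_error(kept, RESIDUAL_SD)
        print(f"cutoff {cutoff}: {len(kept)} days, slope SE {se:.4f}")
```

Each lower cutoff removes days and narrows the exposure range, so the standard error of the slope grows even if the underlying relationship is unchanged. A loss of significance after censoring is therefore not, on its own, evidence of a threshold.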

Bergamaschi et al. (2001) found a linear relationship (R²=0.484) between 2 hour average ozone in the range 60 to 220 µg/m³ and changes in serum CC16 (a marker of increased epithelial permeability) in subjects with wild type NADPH quinone reductase and null glutathione-S-transferase μ1. This was not found in subjects bearing other genotypes. (The former genotype is a proposed susceptible group in terms of oxidative stress.) Correlations with decreased FEV₁ and the forced expiratory vital capacity (FVC) were also found mainly in this susceptible group. This genotype is present in 30% of the population. Other candidates for genetic susceptibility to ozone, from evidence in mice, include the tumour necrosis factor (Tnf) and toll-like receptor 4 (Tlr4) genes (Kleeberger et al., 2001). In communities with the lowest ozone concentrations, variant TNF genotypes were associated with a higher risk of wheezing outcomes (Gilliland et al., 2003). Thus, there are indications that subjects with particular genotypes are responding to ozone at lower concentrations than the general population. This needs to be taken into account when considering thresholds.

Several studies have compared associations with ozone by season and often find greater associations in the summer when ozone levels are higher (e.g. Anderson et al., 1996; Simpson et al., 1997; Sunyer et al., 1996). This might appear to provide support for a threshold. However, the studies often divide the year into two six month periods for which the ozone concentrations overlap for the majority of the exposure range (e.g. Simpson et al., 1997). In other studies (e.g. Sunyer et al., 1996 in Barcelona), the ozone range in winter/spring is no lower than the full year range in other places where significant associations have been found such as London (Anderson et al., 1996). Hoek et al. (1997) adjusted for TSP and did not find a greater association of all-cause mortality with 24 hour average ozone in the summer. In contrast, Moolgavkar et al. (1995) with a good contrast in 24 hour average ozone concentrations and adjustment for sulphur dioxide and TSP did find a greater association in the summer. The non-significant associations found in many studies in the cool season may be due to the different patterns of confounding by other pollutants, of personal exposure and of the chemistry of the polluted environment in different seasons, rather than to the small differences in ozone concentrations. Seasonal differences may therefore be less informative about thresholds than might be expected.

Another approach is to examine the results from places where ozone concentrations are low (<160 µg/m³ 8 hour average or <180 µg/m³ 1 hour average). Although not all studies show significant associations (Bremner et al., 1999; Zmirou et al., 1996; Hong et al., 1999), positive and significant associations with all-cause mortality have been found in Brisbane with a maximum ozone concentration of 126 µg/m³ 8 hour average (Simpson et al., 1997), in Vancouver with a maximum ozone concentration of 150 µg/m³ 1 hour average (Vedal et al., 2003) and in London with a maximum ozone concentration of 148 µg/m³ 8 hour average (Anderson et al., 1996). These associations were stable to adjustment for other pollutants.

Some studies have found positive and statistically significant associations with respiratory hospital admissions, for example, in Brisbane with a maximum 8 hour average concentration of 130 µg/m³ (Petroeschevsky et al., 2001) and in London with a 95th percentile 8 hour average concentration of 74 µg/m³ (Ponce de Leon et al., 1996). Another study in London with a maximum 8 hour average ozone concentration of 160 µg/m³ was positive but not significant (Atkinson et al., 1999). A positive and significant association was found in a meta-analysis of results from 16 Canadian cities with a 99th percentile of 174 µg/m³ 1 hour average (Burnett et al., 1997).

Given the above results, it would be difficult to rule out the possibility of an association at ozone concentrations below 120 to 160 µg/m³ 8 hour average. In fact, if there was a threshold it could well be below this, as it is unlikely that a single day or a few days close to the maximum concentration would be sufficient to drive a significant association alone. The 90th percentiles in these studies (where given) are around 60 to 80 µg/m³.

Studies of non-asthmatics in areas with maximum ozone concentrations up to 228 µg/m³ 1 hour average, 186 µg/m³ 8 hour average or 82 µg/m³ 24 hour average did not find statistically significant associations with lower respiratory symptoms (Hoek et al., 1999; Declercq et al., 2000; Hoek et al., 1995; Ward et al., 2002). The only exception was a study in vigorously exercising cyclists with a maximum 1 hour average ozone concentration of 196 µg/m³ (Brunekreef et al., 1994). On the other hand, increases in asthma attacks have been found in severe asthmatics in Paris with a maximum ozone concentration of 86 µg/m³ 8 hour average (Desqueyroux et al., 2002).

Some studies found significant small negative effects on lung function in places where ozone levels did not rise above 140 or 160 µg/m³ 8 hour average (Korrick et al., 1998; Cuijpers et al., 1995). Rises in serum CC16, a marker of lung permeability, have been shown in cyclists at 2 hour average ozone concentrations of 120 or 160 µg/m³ (Broeckhart et al., 2000).

Conclusions
Overall, it was not possible for all health outcomes to confidently define an unequivocal no-effect threshold for the whole population. For the reasons described above, the shape of exposure-response relationships is very difficult to interpret for ozone, particularly at the low end of the ambient range. However, in some studies associations with outcomes ranging from mortality to respiratory symptoms have been reported from locations where ozone never exceeds 120 to 160 µg/m³ as 8 hour average values. Some panel studies suggest small effects on lung function above around 60 to 80 µg/m³ 1 hour average. Our confidence in the existence of associations with health outcomes decreases at concentrations well below these levels, as problems with negative correlations with other pollutants and lack of correlation with personal exposure increase, but we do not have the evidence to rule them out.

Further research
Clear conclusions concerning the shape of exposure-response relationships in epidemiological studies will always be difficult but, given the importance of this issue, we recommend further research to explore the shape of the exposure-response relationship for ozone. Greater understanding of the different factors which may influence the shape such as correlation with other pollutants, correlations with personal exposure and variations in the total oxidant burden of different polluted environments, may help. Recent work has increased understanding of possible genetic reasons for increased susceptibility to ozone, suggesting new types of susceptible groups, but the implications of this for the range of responses at different ozone concentrations have yet to be fully explored.

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.3

5.3.3 Particulate matter

The source document for this Digest states:

Most epidemiological studies on large populations have been unable to identify a threshold concentration below which ambient PM has no effect on mortality and morbidity. It is likely that within any large human population, there is a wide range in susceptibility so that some subjects are at risk even at the low end of current concentrations.

Rationale:

After a thorough review of recent scientific evidence, a previous WHO Working Group concluded "If there is a threshold [for PM], it is within the lower band of currently observed PM concentrations in Europe" (WHO, 2003).

This was based on analyses of the large NMMAPS database (Daniels et al., 2000) on PM₁₀, on a simulation study (Schwartz & Zanobetti, 2000), and on a large Spanish study that investigated black smoke as PM indicator (Schwartz et al., 2001). Some methodological papers highlighting the difficulty of pinpointing thresholds exactly on the basis of time series studies were also quoted (Zeger et al., 2000, 2001; Cakmak, 1999). Some smaller studies or studies using less appropriate PM metrics were not discussed in the previous document. These include a study that suggested that threshold concentrations of PM do exist. Smith (2000) re-analysed data from Birmingham, Alabama, and suggested that effects on (short-term) mortality were only evident at levels above ~80 µg/m³ PM₁₀, i.e. above the 90th percentile of the distribution. The analysis was based on <4 years of observation, and <20 000 deaths. The authors also mentioned, however, that the analysis could not exclude a threshold at a much lower level, below 20 µg/m³ PM₁₀. Another analysis by the same authors (Smith et al., 2000) looked at data from Phoenix, on some 40 000 deaths occurring over a three-year period. This analysis suggested a threshold for PM₂.₅ of about 20-25 µg/m³ as a daily average. Both analyses had limited statistical power compared to some other studies such as Daniels et al. (2000) with a database of almost 3 000 000 deaths. Nicolich et al. (1999) re-analysed TSP data from Philadelphia and suggested that in these data, there was evidence for a threshold of about 125 µg/m³ for the relation between daily average TSP and mortality. As no PM₁₀ or PM₂.₅ data were available for comparison, this particular analysis has no applicability for identification of a threshold for PM₁₀ or PM₂.₅ as observed in more concurrent studies.

A new simulation study by Brauer et al. (2002) has suggested that a threshold that exists on the individual level becomes obscured (i.e. invisible) on the (usually analysed) population level when the relationship between ambient and personal exposure is poor, but not when this correlation is reasonably high. Data were compared for PM₂.₅, which had a relatively poor correlation between ambient and personal exposure in the underlying dataset (mean correlation coefficient of 0.48, mean regression coefficient of 0.27 (s.d. 1.78)) (Ebelt et al., 2000), and sulphate, which had a relatively high correlation between ambient and personal exposure (mean correlation coefficient of 0.96, mean regression coefficient of 0.78 (s.d. 0.23)). The implication of this simulation exercise is that a threshold that truly exists at the individual level is not likely to be missed in a study using ambient monitoring of a pollutant with reasonably high correlations between ambient and personal exposure. The database in the simulation exercise of Brauer et al. (2002) is interesting in the sense that it has a correlation between ambient and personal PM₂.₅ that is rather lower than in most other studies of the issue.
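
The general idea can be illustrated with a toy simulation. It is not the design used by Brauer et al. (2002): the threshold, the slope, the normal scatter of personal exposure around the ambient level, and the function name are all assumptions made for this sketch, with the size of the scatter standing in for a poor ambient/personal correlation.

```python
import random

# Toy version of the idea that a poor ambient/personal correlation can hide an
# individual-level threshold. Assumptions: made-up numbers throughout, normal
# scatter of personal exposure around the ambient level, and a linear response
# above the threshold. This is not the simulation design of Brauer et al.

THRESHOLD = 20.0   # individual-level threshold, µg/m³ of personal exposure
SLOPE = 1.0        # response per µg/m³ above the threshold


def mean_response(ambient, scatter_sd, n_people=5000, seed=1):
    """Population-average response on a day with the given ambient level."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_people):
        personal = ambient + rng.gauss(0.0, scatter_sd)
        total += SLOPE * max(0.0, personal - THRESHOLD)
    return total / n_people


if __name__ == "__main__":
    for ambient in (10, 15, 20, 25, 30):
        tight = mean_response(ambient, scatter_sd=0.0)
        loose = mean_response(ambient, scatter_sd=10.0)
        print(f"ambient {ambient}: no scatter {tight:.2f}, "
              f"large scatter {loose:.2f}")
```

With no scatter the population response is zero below the threshold and the kink is visible. With large scatter some individuals exceed their threshold even on days with low ambient levels, so the population curve rises smoothly and the threshold is no longer apparent.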

The "threshold" issue has now also been directly looked at for PM₂.₅ (Schwartz et al., 2002) in a large dataset from six cities, studied over 8-18 years, and including >400 000 deaths. A variety of approaches showed that the relationship between PM₂.₅ and total mortality persisted down to very low levels (2 µg/m³), with little evidence of a threshold. When, using source apportionment techniques, the PM was partitioned to various sources, "traffic" particles were found to be related to mortality, with no evidence of a threshold, after controlling for particles from other sources.

Another way of addressing the "threshold" issue is to investigate what the lowest range of exposure is over which significant associations between air pollution and health have been observed. This can be done by looking at the concentration range per se, but also by "censoring" the data to below a predefined cut point, effectively removing all data from the analysis above that cut point. A 1995 review (Brunekreef et al., 1995) systematically examined the literature from this perspective up to that time. Effects of PM₁₀ on mortality were observed at less than 100 µg/m³ (Pope et al., 1992, Dockery et al., 1992), and of PM₂.₅ on symptom exacerbations at less than 75 µg/m³ (Ostro et al., 1991). When effects are found over such ranges, a threshold, if it exists, must be considerably lower than the upper bound of the range, as it is highly unlikely that a significant relationship would be driven completely by just a few observations at the highest end of the exposure range. It is, of course, well known that "outliers" in any dataset can influence the shape and the statistical significance of an exposure response relationship; however, the analyses quoted earlier, which used non-parametric smoothing techniques, have not suggested steeper slopes at higher concentrations. In censored datasets, which by definition have removed all values above a certain concentration, it is virtually impossible that "outliers" would still exist that determine the shape and significance of the exposure response relationship.

With the advent of the more sophisticated analyses of threshold phenomena discussed earlier, it has become less important to use relatively crude approaches such as censoring. The argument remains valid, however, that studies conducted in low concentration areas provide some insight into the upper bounds of thresholds, if they exist. One recent example is a study from Vancouver by Vedal et al. (2003), which studied mortality in a period when the 90th percentile of 24 hour average PM₁₀ was 23 µg/m³, the maximum only 37 µg/m³. Still, significant effects on mortality were found.

It seems that recent, statistically powerful studies that have looked for thresholds for PM₁₀, PM₂.₅ and black smoke were unable to find one. As stated in the report on the January 2003 workshop in Bonn, epidemiological studies are unable to exactly define a threshold if there is one. The combined arguments provided in the previous paragraphs make it highly unlikely, however, that a threshold would exist at a level anywhere near the level of 35 µg/m³ which has been put forward for consideration in the draft position paper on PM of the CAFE working group.

All of the above arguments refer to time series studies. There are only a few studies on effects of long-term exposure to PM on mortality, and even fewer of these have examined the shape of the exposure response relationship. The most powerful study (Pope et al., 2002) used non-parametric smoothing to address this issue, and found no indication of a threshold for PM₂.₅ for either cardiopulmonary or lung cancer mortality, within the range of observed PM₂.₅ concentrations of about 8-30 µg/m³. Further modelling of these data suggested that the exposure response relationship for PM₂.₅ was actually steeper in the low exposure range up to about 16 µg/m³. In contrast, analyses for sulfates suggested that a threshold might exist at about 12 µg/m³ (Abrahamowicz et al., 2003).

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.3

5.4 Contribution of different sources to PM-related health effects

The source document for this Digest states:

Explanation provided by the European Commission to this question: The WHO first report put a clear emphasis on the health effects of small PM originating from combustion sources. Can these relationships be quantified giving the source contribution to health effects? How may uncertainties in the source apportionment and the particle characterization (size and composition) influence the quantitative assessment of pinpointing a source as being the contributor to health effects? Also, is there information and associated uncertainty on the health effects of specific secondary particle mass, such as the particle mass fraction due to agriculture activities leading to ammonia containing particles?

Answer:

Only a few epidemiological studies have addressed source contributions specifically. These studies have suggested that combustion sources are particularly important.

Toxicology, because of its simpler models and potential to tightly control exposures, provides an opportunity to determine the relative toxic potency of components of the PM mix, in contrast to epidemiology. Such toxicology studies have highlighted the primary, combustion-derived particles having a high toxic potency. These are often rich in transition metals and organics, in addition to their relatively high surface area. By contrast, several other components of the PM mix are lower in toxic potency, e.g. ammonium salts, chlorides, sulphates, nitrates and wind-blown crustal dust such as silicate clays.

Despite these differences among constituents under laboratory conditions, it is currently not possible to precisely quantify the contributions from different sources and different PM components to health effects from exposure to ambient PM.

Rationale:

To date only a limited number of investigations have related health endpoints to specific particle components and/or source markers. Results from the Harvard Six Cities Study suggest that daily mortality was mostly associated with combustion sources such as traffic, coal and residual oil (Laden et al., 2000). This study looked at pollution data obtained in the 1980s when it was still possible to use lead as a reliable tracer for traffic exhaust. So, the results are relevant for a mixture of traffic-related air pollution for which lead is a tracer, and it cannot be stated with certainty that these results are still representative for current day mixtures. In addition, the Harvard group examined the heterogeneity of PM₁₀ related health risks reported in the NMMAPS study which used data obtained largely during the 1990s (Janssen et al., 2002). Their findings showed that the PM₁₀ related risk for hospital admissions due to cardiovascular disease increased with the fraction of PM₁₀ originating from highway emissions. Similarly, the European APHEA study found that the slope of the PM and health relationship was higher in areas exhibiting relatively high NO₂ concentrations (Katsouyanni et al., 2001). These areas were mostly impacted by mobile sources providing evidence for an enhanced toxicity of PM emitted from these sources. Work conducted in the United States of America by Mar et al. (2000), using factor analysis, identified vehicle emissions, vegetative burning and regional sulphate as important predictors of cardiovascular mortality in Phoenix, Arizona. A similar analysis by Tsai et al. (2000) analysing three New Jersey cities, using data from the early 1980s, found motor vehicles, metal industry, sulphate and oil burning to contribute to mortality. Moreover, a study conducted in Amsterdam showed that the slope of mortality on black smoke was twice as high in subjects living in homes on the main road network (Roemer and van Wijnen, 2001). Black smoke, which is an important component of PM, was also shown to be twice as high on these roads, suggesting that traffic emissions contributed strongly to the PM associated mortality observed in Amsterdam. The recent Delfino et al. (2003) study from California found that effects of PM₁₀ on asthma worsening were completely explained by elemental and organic carbon, which the authors attributed in large measure to diesel exhaust in the study region.

Although these studies on source-specific particle toxicity underscore the importance of combustion sources, especially traffic emissions, the data do not allow precise attribution of health effects to different sources. Source apportionment techniques need to be further developed, in step with emission databases, in order to make these types of estimates more precise. Nevertheless, the case for attributing significant health effects of air pollution to vehicle emissions is strong, also given the results of recent other studies documenting impaired health in subjects living close to busy roads (see question on hot spots).

PM is a complex mixture and if composition data is available it is, of necessity, unsophisticated. Epidemiology is therefore often poor at determining the role of composition in driving adverse health effects. By contrast, a primary aim of toxicology is the determination of the relative toxic potency of substances. This is generally accomplished by the use of animal models, cell systems and human chamber studies using short-term, well-controlled exposures. Components of the PM mix have been examined for toxic potency in a range of toxicology studies. These studies suggest that some of the components of PM that contribute substantially to the mass are low in toxic potency; these include salts such as nitrates, sulphates and chlorides (Schlesinger & Cassee, 2003) and wind-blown crustal dust including silicate clays; it should be noted, however, that some silicates are toxic (Hetland et al., 2001). Primary, carbon-centred, combustion-derived particles have been found to have considerable inflammogenic potency (Cassee et al., 2002) as a consequence of their high surface area or number (Donaldson et al., 2002) and their organic (Marano et al., 2002) and metal (Costa and Dreher, 1997) content. In support of this contention, human subjects exposed by inhalation to high levels of diesel exhaust showed inflammation in lung biopsies (Salvi et al., 2000).

There is insufficient information about the relative toxicity of the particle mass fraction due to agriculture activities leading to ammonia containing particles, compared to particles originating from other sources.

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.4

See also: Particulate Matter

Question 3.1 Which are the critical sources of PM or its components responsible for health effects?

5.5 Impact of methods of analysis used in epidemiological studies

The source document for this Digest states:

Explanation provided by the European Commission to this question:
In the review of the guidelines a systematic assessment of the uncertainties (such as confidence intervals) of the relative risks would give a better understanding of the degree of uncertainty. This item should also include the uncertainty in the application of different models (including GAM).

Answer:

This answer addresses matters relating to uncertainties in methods of analysis used. Epidemiological studies use statistical models of various types, including Poisson and logistic regression. The estimates of effect provided by air pollution studies are generally accompanied by confidence intervals. These convey the precision of the estimate or statistical uncertainty that arises because the analyses are subject to a degree of random error. To a varying degree, the results of these analyses are sensitive to the details of the model and the specification of confounding and interacting factors. Extensive sensitivity analyses have shown that associations between air pollution and health remain irrespective of the methods of analyses used.

Rationale:

Uncertainty has implications both for identification of the possibility that air pollution is a hazard and for estimating the actual size of any effect for the purposes of risk estimation. CAFE has requested that a systematic assessment of uncertainty be considered. This is in two parts: the first is statistical uncertainty and the other is model uncertainty.

1. Statistical uncertainty

The reviewers were aware of the need to consider statistical uncertainty. In the material supplied for the review of time series studies, for example, all estimates were accompanied by 95% confidence intervals which indicate the precision of the estimate as well as the likelihood that it is due to chance. In some cases, where there are multiple comparisons, it has been prudent to adopt a more stringent level of statistical significance. Meta-analytic estimates such as from the APHEA study are also accompanied by confidence intervals. Thus, the possibility of an association being due to chance was taken into account. Similarly, all the cohort evidence reviewed was described in terms of a central estimate and 95% confidence intervals.
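
As a reminder of how such intervals are typically obtained from a log-linear regression, the sketch below converts a coefficient and its standard error into a relative risk with a 95% confidence interval. The coefficient, the standard error and the function name are invented for illustration and do not come from any study reviewed here.

```python
import math

# Relative risk and 95% confidence interval from a log-linear regression
# coefficient. The coefficient and standard error below are made up.


def relative_risk_ci(beta, se, increment=10.0, z=1.96):
    """Return (RR, lower, upper) for an `increment` rise in concentration.

    beta and se are on the log scale per unit concentration; z = 1.96
    corresponds to a two-sided 95% interval under a normal approximation.
    """
    rr = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(beta=0.0006, se=0.0001)

if __name__ == "__main__":
    print(f"RR per 10 µg/m³: {rr:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

An interval that excludes 1 corresponds to statistical significance at the 5% level. Where multiple comparisons call for a more stringent level, a larger value of `z` would widen the interval accordingly.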

2. Model uncertainty - time-series studies

Time-series analysis is complex, especially in the need to control for time-varying confounding factors. It is inevitable that different statistical approaches will lead to different results. Various studies in the past have looked at the sensitivity of results to different modelling strategies (Health Effects Institute, 1997; Samoli et al., 2001) and found that while the precise estimates vary between the statistical approaches used, the overall effects are still in favour of an adverse health effect.

This question was thrown into relief by the discovery of Dominici and colleagues that the results of the NMMAPS analyses were very sensitive to the criteria for convergence in the program (S Plus) that was used for the generalized additive modelling (GAM) approach, which was in vogue in the latter part of the 1990s (Dominici et al., 2002). In addition, other workers identified a problem with the underestimation of standard errors (Ramsey et al., 2003). Using the St. George's database, a comparison was made of GAM and non-GAM results in the published literature. The results are shown in Table 1. There was a tendency for GAM results to be higher than non-GAM results, though either method showed significant adverse effects.

Table 1. Summary estimates for studies of PM₁₀ and daily mortality by GAM or non-GAM statistical model and by single-city or multicity study design.

Following this discovery, many investigators re-analysed their data using GAM models with stricter convergence criteria. The APHEA group found little change in the original estimates for mortality (Katsouyanni et al., 2002) and hospital admissions (Atkinson, letter in preparation to AJRCCM).

This question has now been thoroughly investigated and reported recently by the Health Effects Institute (2003). The approach was to compare the original GAM based estimates with those from GAM models using stricter convergence criteria, and with those using Generalized Linear Modelling (GLM) with natural cubic splines. CAFE is referred to this report for a fuller description of the findings of this re-analysis; in summary:

for NMMAPS mortality, stricter convergence criteria and GLM methods resulted in lower estimates of effect (40-50% reduction), though these were still statistically significant;

for hospital admissions, there were smaller reductions (8-19%) in the NMMAPS results when the revised methods were used;

a variety of additional studies were re-analysed, including APHEA 2. These also tended to find smaller but still significant estimates, but less so than for NMMAPS. For some series, such as hospital admissions due to respiratory disease for the APHEA studies, the results were generally insensitive to stricter convergence criteria or to the use of GLM;

important uncertainties remain as to what is the best model to use for this type of analysis.

3. Model uncertainty - cohort studies

Model uncertainty has also been examined in cohort studies. A good example is the reanalysis of the ACS cohort study (Health Effects Institute, 2000). This involved a complete reanalysis using new statistical approaches, such as the incorporation of spatial correlation in the models. A wide range of sensitivity analyses was performed. The conclusion of this was that the original findings were robust in the sense that the estimates observed in the earlier analysis were substantiated in size and direction. However, the estimates did show sensitivity to the models used and interactions with various factors such as educational level. There was also uncertainty as to the relative importance of the main pollutants studied.

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.5

5.6 Possible regional characteristics modifying the effects of air pollution

The source document for this Digest states:

Explanation provided by the European Commission to this question:
The assessment of the risks builds on a concentration response relationship based on a number of studies from the United States and Europe. However, different parts of Europe have different mixes of air pollution due to differences in sources, climate and so forth. To what extent may uncertainties of the applicability of these relations influence the risk assessment due to particles and other priority pollutants?

Answer:

Potentially this could be a very influential issue since the characteristics of populations, environments and pollution (including particle concentration, size distribution and composition) vary throughout Europe. However, at this stage there is not sufficient evidence to advocate different guidelines for particles or other priority pollutants in different parts of Europe.

Several studies on short- and long-term effects of particulate matter have consistently reported an association between pollution levels and mortality; however, there are differences in the size of the estimated effects of PM according to geographical region or according to the levels of other variables (potential effect modifiers). For example, it has been reported that the short-term effects of PM₁₀ are greater where the long-term average NO₂ concentration is higher, when the proportion of the elderly is larger and in warmer climates. Modification by socioeconomic factors, such as the level of education, has also been reported. Plausible explanations for some of these observations have been proposed.

Effect modification, for example by the age distribution in a population and by climate should, if possible, be taken into account in sensitivity analysis of health impact assessments or risk assessments.
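A minimal sketch of such a sensitivity analysis is given below. It uses the two short-term effect estimates quoted in the Rationale for warmer and cooler cities (0.8% and 0.3% increase in mortality per 10 µg/m³ of PM₁₀) and asks how much the estimated burden changes between them. The baseline number of deaths, the size of the concentration change, the log-linear concentration-response form and the attributable-fraction formula (RR - 1)/RR are all assumptions made for illustration; they are not taken from the source document.

```python
import math


def attributable_deaths(baseline_deaths, pct_per_10, delta_conc):
    """Deaths attributable to a `delta_conc` µg/m³ difference in PM10.

    Assumes a log-linear concentration-response relationship and the
    attributable fraction AF = (RR - 1) / RR.
    """
    beta = math.log(1 + pct_per_10 / 100) / 10  # log-RR per µg/m³
    rr = math.exp(beta * delta_conc)
    return baseline_deaths * (rr - 1) / rr


# Effect estimates quoted in the Rationale (% increase per 10 µg/m³ PM10).
scenarios = {"cooler cities": 0.3, "warmer cities": 0.8}

# Hypothetical inputs for illustration only.
baseline_deaths = 10_000  # annual deaths in the study population
delta_conc = 20.0         # µg/m³ difference in PM10

results = {
    name: attributable_deaths(baseline_deaths, pct, delta_conc)
    for name, pct in scenarios.items()
}

for name, deaths in results.items():
    print(f"{name}: about {deaths:.0f} attributable deaths")
```

Under these assumptions the choice of effect estimate changes the result by a factor of more than two, which is why the answer recommends reporting the assessment under more than one plausible value of the effect modifier.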

Possible effect modifiers of other criteria pollutants have not been investigated to any extent so far.

Rationale:

In the context of several studies of the health effects of air pollution, the heterogeneity of effect estimates between cities or areas has been identified and investigated (Katsouyanni et al., 1997; 2001; Samet et al., 2000; Krewski et al., 2003; Levy et al., 2000). Thus, in the APHEA project it was first noted that the short-term effects of particles on mortality were lower in cities of central-eastern Europe (Katsouyanni et al., 1997). Similarly in the NMMAPS project the highest effects of particles were estimated for north-east United States (Samet et al., 2000). This issue was investigated further in the APHEA 2 project, where a number of variables (city characteristics) hypothesized to be potential effect modifiers, were recorded and tested in a hierarchical modelling approach (Katsouyanni et al., 2001). This led to the identification of several factors that can explain part of the observed heterogeneity. The following were the most important effect modifiers identified.

Mean temperature. In warmer cities larger estimates of the effects of particles on mortality are found (e.g. 0.8% versus 0.3% increase in mortality per 10 µg/m³ change in PM₁₀). We do not know the mechanism through which this is occurring. One possibility is that in warmer climates populations are more exposed to outdoor air pollution by spending more time outdoors (especially in the summer) or by keeping the indoor/outdoor air exchange rate higher. This is supported by the higher effects estimated during the warm season in several studies (Katsouyanni et al., 1997; Samet et al., 2000). It is also supported by the finding that lower indoor penetration of outdoor air (e.g. due to the higher prevalence of air conditioning) is associated with lower health effects (Janssen et al., 2002). Another possible explanation could focus on considerations of the particle mix in warmer compared to colder climates and especially on the proportion of primary and secondary particles or the influence of the hours of sunlight on photochemical reactions that produce larger concentrations of organic fine particles and increased oxidant capacity of the ambient pollutant mixture. In any case, this issue should be further investigated.

NO₂ long-term average concentration. In cities with higher NO₂ levels the estimated effects were higher (e.g. 0.8% versus 0.2% increase in mortality per 10 µg/m³ change in PM₁₀). This may reflect a real interaction between NO₂ and PM or it may indicate that high NO₂ levels imply larger proportions of particles originating from traffic. This latter explanation is supported by other findings, which suggest that traffic particles might be more toxic than those from other sources (Janssen et al., 2002; Laden et al., 2000). Results from the NMMAPS project are also compatible (Samet et al., 2000).

It is generally accepted that air pollution has larger effects on members of sensitive population subgroups. There is evidence that the effects are larger among the elderly (Viegi & Sandstrom, 2003; Gouveia & Fletcher, 2000). In the APHEA 2 analyses it was found that in cities with higher age-standardized mortality and those with a smaller proportion of elderly (>65 years) the estimated effects were lower (Katsouyanni et al., 2001). This finding is supported by the analysis of Levy et al. (2000).

In the re-analyses of the six-city and the American Cancer Society (ACS) cohort studies on long-term effects of air pollution on mortality, several socioeconomic variables have been tested as potential effect modifiers (Krewski et al., 2003). It was found that lower education was associated with higher relative risks of mortality among those exposed to higher ambient particle concentrations. The results of the Dutch cohort study are compatible with the findings of the ACS study (Hoek et al., 2002). In a short-term effect study, there was limited evidence of effect modification by social factors (Zanobetti et al., 2000; O'Neill et al., 2003).

Recently, the problems identified with the application of GAM models for the analyses of short-term effects of air pollution, and especially the underestimation of the standard errors of the effect estimates, led to the conclusion that heterogeneity has been overestimated in reported studies (Ramsay et al., 2003). However, the re-analyses indicate that the patterns of effect modification remain the same, although the contrast in the size of the estimates at various levels of the effect modifier is smaller (Health Effects Institute, 2003).
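The link between underestimated standard errors and overestimated heterogeneity can be seen in a small sketch using Cochran's Q and the I² statistic, two standard measures of between-study heterogeneity. The city estimates, the standard errors and the 30% underestimation factor below are hypothetical and chosen only to make the arithmetic easy to follow; the source does not state which heterogeneity statistics the cited studies used.

```python
def cochran_q(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q) using inverse-variance weights."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, q


def i_squared(q, n_studies):
    """Percentage of variation attributed to heterogeneity rather than chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (n_studies - 1)) / q) * 100


# Hypothetical city-specific estimates (% increase per 10 µg/m³ PM10).
estimates = [0.2, 0.4, 0.6, 0.8, 0.5]
true_se = [0.2] * 5
# Hypothetical scenario: the software reports standard errors 30% too small.
understated_se = [0.7 * se for se in true_se]

pooled_true, q_true = cochran_q(estimates, true_se)
pooled_under, q_under = cochran_q(estimates, understated_se)
i2_true = i_squared(q_true, len(estimates))
i2_under = i_squared(q_under, len(estimates))

print(f"Correct SEs:     Q = {q_true:.2f}, I² = {i2_true:.0f}%")
print(f"Understated SEs: Q = {q_under:.2f}, I² = {i2_under:.0f}%")
```

The city estimates and the pooled estimate are identical in both runs; only the assumed precision differs. Because Q divides each squared deviation by the variance, standard errors that are too small make the same spread of estimates look like stronger heterogeneity.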

Source & ©: WHO Regional Office for Europe Health Aspects of Air Pollution - answers to follow-up questions from CAFE (2004), Section 5.6
