A Statistical Approach for Comparative Assessment of the Effect of Smoke Exposure in In Vivo Experiments: A Case Study of an OECD 90-Day Inhalation Study Including 3R4F and 1R6F Reference Cigarettes
Published online: 30 Jan 2025
Pages: 26 - 33
Received: 05 Jul 2024
Accepted: 08 Nov 2024
DOI: https://doi.org/10.2478/cttr-2025-0003
© 2024 Athanasios Kondylis et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Reference cigarettes from the University of Kentucky are widely used for the toxicological assessment of the effects of cigarette smoke, with the 3R4F reference cigarette serving as the general standard for several decades. Recently, the 1R6F was developed and manufactured to replace the 3R4F. Here, we propose a novel method for comparative
The current study data enable a direct comparison of effects between the two cigarettes, including average differences and study-specific variability estimates, while historical 3R4F data provide estimates of the long-term variability of the reference cigarette. We leveraged the historical data to improve our interpretation of the differences observed between the two cigarettes in the current study. Including historical data is reasonable because, even under a statistically sound design (e.g., a block-randomized design), data variation cannot be fully attributed to pre-identified study design parameters. Variation inherent to smoke and aerosol generation emanates from multiple, often unknown sources, both during aerosol generation and during bio-analytical quantification. Cigarette smoke is a highly complex system whose variation results from many uncontrollable sources (2,3,4). In addition, despite all efforts to optimize and validate analytical methods and instrumentation, many approaches still provide measurements with non-negligible error. Although increasing the sample size would reduce the uncertainty of the estimates in traditional statistical comparisons, doing so conflicts with animal welfare concerns. The use of historical data provides a powerful and ethically acceptable alternative. No clear regulatory directives exist for setting equivalence limits in comparative tobacco product assessment; indeed, there are no widely accepted statistical methodologies for demonstrating equivalence of tobacco products (5). We therefore propose following the equivalence principle for statistical comparisons while expanding the equivalence range beyond the standard bioequivalence limits to incorporate the long-term variability of the reference product. The use of variable equivalence limits based on historical data has been discussed in the pharmaceutical industry (6,7,8).
Variable equivalence limits have also been proposed for comparisons of smoke constituent yields and
The term “comparative assessment” refers to testing for differences, which in statistical terms translates to testing the null hypothesis that products do not differ
Confidence intervals are used to assess differences and/or equivalence (23, 24) and provide more insight into the comparative assessment of two products. They convey the direction and magnitude of the observed differences, as well as the associated uncertainty, which is reflected in their width. The width of a confidence interval depends on the data variability, the sample size, and the confidence level 1−α; the confidence level also creates a direct link between confidence intervals and statistical tests. For difference testing, we use a 100(1−α)% confidence interval; for equivalence testing with TOST, we use a 100(1−2α)% confidence interval (e.g., a 90% interval for α = 0.05). Figure 1 illustrates several hypothetical confidence intervals resulting from a comparison of mean ratios between two products, and their link to statistical difference and equivalence testing. Row A indicates no difference between the two products with high certainty: the mean ratio is equal to one, and the confidence interval is narrow. The confidence interval in row H is equally narrow, but the effect of the test product is reduced by almost 50% relative to the reference product. Under a difference testing approach, the observed effect in case H is highly significant, while the one in case A is not. Rows C and G illustrate statistically significant changes: a reduction in row C and an increase in row G. Statistically significant increases are also depicted in rows D and E. For equivalence testing, case H shows no equivalence between the two products, whereas case A shows equivalence within the standard equivalence limits of [0.8, 1.25]. Equivalence is not concluded for any row other than A, because none of the other confidence intervals lies entirely within the equivalence zone of [0.8, 1.25]. These equivalence limits are commonly used in bioequivalence studies and are also recommended by the FDA. Note that they are symmetric on the logarithmic scale, because 1/0.8 = 1.25, that is, ln(0.8) = −ln(1.25).
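The decision logic behind Figure 1 can be sketched in a few lines. This is a minimal illustration that assumes the confidence limits are already available on the ratio scale; the class and function names and the three example intervals are hypothetical and are not read off Figure 1.

```python
import math
from dataclasses import dataclass


@dataclass
class RatioCI:
    """Hypothetical container: confidence limits for a test/reference mean ratio."""
    lower: float
    upper: float


def is_significant_difference(ci: RatioCI) -> bool:
    # Difference testing uses a 100(1 - alpha)% CI (e.g., 95%):
    # the difference is significant if the interval excludes a ratio of 1.
    return ci.upper < 1.0 or ci.lower > 1.0


def is_equivalent(ci: RatioCI, limits: tuple = (0.8, 1.25)) -> bool:
    # Equivalence testing with TOST uses a 100(1 - 2*alpha)% CI (e.g., 90%):
    # equivalence holds only if the whole interval lies within the limits.
    return limits[0] <= ci.lower and ci.upper <= limits[1]


# Illustrative intervals (hypothetical numbers, loosely in the spirit of Figure 1).
examples = {
    "narrow, centered on 1": RatioCI(0.95, 1.05),
    "narrow, ~50% reduction": RatioCI(0.48, 0.56),
    "wide, centered on 1": RatioCI(0.70, 1.40),
}
for label, ci in examples.items():
    print(f"{label:24s} significant={is_significant_difference(ci)} "
          f"equivalent={is_equivalent(ci)}")

# The standard limits are symmetric on the log scale: ln(0.8) = -ln(1.25).
print(math.isclose(math.log(0.8), -math.log(1.25)))
```

The wide interval centered on 1 is neither significantly different nor equivalent, which is why the absence of a significant difference should not be read as evidence of equivalence. In a real analysis, the two functions would receive intervals computed at different confidence levels.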
We propose to extend the equivalence limits beyond [0.8, 1.25] according to:

Lower and upper equivalence range limits (
  |   |   | cv |   | Lower limit | Upper limit |
---|---|---|---|---|---|---|
10 | 0.05 | 0.05 | 0.2 | 0.198 | 0.786 | 1.271 |
10 | 0.05 | 0.05 | 0.3 | 0.294 | 0.700 | 1.428 |
10 | 0.05 | 0.05 | 0.4 | 0.385 | 0.627 | 1.596 |
10 | 0.05 | 0.05 | 0.5 | 0.472 | 0.564 | 1.773 |
10 | 0.05 | 0.1 | 0.2 | 0.198 | 0.805 | 1.243 |
10 | 0.05 | 0.1 | 0.3 | 0.294 | 0.725 | 1.380 |
10 | 0.05 | 0.1 | 0.4 | 0.385 | 0.655 | 1.526 |
10 | 0.05 | 0.1 | 0.5 | 0.472 | 0.596 | 1.679 |
Lower and upper alternative equivalence range limits (
  | cv |   | Lower limit | Upper limit |
---|---|---|---|---|
1 | 0.2 | 0.198 | 0.820 | 1.219 |
1 | 0.3 | 0.294 | 0.746 | 1.341 |
1 | 0.4 | 0.385 | 0.680 | 1.470 |
1 | 0.5 | 0.472 | 0.624 | 1.604 |
0.76 | 0.2 | 0.198 | 0.860 | 1.162 |
0.76 | 0.3 | 0.294 | 0.800 | 1.250 |
0.76 | 0.4 | 0.385 | 0.746 | 1.340 |
0.76 | 0.5 | 0.472 | 0.698 | 1.432 |
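The formula for the proposed limits did not survive in this text, but the alternative limits tabulated above can be checked numerically. Every row is reproduced to three decimals by converting cv to the log-scale standard deviation σ = √ln(1 + cv²) (the column after cv) and taking exp(±kσ), with k taken from the first column (1 or 0.76). The sketch below is our reading of the tabulated numbers under a log-normal assumption, not the authors' stated formula, and the helper names are hypothetical. It also suggests why k = 0.76 appears: it is approximately ln(1.25)/σ(cv = 0.3), so a cv of 30% maps back onto the standard limits [0.8, 1.25]. The same σ column appears in the first table; how its limits follow from its first three columns is not recoverable here.

```python
import math


def log_scale_sd(cv):
    """SD of log-transformed data implied by a coefficient of variation,
    assuming log-normal data: sigma = sqrt(ln(1 + cv**2))."""
    return math.sqrt(math.log(1.0 + cv * cv))


def alternative_limits(k, cv):
    """Hypothetical helper: limits exp(-k*sigma) and exp(+k*sigma),
    which are symmetric on the log scale like [0.8, 1.25]."""
    sigma = log_scale_sd(cv)
    return math.exp(-k * sigma), math.exp(k * sigma)


# k that maps cv = 0.3 exactly onto the standard limits [0.8, 1.25]
k_standard = math.log(1.25) / log_scale_sd(0.3)
print(f"k for cv = 0.3 -> [0.8, 1.25]: {k_standard:.4f}")

for k in (1.0, 0.76):
    for cv in (0.2, 0.3, 0.4, 0.5):
        lower, upper = alternative_limits(k, cv)
        print(f"k={k:<5} cv={cv}  sigma={log_scale_sd(cv):.3f}  "
              f"limits=[{lower:.3f}, {upper:.3f}]")
```

Comparing the printed limits with the table rows is a quick consistency check on the extracted values.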
Data from a 90-day OECD rat inhalation study served as a use case for our statistical approach on the comparative assessment of the exposure effect to smoke from the standard reference cigarette 3R4F

The analysis presented in this work focuses on the comparison of two products: new (1R6F)
The cigarette smoke was administered by nose-only inhalation exposure to outbred male and female Sprague Dawley rats [Crl:CD(SD)] bred under specific pathogen-free conditions (10 rats per sex). Exposure lasted approximately 13 weeks (6 hours per day, 5 days per week). Week 1 consisted of a time-adaptation phase during which the exposure duration was progressively increased over 7 days. All exposures targeted a nicotine concentration of 23 μg/L in the test atmosphere.
A set of relevant endpoints was analyzed as part of the comparative assessment between 3R4F and 1R6F. These endpoints were grouped into four main categories:
test atmosphere characteristics in the exposure chambers; cigarette smoke uptake parameters, such as respiratory physiology, blood carboxyhemoglobin levels (a marker of CO uptake), and urinary nicotine metabolite levels; inflammatory biomarkers, such as inflammatory cells; and clinical chemistry markers.
Data from selected endpoints from three historical studies that used 3R4F (27,28,29) were also collected and analyzed as part of the comparative assessment between 3R4F and 1R6F.
A tabulated method description for the analyzed parameters is given in the
The results of the statistical comparisons from the proposed comparative assessment are shown in Figures 3 and 4. The confidence intervals depict the estimates and the uncertainty of the observed differences between the two reference cigarettes, while the blue boxes show the equivalence zone for each endpoint, derived from the variability of the 3R4F reference cigarette across the historical studies. Figure 3 provides strong evidence for equivalence between the 3R4F and 1R6F cigarettes for all test atmosphere endpoints except formaldehyde. For most of these endpoints, the observed differences are estimated with high precision, as reflected by the relatively narrow confidence intervals; the exceptions are the carbonyl compounds, especially formaldehyde, which was also significantly increased in 1R6F compared with 3R4F (9, 30). The blue boxes are fixed at the standard limits of [0.8, 1.25], reflecting that the historical data show no extra variability for the 3R4F reference cigarette. Figure 4 shows more diverse results across the biomarker groups. Variability is observed both in the current study data (as reflected by wide confidence intervals) and in the historical data (as reflected by wide equivalence ranges, shown as blue boxes). For all biomarkers, the average observed differences between 1R6F and 3R4F are within the equivalence ranges and, therefore, within the variability range of the reference product. This can be confirmed visually by noting that the geometric mean ratio estimates (black circles) fall within the equivalence limits (blue boxes). However, the lower and/or upper confidence limits are not always within the equivalence ranges; in these cases, equivalence, as formally defined, is not met. In our case study, equivalence is confirmed for all respiratory physiology and exposure endpoints.
Yet, equivalence is not always met for the nicotine metabolite, inflammatory cell, and clinical chemistry endpoints. There is no evidence of an increasing or decreasing trend in any endpoint category except the urinary nicotine metabolites, which show a consistent trend toward lower values (inferiority).
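The distinction drawn above, between a geometric mean ratio that falls inside the equivalence range and a confidence interval that lies entirely inside it, can be sketched as follows. This is a minimal illustration, not the authors' analysis model (which may include blocking or other design factors): it assumes two independent groups of 10 animals, log-normal data, and a pooled-variance t-interval with 18 degrees of freedom, for which the one-sided 95% t quantile is about 1.734. The floor at [0.8, 1.25] is an assumption mirroring the statement that the Figure 3 boxes stay at the standard limits. All function names, the example data, and the historical range are hypothetical.

```python
import math
import statistics

# Assumption: 10 animals per group and a pooled-variance t-interval, so df = 18.
# 1.734 is the one-sided 95% Student-t quantile for df = 18, which gives the
# two-sided 90% CI used in TOST at alpha = 0.05.
T_095_DF18 = 1.734


def gmr_with_ci90(test, ref, t_crit=None):
    """Geometric mean ratio (test/ref) and its 90% CI, computed on log-transformed data."""
    log_t = [math.log(x) for x in test]
    log_r = [math.log(x) for x in ref]
    n1, n2 = len(log_t), len(log_r)
    df = n1 + n2 - 2
    if t_crit is None:
        if df != 18:
            raise ValueError("pass t_crit explicitly for df != 18")
        t_crit = T_095_DF18
    diff = statistics.fmean(log_t) - statistics.fmean(log_r)
    pooled_var = ((n1 - 1) * statistics.variance(log_t)
                  + (n2 - 1) * statistics.variance(log_r)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return math.exp(diff), math.exp(diff - t_crit * se), math.exp(diff + t_crit * se)


def floored_limits(limits, standard=(0.8, 1.25)):
    # Assumption: an equivalence range is never narrower than the standard limits.
    return min(limits[0], standard[0]), max(limits[1], standard[1])


def classify(gmr, lower, upper, limits):
    lo_lim, hi_lim = limits
    if lo_lim <= lower and upper <= hi_lim:
        return "equivalent (whole CI within range)"
    if lo_lim <= gmr <= hi_lim:
        return "estimate within range, CI not"
    return "estimate outside range"


# Hypothetical data: 10 reference values and test values 10% higher.
ref = [1.00, 1.10, 0.95, 1.20, 0.90, 1.05, 0.98, 1.15, 0.92, 1.08]
test = [1.1 * x for x in ref]
gmr, lower, upper = gmr_with_ci90(test, ref)
limits = floored_limits((0.68, 1.47))  # hypothetical historical range
print(f"GMR={gmr:.3f}, 90% CI=[{lower:.3f}, {upper:.3f}] -> "
      f"{classify(gmr, lower, upper, limits)}")
```

The middle category, "estimate within range, CI not", corresponds to endpoints whose black circle sits inside the blue box while at least one confidence limit falls outside it.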


This manuscript describes a novel statistical comparison approach for tobacco product comparative assessment using
The OECD study results show that differences observed between the two reference cigarettes were modest and on average within the range of the variability of 3R4F. The analysis of the
Equivalence analyses with variable equivalence limits based on historical data may substantially improve the comparative assessment of smoke/aerosol exposure in