Clinical Trial Designs for the Testing of OTC Products for Dentine Hypersensitivity: A Systematic Review

Aim: The aim of the present study was to review the published literature and to compare the clinical trial methodology used to evaluate products for Dentine Hypersensitivity (DH) with the guidelines previously recommended by Holland, et al. (1997). Material and Methods: A systematic search of PubMed and Embase was conducted for double-blind, randomized, placebo-controlled clinical studies of ≥6 weeks' duration assessing the efficacy of OTC products for the treatment of DH in otherwise healthy adult subjects with reported and diagnosed DH. Results: A total of 35 studies were included in this review from an initial search of 882 titles. All the included studies complied with the guidelines in terms of study design, duration, subject selection, adequate control(s) and subject instructions. Ninety-one percent of studies used a sample size of ≥25 per arm. Most studies (91%) complied with the minimum required number of teeth to be tested; two studies did not. All the studies used an objective assessment; however, only eight studies (22.8%) included a subjective evaluation of the response to everyday stimuli when evaluating DH. Only two studies assessed the impact of DH treatment on the participants' Quality of Life (QoL). Most studies did not include the recommended run-in/wash-out period (included in only eight studies [22.8%]) or a follow-up period after the cessation of treatment (2 studies [5.7%]). All studies reported the reduction of DH as a percentage reduction from baseline values. None of the studies reported total relief of pain, irrespective of the intervention(s) evaluated, although mean reductions occurred in both test and control groups. Conclusions: All of the included studies reported a statistically significant reduction of pain in both the test and control groups, although none reported the complete absence of the pain response following any of the interventions at the end of the study.
Overall, most studies complied with the recommendations of the Holland, et al. guidelines; however, there is still a need to include both run-in/wash-out and follow-up periods in future studies. Furthermore, it may be recommended that future studies include a subjective evaluation of the treatment outcome on overall sensitivity during day-to-day activities, as well as its effect on QoL (a person-centered approach).


Introduction
Dentine Hypersensitivity (DH) has previously been defined as 'pain derived from exposed dentin in response to chemical, thermal, tactile or osmotic stimuli which cannot be explained as arising from any other dental defect or disease' [1,2]. It is evident from the published literature that DH is a common problem in the adult population, although its true impact on individuals suffering from the condition varies [3], particularly in relation to their quality of life [4,5]. The pain associated with DH is episodic (transient) in nature and generally eases once the stimulus has been removed. Reported prevalence rates vary depending on how the data were collected (e.g., questionnaires, surveys or clinical examination) and where the studies took place (e.g., general practice, university hospital or consumer-based settings), and range from 1% to 74% [6]. DH tends to be underestimated by clinicians due to the difficulty in diagnosis [7]. The condition generally involves the facial surfaces of teeth near the cervical aspect and is very common in premolars and canines, followed by the upper first molars, with the incisors being the least sensitive teeth [7-10]. A number of over-the-counter (OTC) and in-office (professionally applied) products have been developed to treat the condition, and these are evaluated in vitro prior to clinical evaluation to determine their safety and their effectiveness in treating the condition. It is important, however, to assess these products in vivo to determine both their safety and efficacy; clinical trial design is therefore an important aspect of evaluating these products, as it describes the manner in which patients will be studied in terms of selection, treatment and assessment [11]. There are four types of clinical trial design: 1) randomized (parallel arm), 2) pre-treatment period, 3) crossover, and 4) split-mouth.
When evaluating products for the treatment of DH, a double-blind, randomized, parallel-group design has been recommended, in which subjects are allocated to either a test or a control group and the difference in outcome between the groups determines any significant efficacy of the test product [12]. Several problems are associated with studies testing products for DH, such as placebo and non-placebo effects, the objective methodology used to evaluate DH, and the highly subjective nature of the pain response [13]. In order to reduce these effects, Holland, et al. [12] made a number of recommendations (guidelines) for conducting clinical trials to evaluate products for the treatment of DH, covering experimental design, sample size, subject selection, teeth and sites to be tested, test stimuli, controls, wash-in/wash-out period, subject instruction, duration, assessment, outcome, and follow-up. It was apparent, however, from reading the published literature that some studies do not necessarily conform to these guidelines when evaluating these products.

Aim
The aim of the present study, therefore, was to review the papers in the available literature and to compare the clinical trial methodology used to evaluate products for DH based on the guidelines previously recommended by Holland, et al. [12].
The objectives of the study were as follows: 1) to identify clinical trials that evaluated OTC products for DH; 2) to describe the methodological quality of these studies and their main features and characteristics; 3) to compare the methodology of the included studies with the recommendations of Holland, et al. [12] in terms of experimental design, sample size, subject selection, teeth to be included, test stimuli, controls, wash-in/wash-out period, subject instruction, duration, assessment, outcome, follow-up, number of studies, and products used in the OTC interventions; and 4) to present any pertinent implications for future trial design in DH studies.

Method of the review
This systematic review was conducted in accordance with the principles of the PRISMA statement [14]. The focus question was: do clinical trials evaluating the efficacy of products for the treatment of DH follow the guidelines recommended by Holland, et al. [12]?

Inclusion criteria
Types of studies
The review included full-text, English-language, double-blind randomized controlled clinical trials conducted in human subjects to test the efficacy of products for DH, with a duration of at least 6-8 weeks.

Types of participants
Healthy dentate adults (at least 18 years old) with a reported and established history of DH.

Types of interventions
Participants should be randomly allocated to one of the following: 1) Test: agent or product (the formula and concentration should be stated by the authors).
2) Control: the same as the test product but without the active agent.

Types of outcome measures
Included studies should assess the change in response to test procedures, including tactile, thermal and air-blast stimuli, or a subjective patient assessment of pain during everyday activities.

Exclusion criteria
Types of studies
Single case reports, in vitro, in situ or review articles were excluded.

Types of participants
Studies were excluded if the subjects were not described, if the subjects were taking any analgesic drugs for medical problems, if the subjects received any periodontal therapy during the trial period, or if the sensitivity was due to caries, bleaching, or endodontic treatment.

Types of interventions
Studies were excluded if the test product contained fluoride and the control did not.

Types of outcome measures
Studies were excluded if the assessment methodology was unknown or not described.

Other relevant criteria for inclusion
1) Investigator calibration for the assessment of DH prior to the commencement of the study; 2) a clear description of the randomization of participants into the different groups, including concealment of group allocation from both investigators and participants.

Search strategy
The search strategy used the electronic databases PubMed and Embase up to 21.12.2016. The search string used in PubMed was: ((((((((((dentifrice) OR dentifrices)) OR (((toothpaste) OR toothpastes) OR tooth paste)) OR ((desensitizing) AND products))) AND ((dentine) OR dentin)) AND ((hypersensitivity) OR sensitivity)))) NOT (((((laser) OR endodontic) OR bleach) OR whitening) OR caries). The reference lists of the included studies and relevant reviews were also searched manually. Only articles published in the English language were selected.
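To make the search easier to reproduce, the PubMed string above could be submitted programmatically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it checks that the nested parentheses in the search string balance and builds a query URL for the NCBI E-utilities ESearch endpoint. Mapping the 21.12.2016 cut-off onto a publication-date (`pdat`) upper bound is only an approximation of a search run on that date, and the lower date bound, `retmax` value and function names are arbitrary, hypothetical choices.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The PubMed search string reported in the review, reproduced verbatim.
QUERY = (
    "((((((((((dentifrice) OR dentifrices)) OR (((toothpaste) OR toothpastes) "
    "OR tooth paste)) OR ((desensitizing) AND products))) AND ((dentine) OR dentin)) "
    "AND ((hypersensitivity) OR sensitivity)))) NOT (((((laser) OR endodontic) "
    "OR bleach) OR whitening) OR caries)"
)


def parentheses_balanced(query: str) -> bool:
    """Return True if every '(' in the query is closed in the right order."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


def build_esearch_url(query: str, mindate: str = "1900/01/01",
                      maxdate: str = "2016/12/21", retmax: int = 1000) -> str:
    """Build a (hypothetical) ESearch URL limited by publication date."""
    if not parentheses_balanced(query):
        raise ValueError("unbalanced parentheses in search string")
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",  # assumption: publication date approximates the search cut-off
        "mindate": mindate,  # assumption: an effectively open lower bound
        "maxdate": maxdate,
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(QUERY)
# Fetching the URL (e.g. with urllib.request.urlopen) would return an XML list of PMIDs.
```

Checking the bracket balance first is a cheap safeguard, because a misplaced parenthesis in a nested Boolean string silently changes which terms the NOT clause applies to.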

Study Selection
Studies were selected in a two-stage screening process performed by two of three independent reviewers (AK/DC/DG). Disagreements about the inclusion or exclusion of a study were resolved by consensus. In the first stage, titles and abstracts were screened to eliminate irrelevant articles and those that did not fit the inclusion criteria established for this review. In the second stage, after reading the full text of each article, study eligibility was verified independently by two of the three reviewers, and data extraction and quality assessment were performed for the included studies based on randomization, allocation concealment, blinding, and description of dropouts. Any disagreement was resolved by discussion between AK and DG.

Risk of Bias of Included Studies
This was assessed according to the criteria of concealment of treatment allocation described in the Cochrane handbook for systematic reviews of interventions [15]. Allocation concealment for each study was rated as a) Adequately concealed, b) Concealment is unclear, c) Inadequately concealed. Blinding was also assessed as a) Double blinded, b) Single blinded, c) Blinding is mentioned but is suspicious or uncertain [16].

Statistical Analysis
No statistical analysis or meta-analysis was undertaken in the present study; descriptive summaries of the studies are presented in the Results section.

Results
Overall Description of the Included and Excluded Studies
The flow chart of study selection and inclusion is shown in Figure 1. The initial search yielded 878 articles, and an additional four articles were identified by hand searching. Following first-stage screening of titles and abstracts, 121 articles qualified for full-text screening. Following full-text reading, 35 articles met the defined criteria and 86 articles were excluded for the following reasons: 1) in-office procedures (27); 2) insufficient study duration (20); 3) inadequate controls, randomization or blinding (18); 4) review papers (6); 5) in vitro studies (8); 6) in situ studies (2); 7) editorial (1); 8) publications not in the English language (3: 1 in vitro and 2 in-office); and 9) adjunctive use of OTC products (1). With regard to the type of product under investigation, the studies can be divided into four categories: OTC products, OTC products combined with another OTC product, in-office products, and in-office products combined with OTC products. For the purpose of this review, only studies testing OTC products without any adjunctive product were considered.
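The selection counts above can be checked arithmetically. A minimal sketch using only the numbers reported in the text; the title/abstract exclusion figure is derived, not reported, and assumes that no duplicate records were removed between identification and screening (the text gives 882 titles, i.e. 878 + 4). The names are illustrative.

```python
# Counts reported in the text for each stage of study selection.
DATABASE_RECORDS = 878
HAND_SEARCH_RECORDS = 4
FULL_TEXT_ASSESSED = 121

# Reasons for exclusion at the full-text stage, as listed in the text.
FULL_TEXT_EXCLUSIONS = {
    "in-office procedures": 27,
    "insufficient study duration": 20,
    "inadequate controls, randomization or blinding": 18,
    "review papers": 6,
    "in vitro studies": 8,
    "in situ studies": 2,
    "editorial": 1,
    "not in English (1 in vitro, 2 in-office)": 3,
    "adjunctive use of OTC products": 1,
}


def prisma_flow() -> dict:
    """Tally each stage of the selection flow from the reported counts."""
    identified = DATABASE_RECORDS + HAND_SEARCH_RECORDS
    excluded_full_text = sum(FULL_TEXT_EXCLUSIONS.values())
    return {
        "identified": identified,
        # Derived figure; assumes no duplicates were removed before screening.
        "excluded_title_abstract": identified - FULL_TEXT_ASSESSED,
        "excluded_full_text": excluded_full_text,
        "included": FULL_TEXT_ASSESSED - excluded_full_text,
    }


flow = prisma_flow()
```

The nine exclusion categories sum to 86, which leaves the 35 included studies reported above.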

Excluded OTC studies
The 86 excluded OTC studies were compared with the Holland, et al. [12] guidelines in terms of experimental design, study duration, sample size, subject selection, teeth to be included, test stimuli, controls, Run-in/wash-out period, subject instruction, outcomes, and follow up. The results are shown in Figure 2.

Included studies
The included studies were compared with the Holland, et al. [12] guidelines in terms of experimental design, study duration, sample size, subject selection, teeth to be included, test stimuli, controls, run-in/wash-out period, subject instruction, outcomes, and follow-up. The main features and characteristics of the included studies compared with the Holland, et al. [12] guidelines are summarized in Table 1.

Study design
All the included studies followed the study design recommended in the guidelines of Holland, et al. [12] (Figure 3). According to Schulz [16], random allocation to intervention groups appears to be the only method of ensuring that the groups being compared are equivalent at the outset of a clinical study, thereby minimizing confounding and the introduction of bias. The success of randomization depends on: a) generating a sequence by which study subjects have an equal probability of being allocated to each of the intervention groups; and
b) allocation concealment, such as blinding or masking, which reduces bias and confounding by shielding knowledge of the intervention received by each group.
All the studies randomized their participants to the various test and control groups, although in some studies it was unclear how randomization and allocation were carried out. Most of the included studies were either sponsored by an oral health care company or followed a protocol based on its recommendations; as such, the randomization and allocation of interventions to the treatment groups would be based on a randomized code to blind either the study staff or the subjects (single-blind) or both staff and subjects (double-blind). For example, randomization into the treatment groups was stratified according to participants' age, gender, and baseline mean thermal (air blast) and tactile (Yeaple probe) sensitivity scores, or a modification of these variables [17], while one study [46] used a computer algorithm to limit the impact of age, gender, diet, and current level of oral hygiene on the study. The methodology used to maintain blinding during the study was not always clearly described, but it would be normal practice to overwrap the test and control dentifrices or to use similar tubes or mouthrinse bottles labeled with the assigned code (allocation concealment or masking) [18,33,37,38,43,50,51].
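Stratified randomization with permuted blocks is one common way of generating such a coded allocation list. The sketch below is illustrative only and is not the method of any included study: it stratifies by gender and a baseline Schiff score band, a simplification of the age, gender and thermal/tactile stratification described above. The cut-off of 2, the block size of 4, the arm codes 'A'/'B', the seed and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_block_randomization(participants, arms=("A", "B"), block_size=4, seed=2016):
    """Return {participant_id: arm_code} using permuted blocks within strata.

    Hypothetical strata: gender x baseline Schiff band (score >= 2 is 'high').
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the allocation list is reproducible

    # Group participant ids by stratum.
    strata = defaultdict(list)
    for p in participants:
        band = "high" if p["baseline_schiff"] >= 2 else "low"
        strata[(p["gender"], band)].append(p["id"])

    allocation = {}
    for key in sorted(strata):  # deterministic order of strata
        ids = strata[key]
        for start in range(0, len(ids), block_size):
            # Each block holds every arm code equally often, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            for pid, arm in zip(ids[start:start + block_size], block):
                allocation[pid] = arm
    return allocation


# Invented example: 40 participants with varied gender and baseline scores.
people = [
    {"id": i, "gender": "F" if i % 2 else "M", "baseline_schiff": i % 4}
    for i in range(40)
]
codes = stratified_block_randomization(people)
```

Within each stratum every complete block contains equal numbers of each code, so arm sizes can differ by at most half a block. The key linking 'A' and 'B' to the actual products would be held by a third party, so that neither examiners nor participants know which product each code denotes.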

Duration
All the included studies met the guidelines in relation to study duration, which ranged from six to 12 weeks (with assessments at baseline, mid-point and end-point).

Sample Size and Statistical Power
A total of 33466 participants were enrolled in the included studies, including both male and female participants aged 18-70 years. Most of the included studies (91%; n=32) included at least 25 subjects per arm; three studies [17,32,46] included fewer than 25 per arm. Only 12 studies provided information on a power calculation (formally or informally), for example to detect statistically significant differences between treatment groups using a two-tailed alpha of 0.05 and a power of 80% [30-33,36,39,44,45,47,49-51]. Some studies also included a 5% or 10% allowance for dropouts so that the required number of participants would complete the study. The remaining studies did not appear to report any details of a sample size calculation.
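Power calculations of this kind commonly take the form of a two-sample comparison of means. As a sketch only, assuming a normal approximation and hypothetical values for the standard deviation and the smallest clinically relevant difference (neither is taken from the included studies), the per-arm sample size and a dropout allowance could be computed as follows:

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2,
    rounded up to the next whole subject.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Enrollment target so that about n subjects per arm complete the study."""
    # round() guards against floating-point noise such as 30.000000000000004
    return math.ceil(round(n / (1 - dropout_rate), 9))


# Hypothetical inputs: SD of 1.0 and a smallest relevant difference of 0.8
# (e.g. in Schiff scale units).
n = n_per_arm(delta=0.8, sd=1.0)
target = inflate_for_dropout(n, 0.10)
```

With these hypothetical inputs the sketch gives 25 per arm, rising to 28 with a 10% dropout allowance (27 with 5%). Because n scales with (sd / delta) squared, halving the detectable difference roughly quadruples the required sample size.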

Consideration of Withdrawals and Dropouts
Withdrawals and dropouts that occur after randomization may affect the balance of the groups established by the randomization procedure. One way of addressing this problem is to analyze all randomized participants in the groups to which they were originally allocated, including those who withdrew or dropped out; this is called intention-to-treat (ITT) analysis [52]. The number of dropouts was reported in most of the included studies, although only a few studies included an ITT analysis [18,31,40,44,47,49,51].
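The difference between an ITT and a per-protocol (completers-only) analysis can be shown with a toy example. How missing end-point scores are handled is a separate choice: the sketch below assumes baseline observation carried forward (BOCF), i.e. a dropout is counted as showing no change. This is only one of several imputation strategies and is not attributed to any included study; the data and function name are invented for illustration.

```python
from statistics import mean


def mean_reduction(records, arm, analysis="itt"):
    """Mean (baseline - final) pain score reduction for one arm.

    'itt' keeps every randomized participant (dropouts imputed by BOCF);
    'per_protocol' keeps completers only.
    """
    if analysis not in ("itt", "per_protocol"):
        raise ValueError("analysis must be 'itt' or 'per_protocol'")
    reductions = []
    for r in records:
        if r["arm"] != arm:
            continue
        final = r["final"]
        if final is None:  # participant dropped out before the end-point
            if analysis == "per_protocol":
                continue  # completers-only analysis ignores dropouts
            final = r["baseline"]  # BOCF: count the dropout as 'no change'
        reductions.append(r["baseline"] - final)
    return mean(reductions)


# Invented example: three participants randomized to the test arm, one dropout.
records = [
    {"arm": "test", "baseline": 3, "final": 1},
    {"arm": "test", "baseline": 3, "final": 2},
    {"arm": "test", "baseline": 3, "final": None},
    {"arm": "control", "baseline": 3, "final": 2},
]
```

Here the per-protocol mean reduction in the test arm is 1.5 points, whereas the ITT estimate is 1 point: excluding the dropout makes the product look more effective than the randomized group as a whole experienced.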

Subject Selection
All included studies broadly complied with the Holland, et al. [12] guidelines regarding the inclusion/exclusion criteria of the study populations, by including participants with a known history of DH confirmed on assessment using at least two stimuli. Participants with existing medical conditions, allergies to the product ingredients, a dental status that would preclude testing (e.g., periodontal condition, crowns, abutment teeth, carious teeth), or pain medication that would conflict with the pain assessment were excluded, as were women who were pregnant or breastfeeding. Participants who were unwilling or unable to provide written consent or to complete the proposed study were also excluded.

Run In /Wash-Out Period and Follow Up Assessments
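Only eight (22.8%) of the studies included a pre-trial run-in/wash-out period, ranging from seven days to six weeks [17,18,27,31,48-51]. Two studies reported on a follow-up period after the cessation of the intervention [31,46]; for example, Naoum, et al. [46] reported on a four-week follow-up period following the cessation of active treatment.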

Subject Instruction
All the included studies complied with the guidelines in this respect (Table 1). Generally, all the studies reported that participants provided their consent after reading the relevant paperwork, and the study instructions were verbally reinforced during the study in relation to the frequency of daily brushing and the exclusion of other oral hygiene aids and dental products. Most studies specified twice-daily brushing with the assigned dentifrice; some did not specify the brushing time (presumably one minute), whereas in others participants were asked to brush twice daily for two minutes. In the five mouthwash studies, participants were asked to brush their teeth for one minute with or without a toothpaste before rinsing with their allocated mouthwash for one minute, twice daily [18,24,38,40,48].

DH Stimulus and Assessment
All included studies used at least two test stimuli, with the exception of Ghassemi, et al. [31], who used only an air-blast (thermal) stimulus, assessed with a VAS and the Schiff scale. Most studies used the Yeaple probe as the main assessment of tactile stimulation [17,18,20-26,29,32,34-36,38-42,44,45,47,49-51].

Data Analysis
Data analysis was performed in all the included studies and included summary statistics from baseline to the end-point of the study, together with a variety of statistical tests (significance level p≤0.05) applied to the various assessment outcomes to determine any differences between the test and control groups. The main statistical tests used in the included studies were: 1) analysis of variance (ANOVA) [17-23,25- … a logit-transformed analysis, based on the net number of sensitive teeth becoming non-sensitive together with the change in the proportion of sensitive teeth over the duration of the study [19].

Outcomes
All the authors reported the outcomes of the included studies in terms of a reduction in the pain response, measured with the various assessment methodologies, compared with baseline values, as well as the between-treatment difference(s) relative to the control groups. Although there were statistically significant reductions from baseline to the end of the study in both the test and control groups (p≤0.05), not all studies showed a statistically significant difference between the test and control groups. For example, some studies reported statistically significant differences between the test and control groups [17,20-23,25,26,29,31,33,35,36,38-47,50,51], whereas in others there appeared to be no significant differences despite a positive trend [18,19,24,27,28,30,32,34,37]. Only eight studies included the effects of treatment on everyday stimuli [19,24,34,39,41,42,50,51], and only two studies reported the effects of treatment on quality of life using the Dentine Hypersensitivity Experience Questionnaire (DHEQ) [50,51], although Parkinson, et al. [44] included the DHEQ in the methodology but did not appear to report the findings. None of the included studies reported the complete absence of the pain response following any of the interventions at the end of the studies.

Discussion
One of the problems in evaluating DH in the clinical environment is the highly subjective nature of the condition and its effect on QoL, which may also complicate the evaluation of participants' responses to the assessment methodology, whether when diagnosing DH in dental practice or during a clinical study evaluating the efficacy of a desensitizing product [3,53]. Pain is the major outcome arising from DH, and the degree of discomfort expressed by a participant may depend on the individual's pain perception and pain tolerance, as well as on emotional and physical factors. According to Curro and Gillam [13], several problems, such as the Hawthorne and placebo effects, make it difficult to assess the level of pain objectively during a clinical study.
Both these effects can have an impact on the results of a study. For example, simply being in a study, with continual reinforcement of the recommendations, will improve a participant's oral hygiene and thereby introduce a degree of bias into the study (Hawthorne effect). The placebo effect, whereby participants report an improvement even though they are in a placebo group with no active ingredient in the dentifrice, may also produce an improvement during the study (varying from 20% to 60% in DH clinical studies) [54]. The entry criteria for DH studies should, however, be reasonable and realistic; otherwise the investigator will struggle to recruit adequate numbers within the time frame allocated for completion of the study. Care should also be taken at screening not to recruit subjects who report either minimal or extreme discomfort, since the pain response of subjects with minimal discomfort can only stay the same or worsen, whereas that of subjects with extreme discomfort can only stay the same or improve. This phenomenon is called regression towards the mean (or mode) and can magnify a product's treatment effect in a severely affected population or reduce the effect in an under-affected population [11,13]. The importance of conducting well-designed RCTs has been emphasized by several clinical investigators [12,13,54], and it was evident that, prior to the publication of the Holland, et al. guidelines in 1997 [12], the standard of conducting and reporting DH studies was inconsistent; since 1997 there has been a marked improvement in the conduct and reporting of these studies. The current review examined the available published literature to compare the clinical trial methodology used to evaluate products for DH with the guidelines recommended by Holland, et al. [12].
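Regression towards the mean can be illustrated with a short simulation in which no treatment is given at all. In this sketch every parameter is hypothetical: each subject has a stable 'true' pain level, and the baseline and follow-up scores are that level plus independent measurement noise, on an unbounded latent scale rather than a real VAS. Recruiting only subjects with extreme baseline scores still produces an apparent change at follow-up.

```python
import random
from statistics import mean


def regression_to_mean_demo(n=10_000, threshold=7.0, above=True, seed=1):
    """Return (mean baseline, mean follow-up) for subjects selected at baseline.

    No intervention is simulated: any change is purely a selection artefact.
    """
    rng = random.Random(seed)
    true_pain = [rng.gauss(5.0, 1.5) for _ in range(n)]   # stable individual level
    baseline = [t + rng.gauss(0.0, 1.5) for t in true_pain]   # noisy measurement 1
    follow_up = [t + rng.gauss(0.0, 1.5) for t in true_pain]  # noisy measurement 2
    if above:  # recruit only 'severely affected' subjects
        selected = [i for i in range(n) if baseline[i] >= threshold]
    else:      # recruit only 'minimally affected' subjects
        selected = [i for i in range(n) if baseline[i] <= threshold]
    if not selected:
        raise ValueError("no subjects met the entry threshold")
    return mean(baseline[i] for i in selected), mean(follow_up[i] for i in selected)


high_baseline, high_follow_up = regression_to_mean_demo()
low_baseline, low_follow_up = regression_to_mean_demo(threshold=3.0, above=False)
```

Selecting high scorers yields an apparent improvement at follow-up, and selecting low scorers an apparent worsening, even though nothing changed between the two measurements. A concurrent control group recruited with the same entry criteria experiences the same artefact, which is one reason the between-group comparison matters more than the change from baseline alone.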
The purpose of the review was not to evaluate the efficacy of these products per se, but to determine whether these studies complied with the Holland, et al. [12] guidelines and whether any recommendations could be made in light of the outcomes of the review. A total of 35 studies were included in this review from an initial search of 882 titles. All included studies complied with the guidelines in terms of study design, duration, subject selection, adequate controls, and subject instruction(s). The studies excluded from the review failed to satisfy the Holland, et al. [12] guidelines (Figure 2), in contrast to the studies included in the present review (Figure 3). A variety of test interventions and control dentifrice and mouthwash products were used, but generally the control was either a placebo/negative control, namely a fluoride dentifrice (sodium fluoride [NaF] [19,21,23,25,26,29,35,45,49], monofluorophosphate [MFP] 1000 ppm fluoride [17,27,34,36,39,41,42,44,46,47,49], or amine fluoride [37]), or a positive control dentifrice (a recognized desensitizing dentifrice) [21-23,30,33,37,49]. Other studies included an arm that was either minus the active ingredient [30,32-34,43,50,51] or evaluated changes in 1) the abrasive system [20] or 2) the inclusion of an anti-plaque ingredient [27,28]. Most studies were company-sponsored, as the cost of running an independent DH RCT makes such studies very difficult to conduct. It should also be noted that some of the dentifrices reported in the present review are no longer commercially available in the formulations described in the studies (e.g., [19-23,32]).
The methodology used for the assessment of DH in the included studies has been described in several review papers [12,53-55]. Tactile stimulation was assessed with the Yeaple probe [17,18,20-26,29,32,34-36,38-42,44,45,47,49-51], the Jay probe [39,41,42], the Scratchometer [19], or a dental explorer or equivalent (e.g., periodontal probe) [27,28,30,31,33,37,43,48]. Only one study used an electrical stimulus (Sensitometer) [19]. The tactile stimulus is generally applied before the cold air blast from a dental air syringe or other stimuli such as cold water and hypertonic solutions. The rationale for this sequence is to apply the least damaging stimulus first, as the evaporative/thermal stimulus may have a more lasting effect on the pain response than the tactile stimulus [53,55]. There does not appear to be an accepted time interval between stimuli, although five to ten minutes between each stimulus has been suggested [53]. Prior to any evaluation, both the Yeaple and Jay probes should be calibrated daily by the study examiners; these probes apply a set force (g) at each step, starting at a lower level and ascending to 50-100 g depending on the probe type. Following application to the test tooth, the value is recorded and the participant is asked to complete a VAS score. The response to the air blast from a dental air syringe is also assessed with a VAS score, although more recently it has been assessed with the Schiff cold air sensitivity scale (0-3). It could be argued that these stimuli (tactile and thermal) are not representative of daily living and that more realistic assessments should be included in the evaluation. The DHEQ and other QoL tools may therefore have a more important role in future studies. Details of the calibration and training of examiners were, however, not routinely reported in the included studies, and this may be an area for improvement in future studies.
When compared with the Holland, et al. [12] guidelines, these studies generally complied, although there were minor variations in 1) the number of test stimuli used (97% compliant), 2) the minimum number of teeth included for assessment (91% compliant), and 3) sample size (91% compliant). The two areas in which compliance with the guidelines was poor were 1) run-in/wash-out periods (22.8% compliant) and 2) a follow-up period after the cessation of the active intervention to determine the duration of a product's efficacy (5.7% compliant) (Table 1). All the included studies reported the effect of the intervention as a percentage reduction from baseline for each of the clinical parameters (tactile, thermal, overall sensitivity). However, while there was clearly an improvement in the various outcomes from baseline in both the test and control groups, none of the included studies reported total relief of pain from DH. According to Holland, et al. [12], the main objective of a study evaluating the efficacy of a desensitizing dentifrice or mouthwash should be to produce a clinically significant reduction in symptoms rather than a small but statistically significant difference between the intervention and its control. One of the problems, however, in evaluating the efficacy of the various interventions in the present review was that both the test and control groups in each of the included studies reported a statistically significant reduction of pain from baseline, which would suggest that both placebo and non-placebo (Hawthorne) effects confounded the results reported in these studies.
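Percentage change from baseline is simple to compute, but its sign depends on the direction of the scale. A minimal sketch, assuming group mean scores as inputs (the text does not state whether the included studies computed it from group means or individual scores): for scores where a higher value means more pain, such as the Schiff scale or a VAS, improvement is (baseline - end) / baseline; for a tactile threshold in grams, where a higher value means less sensitivity, the sign is reversed. The function name and the numbers in the usage are hypothetical.

```python
def percent_improvement(baseline, end, higher_is_worse=True):
    """Percentage improvement from baseline for a group mean score.

    higher_is_worse=True  : pain scores (e.g. Schiff 0-3, VAS), improvement = fall.
    higher_is_worse=False : tactile thresholds in grams, improvement = rise.
    """
    if baseline == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    if higher_is_worse:
        return 100.0 * (baseline - end) / baseline
    return 100.0 * (end - baseline) / baseline


# Hypothetical group means.
schiff = percent_improvement(2.4, 1.5)                      # Schiff score falls 2.4 -> 1.5
tactile = percent_improvement(15, 30, higher_is_worse=False)  # threshold rises 15 g -> 30 g
```

These give about 37.5% and 100% improvement respectively. Note that a substantial percentage reduction in a group mean does not imply that any participant became pain-free, which is consistent with none of the included studies reporting total relief of pain.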
One of the major findings of the present review was a lack of compliance with two of the recommendations of Holland, et al. [12], namely 1) a run-in/wash-out period and 2) an evaluation of the effects of DH on the participants' QoL. Only eight (22.8%) of the included studies mentioned a pre-trial run-in/wash-out period, ranging from seven days to six weeks [17,18,27,31,48-51]. A further two studies reported on a follow-up period after the cessation of the intervention [31,46]: Ghassemi, et al. [31] switched participants in the Enamel Care group to the control dentifrice after eight weeks for a further eight weeks to determine the persistence of pain reduction, and Naoum, et al. [46] reported on a four-week follow-up period following the cessation of active treatment. The advantage of including a run-in/wash-out period before the start of a study is that it minimizes any potential therapeutic benefit carried over from the participants' previous desensitizing toothpaste, since participants follow a standardized brushing regimen with a fluoride dentifrice and standardized toothbrushes. As observed in the present review, the run-in/wash-out periods used ranged from seven days to six weeks; however, there does not appear to be any information on the ideal duration of a run-in/wash-out period [12]. The inclusion of a period following the cessation of the intervention [31,46] may also be important, as it determines the duration of a desensitizing dentifrice's efficacy. This observation, if standardized across several studies, may provide a basis for recommending the minimum duration of a run-in/wash-out period to nullify any carry-over effects from the dentifrice previously used by the participants. According to Gillam [11], the disadvantage of a longer run-in/wash-out period is the increased duration and cost of running DH studies. Holland, et al. [12] also recommended that studies should assess the impact of the various interventions on the participants' daily activities (subjective response) as well as on their QoL. Only eight studies appeared to include the effects of the interventions on everyday stimuli [19,24,34,39,41,42,50,51], and only two studies reported the effects of the intervention on QoL (DHEQ) [50,51], although Parkinson, et al. [44] included the DHEQ in the methodology but did not appear to report the findings.
According to Bekes and Hirsch [4], QoL research has gained increasing recognition in both medicine and dentistry; although this approach was previously regarded as a secondary outcome complementing biological and clinical markers of disease, QoL assessment should be included in any future recommendations for conducting DH studies.

Conclusions
All of the included studies reported a statistically significant reduction of pain in both the test and control groups, although none of the studies reported the complete absence of the pain response following any of the interventions at the end of the studies. Overall, most studies complied with the recommendations of the Holland, et al. guidelines [12]; however, there is still a need to include both run-in/wash-out and follow-up periods in future studies. Furthermore, it may be recommended that future studies include a subjective evaluation of the treatment outcome on overall sensitivity during day-to-day activities, as well as its effect on QoL (a person-centered approach).