Open Access

Ultrasonography in the diagnosis of pediatric distal forearm fracture: a systematic review

Nov 08, 2024

Introduction

Distal forearm fractures are among the most common injuries in adults and children, typically resulting from a fall onto an outstretched hand(1). Several fracture types are characteristically seen in children: torus (buckle), greenstick, complete, and epiphyseal plate fractures(2). X-ray studies are the most common diagnostic modality for suspected fractures(3). Ultrasound (US) has recently been used for the detection of fractures, with reports suggesting that it may be more sensitive than X-ray because bone acts as a natural obstacle against sound transmission at high frequencies. Furthermore, US can analyze a region in multiple planes rather than the limited views offered by traditional radiography(4,5).

Several important sonographic findings may be associated with bone fractures, both in the emergency setting and at follow-up. These include focal disruption of the hyperechoic cortical bone, hematoma with or without discontinuity of the periosteum (subperiosteal space), edema of the soft tissues surrounding the fracture, and mechanical disruption or dissociation of the growth plate. US also allows assessment of fracture healing and of the different stages of bone callus formation(6). In addition, US can show abnormalities of the surrounding tissues(7) as well as bursitis and articular effusion in cases with intra-articular fractures(8).

Currently, experience in bedside ultrasound is growing amongst emergency physicians(9,10), with a relatively easy learning curve(11,12). The role of ultrasound as a gold standard screening tool is currently being investigated(13,14). An important feature in this debate is the actual diagnostic accuracy of ultrasound for detecting forearm fractures.

Methods

This systematic review was conducted in accordance with the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, while the reporting followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines(15).

Research question

In children with suspected distal forearm fractures presenting to the emergency department (ED), is bedside ultrasound as accurate as plain radiography in confirming the diagnosis?

Research aim and objectives

This systematic review aimed to determine the diagnostic accuracy of bedside US for identifying distal forearm fractures in pediatric patients, with the following objectives: (a) to assess the diagnostic accuracy of bedside US for diagnosing distal forearm fractures, and (b) to investigate potential sources of heterogeneity, such as differences across types of fractures.

Inclusion criteria for studies
Types of studies

This systematic review included observational (cohort or case-control) studies and clinical trials. Only studies published in English from the start of 1997 until the end of 2017 were included.

Participants

Children under 16 years old.

Index test

Bedside US.

Target conditions

Patients with suspected distal forearm fractures without apparent deformity, regardless of patients’ sex or fracture type.

Reference standards

Plain roentgenograms.

Exclusion criteria

Studies on the adult population, children with a non-traumatic cause of fracture, open fractures, evidence of neurovascular compromise, and angulated fractures. In addition, narrative reviews, editorials, comments, and conference abstracts were excluded.

Search strategy
Electronic searches

A literature search was performed to identify articles evaluating the accuracy of US in the detection of traumatic fractures of the distal forearm in children, without a language restriction and with no filters/limits. The following databases were searched in the main Health Service Executive (HSE) library with the assistance of a librarian: MEDLINE, CINAHL, and EMBASE, published from inception up to May 2017. Other searched sites included the Cochrane Library, Google Scholar, and Best Bets.

Other resources

The reference lists of relevant narrative reviews and retrieved studies from electronic search were screened to find other potentially relevant studies.

Selection of studies

Duplicate articles were removed from the search results. The abstracts of relevant articles were reviewed, and the studies matching the eligibility criteria were selected. The full-text articles of these studies were then retrieved and reviewed for eligibility for inclusion in this systematic review. The search and study selection were performed by the first author and checked by the second author.

Data extraction

A standardized data sheet was used to extract relevant data from the selected studies. The extracted data included: (a) the studies’ country, design, duration, and the number of patients; (b) the characteristics of patients (age, sex, and type of fracture); (c) the index test used; (d) the reference standard; and (e) the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) cases. The first author extracted the data, and the second author checked the data to ensure consistency and clarity. No blinding was used for the journal titles, authors, or institutions.

Assessment of methodological quality

Two tools are commonly used in the scientific literature for appraising the quality of studies validating diagnostic tests: the Standards for Reporting Studies of Diagnostic Accuracy (STARD)(16), covering 25 criteria, and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)(17), involving 14 criteria. Both tools share many of the same criteria, directly or indirectly. Consequently, we used a combination of the QUADAS and STARD criteria (Tab. 1) to evaluate all selected papers. Only three were selected from the STARD criteria as being absent from the QUADAS.

Tab. 1.

Characteristics of the included studies (n = 7)

| Study | Study design & settings | No. participants/No. fractures | Inclusion criteria | Exclusion criteria |
|---|---|---|---|---|
| Patel et al.(24) | Prospective cohort; single center in the USA; March 2006 through January 2007 | 33/34 | Age: 2 through 17; suspected radius, ulna, tibia, or fibula fractures | Open fractures; neurovascular compromise; hemodynamic instability; fractures involving joints |
| Ackermann et al.(19) | Prospective diagnostic test study; single center in Germany; January 2007 to May 2008 | 93/77 | Age: 0–12 years; suspected closed forearm fracture | Open wounds or deformity >30; neural/vascular lesions requiring immediate operation |
| Chaar-Alvarez et al.(21) | Prospective diagnostic test study; single center in the USA; October 2007 to March 2009 | 101/46 | Age: 1–17 years; nonangulated distal forearm injuries; normal neurovascular examination distal to the injury site | Clinical forearm deformity, open forearm wound; multisystem trauma; altered mental status, developmental delay; hemodynamic instability; previous radiography; allergy to US gel; extremity pain/swelling proximal or distal to the injured forearm |
| Barata et al.(20) | Prospective diagnostic test study; single center in the USA; March 2008 to January 2009 | 53/43 | Age <18 years; suspected long-bone fracture | History of fracture; extremity deformity or open fracture; orthopedic hardware in the traumatized area |
| Eckert et al.(22) | Prospective diagnostic test study; single center in Germany; September 2009 to August 2010 | 76/52 | Suspected distal forearm fracture | Open injuries; significant deformity; neural and/or vascular lesions |
| Herren et al.(23) | Prospective diagnostic test study; 2 centers in Germany; January to December 2012 | 201/104 | Age: up to 11 years; pain in the forearm area following trauma | Open wounds in the distal forearm; peripheral disorders of circulation; axis deviations requiring immediate reduction; pre-existing forearm deformities |
| Rowlands et al.(25) | Prospective diagnostic study; single center in Australia; November 2011 to May 2012 | 419/234 | Age: 0–16 years; history of forearm trauma; suspected fracture | Open fracture; imaging performed before arrival |

Under the QUADAS scoring system, a score of 10–14 indicates a high-quality study, while a score of 9 or below indicates a low-quality study(17). Furthermore, all papers were assessed for their level of evidence according to the Oxford Centre for Evidence-Based Medicine (OCEBM)(18).

Data synthesis

Review Manager (RevMan, version 5.4; The Cochrane Collaboration, 2020) was used to calculate the sensitivity and specificity of bedside US (with their 95% confidence intervals [CI]) for each study. In addition, the hierarchical summary receiver operating characteristic (HSROC) curve was created using RevMan 5.4. The positive predictive value (PPV), negative predictive value (NPV), likelihood ratios, prevalence, area under the curve (AUC), and accuracy (with their 95% CI) of bedside US were calculated using MedCalc Statistical Software version 15.8 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org; 2015).
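The per-study measures named above all derive from the four cells of a 2×2 table (TP, FP, FN, TN). The following Python sketch illustrates the standard point-estimate formulas only; it is not the RevMan or MedCalc implementation, the function name `diagnostic_metrics` is hypothetical, and the counts in the usage line are invented for illustration rather than taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates from a 2x2 table of index test vs. reference standard.

    Any ratio whose denominator is zero is returned as None
    (e.g. LR+ when there are no false positives).
    """
    def ratio(num, den):
        return num / den if den else None

    total = tp + fp + fn + tn
    sensitivity = ratio(tp, tp + fn)          # fractures correctly detected
    specificity = ratio(tn, tn + fp)          # non-fractures correctly cleared
    false_positive_rate = ratio(fp, fp + tn)  # equals 1 - specificity
    false_negative_rate = ratio(fn, tp + fn)  # equals 1 - sensitivity

    lr_pos = None
    if sensitivity is not None and false_positive_rate:
        lr_pos = sensitivity / false_positive_rate
    lr_neg = None
    if false_negative_rate is not None and specificity:
        lr_neg = false_negative_rate / specificity

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "prevalence": ratio(tp + fn, total),
        "accuracy": ratio(tp + tn, total),
    }


# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=45, fp=5, fn=5, tn=45)
```

Returning `None` for an undefined ratio, rather than infinity, mirrors how a table would leave a cell blank when a study reports no false positives.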

Results

The results of the literature search, screening, and study selection are illustrated in the PRISMA 2020 flowchart (Fig. 1). The literature search found 105 results (93 from databases and 12 from other electronic resources). The results from other resources were all duplicates of those yielded by searching the MEDLINE and EMBASE databases. After the removal of duplicates, the titles and abstracts of 86 records were screened, and 65 records were excluded due to nonrelevance to the research question. The full text of the remaining 21 studies was retrieved and examined for eligibility; 14 studies did not meet the eligibility criteria. Finally, seven studies were eligible to be included in this systematic review(19–25).
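The counts reported at each selection stage can be reconciled with simple arithmetic. The short Python sketch below is an illustrative consistency check, not part of the review's methods; the function name `check_prisma_flow` is hypothetical, and the number of duplicates is not stated in the text but is implied by the difference between records identified and records screened.

```python
def check_prisma_flow(identified, screened, excluded_at_screening,
                      full_text_assessed, excluded_at_full_text, included):
    """Verify that the stage counts add up and return the implied duplicates."""
    if screened - excluded_at_screening != full_text_assessed:
        raise ValueError("screening stage does not reconcile")
    if full_text_assessed - excluded_at_full_text != included:
        raise ValueError("full-text stage does not reconcile")
    duplicates = identified - screened
    if duplicates < 0:
        raise ValueError("more records screened than identified")
    return duplicates


# Counts as reported in the paragraph above.
duplicates_removed = check_prisma_flow(
    identified=93 + 12,        # databases + other electronic resources
    screened=86,
    excluded_at_screening=65,
    full_text_assessed=21,
    excluded_at_full_text=14,
    included=7,
)
print(duplicates_removed)  # 19
```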

Fig. 1.

The PRISMA flow chart diagram for the results of the literature search and study selection

Table 1 summarizes the basic characteristics of the included studies. One paper was a prospective cohort study(24), while the other six studies were prospective diagnostic test studies(19–23,25). All studies were single-centered(19–22,24,25), except for one study that was conducted in two centers(23). The studies were conducted in the United States (USA)(20,21,24), Germany(19,22,23), and Australia(25). The seven studies compared bedside US (index test) to standard plain X-ray (reference standard).

The study by Patel et al.(24) prospectively enrolled 33 children aged two through 17 years with suspected radius, ulna, tibia, or fibula fractures, who presented from March 2006 through January 2007 to the Pediatric Emergency Department, New York. The exclusion criteria were open fractures, neurovascular compromise, and hemodynamic instability. Bedside US was performed before standard radiography by one of three pediatric emergency medicine physicians who had no previous formal training in ultrasonography. Before the study began, the three physicians completed a two-hour training session in bedside US. Bedside US for upper extremity injuries showed a sensitivity of 1.00 (95% CI: 0.87–1.00) and a specificity of 0.91 (95% CI: 0.69–0.98). The study suggests that US of the upper extremity may be equivalent to radiography for identifying fractures not involving the joint.

Ackermann et al.(19) conducted their study from January 2007 to May 2008 in Germany on 93 patients aged 0–12 years with a suspected forearm fracture. This was defined by a mechanism of injury consistent with forearm injury, bony tenderness, and the absence of open wounds in the forearm area. The children were first subjected to US examination, followed by radiography. The US assessors in this study were residents and consultants without special training in osteosonography. The sensitivity of US diagnosis was 94%, and the specificity was 99%, compared with X-ray diagnosis as the gold standard.

Chaar-Alvarez et al.(21) conducted their study in Palmer Children’s Hospital, Orlando, from October 2007 to March 2009. The study enrolled 101 children between the ages of one and 17 years who presented to the ED with non-angulated distal forearm injuries. Children were excluded if they had a clinical forearm deformity, open wounds, or neurovascular injuries. US was performed by pediatric emergency doctors who had completed training in emergency ultrasonography. Standard radiography was performed afterwards. The overall diagnostic accuracy of the blinded reviewer’s US interpretation was 94% (95% CI: 88–99%). Sensitivity and specificity were 96% (95% CI: 85–99%) and 93% (95% CI: 82–98%), respectively. At the observed prevalence, the PPV and NPV were 92% and 96%, respectively. The overall diagnostic accuracy of the bedside interpretation of US was 79% (95% CI: 70–86%), with a sensitivity of 85% (95% CI: 72–94%), a specificity of 73% (95% CI: 60–84%), a PPV of 73%, and an NPV of 85%. The inter-rater reliability (κ) was 0.57 (95% CI: 0.41–0.73). The setting of the study was a large pediatric ED covered by Pediatric Emergency Medicine physicians. Therefore, the findings may not apply to other types of hospitals.

Barata et al.(20) enrolled 53 pediatric patients (under 18 years of age) who presented to the ED of a university-affiliated, level I trauma center, New York, between March 2008 and January 2009, with suspected long-bone fracture. Suspected fractures were characterized by swelling, erythema, and localized pain. Patients who had a history of fracture at the suspected site, extremity deformity, or an open fracture were excluded from this study. True blinding was applied. The sensitivity and specificity of US were 95.3% (95% CI: 82.9–99.2%) and 85.5% (95% CI: 72.8–93.1%), respectively. The PPV and NPV were 83.7% (95% CI: 68.8–92.2%) and 96% (95% CI: 84.9–99.3%), respectively.

The study by Eckert et al.(22) included 76 patients aged between one and 14 years, and presenting with suspected distal forearm fractures (defined by adequate trauma and appropriate clinical symptoms). After US examination, standard X-rays of the wrist from the anteroposterior and lateral view were taken. The training of the US assessors was not reported. All radiologically diagnosed radius fractures and all patients with no fractures were correctly diagnosed by ultrasound. Compared to X-ray, the sensitivity of the ultrasound method was 96.1%, and the specificity was 97%, with a PPV of 94.3% and an NPV of 97.9%.

Herren et al.(23) carried out a prospective study including 201 patients between four and 11 years of age who presented at two trauma surgery clinics in Germany with a presumptive diagnosis of distal radius or forearm fracture between January and December 2012. First, US imaging of the distal forearm was carried out in six standardized planes by physicians who had undergone a short training in US-guided fracture diagnosis. Afterwards, radiographs of the wrist were taken and interpreted by attending experts in radiography. The specificity and sensitivity of ultrasound diagnosis were both 99.5%.

The study by Rowlands et al.(25) was conducted at Princess Margaret Hospital for Children, the sole tertiary pediatric hospital in Western Australia. The study included 419 children under 16 years who had a suspected fracture of the forearm. Patients with evidence of an open fracture were excluded, as well as patients who had imaging performed before arrival. The study was double-blinded to avoid bias. The results showed that physicians could diagnose forearm fractures in children with a sensitivity of 91.5% and a specificity of 87.6%.

Table 2 shows the assessment of the risk of bias using the QUADAS II and three of the STARD criteria. All studies had a QUADAS II score above 10, indicating high quality of the studies. In all the studies, the reference standard was plain X-ray, which is considered the gold standard for diagnosing fractures. The performance and interpretation of US and radiography were independent of each other. All the studies clearly described their criteria for selecting patients. The sampling process was detailed in three studies only(20,21,25). The confidence intervals for sensitivity and specificity were mentioned by three studies only(20,21,24).

Tab. 2.

Risk of bias assessment of the included studies using QUADAS II and STARD criteria (n = 7)

| Criterion | Patel et al.(24) | Ackermann et al.(19) | Chaar-Alvarez et al.(21) | Barata et al.(20) | Eckert et al.(22) | Herren et al.(23) | Rowlands et al.(25) |
|---|---|---|---|---|---|---|---|
| QUADAS II criteria | | | | | | | |
| Was the spectrum of patients representative of the patients who will receive the test in practice? | yes | yes | no | no | unclear | yes | no |
| Is the reference standard likely to correctly classify the target condition? | yes | yes | yes | yes | yes | yes | yes |
| Were selection criteria clearly described? | yes | yes | yes | yes | yes | yes | yes |
| Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests? | yes | unclear | yes | yes | yes | yes | yes |
| Did the whole sample, or a random selection of the sample, receive verification using a reference standard of diagnosis? | yes | yes | yes | yes | unclear | unclear | yes |
| Did patients receive the same reference standard regardless of the index test result? | yes | yes | yes | yes | yes | yes | yes |
| Was the execution of the index test described in sufficient detail to permit replication of the test? | yes | unclear | yes | yes | yes | yes | unclear |
| Was the execution of the reference standard described in sufficient detail to permit its replication? | yes | unclear | yes | unclear | yes | yes | unclear |
| Were the index test results interpreted without knowledge of the results of the reference standard? | yes | unclear | yes | unclear | unclear | unclear | yes |
| Were the reference standard results interpreted without knowledge of the results of the index test? | yes | unclear | yes | unclear | unclear | yes | yes |
| Was the reference standard independent of the index test? | yes | yes | yes | yes | yes | yes | yes |
| Were the same clinical data available when test results were interpreted as would be available when the test is used in practice? | no | unclear | no | yes | yes | yes | yes |
| Were uninterpretable/intermediate test results reported? | unclear | yes | unclear | yes | unclear | yes | unclear |
| Were withdrawals from the study explained? | unclear | yes | yes | unclear | unclear | unclear | yes |
| Score | 12 | 11 | 11.5 | 11 | 11 | 12.5 | 11.5 |
| STARD criteria | | | | | | | |
| The sampling process is described | no | no | yes | yes | no | unclear | yes |
| Sensitivity and specificity results are presented with their respective confidence intervals | yes | no | yes | yes | no | no | no |
| The demographic characteristics of patients are described | no | no | no | yes | no | yes | no |
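The totals in the Score row are consistent with awarding 1 point for "yes", 0.5 for "unclear", and 0 for "no" across the 14 QUADAS items. The source does not state this rule, so the Python sketch below rests on that inference; the names `quadas_score` and `quality_label` are hypothetical, and labeling every score below 10 as low quality is an assumption, since the text only defines 10–14 as high and 9 or below as low.

```python
# Inferred scoring rule (an assumption, not stated by the authors).
POINTS = {"yes": 1.0, "unclear": 0.5, "no": 0.0}


def quadas_score(answers):
    """Sum the points for the 14 QUADAS items of one study."""
    if len(answers) != 14:
        raise ValueError("QUADAS has 14 items")
    return sum(POINTS[a] for a in answers)


def quality_label(score):
    """High quality for 10-14; anything lower is treated as low (assumption)."""
    return "high" if score >= 10 else "low"


# Answers transcribed from the table columns, top to bottom.
patel = ["yes"] * 11 + ["no", "unclear", "unclear"]
rowlands = ["no", "yes", "yes", "yes", "yes", "yes", "unclear",
            "unclear", "yes", "yes", "yes", "yes", "unclear", "yes"]

print(quadas_score(patel), quality_label(quadas_score(patel)))        # 12.0 high
print(quadas_score(rowlands), quality_label(quadas_score(rowlands)))  # 11.5 high
```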

The study by Patel et al.(24) was well-powered and has many strong points. The sample size was calculated, as was the kappa value, and the authors ensured that the radiologist was blinded to the US results to avoid bias. Unfortunately, there was no STARD flow diagram to visualize the patient cohort and follow-up (QUADAS 12, level 2b).

Several limitations were also observed in the study by Ackermann et al.(19). The method of patient recruiting was not clear, and the sample size was not calculated. The study carried a high risk of bias due to a lack of sample blinding. The lack of calculation of the kappa value and confidence intervals adds more limitations to the study (QUADAS 11, level 2b).

As for the study by Chaar-Alvarez et al.(21), the limitations included the lack of a power calculation and the use of a convenience sample, which is a potential source of bias (QUADAS 11.5, level 2b).

In the study by Barata et al.(20), the inclusion of confidence intervals in the results and blinding increased the internal validity of the study. However, the small sample size with no power calculations and a single site may have affected its internal as well as external validity. Furthermore, the lack of a STARD diagram in the study makes it difficult to interpret the findings (QUADAS 11, level 2b).

Many limitations were noted in the study by Eckert et al.(22). The method of subject recruitment was not clear, and the sample size was not calculated. It was not clear from the study who performed the US and where it was done. Moreover, no tables or STARD diagrams were provided to support the findings. Finally, the results were limited to sensitivity and specificity, without confidence intervals (QUADAS 11, level 2b).

In the study by Herren et al.(23), the residents who evaluated the US were not blinded to any information about the child. The attending experts in radiography who interpreted the X-rays were blinded to the US results. The physicians who performed the US diagnoses were not experts in US, but they had undergone a short training in US-guided fracture diagnosis. However, inter-observer variability was not determined. In addition, the authors did not use a true double-blind method for the analysis of both procedures, which exposes the study to diagnostic and test review bias (QUADAS 12.5, level 2b).

The study by Rowlands et al.(25) shows potential for bias in the selection of patients, given that the recruitment used prospective convenience sampling. Furthermore, the results of the primary outcome were represented as sensitivity and specificity, without confidence intervals or any other values such as likelihood ratios (QUADAS 11.5, level 2b).

The diagnostic performance of US in the included studies is summarized in Tab. 3, Fig. 2, and Fig. 3. The overall accuracy of US in diagnosing fractures ranged from 78.6% to 99.5%. The sensitivity and specificity ranged from 85% to 100%, and from 73% to 100%, respectively. These findings suggest that US can be used to diagnose forearm fractures in children with sufficient accuracy. The HSROC curve (Fig. 3) shows that the AUC for US ranged from 0.79 to 1.00, with the study by Chaar-Alvarez et al.(21) showing a lower AUC than the other six studies.

Fig. 2.

Forest plot showing the sensitivity and specificity of the included studies. CI – confidence interval; FN – false negative; FP – false positive; TN – true negative; TP – true positive

Fig. 3.

Hierarchical summary receiver operating characteristics (HSROC) curve for the diagnostic performance of bedside ultrasonography in the included studies

Tab. 3.

Diagnostic performance of bedside ultrasound in the included studies (n = 7)

| Study | TP | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | LR+ (95% CI) | LR- (95% CI) | AUC (95% CI) | Prevalence % (95% CI) | Accuracy % (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patel et al.(24) | 34 | 2 | 20 | 1.00 (0.90–1.00) | 0.91 (0.71–0.99) | 0.94 (0.81–0.99) | 1.00 (0.83–1.00) | 11.00 (2.93–41.2) | 0 | 0.95 (0.86–0.99) | 60.7 (46.8–73.5) | 96.4 (87.7–99.6) |
| Ackermann et al.(19) | 72 | 0 | 16 | 0.94 (0.85–0.98) | 1.00 (0.79–1.00) | 1.00 (0.95–1.00) | 0.76 (0.53–0.92) | | 0.06 (0.03–0.15) | 0.97 (0.91–0.99) | 82.8 (73.6–89.8) | 94.6 (87.9–98.2) |
| Chaar-Alvarez et al.(21) | 40 | 15 | 41 | 0.85 (0.72–0.94) | 0.73 (0.60–0.84) | 0.72 (0.59–0.84) | 0.85 (0.72–0.94) | 3.18 (2.03–4.98) | 0.20 (0.10–0.41) | 0.79 (0.70–0.87) | 45.6 (35.8–55.7) | 78.6 (69.5–86.1) |
| Barata et al.(20) | 41 | 8 | 47 | 0.95 (0.84–0.99) | 0.86 (0.73–0.94) | 0.84 (0.70–0.93) | 0.96 (0.86–1.00) | 6.56 (3.44–12.48) | 0.05 (0.01–0.21) | 0.90 (0.83–0.95) | 43.9 (33.9–54.3) | 89.8 (82.0–95.0) |
| Eckert et al.(22) | 50 | 1 | 24 | 0.96 (0.87–1.00) | 0.96 (0.80–1.00) | 0.98 (0.90–1.00) | 0.92 (0.75–0.99) | 24.04 (3.52–164.16) | 0.04 (0.01–0.16) | 0.96 (0.89–0.99) | 67.5 (55.9–77.8) | 96.1 (89.0–99.2) |
| Herren et al.(23) | 103 | 0 | 97 | 0.99 (0.95–1.00) | 1.00 (0.96–1.00) | 1.00 (0.97–1.00) | 0.99 (0.95–1.00) | | 0.01 (0.00–0.07) | 1.00 (0.97–1.00) | 51.7 (44.6–58.8) | 99.5 (97.3–100.0) |
| Rowlands et al.(25) | 214 | 23 | 162 | 0.91 (0.87–0.95) | 0.88 (0.82–0.92) | 0.90 (0.86–0.94) | 0.89 (0.84–0.93) | 7.36 (5.01–10.80) | 0.10 (0.06–0.15) | 0.90 (0.86–0.92) | 55.9 (51.0–60.7) | 89.7 (86.4–92.5) |

AUC – area under the curve; CI – confidence interval; FN – false negative; FP – false positive; LR- – negative likelihood ratio; LR+ – positive likelihood ratio; NPV – negative predictive value; PPV – positive predictive value; TN – true negative; TP – true positive. Confidence intervals for sensitivity, specificity, and accuracy are “exact” Clopper-Pearson confidence intervals; confidence intervals for the likelihood ratios are calculated using the log method; confidence intervals for the predictive values are the standard logit confidence intervals.
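Two of the interval methods named in the table notes can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the MedCalc implementation: the function names are hypothetical, the Clopper-Pearson bounds are found by bisection on the binomial distribution rather than with a statistics library, and the FN count used for Chaar-Alvarez et al. (FN = 7) is not listed in the table but is back-calculated from the reported sensitivity and prevalence.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(fn, target, iters=60):
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """'Exact' two-sided CI for a proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def lr_positive_ci(tp, fp, fn, tn, z=1.96):
    """LR+ with a log-method CI; requires tp > 0 and fp > 0."""
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


# Patel et al.: 34 of 34 fractures detected, so the lower bound has the
# closed form 0.025 ** (1 / 34), roughly 0.897.
sens_lo, sens_hi = clopper_pearson(34, 34)

# Chaar-Alvarez et al.: TP, FP, TN from the table; FN = 7 is back-calculated.
lr, lr_lo, lr_hi = lr_positive_ci(tp=40, fp=15, fn=7, tn=41)
print(round(lr, 2), round(lr_lo, 2), round(lr_hi, 2))  # 3.18 2.03 4.98
```

The log method is undefined when FP is zero, which is consistent with the blank LR+ cells for the two studies that reported no false positives.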

Discussion

Ultrasound imaging has several advantages, including the absence of ionizing radiation and the ability to carry out comparative scanning between the pathological and healthy sides in doubtful cases and to perform ultrasound palpation of the painful site with the transducer. In addition, it is cost-effective and can be repeated several times during the first hours after trauma to assess possible local complications, such as bleeding around the fracture site. All these advantages, together with its availability in primary care and in low-resource rural areas, make it an invaluable tool for the assessment and follow-up of bone fractures(6).

Therefore, this review was conducted to summarize the evidence regarding the diagnostic accuracy of bedside US for identifying distal forearm fractures in pediatric patients.

All studies are prospective cohorts comparing a new diagnostic test (the US scan for fracture detection) against a gold standard of X-ray, with only two of them(21,25) comparing pain associated with clinical examination, US, and X-ray. Appropriately, all the authors excluded open fractures and obvious deformities, for which ultrasound would not be appropriate. The quality of most studies was average to high (QUADAS 11–12.5). Despite the heterogeneity of the studies, they all show the sensitivity of ultrasound to be high, varying from 91.5% to 100%, and specificity from 87.5% to 99.5%, and the majority presented the values with the confidence interval.

The seven studies showed some limitations which may introduce bias into their results. The populations were generally recruited as convenience samples, allowing for the introduction of selection bias. In two studies(24,25), the sample size was calculated to detect high sensitivity and specificity with a power of 80% and p-value <0.05. The ultrasound experience of the doctors carrying out the scans varied widely between studies, ranging from consultants and surgeons with extensive previous scanning experience to ED doctors with 1–2 hours of US teaching(23–25). Despite this variation, inter-rater variability (kappa) was calculated in only three studies(21,24,25), with values between 0.57 and 0.79. None of the studies documented the number of patients who withdrew or declined study inclusion.

Previous systematic reviews assessed US as a diagnostic test for pediatric fractures of the upper limb(26–28). However, these systematic reviews showed some limitations. Other types of reference tests, such as computed tomography or magnetic resonance imaging, were used in the review by Joshi et al.(27). The review by Katzer et al.(28) included studies enrolling patients older than 18 years. The eligibility criteria for selecting studies were not clearly stated by Douma-den Hamer et al.(26). The present review attempted to avoid the limitations of earlier systematic reviews and to summarize the updated evidence, as some new studies were published after the previous reviews.

Limitations

The current review did not perform pooling of the diagnostic performance analyses, as the studies varied in baseline characteristics, such as the examined bones, the training of US assessors, the patients’ age, and other inclusion criteria.

Conclusions and implications

Ultrasound is a reliable tool for the diagnosis of distal forearm fractures in children when performed by well-trained emergency doctors using an appropriate viewing method. Ultrasound has an advantage over X-ray in terms of being radiation-free and allowing a shorter length of stay in the ED. The application of a new diagnostic imaging modality in current healthcare systems can meet with resistance at different levels. However, it should be noted that for US, only a basic level of training and knowledge is necessary before it can be performed and used accurately in daily clinical practice. This will be achieved if fracture sonography becomes obligatory as part of the emergency medicine training program conducted through the Royal College of Emergency Medicine (RCEM), in a similar fashion to the FAST scan. Finally, to keep a high standard of diagnostic pathways in place, a larger prospective blinded study on long bone injuries is recommended. This would increase the applicability and generalizability of bedside US in pediatric distal forearm fractures.
