Ultrasonography in the diagnosis of pediatric distal forearm fracture: a systematic review
Article Category: Research paper
Published Online: Nov 08, 2024
Page range: 1 - 8
Received: Sep 07, 2023
Accepted: Jan 17, 2024
DOI: https://doi.org/10.15557/jou.2024.0019
© 2024 Ayman S. Ahmed et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Distal forearm fractures are among the most common injuries in adults and children, typically resulting from a fall onto an outstretched hand(1). Several fracture types are characteristically seen in children: torus (buckle), greenstick, complete, and epiphyseal plate fractures(2). X-ray is the most common diagnostic modality for suspected fractures(3). Ultrasound (US) has recently been used to detect fractures, with reports suggesting that it may be more sensitive than X-ray because bone acts as a natural obstacle against sound transmission at high frequencies. Furthermore, US can examine a region in multiple planes, rather than in the limited views offered by traditional radiography(4,5).
Several important sonographic findings may be associated with bone fractures, both in the emergency setting and at follow-up. These include focal disruption of the hyperechoic cortical bone, hematoma with or without discontinuity of the periosteum (subperiosteal space), edema of the soft tissues surrounding the fracture, and mechanical disruption or dissociation of the growth plate. US can also be used to assess fracture healing and the different stages of bone callus formation(6). In addition, US can show abnormalities of the surrounding tissues(7) as well as bursitis and articular effusion in cases with intra-articular fractures(8).
Currently, experience in bedside ultrasound is growing amongst emergency physicians(9,10), with a relatively easy learning curve(11,12). The role of ultrasound as a gold standard screening tool is currently being investigated(13,14). An important feature in this debate is the actual diagnostic accuracy of ultrasound for detecting forearm fractures.
This systematic review was conducted in accordance with the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, while the reporting followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines(15).
In children with suspected distal forearm fractures presenting to the emergency department (ED), is bedside ultrasound as accurate as plain radiography in confirming the diagnosis?
This systematic review aimed to determine the diagnostic accuracy of bedside US for identifying distal forearm fractures in pediatric patients, with the following objectives: (a) to assess the diagnostic accuracy of bedside US for diagnosing distal forearm fractures, and (b) to investigate potential sources of heterogeneity, such as differences across types of fractures.
This systematic review included observational (cohort or case-control) studies and clinical trials. Only studies published in English from the start of 1997 until the end of 2017 were included.
Participants: Children under 16 years old.
Index test: Bedside US.
Target condition: Patients with suspected distal forearm fractures without apparent deformity, regardless of patients’ sex or fracture type.
Reference standard: Plain roentgenograms.
Exclusion criteria: Studies on adult populations or on children with a non-traumatic cause of fracture, open fractures, evidence of neurovascular compromise, or angulated fractures were excluded, as were narrative reviews, editorials, comments, and conference abstracts.
A literature search was performed to identify articles evaluating the accuracy of US in the detection of traumatic fractures of the distal forearm in children, without a language restriction and with no filters/limits. The following databases were searched from inception to May 2017 in the main Health Service Executive (HSE) library with the assistance of a librarian: MEDLINE, CINAHL, and EMBASE. Other searched sources included the Cochrane Library, Google Scholar, and Best Bets.
The reference lists of relevant narrative reviews and retrieved studies from electronic search were screened to find other potentially relevant studies.
Duplicate articles were removed from the search results. The abstracts of relevant articles were reviewed, and the studies matching the eligibility criteria were selected. The full-text articles for these studies were then retrieved and reviewed for eligibility for inclusion in this systematic review. Search and study selection were performed by the first author and checked by the second author.
A standardized data sheet was used to extract relevant data from the selected studies. The extracted data included: (a) the studies’ country, design, duration, and the number of patients; (b) the characteristics of patients (age, sex, and type of fracture); (c) the index test used; (d) the reference standard; (e) the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) cases. The first author extracted the data, and the second author checked the data to ensure consistency and clarity. No blinding was used for the journal titles, authors, or institutions.
Two tools are commonly used in the scientific literature for appraising the quality of studies validating diagnostic tests: the Standards for Reporting Studies of Diagnostic Accuracy (STARD)(16), covering 25 criteria, and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)(17), involving 14 criteria. Both tools share many of the same criteria, directly or indirectly. Consequently, we used a combination of the QUADAS and STARD criteria (Tab. 2) to evaluate all selected papers. Only three STARD criteria were added, as they are absent from QUADAS.
Tab. 1. Characteristics of the included studies
Study | Study design & settings | No. participants/No. fractures | Inclusion criteria | Exclusion criteria |
---|---|---|---|---|
Patel | Prospective cohort; Single center in the USA; March 2006 through January 2007 | 33/34 | Age: 2 through 17; Suspected radius, ulna, tibia, or fibula fractures | Open fractures; Neurovascular compromise; Hemodynamic instability; Fractures involving joints |
Ackermann | Prospective diagnostic test study; Single center in Germany; January 2007 to May 2008 | 93/77 | Age: 0–12 years; Suspected closed forearm fracture | Open wounds or deformity >30; Neural/vascular lesions requiring immediate operation |
Chaar-Alvarez | Prospective diagnostic test study; Single center in the USA; October 2007 to March 2009 | 101/46 | Age: 1–17 years; Nonangulated distal forearm injuries; Normal neurovascular examination distal to the injury site | Clinical forearm deformity, open forearm wound; Multisystem trauma; Altered mental status, developmental delay; Hemodynamic instability; Previous radiography; Allergy to US gel; Extremity pain/swelling proximal or distal to the injured forearm |
Barata | Prospective diagnostic test study; Single center in the USA; March 2008 to January 2009 | 53/43 | Age <18 years; Suspected long-bone fracture | History of fracture; Extremity deformity or open fracture; Orthopedic hardware in the traumatized area |
Eckert | Prospective diagnostic test study; Single center in Germany; September 2009 to August 2010 | 76/52 | Suspected distal forearm fracture | Open injuries; Significant deformity; Neural and/or vascular lesions |
Herren | Prospective diagnostic test study; 2 centers in Germany; January to December 2012 | 201/104 | Age: up to 11 years; Pain in the forearm area following trauma | Open wounds in the distal forearm; Peripheral disorders of circulation; Axis deviations requiring immediate reduction; Pre-existing forearm deformities |
Rowlands | Prospective diagnostic study; Single center in Australia; November 2011 to May 2012 | 419/234 | Age: 0–16 years; History of forearm trauma; Suspected fracture | Open fracture; Imaging performed before arrival |
Under the QUADAS scoring system, a score of 10–14 indicates a high-quality study, while a score of 9 or below indicates a low-quality study(17). Furthermore, all papers were assessed for their level of evidence according to the Oxford Centre for Evidence-Based Medicine (OCEBM)(18).
Review Manager (RevMan, version 5.4; The Cochrane Collaboration, 2020) was used to calculate the sensitivity and specificity of bedside US, with 95% confidence intervals (CI), for each study. The hierarchical summary receiver operating characteristic (HSROC) curve was also created in RevMan 5.4. The positive predictive value (PPV), negative predictive value (NPV), likelihood ratios, prevalence, area under the curve (AUC), and accuracy of bedside US, with their 95% CI, were calculated using MedCalc Statistical Software version 15.8 (MedCalc Software bvba, Ostend, Belgium).
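As a rough illustration of the point estimates listed above, the following Python sketch derives them from a single 2×2 table. It is not the RevMan or MedCalc implementation, and the function name `diagnostic_metrics` is hypothetical. The example uses the TP, FP, and TN counts reported for Rowlands in Tab. 3; FN is not tabulated, so FN = 20 is an assumption inferred here from the 234 fractures listed for that study in Tab. 1.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates for a 2x2 diagnostic accuracy table.

    Assumes every marginal total is non-zero; no continuity
    correction is applied.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    total = tp + fp + fn + tn
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # LR+ is not finite when there are no false positives
        "lr_pos": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_neg": (1 - sens) / spec,
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
    }


# FN = 20 is an assumption (234 fractures minus 214 true positives)
metrics = diagnostic_metrics(tp=214, fp=23, fn=20, tn=162)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, the printed values round to the figures given for Rowlands in Tab. 3, which suggests the inferred FN count is consistent with the published row.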
The results of the literature search, screening, and study selection are illustrated in the PRISMA 2020 flowchart (Fig. 1). The literature search found 105 results (93 from databases and 12 from other electronic resources). The results from other resources were all duplicates of those yielded by searching the Medline and Embase databases. After the removal of duplicates, the titles and abstracts of 86 records were screened, and 65 records were excluded as not relevant to the research question. The full text of the remaining 21 studies was retrieved and examined for eligibility; 14 studies did not fulfil the inclusion and exclusion criteria. Finally, seven studies were eligible to be included in this systematic review(19–25).
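The record counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Record counts as quoted in the search results paragraph
identified = {"databases": 93, "other_sources": 12}
screened_after_duplicates = 86
excluded_on_title_abstract = 65
excluded_on_full_text = 14

total_identified = sum(identified.values())
duplicates_removed = total_identified - screened_after_duplicates
full_text_assessed = screened_after_duplicates - excluded_on_title_abstract
included = full_text_assessed - excluded_on_full_text

print(total_identified, duplicates_removed, full_text_assessed, included)
```

The totals reproduce the 105 identified records, 21 full-text assessments, and seven included studies, and imply that 19 duplicates were removed.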

Table 1 summarizes the basic characteristics of the included studies. One paper was a prospective cohort study(24), while the other six studies were prospective diagnostic test studies(19–23,25). All studies were single-centered(19–22,24,25), except for one study that was conducted in two centers(23). The studies were conducted in the United States (USA)(20,21,24), Germany(19,22,23), and Australia(25). The seven studies compared bedside US (index test) to standard plain X-ray (standard test).
Table 2 shows the assessment of the risk of bias using the QUADAS II and three of the STARD criteria. All studies had a QUADAS II score above 10, indicating high quality of the studies. In all the studies, the reference standard was plain X-ray, which is considered the gold standard for diagnosing fractures. The performance and interpretation of US and radiography were independent of each other. All the studies clearly described their criteria for selecting patients. The sampling process was detailed in three studies only(20,21,25). The confidence intervals for sensitivity and specificity were mentioned by three studies only(20,21,24).
Tab. 2. Risk of bias assessment of the included studies using QUADAS II and STARD criteria
Criterion | Patel | Ackermann | Chaar-Alvarez | Barata | Eckert | Herren | Rowlands |
---|---|---|---|---|---|---|---|
Was the spectrum of patients representative of the patients who will receive the test in practice? | yes | yes | no | no | unclear | yes | no |
Is the reference standard likely to correctly classify the target condition? | yes | yes | yes | yes | yes | yes | yes |
Were selection criteria clearly described? | yes | yes | yes | yes | yes | yes | yes |
Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests? | yes | unclear | yes | yes | yes | yes | yes |
Did the whole sample or a random selection of the sample, receive verification using a reference standard of diagnosis? | yes | yes | yes | yes | unclear | unclear | yes |
Did patients receive the same reference standard regardless of the index test result? | yes | yes | yes | yes | yes | yes | yes |
Was the execution of the index test described in sufficient detail to permit replication of the test? | yes | unclear | yes | yes | yes | yes | unclear |
Was the execution of the reference standard described in sufficient detail to permit its replication? | yes | unclear | yes | unclear | yes | yes | unclear |
Were the index test results interpreted without knowledge of the results of the reference standard? | yes | unclear | yes | unclear | unclear | unclear | yes |
Were the reference standard results interpreted without knowledge of the results of the index test? | yes | unclear | yes | unclear | unclear | yes | yes |
Was the reference standard independent of the index test? | yes | yes | yes | yes | yes | yes | yes |
Were the same clinical data available when test results were interpreted as would be available when the test is used in practice? | no | unclear | no | yes | yes | yes | yes |
Were uninterpretable/intermediate test results reported? | unclear | yes | unclear | yes | unclear | yes | unclear |
Were withdrawals from the study explained? | unclear | yes | yes | unclear | unclear | unclear | yes |
Score | 12 | 11 | 11.5 | 11 | 11 | 12.5 | 11.5 |
The sampling process is described | no | no | yes | yes | no | unclear | yes |
Sensitivity and specificity results are presented with their respective confidence intervals | yes | no | yes | yes | no | no | no |
The demographic characteristics of patients are described | no | no | no | yes | no | yes | no |
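The scores in the table appear to follow a simple weighting of 1 point per "yes", 0.5 per "unclear", and 0 per "no" across the 14 QUADAS items. This weighting is inferred here from the table rather than stated in the text, so the sketch below should be read as a consistency check under that assumption. The answers are transcribed from the 14 QUADAS rows in table order (y = yes, u = unclear, n = no).

```python
# Assumed weighting, inferred from the table rather than stated in the text
POINTS = {"y": 1.0, "u": 0.5, "n": 0.0}

# Answers to the 14 QUADAS items, in table order, per study
RATINGS = {
    "Patel": "y" * 11 + "nuu",
    "Ackermann": "yyyuyyuuuuyuyy",
    "Chaar-Alvarez": "n" + "y" * 10 + "nuy",
    "Barata": "nyyyyyyuuuyyyu",
    "Eckert": "uyyyuyyyuuyyuu",
    "Herren": "yyyyuyyyuyyyyu",
    "Rowlands": "nyyyyyuuyyyyuy",
}

# Scores as printed in the "Score" row of the table
REPORTED = {
    "Patel": 12, "Ackermann": 11, "Chaar-Alvarez": 11.5, "Barata": 11,
    "Eckert": 11, "Herren": 12.5, "Rowlands": 11.5,
}


def quadas_score(answers):
    """Sum the assumed points over the 14 QUADAS answers."""
    if len(answers) != 14:
        raise ValueError("expected 14 QUADAS answers")
    return sum(POINTS[a] for a in answers)


for study, answers in RATINGS.items():
    print(study, quadas_score(answers), REPORTED[study])
```

Under this weighting, the computed score matches the reported score for each of the seven studies.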
The study by Patel
Several limitations were also observed in the study by Ackermann
As for the study by Chaar-Alvarez
In the study by Barata
Many limitations were noted in the study by Eckert
In the study by Herren
The study by Rowlands
The diagnostic performance of US in the included studies is summarized in Tab. 3, Fig. 2, and Fig. 3. The overall accuracy of US in diagnosing fractures ranged from 78.6% to 99.5%. Sensitivity ranged from 85% to 100%, and specificity from 73% to 100%. These findings suggest that US can be used to diagnose forearm fractures in children with sufficient accuracy. The HSROC curve (Fig. 3) shows that the AUC for US ranged from 0.79 to 1.00, with the study by Chaar-Alvarez having the lowest AUC (0.79).


Tab. 3. Diagnostic performance of bedside ultrasound in the included studies
Study | TP | FP | TN | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | LR+ (95% CI) | LR- (95% CI) | AUC (95% CI) | Prevalence % (95% CI) | Accuracy % (95% CI) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Patel | 34 | 2 | 20 | 1.00 (0.90–1.00) | 0.91 (0.71–0.99) | 0.94 (0.81–0.99) | 1.00 (0.83–1.00) | 11.00 (2.93–41.2) | 0 | 0.95 (0.86–0.99) | 60.7 (46.8–73.5) | 96.4 (87.7–99.6) |
Ackermann | 72 | 0 | 16 | 0.94 (0.85–0.98) | 1.00 (0.79–1.00) | 1.00 (0.95–1.00) | 0.76 (0.53–0.92) |  | 0.06 (0.03–0.15) | 0.97 (0.91–0.99) | 82.8 (73.6–89.8) | 94.6 (87.9–98.2) |
Chaar-Alvarez | 40 | 15 | 41 | 0.85 (0.72–0.94) | 0.73 (0.60–0.84) | 0.72 (0.59–0.84) | 0.85 (0.72–0.94) | 3.18 (2.03–4.98) | 0.20 (0.10–0.41) | 0.79 (0.70–0.87) | 45.6 (35.8–55.7) | 78.6 (69.5–86.1) |
Barata | 41 | 8 | 47 | 0.95 (0.84–0.99) | 0.86 (0.73–0.94) | 0.84 (0.70–0.93) | 0.96 (0.86–1.00) | 6.56 (3.44–12.48) | 0.05 (0.01–0.21) | 0.90 (0.83–0.95) | 43.9 (33.9–54.3) | 89.8 (82.0–95.0) |
Eckert | 50 | 1 | 24 | 0.96 (0.87–1.00) | 0.96 (0.80–1.00) | 0.98 (0.90–1.00) | 0.92 (0.75–0.99) | 24.04 (3.52–164.16) | 0.04 (0.01–0.16) | 0.96 (0.89–0.99) | 67.5 (55.9–77.8) | 96.1 (89.0–99.2) |
Herren | 103 | 0 | 97 | 0.99 (0.95–1.00) | 1.00 (0.96–1.00) | 1.00 (0.97–1.00) | 0.99 (0.95–1.00) |  | 0.01 (0.00–0.07) | 1.00 (0.97–1.00) | 51.7 (44.6–58.8) | 99.5 (97.3–100.0) |
Rowlands | 214 | 23 | 162 | 0.91 (0.87–0.95) | 0.88 (0.82–0.92) | 0.90 (0.86–0.94) | 0.89 (0.84–0.93) | 7.36 (5.01–10.80) | 0.10 (0.06–0.15) | 0.90 (0.86–0.92) | 55.9 (51.0–60.7) | 89.7 (86.4–92.5) |
AUC – area under the curve; CI – confidence interval; FN – false negative; FP – false positive; LR- – negative likelihood ratio; LR+ – positive likelihood ratio; NPV – negative predictive value; PPV – positive predictive value; TN – true negative; TP – true positive.
Confidence intervals for sensitivity, specificity, and accuracy are “exact” Clopper-Pearson confidence intervals; confidence intervals for the likelihood ratios are calculated using the log method; confidence intervals for the predictive values are the standard logit confidence intervals.
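To make the two interval methods named in this footnote concrete, the sketch below implements an exact Clopper-Pearson interval by bisection on the binomial distribution, and the log-method interval for LR+. It uses only the Python standard library and is not the MedCalc implementation; the function names are hypothetical. The Rowlands example again assumes FN = 20, inferred from the 234 fractures in Tab. 1, because FN is not tabulated.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _root_of_increasing(f, iterations=60):
    """Bisection on [0, 1] for a function that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion k/n."""
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def lr_pos_log_ci(tp, fp, fn, tn, z=1.96):
    """LR+ with a log-method CI; requires tp > 0 and fp > 0."""
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    se = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


# Patel sensitivity: 34 of 34 fractures detected
print(clopper_pearson(34, 34))
# Rowlands LR+; FN = 20 is an assumption (see lead-in)
print(lr_pos_log_ci(tp=214, fp=23, fn=20, tn=162))
```

For the Patel sensitivity, the lower bound comes out at about 0.90, and for Rowlands the LR+ interval is close to the 5.01–10.80 reported in the table. When FP = 0, as for Ackermann and Herren, the log method has no finite LR+ to work with, which matches the missing LR+ values for those studies.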
Ultrasound imaging has several advantages, including the absence of ionizing radiation, the ability to compare the injured and healthy sides in doubtful cases, and the ability to palpate the painful site with the transducer. In addition, it is cost-effective and can be repeated several times during the first hours after trauma to assess possible local complications, such as bleeding around the fracture site. All these advantages, together with its availability in primary care and in low-resource rural areas, make it an invaluable tool for the assessment and follow-up of bone fractures(6).
Therefore, this review was conducted to summarize the evidence regarding the diagnostic accuracy of bedside US for identifying distal forearm fractures in pediatric patients.
All studies were prospective and compared a new diagnostic test (US scanning for fracture detection) against the gold standard of X-ray, with only two of them(21,25) also comparing the pain associated with clinical examination, US, and X-ray. Appropriately, all the authors excluded open fractures and obvious deformity, for which ultrasound would not be appropriate. The quality of most studies was average to high (QUADAS 11–12.5). Despite the heterogeneity of the studies, they all show the sensitivity of ultrasound to be high, varying from 91.5% to 100%, and specificity from 87.5% to 99.5%, and the majority presented the values with the confidence interval.
The seven studies showed some limitations which may introduce bias into their results. The populations were generally recruited as convenience samples, allowing for the introduction of selection bias. In two studies(24,25), the sample size was calculated to detect high sensitivity and specificity with a power of 80% and p-value <0.05. The US experience of the doctors performing the scans varied widely between studies, from consultants and surgeons with extensive scanning experience to ED doctors with 1–2 hours of US teaching(23–25); nevertheless, inter-rater agreement (kappa) was calculated in only three studies(21,24,25), with values between 0.57 and 0.79. None of the studies documented the number of patients who withdrew or declined study inclusion.
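For readers less familiar with the kappa statistic mentioned above, the following sketch shows how Cohen's kappa is obtained for two raters classifying each scan as fracture or no fracture. The counts are hypothetical and are not taken from any of the included studies; the function name is also illustrative.

```python
def cohens_kappa(both_pos, rater1_only, rater2_only, both_neg):
    """Cohen's kappa for two raters and a binary rating."""
    n = both_pos + rater1_only + rater2_only + both_neg
    observed = (both_pos + both_neg) / n
    rater1_pos = (both_pos + rater1_only) / n
    rater2_pos = (both_pos + rater2_only) / n
    # Agreement expected by chance from each rater's marginal rates
    expected = rater1_pos * rater2_pos + (1 - rater1_pos) * (1 - rater2_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 100 scans, the raters disagree on 10 of them
kappa = cohens_kappa(both_pos=40, rater1_only=5, rater2_only=5, both_neg=50)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why 90% raw agreement in this made-up example yields a kappa of roughly 0.80 rather than 0.90.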
Previous systematic reviews assessed the US as a diagnostic test for pediatric fractures of the upper limb(26–28). However, these systematic reviews showed some limitations. Other types of reference tests, such as computed tomography or magnetic resonance imaging, were used in the review by Joshi
The current review did not perform pooling of the diagnostic performance analyses, as the studies varied in baseline characteristics, such as the examined bones, the training of US assessors, the patients’ age, and other inclusion criteria.
Ultrasound is a reliable tool for the diagnosis of distal forearm fractures in children when performed by well-trained emergency doctors using an appropriate viewing method. Ultrasound has an advantage over X-ray in being radiation-free and allowing a shorter length of stay in the ED. The application of a new diagnostic imaging modality in current healthcare systems can meet with resistance at different levels. However, it should be noted that only a basic level of training and knowledge is necessary before US can be performed and used accurately in daily clinical practice. This will be achieved if fracture sonography becomes an obligatory part of the emergency medicine training program conducted through the Royal College of Emergency Medicine (RCEM), in a similar fashion to the FAST scan. Finally, to keep a high standard of diagnostic pathways in place, a larger prospective blinded study on long bone injuries is recommended. This would increase the applicability and generalizability of bedside US in pediatric distal forearm fractures.