Open Access

Describing Serbian Hospital Activity Using Australian Refined Diagnosis Related Groups: A Case Study in Vojvodina Province


INTRODUCTION

Despite decades of transition, Serbia has maintained an archaic organisation of the health system inherited from the former Yugoslavia, with national social health insurance, universal coverage, public hospitals and physicians as state employees (1). The National Health Insurance Fund collects insurance revenues and distributes them to providers according to a contract between the Fund and each provider. An integral part of the contract is a business plan in which the planned volume of all services is based on previous-year inputs rather than on service delivery. Hospitals therefore have an incentive to gradually increase inputs by raising the bed occupancy rate, performing unnecessary procedures and employing new staff, rather than focusing on results or the quality of care.

To increase the efficiency of hospital healthcare, the World Bank has recommended the introduction of a prospective payment system based on diagnosis-related groups (DRGs) as one of the priorities in the reform of public finances (2). The DRG is a case-mix system created at Yale University for classifying hospital episodes into groups that are relatively homogeneous with respect to resource use and clinical conditions (3). As in other European countries, the main arguments for implementing the DRG system were to improve transparency and efficiency without a deterioration in the quality of care (4, 5). The system improves transparency by condensing the confusingly large number of individual patients and thousands of procedures into a manageable number of groups. This allows group-level comparisons between states, regions, hospitals and departments (4).

DRG systems are adopted as a reimbursement base for a prospective payment system (5, 6). The prospective payment system gives hospitals an incentive to improve efficiency, to limit the services per patient, to treat more patients and to produce sufficient services to meet patient needs. The Australian Refined DRG (AR-DRG 6.0) was chosen as the case-mix model for implementation in Serbia (7). Different versions of AR-DRG, in original or in modified form, have previously been implemented in Germany, Romania, Slovenia, Croatia, Ireland, New Zealand etc. (8).

The DRG-based payment system consists of the classification system and the payment formula, which is the product of the hospital base rate and relative weights adjusted for outliers (9). The method for determining outliers is an integral element of any payment system and can be as important as the patient classification system itself (10). Outliers matter because they can expose healthcare providers to substantial financial losses. The methods for handling outliers and weights are specific to each country, or even to individual providers, depending on the distribution of costs, the length of stay and the type of hospital (i.e. level of care) in the data sample (11). In general, there are two types of methods for detecting outliers: parametric and non-parametric (11). Parametric methods assume a normal distribution of episodes around the arithmetic mean, whereas non-parametric methods are based on the interquartile range. Standard procedure before the implementation of a DRG scheme involves testing several classifications during the pilot stage and subsequently adopting the most effective one according to relevant statistical parameters (12).

The study aims to describe hospital activity using the AR-DRG, to examine case-mix performance using relevant statistics and to assess data quality in one Serbian province. The findings could be generalised to the national level, and the results will provide valuable information for the designers of the new hospital payment system.

METHODS

The autonomous province of Vojvodina is located in the northern part of Serbia, with a population of 1.9 million and a total area of 21,506 square kilometres (13). Demographic and clinical data for 2016 were obtained from five university hospitals and nine general hospitals. Patients with severe conditions were transferred from general to university hospitals, so all hospitals can be considered parts of one complete system.

The National Institute of Public Health collected hospital records that contain age, gender, admission and discharge date, discharge status, birthweight for new-borns, the principal diagnosis, secondary diagnoses and performed procedures. The International Classification of Diseases (ICD) 10th revision and the Australian Classification of Health Interventions (ACHI) 7th edition were used for coding diagnoses and procedures. The collected data was input into computer-based software named “grouper” created by Laeta (Laeta Pty Ltd, Randwick, New South Wales, Australia), which classifies patients into AR-DRG groups according to the algorithm (7).

The AR-DRG 6.0 classification system contains 698 groups with unique alphanumeric codes classified into 23 major diagnostic categories (MDCs). The AR-DRG version 6.x definitions manual contains full names for DRGs and MDCs (7). The first character in the DRG code refers to the major diagnostic category. According to the next two characters, all groups can be separated into “surgical DRGs” (from 01 to 39), “other DRGs” (from 40 to 59) and “medical DRGs” (from 60 to 99). The last character represents the level of resource consumption. Cases with higher and variable costs are grouped into a pre-MDC category. Error cases are assigned to error DRGs (960Z, 961Z and 963Z). Groups 801A, 801B and 801C contain operating-room procedures unrelated to the principal diagnosis.
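The code structure described above can be illustrated with a small parser. This is only a sketch under stated assumptions: the names `DrgCode` and `parse_drg` are hypothetical, the error set follows the codes reported in the Results (960Z, 961Z, 963Z), and the real grouper software applies a far more detailed algorithm that is not reproduced here.

```python
from dataclasses import dataclass

# Assumption: special groups are listed explicitly, as named in the text.
ERROR_DRGS = {"960Z", "961Z", "963Z"}
UNRELATED_DRGS = {"801A", "801B", "801C"}


@dataclass(frozen=True)
class DrgCode:
    code: str        # full four-character code, e.g. "O01B"
    mdc_char: str    # first character: refers to the major diagnostic category
    partition: str   # "surgical", "other", "medical", "error" or "unrelated"
    resource: str    # last character: resource consumption level


def parse_drg(code: str) -> DrgCode:
    """Split an AR-DRG code into the parts described in the Methods."""
    code = code.strip().upper()
    if len(code) != 4 or not code[1:3].isdigit():
        raise ValueError(f"not a four-character AR-DRG code: {code!r}")

    if code in ERROR_DRGS:
        partition = "error"
    elif code in UNRELATED_DRGS:
        partition = "unrelated"
    else:
        number = int(code[1:3])
        if 1 <= number <= 39:
            partition = "surgical"
        elif 40 <= number <= 59:
            partition = "other"
        elif 60 <= number <= 99:
            partition = "medical"
        else:
            raise ValueError(f"partition number out of range in {code!r}")

    return DrgCode(code, code[0], partition, code[3])
```

For instance, `parse_drg("R63Z").partition` gives `"medical"` and `parse_drg("O01B").partition` gives `"surgical"`, in line with the numbering ranges stated above.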

Trimming is a method of identifying outlier cases based on the length of stay (LOS). LOS is calculated as the difference between the discharge date and the admission date. All cases between the lower and upper thresholds are classified as inliers, whereas cases outside this range are classified as outliers. Three trimming methods were used: L3H3, IQR and the 10th–95th percentile method. The L3H3 method is based on the average length of stay (ALOS) of each DRG: the lower threshold (L3) is ALOS divided by three, whereas the upper threshold (H3) is ALOS multiplied by three. In the interquartile range (IQR) method, the lower threshold is calculated as Q1-1.5*IQR and the upper threshold as Q3+1.5*IQR, where the IQR is the difference between the third quartile (Q3) and the first quartile (Q1) of the distribution of LOS. In the 10th–95th percentile method, the lower threshold is equal to the 10th percentile of LOS and the upper threshold to the 95th percentile of LOS.
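The three trimming rules can be sketched in a few lines of Python. The function names are hypothetical, and two details are assumptions because the text does not state them: quantiles are interpolated with the "inclusive" method of the standard library, and cases lying exactly on a threshold are treated as inliers. In practice the thresholds would be computed separately for each DRG.

```python
from datetime import date
from statistics import fmean, quantiles


def length_of_stay(admission: date, discharge: date) -> int:
    """LOS in days: discharge date minus admission date (0 for a day case)."""
    return (discharge - admission).days


def l3h3_thresholds(los):
    """Lower threshold = ALOS / 3, upper threshold = ALOS * 3."""
    alos = fmean(los)
    return alos / 3, alos * 3


def iqr_thresholds(los):
    """Lower = Q1 - 1.5 * IQR, upper = Q3 + 1.5 * IQR, with IQR = Q3 - Q1."""
    q1, _median, q3 = quantiles(los, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def percentile_thresholds(los, lower=10, upper=95):
    """Lower = 10th percentile of LOS, upper = 95th percentile of LOS."""
    cuts = quantiles(los, n=100, method="inclusive")  # 99 cut points
    return cuts[lower - 1], cuts[upper - 1]


def split_outliers(los, low, high):
    """Return (inliers, outliers); values on a threshold count as inliers."""
    inliers = [x for x in los if low <= x <= high]
    outliers = [x for x in los if x < low or x > high]
    return inliers, outliers
```

With a small right-skewed sample such as `[1, 2, 2, 3, 3, 4, 5, 6, 30]`, the L3H3 rule flags the short stays as well as the 30-day stay, whereas the IQR rule flags only the 30-day stay. This mirrors the pattern in which L3H3 produces the most outliers.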

The coefficient of variation (CV) measures variation in LOS within each DRG and is calculated as the ratio of the standard deviation to the mean, expressed as a percentage. A coefficient of variation below 100% reflects acceptable within-group homogeneity.
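As a minimal sketch of this homogeneity check, the CV of one DRG can be computed as follows. The function names are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator was applied.

```python
from statistics import fmean, stdev


def cv_percent(los):
    """Coefficient of variation of LOS in per cent: 100 * SD / mean."""
    if len(los) < 2:
        raise ValueError("at least two episodes are needed")
    mean = fmean(los)
    if mean == 0:
        raise ValueError("CV is undefined when the mean LOS is zero")
    return 100 * stdev(los) / mean  # assumption: sample standard deviation


def is_homogeneous(los, limit=100.0):
    """True when the CV is below the 100% limit used in the text."""
    return cv_percent(los) < limit
```

A group in which every episode has the same LOS has a CV of 0%, while a group of mostly short stays with one very long stay can easily exceed 100%.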

The total sum of squares (SST) of the LOS is defined as the sum of the squared deviations of each observation from the mean of all observations (6). The error sum of squares (SSE) is defined as the sum of squared deviations of each observation from the mean of the group into which the observation has been classified. The difference between SST and SSE is the regression sum of squares (SSR). SSR is a variation between the mean of each group and the mean of the observed dataset. The ratio of SSR to SST provides a measure of the reduction in variation (RIV) measured with the coefficient of multiple determination (R2). R2 represents the fraction of variation in LOS explained by the DRG. In other words, R2 is a summary measure of the extent to which the DRG system can predict the value of an outcome variable based on the characteristics of individual patients. R2 ranges between zero and one. The coefficient of multiple determination takes maximum value only if the number of hospital episodes and number of DRG groups are equal. Since the number of groups and sample size affect R2, adjusted R2 was used (14). The statistical significance of adjusted R2 can be measured using F statistic (15).
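The decomposition above can be sketched as follows. The function name is hypothetical, and the adjustment formula is an assumption: the text cites a reference for adjusted R2 without giving the formula, so the common form 1 - (SSE / (n - k)) / (SST / (n - 1)) is used here, with n episodes and k DRGs.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(groups, los):
    """Return (R2, adjusted R2) for LOS explained by DRG membership.

    groups: DRG label of each episode; los: LOS of each episode.
    """
    if len(groups) != len(los):
        raise ValueError("groups and los must have the same length")

    by_group = defaultdict(list)
    for label, value in zip(groups, los):
        by_group[label].append(value)

    n, k = len(los), len(by_group)
    grand_mean = fmean(los)

    # SST: squared deviations of each observation from the overall mean.
    sst = sum((x - grand_mean) ** 2 for x in los)
    # SSE: squared deviations of each observation from its own group mean.
    sse = 0.0
    for values in by_group.values():
        group_mean = fmean(values)
        sse += sum((x - group_mean) ** 2 for x in values)

    if sst == 0 or n <= k:
        raise ValueError("R2 is undefined for this sample")

    r2 = (sst - sse) / sst                          # SSR / SST
    adjusted = 1 - (sse / (n - k)) / (sst / (n - 1))  # assumed adjustment
    return r2, adjusted
```

For two groups with LOS values 1, 2, 3 and 7, 8, 9, SST is 58 and SSE is 4, so R2 is 54/58 (about 0.93) and the adjusted value is slightly lower (about 0.91).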

RESULTS

Data were obtained from 14 hospitals in Vojvodina province and comprised 246,131 hospital discharges with 1,651,913 inpatient days. Women accounted for 56% of all episodes. The average patient age was 50 years. University hospitals recorded 100,334 (40.8%) discharges, while general hospitals recorded 145,797 (59.2%) discharges.

Classification of all discharges using AR-DRG version 6.0 resulted in 652 discrete DRG groups, of which 8.3% had five episodes or fewer. There were 333 “medical”, 280 “surgical” and 39 “other DRGs”, accounting for 71.7%, 25.6% and 2.7% of discharges, respectively. R63Z (chemotherapy) accounted for the largest share of inpatient discharges (4.9%), with ALOS of 1.3 days. E71B (respiratory neoplasms without catastrophic complication and comorbidity) accounted for the largest share of bed days (2.4%), with ALOS of 7.9 days. The 20 highest volume DRGs accounted for 34.1% of discharges, while the 142 highest volume DRGs accounted for 79.8% of discharges (Table 1).

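Table 1. The highest volume DRGs with the average length of stay and the coefficient of variation for untrimmed data in Vojvodina province during 2016.

| Diagnosis related group | ALOS (days) | % of total episodes | CV (%) |
|---|---|---|---|
| R63Z | 1.31 | 4.9 | 98 |
| O60Z | 4.43 | 3.6 | 69 |
| G60B | 3.32 | 3.0 | 195 |
| E71B | 7.85 | 2.0 | 126 |
| Z64A | 7.11 | 1.9 | 139 |
| O66Z | 3.39 | 1.8 | 102 |
| J62B | 2.84 | 1.7 | 233 |
| J11Z | 2.26 | 1.7 | 172 |
| R61C | 1.00 | 1.6 | 0 |
| N09Z | 1.77 | 1.5 | 154 |
| O01B | 6.13 | 1.3 | 48 |
| G10B | 4.29 | 1.1 | 70 |
| 961Z | 10.68 | 1.1 | 101 |
| D11Z | 2.94 | 1.1 | 71 |
| G67B | 4.69 | 1.0 | 83 |
| G70B | 4.44 | 1.0 | 104 |
| C16Z | 2.52 | 1.0 | 87 |
| Q61B | 3.42 | 0.9 | 178 |
| J62A | 3.57 | 0.9 | 186 |
| K60B | 6.48 | 0.9 | 87 |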

Approximately 1.1% of episodes were identified as errors and assigned to three error DRGs. 961Z (unacceptable principal diagnosis), 963Z (neonatal diagnosis not consistent with age and/or weight) and 960Z (ungroupable) accounted for 95.8%, 2.2% and 1.9% of errors, respectively.

Episodes with operating-room procedures unrelated to principal diagnosis were grouped into 801A, 801B and 801C DRGs. 801C (operating room procedures unrelated to the principal diagnosis without catastrophic complication and comorbidity) accounted for 61.6% of unrelated episodes, 801B (operating room procedures unrelated to the principal diagnosis with severe or moderate complication and comorbidity) for 23.9% and 801A (operating room procedures unrelated to the principal diagnosis with catastrophic complication and comorbidity) for 14.5% of unrelated episodes.

MDC 6 (diseases and disorders of the digestive system) was the highest volume MDC, accounting for 11.4% of the total number of episodes. The five highest volume MDCs accounted for 46.3% of the total number of episodes, whereas twelve MDCs accounted for 80.5% (Table 2). The average length of stay was 6.71 days (95% CI 6.67–6.75) and the median was 4 days (Table 3). “Day-cases”, which did not require an overnight stay in the hospital, accounted for 21.4% of total episodes. R63Z, with 10,696 episodes, was the largest single group among “day-cases”. After excluding “day-cases”, ALOS increased to 8.24 days (95% CI 8.19–8.29), whereas the median remained 4 days. There were 7,161 (2.9%) episodes lasting more than 28 days, defined as “prolonged hospitalisation”. Among these, 1,368 (19.1%) were classified in MDC 19 (mental diseases and disorders). Without “prolonged hospitalisation”, ALOS dropped to 5.56 days (95% CI 5.54–5.58) with a median of 3 days. Excluding both “day-cases” and “prolonged hospitalisation”, ALOS amounted to 5.98 days (95% CI 5.96–6.00) with a median of 4 days.

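Table 2. Variance explained (adjusted R2) for the length of stay and percentage of outliers for untrimmed and trimmed data in Vojvodina province during 2016.

| Major diagnostic category | Untrimmed: adjusted R2 | Untrimmed: % of total episodes | Untrimmed: % of day cases within MDC | L3H3: adjusted R2 | L3H3: % of outliers within MDC | IQR: adjusted R2 | IQR: % of outliers within MDC | 10th–95th: adjusted R2 | 10th–95th: % of outliers within MDC |
|---|---|---|---|---|---|---|---|---|---|
| Pre-MDC | 0.13 | <1 | 1 | 0.31 | 23 | 0.31 | 5 | 0.33 | 11 |
| MDC 01 | 0.13 | 5 | 6 | 0.38 | 22 | 0.27 | 6 | 0.28 | 10 |
| MDC 02 | 0.11 | 2 | 14 | 0.46 | 18 | 0.29 | 9 | 0.23 | 5 |
| MDC 03 | 0.15 | 4 | 4 | 0.33 | 13 | 0.23 | 7 | 0.25 | 7 |
| MDC 04 | 0.13 | 6 | 13 | 0.31 | 29 | 0.23 | 5 | 0.26 | 9 |
| MDC 05 | 0.14 | 9 | 5 | 0.27 | 20 | 0.25 | 5 | 0.29 | 9 |
| MDC 06 | 0.18 | 11 | 13 | 0.47 | 30 | 0.47 | 7 | 0.47 | 7 |
| MDC 07 | 0.17 | 3 | 7 | 0.41 | 24 | 0.32 | 5 | 0.34 | 10 |
| MDC 08 | 0.21 | 5 | 13 | 0.44 | 33 | 0.34 | 5 | 0.37 | 8 |
| MDC 09 | 0.15 | 8 | 41 | 0.62 | 22 | 0.53 | 11 | 0.44 | 5 |
| MDC 10 | 0.16 | 3 | 19 | 0.45 | 31 | 0.35 | 7 | 0.32 | 6 |
| MDC 11 | 0.37 | 5 | 27 | 0.75 | 34 | 0.68 | 7 | 0.69 | 6 |
| MDC 12 | 0.10 | 2 | 27 | 0.46 | 37 | 0.57 | 10 | 0.46 | 6 |
| MDC 13 | 0.19 | 5 | 37 | 0.59 | 21 | 0.49 | 8 | 0.41 | 5 |
| MDC 14 | 0.16 | 8 | 9 | 0.30 | 10 | 0.37 | 7 | 0.36 | 6 |
| MDC 15 | 0.37 | 2 | 11 | 0.65 | 21 | 0.54 | 3 | 0.52 | 7 |
| MDC 16 | 0.10 | 2 | 43 | 0.40 | 57 | 0.37 | 11 | 0.25 | 5 |
| MDC 17 | 0.27 | 10 | 77 | 0.71 | 13 | 0.37 | 7 | 0.35 | 2 |
| MDC 18 | 0.12 | 1 | 4 | 0.34 | 18 | 0.21 | 5 | 0.20 | 10 |
| MDC 19 | 0.09 | 1 | 2 | 0.18 | 28 | 0.16 | 5 | 0.18 | 13 |
| MDC 20 | 0.11 | 1 | 9 | 0.31 | 44 | 0.33 | 10 | 0.32 | 10 |
| MDC 21 | 0.16 | 1 | 21 | 0.49 | 38 | 0.39 | 7 | 0.29 | 7 |
| MDC 22 | 0.26 | <1 | 2 | 0.68 | 33 | 0.37 | 4 | 0.43 | 11 |
| MDC 23 | 0.19 | 4 | 17 | 0.56 | 25 | 0.52 | 6 | 0.49 | 5 |
| Unrelated DRGs | 0.13 | 1 | 6 | 0.29 | 30 | 0.24 | 6 | 0.29 | 8 |
| Error DRGs | 0.01 | 1 | 8 | 0.06 | 37 | 0.02 | 1 | 0.01 | 5 |
| Overall DRGs | 0.30 | 100 | 21 | 0.61 | 24 | 0.49 | 7 | 0.51 | 7 |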

Table 3. The average length of stay and percentage of outliers in university and general hospitals for untrimmed and trimmed data in Vojvodina province during 2016.

| Trimming method | ALOS (days) | ALOS, university hospitals (days) | ALOS, general hospitals (days) | % of outliers, university hospitals | % of outliers, general hospitals |
|---|---|---|---|---|---|
| Untrimmed data | 6.71 | 6.88 | 6.59 | - | - |
| L3H3 | 7.02 | 6.89 | 7.09 | 24 | 24 |
| IQR | 5.65 | 5.56 | 5.71 | 8 | 6 |
| 10th–95th | 5.78 | 5.73 | 5.82 | 8 | 6 |

For untrimmed data, the highest ALOS was 73.3 days for L02A (operative insertion of peritoneal catheter for dialysis with catastrophic or severe complication and comorbidity). Among DRGs with more than five episodes, the highest ALOS was 52.4 days for P62Z (neonate, admission weight 750–999 g) with seven episodes.

ALOS for “surgical”, “medical” and “other DRGs” was 5.94, 6.98 and 6.84 days, respectively.

The average length of stay in university hospitals (6.88 days) was higher than in general hospitals (6.59 days) (Table 3). Among the DRGs seen in both types of hospitals, 282 DRGs had higher ALOS in university hospitals, whereas 244 DRGs had higher ALOS in general hospitals.

Outliers accounted for 24% (L3H3 method), 7% (IQR) and 7% (10th–95th) of total episodes, covering 20.6% (L3H3), 21.7% (IQR) and 19.7% (10th–95th) of total bed days (Table 2). The L3H3 method increased ALOS to 7.02 days (Table 3), with a maximum ALOS of 105.0 days for the L02A group. Among DRGs with more than five episodes, the maximum ALOS was 44.25 days for P63Z (neonate, admission weight 1,000–1,249 g without significant operating room procedure) with 16 episodes.

There were 27 DRGs with CV below 20%, 70 DRGs with CV below 50%, and 441 DRGs with CV below 100%, accounting for 2.7%, 5.9% and 53.6% of total episodes, respectively. G65A (gastrointestinal obstruction with catastrophic or severe complication and comorbidity) had the highest CV, at 341%. Nine more groups had CV greater than 200%. Among the highest volume DRGs, nine groups had CV below 100% (Table 1). Trimming increased the proportion of DRGs with CV below 100% to 91% (10th–95th), 95% (IQR) and 100% (L3H3), and also reduced the maximum CV in the dataset (Table 4). The maximum CV after L3H3 trimming was 96% for L02A. For the IQR method, the highest CV was 158% for I79A (pathological fracture with catastrophic complication and comorbidity), whereas for the 10th–95th method it was 153% for V60Z (alcohol intoxication and withdrawal).

Table 4. The total number of DRGs and the number of DRGs with CV <100% in trimmed and untrimmed data in Vojvodina province during 2016.

| Trimming method | Total number of DRGs * | Number of DRGs with CV <100% | % of total episodes in DRGs with CV <100% |
|---|---|---|---|
| Untrimmed data | 639 | 441 | 53.6 |
| L3H3 | 634 | 634 | 100.0 |
| IQR | 639 | 607 | 94.1 |
| 10th–95th | 618 | 562 | 83.1 |

* Number of DRGs after excluding DRGs with one episode.

Adjusted R2 for untrimmed data was 0.30 (Table 2). Exclusion of “day-cases” decreased adjusted R2 to 0.27, whereas exclusion of “prolonged hospitalisation” increased adjusted R2 to 0.36. After exclusion of both “day-cases” and “prolonged hospitalisation”, adjusted R2 for untrimmed data was 0.30. Data trimming increased adjusted R2 to 0.61 (L3H3 method), 0.49 (IQR) and 0.51 (10th–95th percentile) (Table 2). The L3H3 trimming method resulted in the maximum adjusted R2, as well as the greatest number of outliers. MDC 16 (diseases and disorders of the blood and blood-forming organs) had the highest proportion of outliers for L3H3 and IQR, with 57% and 11% of episodes (Table 2). Even after trimming, adjusted R2 for some MDCs remained relatively low (less than 0.25). The lowest adjusted R2, except for errors, was for MDC 19, in both untrimmed and trimmed data (Table 2). MDC 11 (diseases and disorders of the kidney and urinary tract) had the maximum value for adjusted R2. The greatest improvement in adjusted R2 compared with the untrimmed value was for MDC 12 (diseases and disorders of the male reproductive system).

Trimming improved adjusted R2 more for “medical” than for “surgical DRGs”. However, the adjusted R2 for “surgical DRGs” remained above the values for “medical” and “other DRGs” after all trimming methods (Table 5).

Table 5. Variance explained (adjusted R2) and the percentage of outliers within medical, surgical and other DRGs in Vojvodina province during 2016.

| Trimming method | Medical DRGs: adjusted R2 | Medical DRGs: % of outliers | Surgical DRGs: adjusted R2 | Surgical DRGs: % of outliers | Other DRGs: adjusted R2 | Other DRGs: % of outliers |
|---|---|---|---|---|---|---|
| Untrimmed data | 0.28 | - | 0.38 | - | 0.22 | - |
| L3H3 | 0.60 | 29 | 0.62 | 11 | 0.47 | 16 |
| IQR | 0.47 | 7 | 0.58 | 8 | 0.39 | 6 |
| 10th–95th | 0.50 | 6 | 0.62 | 8 | 0.39 | 8 |

DISCUSSION

The average length of stay is a standard measure of hospital activity (15). According to Eurostat, Serbian ALOS was 9.5 days, among the highest in Europe (16). Some of the reasons for prolonged hospitalisation are the inadequate planning of admissions and discharges, duplicate procedures to fulfil the annual plan, the shortage of community centres for mental health and palliative care, a lack of within-hospital coordination and an archaic definition of day cases. For example, O60Z (vaginal delivery), the second highest volume group and a routine procedure, had ALOS of 4.4 days, while C16Z (lens procedures), usually performed as day cases, had ALOS of 2.5 days.

The proportion of outlier cases is a measure of classification effectiveness (3). A less effective case-mix classification will detect more outliers, whereas a more effective classification will place more of those episodes among the inliers.

The most common trimming method in Australia was the L3H3 method (17). Understandable and easily computable, this method was accepted at the beginning of AR-DRG implementation in numerous countries. The L3H3 method is based on the assumption that LOS is normally distributed. However, the distribution of LOS is right-skewed, with the mean higher than the median, so the arithmetic mean might become misleading. In addition to this skew, prolonged hospitalisations pull the ALOS further to the right. In this research, prolonged episodes accounted for 3% of episodes, in contrast with 1% in Ireland (18). The percentile method is also very sensitive to skewed data (19). These statistical drawbacks favour non-parametric methods based on the IQR and the median as the preferable methods at this stage of implementation (20, 21). The results of the IQR method were satisfactory, classifying 7% of cases as outliers, with only 5% of groups having CV above 100% and with ALOS of 5.65 days.

The evidence from the literature suggests great variability in the proportion of outliers, depending on the algorithm, the method (parametric or non-parametric, based on LOS or cost), prior experience etc. (22). Outliers in Ireland, Germany, Austria and France accounted for 6%, 22%, about 14% and less than 1% of total episodes, respectively (22). According to a proposal from the US, the acceptable proportion of outliers should be below 10% (3). Such a proportion was reached by two of the three applied trimming methods in this case study and was close to the acceptable ratio suggested by Professor Fedler. Fedler highlighted that the optimal threshold depends on providers and their willingness to take risk (23). Such risk differs between US and European hospitals. If hospitals are risk averse, the optimal threshold should be higher, with no more than 5% of total cases beyond the upper threshold, whereas different rules should be applied to the lower threshold (in an email from Fedler S, in September 2019). As outliers are somewhat inevitable, some kind of surcharge is necessary. The common surcharge for a long-stay outlier depends on the number of hospital days beyond the upper threshold, adjusted for some types of patients (e.g. new-borns) or supplemented by additional payments for new technologies, expensive medications etc. (22, 24). Some countries prefer to amplify short-stay weights without a lower threshold in order to create an incentive for short-stay visits or day cases (11). A system without the lower threshold might be implemented in Serbia later on, in order to raise the currently small percentage of day cases (21% in Vojvodina). On the other hand, the lower threshold is an attempt to avoid inappropriately early discharges, colloquially called “bloody discharges” (22). In conclusion, the implementation of the DRG system is a process of continuous improvement (25).
Ultimately, the method should be chosen by the authority as a balance between efficiency and quality, and between competition and sustainability.

Joint activities of health institutions, the National Health Insurance Fund and the Ministry of Health resulted in a modification of the hospital payment system. According to the Rulebook for 2019, 4% of the hospital reimbursement should be based on DRG performance and 1% on quality indicators. None of the methods for determining outliers has been included yet (26). Presumably, the authority has planned to cross a point of no return and postpone implementation of the method until stakeholders become familiar with the scheme and coding. Since there is no real competition between providers, the financial effects of the recent reform are not easy to predict. For instance, it is questionable how hospitals would cover a potential decrease in revenues, even by a single percent in comparison with the previous year. Should hospitals cut expenditure on medication, or on salaries that are guaranteed by law? General hospitals could count on a more or less similar number of patients each year, so it could be presumed that more pressure would fall on university hospitals. However, Keeler suggests that large hospitals face a lower risk from prospective payment and from the consequences of outliers, since they can make transfers between different DRGs (27).

The pre-DRG hospital payment system in Serbia encouraged providers to prolong hospitalisation. In addition, poor coding practices and insufficient planning of hospital admissions resulted in highly heterogeneous data. Therefore, the implementation of the DRG system should be strengthened with the implementation of solutions in different aspects of healthcare. From a clinical perspective, acute beds should be reserved for acute patients, who should afterwards continue treatment either in primary healthcare facilities or in nursing homes. Clinical and integrative pathways in support of knowledge and judgment should be directed at the highest volume conditions and diseases (measured by the proportion of episodes) and at conditions with insufficient within-group homogeneity, in order to reduce LOS and costs. Since there is no “best trimming method”, the choice of method must be based not only on the characteristics of the data sample at hand but also on the goals that health policymakers intend to reach, particularly regarding the announced rationalisation of public health facilities (11, 28). The publishing and comparison of data will certainly improve transparency in clinical practice and spending.

LIMITATIONS

Data quality may affect the measuring of DRG performances (3). In the Vojvodina dataset, a bit more than 1% of cases were identified as errors, which is more than in countries with longer experience in DRG implementation (29).

Episodes with operating room (OR) procedures unrelated to the principal diagnosis were grouped into separate DRGs. Such groups accounted for around 1% of total episodes, in comparison with 0.05% in Australia (30). These are not errors in the strict sense, although some of them may be the result of miscoding. Their percentage should therefore be kept under control, because they could indicate that high-cost episodes have been overlooked. Training clinicians in the correct usage of ICD 10 and ACHI is necessary to avoid such errors. Since proper coding is essential and clinicians are more focused on treatment, the authority should consider training coders who will review, analyse and accurately assign ICD-10-AM/ACHI codes and DRGs to all inpatient episodes.

CONCLUSIONS

A long length of stay, a small percentage of day cases and a substantial number of long-term episodes characterised hospital activity in Vojvodina, together with great heterogeneity of coding practice. AR-DRG could explain 30% of the variation in LOS in the raw dataset, and between 49% and 61% in the trimmed datasets. The percentage of outliers varied from 7% to 24%, depending on the trimming method.

Further studies should test different trimming algorithms and identify factors associated with high length of stay and low R2 for some MDCs using cost data rather than LOS.

eISSN:
1854-2476
Language:
English
Publication timeframe:
4 times per year
Journal Subjects:
Medicine, Clinical Medicine, Hygiene and Environmental Medicine