
Application of data mining in basketball statistics

Published online: 29 Apr 2022
Volume & Issue: AHEAD OF PRINT
Pages: -
Received: 05 Jan 2022
Accepted: 23 Feb 2022
Introduction

The National Basketball Association (NBA) is a professional basketball league in North America. Its enormous influence attracts many fans from all over the world. Because of the significant amount of money and the vast number of fans involved in the NBA league, many studies have attempted to predict the outcome of NBA games by simulating the winning team and analysing player abilities, so as to assist coaches in team organisation. NBA games have accumulated a large amount of historical game data and statistical analysis data. Even so, analysing and predicting these games remains very complicated [1].

Statistics have always been essential for basketball player evaluation, from the simple field-goal average to overall efficiency indicators such as the attack score introduced by Oliver (2004) [2]. A professional sports network is generally staffed with a professional analysis team, which is responsible for collecting and interpreting data from each game and establishing statistical indicators based on player performance in actual play, in order to measure the performance level of players and teams. In predicting the performance of individual athletes, only a few statistical indicators can be used. In order to simplify game analysis and accurately predict game results from data, related technologies such as machine learning have been applied to predict the outcomes of NBA games.

In this paper, machine learning was used to predict player performance before a regular-season game is played, using fantasy points (FPTS) collected from professional sports sites together with NBA players' statistical indicators, in order to predict the outcomes of NBA games. Such prediction only identifies the winning team, without considering the final score of the game. Moreover, the effects of the feature variables of basketball matches on game prediction were analysed, and feature selection was performed. The machine learning models adopted in this paper include the linear regression, extreme gradient boosting (XGBoost) and neural network algorithms [4, 5, 6, 7]. The prediction performance of the three models was compared to select the best machine learning model. This paper comprises five parts. The first part is the introduction. In the second part, the problems to be solved are analysed and the modelling objectives are defined. The third part discusses the machine learning models and algorithms. In the fourth part, model training and model evaluation are introduced. The final part presents an analysis of the prediction results of each studied model, explains the advantages of the selected model and summarises our observations.

Problem Analysis and Modelling Objectives

This paper uses machine learning algorithms to predict player performance from past player data, and from this to predict team performance and the outcomes of basketball games. Specifically, the data adopted, the feature variables constructed and the prediction targets were analysed. First, player game statistics, salary and position information from the three NBA regular seasons between 2015 and 2018 were used. Missing values in the player data were filled with the mean value or marked as NaN. With 30 teams, the NBA plays 82 regular-season games per team, with a maximum of 15 players per game, so the combined dataset contains 89,406 records. The maximum, mean, standard deviation and median of the major variables are shown in Figure 1. Because the variables have very different ranges, the data were standardised to the range [0,1]. For all variables other than salary, there is no significant deviation between the median and the mean, indicating that these variables are only weakly affected by outliers, which can therefore be ignored.

Fig. 1

The influence weights of three weighting modes on matches, for the past 10 matches

Second, indicators were defined to quantify a player's ability, including total points (PTS), three-point shots (3P), total rebounds (TRB), assists (AST), steals (STL), blocks (BLK), turnovers (TOV), double-doubles (DD) and triple-doubles (TD). In addition, another 19 variables from the advanced statistics of Basketball-Reference.com were used in this paper. All variable values fed to the models were weighted means of the data from the past 10 games, because historical data that lie temporally closer to the predicted game are expected to yield higher predictive accuracy.

There were other variables affecting player performance, such as home-court advantage, rest days, team sheets, player positions and salary, which were collected from the DraftKings algorithm or the coaches' decision-making system [8]. The value variable of a player was constructed as the ratio between salary divided by 1,000 and FPTS, and was treated as a heuristic indicator: when the value is higher than 5, it indicates that the player is in good form, with a higher ability evaluation [9]. Moreover, the high dimensionality produced by the many feature variables was addressed by reducing the correlation between variables and selecting important features. Considering the predictive power of the variables, the following model was established to represent the quantitative relationship between the variables and a player's ability value. The ability value y_i of athlete i is defined as:

y_i = PTS_i + 0.5 \cdot 3P_i + 1.25 \cdot TRB_i + 1.25 \cdot AST_i + 2 \cdot STL_i + 2 \cdot BLK_i - 0.5 \cdot TOV_i + 1.5 \cdot DD_i + 3 \cdot TD_i
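A minimal Python sketch of this scoring rule; the function name and the dict-based stat line are illustrative, not from the paper:

```python
def ability_value(s):
    """Compute the ability value y_i from a single-game stat line.

    `s` is assumed to be a dict with keys PTS, 3P, TRB, AST, STL, BLK, TOV,
    DD (double-double, 0/1) and TD (triple-double, 0/1).
    """
    return (s["PTS"] + 0.5 * s["3P"] + 1.25 * s["TRB"] + 1.25 * s["AST"]
            + 2 * s["STL"] + 2 * s["BLK"] - 0.5 * s["TOV"]
            + 1.5 * s["DD"] + 3 * s["TD"])

# Example: 20 pts, 3 threes, 8 rebounds, 11 assists, 1 steal, 0 blocks,
# 2 turnovers, one double-double -> 20 + 1.5 + 10 + 13.75 + 2 - 1 + 1.5 = 47.75
print(ability_value({"PTS": 20, "3P": 3, "TRB": 8, "AST": 11, "STL": 1,
                     "BLK": 0, "TOV": 2, "DD": 1, "TD": 0}))
```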

Finally, this paper aims to predict each player's ability value y_i by minimising the loss between the observed and predicted values. Mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate the predictive performance of the regression, gradient boosting and deep learning algorithms; each algorithm is optimised so that the RMSE between the predicted and observed ability values is minimised:

RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|
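These two metrics can be computed with a few lines of NumPy; the helper names below are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted ability values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted ability values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```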

Introduction to Models and Algorithms

Given the prediction goal of this paper, the task is naturally a regression problem, so regression algorithms were adopted. The model comparison in this paper applies relevant machine learning regression algorithms, including the linear regression, XGBoost and neural network algorithms. This section discusses the principles of these models.

Linear regression algorithm

For linear regression, if a linear relationship exists between the independent variables and the dependent variable, it satisfies the following equation:

h_\theta(x_i) = \theta_0 + \theta_1 x_{i1} + \ldots + \theta_j x_{ij} + \ldots + \theta_n x_{in}

There are m samples, and each sample has n dimensions, where x_i is the i-th input sample vector, h_\theta(x_i) represents the output value, x_{ij} represents the j-th component of the i-th input sample vector and \theta_j represents the j-th component of the \theta vector. The prediction objective is to minimise the loss function of the model:

J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2

The least squares method is used to estimate the parameters; after training, the optimised parameters can be used to predict outcomes by regression.
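As a sketch under assumed synthetic data (the coefficients and noise level are arbitrary illustrations), the least-squares fit can be obtained with NumPy as follows:

```python
import numpy as np

# Hypothetical design matrix X (m samples, n features) and targets y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Add an intercept column and solve the least-squares problem,
# i.e. theta = argmin (1/2) * sum (h_theta(x_i) - y_i)^2.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(theta)  # [theta_0, theta_1, theta_2, theta_3]
```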

XGBoost algorithm

XGBoost belongs to the boosting family of ensemble learning and is an improvement of the gradient-boosted decision tree (GBDT) algorithm. GBDT fits the residual by using the negative gradient of the model on the data as an approximation of the residual. XGBoost also fits the residual of the data, but it uses a second-order Taylor expansion of the loss, improves the loss function of the model and adds a regularisation term for model complexity [10].

XGBoost improves on the performance of GBDT, as can also be seen in various competitions. Under a reasonable parameter setting, GBDT can only achieve satisfactory accuracy after generating a certain number of trees; when the dataset is complex, the model may require thousands of iterations. XGBoost alleviates this problem by exploiting parallel CPU computation. The mathematical principle of XGBoost is as follows.

Boosting is a method that combines weak classifiers into a strong classifier whose prediction is the sum of the individual models. Specifically:

D = \{(x_i, y_i)\} \quad (|D| = n, \; x_i \in \mathbb{R}^m)

\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F

where D represents a dataset with n samples of m dimensions. Each f_k is a tree model, which can be expressed as:

F = \{f(x) = w_{q(x)}\} \quad (q: \mathbb{R}^m \to T, \; \omega \in \mathbb{R}^T)

where q(x) maps the sample x to a leaf node of the tree, and \omega contains the predicted values used to fit the samples belonging to the respective leaf nodes. From the second round onwards, each training round takes as input the residual between the predicted and actual values of the previous round, so the result at each leaf node is a prediction of that residual. The sum of these predictions over all rounds is the final prediction.

Determining the objective function

The objective function is a key feature of XGBoost. In order to prevent overfitting, the objective function of XGBoost consists of a loss term and a complexity term, where the complexity comprises the number of leaves and the L2 norm of the leaf weights:

L(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \|\omega\|^2

Since the loss and complexity terms are both convex, the objective has a minimum. Intuitively, \omega fits the residual with respect to the actual values, and adding the L2 regularisation of \omega to the objective function prevents overfitting.

Optimising the solution

Once the objective function is determined, training proceeds iteratively. In the t-th iteration, the objective for training a new tree can be written as:

L^{(t)} = \sum_i l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)

The input is the prediction after the (t-1)-th round, and the new tree is fitted to the residual with respect to the actual values. Applying a second-order Taylor expansion to the loss gives:

L^{(t)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)

where g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right) and h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}^{(t-1)}\right).
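To bridge to the leaf weights below (a standard step in the XGBoost derivation that the text skips), drop the constant term l(y_i, \hat{y}_i^{(t-1)}), group the samples by the leaf they fall into, with I_j = \{i : q(x_i) = j\}, and expand \Omega(f_t):

\tilde{L}^{(t)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T

This is a separate quadratic in each leaf weight w_j, and setting its derivative to zero yields the optimal weight given next.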

Minimising the objective with respect to each leaf weight gives the optimal weight:

w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}

Substituting this back into the objective gives the minimum value:

\tilde{L}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T

Thus, the \omega obtained above is the optimal solution of the objective function for a given tree structure and sample partition.

Neural network

Neural network algorithms are widely used in all subdomains of artificial intelligence and are only briefly introduced here. w_{ij} denotes the strength of the connection between neurons i and j. A neural network contains three groups of neurons: a visible set V, a hidden set H and an output set O. The neurons in V receive signals and pass them to the hidden neurons in H. For the deep neural network in this paper, the ReLU function is used as the main activation function [11, 12, 13]:

f(y_i) = \begin{cases} y_i, & y_i > 0 \\ 0, & y_i \le 0 \end{cases}

Model establishment
Data processing

The data were pre-processed by filling missing values, unifying variable names and standardising the variables. Missing data were filled with the mean value, the data from different sources were processed uniformly, and the player data were min-max standardised to fall into the range [0,1]. Because the model uses a fixed number of historical games, each statistic has a comparable computational cost across seasons. The weights assigned to each game are shown in Figure 1. Since the first 10 games of each season are ignored, the number of records is reduced to 75,239.
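A minimal sketch of this preprocessing with pandas and scikit-learn, using a tiny made-up frame in place of the real player table:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny illustrative frame; the real data has tens of thousands of player-game records.
df = pd.DataFrame({"PTS": [25, 8, np.nan, 31],
                   "TRB": [10, np.nan, 4, 6],
                   "Salary": [8200, 3500, 4100, 9800]})

# Fill missing values with the column mean, then min-max standardise to [0, 1].
df = df.fillna(df.mean(numeric_only=True))
df[df.columns] = MinMaxScaler().fit_transform(df)
print(df)
```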

Feature selection

In addition to the variables directly used to calculate FPTS, further variables from the advanced statistics were selected for the models to quantify player ability. The details are discussed in Part 2.

Using data from recent games allows a more objective and accurate prediction of players' abilities, so each variable was computed as the weighted mean of the past 10 games. Intuitively, the weight of a game should increase with its recency. In this paper, square-root, linear and quadratic weighting modes were evaluated quantitatively, with the weights normalised so that they sum to 1. As shown in Figure 1, the quadratic mode is the better weighting method and was therefore adopted in this paper.
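A small sketch of the three weighting modes and the normalised weighted mean (function names are illustrative):

```python
import numpy as np

def game_weights(n_games=10, mode="quadratic"):
    """Weights for the most recent `n_games`, normalised to sum to 1.

    mode: 'sqrt', 'linear' or 'quadratic' (the paper's three candidates);
    game 1 is the oldest, game n_games the most recent.
    """
    k = np.arange(1, n_games + 1, dtype=float)
    w = {"sqrt": np.sqrt(k), "linear": k, "quadratic": k ** 2}[mode]
    return w / w.sum()

def weighted_recent_mean(values, mode="quadratic"):
    """Weighted mean of one statistic over a player's last games (oldest first)."""
    v = np.asarray(values, dtype=float)
    return float(v @ game_weights(len(v), mode))
```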

For consistency of the feature variables, the standard deviation of the FPTS variable over the last 10 games was defined as FPTS_std, and salary information from DraftKings was defined as the Salary variable. Before a game, whether a player will actually participate cannot be determined by the model, so the values of the model's feature variables must be calculated according to the player sheet published before the game [14].

Fig. 2

Feature importance ranking

Because features differ in their predictive power for game outcomes and are correlated with one another, they should be filtered and ranked. Taking field goals (FG) as an example, this variable is highly correlated with effective field-goal percentage (eFG%), because the latter considers far fewer free-throw points. Furthermore, some variables are multicollinear, such as three-pointers (3P), three-point attempts (3PA) and three-point percentage (3P%). In this paper, the Pearson correlation coefficient was used to screen features. With a correlation threshold of 0.90, the following features were screened out: three-point attempts (3PA), defensive rebounds (DRB), field goals attempted (FGA), field-goal percentage (FG%) and offensive rating (ORtg). In addition, variables without predictive ability were ignored in the models. The gradient descent method was used to evaluate and quantify the importance of the 34 features. Based on the feature ranking, features such as the position dummy variables (SG, F, C), three-pointers (3P) and triple-doubles (TD) were excluded. Finally, the remaining 29 variables were used as the selected features for the regression, gradient boosting and deep learning models.
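A generic sketch of correlation-threshold screening with pandas; which member of a correlated pair is kept is an implementation choice, not specified by the paper:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.90):
    """Drop one feature from every pair whose |Pearson r| exceeds `threshold`."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop), to_drop
```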

Fig. 3

Correlation matrix

Model prediction and evaluation

The results of linear regression on all variables, using the three weighting modes, are shown in Table 1. A 5-fold cross validation was applied when the linear regression model was trained. According to the linear regression predictions, the minimum RMSE is 9.5356.

Prediction effect of linear model

Weighting mode    Square Root    Linear    Quadratic

RMSE              9.5823         9.5437    9.5356
MAE               7.2526         7.2124    7.1963

MAE, mean absolute error; RMSE, root mean square error.

For the XGBoost model, the parameters were adjusted and optimised during training. First, the RMSE produced by XGBRegressor with the 29 variables selected above, before any parameter adjustment, is 9.0316. A Bayesian optimisation algorithm was then used to tune the parameters and improve the model. The search distributions of the parameters to be optimised were uniform, except for the learning rate, which was drawn from a log-uniform distribution. Candidate parameter sets were sampled at random for the first five evaluations and then improved iteratively around the best-performing sets. In each iteration, 5-fold cross validation was used to evaluate a given parameter set. After a series of iterations, the best-performing parameter set was:

max_depth: 5, n_estimators: 354, min_child_weight: 0, gamma: 0.8, learning_rate: 0.0152. Training with these tuned parameters yields better performance (an RMSE of 8.9581).
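A minimal sketch of this setup in Python is given below. The synthetic X and y are placeholders for the 29 selected, quadratically weighted features and the observed ability values, and min_child_weight is assumed to be the parameter the paper calls "n_child_weight"; the Bayesian search itself is omitted and only the reported final parameters are used.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Placeholder data standing in for the 29 selected features and ability values.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 29))
y = 3 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=1000)

# The tuned hyperparameters reported above.
model = XGBRegressor(max_depth=5, n_estimators=354, min_child_weight=0,
                     gamma=0.8, learning_rate=0.0152)

# 5-fold cross-validated RMSE, as used for evaluation in the paper.
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_root_mean_squared_error")
print("5-fold CV RMSE:", -scores.mean())
```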

For the neural network model, Figure 4 shows the learning process. A total of 20% of the training data was retained as validation data. The model soon starts to overfit, with the validation loss diverging from the training loss. To prevent overfitting, a dropout layer was added, which randomly drops 40% of the activations fed to the last layer. In addition, the EarlyStopping method from keras.callbacks was applied: if the validation loss does not improve for five epochs, training is terminated. This solution improved the model performance, reducing the RMSE from 9.0678 for the original model to 9.0387.
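A minimal Keras sketch of this configuration follows. The hidden-layer sizes and the synthetic data are assumptions for illustration only; the paper specifies the 40% dropout, the ReLU activation, the 20% validation split and the five-epoch early-stopping patience, but not the exact architecture.

```python
import numpy as np
from tensorflow import keras

# Placeholder data standing in for the 29 selected features and ability values.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 29)).astype("float32")
y = (3 * X[:, 0] + X[:, 1] + rng.normal(size=1000)).astype("float32")

# ReLU hidden layers, a 40% dropout before the output layer.
model = keras.Sequential([
    keras.layers.Input(shape=(29,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop training once the validation loss stops improving for 5 epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[early_stop], verbose=0)
```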

Fig. 4

Neural network model training effect

Finally, the performance of the three models was compared, as shown in Table 2. The models in the first three rows compute their predictions as linear combinations of the defined coefficients, while in the fourth and fifth rows the XGBoost and neural network algorithms use the 29 selected variables and quadratic weights. The XGBoost algorithm clearly shows the best predictive performance, with the lowest RMSE (8.9581) and MAE (6.8486). Based on these training results, the XGBoost regression model with hyperparameter tuning was selected as the final prediction model for basketball games.

Comparison of three machine learning models

Model              RMSE      MAE

Simple Average     9.9434    7.4285
Weight Average     9.7475    7.2059
Linear Regression  9.2558    7.0478
XGBoost            8.9581    6.8486
Neural Network     9.0387    6.8805

MAE, mean absolute error; RMSE, root mean square error; XGBoost, extreme gradient boosting.

After the prediction calculation and 5-fold cross validation, consistent RMSEs (8.9910, 9.0522, 9.0148, 8.9351 and 9.0831) were obtained. To further examine the robustness of the model, the effect of small changes in the input data on model performance was also studied. Gaussian noise with mean 0 and variance 1 was generated, scaled to the range [0, 0.2] and added to the continuous variables of the original input. When 5-fold cross validation was performed on the noisy inputs, the losses barely changed, with MAE and RMSE increasing only to 6.8964 and 9.0153, respectively. The model therefore shows good robustness. Overall, the final model improved the RMSE by nearly 10%, from 9.9434 for the simple-average baseline to 8.9581.
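A sketch of this robustness check; the exact way the paper scales the noise into [0, 0.2] is not specified, so the min-max scaling below is an assumption:

```python
import numpy as np

def add_scaled_gaussian_noise(X, low=0.0, high=0.2, seed=0):
    """Draw N(0, 1) noise, min-max scale it into [low, high], and add it to X."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, size=X.shape)
    noise = (noise - noise.min()) / (noise.max() - noise.min())  # -> [0, 1]
    noise = low + noise * (high - low)                           # -> [low, high]
    return X + noise
```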

A t-test was used to verify that the RMSE of the XGBoost model is better than the RMSE (9.9434) of the baseline model, which uses a simple mean. The null hypothesis is that the XGBoost predictor's performance equals the average performance of the baseline model on the sample. Assuming that the two groups follow a normal distribution, the test statistic can be calculated from the standard deviation σ and the sample size n of the 5-fold cross-validation results, because the central limit theorem applies to quantities involving the mean. The cross-validation RMSEs are:

8.9910, 9.0522, 9.0148, 8.9351, 9.0831

Let \tilde{L} be the RMSE of the XGBoost predictions and L_0 the RMSE of the baseline model. The t statistic is:

t = \frac{L_0 - \tilde{L}}{\sigma / \sqrt{n}} = \frac{9.9434 - 8.9581}{0.0405 / \sqrt{5}} = 121.51 > 15.54

The result is significant at the 0.1% level, with four degrees of freedom and a critical value of 15.54. Therefore, the weighted mean, feature engineering and XGBoost regression can be used to improve the accuracy of predicted outcomes of NBA games.

Conclusion

In this paper, the steps and processes of solving an application problem with machine learning methods are discussed: analysing the problem (predicting NBA games), collecting and processing data, feature selection, model training, evaluation and optimisation. Indicators from DraftKings were used to predict how NBA players perform in regular-season games, with the models trained to minimise the RMSE between the predicted values and the actual FPTS statistics. The work started from a baseline linear model using averages of past seasonal statistics, and then applied the weighting methods and feature selection to extract important features. The key feature variables for predicting a player's ability included salary information, team, the player sheet provided by DraftKings, and other important statistical factors such as total rebounds and individual points scored. After feature selection and data normalisation, the XGBoost model with hyperparameter tuning achieved the lowest RMSE of 8.9581, compared with 9.2558 for the linear regression model and 9.0387 for the neural network. The XGBoost model clearly performs best.

Moreover, the model predictions were compared against actual FPTS statistics for relevant games selected from the training data. The algorithm was applied to five games broadcast on 10 March 2018 and produced an eight-player lineup with an expected FPTS of 242.2643 and a total salary of US$ 49,900. In Figure 5, the blue, orange and green bars represent the actual FPTS, the predictions of the final model and the predictions of the baseline linear model, respectively; player positions are shown below the players' names. For certain athletes, such as Dillon Brooks (SF) and Dwight Powell (C), the final model's predictions were much closer to the actual FPTS than those of the baseline linear model. However, it tends to overestimate the FPTS of players such as Tomas Satoransky (G) and Kobi Simmons (F). Overall, the predictions for the match were very accurate, with losses of 6.2836 (MAE) and 7.6538 (RMSE).

Throughout the data mining exercise, the feature importance and feature correlation matrix were essential to understanding how statistical indicators affect the predicted outcome of a competition. Most importantly, data processing and feature extraction in this project have a great impact on the predicted results and deserve particular attention. In particular, unifying variable names and handling missing values during data processing, and ranking and combining important features during feature selection, greatly affect the final model training.

Fig. 5

Comparison between predicted and actual values

While the models used in this paper generally followed the expected performance ranking of the algorithms themselves, the final improvement in RMSE was less than 10%. If the opponents' defensive data, such as a team's defensive rating and the positions of opposing players, were considered, the prediction accuracy of the model might improve. Another important factor is that the coach determines when a player enters the field. When simulated players are used, it becomes possible to observe how playing minutes vary under different game-management strategies and, in particular, how formation tactics change during games. These factors can be modelled for quantitative evaluation and incorporated into the models. DraftKings also provides views from news outlets and professional reviewers, which could be combined through natural language processing to support performance prediction and formation optimisation. In conclusion, machine learning algorithms are well worth applying to basketball game prediction and the quantitative evaluation of player ability, and this direction merits further research.



References

[1] Leung C K, Joseph K W. Sports data mining: predicting results for the college football games. Procedia Computer Science, 2014, 35: 710–719. doi:10.1016/j.procs.2014.08.153

[2] Oliver D. Basketball on Paper. Brassey's, Inc., 2002.

[3] Barry C, Canova N, Capiz K. Beating Draftkings at Daily Fantasy Sports. 2017. [Online; accessed 04-May-2018].

[4] Loeffelholz B, Bednar E, Bauer K W. Predicting NBA games using neural networks. Journal of Quantitative Analysis in Sports, 2009, 5(1): 1–17. doi:10.2202/1559-0410.1156

[5] Huang K-Y, Chang W-L. A neural network method for prediction of 2006 World Cup football game. In: IJCNN, IEEE, 2010: 1–8. doi:10.1109/IJCNN.2010.5596458

[6] Kvam S P, Sokol J S. A logistic regression/Markov chain model for NCAA basketball. Naval Research Logistics, 2006, 53: 788–803. doi:10.1002/nav.20170

[7] Purucker M C. Neural network quarterbacking. IEEE Potentials, 1996, 15(3): 9–15. doi:10.1109/45.535226

[8] DraftKings. NBA: Rules & Scoring, 2018. [Online; accessed 11-May-2018].

[9] Brown M, Kvam P, Nemhauser G, Sokol J. Insights from the LRMC method for NCAA tournament predictions. In: MIT Sloan Sports Conference, March 2012.

[10] Chen T. Introduction to Boosted Trees, n.d. [Online; accessed 15-May-2018].

[11] Schulte O, Khademi M, Gholami S, et al. A Markov game model for valuing actions, locations, and team performance in ice hockey. Data Mining & Knowledge Discovery, 2017, 31(5814): 1–23. doi:10.1007/s10618-017-0496-z

[12] Saikia H, Bhattacharjee D, Lemmer H. Predicting the performance of bowlers in IPL: An application of artificial neural network. International Journal of Performance Analysis in Sport, 2012, 12(12): 75–89. doi:10.1080/24748668.2012.11868584

[13] Milich R, Murphy D A. Appliance of neural networks in basketball scouting. Acta Polytechnica Hungarica, 2010, 7(4): 201–213.

[14] Carpita M, Sandri M, Simonetto A, et al. Discovering the drivers of football match outcomes with data mining. Quality Technology & Quantitative Management, 2015, 12(4): 561–577. doi:10.1080/16843703.2015.11673436
