Interpretable decision-tree induction in a big data parallel framework

AlSabti, K., Ranka, S. and Singh, V. (1998). CLOUDS: Classification for large or out-of-core datasets, Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 2-8.

Amado, N., Gama, J. and Silva, F. (2001). Parallel implementation of decision tree learning algorithms, in P. Brazdil and A. Jorge (Eds.), Progress in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 6-13. https://doi.org/10.1007/3-540-45329-6_4

Amado, N., Gama, J. and Silva, F. (2003). Exploiting parallelism in decision tree induction, ECML/PKDD Workshop on Parallel and Distributed Computing for Machine Learning, Cavtat/Dubrovnik, Croatia, pp. 13-22.

Andrzejak, A., Langner, F. and Zabala, S. (2013). Interpretable models from distributed data via merging of decision trees, IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Savannah, GA, USA, pp. 1-9.

Bekkerman, R., Bilenko, M. and Langford, J. (2011). Scaling up Machine Learning: Parallel and Distributed Approaches, Cambridge University Press, Cambridge. https://doi.org/10.1145/2107736.2107740

Ben-Haim, Y. and Tom-Tov, E. (2010). A streaming parallel decision tree algorithm, The Journal of Machine Learning Research 11: 849-872.

Breiman, L. (1999). Pasting small votes for classification in large databases and on-line, Machine Learning 36(1-2): 85-103. https://doi.org/10.1023/A:1007563306331

Dai, W. and Ji, W. (2014). A MAPREDUCE implementation of C4.5 decision tree algorithm, International Journal of Database Theory and Application 7(1): 49-60.

DeWitt, D.J., Naughton, J.F. and Schneider, D. (1991). Parallel sorting on a shared-nothing architecture using probabilistic splitting, Proceedings of the 1st International Conference on Parallel and Distributed Information Systems, Miami Beach, FL, USA, pp. 280-291.

Domingos, P. and Hulten, G. (2000). Mining high-speed data streams, Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pp. 71-80.

Fan, W. and Bifet, A. (2013). Mining big data: Current status, and forecast to the future, ACM SIGKDD Explorations Newsletter 14(2): 1-5. https://doi.org/10.1145/2481244.2481246

Gehrke, J., Ganti, V., Ramakrishnan, R. and Loh, W.-Y. (1999). BOAT: Optimistic decision tree construction, in S. Davidson and C. Faloutsos (Eds.), ACM SIGMOD Record, Vol. 28, ACM, New York, NY, pp. 169-180. https://doi.org/10.1145/304181.304197

Goil, S. and Choudhary, A. (2001). Parsimony: An infrastructure for parallel multidimensional analysis and data mining, Journal of Parallel and Distributed Computing 61(3): 285-321. https://doi.org/10.1006/jpdc.2000.1691

Hansen, L.K. and Salamon, P. (1990). Neural network ensembles, IEEE Transactions on Pattern Analysis & Machine Intelligence 12(10): 993-1001. https://doi.org/10.1109/34.58871

Jin, R. and Agrawal, G. (2003). Communication and memory efficient parallel decision tree construction, Proceedings of the 3rd SIAM International Conference on Data Mining, San Francisco, CA, USA, pp. 119-129.

Joshi, M.V., Karypis, G. and Kumar, V. (1998). SCALPARC: A new scalable and efficient parallel classification algorithm for mining large datasets, Parallel Processing Symposium, Los Alamitos, CA, USA, pp. 573-579.

Kargupta, H. and Park, B.-H. (2004). A Fourier spectrum-based approach to represent decision trees for mining data streams in mobile environments, IEEE Transactions on Knowledge and Data Engineering 16(2): 216-229. https://doi.org/10.1109/TKDE.2004.1269599

Kourtellis, N., Morales, G.D.F., Bifet, A. and Murdopo, A. (2016). VHT: Vertical Hoeffding tree, arXiv preprint, 1607.08325.

Louppe, G. and Geurts, P. (2012). Ensembles on random patches, in P.A. Flach et al. (Eds.), Machine Learning and Knowledge Discovery in Databases, Springer, Berlin/Heidelberg, pp. 346-361. https://doi.org/10.1007/978-3-642-33460-3_28

Mehta, M., Agrawal, R. and Rissanen, J. (1996). SLIQ: A fast scalable classifier for data mining, in P. Apers et al. (Eds.), Advances in Database Technology, Springer, Berlin/Heidelberg, pp. 18-32. https://doi.org/10.1007/BFb0014141

Miglio, R. and Soffritti, G. (2004). The comparison between classification trees through proximity measures, Computational Statistics & Data Analysis 45(3): 577-593. https://doi.org/10.1016/S0167-9473(03)00063-X

Narlikar, G.J. (1998). A parallel, multithreaded decision tree builder, Technical report, DTIC Document, http://www.dtic.mil/docs/citations/ADA363531

Ntoutsi, I., Kalousis, A. and Theodoridis, Y. (2008). A general framework for estimating similarity of datasets and decision trees: Exploring semantic similarity of decision trees, in C. Apte et al. (Eds.), SIAM Conference on Data Mining, SIAM, Philadelphia, PA, pp. 810-821. https://doi.org/10.1137/1.9781611972788.73

Panda, B., Herbach, J.S., Basu, S. and Bayardo, R.J. (2009). PLANET: Massively parallel learning of tree ensembles with MapReduce, Proceedings of the VLDB Endowment 2(2): 1426-1437. https://doi.org/10.14778/1687553.1687569

Pawlik, M. and Augsten, N. (2011). RTED: A robust algorithm for the tree edit distance, Proceedings of the VLDB Endowment 5(4): 334-345. https://doi.org/10.14778/2095686.2095692

Shafer, J., Agrawal, R. and Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining, International Conference on Very Large Data Bases, Mumbai (Bombay), India, pp. 544-555.

Shannon, W.D. and Banks, D. (1999). Combining classification trees using MLE, Statistics in Medicine 18(6): 727-740. https://doi.org/10.1002/(SICI)1097-0258(19990330)18:6<727::AID-SIM61>3.0.CO;2-2

Sollich, P. and Krogh, A. (1996). Learning with ensembles: How overfitting can be useful, in D.S. Touretzky et al. (Eds.), Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, pp. 190-196.

Sreenivas, M.K., AlSabti, K. and Ranka, S. (2000). Parallel out-of-core decision tree classifiers, in H. Kargupta and P. Chan (Eds.), Advances in Distributed and Parallel Knowledge Discovery, Cambridge, MA, pp. 317-336.

Srivastava, A., Han, E.-H., Kumar, V. and Singh, V. (1995). Parallel formulations of decision-tree classification algorithms, Data Mining and Knowledge Discovery 3(3): 237-261. https://doi.org/10.1007/0-306-47011-X_2

Triguero, I., Peralta, D., Bacardit, J., García, S. and Herrera, F. (2015). MRPR: A MAPREDUCE solution for prototype reduction in big data classification, Neurocomputing 150(A): 331-345. https://doi.org/10.1016/j.neucom.2014.04.078

Zhang, K. and Shasha, D. (1989). Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing 18(6): 1245-1262. https://doi.org/10.1137/0218082

Zhang, X. and Jiang, S. (2012). A splitting criteria based on similarity in decision tree learning, Journal of Software 7(8): 1775-1782. https://doi.org/10.4304/jsw.7.8.1775-1782

Zhang, Y., Gao, Q., Gao, L. and Wang, C. (2012). IMAPREDUCE: A distributed computing framework for iterative computation, Journal of Grid Computing 10(1): 47-68. https://doi.org/10.1007/s10723-012-9204-9

eISSN: 2083-8492
Language: English

Frequency: 4 issues per year
Journal subjects: Mathematics, Applied Mathematics

Published online: 13 Jan 2018

Pages: 737-748

Received: 30 Nov 2016

Accepted: 08 Aug 2017

DOI: https://doi.org/10.1515/amcs-2017-0051

Keywords: big data, parallel computing, MAPREDUCE, decision trees, editing distance, tree similarity

© by Mark Last

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.