Bach, F. and Moulines, E. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning, in J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett, F. Pereira and K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc., Red Hook, NY, pp. 451-459.
Balamurugan, P., Shevade, S., Sundararajan, S. and Keerthi, S.S. (2011). A sequential dual method for structural SVMs, SDM 2011-Proceedings of the 11th SIAM International Conference on Data Mining, Mesa, AZ, USA, DOI: 10.1137/1.9781611972818.20.
Bottou, L. (2008). SGD implementation, http://leon.bottou.org/projects/sgd.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, New York, NY, DOI: 10.1017/CBO9780511804441.
Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Vol. 10, Association for Computational Linguistics, Stroudsburg, PA, pp. 1-8.
Collins, M., Globerson, A., Koo, T., Carreras, X. and Bartlett, P.L. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks, Journal of Machine Learning Research 9: 1775-1822.
Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S. and Singer, Y. (2006). Online passive-aggressive algorithms, Journal of Machine Learning Research 7: 551-585.
Crammer, K., McDonald, R. and Pereira, F. (2005). Scalable large-margin online learning for structured classification, NIPS Workshop on Learning with Structured Outputs, Vancouver/Whistler, Canada.
Daume, III, H.C. (2006). Practical Structured Learning Techniques for Natural Language Processing, Ph.D. thesis, University of Southern California, Los Angeles, CA.
Do, C.B., Le, Q.V., Teo, C.H., Chapelle, O. and Smola, A.J. (2008). Tighter bounds for structured estimation, in D. Koller (Ed.), Advances in Neural Information Processing Systems, Curran Associates, Inc., Red Hook, NY, pp. 281-288.
Gimpel, K. and Smith, N.A. (2010). Softmax-margin CRFs: Training log-linear models with cost functions, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, CA, USA, pp. 733-736.
Jaggi, M., Lacoste-Julien, S., Schmidt, M. and Pletscher, P. (2012). Block-coordinate Frank-Wolfe for structural SVMs, NIPS Workshop on Optimization for Machine Learning, Lake Tahoe, NV, USA.
Joachims, T., Finley, T. and Yu, C.-N.J. (2009). Cutting-plane training of structural SVMs, Machine Learning 77(1): 27-59, DOI: 10.1007/s10994-009-5108-8.
Lafferty, J.D., McCallum, A. and Pereira, F.C.N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, ICML’01, San Francisco, CA, USA, pp. 282-289.
Lee, C., Ryu, P.-M. and Kim, H. (2011). Named entity recognition using a modified Pegasos algorithm, Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, UK, pp. 2337-2340.
Li, M., Lin, L., Wang, X. and Liu, T. (2007). Protein-protein interaction site prediction based on conditional random fields, Bioinformatics 23(5): 597-604, DOI: 10.1093/bioinformatics/btl660, PMID: 17234636.
Lim, S., Lee, C. and Ra, D. (2013). Dependency-based semantic role labeling using sequence labeling with a structural SVM, Pattern Recognition Letters 34(6): 696-702, DOI: 10.1016/j.patrec.2013.01.022.
Martins, A.F.T., Smith, N.A., Xing, E.P., Aguiar, P.M.Q. and Figueiredo, M.A.T. (2011). Online learning of structured predictors with multiple kernels, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, Vol. 15, pp. 507-515.
McDonald, R., Crammer, K. and Pereira, F. (2005). Online large-margin training of dependency parsers, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL’05, Ann Arbor, MI, USA, pp. 91-98.
Nagata, M. (1994). A stochastic Japanese morphological analyzer using a forward-DP backward-A* N-best search algorithm, Proceedings of the 15th Conference on Computational Linguistics, COLING ’94, Kyoto, Japan, Vol. 1, pp. 201-207.
Nemirovski, A., Juditsky, A., Lan, G. and Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization 19(4): 1574-1609, DOI: 10.1137/070704277.
Ni, Y., Saunders, C., Szedmak, S. and Niranjan, M. (2010). The application of structured learning in natural language processing, Machine Translation 24(2): 71-85, DOI: 10.1007/s10590-010-9078-1.
Nowozin, S. and Lampert, C.H. (2011). Structured learning and prediction in computer vision, Foundations and Trends in Computer Graphics and Vision 6(3-4): 185-365, DOI: 10.1561/0600000033.
Platt, J.C. (1999). Fast training of support vector machines using sequential minimal optimization, in B. Schölkopf, C.J.C. Burges and A.J. Smola (Eds.), Advances in Kernel Methods, MIT Press, Cambridge, MA, pp. 185-208.
Rakhlin, A., Shamir, O. and Sridharan, K. (2012). Making gradient descent optimal for strongly convex stochastic optimization, in J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML-12), Edinburgh, UK, pp. 449-456.
Ratliff, N.D., Bagnell, J.A. and Zinkevich, M.A. (2006). Subgradient methods for maximum margin structured learning, ICML Workshop on Learning in Structured Output Spaces, Pittsburgh, PA, USA.
Sas, J. and Żołnierek, A. (2013). Pipelined language model construction for Polish speech recognition, International Journal of Applied Mathematics and Computer Science 23(3): 649-668, DOI: 10.2478/amcs-2013-0049.
Shalev-Shwartz, S., Singer, Y. and Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM, Proceedings of the 24th International Conference on Machine Learning, ICML ’07, Corvallis, OR, USA, pp. 807-814.
Shalev-Shwartz, S., Singer, Y., Srebro, N. and Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM, Mathematical Programming 127(1): 3-30, DOI: 10.1007/s10107-010-0420-4.
Shamir, O. (2012). Open problem: Is averaging needed for strongly convex stochastic gradient descent?, Journal of Machine Learning Research 23: 47-1.
Shamir, O. and Zhang, T. (2012). Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes, arXiv preprint, arXiv:1212.1824.
Soong, F.K. and Huang, E.-F. (1991). A tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, Vol. 1, pp. 705-708.
Taskar, B., Guestrin, C. and Koller, D. (2004). Max-margin Markov networks, in S. Thrun, L. Saul and B. Schölkopf (Eds.), Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA, pp. 25-32.
Tjong Kim Sang, E.F. and Buchholz, S. (2000). Introduction to the CoNLL-2000 shared task: Chunking, Proceedings of the 2nd Workshop on Learning Language in Logic/4th Conference on Computational Natural Language Learning, Lisbon, Portugal, Vol. 7, pp. 127-132.
Tsochantaridis, I., Joachims, T., Hofmann, T. and Altun, Y. (2005). Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research 6: 1453-1484.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory 13(2): 260-269.
Weston, J. and Watkins, C. (1998). Multi-class support vector machines, Technical report, Department of Computer Science, Royal Holloway, University of London, London.
Xu, W. (2011). Towards optimal one pass large scale learning with averaged stochastic gradient descent, arXiv preprint, arXiv:1107.2490.