Duplicate Literature Detection for Cross-Library Search

eISSN: 1314-4081
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology

Published Online: Jun 22, 2016

Page range: 160 - 178

DOI: https://doi.org/10.1515/cait-2016-0028

Keywords: Information integration, digital library, duplicate detection, schema mapping, data cleaning

© by Wei Liu

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.