Application of improved distributed naive Bayesian algorithms in text classification

The Journal of Supercomputing

Abstract

The naive Bayes classifier is a widely used method that applies statistical theory to text classification. Because of the nature of text, related feature items can jointly carry semantic information that is lost when a document is represented with the traditional vector space model. This paper studies the construction and improvement of a distributed naive Bayes automatic classification system, with the application of Hadoop cloud computing to web page classification as one focus. First, the text classification system and the Bayesian classification model are analyzed, including the representation and extraction of text information, text classification methods in general, and Bayesian text classification methods in particular. Then, to address the shortcomings of the naive Bayesian text classification method, during training we use the mutual information method to check the correlation between the features produced by feature selection, and combine the features that are highly correlated. Experimental data from a series of tests show that the improved text classification system achieves better classification results.
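The feature-combination step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes pointwise mutual information over document-level co-occurrence as the correlation measure; the function names (`pmi`, `correlated_pairs`, `add_combined_features`), the threshold value, and the toy documents are all hypothetical and are not taken from the paper. It is not the authors' implementation and leaves out the Hadoop distribution entirely.

```python
import math
from itertools import combinations


def pmi(docs, term_a, term_b):
    """Pointwise mutual information (in bits) of two terms co-occurring.

    Each document is a set of terms. Assumed measure, not the paper's formula.
    """
    n = len(docs)
    n_a = sum(1 for d in docs if term_a in d)
    n_b = sum(1 for d in docs if term_b in d)
    n_ab = sum(1 for d in docs if term_a in d and term_b in d)
    if n_ab == 0:
        # Terms that never co-occur cannot form a useful combined feature.
        return float("-inf")
    return math.log2((n_ab / n) / ((n_a / n) * (n_b / n)))


def correlated_pairs(docs, vocabulary, threshold):
    """Return (term_a, term_b, score) for pairs scoring at least `threshold`."""
    pairs = []
    for a, b in combinations(sorted(vocabulary), 2):
        score = pmi(docs, a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


def add_combined_features(doc, pairs):
    """Add a combined feature 'a+b' for every pair fully present in `doc`."""
    combined = {f"{a}+{b}" for a, b, _ in pairs if a in doc and b in doc}
    return set(doc) | combined


# Hypothetical toy corpus: each document is the set of terms kept by
# feature selection.
docs = [
    {"machine", "learning", "model"},
    {"machine", "learning", "data"},
    {"football", "match", "goal"},
    {"football", "goal", "data"},
]
vocabulary = set().union(*docs)

pairs = correlated_pairs(docs, vocabulary, threshold=0.5)
augmented = add_combined_features(docs[1], pairs)
```

In this sketch the combined feature is added alongside the original terms rather than replacing them, which is only one possible reading of "combine the features"; a naive Bayes classifier would then be trained on the augmented term sets.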

Acknowledgements

The study was supported by the Chinese National Natural Science Fund Project, China (Grant No. 61802271).

Author information

Corresponding author

Correspondence to Hongyi Gao.

About this article

Cite this article

Gao, H., Zeng, X. & Yao, C. Application of improved distributed naive Bayesian algorithms in text classification. J Supercomput 75, 5831–5847 (2019). https://doi.org/10.1007/s11227-019-02862-1

  • DOI: https://doi.org/10.1007/s11227-019-02862-1
