Wednesday, February 6, 2019

Essay --

Chapter Four: Related Work

Several studies and experiments have addressed text categorization for Arabic text; each takes up some points and leaves others, depending on the type of study. The work in [68] classified Arabic text and found the approach robust and reliable even without morphological analysis. The work in [71] is a comparative study that uses N-grams with two measures, the Manhattan measure and Dice's measure; experiments on four categories showed that N-grams with Dice's measure performed better than N-grams with the Manhattan measure. Another work [83], "Text Classification from Labeled and Unlabeled Documents using EM", proposes an algorithm that uses expectation-maximization with the naive Bayes classifier to learn from labeled and unlabeled documents. The first step trains a classifier using the labeled documents and probabilistically labels the unlabeled documents. A new classifier is then trained using the labels for all the documents, and the process is repeated until convergence.

Many studies have been proposed for the problem of Arabic text classification. In this section we mention the main algorithms used in these studies, such as Decision tree [36], KNN [37, 38, 39, 40], NB [17, 41, 42], N-Gram frequency [5, 45], Rocchio [4], SVM [19, 21, 43], and distance-based classifiers [46, 47, 48].

Syiam et al. [40] presented an intelligent Arabic text categorization system that utilizes the KNN and Rocchio profile-based [50] classifiers to classify a set of Arabic text documents collected from three Egyptian newspapers, Al Ahram, Al Gomhoria, and Al Akhbar, during the period from August 1998 to September 2004. The corpus contains 1132 documents with 39468 words and covers six topics. Three approaches were adopted as pre... ... Agency website.
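To make the N-gram comparison reported for [71] more concrete, the following is a minimal sketch of how a document profile can be compared against category profiles with Dice's measure and the Manhattan measure. It is not the implementation from [71]: the function names, the use of character trigrams, and the use of relative frequencies are all assumptions made for illustration. The toy strings are Latin, but Python strings are Unicode, so Arabic text can be passed in the same way.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (hypothetical profile format)."""
    text = " ".join(text.split())  # collapse runs of whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def dice_similarity(p, q):
    """Dice's measure on the sets of n-grams: 2|A & B| / (|A| + |B|)."""
    a, b = set(p), set(q)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def manhattan_distance(p, q):
    """Sum of absolute differences between n-gram relative frequencies."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify(doc, category_profiles, n=3):
    """Return the category chosen by each measure.

    Dice is a similarity (higher is closer); Manhattan is a distance
    (lower is closer), so one uses max and the other min.
    """
    prof = ngram_profile(doc, n)
    by_dice = max(category_profiles,
                  key=lambda c: dice_similarity(prof, category_profiles[c]))
    by_manhattan = min(category_profiles,
                       key=lambda c: manhattan_distance(prof, category_profiles[c]))
    return by_dice, by_manhattan
```

In practice the category profiles would be built from the training documents of each category, and the two measures can disagree on the same document, which is what makes a comparison between them meaningful.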
The corpus contains 1562 documents of different lengths belonging to six categories. The documents were normalized and preprocessed by removing digits, foreign words, punctuation marks, and stop words. The Chi-square method was used for feature selection, with the number of selected words ranging from 10 to 1000. The corpus was split so that 70% of the documents were used for training the classifier while the remaining 30% were used for testing. Three evaluation measures, precision, recall, and F-measure, were used to evaluate the performance of the NB classifier. Results showed that the NB classifier performs well as the number of selected words grows. The NB classifier reached its peak for precision and F-measure when the number of selected words was 800, while the peak for recall was reached at 700 selected words.
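The two quantitative ingredients of this setup, Chi-square scoring of a word and the three evaluation measures, can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard 2x2 contingency form of the Chi-square statistic, which may differ in detail from the variant used in the study, and the function and argument names are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a word for one category (2x2 contingency form).

    a: documents in the category that contain the word
    b: documents outside the category that contain the word
    c: documents in the category that lack the word
    d: documents outside the category that lack the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # the word or the category is degenerate in this sample
    return n * (a * d - c * b) ** 2 / denom


def precision_recall_f1(true_labels, predicted_labels, category):
    """Precision, recall and F-measure for a single category."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Feature selection then amounts to ranking the words by their score and keeping the top k, with k varied over the range under study; the measures are computed on the held-out test documents only.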
