IT63-RE71 :: TextRank Algorithm Performance on Thai Text Summarization Using Various Word and Sentence Segmentations

ประสิทธิภาพของอัลกอริทึม TextRank ในการสรุปข้อความภาษาไทยโดยใช้การแบ่งส่วนคำและประโยคต่างๆ

details
This study evaluates the performance of the TextRank algorithm for extractive summarization of Thai text and explores how different word and sentence segmentation techniques affect summarization quality. TextRank ranks sentences by their significance within the document, using cosine similarity to compare sentences. The top-ranked sentences are selected to form the final extractive summary. Summaries are evaluated with ROUGE, a widely used metric for assessing text summarization.
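The ranking step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses plain Python in place of scikit-learn and NetworkX so that it runs without dependencies, it assumes sentences have already been segmented into word tokens (for Thai, by a segmenter such as PyThaiNLP), and the function names, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_sentences):
    """Build one sparse TF-IDF vector (a dict) per tokenized sentence."""
    n = len(tokenized_sentences)
    doc_freq = Counter()
    for tokens in tokenized_sentences:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized_sentences:
        term_freq = Counter(tokens)
        # Smoothed IDF (an assumption; libraries differ in the exact formula).
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def textrank(tokenized_sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank-style iteration over a similarity graph."""
    vectors = tfidf_vectors(tokenized_sentences)
    n = len(vectors)
    if n == 0:
        return []
    # Edge weights are pairwise cosine similarities; no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Sentences with no similar neighbours pass on no score
            # (a simplification compared with full PageRank implementations).
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, tokenized_sentences, top_k=1):
    """Return the top_k highest-scoring sentences in document order."""
    scores = textrank(tokenized_sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

Because the graph is built from token overlap, the choice of word segmenter changes the TF-IDF vectors and therefore the ranking, and the choice of sentence segmenter changes which units are ranked at all.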
tools & techniques
Environmental Setup
The experiment was conducted on Kaggle notebooks using the following two configurations.
- Baseline Configuration: Intel(R) Xeon(R) CPU @ 2.20 GHz with 4 cores and 30 GB of RAM.
- TPU Accelerated Configuration (TPU VM v3-8): Intel(R) Xeon(R) CPU @ 2.00 GHz with 96 cores and 330 GB of RAM.

Experimental Dataset
This experiment uses the ThaiSum dataset, a large-scale collection of Thai article-summary pairs. All articles are sourced from reputable online news sources, namely Thairath, Thai PBS, Prachatai, and The Standard. The dataset contains 358,868 articles together with their journalist-written summaries.

The Use of Open-source Libraries
- PyThaiNLP for Thai natural language processing.
- Modin and Pandas for data analysis and manipulation.
- NumPy for numerical computing.
- scikit-learn for constructing the TF-IDF matrix and the similarity matrix.
- NetworkX for graph construction and the TextRank algorithm.
- rouge for metrics that compare the generated summaries to human-written references.
- seaborn and Matplotlib for data visualization.
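To make the evaluation metric concrete, the sketch below computes ROUGE-N precision, recall, and F1 from n-gram overlap between a generated summary and a reference. It is an illustrative plain-Python version written for this page, not the `rouge` package listed above, whose exact scoring details may differ. The function names are hypothetical, and the inputs are assumed to be lists of word tokens, which for Thai depend on the word segmenter used.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate_tokens, reference_tokens, n=1):
    """Return ROUGE-N precision, recall, and F1 for one summary pair."""
    candidate = ngram_counts(candidate_tokens, n)
    reference = ngram_counts(reference_tokens, n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((candidate & reference).values())
    candidate_total = sum(candidate.values())
    reference_total = sum(reference.values())
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Since the score is computed over tokens, the same pair of summaries can receive different ROUGE values under different word segmentations, which is one reason segmentation choice matters when comparing results.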
author
Mr. Songglod Petchamras
Student ID 63130500042
songglod.tan@mail.kmutt.ac.th
Mr. Surawit Nakgaew
Student ID 63130500125
surawit.mick@mail.kmutt.ac.th
advisor
Nantapong Keandoungchun