A Multilingual Hybrid News Recommendation Framework for Educational Web Portals

Przha Barzan Mohialdeen; Sarkar Hasan Ahmed

doi:10.24017/science.2025.2.14

Authors

Przha Barzan Mohialdeen, Information Technology Department, Technical College of Informatics, Polytechnic University of Sulaimani, Sulaymaniyah, Iraq. https://orcid.org/0009-0009-6317-4837
Sarkar Hasan Ahmed, Computer Network Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaymaniyah, Iraq. https://orcid.org/0000-0001-5729-073X

Abstract

University web portals increasingly serve as vital platforms for sharing academic information, yet effective news recommendation in resource-constrained, multilingual environments remains challenging because of limited labeled data, sparse user profiles, and linguistic diversity. This study presents a modular hybrid news recommendation framework tailored for educational web portals in low-resource settings. The approach integrates lexical methods, specifically Term Frequency–Inverse Document Frequency (TF–IDF) and Best Matching 25 (BM25), with semantic retrieval based on Sentence-BERT (SBERT), and combines them with unsupervised clustering for topical diversification and a fuzzy-logic fusion layer that integrates the heterogeneous similarity signals. A publicly available multilingual dataset of 1,389 university news articles was collected via a custom crawler, and a Flask-based API was implemented for real-time recommendation. Evaluation relies on an automatic hybrid ground truth generated by fusing SBERT, TF–IDF, and BM25 signals. On the ground truth subset, the hybrid model attains Precision@5 = 0.96 and NDCG@5 = 0.945, outperforming SBERT alone (Precision@5 = 0.93; NDCG@5 = 0.859), and the improvement is statistically significant (paired t-test on NDCG@5, p < 1e-5). Clustering enhances thematic diversity (entropy 1.697 vs. 0.032), reducing concentration on repeated announcements. Multilingual experiments demonstrate consistently high precision across Arabic, Kurdish, and English but reveal substantially lower recall for underrepresented languages, highlighting dataset imbalance and representation challenges. Fusion weights were tuned on a validation split to balance precision and recall while mitigating the dominance of any single signal across languages and content types.
The proposed framework provides an interpretable and extensible solution for multilingual academic news recommendation in scenarios where interaction data are scarce, offering a practical foundation for future work on language-aware preprocessing, human validation of labels, and supervised re-ranking.
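For readers unfamiliar with the quantities reported above, the following is a minimal illustrative sketch of how three similarity signals might be fused and how Precision@k, NDCG@k, and a cluster-based diversity entropy can be computed. It is not the authors' implementation: the fuzzy-logic fusion layer is replaced here by a plain weighted sum of min-max normalized scores, and the weights, the base-2 logarithm in the entropy, the toy scores, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def min_max(scores):
    """Rescale raw scores to [0, 1] so signals on different scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(sbert, tfidf, bm25, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of normalized signals (a stand-in for the fuzzy-logic fusion;
    the weights are hypothetical, not the tuned values from the study)."""
    signals = [min_max(sbert), min_max(tfidf), min_max(bm25)]
    return [
        sum(w * sig[i] for w, sig in zip(weights, signals))
        for i in range(len(sbert))
    ]


def precision_at_k(ranked_ids, relevant, k=5):
    """Fraction of the top-k recommendations that are in the relevant set."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def ndcg_at_k(ranked_ids, relevant, k=5):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, d in enumerate(ranked_ids[:k])
        if d in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def cluster_entropy(cluster_labels):
    """Shannon entropy (base 2, an assumption) of the cluster labels in a list."""
    counts = Counter(cluster_labels)
    n = len(cluster_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy candidate scores for four articles (indices 0..3); values are made up.
sbert_scores = [0.9, 0.2, 0.6, 0.1]
tfidf_scores = [0.1, 0.8, 0.5, 0.0]
bm25_scores = [12.0, 3.0, 9.0, 1.0]

fused = fuse(sbert_scores, tfidf_scores, bm25_scores)
# Rank article indices by fused score, highest first.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this sketch, normalizing each signal before fusion keeps the unbounded BM25 scores from dominating the cosine-based signals, which mirrors the stated goal of mitigating the dominance of any single signal. A higher entropy over the cluster labels of a recommendation list indicates that the list spans more topics.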

Keywords:

Multilingual News Recommendation, Educational Web Portals, Hybrid Recommendation Algorithms, Sentence-BERT, Term Frequency–Inverse Document Frequency, Best Matching 25
