Welcome to ICCSEA 2024

14th International Conference on
Computer Science, Engineering and
Applications (ICCSEA 2024)

November 16 ~ 17, 2024, Zurich, Switzerland



Accepted Papers
Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts

Naseela Pervez1 and Alexander J. Titus1,2,3, 1Information Sciences Institute, University of Southern California, 2Iovine and Young Academy, University of Southern California, 3In Vivo Group

ABSTRACT

Large language models (LLMs) are increasingly utilized to assist in scientific and academic writing, helping authors enhance the coherence of their articles. Previous studies have highlighted stereotypes and biases present in LLM outputs, emphasizing the need to evaluate these models for their alignment with human narrative styles and potential gender biases. In this study, we assess the alignment of three prominent LLMs—Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash—by analyzing their performance on benchmark text-generation tasks for scientific abstracts. We employ the Linguistic Inquiry and Word Count (LIWC) framework to extract lexical, psychological, and social features from the generated texts. Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases. This research highlights the importance of developing LLMs that maintain a diversity of writing styles to promote inclusivity in academic discourse.

Keywords

Large Language Models (LLMs), Text Generation, Gender Bias, Linguistic Inquiry and Word Count (LIWC), Computational Linguistics.


Hyperparameter Optimization for Search Relevance in E-commerce

Manuel Dalcastagné and Giuseppe Di Fabbrizio, VUI, Inc., Boston, USA

ABSTRACT

The configuration of retrieval and ranking strategies in search engines is traditionally done manually by search experts in a time-consuming and often irreproducible process. A typical use case is field boosting in keyword-based search, where the weights of different fields are tuned in an endless trial-and-error process to obtain what seems to be the best possible results on a small set of manually picked user queries that do not always generalize as expected. Hyperparameter optimization (HPO) methods can be employed to automatically tune search engines and solve these problems. To the best of our knowledge, there has been little work in the research community regarding the application of HPO to search relevance in e-commerce. This study demonstrates the effectiveness of HPO techniques for search relevance in e-commerce and provides insights into the impact of field boosting, retrieval query structure, and query understanding on relevance. Differential evolution (DE) optimization achieves up to 13% improvement in terms of NDCG@10 over baseline search configurations on a publicly available dataset. Also, we provide guidelines on the application of HPO to search relevance in e-commerce, addressing the characteristics of search spaces, the multifidelity of objective functions, and the use of more than one metric for multi-objective optimization.

Keywords

Hyperparameter optimization, differential evolution, e-commerce search relevance optimization.


Scalable Query Understanding for E-commerce: an Ensemble Architecture With Graph-based Optimization

Manuel Dalcastagné and Giuseppe Di Fabbrizio, VUI, Inc., Boston, USA

ABSTRACT

Query understanding is a critical component of e-commerce platforms, enabling accurate interpretation of users’ intents and efficient retrieval of relevant products. This paper presents a study on scalable query understanding techniques applied to a real use case in the e-commerce grocery domain. We propose a novel architecture that combines deep learning models with traditional ML models in an ensemble, aiming to capture the nuances of user queries and provide robust performance across various query types and categories. We conduct experiments on real-life datasets and demonstrate the effectiveness of our proposed solution in terms of accuracy and scalability. An optimized graph-based architecture using Ray enables efficient processing of high-volume traffic. The experimental results highlight the benefits of combining diverse models.

Keywords

Query classification, query understanding, distributed and scalable machine learning.


Identifying Students at Risk From Online Clickstream Data Using Machine Learning

Hadeel Alhabdan1 and Ala Alluhaidan2, 1College of Computing and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, 2Department of Information Systems, College of Computing and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia

ABSTRACT

This study examines the use of four machine learning methods to identify students at risk from online clickstream data for 60 courses and the students’ grades in these courses. To identify students at risk of failing, the study classified students with grades of “F” or “D” as at-risk, while students with grades of “A,” “B,” or “C” were classified as safe. Logistic regression, decision tree, neural network, and random forest models were used, with each model subjected to eight-fold cross-validation. The decision tree model had the lowest performance across all four metrics, followed by the logistic regression model, while the neural network model showed marginally superior accuracy, sensitivity, and F1 score compared to the random forest model. The four machine learning models were found to be reliable in identifying at-risk students based on the provided online clickstream data.

Keywords

Decision tree, Logistic regression, Neural networks, Online clickstream data, Random Forest.
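
The evaluation protocol described in the abstract (eight-fold cross-validation, scored by accuracy, sensitivity, and F1) can be sketched as follows. This is only an illustration under stated assumptions: the folds are contiguous and unshuffled, the label strings and function names are hypothetical, and the authors' actual tooling is not given in the abstract.

```python
def k_fold_indices(n_samples, k=8):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def binary_metrics(y_true, y_pred, positive="at-risk"):
    """Accuracy, sensitivity (recall), precision and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "precision": precision, "f1": f1}

# Each sample lands in exactly one test fold across the eight splits.
for train_idx, test_idx in k_fold_indices(20, k=8):
    print(len(train_idx), len(test_idx))
```

In practice the data would usually be shuffled or stratified by class before splitting, so that each fold contains both at-risk and safe students.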


Comparative Analysis on Brain Tumor Classification using Transfer Learning

Hanan AlJuaid and Noorah Al-Sultan, Department of Computer Science, Princess Noura University, Riyadh, KSA

ABSTRACT

Brain tumor classification is paramount for accurate diagnosis and treatment planning, with significant implications for patient outcomes. This research project focuses on the classification of brain tumors using deep learning techniques, specifically transfer learning in Convolutional Neural Networks (CNNs). The dataset used in this study was obtained from National Guard Hospital. The motivation for this study arises from the challenges associated with accurate brain tumor classification and the potential advantages offered by modern deep learning models. Transfer learning is employed to leverage the knowledge and pre-trained weights of existing CNN models trained on large-scale datasets. This approach enables efficient and accurate classification of brain tumor images. The performance of different pre-trained CNN models, fine-tuned specifically for brain tumor classification, is compared through experimentation and evaluation. The effectiveness and reliability of these models are assessed using key performance metrics such as accuracy, precision, and recall. The objective of this research is to identify the most accurate and robust model for brain tumor classification. The selected models for evaluation are VGG16, ResNet50, InceptionV3, and Xception. The accuracy results of these models are reported as 91.47%, 86.80%, 82.67%, and 82.13%, respectively.

Keywords

Convolutional Neural Network (CNN), Transfer Learning, Brain Tumor.


Smishing Detection Application Using AI

Hanan Alossimi, Noura Alotaibi, Alhnouf Alsubaie, Hanan Aljuaid, Department of Computer Science, Princess Nourah bint Abdulrahman University, Riyadh, KSA

ABSTRACT

The rapid advancement of technology and its widespread integration into various facets of our lives, including work, entertainment, communication, and finance, have brought about significant transformations and a paradigm shift in the way we perceive and interact with the world. As a result, users' sensitive information is at high risk of exposure, because attackers constantly try new methods to get what they want. Nowadays, the easiest way to gain access to or obtain sensitive information about users is to send phishing messages via SMS (smishing). A phishing detection system is therefore essential, because responding to a phishing message or opening a URL it contains can cause great harm to a person. The main goal of this project is to build a smishing detection application for Arabic SMS messages, based on a model capable of classifying text accurately. We achieved this goal with our application, Etiqa. After careful model selection, we chose a hybrid CNN-LSTM deep learning model [1], which has proven effective in classifying SMS messages in Arabic, achieving an accuracy of 98%. The dataset was collected and processed using natural language analysis tools, especially those for the Arabic language. The model was developed in Python, and a simple, easy-to-use interface for Android users was built with the Dart programming language on the Flutter framework. Finally, the interface was integrated with our model using FastAPI. In future work, we aim to extend the system and improve its effectiveness.

Keywords

Smishing, Fraud, SMS, Detection, Artificial Intelligence.


Systematic Overview of Machine Learning Applied for Propaganda Social Impact Research

Darius Plikynas, Institute of Data Science and Digital Technologies, Department of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania

ABSTRACT

The proliferation of fake news, propaganda, and disinformation (FNPD) in the era of generative AI and information warfare poses significant challenges to societal cohesion and democratic processes. This systematic review examines recent advances in machine learning (ML) techniques for detecting and assessing the social impact of FNPD. Employing the PRISMA framework, we analyze promising ML/DL methodologies and hybrid approaches in combating the spread of conspiracy theories, echo chambers, and filter bubbles that contribute to social polarization and radicalization. Our findings highlight the potential of AI-driven solutions in identifying malicious social media accounts, organized troll networks, and bot activities that target specific demographics and manipulate public discourse. We also explore future research directions for developing more robust FNPD detection systems and mitigating the fragmentation of social networks of trust and cooperation. This review provides valuable insights for researchers and policymakers addressing the complex challenges of information integrity in the digital age.

Keywords

Machine Learning, Deep learning, Propaganda and Disinformation, Social Impact Analysis, PRISMA Systematic Review.


A Survey of Evaluating Question-answering Techniques in the Era of Large Language Model (LLM)

Khaled N. Al Muteb, Bader K. Alshemaimri and Jassir A. Altheyabi, College of Computer and Information Science, King Saud University, Riyadh, Riyadh Region, Kingdom of Saudi Arabia

ABSTRACT

Large language models (LLMs) are gaining increasing popularity in both academia and industry due to their exceptional performance in various applications. As LLMs continue to play a crucial role in research and everyday use, their evaluation becomes increasingly crucial, not only at the task level but also at the societal level for a better understanding of their potential risks. In recent years, significant efforts have been dedicated to examining LLMs from different perspectives. This article presents a comprehensive review of the evaluation methods for LLMs, with a specific focus on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide a comprehensive overview of the evaluation tasks, including general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other domains. Secondly, we delve into the evaluation methods and benchmarks, which serve as critical components in assessing the performance of LLMs, addressing the questions of "where" and "how". We then summarize the instances of success and failure of LLMs in different tasks. Finally, we shed light on several important aspects that need to be considered in the evaluation process of LLMs.

Keywords

Large language model, Question Answering, LLMs Evaluation, Knowledge base question answering, Open-domain question answering.


Error Analysis and Cognitive Biases in Named Entity Recognition (NER): a Comparative Study of English and Turkish News Articles

Tolga Sahin, Department of Language Sciences, Ca’ Foscari University Venice, Venice, Italy

ABSTRACT

This study investigates the performance of Named Entity Recognition (NER) tools through a comparative analysis of English and Turkish news articles. It aims to examine potential biases in recognition accuracy in both languages and to connect these results to cognitive biases in human language processing. Using spaCy, the first 50 lines of Turkish and English newspapers are analyzed. The analysis reveals that the NER tool achieved a high accuracy of 93.55% in English, with 87 of 93 entities correctly identified, while achieving 29.11% accuracy in Turkish, with 23 of 70 entities correctly identified. The tool exhibited a higher rate of misclassifications and missed entities in Turkish, suggesting a bias against non-Western names and underlining the challenges of recognizing culturally specific entities. The results raise questions about the implications of NER biases in AI applications and their parallels with cognitive biases in humans. Such parallels suggest that human recognition of names across different cultures tends to resemble that of machines. The results also point to the need for improved training data and methodologies to enhance NER performance in underrepresented languages, and contribute to the ongoing discourse on ethical AI and inclusive language.

Keywords

Named Entity Recognition (NER), Cognitive Bias, Error Analysis, Multilingual NLP, NLP.
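
The accuracy figures in the abstract are simple ratios of correctly identified entities to total entities, which can be checked with a short sketch (the function name is hypothetical). The English figure reproduces exactly. The Turkish counts as printed (23 of 70) give about 32.86% under the same formula, whereas the reported 29.11% would correspond to 23 of 79; which denominator the author used is not stated here, so this is only an observation.

```python
def accuracy_percent(correct, total):
    """Share of correctly identified entities, as a percentage with two decimals."""
    return round(100.0 * correct / total, 2)

print(accuracy_percent(87, 93))  # 93.55, matches the reported English accuracy
print(accuracy_percent(23, 70))  # 32.86, Turkish counts as printed in the abstract
print(accuracy_percent(23, 79))  # 29.11, matches the reported Turkish accuracy
```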


Distributed Blockchain-based Firmware Update Architecture for IoT Environments

Jesús Rugarc1, Santiago Figueroa-Lorenzo2,3, Saioa Arrizabalaga2,3, and Nasibeh Mohammadzadeh2, 1University of the Basque Country UPV/EHU, Donostia / San Sebastián-20018, Spain, 2CEIT-Basque Research and Technology Alliance (BRTA), Donostia / San Sebastián-20018, Spain, 3School of Engineering, University of Navarra, Tecnun, Donostia / San Sebastián-20018, Spain

ABSTRACT

The Internet of Things (IoT) is one of the most rapidly expanding fields of technology. IoT devices often have limited capabilities when it comes to security, and have been shown to have vulnerabilities that are often exploited by malicious agents. To fix those vulnerabilities, firmware updates are often needed. The process, however, can also be vulnerable. A secure update mechanism is needed to create a more secure IoT environment. This paper proposes a secure distributed IoT firmware update solution using Hyperledger Fabric Blockchain and IPFS, based on RFC 9019 and previously proposed frameworks, contributing a strong manifest format and defining authentication and verification procedures. More importantly, we provide a public implementation on which performance tests were run, demonstrating the promising feasibility of using distributed ledger technologies for this problem.

Keywords

IoT, Hyperledger Fabric Blockchain, Security, Distributed solution, Firmware update.


Clustering Solidity Smart Contracts by Similarity

Ansumana F Jadama and Aditya Dilip Thakur, Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

ABSTRACT

This paper addresses the challenging task of clustering source code files within Ethereum smart contracts. The intricate structure of these files, encompassing contracts, interfaces, and libraries, presents significant challenges in identifying syntactic similarities. Our methodology employs a detailed analysis of structural, behavioral, and contextual characteristics, integrating both syntactic and semantic features. The objective is to effectively cluster source code files, thereby facilitating a deeper understanding and systematic categorization of smart contracts. This comprehensive approach aims to enhance insights into the architectural patterns and functionalities of blockchain applications, supporting improved governance and management of these systems.

Keywords

Smart Contracts, Blockchain, Source Code Clustering, Syntactic Similarity, Semantic Features.


Three Variations of Heads or Tails Game for Bitcoin

Cyril Grunspan1 and Ricardo Perez-Marco2, 1Léonard de Vinci Pôle Univ, Finance Lab, Paris, France, 2CNRS, IMJ-PRG, Univ. Paris Cité, Paris, France

ABSTRACT

We present three very simple variants of the classic Heads or Tails game using chips, each of which contributes to our understanding of the Bitcoin protocol. The first variant addresses the issue of temporary Bitcoin forks, which occur when two miners discover blocks simultaneously. We determine the threshold at which an honest but temporarily “Byzantine” miner persists in mining on their fork to save their orphaned blocks. The second variant of the Heads or Tails game is biased in favor of the player and helps to explain why the difficulty adjustment formula is vulnerable to attacks on Nakamoto’s consensus. We derive directly and in a simple way, without relying on a Markov decision solver as was the case until now, the threshold beyond which a miner without connectivity finds it advantageous to adopt a deviant mining strategy on Bitcoin. The third variant of the Heads or Tails game is unbiased and demonstrates that this issue in the difficulty adjustment formula can be fully rectified. Our results agree with the existing literature, which we clarify both qualitatively and quantitatively using very simple models and scripts that are easy to implement.


A Cross-chain Protocol Based on Main-subchain Architecture

Feng Zhang1, Le Yu1, Rong Wang2 and Wei-Tek Tsai3, 1China Mobile Information Security Management and Operation Center, Beijing, China, 2Guangzhou Institute of Software, Guangzhou 510006, China, 3College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China

Keywords

Cross-chain protocols; main-subchain architecture; relay chain technology; sharded blockchain; cross-chain transactions.


Blockchain Adoption in Data Spaces With an EDC-HFB Interface

Yasiru Witharanage1,2, Santiago Figueroa-Lorenzo1,2,3, and Saioa Arrizabalaga1,2,3, 1CEIT-Basque Research and Technology Alliance (BRTA), Manuel Lardizabal 15, Donostia / San Sebastian, 20018, Basque Country, Spain, 2Universidad de Navarra, Tecnun, Manuel Lardizabal 13, Donostia / San Sebastian, 20018, Basque Country, Spain, 3Institute of Data Science and Artificial Intelligence (DATAI), Universidad de Navarra, Edificio Ismael Sánchez Bella, Campus Universitario, 31009-Pamplona, Spain

ABSTRACT

Data is a fundamental asset for organizations. Data spaces emerge as distributed structures that promote secure and reliable data sharing. The International Data Space (IDS) protocol is currently one of the main standards in the data space environment. The growing evolution of data spaces implies the emergence of challenges associated with aspects such as digital sovereignty, decentralization, veracity, security and privacy protection. Distributed Ledger Technologies (DLTs) are emerging as information structures that can provide solutions to these challenges. This paper proposes the migration of trust entities in the IDS architecture, such as the Clearing House, to Hyperledger Fabric Blockchain infrastructure as a solution mechanism to the above challenges. It also proposes the creation of an Eclipse Dataspace complement, a Hyperledger Fabric Blockchain interface (EDC-HFB), that guarantees the interaction between an EDC Connector and the blockchain.

Keywords

Blockchain, Data spaces, EDC, HFB


Development of a Co-design Architecture (Hardware/Software) for Real-time Video Encryption Based on Chaos

SID Hichem1, AZZAZ Mohamed Salah1 and SADOUDI Said2, 1Electronic and Digital Systems Laboratory, EMP, Algiers, Algeria, 2Telecommunications Laboratory, EMP, Algiers, Algeria

ABSTRACT

The article presents a novel co-design architecture (hardware/software) for real-time video encryption based on chaos. It features an auto-switched hybrid chaotic key generator integrated into a flow-symmetric cryptosystem for encrypting video streams. Using the Genesys-2 FPGA platform and the Pmod CAM-OV7670 camera, the system ensures synchronized key parameters for accurate decryption. The architecture addresses key availability challenges while balancing performance and hardware resources against a high level of security for the real-time video stream. Experimental results demonstrate its efficacy for efficient embedded ciphering communication systems, especially for real-time video streams.

Keywords

Video, DSP, Chaos, Key Generator, RNG, Cryptography, NIST, Xilinx, Vivado, FPGA, Embedded system, Genesys 2, VHDL, real-time, Vernam OTP, symmetric flow, synchronisation.


Influence of Background Color on 6D Pose Tracking Accuracy

Andreas Hubert1, Konrad Doll1, and Bernhard Sick2, 1University of Applied Sciences Aschaffenburg, Germany, 2University of Kassel, Germany

ABSTRACT

Fast 6D pose tracking is a critical component in numerous applications ranging from robotics to augmented reality. A notable method for addressing this challenge involves simulating the last known pose and comparing it with the current one, a process central to the SE(3)-TrackNet approach, which is known for its reliability. Traditionally, this method employs a uniform black background for the simulated input. This study challenges the standard practice by demonstrating that the choice of background color can significantly influence the accuracy of 6D pose estimation. Through a series of experiments, we provide results showing that background color is a critical factor for the effectiveness of the SE(3)-TrackNet approach.

Keywords

machine learning, computer vision, deep learning, 6D pose estimation, data generation, simulated data.


Security Assurance and Repudiation Threats

Srinivas Rao Doddi1 and Akshay Krishna Kotamraju2, 1Department of Information Technology, University of Los Angeles, Los Angeles, California, USA, 2Founder Non-profit, Think Cosmos, Saratoga, California, USA

ABSTRACT

Social engineering attacks pose a serious threat to individuals and to various entities, both financial and non-financial. This paper presents a converged security framework for comprehensive prevention and detection controls, allowing parties from fraud, cyber, and physical security to collaborate. It also explores different types of social media attributes and leverages data mining tactics. Through social media mining, the proposed framework unearths scam-related information in a protected manner that preserves security and privacy. The paper also discusses the associated limitations and challenges and recommends security best practices.

Keywords

Security, Assurance, Authentication, Information, Policy.


Optimizing Social Welfare in Electricity Markets: a Comparative Study of Evolutionary Algorithms - GA, NSGA-II, DE and MILP Branch and Cut

Ali Abbasi1,2, Jean Gomes1, Filipe Alves1, Pedro Carvalho1,2, João Luis Sobral2, and Ricardo Rodrigues1, 1DTx - Digital Transformation CoLAB, University of Minho, 4800-058 Guimarães, Portugal, 2University of Minho, 4704-553 Braga, Portugal

ABSTRACT

This work addresses the problem of maximizing social welfare in electricity markets by utilizing advanced optimization techniques to enhance both operational and economic efficiency. It explores the application of evolutionary algorithms (EAs), specifically Genetic Algorithms (GA), Differential Evolution (DE), and Non-dominated Sorting Genetic Algorithm II (NSGA-II), benchmarking their performance against exact solutions from the Branch and Cut method. A comprehensive hyperparameter optimization (HPO) was conducted using a Tree-structured Parzen Estimator (TPE), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and Random Search to fine-tune each algorithm’s performance parameters. The study compares the exploration and exploitation capabilities of TPE and CMA-ES with Random Search in the context of HPO for GA, NSGA-II, and DE. This systematic approach highlights the relative strengths and weaknesses of different EAs in complex market scenarios, offering insights into optimal configurations for achieving the best social welfare outcomes in electricity markets.

Keywords

Electricity Market, Social Welfare, Evolutionary Algorithms, Hyperparameter Optimization.
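
As an illustration of one of the evolutionary algorithms named in the abstract, the following is a minimal sketch of Differential Evolution in its common DE/rand/1/bin form, minimizing a toy sphere function. The objective, bounds, and parameter values are assumptions chosen for the example and are not the market model or the tuned settings from the paper; a maximization problem such as social welfare could be handled by negating the objective.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]  # binomial crossover keeps the parent gene
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: the trial replaces the target if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_score)
```

The mutation factor `f`, crossover rate `cr`, and population size are exactly the kind of settings that a hyperparameter optimizer such as TPE, CMA-ES, or Random Search would tune.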


The Gambit of De-dollarization: Unveiling New Currency Frontiers Through NLP

Vineeth Kumar Reddy Anumula and Niskhep A Kulli, Sacred Heart University, CT 06825, USA

ABSTRACT

In light of heightened geopolitical and economic volatility, conversation around de-dollarization and the rise of alternative currencies has intensified, sparking widespread public debate. This article analyzes 6000 tweets retrieved from platform X, utilizing advanced natural language processing (NLP) techniques, namely sentiment analysis, tweet classification using BERT (Bidirectional Encoder Representations from Transformers), named entity recognition (NER), and Latent Dirichlet Allocation (LDA) modeling, to delve into these critical discussions. The study uncovers key entities and emerging financial technologies, revealing a complex and evolving narrative. The findings underscore the critical role of social media as a barometer for global economic trends, particularly in light of ongoing debates surrounding currency alternatives. With geopolitical tensions mounting, the discourse on financial sovereignty, cryptocurrencies, and national economic strategies is becoming increasingly polarized. Sentiment analysis reveals stark contrasts in public opinion, while LDA modeling uncovers dominant themes driving the conversation. This research is especially timely, as the growing intensity of discussions on currency dominance and financial security demands a more nuanced understanding. By offering a real-time analysis of these debates, this paper provides essential insights for policymakers, economists, and academics. As the global financial landscape shifts, our findings serve as a crucial layer in the academic discourse, revealing how technology, public opinion, and geopolitics intertwine to shape the future of global economies.

Keywords

Natural language processing, Sentiment analysis, Entity recognition, Latent Dirichlet Allocation (LDA), De-Dollarization.


Emulating a Computing Grid in a Local Environment for Feature Evaluation

Jananga Kalawana1, Malith Dilshan1, Kaveesha Dinamidu1, Kalana Wijethunga1, 2, Maksim Stortvedt2, Indika Perera1, 1Department of Computer Engineering, University of Moratuwa, Bandaranayake Mawatha, 10400, Moratuwa, Sri Lanka, 2CERN, Esplanade des Particules 1, 1217, Meyrin, Switzerland

ABSTRACT

The necessity for complex calculations in high-energy physics and large-scale data analysis has led to the development of computing grids, such as the ALICE computing grid at CERN. These grids outperform traditional supercomputers but present challenges in directly evaluating new features, as changes can disrupt production operations and require comprehensive assessments, entailing significant time investments across all components. This paper proposes a solution to this challenge by introducing a novel approach for emulating a computing grid within a local environment. This emulation, resembling a mini clone of the original computing grid, encompasses its essential components and functionalities. Local environments provide controlled settings for emulating grid components, enabling researchers to evaluate system features without impacting production environments. This investigation contributes to the evolving field of computing grids and distributed systems, offering insights into the emulation of a computing grid in a local environment for feature evaluation.

Keywords

Computing Grid, Feature Evaluation, Grid Replica, Distributed Computing.


Semantic Textual Similarity in Kazakh: Dataset Development and Comparative Model Analysis

Mamyr Altaibek, Sharipbay Altynbek, Razakhova Bibigul, Zulhazhav Altanbek, Kazakhstan Academy of Artificial Intelligence, Astana, Kazakhstan

ABSTRACT

Semantic textual similarity assesses the degree of shared meaning between two textual entities. This research translated the STS-b evaluation dataset into Kazakh using the Google API, thereby facilitating studies in a new linguistic context. We employed various pre-trained models including BERT, SBERT, RoBERTa, and Language-agnostic BERT Sentence Embedding (LaBSE) to generate sentence embeddings. The experimental framework also integrated a Kazakh-translated SNLI dataset. Model effectiveness was quantified through Pearson and Spearman correlation coefficients, comparing predicted similarity scores against the gold standard labels. The most effective results emerged from an initial fine-tuning of the BERT model on the Kazakh-translated SNLI dataset, followed by further refinement on the STSb-kk dataset using contrastive learning techniques.

Keywords

Semantic Textual Similarity, STSb Dataset, Natural Language Inference, Kazakh language.
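
The evaluation described in the abstract compares predicted similarity scores with gold labels using Pearson and Spearman correlation. A minimal sketch of both coefficients is given below; the score lists are made-up illustrative values, not data from the paper, and the function names are hypothetical. Spearman is computed here as the Pearson correlation of average ranks, which handles ties.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined if either input is constant

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gold labels (0 to 5 scale) and model cosine similarities.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predicted = [0.1, 0.3, 0.5, 0.9, 0.8]
print(pearson(gold, predicted), spearman(gold, predicted))
```

Pearson rewards a linear relationship between predicted and gold scores, while Spearman only checks that the ordering of sentence pairs is preserved, which is why the two are often reported together.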


Comparison Between CNN and GNN Pipelines for Analysing the Brain in Development

Antoine Bourlier1,2, Elodie Chaillou1, and Jean-Yves Ramel2, 1LIFAT, 37000 Tours, France, 2INRAE, CNRS, Université de Tours, 37380 Nouzilly, France

ABSTRACT

In this study, we present a novel pipeline designed for the analysis and comparison of non-conventional animal models, such as pigs and sheep, without relying on neuroanatomical priors. This innovative approach combines histogram-based segmentation with graph neural networks (GNNs) to overcome the limitations of traditional methods. Conventional tools often depend on predefined anatomical atlases and are typically limited in their ability to adapt to the unique characteristics of developing brains or non-conventional animal models. By generating regions of interest directly from MR images and constructing a graph representation of the brain, our method eliminates biases associated with predefined templates and avoids the black-box issues inherent in convolutional neural networks (CNNs). Our results show that the GNN-based pipeline is significantly more efficient in terms of execution time compared to CNNs, while maintaining reasonable accuracy. However, the GNN approach yields slightly lower performance in brain age prediction. Despite this, GNNs offer notable advantages, including improved interpretability and the ability to model complex relational structures within brain data. This flexibility allows for a more nuanced analysis of brain morphology and function. Future research will focus on refining graph construction techniques, incorporating edge features, and exploring various GNN architectures to enhance the pipeline’s performance. Overall, our approach provides a promising solution for unbiased, adaptable, and interpretable analysis of brain MRIs, particularly for developing brains and non-conventional animal models.

Keywords

Graph, machine learning, MRI, segmentation.


Theoretical Approach on Assessing the Accuracy of the Shortest Path Non-optimal Algorithm for 2-dimensional Grids With Obstacles

Chenghao Mo, Oyster River High School, Durham, NH, United States

ABSTRACT

In many applications, such as urban navigation and robotics, finding the shortest path in a 2D grid is crucial but computationally expensive with traditional optimal algorithms such as Floyd-Warshall or Dijkstra. These algorithms guarantee the shortest path, but their time complexity makes them slow, particularly on large-scale grids. Non-optimal algorithms that trade accuracy for speed have emerged to address this issue. However, the impact of grid obstacle density on the accuracy of these algorithms is not well understood. This paper presents a theoretical framework for evaluating the accuracy of two non-optimal algorithms. By combining theoretical analysis with extensive experimental data, it demonstrates how obstacle density influences algorithm performance and proposes a methodology for selecting the best non-optimal algorithm based on the grid obstacle density. The framework has practical implications for applications that require rapid path finding in complex environments.

Keywords

Shortest Path Algorithms, Non-Optimal Algorithms, 2D Grids with Obstacles, Greedy Algorithm, Heuristic DFS, Algorithm Accuracy.
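
The trade-off between an optimal and a non-optimal search can be illustrated with a small sketch. It is not the paper's method: breadth-first search stands in for the optimal baseline (valid on unweighted grids), and the greedy rule shown here (always step to the unvisited free neighbour closest to the goal by Manhattan distance, without backtracking) is an assumption made for illustration. The grid and all function names are hypothetical.

```python
from collections import deque

def neighbors(grid, cell):
    """Yield the free 4-connected neighbours of a cell (0 = free, 1 = obstacle)."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)

def bfs_length(grid, start, goal):
    """Optimal path length on an unweighted grid, or None if unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]
        for nxt in neighbors(grid, cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return None

def greedy_length(grid, start, goal):
    """Greedy walk without backtracking; returns None if it gets stuck."""
    visited = {start}
    cell, steps = start, 0
    while cell != goal:
        options = [n for n in neighbors(grid, cell) if n not in visited]
        if not options:
            return None
        cell = min(options, key=lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1]))
        visited.add(cell)
        steps += 1
    return steps

# Hypothetical grid: a wall in column 2 with a single gap in the top row.
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
start, goal = (2, 0), (4, 4)
optimal = bfs_length(grid, start, goal)   # 10 steps
greedy = greedy_length(grid, start, goal)  # 14 steps
print("accuracy ratio:", optimal / greedy)
```

On this grid the greedy walk first heads down toward the goal, hits the wall, and has to climb back to the gap, so it needs 14 steps against the optimal 10. Repeating such a comparison over grids of varying obstacle density is one plausible way to measure accuracy empirically.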


Optimization of Solar Energy Integration in Smart Grid Solutions

Shad Hasib Talukder, Rafi Abrar Kabir, Nazmus Sakib Rayhan, Shamima Sultana, Jachi Sangma, and Md. Motaharul Islam, Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh

ABSTRACT

In recent years, many homes have installed solar panels to harness renewable energy, but without proper storage systems, this can lead to financial losses and inefficiencies. Our research offers a solution using smart grids, smart meters, and fuzzy rule-based algorithms to enhance solar energy efficiency and minimize these losses. By collecting data from IoT sensors, the system predicts financial impacts and recommends energy-saving practices. A real-time monitoring framework, connected to a central database, helps with decision-making and prevents future issues. With a focus on reducing financial losses through predictive analytics, our system provides a more comprehensive approach than existing solutions, leading to better energy management, cost savings and sustainability.

Keywords

Smart Grid, Smart Meter, Fuzzy rule-based expert system, Centralized Database, Solar Panel.


JChaosIndex: Measuring and Benchmarking Dispersion in Randomized Data

Jui Keskar, Metropolitan School, Frankfurt, Germany

ABSTRACT

Randomization of data is an ongoing need for various business reasons, such as the design of clinical trials [2] or the training of an AI model [3], to name a few. To control the level of randomization, it is important to measure the level of randomness, i.e., unpredictability and dispersion, in the “randomized” data vis-à-vis the original data. Permutation entropy is an established technique for measuring the unpredictability and complexity of time series [4]. To measure dispersion in randomized data, a “Neighbour-displacement-delta” (NDD) based technique is proposed. JChaosIndex, a measure of dispersion, considers the displacement of each data element as well as the relative displacements of its neighbours. The JChaosIndex measurement technique can easily be included in a programming language library, in database methods, or in any algorithm. Importantly, the technique is domain-agnostic, as it works purely on the indexes of the data records and not on the actual data.

Keywords

Measure of Randomness, Data Dispersion, JChaosIndex, Permutation Entropy, Neighbour Displacement Delta.
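
Permutation entropy, the established technique the abstract refers to, can be sketched in a few lines. This is a generic illustration with unit delay and is not taken from the paper; the function name, the `order` parameter, and the sample series are assumptions. The proposed NDD-based JChaosIndex is not reproduced here because the abstract does not give its formula.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal patterns found in sliding windows.

    Each window of `order` consecutive values is mapped to the permutation
    that sorts it; the entropy of the pattern frequencies is returned,
    optionally divided by log(order!) so the result lies in [0, 1].
    """
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log(c / total) for c in patterns.values())
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy

# Hypothetical data: a sorted series and a shuffled-looking one.
ordered = [1, 2, 3, 4, 5, 6, 7]
shuffled = [4, 7, 9, 10, 6, 11, 3]

print(permutation_entropy(ordered, order=2))
print(permutation_entropy(shuffled, order=2))
```

The sorted series contains a single ordinal pattern and so has zero entropy, while the second series mixes four ascending and two descending pairs and scores about 0.918 for `order=2`.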


Productivity in Construction (at the Example of the German Construction Sector by Comparison With Other National Construction Sectors)

Leif Laszig and Matthias Bahr, Department of Civil Engineering, Hochschule Biberach, Karlstr. 9-11, 88400 Biberach, Germany

ABSTRACT

A downward trend in the productivity growth rate of the construction sectors of western industrial countries has been observed since the 1970s. The general causes noted for this productivity slowdown are mainly low investment (in capital equipment), deficient business organization and qualification, current technologies with only limited potential for growth, and the limited innovation capacity of companies. Further possible explanations range from methodological measurement errors through demographic and structural changes to regulation. Productivity growth is mainly driven by continuous improvement in the quality of input factors, for example in training and technological expertise, and by the acceleration of product and process innovations. The construction industry is perceived to be technically underdeveloped, willing to forgo high technology, and slow to innovate. It is also considered to face high barriers to innovation. Thus, innovations in the construction industry differ from those in other industrial and service sectors in that they are strongly process-oriented, incremental, and often designed to solve a specific problem that has arisen in the short term. To enable targeted measures to increase productivity growth, the question arises as to the concrete causes, effects, and causal mechanisms, as well as the intervening influences, behind the observable decline in productivity. This article deals with the relationship between the indicator of operational productivity and several internal factors that are assumed to have explanatory power for the development of productivity. In addition to these internal factors, external factors such as structural and demographic change, the regulatory framework (laws, directives, guidelines, etc.), and the integration of the customer (or client) into the service provision process also have an impact on productivity. Such intervening variables influence the causal mechanism and, through it, the dependent variable productivity. They must therefore be taken into account, even if they are not the actual object of the investigation.

Keywords

productivity, value added chain, innovation, influence on productivity.


Unsupervised Learning of Shape Segment Point Distribution Models with the EM Algorithm

Abdullah A. AlShaher, Department of Computer Science and Information System, College of Business Studies, Public Authority for Applied Education and Training, Kuwait

ABSTRACT

This paper demonstrates how 2D handwritten shapes can be classified by analyzing shape structure. The underlying framework is a one-layer architecture in which each shape is segmented into a series of connected segments. Each segment is represented by a set of landmarks distributed uniformly along the skeleton of the character, and is then described by a Segment Point Distribution Model (SPDM). Shape variations are captured by learning a Gaussian mixture of segment point distribution models with a two-step Expectation Maximization algorithm. The approach is tested on a set of handwritten Arabic characters.

Keywords

Handwritten Arabic characters, Shape analysis, Point distribution models, Machine learning, Expectation Maximization Algorithm.
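
The two alternating steps of Expectation Maximization for a Gaussian mixture can be sketched on a toy problem. This is a one-dimensional illustration under simplifying assumptions, not the paper's model, which operates on landmark vectors of shape segments; the data, the initial values, and the function names are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=20):
    """Fit a 1D Gaussian mixture by alternating E- and M-steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return means, variances, weights

# Hypothetical data: two well-separated clusters around 0 and 10.
data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
means, variances, weights = em_gmm_1d(
    data, means=[1.0, 8.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, variances, weights)
```

With these well-separated clusters the estimated means settle near 0 and 10 and each component receives about half of the weight. In a multivariate setting the variances become covariance matrices, but the two steps keep the same form.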


Road Infrastructure Defect Detection using Computer Vision

Norah A. AlSubaie, Ghada N. AlMutairi, Ghayda A. AlMalki and Sarah A. AlRumaih, Department of Computer Sciences, Princess Nourah University, Riyadh, Saudi Arabia

ABSTRACT

This research introduces "Jaddah," an innovative AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The project overcomes significant limitations of traditional road inspection methods, which are often slow, labour-intensive, and prone to human error. Jaddah develops a mobile application that efficiently detects and classifies road defects, such as cracks and potholes, in real time. By utilizing a comprehensive dataset of high-resolution images, we enhance model training. The implementation of the YOLOv8-seg model enables precise defect localization and segmentation, achieving impressive accuracy in identifying and categorizing road anomalies. Performance metrics indicate robust results, ensuring reliable defect detection and contributing to improved infrastructure maintenance.

Keywords

Road Defect Detection, Road Infrastructure, Computer Vision, Machine Learning and Image Processing.


Enhancing Sound Processing in Children with Autism using Technology

Sana Alsubaie, Fatimah Alasmari, Daad Alsikhan and Reema Alsheddi, Department of Computer Sciences, Princess Nourah University, Riyadh, Saudi Arabia

ABSTRACT

This project aims to develop an interactive application that helps autistic children recognize and process environmental sounds. Children with Autism Spectrum Disorder (ASD) often struggle with sound identification, leading to communication challenges. The application offers a platform where children can match sounds with images, improving their sound recognition skills. It also includes a specialist consultation feature that lets parents track their child's progress and receive guidance. A key aspect of the project is a wearable bracelet designed for children with autism. The bracelet captures and identifies environmental sounds in real time. These sounds are then sent to the application, where they are stored in the "Recordings" interface, allowing the child to revisit them and reinforce their learning. Together, the application and bracelet provide a comprehensive solution to support the auditory development of children with ASD.

Keywords

Autism, Sound Processing, Specialist Consultation, Learning App, ASD, Sound Recognition, Environmental Sounds.