Outlier Detection in Data Streams Using Fuzzy C-Mean Clustering, Outlier Detection and Genetic Algorithm
Authors: Pragya Giri, Abhishek Raghuvanshi
Abstract
Outliers are data points that differ from, or are inconsistent with, the rest of the data. They may be novel, unusual, or abnormal, or they may contain noise. Outliers are sometimes more interesting than the majority of the data. As datasets grow in complexity, size, and variety, the major challenges in outlier detection are how to catch similar outliers as a group and how to evaluate them. This paper describes an approach that detects outliers as a pre-processing step using semi-supervised outlier detection, and then applies fuzzy c-means clustering and a genetic algorithm to cluster the dataset and analyze the effects of the outliers. As data is digitized and systems become connected and integrated, the scope of data and of its analysis has been growing rapidly. Today's systems generate massive, non-stationary data whose size, volume, and speed change rapidly; data of this kind are called data streams. In this paper we discuss stream data and its issues in detail, review different techniques for outlier detection, and present their results.
Keywords: Outlier Detection, Data Streams, Data Preprocessing, Fuzzy C-mean clustering, Genetic Algorithm.
Introduction
Nowadays, real-time monitoring of data streams arises in many applications, such as medical systems, Internet traffic, communication networks, online financial market transactions, remote sensors, and industrial production processes. Data streams are temporally ordered, rapidly changing, massive, and potentially infinite sequences of data objects [1]. Unlike a traditional data set, a data stream cannot be stored in full or scanned repeatedly, because of its tremendous volume. Data streams can also evolve over time, giving rise to new concepts. Because concepts evolve, data stream processing algorithms must continually update their models to adapt to the changes. Outlier detection is a data mining task, also known as outlier mining. An outlier is an object that is isolated from, or inconsistent with, the other data objects. In many applications, outliers are more interesting than the usual cases. Examples include network intrusion detection, credit card fraud detection, weather forecasting, detection of unusual cases in medical data, and marketing and customer segmentation.
Conclusion
In this paper, we proposed a new algorithm for outlier detection based on the fuzzy c-means algorithm. The proposed algorithm performs well at counting the number of outliers in a given period of time. Future work will require changes to the algorithm so that it can be applied to more datasets and made more efficient. We also plan to implement the system for distributed environments in order to improve the processing speed and performance of the algorithm.
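To make the idea of counting outliers in a window of stream data more concrete, the following is a minimal sketch of standard fuzzy c-means followed by a simple outlier count. It is not the authors' algorithm: the function names `fuzzy_c_means` and `count_outliers`, the parameters `c`, `m`, and `k`, and the rule "flag a point whose distance to its nearest cluster center exceeds the mean plus `k` standard deviations" are all assumptions chosen for illustration, and the semi-supervised and genetic algorithm steps are omitted.

```python
import numpy as np


def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships).

    X is an (n, d) array, c the number of clusters, m > 1 the fuzzifier.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means of the points.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def count_outliers(X, centers, k=3.0):
    """Count points far from every center (hypothetical threshold rule).

    A point is flagged when its distance to the nearest center exceeds
    mean + k * std of those nearest-center distances over the window.
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d_min = d.min(axis=1)
    mask = d_min > d_min.mean() + k * d_min.std()
    return int(mask.sum()), mask


# Illustrative window: two synthetic groups plus one distant point.
rng = np.random.default_rng(1)
window = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
    [[10.0, -5.0]],
])
centers, U = fuzzy_c_means(window, c=2)
n_outliers, mask = count_outliers(window, centers)
print(n_outliers, np.flatnonzero(mask))
```

In a streaming setting, one plausible use of this sketch is to call both functions once per time window and record `n_outliers` for each window; how the windows are formed and how the threshold is tuned would depend on the application.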
Copyright
Copyright © 2025 Pragya Giri, Abhishek Raghuvanshi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.