Exploratory data analysis and visualization on the example of an e-commerce enterprise
DOI:
https://doi.org/10.18372/2073-4751.79.19385
Keywords:
exploratory data analysis, data visualization, time series, correlation analysis, cluster analysis, machine learning
Abstract
This article presents an approach to exploratory data analysis and visualization using the example of an e-commerce company. The study examines the key stages of exploratory data analysis, including data preprocessing, visualization, anomaly detection, correlation analysis, and cluster analysis. These stages prepare the data for machine learning tasks to be addressed in future research: estimating a time series model, identifying the trend, seasonal, and cyclical components of the time series, clustering customers, classifying new customers, and predicting the quantity of items sold within customer clusters. The proposed approach can be applied to the analysis of other e-commerce datasets.
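Two of the stages named above, anomaly detection and correlation analysis, can be illustrated with a minimal sketch. The abstract does not say which methods the study uses, so the interquartile-range (IQR) rule and the Pearson coefficient here are assumptions chosen for illustration. The figures in `daily_quantity` and `daily_revenue` are hypothetical and are not taken from the article's dataset, and the helper names `iqr_outliers` and `pearson` are likewise invented for this example.

```python
from statistics import mean, quantiles

# Hypothetical daily totals for ten days; NOT data from the article.
daily_quantity = [120, 132, 128, 141, 135, 990, 138, 150, 147, 155]
daily_revenue = [600.0, 655.0, 640.0, 710.0, 670.0, 4900.0, 690.0, 752.0, 731.0, 780.0]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Step 1: flag anomalous days by quantity (assumed IQR rule).
outlier_idx = iqr_outliers(daily_quantity)

# Step 2: drop the flagged days, then measure how quantity and revenue co-vary.
clean = [
    (q, rev)
    for i, (q, rev) in enumerate(zip(daily_quantity, daily_revenue))
    if i not in outlier_idx
]
r = pearson([q for q, _ in clean], [rev for _, rev in clean])

print("outlier days:", outlier_idx)
print("Pearson r after cleaning:", round(r, 3))
```

Removing anomalies before computing the correlation is a design choice made here for illustration: a single extreme day can dominate the Pearson coefficient, so the order of the two steps affects the result.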
License
The scientific journal adheres to the principles of Open Access and provides free, immediate, and permanent access to all published materials without financial, technical, or legal barriers for readers.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.