In this article we analyze the popularity of software programs for data analysis, using recently published reviews. These articles and blog posts, written in the last two years, cover a wide range of tools implemented in C++, Java and Python and designed for data analysis, data mining, statistics and data visualization. The following articles were used in our analysis:
https://codecondo.com/open-source-data-mining-tools/
By Rashmi Ingle. Published by CodeCondo, April 29, 2018
By Matt Kapko, freelance writer. Published by CIO from IDG, May 4, 2018
Published by Software Testing Help Journal, August 31, 2018
https://opensourceforu.com/2017/03/top-10-open-source-data-mining-tools/
By Shravan I.V. Published by Open Source For You, March 25, 2017
https://www.linuxlinks.com/plottingtools/
Published by LinuxLinks, 2017
https://data-flair.training/blogs/data-mining-tools-techniques/
Published by DataFlair, February 2018
https://stxnext.com/blog/2017/04/12/most-popular-python-scientific-libraries/
Published by STX Next, April 12, 2017
https://www.springboard.com/blog/data-science-toolkit/
By Naina Sethi. Published by Springboard, 2017
https://www.datamation.com/data-center/slideshows/8-open-source-big-data-mining-tools.html
By Cynthia Harvey. Published by Datamation, 2016
https://www.linuxlinks.com/datamining/
Published by LinuxLinks, 2018
https://financesonline.com/top-15-data-mining-software-systems/
Published by FinancesOnline.com, 2018
https://www.georanker.com/data-mining-tools
By N. Stoyanov. Published by GeoRanker, 2017
https://www.proschoolonline.com/blog/top-10-data-analytics-tools/
Published by ProSchool
https://www.ubuntupit.com/top-20-best-data-mining-software-for-linux/
By Mehedi Hasan. Published by UbuntuPit
https://www.guru99.com/best-data-mining-tools.html
Published by Guru99
Now we will create a word cloud from the software program names mentioned in these reviews, using the Online Word Cloud Program by J. Davies. The list of program names that appear in the above articles looks like this:
R Weka Orange RapidMiner DataMelt Weka Orange RapidMiner DataMelt Knime ApacheMahout Elki Keel Moa Rattle DataMelt Knime OpenRefine Orange R Tableau-Public TrifactaWrangler RapidMiner Orange Weka Knime Sisense SSDT ApacheMahout OracleDataMining Rattle DataMelt IBMCognos IBM-SPSS SAS-Data-Mining TeraData Board DundasBI Weka RapidMiner Orange Knime DataMelt ApacheMahout Elki MOA Keel Rattle Gnuplot Ctioga LabPlot Matplotlib ROOT DataMelt GLE PLPlot RLPlot Genius RapidMiner Orange Weka Knime Sisense SSDT ApacheMahout Rattle DataMelt IBM-Cognos IBM-Data-Mining Teradata Board DundasBI Python Spark H2O Astropy Biopython Cubes Deap Scoop PsychoPy Pandas MIpy Matplotlib NumPy NetworkX TomoPy Theano SymPy SciPy ScientificPython SageMath Veusz SunPy graph-tool TensorFlow DataMelt Dask python-weka-wrapper RapidMiner ApacheMahout Orange Weka DataMelt Keel SPMF Rattle RapidMiner Elki R Knime Weka Orange DataMelt Rattle ROOT Sisense OracleDataMining RapidMiner Microsoft-SharePoint IBM-Cognos Knime DundasBI Board Orange SAP Salesforce Domo SPSS-Modeler ProForecast RockDaisy RapidMiner Orange GraphLab-Create R Weka Knime Apache-UIMA Cluto Anaconda Shogun TraMineR Rosetta R Tableau-Public Python SAS Apache-Spark Excel RapidMiner Knime QlikView Splunk
Paste this list into the Online Word Cloud Program. It will create the image shown at the header of this article.
Now we can perform a simple statistical analysis of these words. To do this, we will use Python scripting in combination with the DataMelt program. Our goal is to make a simple bar chart that shows how often each program name appears in the above reviews. We will limit ourselves to the 30 most frequently mentioned programs, using a short Python script available at https://datamelt.org/code/code.php?id=84750561.py. Running this script inside the DataMelt editor produces this bar chart:
This chart shows that the top five positions belong to Orange, DataMelt, RapidMiner, Knime and Weka. However, we should mention that this chart does not reflect the actual number of users of these programs, which is rather difficult to determine. Our analysis only tells us what various editors think about the popularity of programs for data science. In many cases, the free availability of such programs plays a crucial role in their evaluation. For better or worse, this may still give a realistic picture of the popularity of software packages for data science.
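For readers who want to reproduce the counting step without DataMelt, here is a minimal sketch in plain Python. It is not the script linked above: the function names `top_programs` and `text_bar_chart` are made up for this illustration, the `NAMES` string holds only the first 20 names of the list as an excerpt, and the chart is drawn as text instead of with DataMelt's plotting classes. Names are compared exactly as written, so spelling or case variants such as "Moa" and "MOA" are counted separately.

```python
from collections import Counter

# Excerpt only: the first 20 names from the list in this article.
# Paste the full whitespace-separated list here to redo the analysis.
NAMES = """
R Weka Orange RapidMiner DataMelt Weka Orange RapidMiner DataMelt Knime
ApacheMahout Elki Keel Moa Rattle DataMelt Knime OpenRefine Orange R
"""


def top_programs(text, limit=30):
    """Return (name, count) pairs for the `limit` most frequent names."""
    # Split on whitespace and strip stray punctuation such as "Gnuplot,".
    words = [w.strip(",.;") for w in text.split()]
    counts = Counter(w for w in words if w)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


def text_bar_chart(ranked):
    """Render the ranking as plain-text bars, one line per program."""
    width = max(len(name) for name, _ in ranked)
    return "\n".join(f"{name:<{width}} {'#' * n} {n}" for name, n in ranked)


if __name__ == "__main__":
    print(text_bar_chart(top_programs(NAMES)))
```

Breaking ties alphabetically is a choice made for this sketch; the order of equally frequent programs in the DataMelt chart may differ.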
By T.Smatzer, Sep. 5, 2018