Description: As we said in one of our articles:
For future work, we also plan to support other researchers of the Department of Cybernetics who are working on applications of machine learning algorithms such as Random Forests, Neural Networks and Support Vector Machines. These algorithms work on large data sets, and their running time is proportional to the size of the data. The running time can be reduced by adding more compute nodes, although the speedup may or may not be linear, depending on the algorithm and on how the input data are distributed across the nodes. It would be interesting to model this relation and to predict the running time from the data size and the cluster size. This would enable a scientist acting as a cloud customer to create a cluster just large enough to finish a job in, e.g., one hour. In private clouds, it would also make it possible to express the length of the job queue in units of time. Therefore, future work will include profiling of computing jobs, most likely in the Hadoop framework, and extrapolating their parameters to larger inputs and cluster sizes.
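To make the idea of such a model concrete, here is a minimal sketch in Python. It is not the method from the article. It assumes a simple Amdahl-style form T(n, p) = n * (s + q / p), where n is the data size, p is the number of nodes, s is the serial cost per unit of data and q is the parallelizable cost per unit of data. The function names, the model form and the sample numbers are all hypothetical and chosen only for illustration. Real jobs may need extra terms, for example a constant startup overhead or a communication cost that grows with p.

```python
import math


def fit_runtime_model(samples):
    """Least-squares fit of the assumed model T = n * (s + q / p).

    samples: iterable of (n, p, t) tuples, i.e. data size, node count and
    measured running time. Returns (s, q).
    """
    # The model is linear in s and q with features x1 = n and x2 = n / p,
    # so we accumulate and solve the 2x2 normal equations directly.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for n, p, t in samples:
        x1 = float(n)
        x2 = n / p
        a11 += x1 * x1
        a12 += x1 * x2
        a22 += x2 * x2
        b1 += x1 * t
        b2 += x2 * t
    det = a11 * a22 - a12 * a12
    # If every sample uses the same node count, x2 is a multiple of x1
    # and s cannot be separated from q.
    if abs(det) <= 1e-12 * a11 * a22:
        raise ValueError("need samples with at least two different node counts")
    s = (b1 * a22 - b2 * a12) / det
    q = (a11 * b2 - a12 * b1) / det
    return s, q


def predict_runtime(s, q, n, p):
    """Predicted running time for data size n on p nodes."""
    return n * (s + q / p)


def nodes_for_deadline(s, q, n, deadline):
    """Smallest node count whose predicted running time meets the deadline.

    Returns None when the serial part alone already exceeds the deadline,
    because no cluster size can help in that case.
    """
    serial = n * s
    if serial >= deadline:
        return None
    return max(1, math.ceil(n * q / (deadline - serial)))


if __name__ == "__main__":
    # Synthetic, illustrative numbers (not real measurements):
    # (data size in GB, nodes, running time in seconds).
    profile = [(10, 1, 2020.0), (10, 2, 1020.0), (20, 2, 2040.0),
               (20, 4, 1040.0), (40, 8, 1080.0)]
    s, q = fit_runtime_model(profile)
    nodes = nodes_for_deadline(s, q, n=100, deadline=3600)
    print("fitted s, q:", s, q)
    print("nodes needed for 100 GB within one hour:", nodes)
```

Under the same assumptions, the length of a queue in units of time could be estimated by summing `predict_runtime` over the waiting jobs for the cluster size in use. How well any such extrapolation holds for larger inputs and clusters would have to be checked against profiled jobs.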