KNIME (Konstanz Information Miner) is a free and open-source platform for data analytics, reporting, and integration. If this is your first step into the world of the KNIME Analytics Platform, this tutorial is for you!
KNIME integrates many components for data mining and machine learning using the concept of modular data pipelining. It lets you build data flows visually and selectively execute some or all of the analysis steps. There are also several community resources you can explore:
- The KNIME Forum allows you to ask questions and receive answers from the KNIME User Community.
- The KNIME blog shows what more you can do with KNIME.
- The KNIME TV channel on YouTube provides more than 200 videos, including Summit talks about the platform.
- The KNIME Facebook page, Twitter handle, and LinkedIn page carry the latest announcements.
Besides, you can use the data sets to create report templates. These templates can be exported to document formats such as doc, ppt, pdf, and xls. The additional capabilities of KNIME include:
- The KNIME core architecture allows you to process large data volumes. For instance, KNIME Analytics Platform has been used to analyze 300 million customer addresses, 20 million cell images, and so on.
- There are additional plugins that allow you to integrate methods for text mining, time series analysis, and image mining.
- Many other open-source projects can be integrated through KNIME, such as the machine learning algorithms from Weka and the charting library JFreeChart.
As of version 2.1, KNIME Analytics Platform is released under GPLv3, with an exception that allows others to use the well-defined node Application Programming Interface (API) to add proprietary extensions. Here are the steps you can follow to download the platform.
The figure above shows the steps involved in downloading the software. The detailed steps of the download process are given below.
- You can begin by going to the download page of the KNIME Analytics Platform: https://www.knime.com/knime-software/knime-analytics-platform.
- You will see three tabs that you can open individually:
  - Register for Help and Updates: this tab lets you provide some personal information to sign up.
  - Getting Started: this tab gives you information and links about what you can do once the KNIME Analytics Platform is installed.
  - Download KNIME: this is where you click to begin the download.
- Next, open the Download KNIME tab and click the installation option that fits your operating system. For Windows, several options exist:
  - The Windows installer extracts the compressed installation folder and adds an icon to your desktop. It also suggests a suitable memory setting.
  - The self-extracting archive creates a folder that contains the KNIME installation files. You do not need any additional software to handle the archive.
  - The zip archive can be downloaded, saved, and then extracted to a preferred location on a system where you have full access rights.
Since KNIME Analytics Platform is free for all, all you have to do to begin using it is download and install it. By now you know how to download it. The next step is the installation, and here are a few easy steps for it.
- There are different installation packages for each operating system, i.e., Mac, Linux, and Windows. You can download the package for your system from the KNIME website.
- Once you select the installer for your operating system (the version without extensions), a new window appears. Read the terms and conditions, select ‘I accept the Terms and Conditions’, and then click on Download.
- A new window will pop up. Click on the ‘Save File’ link.
- Once the download is complete, run the installer, read the License Agreement, select ‘I accept the agreement’, and click on ‘Next’.
- The next window asks for the destination location. Enter the location and then click on ‘Next.’
- The next window shows the memory settings. Once you are done, the ‘Ready to Install’ window appears. If you want to change anything, you can go back and do so here. Otherwise, click on ‘Install’ to begin the installation.
- Once the installation is over, the ‘Completing the KNIME Analytics Platform Setup Wizard’ dialog appears. Tick ‘Launch KNIME Analytics Platform’ and then click on ‘Finish’ to end the installation.
Congratulations! You have successfully downloaded and installed the KNIME Analytics Platform. The software is now ready for use.
Building the First KNIME Machine Learning Model
One of the biggest challenges for beginners in data science and machine learning is that there is a lot to learn at once. If you don’t have any background in coding, you may find it hard to keep up. The steps below walk you through creating your first workflow in KNIME Analytics Platform.
Importing Data Files on KNIME
Beyond creating a workflow, the KNIME Analytics Platform lets you perform numerous functions such as transformations, data manipulations, and data mining. The first step in using the platform is to import the data file, and here is how you can do this.
- First, drag and drop the ‘File Reader’ node into the workflow and double-click it to select the data file.
- Once you have imported the data set, visualize the columns that are relevant to your analysis. For example, to find out how the columns are related to each other, you can create a correlation matrix.
- Type ‘Linear Correlation’ in the node repository, then drag and drop the node into the workflow.
- Next, connect the output of the ‘File Reader’ to the input of the ‘Linear Correlation’ node.
- Click the green ‘Execute’ button on the top panel.
- Now, right-click the correlation node and select ‘View: Correlation Matrix’ to generate an image. With this, you can choose the features that matter most for better predictions.
- Last but not least, you can visualize the patterns and the range of the data set to understand it better.
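For readers who want to see what the correlation matrix contains, here is a minimal Python sketch of the same idea outside KNIME. It assumes the matrix holds Pearson correlation coefficients between numeric columns. The column names and values are hypothetical toy data, not taken from the tutorial's data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data: each key is a column name, each list is that column.
data = {
    "price": [10.0, 20.0, 30.0, 40.0],
    "weight": [1.0, 2.0, 3.0, 4.0],
    "sales": [40.0, 30.0, 20.0, 10.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values close to 1 or -1 point to strongly related columns, and values near 0 point to weakly related ones. That is the signal you would use when choosing features.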
Training the First KNIME model
To begin with the basics, you can first train a linear model to understand how to select features and build a model. Here is how you can do it.
- Go to the node repository and drag the ‘Linear Regression Learner’ into the workflow. Then connect the clean data from the output port of the ‘Missing Value’ node to it.
- In the configuration dialog that appears, exclude the Item Identifier and then select the target variable at the top.
- Great! The model is now trained. Next, import the test data to run the model on it.
- The test data contains some missing values, so run it through a ‘Missing Value’ node.
- Next, add the ‘Regression Predictor’ node and connect the cleaned test data to its data input.
- Finally, load the model into the predictor by connecting the learner’s output with the predictor’s input.
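The learner and predictor steps above can be sketched in plain Python to show what happens conceptually. This is a simplified illustration, not KNIME's implementation. It assumes a single numeric feature and assumes missing values are replaced by the column mean (the ‘Missing Value’ node can be configured in other ways). All function names and data are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values (assumed strategy)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def learn_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) model to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Training data with one missing feature value (hypothetical numbers).
train_x = fill_missing_with_mean([1.0, 2.0, None, 4.0, 5.0])
train_y = [3.0, 5.0, 7.0, 9.0, 11.0]
model = learn_linear_model(train_x, train_y)

# Test data is cleaned the same way before it reaches the predictor.
test_x = fill_missing_with_mean([6.0, None, 8.0])
predictions = predict(model, test_x)
print("model:", model)
print("predictions:", predictions)
```

The split mirrors the workflow: one function stands in for the learner and returns a model, and a separate function stands in for the predictor, taking the model and the cleaned test data as its two inputs.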
Using KNIME to Submit the Final Solution
With the steps above complete, you can now execute the predictor. The output is then almost ready for submission. Let us now look at the submission process.
- Begin by finding the ‘Column Filter’ in your node repository and dragging it into the workflow.
- Connect the output of the predictor to the ‘Column Filter’ and configure it to keep only the columns that you need.
- Execute the ‘Column Filter’, then search for the ‘CSV Writer’ node, add it to the workflow, and connect it to the filter’s output.
- Next, set the path to where you want the .csv file stored, and execute this node.
- Now open the .csv file and correct the column names so that they match the required solution format.
- Compress the .csv file into a .zip file and submit the solution.
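The submission steps above (filter columns, write a CSV, fix the header names, zip the file) can also be sketched with Python's standard library. This is only an illustration of the same sequence. The function name, column names, file names, and rows are all hypothetical, and the expected header names depend on the competition you submit to.

```python
import csv
import os
import tempfile
import zipfile


def write_submission(rows, column_names, csv_path, zip_path):
    """Write selected columns to a CSV file and compress it into a zip archive.

    column_names maps each column to keep onto the header name to write,
    which covers both the column filtering and the header renaming steps.
    """
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(column_names.values())
        for row in rows:
            writer.writerow([row[source] for source in column_names])
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name inside the archive, not the full path.
        archive.write(csv_path, arcname=os.path.basename(csv_path))
    return zip_path


# Hypothetical predictor output: the "feature" column is not needed in the submission.
rows = [
    {"item_id": "A1", "feature": 6.0, "prediction": 13.0},
    {"item_id": "B2", "feature": 7.0, "prediction": 15.0},
]

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "submission.csv")
zip_path = os.path.join(out_dir, "submission.zip")
write_submission(
    rows,
    {"item_id": "Item_Identifier", "prediction": "Prediction"},
    csv_path,
    zip_path,
)

with zipfile.ZipFile(zip_path) as archive:
    print(archive.namelist())
```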
The final workflow diagram is now complete, and it is simpler than it may sound. KNIME Analytics Platform workflows are also very portable. You can share a workflow with colleagues or other professionals by clicking ‘File’ and then ‘Export KNIME Workflow’.
As you can see, the KNIME Analytics Platform can be used for many kinds of analysis.
The best part is that predictive modeling can be performed with the linear regression learner and predictor nodes, which should help you build a predictive model quickly. So, go ahead and create one!