Global Data Science Forum

How to get started with Kaggle: A beginner's guide

By Ayush Jain posted Wed October 28, 2020 11:41 AM


Most of us have probably heard of Kaggle. For those who haven't, Kaggle is one of the largest online communities of data scientists and machine learning practitioners. Kaggle offers multiple services such as a public dataset platform, Kaggle Kernels, etc., but the one it is best known for is its machine learning competitions, which are regularly hosted by reputed companies and research organisations. While I agree that real-world projects are the best way to gain experience, participating in Kaggle competitions gives you hands-on experience with data science problems in a competitive environment and also adds value to your resume.

But for someone who is just starting a data science journey, it is quite natural to feel overwhelmed by such platforms and be unsure how to start. Having gone through the same phase, I have tried to break the process into smaller steps, which I would like to share. This approach helped me get started with Kaggle and also helped me earn my first silver medal by securing a position in the top 5% in the Google Landmark Recognition 2020 Challenge.

Step 1: Take your first step if you are standing still - Create your Kaggle profile if you haven't!

Step 2: Explore the Knowledge Competitions available on Kaggle. They are usually a good place for a beginner to start. Go through the public notebooks shared by competitors, especially the ones involving EDA (Exploratory Data Analysis) of the challenge dataset, along with the discussions. This will not only give you a fair idea of the platform but also improve your data analysis skills. You will very likely come across techniques for data analysis, data processing and model building that you were previously unaware of. You can explore other completed challenges as well.
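To make the idea of EDA a little more concrete, here is a minimal sketch of the kind of first look that public notebooks usually start with: counting missing values and computing basic statistics per column. The toy data, the column names and the `summarize` helper are all made up for illustration and do not come from any specific competition. Most notebooks do this with pandas; the sketch sticks to the Python standard library only so that it runs anywhere.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a competition's training file.
SAMPLE_CSV = """age,fare,survived
22,7.25,0
38,71.28,1
,8.05,0
35,53.10,1
"""


def summarize(csv_text):
    """Return missing-value counts and basic statistics for each numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        raw = [row[column] for row in rows]
        # Empty strings are treated as missing values.
        present = [float(value) for value in raw if value != ""]
        summary[column] = {
            "missing": len(raw) - len(present),
            "mean": statistics.mean(present),
            "min": min(present),
            "max": max(present),
        }
    return summary


summary = summarize(SAMPLE_CSV)
for column, stats in summary.items():
    print(column, stats)
```

Even a small summary like this tells you which columns need imputation and whether the target is balanced, which is usually where the more detailed EDA notebooks go next.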

Step 3: After gaining a fair idea of the platform, it is important to get some hands-on practice. Select any one active Knowledge Competition and try to improve your score in it by trying out different approaches. You can apply some of the methods and data analysis techniques that you came across in step 2. You can also refer to the notebooks and discussions posted by others and try to incorporate their ideas into your solution. I suggest that you spend a good amount of time on this so that you build solid hands-on experience. At this point, you may also be able to identify some areas where you need to improve. Try to work on those areas.

Step 4: Following the previous steps will have given you a decent familiarity with the platform and some idea of how to approach a competition. Now it is time for you to participate in an active competition. First, try to properly understand the problem, including the dataset and the evaluation metrics used in the competition. Spend some time understanding the data through various EDA techniques. You might also find some notebooks already posted by other competitors. Try to leverage those and build upon them. You can also go through the competition discussions, which might give you some ideas. There is no single way of approaching every problem, and it really depends on the domain of the problem you are dealing with. Try to explore relevant resources, which might include blogs, research papers, or even previous Kaggle competitions similar to the one you are working on. You can try all sorts of different approaches and also blend them to create a hybrid technique. Top Kagglers do that all the time! Some competitions require the use of TPUs or GPUs, or have other such requirements. You can always find their documentation and examples and refer to them as needed.
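As an illustration of what blending can look like in its simplest form, here is a minimal sketch that takes a weighted average of the predictions of two models. The `blend` function, the model names and the weights are hypothetical and chosen only for the example; real solutions usually tune the weights on a validation set and may use more elaborate schemes such as stacking.

```python
def blend(predictions, weights):
    """Return the weighted average of several models' predictions, row by row.

    predictions: list of equal-length lists, one list per model.
    weights: one non-negative number per model (they need not sum to 1).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(predictions[0])
    if any(len(model_preds) != n_rows for model_preds in predictions):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(w * model_preds[i] for w, model_preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical predicted probabilities from two different models.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.2]

# Trust model_a three times as much as model_b.
blended = blend([model_a, model_b], weights=[3, 1])
print(blended)
```

Blending tends to help most when the individual models make different kinds of errors, so it is worth combining approaches that differ in architecture or features rather than near-identical ones.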

Step 5: Participating in Kaggle competitions requires commitment and constant effort. Since these competitions usually span weeks, sometimes even a few months, you might find your enthusiasm waning with time. The competition might also demand significant time, tempting you to withdraw. One way to get around this problem is to work in a team. Kaggle competitions have a feature that lets competitors merge into a team to collaborate. This is quite useful, as people from different backgrounds may have expertise in different areas, and a team can bring those strengths together. It also helps in sharing the workload of the competition among team members.

Therefore, it is also important for you to build a network of people with whom you can collaborate in competitions. As a start, you can follow my Kaggle profile here and I will follow you back. If you are looking for a team member, reach out to me and I will be happy to collaborate.

Happy Kaggling!