The Beginner’s Guide to Kaggle
In this guide, we’ll cover everything beginners need to know about getting started on Kaggle. Plus, we’ll share our 7 favorite tips for enjoying Kaggle.
Kaggle can be intimidating for beginners. After all, some of the listed competitions have over $1,000,000 prize pools and hundreds of competitors.
Top teams boast decades of combined experience, tackling ambitious problems such as improving airport security or analyzing satellite data.
It’s no surprise that some beginners hesitate to get started on Kaggle. They have reasonable concerns such as:
- How do I even start?
- Will I be up against teams of experienced Ph.D. researchers?
- Is it worth competing if I don’t have a realistic chance of winning?
- Is this what data science is all about? (If I don’t do well on Kaggle, do I have a future in data science?)
- How can I improve my rank in the future?
Well, if you’ve ever had any of those questions, you’re in the right place.
In this guide, we’ll break down everything you need to know about getting started, improving your skills, and enjoying your time on Kaggle.
Kaggle vs. “Typical” Data Science
First, we need to make something very clear:
Kaggle competitions have important differences from “typical” data science, but they still provide valuable experience if you approach them with the right mindset.
Let us explain:
By nature, competitions (with prize pools) must meet several criteria.
- Problems must be difficult. Competitions shouldn’t be solvable in a single afternoon. To get the best return on investment, host companies will submit their biggest, hairiest problems.
- Solutions must be new. To win the latest competitions, you’ll usually need to perform extended research, customize algorithms, train advanced models, etc.
- Performance must be relative. Competitions must crown a winner, so your solution will be scored against others’.
“Typical” data science
In contrast, day-to-day data science doesn’t need to meet those same criteria.
- Problems can be easy. In fact, data scientists should try to identify low-hanging fruit: impactful projects that can be solved quickly.
- Solutions can be mature. Most common tasks (e.g. exploratory analysis, data cleaning, A/B testing, classic algorithms) already have proven frameworks. There’s no need to reinvent the wheel.
- Performance can be absolute. A solution can be very valuable even if it simply beats a previous benchmark.
Kaggle competitions encourage you to squeeze out every last drop of performance, while typical data science encourages efficiency and maximizing business impact.
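To make the relative vs. absolute distinction concrete, here’s a minimal Python sketch. The team names, scores, and benchmark value are all made up for illustration, and the two helper functions are hypothetical (they aren’t part of Kaggle or any library). We’re also assuming a metric where higher is better.

```python
# Hypothetical leaderboard scores, invented for illustration (higher is better).
leaderboard = {"team_a": 0.912, "team_b": 0.907, "you": 0.903, "team_c": 0.898}

# Hypothetical score of the solution a company already has in place.
previous_benchmark = 0.850


def leaderboard_rank(scores, name):
    """Relative performance: 1-based position among all competitors."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1


def beats_benchmark(score, benchmark):
    """Absolute performance: is this better than what existed before?"""
    return score > benchmark


# Competition view: only your position relative to others matters.
print(leaderboard_rank(leaderboard, "you"))

# Day-to-day view: the same score is a win if it beats the old benchmark.
print(beats_benchmark(leaderboard["you"], previous_benchmark))
```

In this made-up example, the same model sits mid-table on the leaderboard yet comfortably beats the existing benchmark, which is why a result that wouldn’t win a competition can still be valuable in practice.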
So is Kaggle worth it?
Despite the differences between Kaggle and typical data science, Kaggle can still be a great learning tool for beginners.
- Each competition is self-contained. You don’t need to scope your own project and collect data, which frees you up to focus on other skills.
- Practice is practice. The best way to learn data science is to learn by doing. As long as you don’t stress out about winning every competition, you can still practice on interesting problems.
- The discussions and winner interviews are enlightening. Each competition has its own discussion board and debriefs with the winners. You can peek into the thought processes of more experienced data scientists.