Summer of Data Science Month 1
A month ago, I joined the Summer of Data Science (#SoDS18) challenge on Twitter. To participate, you just pick a list of data science topics you want to learn, make a plan, and share it on social media. In this post I'll talk about what my goals were, what I learned during this first month, and the resources I used.
A quick background
I'm a computer science student who got really interested in Data Science in September of last year. I had a solid background in math, statistics, probability, and programming languages. As I was reading a lot of blogs about the field, I decided to take some data science courses online. I started with an R programming course and a Machine Learning course on Udemy. I even started this blog at that time, in French, to document my progress in Data Science. Those courses were great, especially on the programming side, but I didn't like the lack of good mathematical explanations behind the machine learning models. I eventually gave up on my Data Science goals and on the blog.
New goals
It's June, it's summer. I've decided to come back to my Data Science goals and take another approach. This time, I would learn Data Science with Python. I enrolled in the Data Scientist with Python Track on DataCamp, as a lot of people on Twitter and LinkedIn were recommending it. I also decided that if I started blogging again, it would be in English, not only to practice my English but also to reach a wider audience.
I had those thoughts in mind when I came across a tweet from @BecomingDataSci on Twitter. I read the blog post shared in the tweet, which explained how to go about the challenge. There were already a lot of people taking the challenge, both R and Python people. It was the perfect fit for my Data Science goals. I shared them in this tweet, as the blog post recommended: … and I started learning.
A great community
People all over the world are learning or doing something new in Data Science and sharing it with the #SoDS18 hashtag. Some are even learning the same things I'm learning. Need help on a topic? I just had to ask with the hashtag and, within 5 minutes, someone had already given the answer. Renee Teate also provides great help by using her large audience to share the questions that learners have. She even interacts directly with people when she has the answers. I definitely recommend following her on Twitter, as well as the people she interacts with.
Learning…
The first days were dedicated to learning how to use Git, GitHub, Jupyter notebooks and Markdown. Here are the resources I used to learn them:
- Git and GitHub: Version Control with Git on Udacity
- Jupyter notebooks: Jupyter Notebook Tutorial on YouTube
- Markdown: Learn the Basics of Markdown in 10 minutes on YouTube
I had to learn those skills to write better notebooks for my projects and to make sharing them on GitHub easier.
I had already started some courses in the Data Scientist with Python Track on DataCamp at that time, so I continued. I found their way of teaching with small practical quizzes very helpful. After a few weeks, I was already able to do some small data analysis projects. Here are the courses I've finished in the track so far:
- Intro to Python for Data Science
- Intermediate Python for Data Science
- Python Data Science Toolbox Part 1
- Python Data Science Toolbox Part 2
- Importing Data in Python Part 1
- Importing Data in Python Part 2
- Cleaning Data in Python
- Pandas Foundations
- Manipulating DataFrames with Pandas
- Merging DataFrames with Pandas
I've dedicated a GitHub repo to the certificates I received after completing each course. You can see them here.
I've also started learning Machine Learning with the well-known ML course on Coursera by Andrew Ng. I've finished the first 2 weeks and have already learned the math behind linear regression, gradient descent, the normal equation and more. That was definitely a good choice.
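As a rough illustration of how gradient descent and the normal equation relate for linear regression, here is a minimal NumPy sketch. The synthetic data, the learning rate, the iteration count and the function names are all assumptions made up for this example; none of it is taken from the course material.

```python
import numpy as np

# Synthetic data invented for illustration: y = 4 + 3x + small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 0.1, size=100)

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])


def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on the mean squared error cost."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Gradient of (1 / 2m) * ||X theta - y||^2 with respect to theta
        gradient = X.T @ (X @ theta - y) / m
        theta -= lr * gradient
    return theta


theta_ne = normal_equation(X, y)
theta_gd = gradient_descent(X, y)
print("normal equation :", theta_ne)
print("gradient descent:", theta_gd)
```

Both approaches should land on nearly the same intercept and slope here. The normal equation gets there in one step, while gradient descent needs a suitable learning rate and enough iterations, which is the trade-off that matters once the number of features grows.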
Learning with Books
I'm also learning with books. I've finished reading Python for Data Analysis by Wes McKinney, the creator of pandas. That book really helps with learning pandas, and I will definitely read some parts of it again. I'm also reading Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron, and I've already done a Machine Learning project thanks to that book. You can check the Data Analysis projects I did here, and my Machine Learning projects here.
LinkedIn…
During the month I became more active on LinkedIn, where I discovered and connected with a great Data Science community. I found the books mentioned above through a LinkedIn post by Randy Lao. He's a very active member and you should connect with him. I've also learned a lot from the pandas quizzes posted by Ted Petrou; you should connect with him on LinkedIn and Twitter too if you're an active pandas user.
Month 2?
I'll keep learning Machine Learning and data visualisation, and I'll do more Data Analysis and Machine Learning projects. I also plan to enter some 'Getting Started' competitions on Kaggle. We'll see how that goes.
Don't hesitate to follow me on Twitter or connect with me on LinkedIn if you want to talk about this article or about Data Science.
Thanks for reading.