
Noel Tam's Data Science Journey

I started my Data Science journey in March 2018, with a 12-week full-time intensive course at General Assembly. It comprised lectures, labs and projects on the topics we learnt. We started with basic Python (Project 1), then EDA after an intro to Pandas (Project 2), before moving on to full-fledged data science modeling, mostly using the Scikit-learn library. Of course, that includes data preprocessing, feature selection, cross validation and evaluation metrics. In Project 3, we did housing price prediction using various regression models like simple and multiple linear regression, plus regularization techniques such as Lasso, Ridge and Elastic Net. In Project 4, we web-scraped a job posting website, then used NLP to predict job titles. Some classification models were used too, such as Logistic Regression, KNN, Support Vector Machine (SVM), Kernel SVM and Decision Trees, fine-tuned with XGBoost, GridSearch and Bagging techniques. Sadly, no mini projects on clustering (K-Means) or Principal Component Analysis (PCA). GA then covered some intro to deep learning and time series as we approached the end of the course. The highlight of the course is our capstone project, which showcases the essence and application of the knowledge acquired here. In my case, I built a CNN model using Keras and included Object Detection via Tensorflow's API + OpenCV to classify images via webcam, just to spice up the demo.
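The Project 3 regression workflow can be sketched in a few lines of scikit-learn. The data here is synthetic (`make_regression`) rather than the actual housing set, so this is an illustration of the pipeline, not the project itself:

```python
# Compare plain linear regression against Lasso, Ridge and Elastic Net
# with 5-fold cross-validation, on synthetic stand-in data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=42)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=1.0),
    "ridge": Ridge(alpha=1.0),
    "elasticnet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    # Scale features before regularising, so penalties treat them equally.
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    print(f"{name:>10}: mean CV R^2 = {scores[name]:.3f}")
```

The scaling step matters: Lasso and Ridge penalise coefficient sizes, so unscaled features would be penalised unevenly.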

After GA, while job searching, I did some other projects, most of them on computer vision, just out of interest. My journey in Data Science has just started; there is still a long way to go. Do stay tuned and walk this difficult journey with me. (Projects below are arranged latest on top.)


Portfolio

Projects after GA. (Latest on Top)

11th December 2018: Wow!!! It has been a while since I last blogged. Many things have happened, and I was too busy. Only now do I have time to rest. I'm now in Korea for a 10-day holiday, sitting on a massage chair in a nice farmhouse in Pyeongchang, where the 2018 Winter Olympics were held. What has happened in the last 4 months? I didn't get selected for AIAP. I started a new job as an intern at a company called XRVision, a startup focused on security and safety using AI. Although the main source of income is face recognition, the company does tons of video analytics work.

I did a few projects myself too, like age, gender, race and car make/model predictions. I tried and used pre-designed models AGAIN, only this time for industry purposes, so accuracy is important. I re-looked at models such as VGGFace, ResNet, Inception and many more, did a couple of rounds of fine-tuning, pre-training and custom-made models, and did work coming from papers like SSR-Net for age prediction, Siamese networks for differentiating one face from another, as well as triplet loss for car make/model prediction when the number of images per class is low yet there are many classes. Now I'm looking at how to make full use of the GPU and CPU to speed up training time. The strange thing is that even when a GPU is assigned, it doesn't mean things will speed up; the way I manage my code plays an important role too. Still learning. Below are some of the codes I used. Some made it to production, some didn't due to lack of accuracy (boss's expectation is 99%), but they are all my work, so I just wanted to keep them on GitHub.
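The triplet-loss idea mentioned above is simple to write down: an anchor embedding should sit closer to a positive (same car model) than to a negative (different model), by at least some margin. A minimal NumPy sketch with made-up 2-D embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull anchor towards positive, push it away from negative."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance to negative
    return np.maximum(0.0, d_pos - d_neg + margin)

# Invented embeddings: two cars of the same model (a, p), one different (n).
a = np.array([0.1, 0.9])
p = np.array([0.2, 0.8])
n = np.array([0.9, 0.1])
print(triplet_loss(a, p, n))  # → 0.0, negative is already far enough away
```

When the negative is far outside the margin the loss is zero, so training only spends effort on "hard" triplets that still violate the margin.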

Project 17: My Python

Codes from me: before every training run, I had to do a lot of image processing, dataset collection, weights conversion and file moving/renaming, tons of preparation work. Here are some simple scripts accumulated over the past 4 months for various projects. They are coded for a specific purpose, so what works for me WILL NOT work for you. Don't clone.

Project 18: SSRNet

Soft Stagewise Regression Network: for age prediction. I have tried many other models, but this design makes more sense and gives better accuracy. It does range classification with overlapping ranges, then finally scales down to a regression, producing a single number.
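The core intuition can be sketched in NumPy. This is a simplification, not the exact SSR-Net formulation (the paper also learns dynamic bin widths and shift offsets): each stage's softmax output gives a soft, fractional bin index, and stages stack at progressively finer resolution so the final output is a single continuous number:

```python
import numpy as np

def soft_stage_regression(stage_probs, v=100.0):
    """Simplified soft stagewise regression over the range [0, v].

    Each stage splits the remaining range into len(probs) bins; the
    softmax probabilities give a soft bin index, and later stages
    refine the estimate at finer resolution.
    """
    age = 0.0
    width = v
    for probs in stage_probs:
        k = len(probs)
        width /= k                                 # this stage's bin width
        idx = float(np.dot(probs, np.arange(k)))   # soft (fractional) bin index
        age += idx * width
    return age

# Two stages of 3 bins each over 0-100 years: resolution 100/3, then 100/9.
stage1 = np.array([0.0, 1.0, 0.0])   # confidently the middle coarse bin
stage2 = np.array([0.2, 0.6, 0.2])   # refinement within that bin
print(soft_stage_regression([stage1, stage2]))  # → 44.44..., i.e. 100/3 + 100/9
```

This is why it beats plain range classification: the soft indices let gradients flow as in regression, while the stagewise bins keep each sub-problem easy.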

Project 19: Custom/Fine-Tune/Pre-Train

Model codes: more on fine-tuning and pre-training. Only this time the coding is neater, because I'm more well-versed now.


13th August 2018: Today marks a very important milestone in my Data Science journey. It is my first day as a Machine Learning (Computer Vision) intern. This is the day I start working on Computer Vision as a professional. This company specializes in Computer Vision; in fact, they are very good at what they do. So I am very honoured and excited to be offered the role. I think I will learn a lot from them, I just hope that they are happy to share. Ha ha... There is another important piece of news today too. For the past few weeks I have been trying to enrol in the apprenticeship programme at AI Singapore. I have done their technical assignment (uploaded below). So today I will find out whether I'll be shortlisted for the next stage, the workshop test. No news yet. If selected, I will be given a real project from a real company, dealing with real data, funded with real government money, alongside university professors. So it's going to be real shit for 9 months.

Project 16: AIAP technical test

Apprenticeship entry test: given by AI Singapore. Five questions in total; only required to do 2.


25th June 2018: I'm back from Osaka. I experienced a magnitude 6.1 earthquake. I wasn't fearing for my life, but rather feared making the wrong decision to stay in the building and endangering my family. Thank God it was a short one, and the aftershocks were very mild. What troubled me was that there were no English instructions on TV/radio. Perhaps my next project, a chatbot, is a good idea. In times of emergency, I could still talk to a robot.

Project 15: Cloth recognition

Cloth recognition: an assignment given by a company for an internship role. Used the Tensorflow Object Detection API with Faster-RCNN and a dataset from DeepFashion. The result is pretty acceptable.

Project 14: Face ID using only OpenCV

Face detection and Face ID, using only OpenCV, nothing else. Thanks to the tutorial by CodingEntrepreneurs. Detects the faces and identities of the Avengers.

Project 13: Mask_RCNN

Exploring Mask_RCNN instance segmentation. Followed Mark Jay's tutorial. Able to stream over webcam in real-time. Personally I think it is too slow to be useful, only 2fps on my GTX1060, compared with the creator's 5fps. But it's a great good-to-know.


12th June 2018: It has been a challenging, information-overloaded first half of 2018 since the start of my Data Science course. Going to have a week's break in Osaka + Kyoto. Will blog again soon. Ciao!!

Project 12: YOLO

Exploring YOLO. Followed Mark Jay's tutorial. Credits to Darkflow, who created a Python version of Darknet (which was in C). I managed to get it working on my laptop, streaming through my webcam. Just for fun.

Project 11: Deep Learning

Deep Learning on regression and classification via Keras. Not using Scikit-Learn this time, but exploring deep learning techniques to solve regression and classification problems. Boston Housing dataset for prediction. Not dealing with images this time. Let's do this.
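A minimal sketch of what such a Keras regression model looks like. Synthetic data stands in for Boston Housing so the snippet stays self-contained, and the layer sizes are arbitrary placeholders:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in: 13 features, like the Boston Housing set.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13)).astype("float32")
y = X @ rng.normal(size=13) + rng.normal(scale=0.1, size=500)

model = keras.Sequential([
    keras.Input(shape=(13,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),   # single linear output for regression
])
# MSE loss instead of a classification loss makes this a regression net.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

preds = model.predict(X[:3], verbose=0)
print(preds.shape)  # (3, 1)
```

Swapping the last layer for a softmax and the loss for crossentropy turns the same skeleton into the classification version.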

Project 10: SQL+Tableau

SQL+Tableau: not a project, just my own SQL and Tableau notes. Decided to brush up my SQL skills and learn new Tableau skills from Udemy.


31st May 2018: The Meet and Greet event went well. Spoke to a couple of companies. Many have shown great interest in what I do. In fact, I kind of like computer vision. I will probably explore more techniques like YOLO, Mask_RCNN, face detection, object detection code from scratch, etc., while job hunting at the same time.


GA Capstone Project

Enhance shopping experience thru computer vision:

  • Image classification + Object Detection + Advertising
  • Build CNN from scratch via Keras
  • Build CNN thru transfer learning VGG16
  • Tensorflow Object Detection API, Bounding Box
  • Train Faster RCNN using custom images
  • OpenCV, Webcam stream Real-time
  • Click here for full details

Many of us shop. Wouldn't it be awesome if you could pick up your phone, switch on the camera app, point at a product of interest and have the phone display its price, discounts and promotions in real-time? I selected 12 household product categories, about 20,000 images in total. My goal is to train a model to classify these images according to their labels. Then with these labels, we can tag them to various promotions or run an advertising campaign.


18th May 2018: This day marks the end of my 12 weeks of training at GA. In 1.5 weeks, I have to showcase this capstone project at a Meet and Greet event. Well, it still has a bit of work to go before it can really be shown to outsiders or potential employers. Hope I can make it in time.


Capstone Project

Final Capstone Project. After designing my own model, I discovered the Tensorflow Object Detection API, and found that there are better, more complex models. Here I custom-trained my images on a Faster-RCNN model, applied it via the Object Detection API, then looped predictions over an OpenCV webcam stream while tweaking the labels into advertising slogans.

Project 8: Fine-tuning VGG16

Fine-tuning (aka transfer learning) with custom images on the VGG16 model. Here I explored the transfer learning technique with VGG16. I thought this would increase the accuracy, but surprisingly the result is terrible.
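For illustration, the transfer-learning setup looks roughly like this: freeze VGG16's convolutional base and train a new head for the 12 classes. `weights=None` keeps the sketch offline; the real experiment would load `weights="imagenet"`, otherwise there is nothing to transfer. The head sizes are placeholders:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.applications import VGG16

# weights=None avoids the ~500MB ImageNet download; use "imagenet" for real.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False   # freeze the convolutional base

model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(12, activation="softmax"),  # 12 household product classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Forward pass on a dummy image just to confirm the wiring.
dummy = np.zeros((1, 224, 224, 3), dtype="float32")
out = model.predict(dummy, verbose=0)
print(out.shape)  # (1, 12)
```

One common reason fine-tuning disappoints, worth checking here, is forgetting VGG16's expected input preprocessing (`keras.applications.vgg16.preprocess_input`) on the custom images.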

Project 7: CNN with Keras

Custom images on my own CNN model using Keras. This is a portion of my capstone project: classification of 12 classes of household products, about 20,000 images in total, trained on my own custom model created using Keras. Includes training code + a demo using the trained model.

Project 6: Keras CNN Dog/Cat

Image classification of cats and dogs using Keras. This is a warm-up for the capstone project, with classification of only 2 classes. I used this to check whether my Tensorflow-GPU installation and Keras were working. It was a success.

Project 5: Web Scrape

Web scraping a job posting website. This is the data collection for Project 4. Uses BeautifulSoup and Selenium.
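The BeautifulSoup part works roughly like this. The HTML below is invented for illustration (the real site had its own markup and class names), and Selenium came in when pages were rendered with JavaScript:

```python
from bs4 import BeautifulSoup

# Made-up job-card markup standing in for a fetched page.
html = """
<div class="job-card"><h2 class="title">Data Scientist</h2>
  <span class="company">Acme Analytics</span></div>
<div class="job-card"><h2 class="title">ML Engineer</h2>
  <span class="company">Widget AI</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
jobs = [
    {"title": card.find("h2", class_="title").get_text(strip=True),
     "company": card.find("span", class_="company").get_text(strip=True)}
    for card in soup.find_all("div", class_="job-card")
]
print(jobs)
```

Each scraped card becomes one dict, which stacks naturally into a pandas DataFrame for the NLP work in Project 4.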

Project 4: NLP

Salary prediction + job title feature importance. Uses NLP (natural language processing). Demonstrates the use of GridSearch, XGBoost and Bagging.
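The NLP step can be sketched as TF-IDF features feeding a classifier. The tiny corpus and labels below are invented for illustration; the real project used scraped postings and heavier models like XGBoost:

```python
# Predict a job title from posting text: TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "build machine learning models in python",
    "train deep neural networks and tune hyperparameters",
    "prepare financial statements and audit accounts",
    "manage ledgers and quarterly tax filings",
]
labels = ["data scientist", "data scientist", "accountant", "accountant"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["tune a neural network in python"]))  # → ['data scientist']
```

Feature importance then falls out of the fitted coefficients: the highest-weight TF-IDF terms per class are the words that most signal that job title.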

Project 3: Regression/Classification

Housing price prediction. Uses several regression & classification models. Demonstrates the use of GridSearch, XGBoost and Bagging.
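The GridSearch part of that workflow looks roughly like this in scikit-learn. Synthetic data again, and `GradientBoostingRegressor` stands in for XGBoost (same gradient-boosting family, no extra dependency):

```python
# Exhaustive hyperparameter search with cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,            # 3-fold cross-validation per parameter combination
    scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

`grid.best_estimator_` is already refit on the full data with the winning parameters, so it can be used directly for prediction.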

Project 2: EDA + Pandas

Exploratory Data Analysis (EDA). This was done after 1 week of basic Data Science EDA training involving Pandas at GA.
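Typical first-pass EDA calls with Pandas, of the kind that project was built from, on an invented toy table:

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood": ["east", "east", "west", "west", "north"],
    "price": [350, 420, 610, 580, 300],
    "rooms": [3, 4, 5, 5, 2],
})

print(df.describe())                                 # summary stats for numeric columns
print(df.isnull().sum())                             # missing-value check per column
print(df.groupby("neighbourhood")["price"].mean())   # group-wise aggregation
```

These three calls (distributions, missingness, group comparisons) usually decide what cleaning and feature engineering the modeling steps will need.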

Project 1: Python

Basic Python. A micro project done after 2 weeks of training on basic Python programming at GA. Titled: Building "Pokemon Stay". The purpose was to practice the use of basic Python features such as lists, dictionaries, for loops, if-else, creating functions, etc.
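A flavour of those basic features, with made-up data in the "Pokemon Stay" spirit: a dict of players, a function, a loop and a conditional:

```python
# Toy player data: dict of dicts, each holding a list.
players = {
    1: {"name": "Ash", "pokemon": ["pikachu", "charmander"]},
    2: {"name": "Misty", "pokemon": ["squirtle"]},
}

def count_pokemon(players):
    """Return {player_name: number_of_pokemon} with a plain loop."""
    counts = {}
    for info in players.values():
        counts[info["name"]] = len(info["pokemon"])
    return counts

for name, n in count_pokemon(players).items():
    if n > 1:
        print(f"{name} has {n} pokemon")
    else:
        print(f"{name} has only {n}")
```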

26th Feb 2018: This is where it all began. First day at GA. A good mix of people from all backgrounds: finance, engineering, transport, etc. After a brief introduction of ourselves, my journey into Data Science started. First with Python.


I am currently job hunting. If you are hiring for a Data Science related position, please do consider me. I am contactable via email: noeltam75@gmail.com, or connect with me via Linkedin: linkedin.com/in/noeltam