[Previous Upwork Job] I worked as the developer and built a user-friendly data extraction application for a small business in two weeks.
- Job Completed: April 2022
- Job type: Development
- Job category: Desktop Application, Data Extraction
- Language: Python
- Style: Functional & Object Oriented
The goal of the project was to create a platform that scrapes these three social media platforms:
- Instagram
- YouTube
- TikTok
Additionally, the user should be able to choose which of these types of data to scrape:
- Account data
- Upload/post data
- Socialblade data
The program should also have an AI that would find the dominant colors for every scraped photo. Lastly, since my clients were not familiar with coding, the program should have a graphical user interface (GUI) and be compiled into a .exe file for easy usage with no installs.
The output should be saved in a ZIP file, where each dataframe is individually named after its function and the username of the "target". The requirements for the output were different for each platform, but overall:
- posts_data: [generic] likes, comments, views, shares, thumbnails, description, date, @tags, #tags, etc. [special] like-to-follow ratio, number of words and special characters in the description, and the top 5 colors used in the post/thumbnail.
- profile_data: [generic] total likes, followers, following, avatar, bio, date of creation, verification badge, etc. [special] 60-day upload rate, number of words and special characters in the bio, and the top 5 colors used in the post/thumbnail.
- Socialblade_data: Socialblade data should include these 8 dataframes:
  - The top information
  - The summary
  - The ranks
  - All of the averages
  - All of the charts
  - The data table
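To make the output requirements concrete, here is a minimal sketch of how the [special] text fields and the ZIP naming could be implemented. It is an illustration, not the code from the project: the function names are hypothetical, rows are plain dicts written as CSV instead of real dataframes so that only the standard library is needed, the like-to-follow ratio is assumed to mean likes divided by followers, and the file name pattern `<function>_<username>.csv` is one assumed reading of "named after its function and the username".

```python
import csv
import io
import re
import zipfile


def text_stats(text):
    """Count words and special characters in a description or bio."""
    words = len(text.split())
    # "Special" here means anything that is not a letter, digit,
    # underscore, or whitespace (an assumed definition).
    special = len(re.findall(r"[^\w\s]", text))
    return {"word_count": words, "special_char_count": special}


def like_to_follow_ratio(likes, followers):
    """Likes per follower (assumed definition); None without followers."""
    return likes / followers if followers else None


def rows_to_csv(rows):
    """Serialise a non-empty list of dicts (one per post) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_zip(tables, username):
    """Pack every table into one ZIP as '<function>_<username>.csv'."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for function_name, rows in tables.items():
            zf.writestr(f"{function_name}_{username}.csv", rows_to_csv(rows))
    return archive.getvalue()
```

Calling `build_zip({"posts_data": rows}, "example_user")` returns the bytes of an archive containing a single file named `posts_data_example_user.csv`.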
Since the platforms are built very differently, I had to use a different method for each of them. They were all challenging in their own way, but I found TikTok and Socialblade especially hard because of security measures such as CAPTCHA and Cloudflare. I dealt with Cloudflare by adding proxies, fake headers, and a series of cookies to my requests. The CAPTCHA was a bit more difficult. I solved it by fetching the temporary verification token that you receive after solving the puzzle, which can then be added to the other cookies to gain access.
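As a rough illustration of what "headers and cookies on a request" means in practice, the sketch below builds a request object with a browser-like User-Agent and a Cookie header using only the standard library. It is not the project code: `build_request`, the URL, and the cookie names are hypothetical, proxies are left out, and nothing is actually sent over the network.

```python
import urllib.request


def build_request(url, cookies, user_agent):
    """Create a request carrying a User-Agent and a Cookie header."""
    # A Cookie header is a "; "-separated list of name=value pairs.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {
        "User-Agent": user_agent,
        "Accept-Language": "en-US,en;q=0.9",
        "Cookie": cookie_header,
    }
    # The request is only constructed here, not sent.
    return urllib.request.Request(url, headers=headers)


# Hypothetical values for illustration only.
request = build_request(
    "https://example.com/profile",
    {"session": "abc", "token": "xyz"},
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
)
```

The token obtained after solving the puzzle would simply be one more entry in the `cookies` dict.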
Socialblade posed another challenge, since its pages mix HTML and JavaScript. Since I didn't know any JavaScript, I had to learn a bit of it and then find a way for my Python code to read the JavaScript so it could extract the data from it. A simple library named "js2py" did the trick nicely.
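To show the kind of problem this is, here is a small sketch of pulling data out of an inline script. Note that it does not use js2py: it swaps in a simpler regex plus `json` approach that only works when the embedded value happens to be a JSON-compatible array literal, whereas evaluating the script (as js2py does) handles more general cases. The page fragment, the variable name `chartData`, and the function name are all hypothetical.

```python
import json
import re


def extract_js_array(html, variable):
    """Return the array assigned to `variable` in an inline script, or None."""
    # Match "var <name> = [ ... ];" and capture the array literal.
    pattern = r"var\s+" + re.escape(variable) + r"\s*=\s*(\[.*?\]);"
    match = re.search(pattern, html, re.S)
    if match is None:
        return None
    # Only valid when the literal is also valid JSON
    # (double-quoted strings, no trailing commas).
    return json.loads(match.group(1))


# Hypothetical page fragment; real pages will differ.
sample_html = """
<div id="chart"></div>
<script>
var chartData = [[1648771200000, 1200], [1648857600000, 1250]];
</script>
"""

chart_rows = extract_js_array(sample_html, "chartData")
```

Each row in `chart_rows` is then a plain Python list that can go straight into a dataframe.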
Instagram posed a different challenge: they have made it impossible to access the platform without being logged in as a real user. I solved this by using a web crawler library along with the login info for several Instagram accounts, and a function that switches to another account whenever Instagram detects suspicious activity.
YouTube actually has an API, which was a breath of fresh air. However, the API has a daily quota that was far too small to be useful. Since you can create 10 Google accounts per phone number, I fixed the issue by doing that, creating one API key for each account, and applying the same rotation function to the keys.
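The rotation idea shared by the account switching and the API keys can be sketched roughly as follows. This is a generic pattern under stated assumptions, not the original function: `QuotaExceeded`, `fetch_with_rotation`, and `fake_fetch` are hypothetical names, and the caller is assumed to raise `QuotaExceeded` whenever a credential is rejected or exhausted.

```python
from itertools import cycle


class QuotaExceeded(Exception):
    """Raised by the fetch function when a credential can no longer be used."""


def fetch_with_rotation(fetch, credentials, max_attempts=None):
    """Try each credential in turn until one succeeds."""
    if max_attempts is None:
        max_attempts = len(credentials)
    pool = cycle(credentials)
    for _ in range(max_attempts):
        credential = next(pool)
        try:
            return fetch(credential)
        except QuotaExceeded:
            # This credential is spent; move on to the next one.
            continue
    raise RuntimeError("all credentials exhausted")


def fake_fetch(api_key):
    """Stand-in for a real request: the first key is out of quota."""
    if api_key == "key-a":
        raise QuotaExceeded(api_key)
    return f"data fetched with {api_key}"


result = fetch_with_rotation(fake_fetch, ["key-a", "key-b", "key-c"])
```

Here `result` is `"data fetched with key-b"`, because the first key fails and the second succeeds.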
I have some experience with AI development, so making the color ranker was not that hard, though those things aren't much fun to set up.
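For readers wondering what "top 5 colors" means in code, here is a deliberately simple stand-in. It is not the model used in the project: it just snaps every pixel to a coarse color grid and counts the most frequent buckets, which is a plain frequency count and not a learned model or a clustering method. The function name and the `step` parameter are hypothetical, and pixels are assumed to arrive as `(r, g, b)` tuples, for example from an image library.

```python
from collections import Counter


def top_colors(pixels, n=5, step=32):
    """Return the n most common colors after snapping channels to a grid."""

    def quantize(channel):
        # Snap to the centre of its bucket, e.g. 0..31 -> 16 when step is 32.
        return (channel // step) * step + step // 2

    counts = Counter(tuple(quantize(c) for c in pixel) for pixel in pixels)
    return [color for color, _ in counts.most_common(n)]


# Tiny hypothetical "image": six reddish pixels and four bluish ones.
sample_pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(12, 8, 255)]
dominant = top_colors(sample_pixels, n=2)
```

With this sample, `dominant` is `[(240, 16, 16), (16, 16, 240)]`: the two slightly different blues fall into the same bucket and are counted together.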
As mainly a backend programmer using a backend programming language, I found building the front end quite a drag.
Now, you might not know this, but Python is not a language designed to be turned into a standalone app (.exe file), and it takes a serious workaround to make it work.









