Fake news is a pretty funky problem, and a surprisingly recent one. Look at the Google Trends screenshot below: before 2017, the term was barely searched at all. With the topic so new and so much still unsolved, my fellowship friends and I thought it would be fun to take a stab at it for the 2017 Global AI Hackathon.
We were lucky enough to find a dataset on Kaggle of fake news stories. However, we struggled to find a good dataset of real news, so we created our own: we built a web scraper that grabbed articles from a variety of reliable sources across the political spectrum, liberal, moderate, and conservative.
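To give a flavor of what the scraper did per page, here is a minimal sketch of article extraction with BeautifulSoup. The HTML snippet is invented for illustration; our real scraper fetched live pages over HTTP, and the actual parsing logic differed by source.

```python
# Sketch: pull the headline and body text out of an article page.
# The markup below is a made-up example, not a real news site's HTML.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Senate passes budget bill</h1>
  <p>The vote followed a lengthy debate.</p>
  <p>The bill now heads to the president.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
headline = soup.find("h1").get_text(strip=True)
body = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))

print(headline)
print(body)
```

Scraping cleanly mattered here: boilerplate like navigation links or ads left in the "real news" text would give the classifier easy but meaningless signals.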
From there, we developed a Naive Bayes classifier that could predict whether an article was fake with 85% accuracy. The model was trained on 4,000 articles and tested on a held-out set of 1,100 articles.
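The core of that model can be sketched in a few lines with scikit-learn: a bag-of-words vectorizer feeding a multinomial Naive Bayes classifier. The tiny training set below is invented for illustration and is not our hackathon data, but the pipeline shape is the standard one for this kind of text classification.

```python
# Sketch: bag-of-words + multinomial Naive Bayes for fake/real classification.
# Labels: 1 = fake, 0 = real (toy examples, invented for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "SHOCKING miracle cure doctors don't want you to know",
    "You won't BELIEVE what this celebrity did next",
    "Senate passes budget bill after lengthy debate",
    "Central bank holds interest rates steady this quarter",
]
train_labels = [1, 1, 0, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["Miracle cure you won't believe"]))
```

Naive Bayes is a natural hackathon choice: it trains in seconds on a few thousand articles and gives a usable baseline before trying anything heavier.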
With a working model in hand, we set up a web demo with a really simple user flow:
Enter the URL of the page in question
We fetch the page at that URL
We parse the information on the page and feed it into the model
We return the model's verdict, along with a sentiment analysis and a headline-to-article similarity score (a mismatch between headline and body is one key indicator of fake news)
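The headline-to-article similarity check in the last step can be sketched with TF-IDF vectors and cosine similarity. The texts below are invented examples, and our demo's exact similarity metric may have differed; the idea is simply that an honest headline shares vocabulary with its article while a clickbait headline often does not.

```python
# Sketch: score how well a headline matches the article body.
# A low score suggests the headline oversells or misrepresents the story.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

headline = "Senate passes budget bill after lengthy debate"
article = ("The Senate passed the budget bill on Tuesday after a lengthy "
           "debate. The bill now heads to the president's desk.")
clickbait = "You won't BELIEVE what happened in Washington"

vec = TfidfVectorizer().fit([headline, article, clickbait])
X = vec.transform([headline, article, clickbait])

honest_score = cosine_similarity(X[0], X[1])[0, 0]
clickbait_score = cosine_similarity(X[2], X[1])[0, 0]
print(honest_score, clickbait_score)  # honest headline scores higher
```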