The initial pass over a knowyourmeme page takes its info from the table on the right-hand side of the page. This data is
user-entered and contains the meme's year, origin, tags, and categories.
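
As a rough illustration of what the first pass collects, the four fields could be held in a small record like the one below. This is only a sketch: the names MemeInfo, split_list and from_table_rows are hypothetical, and the labels "Year", "Origin", "Tags" and "Categories" are assumptions about how the table cells are keyed, not something taken from the crawler itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemeInfo:
    """The four fields read from the info table (hypothetical record)."""
    year: Optional[int] = None
    origin: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)


def split_list(raw: str) -> List[str]:
    """Split a comma-separated cell into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def from_table_rows(rows: Dict[str, str]) -> MemeInfo:
    """Build a MemeInfo from label -> cell-text pairs (labels are assumed)."""
    year_text = rows.get("Year", "").strip()
    return MemeInfo(
        # User-entered years are not always numeric, so fall back to None.
        year=int(year_text) if year_text.isdigit() else None,
        origin=rows.get("Origin", "").strip() or None,
        tags=split_list(rows.get("Tags", "")),
        categories=split_list(rows.get("Categories", "")),
    )
```

Missing or non-numeric cells simply become None or an empty list, which keeps user-entered oddities from crashing the pass.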

Tags and categories can be taken at face value. Year and origin, however, are ambiguous. Some meme pages list
the source of the meme's content (a TV show, a book, a song) as the origin, while others list where the meme first spread
(Reddit, Twitter, etc.). The same goes for year: some users enter the year the content originated (like 1946 for Big Chungus),
others the year the meme started spreading. The second pass, in the class memeBODYCrawlerUpdated, is intended to remedy this.
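
To make the ambiguity concrete, here is a minimal heuristic that guesses which kind of origin or year a value is. It is not what memeBODYCrawlerUpdated does (these notes do not describe its logic). The platform list and the 1990 cutoff are assumptions picked purely for illustration, and the function names are hypothetical.

```python
from typing import Optional

# Assumed, non-exhaustive set of platforms where memes spread.
SPREAD_PLATFORMS = {"reddit", "twitter", "4chan", "tumblr", "facebook"}

# Assumed cutoff: a year before this almost certainly refers to the
# source content rather than to the meme's spread online.
EARLIEST_SPREAD_YEAR = 1990


def origin_kind(origin: Optional[str]) -> str:
    """Guess whether an origin is a spread platform or source content."""
    if not origin or not origin.strip():
        return "unknown"
    if origin.strip().lower() in SPREAD_PLATFORMS:
        return "spread"
    return "content"


def year_kind(year: Optional[int]) -> str:
    """Flag years that can only be the content year; others stay ambiguous."""
    if year is None:
        return "unknown"
    if year < EARLIEST_SPREAD_YEAR:
        return "content"
    return "ambiguous"
```

With these assumptions, origin_kind("Reddit") is "spread" and year_kind(1946) is "content", while a recent year stays "ambiguous" because the table alone cannot tell which meaning the user intended.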

Over 4000 confirmed memes (as of Feb. 2019) were taken from the website to be crawled. Submission memes are ignored. The list
is in the file everyConfirmedMeme in the resources directory.
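
One way to turn that file into a list of pages to crawl is sketched below, using only the standard library. It assumes the file is saved HTML in which each meme is linked through an href beginning with "/memes/". That link pattern, and the names MemeLinkParser and extract_meme_slugs, are assumptions for illustration and are not taken from the crawler.

```python
from html.parser import HTMLParser
from typing import List, Set


class MemeLinkParser(HTMLParser):
    """Collect unique meme slugs from <a href="/memes/..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.slugs: List[str] = []
        self._seen: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # An attribute without a value is reported as None.
        href = dict(attrs).get("href") or ""
        if not href.startswith("/memes/"):
            return
        slug = href[len("/memes/"):].strip("/")
        # Skip sub-pages (e.g. "<slug>/photos") and duplicates.
        if slug and "/" not in slug and slug not in self._seen:
            self._seen.add(slug)
            self.slugs.append(slug)


def extract_meme_slugs(html_text: str) -> List[str]:
    """Return meme slugs in the order they first appear in the HTML."""
    parser = MemeLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.slugs
```

To use it, read the contents of everyConfirmedMeme into a string (the exact path and encoding depend on your checkout) and pass that string to extract_meme_slugs.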

GitHub took the HTML file and compressed it horrendously. If you want to view the HTML source (it's 2 MB, with a lot of tags),
use this site:
https://www.cleancss.com/html-beautify/