-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
299 lines (211 loc) · 19.3 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
[001] List of Files
[002] How to Run
[003] High-Level Description
[004] Freebase API Key
==================================
[001] LIST OF FILES
==================================
Retriever.java
[creator package]
Creator.java
EntityBoxCreator.java
QueryBoxCreator.java
[entities package]
Actor.java
Author.java
BusinessPerson.java
Coach.java
Film.java
League.java
Organization.java
Person.java
Player.java
Spouse.java
Team.java
[output package]
Output.java
EntityBox.java
QueryBox.java
[result package]
JSONable.java
MQLResult.java
Result.java
Role.java
SearchResult.java
TopicResult.java
Type.java
Value.java
Values.java
[service package]
Service.java
SearchService.java
TopicService.java
MQLService.java
==================================
[003] HIGH LEVEL DESCRIPTION
==================================
The main method in Retriever prompts the user for input.
[INFOBOX QUERY INPUT]
If the user enters an infobox query, an EntityBoxCreator object will be instantiated.
EntityBoxCreator creates a SearchService object using the provided API key and the query.
The requestInfo() method in SearchService is then called. This will retrieve a maximum of 20 results using the Freebase Search API,
from which TopicResult objects will be created based on the retrieved mIds, using the TopicService methods,
in order to see what types of entities each mId matches to. We construct TopicResult objects for each of these search results
to see all of the entities they have performed a role as. It would be more lightweight to simply check the "notable" field that
is easily accessed through the Search API, but this would only show people whose most notable role is as a businessperson, actor,
or author. Using the TopicResult objects, we are able to more comprehensively see all the entities a person has.
Each SearchService object has a set of the entities the program is looking for, which can be PERSON, BUSINESSPERSON, ACTOR, AUTHOR, LEAGUE or TEAM.
The algorithm first searches the first Topic Results and if none of those types are contained in them then the next five Topic Results
are checked, until a total of 20 Topic Results are examined.
If all 20 Topic Results, which are created based on the 20 mIds returned by Freebase Search API,
don't contain any of those types then an error message is printed and the program exits.
Otherwise, the first mId that corresponds to at least one type contained in the above set is chosen, and a SearchResult object is created.
SearchResult contains entity's name, mId, id and notable_id, although essentially only mId is used by the program.
In the next step, EntityBoxCreator fetches the mId and creates a TopicService object using the API key and the mId.
The requestInfo() method in TopicService is then called, which calls the Freebase Topic API using the given mId and creates a Topic Result
object by parsing the json document that is returned by the API. The mapping of the values of the json document happens inside the
Topic Result instance, where the different types associated with the topic are identified. Then, given the identified types the respective
mapping methods are called that parse the json document in search of the required fields that are necessary to form the output.
Specifically, the values from the json documents mapped to the different entities in the program are the following:
--- PERSON ENTITY
[/type/object/name] --> name of the person: a person can only have one name, therefore one value object for that field exists inside
the json document. We use the text field of the value object to retrieve the name of the person.
[/people/person/date_of_birth] --> date of birth of the person: a person can only have one date of birth, therefore one value object
for that field exists inside the json document. We use the text field of the value object to retrieve the date of birth of the person.
[/people/person/place_of_birth] --> place of birth of the person: a person can only have one place of birth, therefore one value object
for that field exists inside the json document. We use the text field of the value object to retrieve the place of birth of the person.
[/people/deceased_person/date_of_death] --> date of death of the person: a person can only have one date of death, therefore one value object
for that field exists inside the json document. We use the text field of the value object to retrieve the date of death of the person.
[/people/deceased_person/place_of_death] --> place of death of the person: a person can only have one place of death, therefore one value object
for that field exists inside the json document. We use the text field of the value object to retrieve the place of death of the person.
[/common/topic/description] --> person's description: a person can only have one description about themselves, therefore one value object
for that field exists inside the json document. Because the text field of the value object contains a cut version of the description field, we use
the value field to retrieve the whole piece of information.
[/people/deceased_person/cause_of_death] --> causes of death of the person: a person can have multiple causes of death, or have their cause
of death described in various ways. Therefore, more than one value object might exist for that field inside the json document. To map the different
value object, we use the Values class which creates instances of Value class. Both classes implement the JSONable interface that uses the Google-Gson library
to map annotated fields inside a json documents to specific fields in the class. Value class contains the following fields, which are mapped with the
help of the Google-Gson library: text (String), value (String), id (String), timestamp (String), property (JSONObject). The property field might contain more
information structured in value objects. Eventually, we use the text field of each mapped value object to retrieve the various causes of death for the person.
[/people/person/sibling_s] --> siblings of the person: a person can have more than one sibling. Therefore similar to the 'cause of death' field we map
the different value objects to instances of value class. Because more information than the one we want is included in the json document we need to do further
processing compared to the previous cases. The required information resides within the property field of each value instance and can be retrieved by mapping the
additional field [/people/sibling_relationship/sibling] and ultimately use the text field from the returned value object to retrieve the name of the sibling. Since
a sibling can have only one name, one value object for that field exists inside the json document.
[/people/person/spouse_s] --> spouses of the person: a person can have more than one spouse over the years of their life. Because we require additional information
related to the spouse, such as the period of time the spouse was married to the person in question, and their marriage location, we choose to create Spouse instances
for each spouse that contain all this information. Therefore, similar to the siblings field we map the value objects to values instances and after parse the property
json object for the following values: [/people/marriage/spouse], [/people/marriage/from], [/people/marriage/to], [/people/marriage/location_of_ceremony] to retrieve spouse's
name, period of marriage, and marriage location respectively. Because all of those values exist only once for each spouse of the perso, one value object for that field exists
inside the json document. Eventually, we use the text field to retrieve the required information.
---BUSINESS PERSON ENTITY
A business person might be associated with an organization in various ways (be the founder/ leader/ board member). Once a new organization is retrieved (a new name) from
the json document, a new instance of the organization class is created and the additional information that is related to it is added to the instance's fields. In case
the organization has already been detected because the person is associated with the particular organization in more than one ways, the instance is found and the additional
information is added to it.
[/organization/organization_founder/organizations_founded] --> organizations founded by the person: a person can be the founder of more than one organization. We are only interested
in the name of those organizations so after we map the json section of the document to the Values instance we parse each value object and retrieve the text field for each object to
get the name.
[/business/board_member/leader_of] --> organizations the person is a leader of: a person can be the leader of more than one organizations. We are interested in organization's name,
the person's title, role and period of leadership. Therefore, after mapping the values object to value instances, we parse their property field to look for
[/organization/leadership/organization],[/organization/leadership/title], [/organization/leadership/role], [/organization/leadership/from], [/organization/leadership/to] respectively.
We use the text fields of those value objects, parsed from the property JSONObject to retrieve the required information.
[/business/board_member/organization_board_memberships] --> organizations the person is a board member of: a person can be a borad member of more than one organizations. We are interested
in organization's name, the person's title, role and period of membership. Therefore, after mapping the values object to value instances, we parse their property field to look for
[/organization/organization_board_membership/organization],[/organization/organization_board_membership/title], [/organization/organization_board_membership/role],
[/organization/organization_board_membership/from], [/organization/organization_board_membership/to] respectively. We use the text fields of those value objects,
parsed from the property JSONObject to retrieve the required information.
---AUTHOR ENTITY
[/book/author/works_written] --> list of books written by the author: an author might have written more than one books. We map the value objects to value instances and retrieve the name
of each book by using their text field.
[/book/book_subject/works] --> list of books about the author: an author might have more than one book written about him. We map the value objects to value instances and retrieve the name
of each book by using their text field.
[/influence/influence_node/influenced] --> list of peoples' names the author has influenced. We map the value objects to value instances and retrieve the name
of each person by using their text field.
[/influence/influence_node/influenced_by] --> list of peoples' names the author was influenced by. We map the value objects to value instances and retrieve the name
of each person by using their text field.
---ACTOR ENTITY
A person is an actor if they have appeared in a movie or a tv show. In case, they have appeared in both movies and tv shows we map both types of information. We create a Film class that
contains all the required information related to the character the actor played in the movie or the tv show, as well as the name of the movie and the tv show.
[/film/actor/film] --> list of movies the person has appeared in. We map the value objects to value instances and to retrieve the additional information about the actor's character in
the movie and the name of the movie we have to parse the property JSONObject for the fields [/film/performance/character] and [/film/performance/film] respectively. We use the text field
to retrieve the required information.
[/tv/tv_actor/starring_roles] --> list of tv shows the person has appeared in. We map the value objects to value instances and to retrieve the additional information about the actor's character in
the tv show and the name of the tv show we have to parse the property JSONObject for the fields [/tv/regular_tv_appearance/character] and [/tv/regular_tv_appearance/series] respectively. We use the text field
to retrieve the required information.
---LEAGUE ENTITY
[/type/object/name] --> name of the league: a league can only have one name, therefore one value object for that field exists inside
the json document. We use the text field of the value object to retrieve the name of the league.
[/sports/sports_league/sport] --> sport associated with the league: One value object for that field exists inside
the json document. We use the text field of the value object to retrieve the sport associated with the league.
[/sports/sports_league/championship] --> championship associated with the league: One value object for that field exists inside
the json document. We use the text field of the value object to retrieve the championship associated with the league.
[/organization/organization/slogan] --> league's slogan: One value object for that field exists inside
the json document. We use the text field of the value object to retrieve the league's slogan.
[/common/topic/official_website] --> league's website: One value object for that field exists inside
the json document. We use the text field of the value object to retrieve the league's website.
[/common/topic/description] --> league's description: One value object for that field exists inside
the json document. We use the value field of the value object to retrieve the league's description.
[/sports/sports_league/teams] --> teams that participate in the league: We map the value objects to value instances and to retrieve each team's name by parsing the property JSONObject
for the field [/sports/sports_league_participation/team]. We use the text field to retrieve the required information.
---TEAM ENTITY
[/type/object/name] --> name of the team: a team can only have one name, therefore one value object for that field exists inside
the json document. We use the text field of the value object to retrieve the name of the team.
[/sports/sports_team/sport] --> sport associated with the team: One value object for that field exists inside
the json document. We use the text field of the value object to retrieve the sport associated with the team.
[/common/topic/description] --> teams's description: One value object for that field exists inside
the json document. We use the value field of the value object to retrieve the teams's description.
[/sports/sports_team/arena_stadium] --> teams's arena: One value object for that field exists inside
the json document. We use the value field of the value object to retrieve the teams's arena.
[/sports/sports_team/founded] --> the year when the team was founded in: One value object for that field exists inside
the json document. We use the value field of the value object to retrieve the year the team was founded.
[/sports/sports_team/location] --> the locations the team is associated with:We map the value objects to value instances and to retrieve each team's location using the text field of each
value object.
[/sports/sports_team/championships] --> the championships the team has won:We map the value objects to value instances and to retrieve each team's championship using the text field of each
value object.
[/sports/sports_team/coaches] --> the coaches of the team: We map the value objects to value instances and to retrieve information about each coach by parsing the property JSONObject
for the fields [/sports/sports_team_coach_tenure/coach], [/sports/sports_team_coach_tenure/from], [/sports/sports_team_coach_tenure/to], [/sports/sports_team_coach_tenure/position] for coach' name,
period of coaching the team and position respectively. We use the text field to retrieve the required information.
[/sports/sports_team/league] --> the leagues the team participates: We map the value objects to value instances and to retrieve information about each coach by parsing the property JSONObject
for the field [/sports/sports_team/league] for each league's name. We use the text field to retrieve the required information.
[/sports/sports_team/roster] --> the players of the team: We map the value objects to value instances and to retrieve information about each coach by parsing the property JSONObject
for the fields [/sports/sports_team_roster/player], [/sports/sports_team_roster/from], [/sports/sports_team_roster/to], [/sports/sports_team_roster/position], [/sports/sports_team_roster/number] for player's name,
period of playing in the team and position and number respectively. We use the text field to retrieve the required information.
For each value that is mapped by the algorithm, we first check if it exists in the document to avoid throwing exceptions. In case a crucial
value is omitted in the document, such as the name of an organization, then the respective field is not considered at all in the output. With respect to the date
field that corresponds to "to - date", if we don't find it inside the json document we assume that the relation continues to present so we substitute the to-date field
with the value "now".
Finally, EntityBoxCreator creates an EntityBox object given the TopicResult that is created by parsing the retrieved json document,
and calls print() method to output the desired information in a formatted view to the console.
[QUESTION INPUT]
If the user enters a question in the form "Who created [X]?" a QueryBoxCreator object will be instantiated.
QueryBoxCreator creates an MQLService object using the provided API key and the processed query (the X from the question).
The requestInfo() method in MQLService is then called.
MQLService makes two requests to the API: first, it assumes that X is a book title and will search for authors who wrote books with X
in the title; next, it assumes that X is an organization name and will search for businesspeople who founded organizations with named X.
For books, we search for
[{"/book/author/works_written":
[{"a:name": null,
"name~=": X}],
"id": null,
"name": null,
"type": "/book/author"}]"
For organizations, we search for
[{"/organization/organization_founder/organizations_founded":
[{"a:name": null,
"name~=": X}],
"id": null,
"name": null,
"type": "/organization/organization_founder"}]"
Each of these queries returns a JSONArray response.
MQLService then creates an MQLResult object, passing in the two JSONArray responses as parameters.
MQLResult parses the two JSONArrays. For each object in each array, the name of the person and the name of the book/organiation
are extracted. We simply look for the person's name, designated with $.name, and either
$./book/author/works_written[*].a:name
$./organization/organization_founder/organizations_founded[*].a:name
depending on which JSONArray we are currently parsing.
A Role object is created, which holds the person's name and the job through which they created their book/organization.
A Map inside MQLResult instance is populated with each pairing of role and creation.
QueryBoxCreator creates a QueryBox object with MQLResult as a parameter.
QueryBox gets the mql_map from MQLResult.getMQLMap() method, and puts the keys into a List which is then sorted in alphabetical order
by the person's first name. The items from this list and their associated values in the map are then printed to console.