<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Hubs and Auth Exercise</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<link rel="stylesheet" href="style/student.css">
<link rel="stylesheet" href="style/main.css">
</head>
<body>
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<div class="container-fluid">
<a class="navbar-brand" href="https://github.com/gilseg10/Information-recovery-class">EasyGo</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse justify-content-center" id="navbarNav">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link" href="main.html">Main</a>
</li>
<li class="nav-item">
<a class="nav-link" href="Nitsan.html">Nitsan</a>
</li>
<li class="nav-item">
<a class="nav-link" href="Elad.html">Elad</a>
</li>
<li class="nav-item">
<a class="nav-link" href="Chen.html">Chen</a>
</li>
<li class="nav-item">
<a class="nav-link" href="Gilad.html">Gilad</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle active" href="#" id="ourProjectDropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Our Project
</a>
<ul class="dropdown-menu" aria-labelledby="ourProjectDropdown">
<li><a class="dropdown-item" href="GameQueryGraph.html">Game's Query Graphs</a></li>
<li><a class="dropdown-item" href="TF-IDF.html">TF-IDF Screenshots</a></li>
<li><a class="dropdown-item active" href="HubsAndAuth.html">Hubs and Auth Exercise</a></li>
<li><a class="dropdown-item" href="SteamCrawler.html">Steam Crawler</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<div class="container mt-5">
<h2 class="text-center">Hubs and Auth Exercise</h2>
<!-- Table for Questions and Answers -->
<div class="table-responsive mt-4">
<table class="table table-bordered">
<thead class="thead-light">
<tr>
<th>Question</th>
<th>Answer</th>
</tr>
</thead>
<tbody>
<tr>
<td>Write down different interesting technologies that you used in the project.</td>
<td>a. We used the Steam Web API, through which we ran standard queries on many games: which achievements (tags) a game has, its average playing time, and so on.
<br><br>b. We also used BeautifulSoup, a Python library for extracting information from HTML pages. We used it to extract a description text for each game from its web page, and to extract lists of games by the categories that exist on the Steam website.</td>
</tr>
<tr>
<td>How long did your queries run? What does it depend on? Do you think this time can be improved?</td>
<td>The queries run for 10 to 30 minutes, depending on the number of games (the pages we searched) and the number of users in the database. To improve this, the number of users can be reduced by pre-filtering users with a low playing time or a small number of games.</td>
</tr>
<tr>
<td>Are there hubs on the returned pages? Authorities? Justify your answer.</td>
<td>For the query that finds games by category, we provided the crawler with the hubs (the categories) and received in return the authorities (the games) that belong to each hub. More details are given in the next answer.</td>
</tr>
<tr>
<td>Choose 10 pages that the crawler returned and that have links between them. Calculate PageRank for each page; it is recommended to use a prepared script. Show the calculations and the final rating.</td>
<td>As part of this assignment, we ran another query that receives game categories, which we refer to as hubs. Using them, the crawler returned the most popular games in each category by extracting the relevant links (from the &lt;a&gt; tags). We defined these games as authorities, and the crawler saved them in a table in a CSV file (attached to the submission). From these, we chose 10 pages, 4 category pages (hubs) and 6 game pages (authorities), and performed a PageRank calculation based on the formula we learned in class:
<br>
<img src="images/RankFormula.png" alt="PageRank formula">
<br>
<u>In our context:</u><br>
We ignored the damping factor 'd'.<br>
N equals 10 pages.<br>
I(p) represents the set of categories that point to page p (a game).<br>
D(j) represents the set of games that category j points to; in the calculations below, D(j) denotes the size of that set.<br>
<br>
<u>First step:</u><br>
Initially, each page gets a rating of 1/10.<br>
<br>
<u>Second step:</u><br>
Determine how the rating of each page is calculated from the pages that point to it.<br>
• Category pages only point to other pages (game pages) and no page points to them, so their rating remains 1/10.<br>
• Each game page is pointed to by several different categories, so each game's rating has to be calculated separately.<br>
<br>
<u>Third step:</u><br>
Calculating the rating of each game:<br>
<br>
Game Name: Counter-Strike 2<br>
I(Counter-Strike 2)={ Action, Strategy }<br>
D(Action)=4 and D(Strategy)=3.<br>
R(Counter-Strike 2) = 0.1 + 0.1 / 4 + 0.1 / 3 = 0.1583<br>
<br>
Game Name: ELDEN RING<br>
I(ELDEN RING)={ Action, RPG }<br>
D(Action)=4 and D(RPG)=4.<br>
R(ELDEN RING) = 0.1 + 0.1 / 4 + 0.1 / 4 = 0.15<br>
<br>
Game Name: The Elder Scrolls® Online<br>
I(The Elder Scrolls® Online)={ Adventure, RPG, Action }<br>
D(Adventure)=3, D(RPG)=4 and D(Action)=4.<br>
R(The Elder Scrolls® Online) = 0.1 + 0.1 / 3 + 0.1 / 4 + 0.1 / 4 = 0.1833<br>
<br>
Game Name: Baldur's Gate 3<br>
I(Baldur's Gate 3)={ Adventure, RPG, Strategy }<br>
D(Adventure)=3, D(RPG)=4 and D(Strategy)=3.<br>
R(Baldur's Gate 3) = 0.1 + 0.1 / 3 + 0.1 / 4 + 0.1 / 3 = 0.1917<br>
<br>
Game Name: The First Descendant<br>
I(The First Descendant)={ Adventure, Action }<br>
D(Adventure)=3 and D(Action)=4.<br>
R(The First Descendant) = 0.1 + 0.1 / 3 + 0.1 / 4 = 0.1583<br>
<br>
Game Name: Once Human<br>
I(Once Human)={ Adventure, Action, RPG, Strategy}<br>
D(Adventure)=3, D(Action)=4, D(RPG)=4 and D(Strategy)=3.<br>
R(Once Human) = 0.1 + 0.1 / 3 + 0.1 / 4 + 0.1 / 4 + 0.1 / 3 = 0.2167<br>
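<br>
The arithmetic above can be reproduced with a short script. The following is a minimal Python sketch under stated assumptions: the names (out_degree, incoming, rank) are illustrative, the D(j) values are copied as stated above rather than recomputed from the link lists, and only a single update step from the initial rating of 1/10 is performed.
<pre>
```python
# One update step of the simplified rating used above:
# R(p) = 1/N + sum of R(j) / D(j) over every category j in I(p).
N = 10
initial_rank = 1 / N

# D(j): number of games each category points to, copied as stated in the text
# (an assumption of this sketch; not recomputed from the lists below).
out_degree = {"Action": 4, "Adventure": 3, "RPG": 4, "Strategy": 3}

# I(p): the categories that point to each game, as listed in the text.
incoming = {
    "Counter-Strike 2": ["Action", "Strategy"],
    "ELDEN RING": ["Action", "RPG"],
    "The Elder Scrolls® Online": ["Adventure", "RPG", "Action"],
    "Baldur's Gate 3": ["Adventure", "RPG", "Strategy"],
    "The First Descendant": ["Adventure", "Action"],
    "Once Human": ["Adventure", "Action", "RPG", "Strategy"],
}


def rank(game):
    """Base rating plus the share passed on by each pointing category."""
    shares = sum(initial_rank / out_degree[c] for c in incoming[game])
    return initial_rank + shares


ranks = {game: rank(game) for game in incoming}

# Print the games from the highest rating to the lowest.
for game, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{game}: {score:.4f}")
```
</pre>
Sorting by score puts "Once Human" first and "ELDEN RING" last, in line with the ratings calculated above.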
</td>
</tr>
<tr>
<td>Show two different users the rating from the previous section, have them mark relevance, and perform relevance feedback. Is it possible to suggest an adapted query in order to improve results?</td>
<td>We presented the query to two friends who play computer games and have a Steam account. We showed them the 10 games (documents) that the query returned and asked how relevant each was to them, based on the games they like or purchase on Steam. Both friends marked the games that are relevant to them. Below is the list of documents that we presented to them:
<br>
<br>
<img src="images/table.png" alt="List of the games presented to the users">
<br>
<br>
<u>First user:</u> The first user expressed a strong preference for adventure games and shared that he especially enjoys games like "The Elder Scrolls® Online" and "Baldur's Gate 3". He felt that these games were rated highly in the adventure category because they provide the typical experiences of the genre. However, he was specifically not drawn to "Once Human" and stated that he does not consider it a traditional adventure game. He recommended that this game be rated lower or even removed from the adventure category. The user emphasized a preference for seeing more classic adventure games, such as "Grand Theft Auto V" and "The First Descendant". <b>In total, the first user marked 7 relevant games from the list.</b>
<br><br>
<u>Second user:</u>
The second user recently started playing "Baldur's Gate 3" and was very impressed with its placement in the RPG and strategy categories. He emphasized the game's strategic depth, which aligns well with his gaming preferences. In addition, he expressed great interest in online games, which makes games like "The Elder Scrolls® Online" and "Destiny 2" particularly relevant to his taste. He was also satisfied with the rating of "The Elder Scrolls® Online". He also stated that "Once Human" is not one of his favorites, and it was therefore rated lower compared to the other games.
<b>In total, the second user marked 6 relevant games from the list.</b>
<br><br>
<u>Precision and Recall calculation:</u> Precision is the number of relevant games divided by the 10 documents we returned. Recall is the number of relevant documents returned divided by an estimate of the number of documents in a category on the Steam website. In our query, we returned popular games from certain categories. The site has around 100,000 different games and about 60 different categories (genres) of games, so each category can be assumed to contain about 1,700 games (100,000 / 60 ≈ 1,700).
<br><br>
<u>First user:</u><br>
Precision = 7/10 = 0.7<br>
Recall = 7/1700 = 0.0041
<br><br>
<u>Second user:</u><br>
Precision = 6/10 = 0.6<br>
Recall = 6/1700 = 0.0035
</td>
</tr>
</tbody>
</table>
</div>
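<p class="mt-4">The precision and recall figures in the table above can be checked with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative, and the total of 1,700 documents per category is the rough estimate from the answer (about 100,000 games divided by about 60 categories), not a measured value.</p>
<pre>
```python
def precision(relevant_returned, returned):
    """Fraction of the returned documents that the user marked relevant."""
    return relevant_returned / returned


def recall(relevant_returned, relevant_total):
    """Fraction of the (estimated) relevant documents that were returned."""
    return relevant_returned / relevant_total


RETURNED = 10              # documents shown to each user
ESTIMATED_TOTAL = 1700     # assumed games per category, as estimated in the text

# Number of games each user marked as relevant.
marked = {"First user": 7, "Second user": 6}

for user, hits in marked.items():
    p = precision(hits, RETURNED)
    r = recall(hits, ESTIMATED_TOTAL)
    print(f"{user}: precision={p:.2f}, recall={r:.4f}")
```
</pre>
<p>Because the recall denominator is only an estimate, the recall values are best read as an order of magnitude rather than an exact measurement.</p>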
<!-- Paragraph with title "Crawler Query Details" -->
<div class="mt-5">
<h3>Crawler Query Details</h3>
<p>In this exercise, we created a Python script that retrieves the top 10 games from Steam in each of six categories: Action, Adventure, RPG, Strategy, Indie, and Casual. The script saves the game data to a CSV file and generates a separate graph visualization for each category. These graphs show the categories as hubs and the top games as authorities, with edges connecting them, giving a clear and organized view of the top games within each category.</p>
</div>
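<p>The data flow described above, from a category-to-games mapping to CSV rows and hub-to-authority edges, can be sketched as follows. This is a minimal illustration using only the Python standard library: the sample mapping, the column names, and the helper names (to_edges, write_csv) are assumptions of the sketch, and the crawling and plotting steps are left out.</p>
<pre>
```python
import csv
import io

# Small sample of crawler output: category (hub) mapped to its games (authorities).
# The sample is illustrative; a real run would hold 10 games per category.
top_games = {
    "Action": ["Counter-Strike 2", "ELDEN RING"],
    "RPG": ["ELDEN RING", "Baldur's Gate 3"],
}


def to_edges(mapping):
    """Flatten {category: [games]} into (hub, authority) edge pairs."""
    return [(category, game) for category, games in mapping.items() for game in games]


def write_csv(edges, handle):
    """Write a header row followed by one row per edge."""
    writer = csv.writer(handle)
    writer.writerow(["category", "game"])
    writer.writerows(edges)


edges = to_edges(top_games)

# StringIO stands in for a real file opened with newline="".
buffer = io.StringIO()
write_csv(edges, buffer)

# In-degree of each game: how many categories (hubs) point to it.
in_degree = {}
for _, game in edges:
    in_degree[game] = in_degree.get(game, 0) + 1

print(buffer.getvalue())
print(in_degree)
```
</pre>
<p>The same edge list could be passed to a graph-drawing library to produce one graph per category, as in the images below.</p>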
<!-- 3x2 Container for Pictures -->
<div class="container mt-4">
<div class="row">
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopAction.png" class="img-fluid custom-img" alt="Graph of the top Action games">
</div>
</div>
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopAdventure.png" class="img-fluid custom-img" alt="Graph of the top Adventure games">
</div>
</div>
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopCasual.png" class="img-fluid custom-img" alt="Graph of the top Casual games">
</div>
</div>
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopIndie.png" class="img-fluid custom-img" alt="Graph of the top Indie games">
</div>
</div>
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopRPG.png" class="img-fluid custom-img" alt="Graph of the top RPG games">
</div>
</div>
<div class="col-md-6 mb-4">
<div class="image-wrapper" style="overflow: hidden; height: 400px;">
<img src="images/TopStrategy.png" class="img-fluid custom-img" alt="Graph of the top Strategy games">
</div>
</div>
</div>
</div>
</div>
<script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.min.js"></script>
</body>
</html>