This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 344f3ed

committed Jun 21, 2017
data mining
1 parent c7d3b67 commit 344f3ed

12 files changed

+30268
-0
lines changed
 

‎Association_Rules_and_Frequent_Pattern_Mining.ipynb

+743
Large diffs are not rendered by default.

‎Exercises_Auto_Sales.ipynb

+1,310
Large diffs are not rendered by default.

‎Practice_crime.ipynb

+390
@@ -0,0 +1,390 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Module 2 practice - Minneapolis crime data\n",
8+
"\n",
9+
"The dataset used in the notebook (courtesy of Open Data Minneapolis) includes information about 311 calls and crimes committed between 2010 and 2016. \n",
10+
"\n",
11+
"We will use the data to do some association rule mining for finding frequent patterns. Read the data from /dsa/data/DSA-8630/minneapolis_crimedata/crimes.csv"
12+
]
13+
},
14+
{
15+
"cell_type": "code",
16+
"execution_count": null,
17+
"metadata": {
18+
"collapsed": false
19+
},
20+
"outputs": [],
21+
"source": [
22+
"crimes_data = read.csv('/dsa/data/DSA-8630/minneapolis_crimedata/crimes.csv')"
23+
]
24+
},
25+
{
26+
"cell_type": "code",
27+
"execution_count": null,
28+
"metadata": {
29+
"collapsed": false
30+
},
31+
"outputs": [],
32+
"source": [
33+
"head(crimes_data,2)"
34+
]
35+
},
36+
{
37+
"cell_type": "code",
38+
"execution_count": null,
39+
"metadata": {
40+
"collapsed": false,
41+
"scrolled": true
42+
},
43+
"outputs": [],
44+
"source": [
45+
"dim(crimes_data)"
46+
]
47+
},
48+
{
49+
"cell_type": "markdown",
50+
"metadata": {},
51+
"source": [
52+
"The columns \n",
53+
"- `controlnbr`\n",
54+
"- `CCN`\n",
55+
"- `Time`\n",
56+
"- `ReportedDate`\n",
57+
"- `Offense`\n",
58+
"- `UCRCode`\n",
59+
"- `EnteredDate`\n",
60+
"- `x`\n",
61+
"- `y`\n",
62+
"- `lastchanged`\n",
63+
"- `LastUpdateDate`\n",
64+
"- `OBJECTID`\n",
65+
"- `ESRI_OID` \n",
66+
"\n",
67+
"are not helpful or easy to interpret, so let's delete them from the dataset."
68+
]
69+
},
70+
{
71+
"cell_type": "markdown",
72+
"metadata": {},
73+
"source": [
74+
"**Activity 1:** Remove the columns listed above from the data frame. "
75+
]
76+
},
77+
{
78+
"cell_type": "code",
79+
"execution_count": null,
80+
"metadata": {
81+
"collapsed": false
82+
},
83+
"outputs": [],
84+
"source": [
85+
"# Your code for activity 1 goes here..\n",
86+
"\n",
87+
"crimes_data = crimes_data[,-c(2, 3, 5, 7, 8, 10, 11,12, 13, 14, 15, 17, 18, 19, 20)]"
88+
]
89+
},
90+
{
91+
"cell_type": "markdown",
92+
"metadata": {},
93+
"source": [
94+
"**Activity 2:** The BeginDate column is of type factor. Replace the character \"T\" in the column with a white space \" \", then convert the column to POSIXlt using the strptime() function. \n",
95+
"\n",
96+
"We are trying to format the date and time values appropriately using the strptime() function."
97+
]
98+
},
99+
{
100+
"cell_type": "code",
101+
"execution_count": null,
102+
"metadata": {
103+
"collapsed": true
104+
},
105+
"outputs": [],
106+
"source": [
107+
"# Your code for activity 2 goes here..\n",
108+
"\n",
109+
"crimes_data$BeginDate = strptime(sub(\"T\",\" \", crimes_data$BeginDate), format = \"%Y-%m-%d %X\")"
110+
]
111+
},
112+
{
113+
"cell_type": "markdown",
114+
"metadata": {},
115+
"source": [
116+
"The first 6 characters of each address do not carry any useful information. \n",
117+
"\n",
118+
"**Activity 3:** Strip the first 6 characters from the publicaddress column, keeping the rest of each value."
119+
]
120+
},
121+
{
122+
"cell_type": "code",
123+
"execution_count": null,
124+
"metadata": {
125+
"collapsed": false
126+
},
127+
"outputs": [],
128+
"source": [
129+
"# Your code for activity 3 goes here..\n",
130+
"\n",
131+
"crimes_data$publicaddress = substring(as.character(crimes_data$publicaddress), 7)"
132+
]
133+
},
134+
{
135+
"cell_type": "markdown",
136+
"metadata": {},
137+
"source": [
138+
"**Activity 4:** Extract the date from the BeginDate column and store it as a new column called date. "
139+
]
140+
},
141+
{
142+
"cell_type": "code",
143+
"execution_count": null,
144+
"metadata": {
145+
"collapsed": false
146+
},
147+
"outputs": [],
148+
"source": [
149+
"# Your code for activity 4 goes here..\n",
150+
"\n",
151+
"library(lubridate)\n",
152+
"crimes_data$date = as.Date(format(crimes_data$BeginDate,\"%Y-%m-%d\"))"
153+
]
154+
},
155+
{
156+
"cell_type": "code",
157+
"execution_count": null,
158+
"metadata": {
159+
"collapsed": false
160+
},
161+
"outputs": [],
162+
"source": [
163+
"class(crimes_data$date)"
164+
]
165+
},
166+
{
167+
"cell_type": "markdown",
168+
"metadata": {},
169+
"source": [
170+
"**Activity 5:** Extract the weekday from the date column and the hour from the BeginDate column, and store them as new columns called weekday and hour respectively. "
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": null,
176+
"metadata": {
177+
"collapsed": true
178+
},
179+
"outputs": [],
180+
"source": [
181+
"# Your code for activity 5 goes here..\n",
182+
"\n",
183+
"crimes_data$weekday = weekdays(crimes_data$date)\n",
184+
"crimes_data$hour = hour(crimes_data$BeginDate)"
185+
]
186+
},
187+
{
188+
"cell_type": "markdown",
189+
"metadata": {},
190+
"source": [
191+
"We don't need the BeginDate column any more, so let's delete it from the data frame."
192+
]
193+
},
194+
{
195+
"cell_type": "code",
196+
"execution_count": null,
197+
"metadata": {
198+
"collapsed": false
199+
},
200+
"outputs": [],
201+
"source": [
202+
"crimes_data = crimes_data[,names(crimes_data)!=\"BeginDate\"]"
203+
]
204+
},
205+
{
206+
"cell_type": "markdown",
207+
"metadata": {},
208+
"source": [
209+
"Convert the hour variable into an ordered factor with levels \"mid night\", \"morning\", \"noon\",\"night\" for different hours of the day. "
210+
]
211+
},
212+
{
213+
"cell_type": "code",
214+
"execution_count": null,
215+
"metadata": {
216+
"collapsed": false
217+
},
218+
"outputs": [],
219+
"source": [
220+
"crimes_data[[\"hour\"]] <- cut(crimes_data[[\"hour\"]], breaks = c(0,6,12,18,24), right = FALSE, labels = c(\"mid night\", \"morning\", \"noon\",\"night\"), ordered_result = TRUE)"
221+
]
222+
},
223+
{
224+
"cell_type": "markdown",
225+
"metadata": {
226+
"collapsed": false
227+
},
228+
"source": [
229+
"**Activity 6:** Convert the columns \"publicaddress\", \"Precinct\", \"weekday\", \"date\" to factor type. "
230+
]
231+
},
232+
{
233+
"cell_type": "code",
234+
"execution_count": null,
235+
"metadata": {
236+
"collapsed": false
237+
},
238+
"outputs": [],
239+
"source": [
240+
"# Your code for activity 6 goes here..\n",
241+
"\n",
242+
"crimes_data[\"publicaddress\"] = as.factor(crimes_data[[\"publicaddress\"]])"
243+
]
244+
},
245+
{
246+
"cell_type": "code",
247+
"execution_count": null,
248+
"metadata": {
249+
"collapsed": false
250+
},
251+
"outputs": [],
252+
"source": [
253+
"crimes_data[\"Precinct\"] = as.factor(crimes_data[[\"Precinct\"]])"
254+
]
255+
},
256+
{
257+
"cell_type": "code",
258+
"execution_count": null,
259+
"metadata": {
260+
"collapsed": true
261+
},
262+
"outputs": [],
263+
"source": [
264+
"crimes_data[\"weekday\"] = as.factor(crimes_data[[\"weekday\"]])"
265+
]
266+
},
267+
{
268+
"cell_type": "code",
269+
"execution_count": null,
270+
"metadata": {
271+
"collapsed": true
272+
},
273+
"outputs": [],
274+
"source": [
275+
"crimes_data[\"date\"] = as.factor(crimes_data[[\"date\"]])"
276+
]
277+
},
278+
{
279+
"cell_type": "code",
280+
"execution_count": null,
281+
"metadata": {
282+
"collapsed": false
283+
},
284+
"outputs": [],
285+
"source": [
286+
"library(\"arules\")"
287+
]
288+
},
289+
{
290+
"cell_type": "code",
291+
"execution_count": null,
292+
"metadata": {
293+
"collapsed": false
294+
},
295+
"outputs": [],
296+
"source": [
297+
"str(crimes_data)"
298+
]
299+
},
300+
{
301+
"cell_type": "markdown",
302+
"metadata": {},
303+
"source": [
304+
"**Activity 7:** Now, coerce the data set into transactions. Save the transactions to the crimes_trans variable."
305+
]
306+
},
307+
{
308+
"cell_type": "code",
309+
"execution_count": null,
310+
"metadata": {
311+
"collapsed": false
312+
},
313+
"outputs": [],
314+
"source": [
315+
"# Your code for activity 7 goes here..\n",
316+
"\n",
317+
"crimes_trans <- as(crimes_data, \"transactions\")\n",
318+
"crimes_trans"
319+
]
320+
},
321+
{
322+
"cell_type": "code",
323+
"execution_count": null,
324+
"metadata": {
325+
"collapsed": false
326+
},
327+
"outputs": [],
328+
"source": [
329+
"summary(crimes_trans)"
330+
]
331+
},
332+
{
333+
"cell_type": "markdown",
334+
"metadata": {},
335+
"source": [
336+
"**Activity 8:** Generate association rules for the transactions in crimes_trans with a minimum support of 0.01 and a minimum confidence of 0.6."
337+
]
338+
},
339+
{
340+
"cell_type": "code",
341+
"execution_count": null,
342+
"metadata": {
343+
"collapsed": false
344+
},
345+
"outputs": [],
346+
"source": [
347+
"# Your code for activity 8 goes here..\n",
348+
"\n",
349+
"rules <- apriori(crimes_trans, parameter = list(support = 0.01, confidence = 0.6))"
350+
]
351+
},
352+
{
353+
"cell_type": "markdown",
354+
"metadata": {},
355+
"source": [
356+
"**Activity 9:** Display the generated rules using inspect(). "
357+
]
358+
},
359+
{
360+
"cell_type": "code",
361+
"execution_count": null,
362+
"metadata": {
363+
"collapsed": false
364+
},
365+
"outputs": [],
366+
"source": [
367+
"# Your code for activity 9 goes here..\n",
368+
"\n",
369+
"inspect(rules)"
370+
]
371+
}
372+
],
373+
"metadata": {
374+
"kernelspec": {
375+
"display_name": "R",
376+
"language": "R",
377+
"name": "ir"
378+
},
379+
"language_info": {
380+
"codemirror_mode": "r",
381+
"file_extension": ".r",
382+
"mimetype": "text/x-r-source",
383+
"name": "R",
384+
"pygments_lexer": "r",
385+
"version": "3.3.2"
386+
}
387+
},
388+
"nbformat": 4,
389+
"nbformat_minor": 2
390+
}
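
The date and address steps in Practice_crime.ipynb (Activities 2 to 5 and the hour binning) can be tried on a couple of rows before running them on the full file. This is only a minimal base-R sketch: the data frame `toy`, its values, the timestamp layout `YYYY-MM-DDTHH:MM:SS`, the 6-character address prefix and the `period` column are assumptions made for illustration, not taken from crimes.csv. It uses bins that start at 0 with `right = FALSE` so that hour 0 is not dropped as `NA`.

```r
# Toy stand-in for crimes_data; every value here is invented for illustration.
toy <- data.frame(
  BeginDate     = c("2016-03-05T00:15:00", "2016-03-06T14:30:00"),
  publicaddress = c("0012X MAIN ST", "0034X LAKE ST"),
  stringsAsFactors = FALSE
)

# Activity 2: replace the "T" separator, then parse to POSIXlt.
parsed <- strptime(sub("T", " ", toy$BeginDate),
                   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Activity 3: drop the first 6 characters of each address.
toy$publicaddress <- substring(toy$publicaddress, 7)

# Activities 4 and 5: calendar date, weekday name and hour of day.
toy$date    <- as.Date(format(parsed, "%Y-%m-%d"))
toy$weekday <- weekdays(toy$date)   # names depend on the session locale
toy$hour    <- parsed$hour          # base-R counterpart of lubridate::hour()

# Bin hours 0-23 into four ordered periods: [0,6), [6,12), [12,18), [18,24).
toy$period <- cut(toy$hour, breaks = c(0, 6, 12, 18, 24), right = FALSE,
                  labels = c("mid night", "morning", "noon", "night"),
                  ordered_result = TRUE)
print(toy)
```

Keeping the parsed timestamps in a separate vector avoids storing a POSIXlt object as a data frame column, which is a design choice of this sketch rather than of the notebook.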

‎Practice_glass.ipynb

+517
Large diffs are not rendered by default.

‎Practice_groceries.ipynb

+134
@@ -0,0 +1,134 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "code",
5+
"execution_count": null,
6+
"metadata": {
7+
"collapsed": false
8+
},
9+
"outputs": [],
10+
"source": [
11+
"groceries_data = read.csv(\"../../../datasets/groceries.csv\", header = FALSE, col.names = c(\"item1\",\"item2\",\"item3\",\"item4\"))\n",
12+
"head(groceries_data)"
13+
]
14+
},
15+
{
16+
"cell_type": "code",
17+
"execution_count": null,
18+
"metadata": {
19+
"collapsed": false
20+
},
21+
"outputs": [],
22+
"source": [
23+
"summary(groceries_data)"
24+
]
25+
},
26+
{
27+
"cell_type": "code",
28+
"execution_count": null,
29+
"metadata": {
30+
"collapsed": false
31+
},
32+
"outputs": [],
33+
"source": [
34+
"library(\"arules\")"
35+
]
36+
},
37+
{
38+
"cell_type": "code",
39+
"execution_count": null,
40+
"metadata": {
41+
"collapsed": false
42+
},
43+
"outputs": [],
44+
"source": [
45+
"groceries_trans <- as(groceries_data, \"transactions\")\n",
46+
"groceries_trans"
47+
]
48+
},
49+
{
50+
"cell_type": "code",
51+
"execution_count": null,
52+
"metadata": {
53+
"collapsed": false
54+
},
55+
"outputs": [],
56+
"source": [
57+
"summary(groceries_trans)"
58+
]
59+
},
60+
{
61+
"cell_type": "code",
62+
"execution_count": null,
63+
"metadata": {
64+
"collapsed": false
65+
},
66+
"outputs": [],
67+
"source": [
68+
"itemFrequencyPlot(groceries_trans, support = 0.02, cex.names=0.8)"
69+
]
70+
},
71+
{
72+
"cell_type": "code",
73+
"execution_count": null,
74+
"metadata": {
75+
"collapsed": false
76+
},
77+
"outputs": [],
78+
"source": [
79+
"rules <- apriori(groceries_trans, parameter = list(support = 0.01, confidence = 0.01))"
80+
]
81+
},
82+
{
83+
"cell_type": "code",
84+
"execution_count": null,
85+
"metadata": {
86+
"collapsed": false
87+
},
88+
"outputs": [],
89+
"source": [
90+
"summary(rules)"
91+
]
92+
},
93+
{
94+
"cell_type": "code",
95+
"execution_count": null,
96+
"metadata": {
97+
"collapsed": false
98+
},
99+
"outputs": [],
100+
"source": [
101+
"inspect(head(rules, n = 3, by = \"confidence\"))"
102+
]
103+
},
104+
{
105+
"cell_type": "code",
106+
"execution_count": null,
107+
"metadata": {
108+
"collapsed": true
109+
},
110+
"outputs": [],
111+
"source": [
112+
"write(rules, file = \"groceries_rules.csv\", sep = \",\", col.names = NA)"
113+
]
114+
}
115+
],
116+
"metadata": {
117+
"anaconda-cloud": {},
118+
"kernelspec": {
119+
"display_name": "R",
120+
"language": "R",
121+
"name": "ir"
122+
},
123+
"language_info": {
124+
"codemirror_mode": "r",
125+
"file_extension": ".r",
126+
"mimetype": "text/x-r-source",
127+
"name": "R",
128+
"pygments_lexer": "r",
129+
"version": "3.3.2"
130+
}
131+
},
132+
"nbformat": 4,
133+
"nbformat_minor": 1
134+
}
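
Practice_groceries.ipynb coerces a data frame with columns `item1` to `item4` into transactions. With that coercion, arules labels items as `column=value`, so the same product appearing in two different columns is counted as two different items. A hedged base-R sketch of the alternative view, where each line is an unordered basket, is shown here; the three input lines and the names `raw_lines`, `baskets` and `item_support` are invented for illustration and do not come from groceries.csv.

```r
# Three made-up basket lines in the same comma-separated layout.
raw_lines <- c("milk,bread", "bread,milk,eggs", "eggs")

# Split each line into its items; unique() guards against repeated items.
baskets <- lapply(strsplit(raw_lines, ",", fixed = TRUE), unique)

# Count in how many baskets each item occurs, regardless of position.
item_counts <- table(unlist(baskets))

# Relative item frequency (support of each single item).
item_support <- item_counts / length(baskets)
print(item_support)
```

For real data, arules provides `read.transactions()` with `format = "basket"` for this layout; whether it suits groceries.csv depends on the file, which is not shown in this commit.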

‎Visualizing_Association_Rules.ipynb

+426
Large diffs are not rendered by default.

‎data.csv

+10,499
Large diffs are not rendered by default.

‎glass.txt

+214
@@ -0,0 +1,214 @@
1+
1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.00,1
2+
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.00,1
3+
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.00,1
4+
4,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.00,1
5+
5,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.00,1
6+
6,1.51596,12.79,3.61,1.62,72.97,0.64,8.07,0.00,0.26,1
7+
7,1.51743,13.30,3.60,1.14,73.09,0.58,8.17,0.00,0.00,1
8+
8,1.51756,13.15,3.61,1.05,73.24,0.57,8.24,0.00,0.00,1
9+
9,1.51918,14.04,3.58,1.37,72.08,0.56,8.30,0.00,0.00,1
10+
10,1.51755,13.00,3.60,1.36,72.99,0.57,8.40,0.00,0.11,1
11+
11,1.51571,12.72,3.46,1.56,73.20,0.67,8.09,0.00,0.24,1
12+
12,1.51763,12.80,3.66,1.27,73.01,0.60,8.56,0.00,0.00,1
13+
13,1.51589,12.88,3.43,1.40,73.28,0.69,8.05,0.00,0.24,1
14+
14,1.51748,12.86,3.56,1.27,73.21,0.54,8.38,0.00,0.17,1
15+
15,1.51763,12.61,3.59,1.31,73.29,0.58,8.50,0.00,0.00,1
16+
16,1.51761,12.81,3.54,1.23,73.24,0.58,8.39,0.00,0.00,1
17+
17,1.51784,12.68,3.67,1.16,73.11,0.61,8.70,0.00,0.00,1
18+
18,1.52196,14.36,3.85,0.89,71.36,0.15,9.15,0.00,0.00,1
19+
19,1.51911,13.90,3.73,1.18,72.12,0.06,8.89,0.00,0.00,1
20+
20,1.51735,13.02,3.54,1.69,72.73,0.54,8.44,0.00,0.07,1
21+
21,1.51750,12.82,3.55,1.49,72.75,0.54,8.52,0.00,0.19,1
22+
22,1.51966,14.77,3.75,0.29,72.02,0.03,9.00,0.00,0.00,1
23+
23,1.51736,12.78,3.62,1.29,72.79,0.59,8.70,0.00,0.00,1
24+
24,1.51751,12.81,3.57,1.35,73.02,0.62,8.59,0.00,0.00,1
25+
25,1.51720,13.38,3.50,1.15,72.85,0.50,8.43,0.00,0.00,1
26+
26,1.51764,12.98,3.54,1.21,73.00,0.65,8.53,0.00,0.00,1
27+
27,1.51793,13.21,3.48,1.41,72.64,0.59,8.43,0.00,0.00,1
28+
28,1.51721,12.87,3.48,1.33,73.04,0.56,8.43,0.00,0.00,1
29+
29,1.51768,12.56,3.52,1.43,73.15,0.57,8.54,0.00,0.00,1
30+
30,1.51784,13.08,3.49,1.28,72.86,0.60,8.49,0.00,0.00,1
31+
31,1.51768,12.65,3.56,1.30,73.08,0.61,8.69,0.00,0.14,1
32+
32,1.51747,12.84,3.50,1.14,73.27,0.56,8.55,0.00,0.00,1
33+
33,1.51775,12.85,3.48,1.23,72.97,0.61,8.56,0.09,0.22,1
34+
34,1.51753,12.57,3.47,1.38,73.39,0.60,8.55,0.00,0.06,1
35+
35,1.51783,12.69,3.54,1.34,72.95,0.57,8.75,0.00,0.00,1
36+
36,1.51567,13.29,3.45,1.21,72.74,0.56,8.57,0.00,0.00,1
37+
37,1.51909,13.89,3.53,1.32,71.81,0.51,8.78,0.11,0.00,1
38+
38,1.51797,12.74,3.48,1.35,72.96,0.64,8.68,0.00,0.00,1
39+
39,1.52213,14.21,3.82,0.47,71.77,0.11,9.57,0.00,0.00,1
40+
40,1.52213,14.21,3.82,0.47,71.77,0.11,9.57,0.00,0.00,1
41+
41,1.51793,12.79,3.50,1.12,73.03,0.64,8.77,0.00,0.00,1
42+
42,1.51755,12.71,3.42,1.20,73.20,0.59,8.64,0.00,0.00,1
43+
43,1.51779,13.21,3.39,1.33,72.76,0.59,8.59,0.00,0.00,1
44+
44,1.52210,13.73,3.84,0.72,71.76,0.17,9.74,0.00,0.00,1
45+
45,1.51786,12.73,3.43,1.19,72.95,0.62,8.76,0.00,0.30,1
46+
46,1.51900,13.49,3.48,1.35,71.95,0.55,9.00,0.00,0.00,1
47+
47,1.51869,13.19,3.37,1.18,72.72,0.57,8.83,0.00,0.16,1
48+
48,1.52667,13.99,3.70,0.71,71.57,0.02,9.82,0.00,0.10,1
49+
49,1.52223,13.21,3.77,0.79,71.99,0.13,10.02,0.00,0.00,1
50+
50,1.51898,13.58,3.35,1.23,72.08,0.59,8.91,0.00,0.00,1
51+
51,1.52320,13.72,3.72,0.51,71.75,0.09,10.06,0.00,0.16,1
52+
52,1.51926,13.20,3.33,1.28,72.36,0.60,9.14,0.00,0.11,1
53+
53,1.51808,13.43,2.87,1.19,72.84,0.55,9.03,0.00,0.00,1
54+
54,1.51837,13.14,2.84,1.28,72.85,0.55,9.07,0.00,0.00,1
55+
55,1.51778,13.21,2.81,1.29,72.98,0.51,9.02,0.00,0.09,1
56+
56,1.51769,12.45,2.71,1.29,73.70,0.56,9.06,0.00,0.24,1
57+
57,1.51215,12.99,3.47,1.12,72.98,0.62,8.35,0.00,0.31,1
58+
58,1.51824,12.87,3.48,1.29,72.95,0.60,8.43,0.00,0.00,1
59+
59,1.51754,13.48,3.74,1.17,72.99,0.59,8.03,0.00,0.00,1
60+
60,1.51754,13.39,3.66,1.19,72.79,0.57,8.27,0.00,0.11,1
61+
61,1.51905,13.60,3.62,1.11,72.64,0.14,8.76,0.00,0.00,1
62+
62,1.51977,13.81,3.58,1.32,71.72,0.12,8.67,0.69,0.00,1
63+
63,1.52172,13.51,3.86,0.88,71.79,0.23,9.54,0.00,0.11,1
64+
64,1.52227,14.17,3.81,0.78,71.35,0.00,9.69,0.00,0.00,1
65+
65,1.52172,13.48,3.74,0.90,72.01,0.18,9.61,0.00,0.07,1
66+
66,1.52099,13.69,3.59,1.12,71.96,0.09,9.40,0.00,0.00,1
67+
67,1.52152,13.05,3.65,0.87,72.22,0.19,9.85,0.00,0.17,1
68+
68,1.52152,13.05,3.65,0.87,72.32,0.19,9.85,0.00,0.17,1
69+
69,1.52152,13.12,3.58,0.90,72.20,0.23,9.82,0.00,0.16,1
70+
70,1.52300,13.31,3.58,0.82,71.99,0.12,10.17,0.00,0.03,1
71+
71,1.51574,14.86,3.67,1.74,71.87,0.16,7.36,0.00,0.12,2
72+
72,1.51848,13.64,3.87,1.27,71.96,0.54,8.32,0.00,0.32,2
73+
73,1.51593,13.09,3.59,1.52,73.10,0.67,7.83,0.00,0.00,2
74+
74,1.51631,13.34,3.57,1.57,72.87,0.61,7.89,0.00,0.00,2
75+
75,1.51596,13.02,3.56,1.54,73.11,0.72,7.90,0.00,0.00,2
76+
76,1.51590,13.02,3.58,1.51,73.12,0.69,7.96,0.00,0.00,2
77+
77,1.51645,13.44,3.61,1.54,72.39,0.66,8.03,0.00,0.00,2
78+
78,1.51627,13.00,3.58,1.54,72.83,0.61,8.04,0.00,0.00,2
79+
79,1.51613,13.92,3.52,1.25,72.88,0.37,7.94,0.00,0.14,2
80+
80,1.51590,12.82,3.52,1.90,72.86,0.69,7.97,0.00,0.00,2
81+
81,1.51592,12.86,3.52,2.12,72.66,0.69,7.97,0.00,0.00,2
82+
82,1.51593,13.25,3.45,1.43,73.17,0.61,7.86,0.00,0.00,2
83+
83,1.51646,13.41,3.55,1.25,72.81,0.68,8.10,0.00,0.00,2
84+
84,1.51594,13.09,3.52,1.55,72.87,0.68,8.05,0.00,0.09,2
85+
85,1.51409,14.25,3.09,2.08,72.28,1.10,7.08,0.00,0.00,2
86+
86,1.51625,13.36,3.58,1.49,72.72,0.45,8.21,0.00,0.00,2
87+
87,1.51569,13.24,3.49,1.47,73.25,0.38,8.03,0.00,0.00,2
88+
88,1.51645,13.40,3.49,1.52,72.65,0.67,8.08,0.00,0.10,2
89+
89,1.51618,13.01,3.50,1.48,72.89,0.60,8.12,0.00,0.00,2
90+
90,1.51640,12.55,3.48,1.87,73.23,0.63,8.08,0.00,0.09,2
91+
91,1.51841,12.93,3.74,1.11,72.28,0.64,8.96,0.00,0.22,2
92+
92,1.51605,12.90,3.44,1.45,73.06,0.44,8.27,0.00,0.00,2
93+
93,1.51588,13.12,3.41,1.58,73.26,0.07,8.39,0.00,0.19,2
94+
94,1.51590,13.24,3.34,1.47,73.10,0.39,8.22,0.00,0.00,2
95+
95,1.51629,12.71,3.33,1.49,73.28,0.67,8.24,0.00,0.00,2
96+
96,1.51860,13.36,3.43,1.43,72.26,0.51,8.60,0.00,0.00,2
97+
97,1.51841,13.02,3.62,1.06,72.34,0.64,9.13,0.00,0.15,2
98+
98,1.51743,12.20,3.25,1.16,73.55,0.62,8.90,0.00,0.24,2
99+
99,1.51689,12.67,2.88,1.71,73.21,0.73,8.54,0.00,0.00,2
100+
100,1.51811,12.96,2.96,1.43,72.92,0.60,8.79,0.14,0.00,2
101+
101,1.51655,12.75,2.85,1.44,73.27,0.57,8.79,0.11,0.22,2
102+
102,1.51730,12.35,2.72,1.63,72.87,0.70,9.23,0.00,0.00,2
103+
103,1.51820,12.62,2.76,0.83,73.81,0.35,9.42,0.00,0.20,2
104+
104,1.52725,13.80,3.15,0.66,70.57,0.08,11.64,0.00,0.00,2
105+
105,1.52410,13.83,2.90,1.17,71.15,0.08,10.79,0.00,0.00,2
106+
106,1.52475,11.45,0.00,1.88,72.19,0.81,13.24,0.00,0.34,2
107+
107,1.53125,10.73,0.00,2.10,69.81,0.58,13.30,3.15,0.28,2
108+
108,1.53393,12.30,0.00,1.00,70.16,0.12,16.19,0.00,0.24,2
109+
109,1.52222,14.43,0.00,1.00,72.67,0.10,11.52,0.00,0.08,2
110+
110,1.51818,13.72,0.00,0.56,74.45,0.00,10.99,0.00,0.00,2
111+
111,1.52664,11.23,0.00,0.77,73.21,0.00,14.68,0.00,0.00,2
112+
112,1.52739,11.02,0.00,0.75,73.08,0.00,14.96,0.00,0.00,2
113+
113,1.52777,12.64,0.00,0.67,72.02,0.06,14.40,0.00,0.00,2
114+
114,1.51892,13.46,3.83,1.26,72.55,0.57,8.21,0.00,0.14,2
115+
115,1.51847,13.10,3.97,1.19,72.44,0.60,8.43,0.00,0.00,2
116+
116,1.51846,13.41,3.89,1.33,72.38,0.51,8.28,0.00,0.00,2
117+
117,1.51829,13.24,3.90,1.41,72.33,0.55,8.31,0.00,0.10,2
118+
118,1.51708,13.72,3.68,1.81,72.06,0.64,7.88,0.00,0.00,2
119+
119,1.51673,13.30,3.64,1.53,72.53,0.65,8.03,0.00,0.29,2
120+
120,1.51652,13.56,3.57,1.47,72.45,0.64,7.96,0.00,0.00,2
121+
121,1.51844,13.25,3.76,1.32,72.40,0.58,8.42,0.00,0.00,2
122+
122,1.51663,12.93,3.54,1.62,72.96,0.64,8.03,0.00,0.21,2
123+
123,1.51687,13.23,3.54,1.48,72.84,0.56,8.10,0.00,0.00,2
124+
124,1.51707,13.48,3.48,1.71,72.52,0.62,7.99,0.00,0.00,2
125+
125,1.52177,13.20,3.68,1.15,72.75,0.54,8.52,0.00,0.00,2
126+
126,1.51872,12.93,3.66,1.56,72.51,0.58,8.55,0.00,0.12,2
127+
127,1.51667,12.94,3.61,1.26,72.75,0.56,8.60,0.00,0.00,2
128+
128,1.52081,13.78,2.28,1.43,71.99,0.49,9.85,0.00,0.17,2
129+
129,1.52068,13.55,2.09,1.67,72.18,0.53,9.57,0.27,0.17,2
130+
130,1.52020,13.98,1.35,1.63,71.76,0.39,10.56,0.00,0.18,2
131+
131,1.52177,13.75,1.01,1.36,72.19,0.33,11.14,0.00,0.00,2
132+
132,1.52614,13.70,0.00,1.36,71.24,0.19,13.44,0.00,0.10,2
133+
133,1.51813,13.43,3.98,1.18,72.49,0.58,8.15,0.00,0.00,2
134+
134,1.51800,13.71,3.93,1.54,71.81,0.54,8.21,0.00,0.15,2
135+
135,1.51811,13.33,3.85,1.25,72.78,0.52,8.12,0.00,0.00,2
136+
136,1.51789,13.19,3.90,1.30,72.33,0.55,8.44,0.00,0.28,2
137+
137,1.51806,13.00,3.80,1.08,73.07,0.56,8.38,0.00,0.12,2
138+
138,1.51711,12.89,3.62,1.57,72.96,0.61,8.11,0.00,0.00,2
139+
139,1.51674,12.79,3.52,1.54,73.36,0.66,7.90,0.00,0.00,2
140+
140,1.51674,12.87,3.56,1.64,73.14,0.65,7.99,0.00,0.00,2
141+
141,1.51690,13.33,3.54,1.61,72.54,0.68,8.11,0.00,0.00,2
142+
142,1.51851,13.20,3.63,1.07,72.83,0.57,8.41,0.09,0.17,2
143+
143,1.51662,12.85,3.51,1.44,73.01,0.68,8.23,0.06,0.25,2
144+
144,1.51709,13.00,3.47,1.79,72.72,0.66,8.18,0.00,0.00,2
145+
145,1.51660,12.99,3.18,1.23,72.97,0.58,8.81,0.00,0.24,2
146+
146,1.51839,12.85,3.67,1.24,72.57,0.62,8.68,0.00,0.35,2
147+
147,1.51769,13.65,3.66,1.11,72.77,0.11,8.60,0.00,0.00,3
148+
148,1.51610,13.33,3.53,1.34,72.67,0.56,8.33,0.00,0.00,3
149+
149,1.51670,13.24,3.57,1.38,72.70,0.56,8.44,0.00,0.10,3
150+
150,1.51643,12.16,3.52,1.35,72.89,0.57,8.53,0.00,0.00,3
151+
151,1.51665,13.14,3.45,1.76,72.48,0.60,8.38,0.00,0.17,3
152+
152,1.52127,14.32,3.90,0.83,71.50,0.00,9.49,0.00,0.00,3
153+
153,1.51779,13.64,3.65,0.65,73.00,0.06,8.93,0.00,0.00,3
154+
154,1.51610,13.42,3.40,1.22,72.69,0.59,8.32,0.00,0.00,3
155+
155,1.51694,12.86,3.58,1.31,72.61,0.61,8.79,0.00,0.00,3
156+
156,1.51646,13.04,3.40,1.26,73.01,0.52,8.58,0.00,0.00,3
157+
157,1.51655,13.41,3.39,1.28,72.64,0.52,8.65,0.00,0.00,3
158+
158,1.52121,14.03,3.76,0.58,71.79,0.11,9.65,0.00,0.00,3
159+
159,1.51776,13.53,3.41,1.52,72.04,0.58,8.79,0.00,0.00,3
160+
160,1.51796,13.50,3.36,1.63,71.94,0.57,8.81,0.00,0.09,3
161+
161,1.51832,13.33,3.34,1.54,72.14,0.56,8.99,0.00,0.00,3
162+
162,1.51934,13.64,3.54,0.75,72.65,0.16,8.89,0.15,0.24,3
163+
163,1.52211,14.19,3.78,0.91,71.36,0.23,9.14,0.00,0.37,3
164+
164,1.51514,14.01,2.68,3.50,69.89,1.68,5.87,2.20,0.00,5
165+
165,1.51915,12.73,1.85,1.86,72.69,0.60,10.09,0.00,0.00,5
166+
166,1.52171,11.56,1.88,1.56,72.86,0.47,11.41,0.00,0.00,5
167+
167,1.52151,11.03,1.71,1.56,73.44,0.58,11.62,0.00,0.00,5
168+
168,1.51969,12.64,0.00,1.65,73.75,0.38,11.53,0.00,0.00,5
169+
169,1.51666,12.86,0.00,1.83,73.88,0.97,10.17,0.00,0.00,5
170+
170,1.51994,13.27,0.00,1.76,73.03,0.47,11.32,0.00,0.00,5
171+
171,1.52369,13.44,0.00,1.58,72.22,0.32,12.24,0.00,0.00,5
172+
172,1.51316,13.02,0.00,3.04,70.48,6.21,6.96,0.00,0.00,5
173+
173,1.51321,13.00,0.00,3.02,70.70,6.21,6.93,0.00,0.00,5
174+
174,1.52043,13.38,0.00,1.40,72.25,0.33,12.50,0.00,0.00,5
175+
175,1.52058,12.85,1.61,2.17,72.18,0.76,9.70,0.24,0.51,5
176+
176,1.52119,12.97,0.33,1.51,73.39,0.13,11.27,0.00,0.28,5
177+
177,1.51905,14.00,2.39,1.56,72.37,0.00,9.57,0.00,0.00,6
178+
178,1.51937,13.79,2.41,1.19,72.76,0.00,9.77,0.00,0.00,6
179+
179,1.51829,14.46,2.24,1.62,72.38,0.00,9.26,0.00,0.00,6
180+
180,1.51852,14.09,2.19,1.66,72.67,0.00,9.32,0.00,0.00,6
181+
181,1.51299,14.40,1.74,1.54,74.55,0.00,7.59,0.00,0.00,6
182+
182,1.51888,14.99,0.78,1.74,72.50,0.00,9.95,0.00,0.00,6
183+
183,1.51916,14.15,0.00,2.09,72.74,0.00,10.88,0.00,0.00,6
184+
184,1.51969,14.56,0.00,0.56,73.48,0.00,11.22,0.00,0.00,6
185+
185,1.51115,17.38,0.00,0.34,75.41,0.00,6.65,0.00,0.00,6
186+
186,1.51131,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.00,7
187+
187,1.51838,14.32,3.26,2.22,71.25,1.46,5.79,1.63,0.00,7
188+
188,1.52315,13.44,3.34,1.23,72.38,0.60,8.83,0.00,0.00,7
189+
189,1.52247,14.86,2.20,2.06,70.26,0.76,9.76,0.00,0.00,7
190+
190,1.52365,15.79,1.83,1.31,70.43,0.31,8.61,1.68,0.00,7
191+
191,1.51613,13.88,1.78,1.79,73.10,0.00,8.67,0.76,0.00,7
192+
192,1.51602,14.85,0.00,2.38,73.28,0.00,8.76,0.64,0.09,7
193+
193,1.51623,14.20,0.00,2.79,73.46,0.04,9.04,0.40,0.09,7
194+
194,1.51719,14.75,0.00,2.00,73.02,0.00,8.53,1.59,0.08,7
195+
195,1.51683,14.56,0.00,1.98,73.29,0.00,8.52,1.57,0.07,7
196+
196,1.51545,14.14,0.00,2.68,73.39,0.08,9.07,0.61,0.05,7
197+
197,1.51556,13.87,0.00,2.54,73.23,0.14,9.41,0.81,0.01,7
198+
198,1.51727,14.70,0.00,2.34,73.28,0.00,8.95,0.66,0.00,7
199+
199,1.51531,14.38,0.00,2.66,73.10,0.04,9.08,0.64,0.00,7
200+
200,1.51609,15.01,0.00,2.51,73.05,0.05,8.83,0.53,0.00,7
201+
201,1.51508,15.15,0.00,2.25,73.50,0.00,8.34,0.63,0.00,7
202+
202,1.51653,11.95,0.00,1.19,75.18,2.70,8.93,0.00,0.00,7
203+
203,1.51514,14.85,0.00,2.42,73.72,0.00,8.39,0.56,0.00,7
204+
204,1.51658,14.80,0.00,1.99,73.11,0.00,8.28,1.71,0.00,7
205+
205,1.51617,14.95,0.00,2.27,73.30,0.00,8.71,0.67,0.00,7
206+
206,1.51732,14.95,0.00,1.80,72.99,0.00,8.61,1.55,0.00,7
207+
207,1.51645,14.94,0.00,1.87,73.11,0.00,8.67,1.38,0.00,7
208+
208,1.51831,14.39,0.00,1.82,72.86,1.41,6.47,2.88,0.00,7
209+
209,1.51640,14.37,0.00,2.74,72.85,0.00,9.45,0.54,0.00,7
210+
210,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.00,7
211+
211,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.00,7
212+
212,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.00,7
213+
213,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.00,7
214+
214,1.51711,14.23,0.00,2.08,73.36,0.00,8.62,1.67,0.00,7

‎header.txt

+19
@@ -0,0 +1,19 @@
1+
1. Id number: 1 to 214
2+
2. RI: refractive index
3+
3. Na: Sodium (unit measurement: weight percent in corresponding oxide, as
4+
are attributes 4-10)
5+
4. Mg: Magnesium
6+
5. Al: Aluminum
7+
6. Si: Silicon
8+
7. K: Potassium
9+
8. Ca: Calcium
10+
9. Ba: Barium
11+
10. Fe: Iron
12+
11. Type of glass: (class attribute)
13+
-- 1 building_windows_float_processed
14+
-- 2 building_windows_non_float_processed
15+
-- 3 vehicle_windows_float_processed
16+
-- 4 vehicle_windows_non_float_processed (none in this database)
17+
-- 5 containers
18+
-- 6 tableware
19+
-- 7 headlamps
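
The attribute list above can be used as column names when loading glass.txt, which has no header row. A minimal sketch, assuming the short names `Id` and `Type` for the first and last attributes (they are labels chosen here, not part of the files); two rows are inlined from glass.txt so the example runs without the file.

```r
# Column names in the order given by header.txt.
glass_cols <- c("Id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type")

# Rows 1 and 164 of glass.txt, inlined; with the file available,
# use read.csv("glass.txt", header = FALSE, col.names = glass_cols) instead.
sample_rows <- c(
  "1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.00,1",
  "164,1.51514,14.01,2.68,3.50,69.89,1.68,5.87,2.20,0.00,5"
)
glass <- read.csv(text = sample_rows, header = FALSE, col.names = glass_cols)

# Map the numeric class code (1-7) to the labels listed in header.txt.
type_labels <- c("building_windows_float_processed",
                 "building_windows_non_float_processed",
                 "vehicle_windows_float_processed",
                 "vehicle_windows_non_float_processed",
                 "containers", "tableware", "headlamps")
glass$Type <- factor(glass$Type, levels = 1:7, labels = type_labels)
str(glass)
```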

‎market

+5
@@ -0,0 +1,5 @@
1+
Bread, Milk
2+
Bread, Diapers, Beer, Eggs
3+
Milk, Diapers, Beer, Cola
4+
Bread, Milk, Diapers, Beer
5+
Bread, Milk, Diapers, Cola
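
The five baskets above are small enough to check support and confidence by hand. A minimal base-R sketch of both measures follows; the helper names `support` and `confidence` are defined here for illustration and are not arules functions.

```r
# The five baskets from the `market` file, one character vector each.
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Diapers", "Cola")
)

# Support: fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

s <- support(c("Diapers", "Beer"), baskets)   # 3 of the 5 baskets
k <- confidence("Diapers", "Beer", baskets)   # 3 of the 4 baskets with Diapers
print(c(support = s, confidence = k))
```

With the thresholds used in the crime practice notebook (support 0.01, confidence 0.6), the rule {Diapers} => {Beer} would pass both filters on this toy data.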

‎readme.txt

+93
@@ -0,0 +1,93 @@
1+
1. Title: Glass Identification Database
2+
3+
2. Sources:
4+
(a) Creator: B. German
5+
-- Central Research Establishment
6+
Home Office Forensic Science Service
7+
Aldermaston, Reading, Berkshire RG7 4PN
8+
(b) Donor: Vina Spiehler, Ph.D., DABFT
9+
Diagnostic Products Corporation
10+
(213) 776-0180 (ext 3014)
11+
(c) Date: September, 1987
12+
13+
3. Past Usage:
14+
-- Rule Induction in Forensic Science
15+
-- Ian W. Evett and Ernest J. Spiehler
16+
-- Central Research Establishment
17+
Home Office Forensic Science Service
18+
Aldermaston, Reading, Berkshire RG7 4PN
19+
-- Unknown technical note number (sorry, not listed here)
20+
-- General Results: nearest neighbor held its own with respect to the
21+
rule-based system
22+
23+
4. Relevant Information:
24+
Vina conducted a comparison test of her rule-based system, BEAGLE, the
25+
nearest-neighbor algorithm, and discriminant analysis. BEAGLE is
26+
a product available through VRS Consulting, Inc.; 4676 Admiralty Way,
27+
Suite 206; Marina Del Ray, CA 90292 (213) 827-7890 and FAX: -3189.
28+
In determining whether the glass was a type of "float" glass or not,
29+
the following results were obtained (# incorrect answers):
30+
31+
Type of Sample                            Beagle   NN    DA
32+
Windows that were float processed (87)     10      12    21
33+
Windows that were not:            (76)     19      16    22
34+
35+
The study of classification of types of glass was motivated by
36+
criminological investigation. At the scene of the crime, the glass left
37+
can be used as evidence...if it is correctly identified!
38+
39+
5. Number of Instances: 214
40+
41+
6. Number of Attributes: 10 (including an Id#) plus the class attribute
42+
-- all attributes are continuously valued
43+
44+
7. Attribute Information:
45+
1. Id number: 1 to 214
46+
2. RI: refractive index
47+
3. Na: Sodium (unit measurement: weight percent in corresponding oxide, as
48+
are attributes 4-10)
49+
4. Mg: Magnesium
50+
5. Al: Aluminum
51+
6. Si: Silicon
52+
7. K: Potassium
53+
8. Ca: Calcium
54+
9. Ba: Barium
55+
10. Fe: Iron
56+
11. Type of glass: (class attribute)
57+
-- 1 building_windows_float_processed
58+
-- 2 building_windows_non_float_processed
59+
-- 3 vehicle_windows_float_processed
60+
-- 4 vehicle_windows_non_float_processed (none in this database)
61+
-- 5 containers
62+
-- 6 tableware
63+
-- 7 headlamps
64+
65+
8. Missing Attribute Values: None
66+
67+
Summary Statistics:
68+
Attribute:    Min      Max      Mean     SD      Correlation with class
69+
 2. RI:       1.5112   1.5339   1.5184   0.0030  -0.1642
70+
 3. Na:      10.73    17.38    13.4079   0.8166   0.5030
71+
 4. Mg:       0        4.49     2.6845   1.4424  -0.7447
72+
 5. Al:       0.29     3.5      1.4449   0.4993   0.5988
73+
 6. Si:      69.81    75.41    72.6509   0.7745   0.1515
74+
 7. K:        0        6.21     0.4971   0.6522  -0.0100
75+
 8. Ca:       5.43    16.19     8.9570   1.4232   0.0007
76+
 9. Ba:       0        3.15     0.1750   0.4972   0.5751
77+
10. Fe:       0        0.51     0.0570   0.0974  -0.1879
78+
79+
9. Class Distribution: (out of 214 total instances)
80+
-- 163 Window glass (building windows and vehicle windows)
81+
   -- 87 float processed
82+
      -- 70 building windows
83+
      -- 17 vehicle windows
84+
   -- 76 non-float processed
85+
      -- 76 building windows
86+
      -- 0 vehicle windows
87+
-- 51 Non-window glass
88+
   -- 13 containers
89+
   -- 9 tableware
90+
   -- 29 headlamps
91+
92+
93+

‎rules.graphml

+15,918
Large diffs are not rendered by default.
