-Dragnet isn't interested in the shiny chrome or boilerplate dressing of a
-web page. It's interested in... 'just the facts.' The machine learning
-models in Dragnet extract the main article content and optionally
-user generated comments from a web page. They provide state
-of the art performance on variety of test benchmarks.
+Dragnet isn't interested in the shiny chrome or boilerplate dressing
+of a web page. It's interested in... 'just the facts.' The machine
+learning models in Dragnet extract the main article content and
+optionally user generated comments from a web page. They provide
+state of the art performance on variety of test benchmarks.

 For more information on our approach check out:

@@ -17,8 +17,8 @@ at WWW in 2013, gives an overview of the machine learning approach.
 *[A comparison](https://moz.com/devblog/benchmarking-python-content-extraction-algorithms-dragnet-readability-goose-and-eatiht/) of Dragnet and alternate content extraction packages.
 *[This blog post](https://moz.com/devblog/dragnet-content-extraction-from-diverse-feature-sets/) explains the intuition behind the algorithms.

-This project was originally inspired by
-Kohlschütter et al, [Boilerplate Detection using Shallow Text Features](http://www.l3s.de/~kohlschuetter/publications/wsdm187-kohlschuetter.pdf) and
+This project was originally inspired by
+Kohlschütter et al, [Boilerplate Detection using Shallow Text Features](http://www.l3s.de/~kohlschuetter/publications/wsdm187-kohlschuetter.pdf) and
 Weninger et al [CETR -- Content Extraction with Tag Ratios](http://web.engr.illinois.edu/~weninge1/cetr/), and more recently by [Readability](https://github.com/buriy/python-readability).

 # GETTING STARTED
@@ -79,8 +79,6 @@ virtual machine with Dragnet and it's dependencies.