Commit e2e8913

Merge pull request dragnet-org#44 from seomoz/cbf/set-work-dir-to-vagrant
Set the working directory to /vagrant by default on login.
2 parents a948af7 + 8c71692 commit e2e8913

3 files changed

+16
-15
lines changed

README.md

+8-10
@@ -1,14 +1,14 @@
 
 Dragnet
-=====================================
+=======
 
 [![Build Status](https://api.travis-ci.org/seomoz/dragnet.png)](https://api.travis-ci.org/seomoz/dragnet.png)
 
-Dragnet isn't interested in the shiny chrome or boilerplate dressing of a
-web page. It's interested in... 'just the facts.' The machine learning
-models in Dragnet extract the main article content and optionally
-user generated comments from a web page. They provide state
-of the art performance on variety of test benchmarks.
+Dragnet isn't interested in the shiny chrome or boilerplate dressing
+of a web page. It's interested in... 'just the facts.' The machine
+learning models in Dragnet extract the main article content and
+optionally user generated comments from a web page. They provide
+state of the art performance on variety of test benchmarks.
 
 For more information on our approach check out:

@@ -17,8 +17,8 @@ at WWW in 2013, gives an overview of the machine learning approach.
 * [A comparison](https://moz.com/devblog/benchmarking-python-content-extraction-algorithms-dragnet-readability-goose-and-eatiht/) of Dragnet and alternate content extraction packages.
 * [This blog post](https://moz.com/devblog/dragnet-content-extraction-from-diverse-feature-sets/) explains the intuition behind the algorithms.
 
-This project was originally inspired by
-Kohlschütter et al, [Boilerplate Detection using Shallow Text Features](http://www.l3s.de/~kohlschuetter/publications/wsdm187-kohlschuetter.pdf) and
+This project was originally inspired by
+Kohlschütter et al, [Boilerplate Detection using Shallow Text Features](http://www.l3s.de/~kohlschuetter/publications/wsdm187-kohlschuetter.pdf) and
 Weninger et al [CETR -- Content Extraction with Tag Ratios](http://web.engr.illinois.edu/~weninge1/cetr/), and more recently by [Readability](https://github.com/buriy/python-readability).
 
 # GETTING STARTED
@@ -79,8 +79,6 @@ virtual machine with Dragnet and it's dependencies.
 
 ```bash
 vagrant ssh
-# inside the vagrant vm
-$ cd /vagrant
 # these should now pass
 $ make test
 ```

Vagrantfile

+6-5
@@ -5,14 +5,15 @@ ENV['VAGRANT_DEFAULT_PROVIDER'] = 'virtualbox'
 
 Vagrant.configure(2) do |config|
 config.vm.box = "ubuntu/trusty64"
+config.vm.hostname = 'dragnet'
 
-# lxml has trouble building if the amount of memory is 512:
-# http://stackoverflow.com/questions/16149613/installing-lxml-with-pip-in-virtualenv-ubuntu-12-10-error-command-gcc-failed
+# lxml has trouble building if the amount of memory is 512:
+# http://stackoverflow.com/questions/16149613/installing-lxml-with-pip-in-virtualenv-ubuntu-12-10-error-command-gcc-failed
 config.vm.provider "virtualbox" do |vb|
 vb.memory = "2048"
 end
 
-config.vm.provision "shell", privileged: false, inline: <<-SHELL
-cd /vagrant; ./provision.sh
-SHELL
+config.vm.provision "shell", privileged: false, inline: <<-SHELL
+cd /vagrant; ./provision.sh
+SHELL
 end

provision.sh

+2
@@ -18,6 +18,8 @@ export PATH=$HOME/py/bin:$PATH
 # configure conda for future login (for vagrant)
 echo "export PATH=$PATH" >> $HOME/.bashrc
 
+echo "cd /vagrant" >> $HOME/.bashrc
+
 pip install "Cython>=0.21.1"
 pip install -r requirements.txt
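
Note on the provision.sh change: the added line appends `cd /vagrant` to `$HOME/.bashrc` each time the script runs, so re-provisioning the same VM would likely leave repeated copies of that line. A minimal sketch of an idempotent variant follows. The helper name `append_once` is hypothetical and is not part of this commit, and the demo writes to a scratch file rather than the real `.bashrc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): append a line to a file
# only when that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    # Make sure the file exists so grep does not fail on a missing path.
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# In provision.sh the call would presumably be:
#   append_once "cd /vagrant" "$HOME/.bashrc"
# Demonstrated on a scratch file so the sketch has no side effects.
demo_rc="$(mktemp)"
append_once "cd /vagrant" "$demo_rc"
append_once "cd /vagrant" "$demo_rc"   # second call is a no-op
```

The same helper could also guard the `export PATH=...` line above it, assuming the expanded `PATH` value is identical between provisioning runs.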
