Gitattributes #50

Open
wants to merge 9 commits into
base: master
2 changes: 1 addition & 1 deletion Makefile
@@ -26,7 +26,7 @@ test-coverage: $(LINGUIST_PATH)
tail -n +2 $(COVERAGE_PROFILE) >> $(COVERAGE_REPORT); \
rm $(COVERAGE_PROFILE); \
fi; \
done;
done;

code-generate: $(LINGUIST_PATH)
mkdir -p data
35 changes: 34 additions & 1 deletion README.md
@@ -118,7 +118,7 @@ Note that even if enry's CLI is compatible with linguist's, its main point is th
Development
------------

*enry* re-uses parts of original [linguist](https://github.com/github/linguist) especially data in `languages.yml` to generate internal data structures. In oreder to update to latest upstream run
*enry* re-uses parts of original [linguist](https://github.com/github/linguist) especially data in `languages.yml` to generate internal data structures. In order to update to latest upstream run

make clean code-generate

@@ -140,6 +140,7 @@ Using [linguist/samples](https://github.com/github/linguist/tree/master/samples)
* all files for the SQL language fall through to the classifier because we don't correctly parse this [disambiguator expression](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L433) for `*.sql` files. This expression doesn't follow the pattern used in the rest of the [heuristics.rb](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb) file.



Benchmarks
------------

@@ -172,6 +173,38 @@ to get time averages for main detection function and strategies for the whole sa
if you want see measures by sample file



.gitattributes
--------------

As in linguist, you can override the detection strategies with a `.gitattributes` file.
Add a `.gitattributes` file to the directory and use the same attributes that you would use in linguist (`linguist-documentation`, `linguist-language` or `linguist-vendored`) to do the override.

#### Vendored code

Use the `linguist-vendored` attribute to vendor or un-vendor paths.

```
$ cat .gitattributes
this-is-a-vendor-directory/ linguist-vendored
this-is-not/ linguist-vendored=false
```
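
To illustrate how such lines can be interpreted, here is a minimal Go sketch. It is not enry's implementation: `parseVendored` is a hypothetical helper introduced only for this example, and it assumes each line holds a pattern followed by whitespace-separated attributes.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseVendored is a hypothetical helper, not part of enry's API. It reads
// .gitattributes-style lines and records, per pattern, whether the
// linguist-vendored attribute is set (true) or explicitly unset (false).
func parseVendored(content string) map[string]bool {
	vendored := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue // skip blank lines, comments, and lines without attributes
		}
		for _, attr := range fields[1:] {
			switch attr {
			case "linguist-vendored":
				vendored[fields[0]] = true
			case "linguist-vendored=false":
				vendored[fields[0]] = false
			}
		}
	}
	return vendored
}

func main() {
	attrs := "this-is-a-vendor-directory/ linguist-vendored\n" +
		"this-is-not/ linguist-vendored=false\n"
	for pattern, isVendored := range parseVendored(attrs) {
		fmt.Printf("%s vendored=%t\n", pattern, isVendored)
	}
}
```

Patterns that never mention `linguist-vendored` are absent from the map, so a caller can tell "no override" apart from an explicit `false`.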
#### Documentation

Documentation works the same way as vendored code, but uses `linguist-documentation` and `linguist-documentation=false`.

#### Language assignment

To classify some files as a specific language, use `linguist-language=[language]`.

```
$ cat .gitattributes
.*\.go linguist-language=MyFavouriteLanguage
```

Review comment (Contributor): If the regex must be compatible with Golang regexp, does that mean a `.gitattributes` file used in linguist has a different syntax? Does it make `.gitattributes` files for enry incompatible with linguist ones?

Note that the regular expression that matches the file name must be compatible with Go's `regexp` package; see [Golang regexp](https://golang.org/pkg/regexp/).
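
As a rough illustration of what Go `regexp` compatibility implies, the sketch below checks file names against the example pattern using only the standard library. `matchesPattern` is a hypothetical helper, and whether enry anchors patterns internally is not stated here, so both the unanchored and the anchored form are shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPattern is a hypothetical helper: it reports whether filename
// matches a Go regular expression. An invalid pattern yields false.
func matchesPattern(pattern, filename string) bool {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	// MatchString succeeds if any substring of filename matches.
	return re.MatchString(filename)
}

func main() {
	for _, name := range []string{"main.go", "main.go.bak", "main.py"} {
		fmt.Printf("%-12s unanchored=%t anchored=%t\n", name,
			matchesPattern(`.*\.go`, name),
			matchesPattern(`^.*\.go$`, name))
	}
}
```

Because Go's `MatchString` is unanchored, `.*\.go` also matches `main.go.bak`; adding `^` and `$` restricts the match to names that end in `.go`. Git itself documents glob-style patterns such as `*.go` for `.gitattributes`, which is a different syntax from the regular expressions used in this example.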


Why Enry?
------------

37 changes: 21 additions & 16 deletions cli/enry/main.go
@@ -29,6 +29,12 @@ func main() {
log.Fatal(err)
}

gitAttributes := enry.NewGitAttributes()
reader, err := os.Open(".gitattributes")
if err == nil {
gitAttributes.LoadGitAttributes("", reader)
}

errors := false
out := make(map[string][]string, 0)
err = filepath.Walk(root, func(path string, f os.FileInfo, err error) error {
@@ -53,8 +59,9 @@ func main() {
relativePath = relativePath + "/"
}

if enry.IsVendor(relativePath) || enry.IsDotFile(relativePath) ||
enry.IsDocumentation(relativePath) || enry.IsConfiguration(relativePath) {
if gitAttributes.IsVendor(relativePath) || enry.IsDotFile(relativePath) ||
gitAttributes.IsDocumentation(relativePath) || enry.IsConfiguration(relativePath) ||
gitAttributes.IsGenerated(path) {
if f.IsDir() {
return filepath.SkipDir
}
@@ -66,20 +73,18 @@ func main() {
return nil
}

language, ok := enry.GetLanguageByExtension(path)
if !ok {
if language, ok = enry.GetLanguageByFilename(path); !ok {
content, err := ioutil.ReadFile(path)
if err != nil {
errors = true
log.Println(err)
return nil
}

language = enry.GetLanguage(filepath.Base(path), content)
if language == enry.OtherLanguage {
return nil
}
content, err := ioutil.ReadFile(path)
if err != nil {
errors = true
log.Println(err)
return nil
}

language := gitAttributes.GetLanguage(filepath.Base(path))
if language == enry.OtherLanguage {
language = enry.GetLanguage(filepath.Base(path), content)
if language == enry.OtherLanguage {
return nil
}
}

26 changes: 26 additions & 0 deletions common.go
@@ -3,6 +3,7 @@ package enry
import (
"bufio"
"bytes"
"os"
"path/filepath"
"regexp"
"strings"
@@ -95,6 +96,12 @@ func GetLanguageByClassifier(content []byte, candidates []string) (language stri
return getLanguageByStrategy(GetLanguagesByClassifier, "", content, candidates)
}

// GetLanguageByGitattributes returns the language assigned to a file for a given regular expression in .gitattributes.
// This strategy needs to be initialized by calling LoadGitAttributes.
func GetLanguageByGitattributes(filename string) (language string, safe bool) {
return getLanguageByStrategy(GetLanguagesByGitAttributes, filename, nil, nil)
}

func getLanguageByStrategy(strategy Strategy, filename string, content []byte, candidates []string) (string, bool) {
languages := strategy(filename, content, candidates)
return getFirstLanguageAndSafe(languages)
@@ -407,6 +414,25 @@ func GetLanguagesBySpecificClassifier(content []byte, candidates []string, class
return classifier.Classify(content, mapCandidates)
}

// GetLanguagesByGitAttributes returns either a string slice with the language
// if the filename matches with a regExp in .gitattributes or returns an empty slice
// in case no regExp matches the filename. It complies with the signature to be a Strategy type.
func GetLanguagesByGitAttributes(filename string, content []byte, candidates []string) []string {
	gitAttributes := NewGitAttributes()
	reader, err := os.Open(".gitattributes")
	if err != nil {
		return nil
	}
	defer reader.Close() // release the file handle once the rules are loaded

	gitAttributes.LoadGitAttributes("", reader)
	lang := gitAttributes.GetLanguage(filename)
	// No rule matched the filename: return an empty slice, as documented above.
	if lang == OtherLanguage {
		return []string{}
	}

	return []string{lang}
}

// GetLanguageExtensions returns the different extensions being used by the language.
func GetLanguageExtensions(language string) []string {
return data.ExtensionsByLanguage[language]