Check valid robots.txt file #50

We should add a method to check if the content of robots.txt contains valid syntax.

Comments
I really need this feature and I will be happy to help :)
Hello @coderingo, thank you so much for being willing to contribute 😄
Hi @vieiralucas, thanks for the quick response. I'm ready to start working on it. Can you give me a hint about where to start? I read the standard specified by Google, and it is very forgiving regarding syntax validation. Kind regards :)
Hello, so I read this page from Google and I agree that they are very forgiving regarding syntax validation. This basically means that if a line is invalid, it is simply ignored. So, IMHO, I would create a method (I'm not sure what to call it) that returns something like:

```js
// when valid
{
  valid: true,
  invalidLines: []
}

// when invalid
{
  valid: false,
  invalidLines: [
    { number: 2, content: '....' }
  ]
}
```

Another thing that might be a good idea is to just add an… @lucasfcosta, @coderingo, what do you think? I personally like this latest idea.
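A rough illustration of that proposal (a minimal sketch only; the function name `validate` and the list of accepted fields are assumptions for illustration, not the library's actual API):

```js
// Sketch of the proposed validation method. The name `validate` and the
// set of known fields are assumptions, not robotto's real API.
const KNOWN_FIELDS = ['user-agent', 'allow', 'disallow', 'sitemap', 'crawl-delay'];

function validate(content) {
  const invalidLines = [];

  content.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();

    // Blank lines and comments are always valid.
    if (line === '' || line.startsWith('#')) return;

    // Anything else must look like a known "Field: value" pair.
    const colon = line.indexOf(':');
    const field = colon === -1 ? '' : line.slice(0, colon).trim().toLowerCase();

    if (!KNOWN_FIELDS.includes(field)) {
      invalidLines.push({ number: index + 1, content: rawLine });
    }
  });

  return { valid: invalidLines.length === 0, invalidLines };
}
```

With this shape, a fully valid file yields `{ valid: true, invalidLines: [] }`, matching the structure proposed above.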
Hi @vieiralucas, I agree with your last idea to just add an… On another note, the parser gave me an error on an almost valid file. I would like to share the contents of the file:

```
User-agent: *
    Sitemap: http://www.gstatic.com/profiles-sitemap.xml
```

It's a modified part of https://google.com/robots.txt. Notice the four spaces at the start of the `Sitemap:` line. I haven't debugged it further, but I think it may be because of how the array of lines is created from the content. Thanks for your precious time and dedication to the issues.
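If the failure really does come from how the lines array is built, one hedged guess at a fix (a sketch with an assumed helper name, not the project's actual code) is to trim each line before interpreting it, so indentation like the four spaces above becomes harmless:

```js
// Assumed helper for illustration: normalize raw content into clean lines
// so leading whitespace (like the indented "Sitemap:" line) is ignored.
function toLines(content) {
  return content
    .split(/\r?\n/)            // handle Unix and Windows line endings
    .map(line => line.trim()); // drop indentation around each line
}

toLines('User-agent: *\n    Sitemap: http://www.gstatic.com/profiles-sitemap.xml');
// => ['User-agent: *', 'Sitemap: http://www.gstatic.com/profiles-sitemap.xml']
```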
I think we can be a little more forgiving about the parsing rules. We wrote this code a while ago, so we were kind of strict in our implementation and didn't consider many edge cases.

I have already implemented the same package in Go, and it has parsing rules that are more likely to be compliant with Google's standards. That implementation might help you when thinking about how to tackle parsing. I also like the idea of pointing out what is wrong on a per-line basis, since one invalid line won't invalidate the others. Basically, we should start treating each line as an atomic unit that can be parsed alone.

Whenever you send us your pull request, we will be very happy to review it and point you in the right direction if we disagree with anything, but please feel free to do it the way you think is best. Thank you very much for contributing!
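To make the "atomic line" idea concrete, a small sketch (not the gobotto implementation; the field grammar is an assumption based on Google's documented syntax) of classifying each line independently:

```js
// Sketch only: classify one line without looking at its neighbours.
function parseLine(rawLine) {
  const line = rawLine.trim();

  if (line === '' || line.startsWith('#')) return { type: 'ignored' };

  // A rule line is "Field: value"; anything else is invalid on its own.
  const match = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
  if (!match) return { type: 'invalid', content: rawLine };

  return { type: 'rule', field: match[1].toLowerCase(), value: match[2] };
}

// One malformed line ("Disallow /private", missing its colon) is flagged
// without invalidating the lines around it.
['User-agent: *', 'Disallow /private', 'Allow: /'].map(parseLine);
```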
Hi Lucas, I am wrapping my head around gobotto. It's my first interaction with Go 😅. I will try to work with it; it seems interesting. As for the parsing, this is the line I was looking at:

```js
let userAgentIndex = line.toLowerCase().indexOf('user-agent:');
```
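For context on why that line can misbehave (a hedged illustration only; this may not be the exact problem discussed here), `indexOf` finds the field name anywhere in the line, including inside comments, whereas an anchored match does not:

```js
const line = '# this file has no user-agent: section yet';

line.toLowerCase().indexOf('user-agent:');   // 19, a false positive in a comment

// An anchored, case-insensitive check only matches a real field line:
/^\s*user-agent\s*:/i.test(line);            // false
/^\s*user-agent\s*:/i.test('User-Agent: *'); // true
```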
@coderingo You are totally right! Ah, and sorry for showing you Go code without asking whether you knew Go first. Let me give you a more detailed explanation of the algorithm I used there:

I think that's pretty much it; maybe there are some more edge cases I haven't thought about, so if you remember any others, please let me know. Thanks again for your help!