-
-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MBS-11948: More strict favicon class assignment #2256
base: master
Are you sure you want to change the base?
Conversation
AFAICT there's no reason not to just let URL() do the validation for us here - if it errors, then it's not a valid URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks very nice overall.
- If @mwiencek confirms
URL
usage is appropriate here, it is probably not useful to add unit tests for theisValidURL
function as its new implementation seems to be browser-dependent. - MBS-11948’s implementation looks nice, and I did check some favicon classes but not all of them since you added unit tests for these and any regression here would not be blocking editors. My only comments are about tests’ implementation.
|
||
export default function isUrlValid(url: string): boolean { | ||
try { | ||
new URL(url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, when I wrote that in 2014 the URL
API was brand new (if supported at all outside of Chrome) and we didn't have core-js polyfills or anything back then.
One difference that probably exists, though, is that something like new URL('/haha')
is valid, whereas the current function requires a fully-specified URL. This seems like something that should be looked into, if it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite true - https://developer.mozilla.org/en-US/docs/Web/API/URL/URL "If url is a relative URL, base is required, and will be used as the base URL". Otherwise, it errors anyway.
}, | ||
{ | ||
input_url: 'https://ra.co/tracks/544258', | ||
input_entity_type: 'recording', | ||
expected_relationship_type: 'otherdatabases', | ||
expected_clean_url: 'https://ra.co/tracks/544258', | ||
only_valid_entity_types: ['recording'], | ||
expected_favicon_class: 'residentadvisor-favicon', | ||
}, | ||
{ | ||
input_url: 'http://www.ra.co/reviews/7636', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test RA reviews’ favicon too?
Edit: Don’t make extraneous tests if setting the favicon doesn’t depend the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they have a different favicon for reviews? Am I missing something? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I was initially looking for a favicon test of kenjimiura.com
when I noticed you added a favicon test to all RA tests but that one; I just wondered why. Now looking at the code again (commit 86ef099
- root/static/scripts/common/constants.js
- line 156), setting this favicon doesn’t depend on the path indeed. So maybe it’s not useful to have 7 favicon tests for RA URLs, having 1 might be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I see :) I added it to all of them but the one with no valid entity types (that's what I generally did elsewhere too) just in case we changed some specific cleanup and it no longer matched the favicon, but admittedly it might be overdoing it :) I can remove it from the rest if desired.
{ | ||
input_url: 'http://www.ra.co/exchange/552', | ||
input_entity_type: 'release', | ||
expected_relationship_type: 'shownotes', | ||
expected_clean_url: 'https://ra.co/exchange/552', | ||
only_valid_entity_types: ['release'], | ||
expected_favicon_class: 'residentadvisor-favicon', | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest testing the reported URL’s favicon for that bug:
{ | |
input_url: 'http://www.ra.co/exchange/552', | |
input_entity_type: 'release', | |
expected_relationship_type: 'shownotes', | |
expected_clean_url: 'https://ra.co/exchange/552', | |
only_valid_entity_types: ['release'], | |
expected_favicon_class: 'residentadvisor-favicon', | |
}, | |
{ | |
input_url: 'http://www.ra.co/exchange/552', | |
input_entity_type: 'release', | |
expected_relationship_type: 'shownotes', | |
expected_clean_url: 'https://ra.co/exchange/552', | |
only_valid_entity_types: ['release'], | |
expected_favicon_class: 'residentadvisor-favicon', | |
}, | |
{ | |
input_url: 'https://www.kenjimiura.com/', | |
input_entity_type: 'artist', | |
expected_relationship_type: undefined, | |
expected_clean_url: 'https://www.kenjimiura.com/', | |
expected_favicon_class: 'no-favicon', | |
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds sensible, will add
@@ -4848,6 +5405,11 @@ testData.forEach(function (subtest, i) { | |||
} | |||
tested = true; | |||
} | |||
if (subtest.expected_favicon_class) { | |||
const actualFaviconClass = getFaviconClass(actualCleanUrl); | |||
st.equal(actualFaviconClass, subtest.expected_favicon_class, 'Correct favicon'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it test the input URL too? For existing unclean URLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting we wouldn't necessarily care if the favicon is not shown for existing URLs that need fixing anyway :) (that said, wonder if we could / should have some sort of either report or automatic script for fixing those).
if (subtest.expected_favicon_class) { | ||
const actualFaviconClass = getFaviconClass(actualCleanUrl); | ||
st.equal(actualFaviconClass, subtest.expected_favicon_class, 'Correct favicon'); | ||
tested = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that imply that a test can just consist of input_url
and expected_favicon_class
?
If it is tested against the clean URL, expected_clean_url
should be mandatory.
Maybe the documentation about test properties in dcabded commit’s message should be added to this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IMO that should be ok, shouldn't it? It's tested against the return of cleanURL()
, which is the original if there's no cleanup function, so effectively, it is tested against whatever will actually be added to the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, or is your idea (seeing the miura.com example above) that there should be an expected_clean_url
even if the expectation is there's no cleanup and you get the same that came in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IMO that should be ok, shouldn't it?
Not sure what “that” and “it” refer to.
It's tested against the return of
cleanURL(), which is the original if there's no cleanup function
It is hiding the URL being actually tested for favicon class, from both the test file and the test output.
The clean URL may significantly differ from the input URL, that’s why I suggested to make expected_clean_url
mandatory for use with expected_favicon_class
.
so effectively, it is tested against whatever will actually be added to the DB.
So effectively, it doesn’t test against actual unclean URLs already in the DB.
Let’s use the above thread #2256 (comment) for follow-up on this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, or is your idea (seeing the miura.com example above) that there should be an
expected_clean_url
even if the expectation is there's no cleanup and you get the same that came in?
Yes. (Did post the previous comment before seeing that one.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can always do that and always have expected_clean_url
- I just thought that was a bit misleading because it suggested a cleanup was being run, but maybe not :)
We have had several issues in the past of favicons being assigned incorrectly. The one triggering this issue was kenjimiura.com being matched with the residentadvisor "ra.co" string. There's no reason why we should stick to this naive approach to matching when we have regular expressions and URL objects, so this makes use of both to actually validate the full hostnames of URLs and, when needed, the start of their paths. The regexes are mostly based on the existing matchup/cleanup we do. The few we don't validate/clean up at all I based on what the structure seems to be via a quick database check, although admittedly I might have missed some edge cases. As an additional bonus, I added testing for the favicon classes, to make sure that at least all the URLs we were already testing do match the expected favicon class (that should also serve as a reminder to change the regexes there if we change how a site is cleaned up).
86ef099
to
f240b36
Compare
A note: I did not add tests for the URLs not currently being tested at all in the URLCleanup tests (a bunch of them we don't clean up at all, we just autoselect, and I think there might be one or two stores we don't even autoselect to begin with). We could do that, but then we'd need to decide for sure how to do the testing for non-cleaned URLs :) |
After further discussion on IRC, it seems that existing I did investigate for potential edge cases: Found It would also require to import |
It's a very large file, so if we can't avoid it. :\ Can we put the match regular expressions into a separate file and import them into |
That would make
|
It'd still load the whole file in that case.
This wouldn't be too difficult, actually, since you can access .source to rebuild the expressions. |
We should decide on an option here. IIRC we looked into building at runtime and it was actually a lot more complicated than it sounded, and then I looked at extracting the regexes from URLCleanup with a script and it was also more complicated than needed, so that leaves two reasonable options: a) have one external file, and use that for match in URLCleanup. Or, if you have a simple idea to parse the file and automaticaly extract the regexes, I'm all ears. |
Fix MBS-11948
We have had several issues in the past of favicons being assigned incorrectly. The one triggering this issue was kenjimiura.com
being matched with the residentadvisor "ra.co" string.
There's no reason why we should stick to this naive approach to matching when we have regular expressions and URL objects, so this makes use of both to actually validate the full hostnames of URLs and, when needed, the start of their paths.
The regexes are mostly based on the existing matchup/cleanup we do. The few we don't validate/clean up at all I based on what the structure seems to be via a quick database check, although admittedly I might have missed some edge cases.
As an additional bonus, I added testing for the favicon classes, to make sure that at least all the URLs we were already testing
do match the expected favicon class (that should also serve as a reminder to change the regexes there if we change how a site is cleaned up).