In the train branch of run_ner_crf.py, why is a trailing comma left after the from_pretrained() call? It causes tokenizer.tokenize() to not split character by character, so tokens and label_ids end up with different lengths.

tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path, do_lower_case=args.do_lower_case, )

run_ner_span.py uses the version of from_pretrained() without the trailing comma, and there tokenizer.tokenize() does split character by character, so tokens and label_ids have equal lengths.

tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path, do_lower_case=args.do_lower_case)

What is the difference between these two ways of writing it? This feels like a bug. After fixing it, the biggest remaining problem is that when I train on my own dataset with run_ner_crf.py, recall keeps dropping after two epochs. Has the author run into this? Please reply when you have time.
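Independent of the trailing-comma question, the tokens/label_ids mismatch can be avoided by tokenizing one character at a time, which is what character-level Chinese NER labeling expects. A minimal sketch of that idea (here `toy_tokenize` is a hypothetical stand-in for `tokenizer.tokenize`, simulating a tokenizer that merges ASCII letter runs and would otherwise break per-character alignment):

```python
import re


def align_tokens_and_labels(text, labels, tokenize):
    """Tokenize character by character so each label maps to exactly one token.

    Characters the vocabulary does not cover become [UNK] instead of vanishing,
    so len(tokens) always equals len(labels).
    """
    tokens = []
    for ch in text:
        pieces = tokenize(ch)
        tokens.append(pieces[0] if pieces else "[UNK]")
    assert len(tokens) == len(labels), "tokens and labels must stay aligned"
    return tokens, labels


# Hypothetical stand-in for tokenizer.tokenize: merges runs of ASCII letters,
# which on whole sentences would yield fewer tokens than labels.
def toy_tokenize(text):
    return re.findall(r"[A-Za-z]+|\S", text)


text = "ab中"
labels = ["B-X", "I-X", "O"]
tokens, labels = align_tokens_and_labels(text, labels, toy_tokenize)
print(tokens)  # ['a', 'b', '中']
```

Calling the tokenizer per character rather than on the whole sentence is the usual workaround when gold labels are per character; whether the real scripts need it depends on the tokenizer configuration loaded by from_pretrained().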
#86