REVAI-4324: Multichannel transcript grouping #119
Merged
18 commits
020da79 add parameters (dmtrrk)
f49454d cleanup (dmtrrk)
3894377 cleanup (dmtrrk)
0e0f242 cleanup (dmtrrk)
4ba95f4 cleanup (dmtrrk)
7609478 cleanup (dmtrrk)
bfc7424 add tests (dmtrrk)
95245a2 add tests (dmtrrk)
8307b3d bump version (dmtrrk)
041a3f3 add type (dmtrrk)
0064d86 fix exports (dmtrrk)
f7bb640 fix exports (dmtrrk)
285a797 fix exports (dmtrrk)
d39ea12 fix exports (dmtrrk)
ee356bd update docs (dmtrrk)
752bc94 update docs (dmtrrk)
1455ee3 update docs (dmtrrk)
5ca4836 update docs (dmtrrk)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,10 @@ | ||
| # -*- coding: utf-8 -*- | ||
| """Top-level package for rev_ai""" | ||
| | ||
| - __version__ = '2.20.0' | ||
| + __version__ = '2.21.0' | ||
| | ||
| from .models import Job, JobStatus, Account, Transcript, Monologue, Element, MediaConfig, \ | ||
| - CaptionType, CustomVocabulary, TopicExtractionJob, TopicExtractionResult, Topic, Informant, \ | ||
| - SpeakerName, LanguageIdentificationJob, LanguageIdentificationResult, LanguageConfidence, \ | ||
| - SentimentAnalysisResult, SentimentValue, SentimentMessage, SentimentAnalysisJob, \ | ||
| - CustomerUrlData, RevAiApiDeploymentConfigMap, RevAiApiDeployment | ||
| + CaptionType, GroupChannelsType, CustomVocabulary, TopicExtractionJob, TopicExtractionResult, \ | ||
| + Topic, Informant, SpeakerName, LanguageIdentificationJob, LanguageIdentificationResult, \ | ||
| + LanguageConfidence, SentimentAnalysisResult, SentimentValue, SentimentMessage, \ | ||
| + SentimentAnalysisJob, CustomerUrlData, RevAiApiDeploymentConfigMap, RevAiApiDeployment | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -337,95 +337,154 @@ def get_list_of_jobs(self, limit=None, starting_after=None): | ||
| | ||
| return [Job.from_json(job) for job in response.json()] | ||
| | ||
| - def get_transcript_text(self, id_): | ||
| + def get_transcript_text(self, id_, group_channels_by=None, group_channels_threshold_ms=None): | ||
| """Get the transcript of a specific job as plain text. | ||
| | ||
| :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| :returns: transcript data as text | ||
| :raises: HTTPError | ||
| """ | ||
| if not id_: | ||
| raise ValueError('id_ must be provided') | ||
| | ||
| + url = self._build_transcript_url( | ||
| + id_, | ||
| + group_channels_by=group_channels_by, | ||
| + group_channels_threshold_ms=group_channels_threshold_ms | ||
| + ) | ||
| + | ||
| response = self._make_http_request( | ||
| "GET", | ||
| - urljoin(self.base_url, 'jobs/{}/transcript'.format(id_)), | ||
| + url, | ||
| headers={'Accept': 'text/plain'} | ||
| ) | ||
| | ||
| return response.text | ||
| | ||
| - def get_transcript_text_as_stream(self, id_): | ||
| + def get_transcript_text_as_stream(self, | ||
| + id_, | ||
| + group_channels_by=None, | ||
| + group_channels_threshold_ms=None): | ||
| """Get the transcript of a specific job as a plain text stream. | ||
| | ||
| :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| :returns: requests.models.Response HTTP response which can be used to stream | ||
| the payload of the response | ||
| :raises: HTTPError | ||
| """ | ||
| if not id_: | ||
| raise ValueError('id_ must be provided') | ||
| | ||
| + url = self._build_transcript_url( | ||
| + id_, | ||
| + group_channels_by=group_channels_by, | ||
| + group_channels_threshold_ms=group_channels_threshold_ms | ||
| + ) | ||
| + | ||
| response = self._make_http_request( | ||
| "GET", | ||
| - urljoin(self.base_url, 'jobs/{}/transcript'.format(id_)), | ||
| + url, | ||
| headers={'Accept': 'text/plain'}, | ||
| stream=True | ||
| ) | ||
| | ||
| return response | ||
| | ||
| - def get_transcript_json(self, id_): | ||
| + def get_transcript_json(self, | ||
| + id_, | ||
| + group_channels_by=None, | ||
| + group_channels_threshold_ms=None): | ||
| """Get the transcript of a specific job as json. | ||
| | ||
| :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| :returns: transcript data as json | ||
| :raises: HTTPError | ||
| """ | ||
| if not id_: | ||
| raise ValueError('id_ must be provided') | ||
| | ||
| + url = self._build_transcript_url( | ||
| + id_, | ||
| + group_channels_by=group_channels_by, | ||
| + group_channels_threshold_ms=group_channels_threshold_ms | ||
| + ) | ||
| + | ||
| response = self._make_http_request( | ||
| "GET", | ||
| - urljoin(self.base_url, 'jobs/{}/transcript'.format(id_)), | ||
| + url, | ||
| headers={'Accept': self.rev_json_content_type} | ||
| ) | ||
| | ||
| return response.json() | ||
| | ||
| - def get_transcript_json_as_stream(self, id_): | ||
| + def get_transcript_json_as_stream(self, | ||
| + id_, | ||
| + group_channels_by=None, | ||
| + group_channels_threshold_ms=None): | ||
| """Get the transcript of a specific job as streamed json. | ||
| | ||
| :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| :returns: requests.models.Response HTTP response which can be used to stream | ||
| the payload of the response | ||
| :raises: HTTPError | ||
| """ | ||
| if not id_: | ||
| raise ValueError('id_ must be provided') | ||
| | ||
| + url = self._build_transcript_url( | ||
| + id_, | ||
| + group_channels_by=group_channels_by, | ||
| + group_channels_threshold_ms=group_channels_threshold_ms | ||
| + ) | ||
| + | ||
| response = self._make_http_request( | ||
| "GET", | ||
| - urljoin(self.base_url, 'jobs/{}/transcript'.format(id_)), | ||
| + url, | ||
| headers={'Accept': self.rev_json_content_type}, | ||
| stream=True | ||
| ) | ||
| | ||
| return response | ||
| | ||
| - def get_transcript_object(self, id_): | ||
| + def get_transcript_object(self, id_, group_channels_by=None, group_channels_threshold_ms=None): | ||
| """Get the transcript of a specific job as a python object`. | ||
| | ||
| :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| :returns: transcript data as a python object | ||
| :raises: HTTPError | ||
| """ | ||
| if not id_: | ||
| raise ValueError('id_ must be provided') | ||
| | ||
| + url = self._build_transcript_url( | ||
| + id_, | ||
| + group_channels_by=group_channels_by, | ||
| + group_channels_threshold_ms=group_channels_threshold_ms | ||
| + ) | ||
| + | ||
| response = self._make_http_request( | ||
| "GET", | ||
| - urljoin(self.base_url, 'jobs/{}/transcript'.format(id_)), | ||
| + url, | ||
| headers={'Accept': self.rev_json_content_type} | ||
| ) | ||
| | ||
| @@ -814,3 +873,22 @@ def _create_job_options_payload( | ||
| | ||
| def _create_captions_query(self, speaker_channel): | ||
| return '' if speaker_channel is None else '?speaker_channel={}'.format(speaker_channel) | ||
| + | ||
| + def _build_transcript_url(self, id_, group_channels_by=None, group_channels_threshold_ms=None): | ||
| + """Build the get transcript url. | ||
| + | ||
| + :param id_: id of job to be requested | ||
| + :param group_channels_by: optional, GroupChannelsType grouping strategy for | ||
| + multichannel transcripts. None for default. | ||
| + :param group_channels_threshold_ms: optional, grouping threshold in milliseconds. | ||
| + None for default. | ||
| + :returns: url for getting the transcript | ||
| + """ | ||
| + params = [] | ||
| + if group_channels_by is not None: | ||
| + params.append('group_channels_by={}'.format(group_channels_by)) | ||
| + if group_channels_threshold_ms is not None: | ||
| + params.append('group_channels_threshold_ms={}'.format(group_channels_threshold_ms)) | ||
| Review comment: @dmtrrk I think you were saying you have doubts about this. I think this is right: we are dealing with these two parameters independently. | ||
| + | ||
| + query = '?{}'.format('&'.join(params)) | ||
| + return urljoin(self.base_url, 'jobs/{}/transcript{}'.format(id_, query)) | ||
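
The helper in the diff above joins the query string by hand and always appends `?`, even when neither option is set. The following is a minimal standalone sketch of the same idea, not the SDK's code: the function name `build_transcript_url`, the free-standing `base_url` argument and the `https://example.com/api/v1/` host are assumptions made for illustration, and it targets Python 3 only. It uses `urllib.parse.urlencode` and leaves the `?` off when there is nothing to send.

```python
from enum import Enum
from urllib.parse import urlencode, urljoin


class GroupChannelsType(str, Enum):
    # Mirrors the enum added in this PR.
    SPEAKER = 'speaker'
    SENTENCE = 'sentence'
    WORD = 'word'


def build_transcript_url(base_url, id_, group_channels_by=None,
                         group_channels_threshold_ms=None):
    """Hypothetical standalone variant of _build_transcript_url."""
    params = {}
    if group_channels_by is not None:
        # Accept either an enum member or its string value, and send .value
        # so the query does not depend on how the running Python version
        # formats str-based enum members.
        params['group_channels_by'] = GroupChannelsType(group_channels_by).value
    if group_channels_threshold_ms is not None:
        params['group_channels_threshold_ms'] = group_channels_threshold_ms

    path = 'jobs/{}/transcript'.format(id_)
    if params:
        path = '{}?{}'.format(path, urlencode(params))
    return urljoin(base_url, path)


BASE = 'https://example.com/api/v1/'  # placeholder, not the real API host
print(build_transcript_url(BASE, 'job123'))
print(build_transcript_url(BASE, 'job123', GroupChannelsType.SPEAKER, 500))
```

The two options are still handled independently, as the review comment describes: each one is added to the query only when it is not `None`, so a threshold of `0` is still sent.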
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| + # -*- coding: utf-8 -*- | ||
| + """Enum for group_channels_by types""" | ||
| + | ||
| + from enum import Enum | ||
| + | ||
| + | ||
| + class GroupChannelsType(str, Enum): | ||
| + SPEAKER = 'speaker' | ||
| + SENTENCE = 'sentence' | ||
| + WORD = 'word' | ||
| + | ||
| + @classmethod | ||
| + def from_string(cls, status): | ||
| + return cls[status.upper()] | ||
Review comment: I would use type hints if possible.
Reply: I don't see type hints used anywhere in this code. I take that to be for Python 2.x compatibility.
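
One possible middle ground between the two comments, offered only as a sketch: PEP 484 type comments annotate a function without any Python 3-only syntax, so the same source still parses on Python 2. The helper `describe_grouping` below is a hypothetical stand-in and not code from this PR; on Python 2 the `typing` import would need the backport package, which is why it is guarded.

```python
try:
    from typing import Optional  # noqa: F401  (referenced only in the type comment)
except ImportError:  # e.g. Python 2 without the typing backport
    pass


def describe_grouping(group_channels_by=None, group_channels_threshold_ms=None):
    # type: (Optional[str], Optional[int]) -> str
    """Hypothetical helper: summarise the grouping options as text."""
    parts = []
    if group_channels_by is not None:
        parts.append('by={}'.format(group_channels_by))
    if group_channels_threshold_ms is not None:
        parts.append('threshold_ms={}'.format(group_channels_threshold_ms))
    return ', '.join(parts) if parts else 'default grouping'


print(describe_grouping('speaker', 500))
```

Whether this is worth adopting depends on which Python versions the package still needs to support, which the thread does not settle.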