Replies: 2 comments
-
I'm also running into this problem. A solution would be very helpful!
-
Hi @AN0DA @Nemryk, if you want to execute your flow quickly and collect token usage, you can use tracing. In the opened trace UI, you can find the token count. Streaming output is also supported with flex flow; you can reference this sample.
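To make the suggestion above concrete, here is a minimal sketch of aggregating token counts from collected trace spans. The span shape (`"attributes"` containing a `"usage"` dict with `prompt_tokens`/`completion_tokens`) is an assumption for illustration, not the exact promptflow span schema — the trace UI mentioned above surfaces the same kind of data:

```python
def total_token_usage(spans):
    """Sum token usage across a list of trace spans.

    Assumes each span is a dict whose "attributes" may contain a
    "usage" dict with "prompt_tokens" and "completion_tokens" —
    a hypothetical shape, not promptflow's exact span schema.
    """
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for span in spans:
        usage = span.get("attributes", {}).get("usage")
        if not usage:
            continue
        totals["prompt_tokens"] += usage.get("prompt_tokens", 0)
        totals["completion_tokens"] += usage.get("completion_tokens", 0)
    totals["total_tokens"] = totals["prompt_tokens"] + totals["completion_tokens"]
    return totals
```

Once the spans are exported (e.g. as JSON), a helper like this can roll the numbers up per run.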
-
Hi, does anyone have any ideas on how to achieve this?
I need to collect token usage metrics and send them using both standard and streaming endpoints. For reference, my code looks like this (all code snippets are parts of a larger class):
Referring to this discussion (#3352), I managed to get what I want. Although the implementation is slightly overengineered, it works great and looks like this:
However, I cannot wrap my mind around capturing the stream from the `run()` function. I went through the promptflow library implementations of `Run`, `Flow`, and similar classes, as well as how promptflow-serve is coded, but nothing seems to work for me. Is this functionality, which is so simple and convenient with `load_flow` and the `Flow` class, unachievable when using `run()`? Do you have any ideas on how I can solve this case? Thanks!
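One generic way to capture usage without consuming the stream twice is to wrap whatever generator the flow returns, so each chunk is inspected as it passes through to the caller. This is a hedged sketch, not promptflow's API: the chunk shape (a dict that may carry a `"usage"` key, as in some LLM streaming APIs) and the `on_usage` callback are placeholders for whatever your stream actually yields:

```python
def stream_with_usage(chunks, on_usage):
    """Yield chunks unchanged while watching for a usage payload.

    `chunks` is any iterable; a dict chunk carrying a "usage" key
    (a hypothetical convention) triggers `on_usage` with that payload,
    so metrics are collected while the caller still streams normally.
    """
    for chunk in chunks:
        usage = chunk.get("usage") if isinstance(chunk, dict) else None
        if usage:
            on_usage(usage)
        yield chunk


# Usage: record metrics while forwarding the stream to the client.
collected = []
stream = [{"text": "Hel"}, {"text": "lo"}, {"usage": {"total_tokens": 7}}]
output = list(stream_with_usage(stream, collected.append))
```

The same wrapper can sit between `run()`'s output and your serving layer, whatever the concrete chunk type turns out to be.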