[Bug]: Docagent PDF ingestion fails - character issue #1731

@jk10001

Describe the bug

I get the following error when running the code from agents_docagent.ipynb:

Data Ingestion Task Failed, Error 'charmap' codec can't encode character '\u2612' in position 138: character maps to <undefined>

I've tried it with a few other PDFs. Some process fully; others fail with the same error on a different character, e.g. '\u2264'.

I'm running ag2 0.9 on Windows, Python 3.12.2.

This looks similar to issue #1167, but with the 'charmap' codec instead of 'ascii'. I've tried re-creating the venv from scratch (as suggested in that thread), but the same error occurs.
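
For reference, the same message can be reproduced without any PDF or ag2 code. The sketch below assumes the failing step encodes text with the Windows default code page (cp1252 here is an assumption; the actual code page on the affected machine and the actual failing call inside the ingestion pipeline are not confirmed). It shows that the two characters from the errors above cannot be encoded with cp1252, while an explicit UTF-8 encoding handles them.

```python
import os
import tempfile

# The two characters reported in the errors above.
text = "checkbox \u2612 and less-or-equal \u2264"


def try_encode(s: str, encoding: str) -> str:
    """Return 'ok' if s encodes cleanly, otherwise the error message."""
    try:
        s.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


# Assumption: cp1252 stands in for the locale default on the affected machine.
cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")
print("cp1252:", cp1252_result)
print("utf-8: ", utf8_result)

# Writing with an explicit encoding avoids the locale default entirely.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print("round trip ok:", round_trip == text)
```

If the failure does come from a file write or console print that relies on the locale default, running Python in UTF-8 mode (setting the environment variable `PYTHONUTF8=1` before starting the notebook kernel) may work around it. I have not verified this against the DocAgent ingestion code.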

Steps to reproduce

No response

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response
