Skip to content

Issues with xl files generated by openpyxl #146

@ddTech

Description

@ddTech

Hi!

I have to make use of xl files as a data source that obviously are being createde by a tool called openpyxl and ran into some issues trying to accomplish this with the vfpXWorkbookXLSX-Class.

I have no idea how common the use of this openpyxl tool is and if this needs to be addressed here at all.
However I want to share my findings with You in case someone else runs into this (these) or similar problems.

An attempt to open a openpyxl xlsx file with vfpXWorkbookXLSX::openXLSXWorkbook() silently fails. It runs through without complaint but.GetLastRowNumber(1,1)returns 0. A closer look at the tables shows that XL_CELLS has no records.

Excel is able to open the file.
The good news is: once the the file has been saved with Excel, it can be used with vfpXWorkbookXLSX as well.
So, if manual steps are necessary and You have Excel at hand, just make sure You open and save the file with Excel once You got it. Then You're good to go.

However, as this might not always be possible and I was interested in finding the culprit, I started to track what's going on.
I found that they are not consistently unsing paths in xl\_rels\workbook.xml.rels.
worksheets start with /xl/worksheets/sheet... instead of worksheets/sheet...
Added to lcWbXmlPath this makes the if ADIR(laTemp, lcFile) > 0 fail and completely skip the file.

So I added a simple path check to openXLSXWorkbook()

	DO WHILE !EMPTY(lcRelation)
		lcRelId    = this.GetNodeAttributeValue(lcRelation, "Id")
		lnRelId    = CAST(SUBSTR(lcRelId, 4) AS I)
		lcTarget   = this.GetNodeAttributeValue(lcRelation, "Target")
		
		*-- DD changed on 27.10.25
		*-- Fix inconsistent path on files generated by openpyxl
		*-- They use 
		*--		Target="/xl/worksheets/sheet1.xml" 
		*-- instead of 
		*-- 	Target="worksheets/sheet1.xml"
		*-- which leads to ...\xl\/xl/worksheets/... 
		*-- -> ADIR(laTemp...) = 0 -> sheet ignored
		if left(lower(m.lcTarget), 4) == "/xl/"
			m.lcTarget = substr(m.lcTarget, 5)
		endif

		lcRelType  = this.GetNodeAttributeValue(lcRelation, "Type")
		lcRelType  = SUBSTR(lcRelType, RATC("/", lcRelType)+1)
		lcFileName = lcWbXmlPath + lcTarget
		IF ADIR(laTemp, lcFileName) > 0
			DO CASE

That solves the problem and the XLSX File can be used directly

That was problem no. 1

Preparing a cursor with the data from the XLSX I realized that now all empty cells were filled with a string that happens to be the first caption (i.e. cell A1).
I found that in XL_CELLS these cells had a CellValue of "StringValue" were of DataType "C" and had a StringID of 0. As the StringID of cell A1 is 0 they all were filled with this value.
In the Excel-saved version these where empty CellValues of DataType "X".

So I had a closer look at sheet1.xml and found that empty cells look like <c r="F2" s="2" t="s"></c>
No v but a t attribute.
In the Excel-saved version there is also no 'v' attribute but it's lacking the 't' attribute as well (<c r="F2" s="2"/>).
The t attribute is being evaluated in .readCellValueFormat() and with a value of "s" it results in a DataType CHAR that is expected to also come with a v holding the StringNdx, which here is not the case. Thus the default index of 0 is being returned.

I was not sure about the lcCellType == "str" so I added a one liner with with an empty() check on lcCellType =="s"

	DO CASE
		CASE lcCellType == "f"
			loCell.DataType = DATA_TYPE_FORMULA
			
		*-- DD changed on 27.10.25
		*-- type "s" but no "v" attribute 
		*-- like provided in openpyxl with empty cells.
		*-- something like 
		*-- ... <c r="F2" s="2" t="s"></c> ...
		*-- instead of 
		*-- ... <c r="F2" s="2"/> ... 
		*-- as would be found in files saved by Excel
		CASE m.lcCellType == "s" and empty(m.loCell.CellVal)
			*-- keep empty DATA_TYPE_NONE

		CASE lcCellType == "s" .OR. lcCellType == "str"
			loCell.StringNdx = CAST(loCell.CellVal AS I)
			loCell.CellVal   = "StringValue"
			loCell.DataType  = DATA_TYPE_CHAR
	
		CASE lcCellType == "inlineStr"

I guess, modifying the original line to CASE (lcCellType == "s" .OR. lcCellType == "str") AND ! empty(m.loCell.CellVal) should work well in this case.

Problem no. 2 solved as well.

Looking good so far. But then I ran into a "data type error".
A cell of type date was identified as integer in XL_CELLS. Excel sees it as date, I found the corresponding definition in styles.xml and XL_CELLXFS and XL_NUMFMTS have them as well.
It took me a while to understand what's wrong in this case. And there is actually nothing wrong per se. The problem here lies in the order of operation.

In the xl\_rels\workbook.xml.rels of the openpyxl generated file worksheets appear on top of styles and themes. So when the sheets are being processed, styles are not yet available.
If I manually reorder the entries everything is fine.

So in order to solve this, my first idea was to order the entries by rId in a descending order, but I found XLSX files created with another tool (Apache POI) where the entries have the correct order and have ascending rId's. So that would compromize these applications.

I have no solution to that yet. I guess instead of just iterating through the .rels file one has to check and pick the entries in a defined order. But I am not familiar with the matter and don't know much about the dependencies of the individual modules.

My current workaround here is to leave the .rels file untouched and to check for integer cells that should be of type date
Something like

	m.lvValue = m.loXL.GetCellValue(1, 1, m.lnR, m.laColDefs[m.lnI,1])    

	...

	if m.laColDefs[m.lnI,3] = "D" and vartype(m.lvValue) = "N" 
		*-- Kommt ein Datum als Integer?
		m.lvValue = {^1899-12-30} + m.lvValue			
	endif 

Not a true solution, but works for my current needs.

I have not looked at / tried the openXLSXWorkbookEx() and other ...Ex() variants yet.

I hope this is useful

Kind regards from Berlin

Frank

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions