excel xml issue, but could textpad come to the rescue?

General questions about using TextPad

Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

Post Reply
billyclark
Posts: 3
Joined: Wed May 21, 2008 6:59 pm

excel xml issue, but could textpad come to the rescue?

Post by billyclark »

ok team, i have an issue that textpad could become a hero on if it's possible to help fix the decimal issue that excel creates.

the problem....
an excel xml file has this number in a cell .53, but when you go look at the xml the number is .53297777777777777

so...
could someone tell me the regular expression for any number that has a decimal in it, then has more than 5 chars after the decimal place until the </Data> tag begins

then...
could it look at the 3rd number after the decimal and figure out if the 2nd number needs to round up or not if the value is 5 or more.

any ideas on how to do this and/or if it's possible? if not textpad, then what would you recommend?

here's a line straight from the .xml file
<Cell><Data ss:Type="Number">7.4999999999999997E-2</Data>
gan

Post by gan »

I'm not sure how to round up/down using regular expression. I cannot see how you could do that, but maybe someone else have a smart solution for that using regular expression.

If it's ok to just cut off the end you could do it like the example below:

Search for:

Code: Select all

("Number">[0-9]+\.[0-9][0-9])[0-9].*(</Data>)
And replace with this:

Code: Select all

\1\2
This should remove everything starting with the third decimal until </data> so your example line after the replace should then be:

Code: Select all

<Cell><Data ss:Type="Number">7.49</Data>
But i'm sure this could be done much easier using Excel to be able to round and not just cut the end. I'm not sure if i understand....the number is shown with 2 decimals in excel, but when saved as xml it might be more then 2 decimals? In that case i'm pretty sure that's because the value of the cell actually contains more then 2 decimals, but you have selected to only show 2. If you select the cell you should see the real value in the field above the sheet. You could use the ROUND function in excel before the export which should solve the whole problem. You also have a setting in excel that says "use precision as displayed" that will use the value shown. So in case you set the format to 2 decimals that's what will be used, but this function does not round up/down.

I don't think Excel create this issue, but rather export whatever the cell contains. My recommendation is to fix this using excel which i'm sure is pretty straight forward.
billyclark
Posts: 3
Joined: Wed May 21, 2008 6:59 pm

here's the issue in more detail

Post by billyclark »

http://www.codeproject.com/KB/XML/ooxml ... ctive.aspx

Since we have had several cases logged for this, I wanted to send out a note to everyone. If you are experiencing rounding issues when you are loading data, unfortunately, this an Excel bug, not a Workday issue. The only workaround we have is to edit the data in Notepad. Keep in mind it is only an issue in certain cases. Here is an excerpt from a website with additional details:

Entered versus stored values

We all take for granted that when we type a value such as 1234.1234 in a cell of a spreadsheet, that's what actually gets stored. Excel has this auto-number format matching capability where it tries to make sense of what is manually entered in order to deduce if that's a string, a number, a boolean or a date and applies a number format accordingly, but what's being stored as a value is what is entered. By the way, if you hit Alt+F11 in Excel, and enter something like Range("C3").Value and run the macro, you'll get the entered value in cell C3 ; if you enter something like Range("C3").Text, you'll get the formatted value in cell C3, where the number format has been applied to the value. Note that the return value takes advantage of the locale and number formatting which means you may get "1234,1234" (note the comma) instead of "1234.1234".

Is this storage neutrality true with the new formats?

To reproduce the scenario :

* start Excel 2007 and create a new spreadsheet
* type value 123456.123456
* resize the column manually so that the value entirely shows
* hit return and type 12345.12345
* hit return and type 1234.1234
* hit return and type 123.123
* hit return and type 12.12
* save the spreadsheet (xlsx file)
* close it, unzip it

Here is a screenshot of what you should see at this point :

http://www.codeproject.com/KB/XML/ooxml ... esign3.gif
Typing a few numeric values in Excel 2007

The corresponding XML in the main part xl/worksheets/sheet1.xml is :

http://www.codeproject.com/images/minus.gifCollapse

<sheetData>

<row r="2" spans="4:4">

<c r="D2">

<v>123456.123456</v>

</c>

</row>

<row r="3" spans="4:4">

<c r="D3">

<v>12345.123449999999</v>

</c>

</row>

<row r="4" spans="4:4">

<c r="D4">

<v>1234.1233999999999</v>

</c>

</row>

<row r="5" spans="4:4">

<c r="D5">

<v>123.123</v>

</c>

</row>

<row r="6" spans="4:4">

<c r="D6">

<v>12.12</v>

</c>

</row>

</sheetData>

The problem is that Excel 2007 does not store what we entered. If we read the XML, we are going to grab numbers that have rounding errors compared to the actual numbers we typed. Let's see how far the problem goes :

Entered value


Stored value


Rounding error

123456.123456


123456.123456


0

12345.12345


12345.123449999999


o(1e-5)

1234.1234


1234.1233999999999


o(1e-4)

123.123


123.123


0

12.12


12.12


0

Not only there is a rounding error, but its order of magnitude changes depending on the value. Ironically enough, if you entered 4321.4321, it would be stored as is, with no rounding error.

It is absolutely lost on me how implementers are expected to deal with this mess. The spreadsheet does not reflect the proper values, and you can easily see where it goes. Imagine non-Microsoft applications used in healthcare and critical systems relying on the spreadsheet data. Not only the rounding error seems arbitrary (one would have to go back and study the artefacts of IEEE floating-point values, several decades of work), but it changes. There is no way we can possibly take advantage of this, with one notable exception : if we are able to be in an execution environment for which reading those floating-point values does not produce those artefacts, and returns the proper entered values, then we are good. Problem : Microsoft does not document the execution environment. We can fairly assume its Windows, but what else? And if I am using Linux, how do I work with this?

It's important to understand that if we open the spreadsheet in Excel 2007, we see the proper values. No loss (based on the values entered) seem to have occured, the problem is that the data in XML just cannot be used as is.

As an aside, the stored value does not use the locale (it always uses the dot as decimal separator), therefore we have to assume this is all US English. If we wrote software in Excel VBA that grabs the value in cells, then processes it, there is no way we could migrate our VBA code to work with this XML part without substantial rework. We are left with Excel's own international implementation artefacts, undocumented.
gan

Post by gan »

Ok, now i understand the problem and agree it's a totally mess to deal with this. I never used the XML export and didn't know about this problem. Regarding how to use regular expression i don't have any other idea except what i already said. Maybe someone else have a good idea how to solve this using textpad. Writing a small application that could round the numbers should be pretty easy and can probably be added as an external tool in textpad.
I guess waiting for MS to fix this could take a while.
Post Reply