HTML to array converter

Has anyone written a converter from an HTML table to an array or CSV?
I have a basic HTML page containing a single table. What would be the quickest way to get the data out of that table?
Excel can’t be used as an intermediate step, as this must be handled completely within Servoy.
My first idea was to use an external exe that accepts arguments from the command line, but this requires the utility to be installed on the client machine, which is not handy.

If the html page you’re going to get the table from is always going to be in the same format, it should be fairly simple to parse out the table data.

You can read the file in with plugins.file.readTXTFile() and get the entire document as a string.

From there you should be able to use either regular expressions (e.g. "Regular expression for html Table Parsing | ali raza") or the split function for strings to break it apart.
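A minimal sketch of the regular-expression route, in plain JavaScript so it should also run inside a Servoy method. The function name `htmlTableToArray` is hypothetical, and the sketch assumes a single, non-nested table in which every row and cell has its closing tag and no attribute value contains `>`. It only decodes a handful of common entities.

```javascript
// Hypothetical helper: turn the rows of an HTML table into an array of arrays.
// Assumptions: one non-nested table, closing </tr> and </td>/</th> tags present.
function htmlTableToArray(html) {
    var rows = [];
    var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr\s*>/gi;
    var rowMatch;
    while ((rowMatch = rowRe.exec(html)) !== null) {
        var cells = [];
        // A fresh regex per row so its lastIndex starts at 0.
        var cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]\s*>/gi;
        var cellMatch;
        while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
            var text = cellMatch[1]
                .replace(/<[^>]+>/g, '')   // drop inline tags such as <b> or <a>
                .replace(/&nbsp;/gi, ' ')
                .replace(/&lt;/gi, '<')
                .replace(/&gt;/gi, '>')
                .replace(/&amp;/gi, '&')   // decoded last to avoid double decoding
                .replace(/\s+/g, ' ')      // collapse runs of whitespace
                .replace(/^ | $/g, '');    // trim the single leading/trailing space
            cells.push(text);
        }
        if (cells.length > 0) {
            rows.push(cells);
        }
    }
    return rows;
}
```

In Servoy, the `html` argument would be the string returned by plugins.file.readTXTFile() as mentioned above. If the page omits closing tags, the patterns here will skip or merge cells, so the result is worth checking against a known file first.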

Is the HTML table valid XML/XHTML?

I thought about that, but this is just what I wanted to avoid: parsing all the tags to get the data…
And no, it’s not an XML file, just basic HTML

with the classic <body> around it.
At least XML would have been easier to read.
The number of columns is constant; only the number of rows varies.
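Since the column count is fixed, rows that have already been extracted as arrays could be checked against that count and written out as CSV. This is only a sketch: the name `rowsToCsv` and its parameters are hypothetical, and the quoting follows the usual CSV convention of doubling embedded quotes.

```javascript
// Hypothetical helper: validate a fixed column count and build a CSV string.
// rows is an array of arrays of cell values; columnCount is the expected width.
function rowsToCsv(rows, columnCount) {
    var lines = [];
    for (var i = 0; i < rows.length; i++) {
        var row = rows[i];
        if (row.length !== columnCount) {
            throw new Error('Row ' + i + ' has ' + row.length +
                ' cells, expected ' + columnCount);
        }
        var fields = [];
        for (var j = 0; j < row.length; j++) {
            var value = String(row[j]);
            // Quote fields containing a comma, a quote or a line break.
            if (/[",\r\n]/.test(value)) {
                value = '"' + value.replace(/"/g, '""') + '"';
            }
            fields.push(value);
        }
        lines.push(fields.join(','));
    }
    return lines.join('\r\n');
}
```

Throwing on a row of the wrong width is a design choice: with a constant number of columns, a mismatch most likely means the HTML was not parsed correctly, and failing early is easier to debug than a shifted CSV.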

Well, it could still be valid XML really (for the purposes of what you need). If I were you I would try to get an E4X object from the HTML itself. See below…

var obj = 
<html>
<body>
<table>
<tr>
<td>1</td>
</tr>
</table>
</body>
</html>;

obj is now an E4X object (typeof obj == 'xml'), and with that you can do what you need without parsing anything, using the E4X methods that are provided within Servoy’s JSLib.

E4X in JSLib? Do you mean the XMLList methods, or something new I haven’t seen?
To me E4X is a JS extension available in some browsers, but I have never played with it.

See link below to Servoy Wiki regarding XML/E4X.

http://wiki.servoy.com/display/public/DOCS/XML

No luck.
I tried to make an XML object from the HTML page, but I get an XML parsing error upon object creation.
On top of that I noticed a few illegal things in it: missing tags and an inconsistent charset, a mix of Windows-1251 and ISO-8859…
I tried to correct it manually to see how far I could get, but it is endless.
If I make a file myself, it works, but that is not the goal!
I have tried a few things, but as I am totally ignorant of XML structure terms, I would need to start from scratch with the XML world, so that means 3 days…
Not worth it for what I want to do. I just saw that children would list the tags, which I would then convert to rows and fields, but that’s probably not enough.
Too bad.
Thanks anyway for trying.