Servoy Forum

Posted: **Fri Jan 08, 2010 2:54 am**

Hey folks. I always cringe about importing csv files from some outside source and typically I cry about it for 15 seconds and then open it up in excel and save it to xml or some other format that I trust just so I don't have to deal with commas, quotes, and carriage returns. Of course today I had a situation where I couldn't just do that (silly clients), so I broke down and used those 15 seconds to whip up a csv parser. While there is no real "standard" for csv documents this parser will handle embedded characters such as strings with quotes, commas, and returns inside of them, rows that end with carriage returns, new lines, or a combination of the two, and it even gives you the choice of stripping out the white spaces or not. I was going to do this with regexes but then I remembered how terrible I am with them (also, this ends up being faster than some good regexes some other folks suggested). Code is below and if interested, I based my csv standards on the rules found here http://en.wikipedia.org/wiki/Comma-separated_values

Code: Select all: /******************************************************************************** @method: parseCSV(sData, bTrim) @arg sData : [string] The csv data you want to parse @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields @return aParsedData : [array] multi-dimensional array of rows and columns @description : Takes a string and applies csv rules to it to form an array of rows and columns. As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values @note: @history: 01/07/2010 MSF Created parseCSV method ********************************************************************************/ var sData; var bTrim = false; var aParsedData; var aRow; var aSearch; var bInQuotes; var bSpecial; var sColumn; // If no string to parse was passed there is no reason to run this method so return if (!arguments[0]) { return; } sData = arguments[0]; // If column data should be trimmed set the marker if (arguments[1]) { bTrim = true; } // Set the default character search array made up of the special characters that csv has to deal with aSearch = ['"', ",", "\n", "\r"]; // Initialize the return array, the first column, and the first row array aParsedData = new Array(); sColumn = ""; aRow = new Array(); // Loop through the characters in the string for (var i = 0; i < sData.length; i++) { // Reset the special character marker to false bSpecial = false; // Loop through the current special character search array for (var s = 0; s < aSearch.length; s++) { // And if we hit a speical character note that if ( sData[i] == aSearch[s]) { bSpecial = true; break; } } // If this is a special character that we are currently looking for it process it separately if ( bSpecial ) { // Switch on the current character (we can only get to commas, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array) switch (sData[i]) { // If the character is a comma then we know this is the end of a column of data so push it to the row and reinitialize the column string case ",": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); sColumn = ""; break; // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\r": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n if ( sData[i+1] && "\n" == sData[i+1] ) { i++; } break; // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\n": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; break; // Quotes are highly special, read on case '"': // Check to see if we are already inside a set of double-quotes if ( bInQuotes ) { // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow // So add a single quote to the column and then skip the next character by incrementing i if (sData[i+1] && '"' == sData[i+1]) { sColumn += '"'; i++; } // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on else { aSearch = ['"', ",", "\n", "\r"]; bInQuotes = false; } } // If we are not inside a set of double-quotes, this is out first set and means that the next special character that we care about is out closing double-quotes else { bInQuotes = true; aSearch = ['"']; } break; } } // If this is NOT a special character just add it to the current column's data else { sColumn += sData[i]; } } // Finalize the data by adding the last column and row to our return data if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); // Enjoy your data return aParsedData;

Posted: **Fri Jan 08, 2010 10:51 am**

Hi Matt,

Thanks a lot ! Much appreciated !

For my own use I added a third optional parameter columnDelimiter because we work a lot with ; (Excel CSV, Europe)

Regards,

Posted: **Mon Jan 11, 2010 5:51 pm**

Matt,

You da' man! That's a really nice, compact piece of code! Thanks very much for sharing it!

Posted: **Mon Jan 11, 2010 6:27 pm**

Glad to hear some folks can use it. I might add a couple more options soon but we shall see. Have a good week all.

Posted: **Mon Jan 11, 2010 6:54 pm**

Thanks Matt!
really helpfull! :-)

Posted: **Wed Jan 13, 2010 2:00 pm**

Just as I was about to do a very similar task (for the nth time) this comes along "just in time".

Thanks - it even gets rid of the double quotes that I was going to have to get rid of as well.

Posted: **Thu Jan 21, 2010 5:48 pm**

Hello everyone,

I've been playing around with Matt's method, and I've adapted it to specify a separator character (as per my module). Here is the modified code, which will be included in my module:

Code: Select all: function parseCSV() { /* @method: parseCSV(sData, bTrim, sSeparator) @arg sData : [string] The csv data you want to parse @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields @arg sSeparator : [string] The separator character for columns delimitation @return aParsedData : [array] multi-dimensional array of rows and columns @description : Takes a string and applies csv rules to it to form an array of rows and columns. As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values @note: Original by Matt Frizzell of adBlocks, modified by Ben Savignac of LOGIsoft @history: 01/07/2010 MSF Created parseCSV method 01/21/2010 BRS Modified parseCSV method ********************************************************************************/ var sData; var bTrim = false; var aParsedData; var aRow; var aSearchTemplate; var aSearch; var bInQuotes; var bSpecial; var sColumn; var sSeparator; // If no string to parse was passed there is no reason to run this method so return if (!arguments[0]) { return; } sData = arguments[0]; // If column data should be trimmed set the marker if (arguments[1]) { bTrim = true; } // If a separator has been defined, set it if (arguments[2]) { sSeparator = arguments[2]; aSearchTemplate = ['"', "\n", "\r", sSeparator]; } else { aSearchTemplate = ['"', "\n", "\r"]; } // Set the default character search array made up of the special characters that csv has to deal with aSearch = aSearchTemplate.slice(0); // Initialize the return array, the first column, and the first row array aParsedData = new Array(); sColumn = ""; aRow = new Array(); // Loop through the characters in the string for (var i = 0; i < sData.length; i++) { // Reset the special character marker to false bSpecial = false; // Loop through the current special character search array for (var s = 0; s < aSearch.length; s++) { // And if we hit a speical character note that if ( sData[i] == aSearch[s]) { bSpecial = true; break; } } // If this is a special character that we are currently looking for it process it separately if ( bSpecial ) { // Switch on the current character (we can only get to separator characters, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array) switch (sData[i]) { // If the character is the separator OUTSIDE DOUBLE QUOTES then we know this is the end of a column of data, so push it to the row and reinitialize the column string case sSeparator: if ( !bInQuotes ) { // If we are not inside double-quotes, are currently on the separator, AND the next character is a quote, // then we know this is the end of a column of data so push it to the row and reinitialize the column string if (sData[i+1] && '"' == sData[i+1]) { if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); sColumn = ""; aSearch = aSearchTemplate.slice(0); } // If we are not inside quotes but the next character is not another set of double quotes it means this is not really a separator and we just add the character to the data else { sColumn += sData[i]; aSearch = aSearchTemplate.slice(0); } } // If we are inside a set of double-quotes, this is not really a separator and we just add the character to the data else { sColumn += sData[i]; aSearch = aSearchTemplate.slice(0); } break; // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\r": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n if ( sData[i+1] && "\n" == sData[i+1] ) { i++; } break; // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\n": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; break; // Quotes are highly special, read on case '"': // Check to see if we are already inside a set of double-quotes if ( bInQuotes ) { // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow // So add a single quote to the column and then skip the next character by incrementing i if (sData[i+1] && '"' == sData[i+1]) { sColumn += '"'; i++; } // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on else { aSearch = aSearchTemplate.slice(0); bInQuotes = false; } } // If we are not inside a set of double-quotes, this is our first set and means that the next special characters that we care about are closing double-quotes and the separator character else { bInQuotes = true; aSearch = ['"', sSeparator]; } break; } } // If this is NOT a special character just add it to the current column's data else { sColumn += sData[i]; } } // Finalize the data by adding the last column and row to our return data if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); // Enjoy your data return aParsedData; }

This is run using the following method:

Code: Select all: function btn_ParseCSV() { var target = plugins.file.showFileOpenDialog(1); // Get the file to import from if ( target.isFile() ) // Check if we selected a file or a folder { var targetFullPath = target.getAbsolutePath(); // Obtain the full file path var fileContents = application.readTXTFile(targetFullPath); var sSeparator = ";"; // Change to the separator character used in the definition var result = globals.parseCSV(fileContents, false, sSeparator); } }

Enjoy!

Ben

Posted: **Thu Jan 21, 2010 6:03 pm**

Excellent stuff Ben - really looking forward to seeing this in your module.

Best of both worlds!

Posted: **Sat Jan 23, 2010 3:18 pm**

This is run using the following method:

Code: Select all
function btn_ParseCSV() { var target = plugins.file.showFileOpenDialog(1); // Get the file to import from if ( target.isFile() ) // Check if we selected a file or a folder { var targetFullPath = target.getAbsolutePath(); // Obtain the full file path var fileContents = application.readTXTFile(targetFullPath); var sSeparator = ";"; // Change to the separator character used in the definition var result = globals.parseCSV(fileContents, false, sSeparator); } }

Enjoy!

Ben

Hi Ben,

What are your performance/memory experiences with large csv-files ? Is it than necessary to read the file line by line and process it line by line (if possible in Servoy ?) ?

Lambert.

Posted: **Tue Sep 20, 2011 8:41 am**

mattfrizzell wrote:Hey folks. I always cringe about importing csv files from some outside source and typically I cry about it for 15 seconds and then open it up in excel and save it to xml or some other format that I trust just so I don't have to deal with commas, quotes, and carriage returns. Of course today I had a situation where I couldn't just do that (silly clients), so I broke down and used those 15 seconds to whip up a csv parser. While there is no real "standard" for csv documents this parser will handle embedded characters such as strings with quotes, commas, and returns inside of them, rows that end with carriage returns, new lines, or a combination of the two, and it even gives you the choice of stripping out the white spaces or not. I was going to do this with regexes but then I remembered how terrible I am with them (also, this ends up being faster than some good regexes some other folks suggested). Code is below and if interested, I based my csv standards on the rules found here http://en.wikipedia.org/wiki/Comma-separated_values

Code: Select all
/******************************************************************************** @method: parseCSV(sData, bTrim) @arg sData : [string] The csv data you want to parse @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields @return aParsedData : [array] multi-dimensional array of rows and columns @description : Takes a string and applies csv rules to it to form an array of rows and columns. As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values @note: @history: 01/07/2010 MSF Created parseCSV method ********************************************************************************/ var sData; var bTrim = false; var aParsedData; var aRow; var aSearch; var bInQuotes; var bSpecial; var sColumn; // If no string to parse was passed there is no reason to run this method so return if (!arguments[0]) { return; } sData = arguments[0]; // If column data should be trimmed set the marker if (arguments[1]) { bTrim = true; } // Set the default character search array made up of the special characters that csv has to deal with aSearch = ['"', ",", "\n", "\r"]; // Initialize the return array, the first column, and the first row array aParsedData = new Array(); sColumn = ""; aRow = new Array(); // Loop through the characters in the string for (var i = 0; i < sData.length; i++) { // Reset the special character marker to false bSpecial = false; // Loop through the current special character search array for (var s = 0; s < aSearch.length; s++) { // And if we hit a speical character note that if ( sData[i] == aSearch[s]) { bSpecial = true; break; } } // If this is a special character that we are currently looking for it process it separately if ( bSpecial ) { // Switch on the current character (we can only get to commas, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array) switch (sData[i]) { // If the character is a comma then we know this is the end of a column of data so push it to the row and reinitialize the column string case ",": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); sColumn = ""; break; // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\r": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n if ( sData[i+1] && "\n" == sData[i+1] ) { i++; } break; // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array case "\n": if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); aRow = new Array(); sColumn = ""; break; // Quotes are highly special, read on case '"': // Check to see if we are already inside a set of double-quotes if ( bInQuotes ) { // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow // So add a single quote to the column and then skip the next character by incrementing i if (sData[i+1] && '"' == sData[i+1]) { sColumn += '"'; i++; } // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on else { aSearch = ['"', ",", "\n", "\r"]; bInQuotes = false; } } // If we are not inside a set of double-quotes, this is out first set and means that the next special character that we care about is out closing double-quotes else { bInQuotes = true; aSearch = ['"']; } break; } } // If this is NOT a special character just add it to the current column's data else { sColumn += sData[i]; } } // Finalize the data by adding the last column and row to our return data if (bTrim) { sColumn = utils.stringTrim(sColumn); } aRow.push(sColumn); aParsedData.push(aRow); // Enjoy your data return aParsedData;

Hi!

Thanks for sharing.

I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?

Thanks and Regards.

Posted: **Tue Sep 20, 2011 9:11 am**

rogel wrote:I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?

Probably because the data is build in memory.
Try writing each line directly to a file, see: http://forum.servoy.com/viewtopic.php?f=22&t=13866&p=72648&hilit=java.io.filewriter#p72648
This improved the speed of my exports very much!

Hope this helps.

Posted: **Wed Sep 21, 2011 5:14 pm**

mboegem wrote:
rogel wrote:I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?

Probably because the data is build in memory.
Try writing each line directly to a file, see: http://forum.servoy.com/viewtopic.php?f=22&t=13866&p=72648&hilit=java.io.filewriter#p72648
This improved the speed of my exports very much!

Hope this helps.

Thanks Marc!

I did use bufferedreader to read the file.

Posted: **Wed Sep 21, 2011 6:20 pm**

Marc talked about fileWrite() and not fileRead()...

What do you do with the import data ? Collect in memory ? If so then there will be a slow down with large csv files.
If you write the import data to a Servoy foundset then it's up to Servoy to do it the speedy way (balance speed / memory).

We are doing some testing this moment because we see that the foundset filling speed (rows added per second) drops at a certain (memory ?) moment. We save the data during the import to the database at a regular interval of 500 records. (Yes, autosave is off)

Unfortunately I cannot look into the black box called JSFoundSet so I don't know what happens under the hood (foundset memory management)...

Regards,

Posted: **Thu Sep 22, 2011 8:31 am**

lwjwillemsen wrote:Marc talked about fileWrite() and not fileRead()...

What do you do with the import data ? Collect in memory ? If so then there will be a slow down with large csv files.
If you write the import data to a Servoy foundset then it's up to Servoy to do it the speedy way (balance speed / memory).

We are doing some testing this moment because we see that the foundset filling speed (rows added per second) drops at a certain (memory ?) moment. We save the data during the import to the database at a regular interval of 500 records. (Yes, autosave is off)

Unfortunately I cannot look into the black box called JSFoundSet so I don't know what happens under the hood (foundset memory management)...

Regards,

Sorry about that. But I will use the filewrite in my backup database utility. what i do with the import data(csv) is i create an xml file. i have autosave(false) as well. i do create record in the foundset one at a time and save it right away before going to the next record in the xml.

hope this helps.

Posted: **Thu Sep 22, 2011 1:12 pm**

You could run a test without filling the foundset and look what happens with regard to the time consumption...

Regards,

Servoy Forum

CSV Parser method

CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method

Re: CSV Parser method