CSV Parser method

Share business templates, ideas, experiences, etc with fellow Servoy developers here

CSV Parser method

Postby mattfrizzell » Fri Jan 08, 2010 2:54 am

Hey folks. I always cringe about importing csv files from some outside source and typically I cry about it for 15 seconds and then open it up in excel and save it to xml or some other format that I trust just so I don't have to deal with commas, quotes, and carriage returns. Of course today I had a situation where I couldn't just do that (silly clients), so I broke down and used those 15 seconds to whip up a csv parser. While there is no real "standard" for csv documents this parser will handle embedded characters such as strings with quotes, commas, and returns inside of them, rows that end with carriage returns, new lines, or a combination of the two, and it even gives you the choice of stripping out the white spaces or not. I was going to do this with regexes but then I remembered how terrible I am with them (also, this ends up being faster than some good regexes some other folks suggested). Code is below and if interested, I based my csv standards on the rules found here http://en.wikipedia.org/wiki/Comma-separated_values


Code: Select all
/********************************************************************************
   @method: parseCSV(sData, bTrim)
   
   @arg sData : [string] The csv data you want to parse
   @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields
   
   @return aParsedData : [array] multi-dimensional array of rows and columns

   @description : Takes a string and applies csv rules to it to form an array of rows and columns.
               As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values
   
   @note:
   
   @history:   01/07/2010   MSF   Created parseCSV method
********************************************************************************/
var sData;
var bTrim = false;

var aParsedData;
var aRow;
var aSearch;
var bInQuotes;
var bSpecial;
var sColumn;


// If no string to parse was passed there is no reason to run this method so return
if (!arguments[0])
{
   return;
}
sData = arguments[0];

// If column data should be trimmed set the marker
if (arguments[1])
{
   bTrim = true;
}

// Set the default character search array made up of the special characters that csv has to deal with
aSearch = ['"', ",", "\n", "\r"];

// Initialize the return array, the first column, and the first row array
aParsedData = new Array();
sColumn = "";
aRow = new Array();

// Loop through the characters in the string
for (var i = 0; i < sData.length; i++)
{
   // Reset the special character marker to false
   bSpecial = false;
   
   // Loop through the current special character search array
   for (var s = 0; s < aSearch.length; s++)
   {
      // And if we hit a speical character note that
      if ( sData[i] == aSearch[s])
      {
         bSpecial = true;
         break;
      }
   }
   
   // If this is a special character that we are currently looking for it process it separately
   if ( bSpecial )
   {
      // Switch on the current character (we can only get to commas, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array)
      switch (sData[i])
      {
         // If the character is a comma then we know this is the end of a column of data so push it to the row and reinitialize the column string
         case ",":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            sColumn = "";
            break;
         
         // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\r":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            aParsedData.push(aRow);
            aRow = new Array();
            sColumn = "";
            
            // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n
            if ( sData[i+1] && "\n" == sData[i+1] )
            {
               i++;
            }
            break;
      

         // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\n":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            aParsedData.push(aRow);
            aRow = new Array();
            sColumn = "";
            break;      
         
         // Quotes are highly special, read on
         case '"':
            // Check to see if we are already inside a set of double-quotes
            if ( bInQuotes )
            {
               // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow
               // So add a single quote to the column and then skip the next character by incrementing i
               if (sData[i+1] && '"' == sData[i+1])
               {
                  sColumn += '"';
                  i++;
               }
               // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on
               else
               {
                  aSearch = ['"', ",", "\n", "\r"];
                  bInQuotes = false;
               }
            }
            // If we are not inside a set of double-quotes, this is out first set and means that the next special character that we care about is out closing double-quotes
            else
            {
               bInQuotes = true;
               aSearch = ['"'];
            }
            break;
      }
   }
   // If this is NOT a special character just add it to the current column's data
   else
   {
      sColumn += sData[i];
   }
}

// Finalize the data by adding the last column and row to our return data
if (bTrim)
{
   sColumn = utils.stringTrim(sColumn);
}
aRow.push(sColumn);
aParsedData.push(aRow);

// Enjoy your data
return aParsedData;
Matt Frizzell
adBlocks
www.adblocks.com
mattfrizzell
 
Posts: 51
Joined: Mon Aug 21, 2006 4:00 pm

Re: CSV Parser method

Postby lwjwillemsen » Fri Jan 08, 2010 10:51 am

Hi Matt,

Thanks a lot ! Much appreciated !

For my own use I added a third optional parameter columnDelimiter because we work a lot with ; (Excel CSV, Europe)

Regards,
Lambert Willemsen
Vision Development BV
lwjwillemsen
 
Posts: 680
Joined: Sat Mar 14, 2009 5:39 pm
Location: The Netherlands

Re: CSV Parser method

Postby bobcusick » Mon Jan 11, 2010 5:51 pm

Matt,

You da' man! That's a really nice, compact piece of code! Thanks very much for sharing it! :D
User avatar
bobcusick
 
Posts: 126
Joined: Mon Jan 12, 2009 9:13 pm

Re: CSV Parser method

Postby mattfrizzell » Mon Jan 11, 2010 6:27 pm

Glad to hear some folks can use it. I might add a couple more options soon but we shall see. Have a good week all.
Matt Frizzell
adBlocks
www.adblocks.com
mattfrizzell
 
Posts: 51
Joined: Mon Aug 21, 2006 4:00 pm

Re: CSV Parser method

Postby Harjo » Mon Jan 11, 2010 6:54 pm

Thanks Matt!
really helpfull! :-)
Harjo Kompagnie
ServoyCamp
Servoy Certified Developer
Servoy Valued Professional
SAN Developer
Harjo
 
Posts: 4321
Joined: Fri Apr 25, 2003 11:42 pm
Location: DEN HAM OV, The Netherlands

Re: CSV Parser method

Postby Thomas Parry » Wed Jan 13, 2010 2:00 pm

Just as I was about to do a very similar task (for the nth time) this comes along "just in time". :D
Thanks - it even gets rid of the double quotes that I was going to have to get rid of as well.
Tom Parry
Prospect IT
Java/C++/Servoy/Jasper Reports/Simulation/Service Applications
http://www.prospect-saas.biz
Thomas Parry
 
Posts: 498
Joined: Thu Jan 10, 2008 8:48 pm
Location: Ottawa, Canada

Re: CSV Parser method

Postby LOGIsoft » Thu Jan 21, 2010 5:48 pm

Hello everyone,

I've been playing around with Matt's method, and I've adapted it to specify a separator character (as per my module). Here is the modified code, which will be included in my module:

Code: Select all
function parseCSV()
{
   /*
   @method: parseCSV(sData, bTrim, sSeparator)

   @arg sData : [string] The csv data you want to parse
   @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields
   @arg sSeparator : [string] The separator character for columns delimitation

   @return aParsedData : [array] multi-dimensional array of rows and columns

   @description : Takes a string and applies csv rules to it to form an array of rows and columns.
               As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values

   @note:   Original by Matt Frizzell of adBlocks, modified by Ben Savignac of LOGIsoft

   @history:   01/07/2010   MSF   Created parseCSV method
               01/21/2010   BRS   Modified parseCSV method
    ********************************************************************************/
   var sData;
   var bTrim = false;

   var aParsedData;
   var aRow;
   var aSearchTemplate;
   var aSearch;
   var bInQuotes;
   var bSpecial;
   var sColumn;
   var sSeparator;


   // If no string to parse was passed there is no reason to run this method so return
   if (!arguments[0])
   {
      return;
   }
   sData = arguments[0];

   // If column data should be trimmed set the marker
   if (arguments[1])
   {
      bTrim = true;
   }

   // If a separator has been defined, set it
   if (arguments[2])
   {
      sSeparator = arguments[2];
      aSearchTemplate = ['"', "\n", "\r", sSeparator];
   } else {
      aSearchTemplate = ['"', "\n", "\r"];
   }
   
   // Set the default character search array made up of the special characters that csv has to deal with
   aSearch = aSearchTemplate.slice(0);

   // Initialize the return array, the first column, and the first row array
   aParsedData = new Array();
   sColumn = "";
   aRow = new Array();

   // Loop through the characters in the string
   for (var i = 0; i < sData.length; i++)
   {
      // Reset the special character marker to false
      bSpecial = false;

      // Loop through the current special character search array
      for (var s = 0; s < aSearch.length; s++)
      {
         // And if we hit a speical character note that
         if ( sData[i] == aSearch[s])
         {
            bSpecial = true;
            break;
         }
      }

      // If this is a special character that we are currently looking for it process it separately
      if ( bSpecial )
      {
         // Switch on the current character (we can only get to separator characters, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array)
         switch (sData[i])
         {
         // If the character is the separator OUTSIDE DOUBLE QUOTES then we know this is the end of a column of data, so push it to the row and reinitialize the column string
         case sSeparator:
         if ( !bInQuotes )
         {
            // If we are not inside double-quotes, are currently on the separator, AND the next character is a quote,
            // then we know this is the end of a column of data so push it to the row and reinitialize the column string
            if (sData[i+1] && '"' == sData[i+1])
            {
               if (bTrim)
               {
                  sColumn = utils.stringTrim(sColumn);
               }
               aRow.push(sColumn);
               sColumn = "";
               aSearch = aSearchTemplate.slice(0);
            }
            // If we are not inside quotes but the next character is not another set of double quotes it means this is not really a separator and we just add the character to the data
            else
            {
               sColumn += sData[i];
               aSearch = aSearchTemplate.slice(0);
            }
         }
         // If we are inside a set of double-quotes, this is not really a separator and we just add the character to the data
         else
         {
            sColumn += sData[i];
            aSearch = aSearchTemplate.slice(0);
         }
         break;

         // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\r":
         if (bTrim)
         {
            sColumn = utils.stringTrim(sColumn);
         }
         aRow.push(sColumn);
         aParsedData.push(aRow);
         aRow = new Array();
         sColumn = "";

         // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n
         if ( sData[i+1] && "\n" == sData[i+1] )
         {
            i++;
         }
         break;


         // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\n":
         if (bTrim)
         {
            sColumn = utils.stringTrim(sColumn);
         }
         aRow.push(sColumn);
         aParsedData.push(aRow);
         aRow = new Array();
         sColumn = "";
         break;     

         // Quotes are highly special, read on
         case '"':
         // Check to see if we are already inside a set of double-quotes
         if ( bInQuotes )
         {
            // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow
            // So add a single quote to the column and then skip the next character by incrementing i
            if (sData[i+1] && '"' == sData[i+1])
            {
               sColumn += '"';
               i++;
            }
            // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on
            else
            {
               aSearch = aSearchTemplate.slice(0);
               bInQuotes = false;
            }
         }
         // If we are not inside a set of double-quotes, this is our first set and means that the next special characters that we care about are closing double-quotes and the separator character
         else
         {
            bInQuotes = true;
            aSearch = ['"', sSeparator];
         }
         break;
         }
      }
      // If this is NOT a special character just add it to the current column's data
      else
      {
         sColumn += sData[i];
      }
   }

   // Finalize the data by adding the last column and row to our return data
   if (bTrim)
   {
      sColumn = utils.stringTrim(sColumn);
   }
   aRow.push(sColumn);
   aParsedData.push(aRow);

   // Enjoy your data
   return aParsedData;
}


This is run using the following method:

Code: Select all
function btn_ParseCSV()
{
   var target = plugins.file.showFileOpenDialog(1);  // Get the file to import from
   if ( target.isFile() )  // Check if we selected a file or a folder
   {
      var targetFullPath = target.getAbsolutePath();  // Obtain the full file path
      var fileContents = application.readTXTFile(targetFullPath);
      var sSeparator = ";";  // Change to the separator character used in the definition
      var result = globals.parseCSV(fileContents, false, sSeparator);
   }
}


Enjoy!

Ben
Benoit R. Savignac, president
LOGIsoft Consulting inc.
--------------------------------------------------------
Servoy Alliance Network Partner
User avatar
LOGIsoft
 
Posts: 260
Joined: Wed May 24, 2006 3:06 pm
Location: St-Basile-le-Grand (Montreal), Qc, Canada

Re: CSV Parser method

Postby Kahuna » Thu Jan 21, 2010 6:03 pm

Excellent stuff Ben - really looking forward to seeing this in your module.

Best of both worlds!
(Servoy Version: 6.0.7 Win XP / 7 - SQL Server 2008 R2)
Ian Cordingley (Kahuna)
Kahuna
 
Posts: 1235
Joined: Thu Oct 26, 2006 1:39 am
Location: 1/2 NE UK 1/2 Olvera Spain

Re: CSV Parser method

Postby lwjwillemsen » Sat Jan 23, 2010 3:18 pm

This is run using the following method:

Code: Select all
function btn_ParseCSV()
{
   var target = plugins.file.showFileOpenDialog(1);  // Get the file to import from
   if ( target.isFile() )  // Check if we selected a file or a folder
   {
      var targetFullPath = target.getAbsolutePath();  // Obtain the full file path
      var fileContents = application.readTXTFile(targetFullPath);
      var sSeparator = ";";  // Change to the separator character used in the definition
      var result = globals.parseCSV(fileContents, false, sSeparator);
   }
}


Enjoy!

Ben


Hi Ben,

What are your performance/memory experiences with large csv-files ? Is it than necessary to read the file line by line and process it line by line (if possible in Servoy ?) ?

Lambert.
Lambert Willemsen
Vision Development BV
lwjwillemsen
 
Posts: 680
Joined: Sat Mar 14, 2009 5:39 pm
Location: The Netherlands

Re: CSV Parser method

Postby rogel » Tue Sep 20, 2011 8:41 am

mattfrizzell wrote:Hey folks. I always cringe about importing csv files from some outside source and typically I cry about it for 15 seconds and then open it up in excel and save it to xml or some other format that I trust just so I don't have to deal with commas, quotes, and carriage returns. Of course today I had a situation where I couldn't just do that (silly clients), so I broke down and used those 15 seconds to whip up a csv parser. While there is no real "standard" for csv documents this parser will handle embedded characters such as strings with quotes, commas, and returns inside of them, rows that end with carriage returns, new lines, or a combination of the two, and it even gives you the choice of stripping out the white spaces or not. I was going to do this with regexes but then I remembered how terrible I am with them (also, this ends up being faster than some good regexes some other folks suggested). Code is below and if interested, I based my csv standards on the rules found here http://en.wikipedia.org/wiki/Comma-separated_values


Code: Select all
/********************************************************************************
   @method: parseCSV(sData, bTrim)
   
   @arg sData : [string] The csv data you want to parse
   @opt bTrim : [boolean] optional argument used if you want to trim the white spaces off the fields
   
   @return aParsedData : [array] multi-dimensional array of rows and columns

   @description : Takes a string and applies csv rules to it to form an array of rows and columns.
               As there aren't hard standards for csv I applied the rules from the wiki http://en.wikipedia.org/wiki/Comma-separated_values
   
   @note:
   
   @history:   01/07/2010   MSF   Created parseCSV method
********************************************************************************/
var sData;
var bTrim = false;

var aParsedData;
var aRow;
var aSearch;
var bInQuotes;
var bSpecial;
var sColumn;


// If no string to parse was passed there is no reason to run this method so return
if (!arguments[0])
{
   return;
}
sData = arguments[0];

// If column data should be trimmed set the marker
if (arguments[1])
{
   bTrim = true;
}

// Set the default character search array made up of the special characters that csv has to deal with
aSearch = ['"', ",", "\n", "\r"];

// Initialize the return array, the first column, and the first row array
aParsedData = new Array();
sColumn = "";
aRow = new Array();

// Loop through the characters in the string
for (var i = 0; i < sData.length; i++)
{
   // Reset the special character marker to false
   bSpecial = false;
   
   // Loop through the current special character search array
   for (var s = 0; s < aSearch.length; s++)
   {
      // And if we hit a speical character note that
      if ( sData[i] == aSearch[s])
      {
         bSpecial = true;
         break;
      }
   }
   
   // If this is a special character that we are currently looking for it process it separately
   if ( bSpecial )
   {
      // Switch on the current character (we can only get to commas, carriage returns, and line feeds if we aren't in double-quotes which is take care of by the aSearch array)
      switch (sData[i])
      {
         // If the character is a comma then we know this is the end of a column of data so push it to the row and reinitialize the column string
         case ",":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            sColumn = "";
            break;
         
         // If this is a carriage return then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\r":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            aParsedData.push(aRow);
            aRow = new Array();
            sColumn = "";
            
            // If the next character happens to be a new line then skip it because this is a csv whose rows end with \r\n
            if ( sData[i+1] && "\n" == sData[i+1] )
            {
               i++;
            }
            break;
      

         // If this is a new line then it is the end of a row so push the column to the row and then push the row to the ParsedData array
         case "\n":
            if (bTrim)
            {
               sColumn = utils.stringTrim(sColumn);
            }
            aRow.push(sColumn);
            aParsedData.push(aRow);
            aRow = new Array();
            sColumn = "";
            break;      
         
         // Quotes are highly special, read on
         case '"':
            // Check to see if we are already inside a set of double-quotes
            if ( bInQuotes )
            {
               // If we are inside double-quotes, are currently on a quote, AND the next character is a quote it means we have found an escaped quote that we want to allow
               // So add a single quote to the column and then skip the next character by incrementing i
               if (sData[i+1] && '"' == sData[i+1])
               {
                  sColumn += '"';
                  i++;
               }
               // If we are inside quotes but the next character is not another set of double quotes it means we have reached the quoted data so change the Search array back to our normal set and move on
               else
               {
                  aSearch = ['"', ",", "\n", "\r"];
                  bInQuotes = false;
               }
            }
            // If we are not inside a set of double-quotes, this is out first set and means that the next special character that we care about is out closing double-quotes
            else
            {
               bInQuotes = true;
               aSearch = ['"'];
            }
            break;
      }
   }
   // If this is NOT a special character just add it to the current column's data
   else
   {
      sColumn += sData[i];
   }
}

// Finalize the data by adding the last column and row to our return data
if (bTrim)
{
   sColumn = utils.stringTrim(sColumn);
}
aRow.push(sColumn);
aParsedData.push(aRow);

// Enjoy your data
return aParsedData;


Hi!

Thanks for sharing.

I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?

Thanks and Regards.
rogel
 
Posts: 264
Joined: Mon Jul 04, 2011 9:09 am

Re: CSV Parser method

Postby mboegem » Tue Sep 20, 2011 9:11 am

rogel wrote:I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?


Probably because the data is build in memory.
Try writing each line directly to a file, see: http://forum.servoy.com/viewtopic.php?f=22&t=13866&p=72648&hilit=java.io.filewriter#p72648
This improved the speed of my exports very much!

Hope this helps.
Marc Boegem
Solutiative / JBS Group, Partner
• Servoy Certified Developer
• Servoy Valued Professional
• Freelance Developer

Image

Partner of Tower - The most powerful Git client for Mac and Windows
User avatar
mboegem
 
Posts: 1742
Joined: Sun Oct 14, 2007 1:34 pm
Location: Amsterdam

Re: CSV Parser method

Postby rogel » Wed Sep 21, 2011 5:14 pm

mboegem wrote:
rogel wrote:I am experiencing a slowdown when parsing large csv files. Would you like to share how to manage this?


Probably because the data is build in memory.
Try writing each line directly to a file, see: http://forum.servoy.com/viewtopic.php?f=22&t=13866&p=72648&hilit=java.io.filewriter#p72648
This improved the speed of my exports very much!

Hope this helps.

Thanks Marc!

I did use bufferedreader to read the file.
rogel
 
Posts: 264
Joined: Mon Jul 04, 2011 9:09 am

Re: CSV Parser method

Postby lwjwillemsen » Wed Sep 21, 2011 6:20 pm

Marc talked about fileWrite() and not fileRead()...

What do you do with the import data ? Collect in memory ? If so then there will be a slow down with large csv files.
If you write the import data to a Servoy foundset then it's up to Servoy to do it the speedy way (balance speed / memory).

We are doing some testing this moment because we see that the foundset filling speed (rows added per second) drops at a certain (memory ?) moment. We save the data during the import to the database at a regular interval of 500 records. (Yes, autosave is off)

Unfortunately I cannot look into the black box called JSFoundSet so I don't know what happens under the hood (foundset memory management)...

Regards,
Lambert Willemsen
Vision Development BV
lwjwillemsen
 
Posts: 680
Joined: Sat Mar 14, 2009 5:39 pm
Location: The Netherlands

Re: CSV Parser method

Postby rogel » Thu Sep 22, 2011 8:31 am

lwjwillemsen wrote:Marc talked about fileWrite() and not fileRead()...

What do you do with the import data ? Collect in memory ? If so then there will be a slow down with large csv files.
If you write the import data to a Servoy foundset then it's up to Servoy to do it the speedy way (balance speed / memory).

We are doing some testing this moment because we see that the foundset filling speed (rows added per second) drops at a certain (memory ?) moment. We save the data during the import to the database at a regular interval of 500 records. (Yes, autosave is off)

Unfortunately I cannot look into the black box called JSFoundSet so I don't know what happens under the hood (foundset memory management)...

Regards,


Sorry about that. But I will use the filewrite in my backup database utility. what i do with the import data(csv) is i create an xml file. i have autosave(false) as well. i do create record in the foundset one at a time and save it right away before going to the next record in the xml.

hope this helps.
rogel
 
Posts: 264
Joined: Mon Jul 04, 2011 9:09 am

Re: CSV Parser method

Postby lwjwillemsen » Thu Sep 22, 2011 1:12 pm

You could run a test without filling the foundset and look what happens with regard to the time consumption...

Regards,
Lambert Willemsen
Vision Development BV
lwjwillemsen
 
Posts: 680
Joined: Sat Mar 14, 2009 5:39 pm
Location: The Netherlands

Next

Return to Sharing Central

Who is online

Users browsing this forum: No registered users and 6 guests