Sample Rexx Procedure for Parsing CSV data

Document created by Eoin Employee on May 13, 2016
Version 1Show Document
  • View in full screen mode

Accurately parsing Endevor CSV data can be challenging when some of the fields can contain free text like descriptions, comments etc.

Using the CSV LIST OPTION DELIMITER '|' can help but to be more accurate you need to look for the delimiters AND the bounding quotes.
The following two procedures work like the builtin WORD and WORDS procedures but instead of looking for space delimited words look for CSV Delimited words with the broken bar.

 

/*                                                                           

   DWords for "delimited words" is a slightly more complex version of CWords 

   (comma delimited words) it check to see if the words are also wrapped in  

   quotes in addition to a delimiter of "¦" and if so, it also stripps them. 

   This allows it to work more reliably with Endevor's CSV outpu, but it's   

   still not guaranteed against folks adding double.quote¦double.quote into  

   a text field.  Long term Endevor's CSV should probably more closely match 

   EXCEL's standard and or allow escape! when a delimiter is found in the data

                                                                             

   */                                                                        

DWords:procedure                                                             

   parse arg inword                                                          

   if length(inword) = 0 then return 0                                       

   select                       /* check for Quote or Apost delimiters   */  

     when left(inword,1) == '"' then TgtStr = '"¦"'                          

     when left(inword,1) == "'" then TgtStr = "'¦'"                          

     othersise                  then TgtStr =  "¦"                           

   end                                                                       

   lenTgt = length(TgtStr)      /* if extra delim's used extend comp len */  

   count = 1                                                                 

   do i = 1 by 1 to length(inword)                                           

     if substr(inword,i,lenTgt) == TgtStr then                               

        count = count + 1                                                    

   end                                                                       

   return count                                                              

                                                                             

DWord:procedure                                                              

   parse arg inword, incount                                                 

   sa= "inword:'" || inword || "' and count:" incount                        

   select                       /* check for Quote or Apost delimiters   */  

     when left(inword,1) == '"' then TgtStr = '"¦"'                          

     when left(inword,1) == "'" then TgtStr = "'¦'"                          

     othersise                  then TgtStr =  "¦"                           

   end                                                                       

   lenTgt = length(TgtStr)      /* if extra delim's used extend comp len */  

   if lenTgt > 1 then           /* Calculate an offset depending on lenth*/  

      if incount = 1 then offset = 1 /* the First word will only have one*/  

      else                offset = 2 /* otherwise we have to skip both   */  

   else                   offset = 0 /* but if no delim's then ignore    */  

   outword = ''                                                              

   if incount = 0,                                                           

    | incount > dwords(inword) then return outword                           

   count = 1                    /* start by skiping to end of prev word  */  

   do i = 1 by 1 while count < incount                                       

     if substr(inword,i,lenTgt) == TgtStr then                            

        count = count + 1                                                 

   end                                                                    

   do j = I+offset by 1 until j >= length(inword) /* start with offset   */

     thisChr = substr(inword,j,lenTgt)                                    

     if thisChr == TgtStr then leave              /* get out if next fnd */

     if j = length(inword) then                   /* or we find trailing */

        if substr(inword,j,1) == left(TgtStr,1) then leave /* delimiter  */

     outword = Outword||left(thischr,1)                                   

   end                                                                    

   return strip(outword)                                                  

 

Is this helpful?

Please share some of the things you are doing with CSV data.

Attachments

    Outcomes