AnsweredAssumed Answered

Problem Parsing CSV File in GEL Script

Question asked by alanbrobst on Mar 22, 2019
Latest reply on Mar 27, 2019 by gcubed

I'm a relative newcomer to GEL and not a Java guru, so I'm hoping someone out there can give me some advice.

 

My goal is to use GEL to read a CSV file (which is exported from our ERP system and stored on our SFTP server), then parse each row into individual columns, and then populate a staging table via XOG.

 

I have almost everything working exactly right except for one problem: the content of the CSV file is giving me headaches.  It's comma-delimited, but the only time quotation marks are used is when there is a comma in a field that is not a delimiter.  In those cases, the value is embedded in quotation marks.  For example:

 

PART_NUMBER,ITEM_DESCRIPTION,UNIT_COST,QTY_SHIPPED
3EU48C891,CIRCUIT BREAKER BOX,138.22,50
4FJ62D837,SWITCH PLATE,0.04,5000
5GL29E906,"BREAKER, CIRCUIT, 20A",6.94,200

 

Note that in the first 2 rows the value in the second column DOES NOT contain commas, so there are no quotation marks around the ITEM_DESCRIPTION field. In the third row, the value in the second column DOES contain commas, so there are quotation marks before and after the ITEM_DESCRIPTION value itself.

 

When I parse the row using files:readFile coded as follows:

 

<files:readFile fileName="${vFileName}" delimiter="," var="vPartData" embedded="false"/>

 

I get:

 

First Row: (OK)
Field 1 = 3EU48C891
Field 2 = CIRCUIT BREAKER BOX
Field 3 = 138.22
Field 4 = 50

 

Second Row: (OK)
Field 1 = 4FJ62D837
Field 2 = SWITCH PLATE
Field 3 = 0.04
Field 4 = 5000

 

Third Row: (NOT OK)
Field 1 = 5GL29E906
Field 2 = "BREAKER
Field 3 = CIRCUIT
Field 4 = 20A"

 

The fields:readFile doesn't recognize that the commas in the second field in the third row aren't delimiters, and it parses the third row incorrectly.  The ITEM_DESCRIPTION field becomes 3 separate columns because my command doesn't recognize that the 2 commas in that value aren't supposed to be delimiters.

 

Does anyone know the proper delimiter syntax so that all rows will be parsed correctly, even if there are non-delimiting commas in a row?  If you do, you will be my hero.

 

Thanks,

 

Alan Brobst

GE Transportation, a Wabtec Company

Outcomes