Ever had to work with CSV files? It stands for Comma Separated Values, and although the file format looks incredibly simple, it can be a pain to work with. An example of CSV data:
firstname,lastname,"lastname, firstname",age,description Paul,Klinkenberg,"Klinkenberg, Paul",31,"CFML junkie, father" Luce,Klinkenberg,"Klinkenberg, Luce",0,"Main actions during the day: - sleeping - drinking - crying - creating dirty diapers" "Emma ""my love""",Klinkenberg,"Klinkenberg, Emma ""my love""",30,"Perfect :-)"
Now how would you parse this data? Line by line probably, and then do a ListToArray(line, ","). But that would not work in this case, because you would get 5 items instead of 4. And you would get 8 rows, instead of 4. These caveats, and some other troubles, make sure that CSV files can be pretty hard to parse. But not anymore!!!
Thanks to Ben Nadel's extensive research into CSV parsing, and a bit of extra coding and tweaking on my part, you can now start using <cfcsv> in Railo!
Usage
The tag has 2 actions:
- Parse: can take a file path or a string, and returns the results as a query (default), or as an array of arrays.
- Create: takes a query, and returns a string of CSV data.
Attributes
Name | Used when | Required | Type | Default value | Description |
---|---|---|---|---|---|
Action | parse, create | yes | String | Must be one of the following: parse, create | |
Variable | parse, create | no | String | cfcsv | Variable name which will hold the returned value |
Textqualifier | parse, create | no | String, 1 character | " | The character with which string values are 'quoted'. For example, when a string contains the csv file delimiter, a line break, or the textqualifier itself, then the string will be quoted to indicate it is one string. |
Delimiter | parse, create | no | String, 1 character | , | The character used to delimit values. By default, this is a comma, but it can also be a TAB, a semicolon, or any other character. |
Output | parse | no | String | query | Must be one of the following: query, array. |
File | parse | no (either File or Data attribute must be given) | String | An absolute or relative path to a csv file. | |
Data | parse | no (either File or Data attribute must be given) | String | A string containing csv data. | |
Trimendoffile | parse | no | Boolean | true | Should we remove any line breaks at the end of the file. When set to false, and the last line ends in a line break, you will get an extra (empty) row of data. |
Hascolumnnames | parse | no | Boolean | true | Whether the data to parse has a first line with column names. When true, a returned query will use these headers as query column names. Otherwise, the query will have column names "col1", col2, etcetera. This setting is only used for query output. |
TrimLines | parse | no | Boolean | false | Whether or not to remove all spaces and tabs at the start and end of every line in the File or Data. NEW: added in version 1.2, March 2015 |
CharSet | parse | no | String | When given, this is the charset to read the File in. It has no effect on the Data attribute or the output. NEW: added in version 1.2, March 2015 |
|
Query | create | Yes, when action is create. No otherwise | Query | The query to create the csv data from. | |
IncludeColumnNames | create | no | Boolean | true | Must we include the query column names as the first line of the returned csv data. |
Examples
The output for these examples can be viewed here.
<!--- Parse a csv file, and return the resulting query into 'variables.myQuery'. Also note the relative file path usage! ---> <cfcsv action="parse" file="testdata.csv" variable="myQuery" delimiter=";" /> <cfdump eval=myQuery /> <!--- some test data, ending with extra line breaks ---> <cfset myData = "test,test2 data 1,data 2 'quoted here, with a line break',second col " /> <!--- Parse a string of CSV data. The text qualifier is a single quote ---> <cfcsv action="parse" data="#myData#" variable="parsedDataQuery" textqualifier="'" /> <cfdump eval=parsedDataQuery /> <!--- Create CSV data from a query, returning it as 'variables.csvData' ---> <cfcsv action="create" query="#parsedDataQuery#" variable="csvData" /> <pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre> <!--- Create CSV data from a query, without a first line of headers (query column names) ---> <cfcsv action="create" query="#parsedDataQuery#" variable="csvData" includeColumnNames=false /> <pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre> <!--- Create CSV data from a query, using non-default delimiter and text-qualifier ---> <cfcsv action="create" query="#parsedDataQuery#" variable="csvData" delimiter="|" textqualifier="'" /> <pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre> <!--- Parse a csv file, returning it as an array of arrays ---> <cfcsv action="parse" file="testdata.csv" variable="myArray" output="array" delimiter=";" /> <cfdump eval=myArray /> <!--- Return results as an array of arrays, and not removing optional trailing line breaks. This will result in a few extra empty arrays, because every line will be seen as a row of data. ---> <cfcsv action="parse" file="testdata.csv" variable="myArray" output="array" trimendoffile=false delimiter=";" /> <cfdump eval=myArray /> <!--- Not removing optional trailing line breaks. This will result in a few extra query rows, because every line will be seen as a row of data. ---> <cfcsv action="parse" file="testdata.csv" variable="myQuery" output="query" trimendoffile=false delimiter=";" /> <cfdump eval=myQuery /> <!--- Do not use first line as headers. This will give the query column names called 'col1', 'col2', etc. ---> <cfcsv action="parse" file="testdata.csv" variable="myQuery" hascolumnnames=false delimiter=";" /> <cfdump eval=myQuery />
Change log
Version 1.1, sept. 22 2011: Changed the way the regex pattern is built, so we can also use tabs /spaces as delimiters.
Version 1.2, March 2015:
- Lucee support;
- optional parameter "trimlines" in Parse function;
- timestamps are now formatted as yyyy-mm-dd HH:mm:ss, and dates as yyyy-mm-dd, instead of the curly brace notation of {d '2010-01-02'}
- optional parameter "charset" in Parse function, to read the file with the correct character set
Installation
You can install the custom tag by using your Railo server admin.
Go to your Server admin (default is http://localhost/railo-context/admin/server.cfm), click on Extension > Applications, and install the CFCSV custom tag from the Railo Extension Store.
Lastly, you have to restart Railo to be able to use the tag.
The source code
You can view all source code on GitHub, at https://github.com/paulklinkenberg/lucee-cfcsv. It mainly consists of:
- cfcsv.cfc: the code that does all the hard work
- CSV.cfc: the Railo custom tag which invokes cfcsv.cfc
Thanks!
A big thank you goes out to Ben Nadel and Steven Levithan, who wrote the CSV parsing code. Ben never stops to amaze me with his incredible amount of blog posts and researches into all kinds of Coldfusion related subjects. You rock dude!
Questions? Complaints?
Leave a message; I appreciate it!
| Viewed 19038 times
#1 by Craig Kaminsky - February 4, 2011 at 5:21 PM
#2 by Matt Woodward - February 5, 2011 at 8:20 PM
This prompted me to want to take a moment to plug the CFML Conventional Wisdom list, which is a public list where any CFML developer can go to discuss new additions to CFML:
http://groups.google.com/group/cfml-conventional-wisdom
One of the awesome things we're starting to see happen now that we're almost three years into the open source revolution in CFML is more community contributions to the engines by way of patches to the source as well as extensions like this. And it's really fantastic that an addition can go from idea to implementation so quickly.
The only caveat I'd like to throw out there is we do run the risk of winding up with different implementations of a lot of features between Railo and OpenBD (and potentially Adobe CF if they decide to introduce some of the additions to the language that develop in the open source projects), which is why I'd recommend that people considering making additions to the engines open up discussion on the CFML Conventional Wisdom list to get feedback, as well as see if there's more or less an agreed upon standard that comes out of the discussions.
For example, OpenBD already has CSVRead() and CSVWrite() functions, but no tag, so it'll be interesting to see if the tag you came up with more or less matches how the functions in OpenBD work.
http://www.openbluedragon.org/manual/?/function/csvread
http://www.openbluedragon.org/manual/?/function/csvwrite
What's noteworthy is the fact that those functions started with a different name and some different functionality entirely, but after a discussion on the CFML Conventional Wisdom list things were adjusted:
http://groups.google.com/group/cfml-conventional-wisdom/browse_thread/thread/c8e8e6e3e7aa4974/a5caadc9cb7cdaf8#
Similarly with REEscape(), a discussion got started on the CFML Conventional Wisdom list for that as well, and there's already been some very good insight that will impact the end result that gets put into OpenBD (and I believe Peter Farrell already created the initial implementation):
http://groups.google.com/group/cfml-conventional-wisdom/browse_thread/thread/c12d45d68889352f
There were also been a host of IP-releated functions introduced to OpenBD a while that probably overlap quite a bit with CFDNS.
My point isn't to stifle anyone's motivation to add new tags or functions. Really I'm just making a modest plea to go to the CFML Conventional Wisdom list and propose your idea so the new functionality can be discussed and, potentially at least, a standard or convention can be agreed upon. Let's just keep the lines of communication open!
This is precisely the sort of issues we're going to be discussing heavily at OpenCF Summit, so hope to see a lot of you there.
http://www.opencfsummit.org
#3 by Michael Brennan-White - February 6, 2011 at 1:22 AM
I don't see this tag as an option to install though I was able to install the log analyzer.
#4 by Paul Klinkenberg - February 6, 2011 at 12:29 PM
Cheers, Paul
#5 by Paul Klinkenberg - February 6, 2011 at 1:53 PM
Thanks for pointing it out! Cheers, Paul
#6 by Ben Nadel - February 7, 2011 at 11:30 PM
#7 by Jean-Marc Bideaud - March 11, 2011 at 5:47 PM
I just have a problem when parsing large csv files, I get a Java heap space error message : java.lang.OutOfMemoryError.
I understand I should adjust my Java memory settings but don't know how to do this : I just installed Windows Railo 3.2.1 Tomcat version from getRailo.org. It's running as 'Apache Tomcat Railo' service, I don't know how to modify the memory environment.
Could you please explain me how to do this or direct me to a documentation fit for my problem ? thanks, Jean-Marc
#8 by Jean-Marc Bideaud - March 11, 2011 at 6:15 PM
Paul I just found a way to increase JvmMx in Tomcat Windows registry and I am now able to parse a 12Mo csv file in a query of 110 000 records ! it's really nice for me, thanks for your work.
#9 by Paul Klinkenberg - March 11, 2011 at 9:52 PM
Have a nice weekend! Paul
#10 by Thomas Alexander - July 6, 2011 at 7:46 AM
I'm bit new to coldfusion.
I was trying to parse a csv that i'm getting as an API response. For certain fields values i'm having extra ", characters, so the parser splits the csv from there also
This is the format i'm having , can you please check if it could be parsed.
"1","1","1297794014","11","1","0","0","0","0","Tom Alex","","","test@test.com","test","testekrjsktj","system \"ns1.exmaple.com\", has identified"
"4","1","1298054610","21","2","0","5","1","0","Test2 User <>","","","Tom Alex ","Re: [#1] test","jdhfksjdhfkjsdhf",""
"8","1","1298072657","23","2","1","5","7","0","Test 3 <>","","","","","Hi guys",""
#11 by Arthur Blake - September 16, 2011 at 8:59 PM
<cfif arguments.delimiter EQ chr(9)>
<cfset local.escapedDelimiter = "\t" />
<cfelse>
<cfset local.escapedDelimiter = regExSafe(arguments.delimiter) />
</cfif>
Then it works.
Also, I can't seem to use it with local scoped variables (or indeed variables with any scope) in the variable parameter, only unscoped variables seem to work... Not sure how to fix that one. Any ideas?
#12 by Paul Klinkenberg - September 22, 2011 at 11:59 PM
It was written with the "verbose flag" on, which makes sure that whitespace in the regex is ignored. Normally no problem, except for when you're using a tab (or space) as delimiter. I uploaded version 1.1.3 now, which is installable from the Railo extension store.
Regarding the unscoped variables: that's not working atm. If you reaally want it, the code is opensource :)
Cheers, Paul
#13 by Arthur Blake - October 4, 2011 at 3:30 PM
Also, I cannot seem to use a command line client to access your SVN repo. Is there an anonymous log in for view only access, that I could use?
#14 by Paul Klinkenberg - October 4, 2011 at 10:47 PM
I want to move the svn projects to github, so other people can collaborate. But for now, I made the repos read-only for anonymous connections :-)
Cheers, Paul
#15 by Arthur Blake - October 4, 2011 at 10:55 PM
#16 by Brook - October 11, 2011 at 1:09 AM
<cfargument name="startRow" type="numeric" default="1" hint="Row to start reading at" />
<cfargument name="maxRows" type="numeric" default="0" hint="Maximum rows to read from the start row. 0 for all" />
And then updated this code:
<cfif len( arguments.file )>
<!--- Read the file into Data. --->
<cfset arguments.csv = fileRead( arguments.file ) />
</cfif>
to:
<cfif len( arguments.file )>
<cfif arguments.startRow is not 1 or arguments.maxRows is not 0>
<!--- create the buffer so we can append one line at a time --->
<cfset arguments.csv = createObject("java","java.lang.StringBuffer").Init()>
<cfif arguments.maxRows is not 0>
<cfset arguments.maxRows = arguments.maxRows>
<cfelse>
<cfset arguments.maxRows=0>
</cfif>
<!--- Read the file into Data. --->
<cfloop file="#arguments.file#" index="local.line" from="#arguments.startRow#" to="#arguments.maxRows#">
<cfset arguments.csv.append(local.line & chr(13) & chr(10))>
</cfloop>
<cfelse>
<cfset arguments.csv = fileRead( arguments.file ) />
</cfif>
</cfif>
#17 by Dino Edwards - February 1, 2012 at 7:19 PM
#18 by Paul Klinkenberg - February 1, 2012 at 8:04 PM
Good luck, Paul
#19 by Dino Edwards - February 1, 2012 at 9:26 PM
#20 by Hendrik Kramer - March 22, 2012 at 11:23 AM
<cfset data = fileRead( "test.csv", "utf-8" ) />
<cfcsv action="parse" data="#data#" variable="q" hascolumnnames=true delimiter="," />
but not provide a charset in the cfcsv tag itself.
<cfcsv action="parse" file="test.csv" variable="q" hascolumnnames=true delimiter="," />
Of course, I could set the default encoding in the railo administrator, but I have to work with CSV files in different encodings (Shift_JIS, ISO-8859-1, UTF-8 and more).
Would be great!
#21 by Paul Klinkenberg - March 22, 2012 at 11:58 AM
Paul
#22 by Colorado Techie - August 21, 2012 at 6:17 PM
#23 by Colorado Techie - August 21, 2012 at 6:58 PM
#24 by Paul Klinkenberg - August 21, 2012 at 8:50 PM
It is a nice idea to be able to have an array of arrays as input, next to the current query input. I will put it on one of my todo lists...
It would be fairly easy though, you could use something like:
<cfset delimiter = "," />
<cfsavecontent variable="csv"><cfloop array="#myArray#" index="arr"><cfloop array="#arr#" index="item"><cfif find(delimiter, item) or find('"', item) or find(chr(10), item)>"#replace(item, '"', '""', 'all')#"<cfelse>#item#</cfif>#delimiter#</cfloop>#chr(65535)#</cfloop></cfsavecontent>
<cfset csv = trim(replace(csv, "#delimiter##chr(65535)#", chr(10), "all")) />
Hope this helps :) Paul
#25 by Colorado Techie - March 9, 2013 at 6:46 AM
I got the Array of Arrays to save back to CSV (using StringBuffer: http://www.bennadel.com/blog/305-ColdFusion-CFFile-vs-Java-java-io-BufferedOutputStream.htm ).
Unfortunately, I'm having a problem with the cfcsv tag now... My file has 91,000 rows in it (it is a 32 MB text file) and cfcsv truncates the data at about 61,000 rows.
I ran into a similar problem using my own code (with StringBuffer) and I ended up writing out my StringBuffer data to the file in increments of 10,000 rows and that fixed the problem.
Unfortunately, I'm not sure what to do about the cfcsv issue. I'm not getting any OutOfMemory errors nor any stack traces... I just try to read in the file, then do ArrayLen(arrRows) and it outputs 61,488 instead of 91,509 (which is how many rows my file has).
Any ideas?
#26 by Colorado Techie - March 9, 2013 at 8:47 AM
So, anyone having file truncation issues should make sure that their data is formed correctly.
(I had the following line:
element1|element2|ele "ment" 3|element4
if you use a double quote as a qualifier, the third element needs the qualifiers to be escaped AND the entire element needs to be wrapped in the qualifier.
Bleh. Anyhow, thanks again for this function, I use it all the time!)
#27 by Paul Klinkenberg - March 9, 2013 at 3:20 PM
#28 by Brandon Culpepper - March 14, 2013 at 9:50 PM
Is cfcsv no longer available to be installed in this manor?
Thanks for the help!
Brandon
#29 by Brandon Culpepper - March 14, 2013 at 10:37 PM
Still curious as to why it isn't showing up in the extensions store though.
Thanks again!
Brandon
#30 by Paul Klinkenberg - March 14, 2013 at 10:40 PM
It is not showing up in the server admin anymore, because the backend Mura plugin we use in the extension store, only has 2 options for install type: Web or Server. it should have a third option: Both.
Now every time I update a plugin in the backend, the install type "Both" gets reset to Web :-( Still need to fix that.
The tag is available from the web admin though.
Cheers, Paul
#31 by Colby Litnak - August 1, 2013 at 6:17 PM
cfcsv works great... except I think one of the engineers I work with found a bug in it. It looks like if there is some tabs after a quoted row while parsing the csv file, it will stop processing.
take the following csv, for example (replace the [tab] with the actual control character):
col1,col2,col3
"1val1","1val2","1val3"[tab][tab]
"2val1","2val2","2val3"
It returns a query with just the first row and the 3rd value omitted.
The engineer made a slight change to the regex in cfcsv.cfc to grab it and continue parsing, but it still leaves the third value blank. In any case, I thought you should know.
Here is the new line 86 of cfcsv.cfc
<cfsavecontent variable="local.regEx"><cfoutput>\G(?:#local.escapedTextqualifier#([^#local.escapedTextqualifier#]*+(?>#local.escapedTextqualifier##local.escapedTextqualifier#[^#local.escapedTextqualifier#]*+)*)#local.escapedTextqualifier#|([^#local.escapedTextqualifier##local.escapedDelimiter#\r\n]*+))(#local.escapedDelimiter#|\r\n?|[\t]*+\r\n|\n|$)</cfoutput></cfsavecontent>
We might play around with it some more to get it to work a little bit better.
#32 by Paul Klinkenberg - August 1, 2013 at 8:59 PM
You could remove the extra tabs before passing it to cfcsv: [cfset csvdata = rereplace(csvdata, '\t+([\r\n]|$)', '\1', 'all') /], or make sure there aren't any tabs at the end of a line.
In the MS Access world we live in (assuming that's where it came from), it seems regular to have extra tabs, but it just isn't valid csv if Access outputs it as "string"[tab][tab]. It should have outputted it as "string",[tab],[tab]
Anyway, in a future version I might optionally trim all lines, as in [cfcsv trimlines="true" .../]
Cheers, Paul
#33 by Duncan - November 26, 2014 at 7:31 AM
Excellent tag. We download a lot of date formatted stuff, and this seems to get outputted from the database as a timestamp, that Excel cant deal with. e.g. {ts '2014-02-03 00:00:00'}
Sure, you can make a formula and run it down the columns, but its man hours that are scarce. Is there an option to change this formatting in the tag (without bashing around your tag?)
Thanks!
#34 by Tommy - January 10, 2015 at 6:55 AM
Thanks a million for the cfcsv tag. Installation was a snap and it was just what the doctor ordered!
#35 by ccsimmons - February 17, 2015 at 5:10 AM
First off, thanks for this GREAT tag. Just wondering if there will be a Lucee compatible version of the cfcsv tag. Installation gave me the following error:
Message column [price] does not exist
Detail columns are [type,id,name,label,description,version,category,image,download,paypal,author,codename,video,support,documentation,forum,mailinglist,network,created,provider,info,uid]
Stacktrace The Error Occurred in
/context/admin/extension.applications.upload.cfm: line 123
called from /context/admin/extension.applications.cfm: line 98
called from /context/admin/web.cfm: line 389
called from /context/admin/server.cfm: line 2
#36 by DMan - March 3, 2015 at 1:33 PM
this doesn't work with the latest Lucee, db error. Is this trivial/hard to port to Lucee?
Thanks.
#37 by Paul Klinkenberg - March 5, 2015 at 12:23 AM
There is a new version out just now!
* Lucee support;
* optional parameter "trimlines" in Parse function;
* timestamps are now formatted as yyyy-mm-dd HH:mm:ss, and dates as yyyy-mm-dd, instead of the curly brace notation of {d '2010-01-02'}
* optional parameter "charset" in Parse function, to read the file with the correct character set
#38 by Paul Klinkenberg - March 5, 2015 at 12:54 AM
After that, the cfcsv extension should be listed in the Applications page.
#39 by Fred B - July 4, 2016 at 4:42 PM
How do I install it in Lucee 5?
#40 by Hugh - August 19, 2016 at 12:18 AM
___________________________________________________________________________
The file could not be uploaded. The error:
Only .lex is allowed as extension!
_____________________________________________________________________________
Not sure if this is a known issue, but if you had any suggestions it would be nice.
#41 by Paul Klinkenberg - August 19, 2016 at 11:26 PM
You could try changing the extension of the zip file to *.lex instead, and then try uploading again.
The cfcsv extension was added to the extension store in the lucee admin in march 2015 already (https://luceeserver.atlassian.net/browse/LDEV-207), but disappeared there meanwhile, for unknown reasons.
Kind regards, Paul Klinkenberg
#42 by Jonathan Smith - September 6, 2016 at 4:26 PM
Thanks for creating this tag! Super helpful. I also couldn't find the extension in the Lucee extensions list, and couldn't manually install a zip, or a .lex of this project from the Lucee Admin.
http://preview.lucee.org/download/?type=extensions
What I ended up doing was locating the tag extension folder via:
writeDump(expandPath("{lucee-server}/library/tag"));
Then, copying your csv.cfc directly into this folder.