
Commit b0bae2a

add csvjson
1 parent 3e20822

14 files changed: +603 / -0 lines

README.md

+1
@@ -5,6 +5,7 @@ Gem Family
 [**csvreader**](csvreader) - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)

+[csvjson](csvjson) - read tabular data in the CSV <3 JSON format, that is, comma-separated values (CSV) line-by-line records with JavaScript Object Notation (JSON) encoding rules

csvjson/.gitignore

+12
@@ -0,0 +1,12 @@
#######################
# ignore ruby rake generated folders

/pkg/
/doc/


################
# ignore (top-level) datapackage folders

/pack/
/.pack/

csvjson/HISTORY.md

+3
@@ -0,0 +1,3 @@
### 0.0.1 / 2018-10-14

* Everything is new. First release

csvjson/Manifest.txt

+13
@@ -0,0 +1,13 @@
HISTORY.md
LICENSE.md
Manifest.txt
README.md
Rakefile
datasets/hello.json.csv
datasets/hello11.json.csv
lib/csvjson.rb
lib/csvjson/parser.rb
lib/csvjson/version.rb
test/helper.rb
test/test_parser.rb
test/test_parser_misc.rb

csvjson/README.md

+156
@@ -0,0 +1,156 @@
# CSV <3 JSON Parser / Reader

csvjson library / gem - read tabular data in the CSV <3 JSON format, that is, comma-separated values (CSV) line-by-line records with JavaScript Object Notation (JSON) encoding rules

* home :: [github.com/csvreader/csvjson](https://github.com/csvreader/csvjson)
* bugs :: [github.com/csvreader/csvjson/issues](https://github.com/csvreader/csvjson/issues)
* gem :: [rubygems.org/gems/csvjson](https://rubygems.org/gems/csvjson)
* rdoc :: [rubydoc.info/gems/csvjson](http://rubydoc.info/gems/csvjson)
* forum :: [wwwmake](http://groups.google.com/group/wwwmake)


## What's CSV <3 JSON?

CSV <3 JSON is a Comma-Separated Values (CSV)
variant / format / dialect
where the line-by-line records follow the
JavaScript Object Notation (JSON) encoding rules.
It's a modern (simple) tabular data format that
includes arrays, numbers, booleans, nulls, nested structures, comments and more.
Example:

```
# "Vanilla" CSV <3 JSON

1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false
```

or

```
# CSV <3 JSON with array values

1,"directions",["north","south","east","west"]
2,"colors",["red","green","blue"]
3,"drinks",["soda","water","tea","coffee"]
4,"spells",[]
```

For more see the [official CSV <3 JSON Format documentation »](https://github.com/csvspecs/csv-json)
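
The core idea fits in a few lines of plain Ruby: wrap a record line in square brackets and it becomes a JSON array that the standard `json` library can decode, including `null`, booleans, nested arrays and commas inside quoted strings. This is a minimal sketch of the technique, not the gem's code; `parse_record` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): decode one CSV <3 JSON record line.
# Wrapping the line in [...] turns the comma-separated values into a JSON array,
# so JSON.parse decodes every value with the usual JSON rules.
def parse_record( line )
  JSON.parse( "[#{line}]" )
end

p parse_record( '2,"Bob",null,false' )
# => [2, "Bob", nil, false]
p parse_record( '1,"directions",["north","south","east","west"]' )
# => [1, "directions", ["north", "south", "east", "west"]]
```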

## Usage

``` ruby
require 'csvjson'

txt = <<TXT
# "Vanilla" CSV <3 JSON

1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false
TXT

records = CsvJson.parse( txt )     ## or CSV_JSON.parse or CSVJSON.parse
pp records
# => [[1,"John","12 Totem Rd. Aspen",true],
#     [2,"Bob",nil,false],
#     [3,"Sue","Bigsby, 345 Carnival, WA 23009",false]]

# -or-

records = CsvJson.read( "values.json.csv" )     ## or CSV_JSON.read or CSVJSON.read
pp records
# => [[1,"John","12 Totem Rd. Aspen",true],
#     [2,"Bob",nil,false],
#     [3,"Sue","Bigsby, 345 Carnival, WA 23009",false]]

# -or-

CsvJson.foreach( "values.json.csv" ) do |rec|    ## or CSV_JSON.foreach or CSVJSON.foreach
  pp rec
end
# => [1,"John","12 Totem Rd. Aspen",true]
# => [2,"Bob",nil,false]
# => [3,"Sue","Bigsby, 345 Carnival, WA 23009",false]
```
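
To see the difference between slurping (`parse`, `read`) and streaming (`foreach`) without installing the gem, here is a minimal sketch of a reader that works on anything responding to `each_line` (a `String` or an `IO` such as `StringIO`). The module `MiniCsvJson` and its methods are hypothetical stand-ins that mirror the approach described above (skip blank and `#` comment lines, decode the rest with JSON); they are not the gem's API.

```ruby
require 'json'
require 'pp'
require 'stringio'

# Hypothetical stand-in for the gem (not its API): works on anything that
# responds to each_line, that is, a String or an IO such as File or StringIO.
module MiniCsvJson
  # Stream records one at a time; returns an Enumerator if no block is given.
  def self.each_record( input )
    return enum_for( :each_record, input ) unless block_given?

    input.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?( '#' )   # skip blank and comment lines
      yield JSON.parse( "[#{line}]" )                  # decode one record
    end
  end

  # Slurp all records into an array (like parse / read without a block).
  def self.parse( input )
    each_record( input ).to_a
  end
end

txt = <<TXT
# "Vanilla" CSV <3 JSON

1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
TXT

pp MiniCsvJson.parse( txt )

# Streaming from an IO handles one line at a time instead of building the full array first.
MiniCsvJson.each_record( StringIO.new( txt ) ) { |rec| pp rec }
```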

### What about Enumerable?

Yes, the reader / parser includes `Enumerable` and is built on `each`.
Create a reader with `new` (from a string or an IO) or with `open` (from a file, called without a block),
then call `to_enum` (or `each` without a block) to get an enumerator (external iterator).
Example:

``` ruby
csv = CsvJson.new( "1,2,3" )     ## or CSV_JSON.new or CSVJSON.new
it  = csv.to_enum
pp it.next
# => [1,2,3]

# -or-

csv = CsvJson.open( "values.json.csv" )     ## or CSV_JSON.open or CSVJSON.open
it  = csv.to_enum
pp it.next
# => [1,"John","12 Totem Rd. Aspen",true]
pp it.next
# => [2,"Bob",nil,false]
csv.close     ## note: without a block, open leaves the file open; close it yourself
```
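
The pattern behind this is plain Ruby: define `each`, return an enumerator when no block is given, and `include Enumerable` to get `map`, `select`, `first` and friends. A minimal sketch, assuming a hypothetical `TinyReader` class rather than the gem's own:

```ruby
require 'json'

# Hypothetical minimal reader (not the gem's class): each yields one decoded
# record per non-blank, non-comment line; without a block it returns an Enumerator.
class TinyReader
  include Enumerable

  def initialize( input )
    @input = input    # a String or an IO; both respond to each_line
  end

  def each
    return to_enum( :each ) unless block_given?

    @input.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?( '#' )   # skip blank and comment lines
      yield JSON.parse( "[#{line}]" )
    end
  end
end

reader = TinyReader.new( "1,2,3\n4,5,6\n" )

it = reader.to_enum
p it.next     # => [1, 2, 3]
p it.next     # => [4, 5, 6]
begin
  it.next     # no records left
rescue StopIteration
  puts "no more records"
end

# Enumerable methods come for free once each is defined.
p reader.map( &:sum )                    # => [6, 15]
p reader.select { |rec| rec.first > 1 }  # => [[4, 5, 6]]
```

`Enumerator#next` raises `StopIteration` once the records run out, which is how an external iterator signals the end.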

### What about headers?

Yes, you can. If the first line is a header (or, if the header line is missing, you pass in the headers
as an array) and you want your records as hashes instead of arrays,
use `CsvHash` from the csvreader library / gem.
Example:

``` ruby
require 'csvreader'

txt = <<TXT
"id","name","address","regular"
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false
TXT

records = CsvHash.json.parse( txt )
pp records

# => [{"id"=>1,
#      "name"=>"John",
#      "address"=>"12 Totem Rd. Aspen",
#      "regular"=>true},
#     {"id"=>2,
#      "name"=>"Bob",
#      "address"=>nil,
#      "regular"=>false},
#     ...]
```

For more see the [official CsvHash documentation in the csvreader library / gem »](https://github.com/csvreader/csvreader)
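
If you would rather not pull in csvreader, the header handling can be sketched by hand: take the first record as the header row (or use headers passed in) and zip it with each following record. This is a hypothetical helper, `records_to_hashes`, not the `CsvHash` API, and it does no checking for rows whose length differs from the header.

```ruby
require 'json'
require 'pp'

# Hypothetical helper (not the csvreader CsvHash API): decode all records,
# use the first record as the header row unless headers are passed in,
# then zip the headers with each remaining record to build a hash.
def records_to_hashes( txt, headers: nil )
  rows = txt.each_line
            .map    { |line| line.strip }
            .reject { |line| line.empty? || line.start_with?( '#' ) }   # skip blank and comment lines
            .map    { |line| JSON.parse( "[#{line}]" ) }

  headers ||= rows.shift    # no headers passed in? take them from the first row
  rows.map { |row| headers.zip( row ).to_h }
end

txt = <<TXT
"id","name","address","regular"
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
TXT

pp records_to_hashes( txt )

# Header-less data with the headers passed in as an array:
pp records_to_hashes( %Q{1,"John"\n2,"Bob"\n}, headers: ["id", "name"] )
```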


## License

The `csvjson` scripts are dedicated to the public domain.
Use them as you please with no restrictions whatsoever.


## Questions? Comments?

Send them along to the [wwwmake forum](http://groups.google.com/group/wwwmake).
Thanks!

csvjson/Rakefile

+29
@@ -0,0 +1,29 @@
require 'hoe'
require './lib/csvjson/version.rb'

Hoe.spec 'csvjson' do

  self.version = CsvJson::VERSION

  self.summary = "csvjson - read tabular data in the CSV <3 JSON format, that is, comma-separated values (CSV) line-by-line records with JavaScript Object Notation (JSON) encoding rules"
  self.description = summary

  self.urls    = ['https://github.com/csvreader/csvjson']

  self.author  = 'Gerald Bauer'
  self.email   = '[email protected]'

  # switch extension to .markdown for GitHub formatting
  self.readme_file  = 'README.md'
  self.history_file = 'HISTORY.md'

  self.extra_deps = [
  ]

  self.licenses = ['Public Domain']

  self.spec_extras = {
    required_ruby_version: '>= 2.2.2'
  }

end

csvjson/datasets/hello.json.csv

+3
@@ -0,0 +1,3 @@
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false

csvjson/datasets/hello11.json.csv

+5
@@ -0,0 +1,5 @@
# hello world

1, "John", "12 Totem Rd. Aspen", true
2, "Bob", null, false
3, "Sue", "Bigsby, 345 Carnival, WA 23009", false
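
The hello11.json.csv dataset differs from hello.json.csv in three ways: a `# hello world` comment line, a blank line, and a space after each comma. The reader skips the first two before decoding (blank lines and lines starting with `#`); the spaces need no special handling because JSON allows whitespace around values. A quick check with the standard `json` library, using the record lines copied from both datasets (the `decode` lambda is a hypothetical helper for illustration):

```ruby
require 'json'

# Record lines as they appear in hello.json.csv (no spaces) and in
# hello11.json.csv (a space after each comma).
hello   = [ '1,"John","12 Totem Rd. Aspen",true',    '2,"Bob",null,false' ]
hello11 = [ '1, "John", "12 Totem Rd. Aspen", true', '2, "Bob", null, false' ]

# JSON allows whitespace around values, so wrapping each line in [...]
# decodes both variants to the same records.
decode = ->( lines ) { lines.map { |line| JSON.parse( "[#{line}]" ) } }

p decode.( hello ) == decode.( hello11 )   # => true
p decode.( hello11 ).last                  # => [2, "Bob", nil, false]
```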

csvjson/lib/csvjson.rb

+22
@@ -0,0 +1,22 @@
# encoding: utf-8

require 'pp'
require 'json'
require 'logger'


## our own code
## todo/check: use require_relative - why? why not?
require 'csvjson/version'    # note: let version always go first
require 'csvjson/parser'


## add some "alternative" shortcut aliases
CSV_JSON = CsvJson
CSVJSON  = CsvJson
CSVJ     = CsvJson
CsvJ     = CsvJson


# say hello
puts CsvJson.banner if $DEBUG || (defined?($RUBYCOCO_DEBUG) && $RUBYCOCO_DEBUG)
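
The shortcut aliases are plain constant assignments, so they do not copy the class: every alias names the same class object, and class methods, `is_a?` checks and so on behave identically through any of them. A minimal sketch with a stand-in `CsvJson` class (the `hello` method is hypothetical, for illustration only):

```ruby
# Stand-in for the gem's class; the hello method is hypothetical.
class CsvJson
  def self.hello
    "hello from #{name}"    # Module#name: the constant the class was first defined under
  end
end

# Assigning a class to another constant does not copy it;
# every alias refers to the very same class object.
CSV_JSON = CsvJson
CSVJSON  = CsvJson
CSVJ     = CsvJson
CsvJ     = CsvJson

p CSV_JSON.equal?( CsvJson )   # => true
p CsvJ.hello                   # => "hello from CsvJson"
```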

csvjson/lib/csvjson/parser.rb

+131
@@ -0,0 +1,131 @@
# encoding: utf-8


class CsvJson

  ###################################
  ## add simple logger with debug flag/switch
  #
  #  use CsvJson.logger.level = Logger::DEBUG   # to turn on
  #
  # todo/fix: use logutils instead of std logger - why? why not?

  def self.build_logger()
    l = Logger.new( STDOUT )
    l.level = Logger::INFO    ## set to info on start; note: is 0 (debug) by default
    l
  end
  def self.logger() @@logger ||= build_logger; end
  def logger()  self.class.logger; end



  def self.open( path, mode=nil, &block )   ## rename path to filename or name - why? why not?

    ## note: default mode (if nil/not passed in) to 'r:bom|utf-8'
    f = File.open( path, mode || 'r:bom|utf-8' )
    csv = new( f )

    # handle blocks like Ruby's open()
    if block_given?
      begin
        block.call( csv )
      ensure
        csv.close
      end
    else
      csv
    end
  end # method self.open


  def self.read( path )
    open( path ) { |csv| csv.read }
  end


  def self.foreach( path, &block )
    csv = open( path )

    if block_given?
      begin
        csv.each( &block )
      ensure
        csv.close
      end
    else
      csv.to_enum    ## note: caller (responsible) must close file!!!
      ## remove version without block given - why? why not?
      ## use Csv.open().to_enum or Csv.open().each
      ## or Csv.new( File.new() ).to_enum or Csv.new( File.new() ).each ???
    end
  end # method self.foreach


  def self.parse( data, &block )
    csv = new( data )

    if block_given?
      csv.each( &block )  ## note: caller (responsible) must close file!!! - add autoclose - why? why not?
    else  # slurp contents, if no block is given
      csv.read            ## note: caller (responsible) must close file!!! - add autoclose - why? why not?
    end
  end # method self.parse



  def initialize( data )
    @input = data    ## a String or an IO; both respond to each_line
  end



  include Enumerable

  def each( &block )
    if block_given?
      @input.each_line do |line|

        logger.debug "line:"             if logger.debug?
        logger.debug line.pretty_inspect if logger.debug?


        ## note: if the separator passed to chomp is an empty string ('')
        ##   it removes all trailing newlines from the string.
        ## use line.sub(/[\n\r]*$/, '') or similar instead - why? why not?
        line = line.chomp( '' )
        line = line.strip         ## strip leading and trailing whitespaces (space/tab) too
        logger.debug line.pretty_inspect    if logger.debug?

        if line.empty?             ## skip blank lines
          logger.debug "skip blank line"    if logger.debug?
          next
        end

        if line.start_with?( "#" )  ## skip comment lines
          logger.debug "skip comment line"  if logger.debug?
          next
        end

        ## note: auto-wrap in array e.g. with []
        json = JSON.parse( "[#{line}]" )
        logger.debug json.pretty_inspect   if logger.debug?
        block.call( json )
      end
    else
      to_enum
    end
  end # method each

  def read() to_a; end # method read

  def close
    @input.close  if @input.respond_to?(:close)   ## note: string needs no close
  end

end # class CsvJson
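
`CsvJson.open` follows the same block idiom as Ruby's `File.open`: with a block, the reader is yielded and closed in an `ensure` clause even if the block raises, and the block's value is returned; without a block, the caller gets the open reader and must close it. A minimal sketch of that idiom with a hypothetical `LineReader` class (not part of the gem), using a temporary file so it runs on its own:

```ruby
require 'tempfile'

# Hypothetical minimal reader (not the gem's class) showing the open idiom:
# with a block, yield the reader, close it in ensure and return the block's value;
# without a block, return the open reader (the caller must close it).
class LineReader
  def self.open( path, mode = nil )
    reader = new( File.open( path, mode || 'r:bom|utf-8' ) )
    return reader unless block_given?

    begin
      yield reader
    ensure
      reader.close    # runs even if the block raises
    end
  end

  def initialize( io )
    @io = io
  end

  def lines
    @io.each_line.map( &:chomp )
  end

  def close
    @io.close
  end

  def closed?
    @io.closed?
  end
end

Tempfile.create( [ 'hello', '.json.csv' ] ) do |f|
  f.write( %Q{1,"John"\n2,"Bob"\n} )
  f.flush

  used   = nil
  result = LineReader.open( f.path ) { |r| used = r; r.lines }
  p result
  p used.closed?   # => true
end
```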
