How to publish CSV on the Web, or
why standards are important
ODI Friday lecture, 26.11.2015
J. Umbrich, S. Neumaier
What is CSV? (see RFC 4180)
RFC 4180: https://tools.ietf.org/html/rfc4180
COMMA-SEPARATED VALUES
1. Each record is separated by a line break
2. The last record may or may not have an ending line break
3. There might be an optional header line
4. Within the header and each record, there may be one or more fields, separated by commas
5. Each field may or may not be enclosed in double quotes
6. Fields containing line breaks, double quotes, and commas should be enclosed in double quotes
7. If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote
8. File extension: .csv, MIME type: text/csv
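The quoting and escaping rules above can be tried out with Python's standard csv module. This is a minimal sketch, not a reference implementation: the helper names write_rfc4180 and read_rfc4180 are made up for illustration, the row with id 32 is taken from the exhibition example in this talk, and the row with id 99 is a hypothetical one added to show a line break inside a field.

```python
import csv
import io


def write_rfc4180(rows):
    """Serialise rows with commas, CRLF line breaks and "" escaping."""
    buffer = io.StringIO(newline="")  # newline="" keeps "\r\n" untouched
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        doublequote=True,           # rule 7: " inside a field becomes ""
        quoting=csv.QUOTE_MINIMAL,  # rule 6: quote only when needed
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


def read_rfc4180(text):
    """Parse RFC 4180 style text back into a list of rows."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",",
                        quotechar='"', doublequote=True)
    return list(reader)


rows = [
    ["exhibition_id", "title", "location"],
    ["32", '"Kunstwelten im Dialog"', "Museum Ludwig, Köln"],
    ["99", "two\nlines", "plain"],  # hypothetical row with a line break
]
text = write_rfc4180(rows)
round_trip = read_rfc4180(text)
```

Only the fields that contain a comma, a double quote, or a line break end up quoted, and reading the text back yields the original rows.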
On the Web, “CSV” more often means Character-Separated Values!
Most CSV parsers cater for this by using heuristics to identify so-called CSV dialects.
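One such heuristic ships with Python: csv.Sniffer guesses the delimiter and quote character from a text sample. The sketch below is only an illustration; the sample data is hypothetical (semicolon-separated with decimal commas, similar to Austrian budget files), and restricting the candidates to comma, semicolon and tab is an assumption.

```python
import csv
import io


def detect_dialect(sample, candidates=",;\t"):
    """Guess the dialect of a text sample, limited to the candidate delimiters."""
    return csv.Sniffer().sniff(sample, delimiters=candidates)


# Hypothetical sample: semicolon-separated, with decimal commas in the values.
sample = "id;city;amount\n1;Wien;8.038.400,00\n2;Graz;12.800.000,00\n"
dialect = detect_dialect(sample)
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
```

Heuristics like this can fail on short or irregular samples, which is one reason why an explicit dialect description is preferable to guessing.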
“CSV” in the wild
separator: , ; \t
line ending: \n \r \r\n
quote chars: " '
“CSV” on data.gv.at: MIME types and CSV dialects
Analysing “CSV” from data.gv.at
CSV-related files: 1809
Parsable CSV files: 1482
Detected a header: 1294
Delimiter:
  ';'   1471
  ','   9
  None  2
Number of comment lines:
  0   1376
  1   39
  >1  67
Example 1
Beilage zum Rechnungsabschluss 2013;;;Nr. 4b;
;;;;
;;;;
NACHWEIS ÜBER DEN SCHULDENDIENST;;;;
;;;;
;;;;
Laut Voranschlag 2013 (inkl. Umbuchungen im Laufe des Jahres) waren für die Abwicklung des Schuldendienstes vorgesehen:;;;;
;;;;
 für Verzinsung;;;8.038.400,00;€
 Zinsreserve gesamt ;;;12.800.000,00;€
 für Tilgung;;;29.915.700,00;€
 Gesamtvoranschlag;;;50.754.100,00;€
;;;;
Diesem Kredit steht die Jahresvorschreibung von;;;49.665.135,01;€
gegenüber, sodass sich beim gesamten Schuldendienst;;;;
eine E i n s p a r u n g von ;;;1.088.964,99; €
ergibt. ;;;;
;;;;
• ; as separator
• Not a well-formed table
• Empty lines, empty cells
• Headers for columns?
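The problems listed for Example 1 can be spotted mechanically. The following is a rough sketch under stated assumptions: profile_table is a hypothetical helper, the excerpt is a shortened version of Example 1, its line breaks are assumed to fall after every fourth separator, and the currency sign is assumed to be €.

```python
import csv
import io


def profile_table(text, delimiter=";"):
    """Count blank rows, empty cells and distinct row widths in a CSV text."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    cells = [cell for row in rows for cell in row]
    return {
        "rows": len(rows),
        "blank_rows": sum(1 for row in rows if not any(c.strip() for c in row)),
        "total_cells": len(cells),
        "empty_cells": sum(1 for cell in cells if not cell.strip()),
        "row_widths": sorted({len(row) for row in rows}),
    }


# Shortened excerpt of Example 1; the line breaks are an assumption.
excerpt = (
    "Beilage zum Rechnungsabschluss 2013;;;Nr. 4b;\n"
    ";;;;\n"
    "NACHWEIS ÜBER DEN SCHULDENDIENST;;;;\n"
    " für Verzinsung;;;8.038.400,00;€\n"
)
report = profile_table(excerpt)
```

A high share of empty cells together with blank rows suggests a spreadsheet laid out for human readers rather than a data table with one header row.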
Example 4
>> curl -I http://www.wolfsberg.at/fileadmin/user_upload/Downloads/Haushalt2015.csv
HTTP/1.1 200 OK
Date: Fri, 27 Nov 2015 08:35:13 GMT
Server: Apache
Last-Modified: Fri, 20 Feb 2015 08:20:48 GMT
ETag: "2800c44-2b354-50f80bad05800"
Accept-Ranges: bytes
Content-Length: 176980
Vary: Accept-Encoding
Content-Type: text/plain
• HTTP response headers: the file is served as text/plain instead of text/csv
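A consumer can check the declared media type before parsing. The sketch below only inspects a Content-Type value that has already been fetched (the HTTP request itself is left out); parse_content_type and is_served_as_csv are hypothetical helper names, and the parsing relies on the standard library's email.message.Message.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into media type and parameters."""
    message = Message()
    message["Content-Type"] = value
    media_type = message.get_content_type()      # lower-cased, e.g. "text/csv"
    parameters = dict(message.get_params()[1:])  # skip the media type itself
    return media_type, parameters


def is_served_as_csv(value):
    """True if the header value declares the text/csv media type."""
    media_type, _ = parse_content_type(value)
    return media_type == "text/csv"


wolfsberg = is_served_as_csv("text/plain")
mumok_type, mumok_params = parse_content_type(
    "text/csv; charset=utf-8; header=present")
```

With the header shown for Example 4 the check fails, while a value such as text/csv; charset=utf-8; header=present also tells the client the encoding and that a header row is present.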
Example 6
exhibition_id,city,title,location,datefrom,dateuntil \n
3,Rom,I LOVE POP,"Chiostro del Bramante, Rom",1999-03-23,1999-07-25 \n
7,Krems,Zeitlos - Zur Kunstgeschichte der Zeit,Kunsthalle Krems,1999-05-30,1999-10-03 \n
8,"Wien, Österreich",School of London,Kunst Haus Wien,1999-05-11,1999-08-29 \n
10,Graz,Die Farben Schwarz,"Landesmuseum Joanneum, Graz",1999-05-28,1999-10-31 \n
11,Hamburg,Psyche und Kunst,Universitäts-Krankenhaus Eppendorf,1999-08-07,1999-10-17 \n
15,Graz,Michael Schuster,Galerie & Edition Artelier Graz,1999-06-01,1999-09-30 \n
17,Prag,Crossings II,Rudolfinum Prag,1999-06-24,1999-09-12 \n
32,Köln,"""Kunstwelten im Dialog""","Museum Ludwig, Köln",1999-11-05,2000-03-19 \n
Example 6
» curl -I http://data.mumok.at/exhibition.csv
HTTP/1.1 200 OK
Date: Thu, 26 Nov 2015 22:18:47 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Thu, 26 Nov 2015 02:03:28 GMT
ETag: "6d44-1b853-52567fb2450dd"
Accept-Ranges: bytes
Content-Length: 112723
Content-Type: text/csv; charset=utf-8; header=present
Link: </exhibition.csv-metadata.json>;rel=describedBy;type=application/csvm+json
• Metadata attached to CSV file
• Allows for a rich “semantic” description of the table and its data
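A client can locate the attached metadata from the Link response header. This is a simplified sketch: find_metadata_url is a hypothetical helper, and the regular expression handles only plain link values like the one in Example 6, not quoted parameters that contain commas or semicolons.

```python
import re
from urllib.parse import urljoin

# Matches "<target>" followed by any number of ";key=value" parameters.
LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def find_metadata_url(link_header, base_url):
    """Return the absolute URL of the rel=describedBy link, or None."""
    for target, params in LINK_PATTERN.findall(link_header):
        attributes = {}
        for part in params.split(";"):
            if "=" in part:
                key, _, value = part.partition("=")
                attributes[key.strip().lower()] = value.strip().strip('"')
        relations = attributes.get("rel", "").lower().split()
        if "describedby" in relations:
            # Resolve the relative target against the URL of the CSV file.
            return urljoin(base_url, target)
    return None


metadata_url = find_metadata_url(
    "</exhibition.csv-metadata.json>;rel=describedBy;type=application/csvm+json",
    "http://data.mumok.at/exhibition.csv",
)
```

For the header shown in Example 6 this resolves to http://data.mumok.at/exhibition.csv-metadata.json.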
W3C: CSV on the Web Working Group
• Metadata about tabular data
• Uses the JSON format
• Allows for describing:
  • the CSV dialect, including comment rows or multiple header rows, encoding, language, …
  • data types and value ranges for columns
  • primary keys and relations to other tables
• Transformation rules to convert
  • CSV to RDF
  • CSV to JSON
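To illustrate what a consumer gains from such metadata, here is a minimal sketch that configures a CSV reader from a dialect description, casts cells by column datatype, and emits JSON. It is not a conforming CSV on the Web processor: the metadata is a trimmed, hypothetical two-column example using key names from the W3C metadata vocabulary, only the integer and string datatypes are handled, and read_with_metadata is a made-up helper.

```python
import csv
import io
import json

# Trimmed, hypothetical metadata using key names from the CSVW vocabulary.
metadata = {
    "dialect": {"delimiter": ",", "quoteChar": '"', "doubleQuote": True,
                "headerRowCount": 1, "skipInitialSpace": False},
    "tableSchema": {"columns": [
        {"name": "exhibition_id", "datatype": "integer", "required": True},
        {"name": "city", "datatype": "string"},
    ]},
}

# Only the two datatypes used here; a real processor supports many more.
CASTS = {"integer": int, "string": str}


def read_with_metadata(text, metadata):
    """Parse CSV text using the dialect and column types from the metadata."""
    dialect = metadata["dialect"]
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=dialect["delimiter"],
        quotechar=dialect["quoteChar"],
        doublequote=dialect["doubleQuote"],
        skipinitialspace=dialect["skipInitialSpace"],
    )
    data_rows = list(reader)[dialect["headerRowCount"]:]  # drop header rows
    columns = metadata["tableSchema"]["columns"]
    records = []
    for row in data_rows:
        record = {}
        for column, cell in zip(columns, row):
            if cell == "":
                if column.get("required"):
                    raise ValueError("missing required value: " + column["name"])
                record[column["name"]] = None
            else:
                record[column["name"]] = CASTS[column["datatype"]](cell)
        records.append(record)
    return records


text = 'exhibition_id,city\n3,Rom\n8,"Wien, Österreich"\n'
records = read_with_metadata(text, metadata)
as_json = json.dumps(records, ensure_ascii=False)
```

Because the dialect and the types are declared rather than guessed, the identifier arrives as an integer and the quoted city name keeps its comma.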
Example CSV Metadata
{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "url": "http://data.mumok.at/exhibition.csv",
  "dc:title": "Exhibitions for objects from the mumok collection",
  "dcat:keyword": ["art", "museum", "exhibition"],
  "dc:publisher": {
    "schema:name": "mumok - museum moderner kunst stiftung ludwig wien",
    "schema:url": {"@id": "http://www.mumok.at"}
  },
  "dc:license": {"@id": "https://creativecommons.org/licenses/by/3.0/at/legalcode"},
  "dc:modified": {"@value": "2015-07-04", "@type": "xsd:date"},
  …
  "dialect": {
    "encoding": "utf-8",
    "lineTerminators": ["\r\n", "\n"],
    "quoteChar": "\"",
    "doubleQuote": true,
    "skipRows": 0,
    "commentPrefix": "#",
    "header": true,
    "headerRowCount": 1,
    "delimiter": ",",
    "skipColumns": 0,
    "skipBlankRows": false,
    "skipInitialSpace": false,
    "trim": false
  },
  "tableSchema": {
    "columns": [{
      "name": "exhibition_id",
      "titles": "Exhibition Identifier",
      "dc:description": "A unique identifier for the exhibition.",
      "datatype": "integer",
      "required": true
    }, {
      "name": "city",
      "titles": "City",
      "dc:description": "The city in which the exhibition took place (no language defined, mostly in German).",
      "datatype": "string"
    }, {
How to publish CSV on the Web
• Don’t publish CSV that was laid out for humans (e.g., Excel exports)
• Follow RFC 4180
• Encoding: use UTF-8, don’t mix encodings
• File extension: .csv
• Content-Type: text/csv
Optional, but a big improvement!
• Ideally, publish CSV metadata along with your CSV file
• Avoid acronyms or coded values (e.g., sex=1,2,3)
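On the publisher side, these recommendations might be followed roughly as in the sketch below. It rests on several assumptions: publish_csv is a hypothetical helper, the metadata file is named by appending -metadata.json to the CSV file name (as in the mumok example), and the generated metadata is a bare skeleton that a publisher would still enrich with titles, datatypes and a licence. Serving the file with Content-Type: text/csv; charset=utf-8 is a web server setting and is not shown.

```python
import csv
import json
import os
import tempfile


def publish_csv(directory, name, header, rows):
    """Write name.csv (UTF-8, comma-separated) plus a metadata skeleton."""
    csv_path = os.path.join(directory, name + ".csv")
    # newline="" lets the csv module control the line endings itself.
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\r\n")
        writer.writerow(header)
        writer.writerows(rows)

    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        "url": name + ".csv",
        "dialect": {"encoding": "utf-8", "delimiter": ",", "header": True},
        "tableSchema": {"columns": [{"name": column} for column in header]},
    }
    metadata_path = csv_path + "-metadata.json"
    with open(metadata_path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, ensure_ascii=False, indent=2)
    return csv_path, metadata_path


# Usage with a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as directory:
    csv_path, metadata_path = publish_csv(
        directory, "exhibition",
        ["exhibition_id", "city"],
        [[3, "Rom"], [8, "Wien, Österreich"]],
    )
    with open(csv_path, "rb") as handle:
        raw = handle.read()
    with open(metadata_path, encoding="utf-8") as handle:
        sidecar = json.load(handle)
```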
ADEQUATe Open Data Survey
Please take part! http://odsurvey.ai.wu.ac.at
OpenDataSurveyAustria
The aim of this questionnaire is to collect information about the potential of and barriers to Open Data. The survey takes about 5 to 15 minutes, depending on your willingness to also answer optional questions.