Additional Information

Additional Notes:

File / Folder naming:
– Each delivery will be in a folder named after its “as-at” date, e.g. “2016-06-01”
– Sample / test files will be in separate dated folders, e.g. “sample_2016_06_01”
Compression: Files will be compressed in gzip format, with a “.gz” extension
md5sum: A file named “md5sum” will be added, showing an MD5 checksum for all gzipped files; this helps check for any integrity issues during download
File transfer mechanism:
– SFTP
– User authentication: choice of password or private-key certificate
Format of files:
– CSV, conformant to RFC 4180
– Line breaks in text indicated with \n (common in address fields)
– Unix line endings
Column separator: , (comma)
Fields enclosed by:
– Optionally enclosed by double quotes, e.g. when a field contains a comma
– Example: xxx,"Test Company, LLC",yyy
Escape character:
– Double quotes in text are escaped with double quotes, as per RFC 4180
– Example: xxx,"test ""hello"" company llc",yyy
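
As an illustration of the notes above, the following Python sketch checks the gzipped files in a delivery folder against the “md5sum” file and then reads one of them as CSV. It assumes the “md5sum” file uses the layout produced by the common md5sum utility (hex digest, whitespace, file name), which this page does not confirm. The function names are hypothetical and not part of any OpenCorporates tooling.

```python
import csv
import gzip
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_checksums(md5sum_path):
    """Parse lines of the ASSUMED form '<hex digest>  <file name>'."""
    checksums = {}
    for line in Path(md5sum_path).read_text(encoding="utf-8").splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            # The md5sum utility may prefix the name with '*' (binary mode).
            checksums[parts[1].strip().lstrip("*")] = parts[0].lower()
    return checksums


def verify_delivery(folder):
    """Return names of .gz files that are missing from, or do not match, md5sum."""
    folder = Path(folder)
    expected = load_checksums(folder / "md5sum")
    return [p.name for p in sorted(folder.glob("*.gz"))
            if expected.get(p.name) != md5_of(p)]


def read_rows(gz_path):
    """Yield rows from a gzipped, comma-separated, UTF-8 CSV file."""
    # newline="" leaves line-break handling inside quoted fields to csv.
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as fh:
        # csv.reader defaults match the notes: comma separator,
        # optional double-quote enclosure, "" as an escaped quote.
        yield from csv.reader(fh)
```

Calling verify_delivery("2016-06-01") on a downloaded folder would return an empty list when every gzipped file matches its checksum. Reading the file in text mode as UTF-8 is also an assumption, based on the UTF8 remarks under Data Types.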

Data Types

varchar(255): string up to 255 chars in length
text: string up to 65,535 bytes in length. NB: UTF8 consumes 1-3 bytes per character
mediumtext: string up to 16 MB in length. NB: UTF8 consumes 1-3 bytes per character
longtext: string up to 4 GB in length. NB: UTF8 consumes 1-3 bytes per character
pipe separated fields: Field length cannot be determined. Data in this field can consist of 0, 1 or multiple instances of an underlying field, so data may need to be normalised during ETL
boolean: contains true or false, or is left blank to signify an “unknown” state
Note [1]: Data in this field is combined with other data attributes in YAML serialized format, and is stored in a single mediumtext MySQL field. This allows OpenCorporates to handle a wide variety of string lengths from multiple data sources
Note [2]: Data in this field is combined with other data attributes in a Ruby object, and is stored in a single longtext MySQL field. This allows OpenCorporates to handle a wide variety of string lengths from multiple data sources
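
To illustrate the normalisation mentioned for pipe separated fields and the three-state boolean, here is a minimal Python sketch. It assumes values are joined with a plain "|" and that a literal pipe never appears inside a value; this page does not say how such a case would be escaped. Both function names are hypothetical.

```python
def split_pipe_field(value):
    """Split a pipe separated field into a list of 0, 1 or many values."""
    if value is None or value == "":
        return []  # blank field: zero instances
    # ASSUMPTION: "|" is the separator and is never escaped inside a value.
    return value.split("|")


def parse_boolean(value):
    """Map 'true'/'false' to bool; a blank field becomes None (unknown)."""
    text = (value or "").strip().lower()
    if text == "":
        return None
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"unexpected boolean value: {value!r}")
```

During ETL, each element returned by split_pipe_field could be written as its own row in a child table keyed on the parent record, and None from parse_boolean could be loaded as NULL.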
Updated on August 19, 2024
