Data Dictionary: Additional Information

Additional Notes:

File / folder naming: Each delivery will be in a folder named after its "as-at" date, e.g. "2016-06-01"
– Sample / test files will be in separate dated folders, e.g. "sample_2016_06_01"
Compression: Files will be compressed in gzip format, with a ".gz" extension
md5sum: A file named "md5sum" will be added, listing an MD5 checksum for each gzipped file; this helps detect integrity issues during download
File transfer mechanism: SFTP
– User authentication: choice of password or private-key certificate
Format of files: CSV, conformant to RFC 4180 https://tools.ietf.org/html/rfc4180
– Line breaks within text are indicated with \n (common in address fields)
– Unix line endings (LF), rather than the CRLF specified in RFC 4180
Column separator: , (comma)
Fields enclosed by: Double quotes (optional), used e.g. when a field contains a comma
e.g. xxx,"Test Company, LLC",yyy
Escape character: Double quotes in text are escaped with a second double quote, as per RFC 4180
e.g. xxx,"test ""hello"" company llc",yyy
Encoding: UTF-8
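The folder naming and checksum conventions above can be checked with a short script. The sketch below is a minimal example, not an official OpenCorporates tool: it assumes deliveries have already been downloaded into one local root folder, and that the "md5sum" file uses the line layout produced by the GNU md5sum utility (digest, then file name). The function names latest_delivery and verify_md5sum are hypothetical. Because dated folder names such as "2016-06-01" sort chronologically as plain strings, max() is enough to find the latest delivery.

```python
import hashlib
import re
from pathlib import Path

# Delivery folders are named by "as-at" date, e.g. "2016-06-01".
# Sample folders ("sample_2016_06_01") do not match and are skipped.
DELIVERY_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def latest_delivery(root: Path) -> Path:
    """Return the most recent dated delivery folder under root (hypothetical helper)."""
    folders = [p for p in root.iterdir() if p.is_dir() and DELIVERY_RE.fullmatch(p.name)]
    if not folders:
        raise FileNotFoundError(f"no dated delivery folders in {root}")
    # YYYY-MM-DD names sort chronologically as strings.
    return max(folders, key=lambda p: p.name)


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sum(folder: Path) -> dict[str, bool]:
    """Check each file listed in folder/md5sum (hypothetical helper).

    Assumes GNU md5sum layout: "<hex digest> <space or '*'><file name>".
    Returns a mapping of file name -> whether the checksum matched.
    """
    results: dict[str, bool] = {}
    for line in (folder / "md5sum").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        if name.startswith("*"):  # binary-mode marker written by md5sum -b
            name = name[1:]
        results[name] = md5_of(folder / name) == expected.lower()
    return results
```

Any file mapped to False should be downloaded again before it is decompressed.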
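The compression and CSV settings above map directly onto Python's standard library. This is a sketch, not a supported reader: it assumes each file starts with a header row (the notes above do not say so), and read_rows is a hypothetical helper name. Python's default csv dialect already uses a comma separator, optional double-quote enclosure and a doubled double quote as the escape, which matches the format described. Opening the file with newline="" lets the csv module keep line breaks that occur inside quoted fields, such as multi-line addresses.

```python
import csv
import gzip
from collections.abc import Iterator
from pathlib import Path


def read_rows(path: Path) -> Iterator[dict[str, str]]:
    """Stream rows from a gzipped CSV delivery file as dicts (hypothetical helper).

    Assumes the first row is a header row.
    """
    # "rt" decompresses and decodes the stream as UTF-8 text; newline=""
    # leaves line-break handling to the csv module, so "\n" inside a
    # quoted field stays part of that field.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # The default "excel" dialect: delimiter ",", quotechar '"',
        # doublequote=True (so "" inside quotes means a literal ").
        yield from csv.DictReader(fh)
```

Because rows are yielded one at a time, large files can be processed without loading them fully into memory.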

Data Types

varchar(255): String up to 255 characters in length
text: String up to 65,535 bytes in length. NB: UTF8 consumes 1-3 bytes per character, so a field may hold fewer characters than its byte limit
mediumtext: String up to 16 MB (16,777,215 bytes) in length. NB: UTF8 consumes 1-3 bytes per character, so a field may hold fewer characters than its byte limit
longtext: String up to 4 GB (4,294,967,295 bytes) in length. NB: UTF8 consumes 1-3 bytes per character, so a field may hold fewer characters than its byte limit
Pipe-separated fields: Field length cannot be determined. Data in this field can consist of 0, 1 or multiple instances of an underlying field, so it may need to be normalised during ETL
boolean: Contains true or false, or is left blank to signify an "unknown" state
Note [1]: Data in this field is combined with other data attributes in YAML-serialized format and stored in a single mediumtext MySQL field. This allows OpenCorporates to handle a wide variety of string lengths from multiple data sources
Note [2]: Data in this field is combined with other data attributes in a Ruby object and stored in a single longtext MySQL field. This allows OpenCorporates to handle a wide variety of string lengths from multiple data sources
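Because the text, mediumtext and longtext limits are in bytes while varchar(255) is in characters, a length check during ETL should encode a value before measuring it. A minimal sketch, using a hypothetical helper fits. It measures the standard UTF-8 encoding, which can use 4 bytes for characters outside the Basic Multilingual Plane (e.g. emoji), whereas MySQL's legacy utf8 character set stores at most 3 bytes per character.

```python
# Maximum sizes in bytes for MySQL TEXT-family columns (2**n - 1).
TEXT_LIMITS = {
    "text": 2**16 - 1,        # 65,535 bytes
    "mediumtext": 2**24 - 1,  # 16,777,215 bytes
    "longtext": 2**32 - 1,    # 4,294,967,295 bytes
}
VARCHAR_255_CHARS = 255  # varchar(255) is limited in characters, not bytes


def fits(value: str, data_type: str) -> bool:
    """Return True if value fits the documented limit for data_type (hypothetical helper).

    Raises KeyError for a data type not listed above.
    """
    if data_type == "varchar(255)":
        # len() counts Unicode code points.
        return len(value) <= VARCHAR_255_CHARS
    return len(value.encode("utf-8")) <= TEXT_LIMITS[data_type]
```

For example, "é" takes two bytes in UTF-8, so a string of 40,000 of them (80,000 bytes) does not fit in a text field, even though it is shorter than 65,535 characters.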
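The pipe-separated and boolean types usually need converting during ETL. The sketch below uses hypothetical helper names: split_pipe_field turns one field into a list of 0, 1 or many values, normalise emits one (parent id, value) pair per value for loading into a separate child table, and parse_boolean maps a blank field to None for the "unknown" state. It assumes values never contain a literal "|" (the notes above do not describe an escape for it) and that booleans appear exactly as lowercase true or false.

```python
from typing import Optional


def split_pipe_field(raw: str) -> list[str]:
    """Split a pipe-separated field into 0, 1 or many values (hypothetical helper).

    A blank field means zero instances, so it yields an empty list.
    """
    return raw.split("|") if raw != "" else []


def normalise(parent_id: str, raw: str) -> list[tuple[str, str]]:
    """One (parent_id, value) row per underlying value, for a child table."""
    return [(parent_id, value) for value in split_pipe_field(raw)]


def parse_boolean(raw: str) -> Optional[bool]:
    """Map "true"/"false" to True/False and a blank field to None ("unknown")."""
    if raw == "":
        return None
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise ValueError(f"unexpected boolean value: {raw!r}")
```

Keeping None distinct from False preserves the "unknown" state, which would otherwise be lost if blanks were coerced to false.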
Updated on July 27, 2023
