The ability to search for company data record-by-record is invaluable, but what about when you need the data at scale – such as to automate workflows, underpin a tech platform or connect it with other datasets?
Whilst some company registries still provide company data on CDs (anybody remember those?), two main methods of delivering data at scale prevail today among registries and data vendors: API integration and bulk data delivery.
It’s important to understand the benefits and characteristics of each before deciding how best to power your tech product, data platform or internal systems with company data.
Over 400 organisations, from government agencies to disruptive tech companies, rely on our API or bulk data. Drawing on that experience, this blog post outlines the business needs that might lead you to choose the bulk route, the API route or both when you require company data at scale.
Bulk Data Set
Bulk downloads are the most basic form of data delivery at scale, yet few organisations provide them for company data; OpenCorporates is one of them. Our bulk data clients receive regular deliveries of company data in an easy-to-use CSV format, which they use to tackle some of the most interesting business challenges.
If you’re interested in receiving a sample of the bulk data we provide, you can always get in touch.
Market needs where bulk company data adds value typically include:
- Spine data: Bulk data is essential for organisations that need a robust spine of company data at the foundation of their data, tech or internal systems.
Our transparent legal entity reference data gives platforms, from data lakes to SaaS products, a basic reference point against which other datasets can be compared and to which they can be appended. For example, Quantifind uses our global bulk dataset to underpin its machine-learning-powered anti-financial crime investigations tool.
- Data analysis: A company dataset delivered in bulk supports a wide range of analyses, from identifying trends and anomalies to building models.
For example, FNA’s network analytics platform uses OpenCorporates’ bulk data to help financial crime investigators take a proactive approach to identifying risk. By combining our company data with a range of other sources such as financial transactions and news reporting, and applying machine learning, the technology identifies anomalous activity patterns that could be early warning signs of risk.
- Combining datasets: Bulk data can be combined with other datasets, such as business licences, procurement data and sanctions lists. Insights are often uncovered only when datasets are combined in bulk like this.
For example, it was only by combining OpenCorporates’ data with a list of firms that received Covid-relief funds under the Paycheck Protection Program that the Anti-Corruption Data Collective was able to investigate alleged fraud. Similarly, investigations platforms (such as those used to fight financial crime) need to know which “dots” (read: companies and their officers) exist before they can join them and find potential links to investigate.
- Where simplicity is needed to start with: Bulk data can be advantageous when low technical barriers to entry matter to an organisation starting to use company data at scale. Many NGOs, journalists and law enforcement officials can work with a single dataset but may lack the skills or resources to write, and keep maintaining, the code needed to interface with an API.
- Security: Many government and law enforcement agencies, as well as some large financial institutions, are restricted in their use of external APIs. They may be bound by rules that, for example, prohibit them from sending queries about entities under investigation to an outside service. Acquiring the data in bulk lets them use it at scale entirely within their own systems.
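To make the “combining datasets” idea more concrete, here is a minimal Python sketch that joins a bulk company file with a second list (for example, recipients of relief funds) on a normalised company name. The column names (`company_number`, `jurisdiction_code`, `name`, `recipient_name`), the sample rows and the list of legal-form suffixes are all hypothetical; the real bulk CSV schema may differ, and production matching would normally rely on company numbers or more robust entity resolution rather than names alone.

```python
import csv
import io
import re

# Hypothetical sample rows; the real bulk file's columns may be named differently.
COMPANIES_CSV = """company_number,jurisdiction_code,name
01234567,gb,Acme Widgets Ltd
09876543,gb,Blue Harbour Trading Limited
C1234567,us_ca,"Golden Gate Holdings, Inc."
"""

# Hypothetical second dataset, e.g. a list of relief-fund recipients.
RECIPIENTS_CSV = """recipient_name,amount
ACME WIDGETS LIMITED,15000
Golden Gate Holdings Inc,42000
Unknown Co LLC,9000
"""

# Illustrative (far from exhaustive) set of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "corp", "corporation"}


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and drop trailing legal-form suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def build_index(companies_csv: str) -> dict[str, list[dict[str, str]]]:
    """Index every company row in the bulk file by its normalised name."""
    index: dict[str, list[dict[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(companies_csv)):
        index.setdefault(normalise(row["name"]), []).append(row)
    return index


def match_recipients(companies_csv: str, recipients_csv: str) -> list[tuple[str, str, str]]:
    """Return (recipient_name, jurisdiction_code, company_number) for each name match."""
    index = build_index(companies_csv)
    matches = []
    for row in csv.DictReader(io.StringIO(recipients_csv)):
        for company in index.get(normalise(row["recipient_name"]), []):
            matches.append(
                (row["recipient_name"], company["jurisdiction_code"], company["company_number"])
            )
    return matches


if __name__ == "__main__":
    for match in match_recipients(COMPANIES_CSV, RECIPIENTS_CSV):
        print(match)
```

For a full bulk file you would stream rows from disk with `csv.DictReader(open(path, newline="", encoding="utf-8"))` rather than holding the text in memory; the in-memory strings here just keep the sketch self-contained. Building the index once and then probing it keeps the join linear in the size of both files.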
API Integration
API integrations enable two systems to exchange data and talk to each other directly, and they are increasingly widely used.
In the world of company data, API integrations are often beneficial for the following business needs:
- Single queries: API calls are valuable when individual queries need to be made. They are often used when you already have a name and company number for a legal entity and need to verify them against official company data repeatedly and consistently, for example in the identification and verification (ID&V) stage of Know Your Customer (KYC) due diligence.
- Workflow automation: API integrations also enable automated workflows. Rather than delivering the whole dataset en masse, an API can pull in a few records or attributes at a time, such as names, addresses or officers. This saves you (or your product’s users) the time and effort of searching for these records manually.
Many regulatory technology (RegTech) tools that focus on automated due diligence or business verification, such as Exiger’s DDIQ, already use the OpenCorporates API in this way. These tools typically call on OpenCorporates data as one of the first steps in a longer workflow, helping their users verify the identity of a company or company officer they need to conduct due diligence on.
- Low-latency queries: In some cases it is critical to have access to the most up-to-date data, so each lookup needs to reflect the record as it currently stands rather than waiting for the next bulk delivery. This is particularly true where anti-money laundering or KYC regulations require entities to be screened against current data, or where out-of-date data would give the ‘wrong’ result.
- Where smaller record numbers are needed: An API is the best option for an organisation that finds importing an entire dataset problematic or not cost-effective. If a user needs only a small number of records, for example during onboarding, downloading and importing millions of records is likely to be inefficient.
- Using data without storing high volumes: Similarly, API calls are useful where an organisation does not want to store much data itself but still needs one-off retrievals from a large dataset.
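For the single-query verification case, a minimal Python sketch might look like the following. It assumes that the OpenCorporates REST API exposes a company lookup of the form `/companies/{jurisdiction_code}/{company_number}` under a versioned base URL, accepts an `api_token` query parameter, and returns the company under `results.company` with fields such as `name` and `current_status`. Treat the version number, field names and response shape as assumptions to check against the current API reference. The network call is kept separate from the parsing so the verification logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Assumed base URL and version; confirm against the current API documentation.
API_BASE = "https://api.opencorporates.com/v0.4"


def company_url(jurisdiction_code: str, company_number: str,
                api_token: Optional[str] = None) -> str:
    """Build the URL for a single-company lookup (assumed endpoint shape)."""
    url = (f"{API_BASE}/companies/"
           f"{urllib.parse.quote(jurisdiction_code, safe='')}/"
           f"{urllib.parse.quote(company_number, safe='')}")
    if api_token:
        url += "?" + urllib.parse.urlencode({"api_token": api_token})
    return url


def fetch_company(jurisdiction_code: str, company_number: str,
                  api_token: Optional[str] = None,
                  timeout: float = 10.0) -> dict[str, Any]:
    """Perform one HTTP GET and decode the JSON body."""
    url = company_url(jurisdiction_code, company_number, api_token)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def _canonical(name: str) -> str:
    # Case-insensitive comparison that ignores repeated whitespace.
    return " ".join(name.casefold().split())


def verify_company(payload: dict[str, Any],
                   expected_name: str) -> tuple[bool, Optional[str]]:
    """Compare the name on file with the name a customer supplied.

    Returns (name_matches, current_status); status is None if absent.
    """
    company = (payload.get("results") or {}).get("company") or {}
    name_on_file = company.get("name") or ""
    matches = bool(name_on_file) and _canonical(name_on_file) == _canonical(expected_name)
    return matches, company.get("current_status")
```

In an onboarding flow you would call `fetch_company("gb", number, api_token=token)` with your own (hypothetical here) token and pass the result to `verify_company`. The exact-name check is deliberately strict, so a mismatch is better treated as a prompt for manual review than as an automatic rejection.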