By Paul Smith and Tatiana Lyubimova

Knowledge Graph: Creating a global commodity data catalog

Updated: Apr 18




Introduction


Particle.One is building the global knowledge graph for commodities, which includes:

  • Data

  • Relationships

  • Analytics and Models


A core difference between Particle.One and traditional data providers lies in our knowledge graph technology. Traditional data providers collect and deliver only time series data. We annotate the data with useful metadata (e.g., supply, geography, associated commodities) and connect that metadata in a knowledge graph, a set of vertices and edges, which surfaces the otherwise unseen economic relationships between the collected time series.
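
As a minimal sketch of the idea, using networkx with invented vertex names rather than Particle.One's actual schema, time series and metadata become vertices, and edges make their relationships explicit:

```python
import networkx as nx

g = nx.Graph()

# Hypothetical time series vertices annotated with metadata.
g.add_node("ts/brazil_sugar_production", kind="time_series",
           quantity="supply", geography="Brazil", commodity="sugar")
g.add_node("ts/global_sugar_price", kind="time_series",
           quantity="price", commodity="sugar")

# A shared metadata vertex, with edges linking the series to it.
g.add_node("commodity/sugar", kind="commodity")
g.add_edge("ts/brazil_sugar_production", "commodity/sugar", relation="measures")
g.add_edge("ts/global_sugar_price", "commodity/sugar", relation="measures")

# The graph makes the cross-series relationship explicit: both series are
# reachable from the shared "sugar" vertex.
print(sorted(g.neighbors("commodity/sugar")))
```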


Data is sourced from more than 100 providers, covering more than 100 million time series, in categories such as:

  • Supply, demand, and inventory

  • Trade and supply chain

  • Macroeconomic indicators

  • Public company disclosures

  • Fundamental data



The Particle.One Knowledge Graph goes beyond traditional financial terminals: it allows you to reason about the data and its relationships, answer questions about financial events, and move from observing an effect (e.g., a price move) to understanding the causes behind it. We empower you to see the unseen (a toy cause-tracing sketch in code follows this list):

  • Find the causes of an increase in oil consumption in China on 2020-06-03

  • Find all data pertaining to sugar production

  • Identify the variables affecting the current volatility of gold

  • Rank the top 5 commodities needed for manufacturing cars

  • Predict the local soybean supply over the next 6 months

  • Forecast the demand for ethylene across Chinese provinces

  • Find what US public companies are most affected by the price of nickel
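
As a rough sketch of that effect-to-causes direction, using networkx with invented node names and edges rather than real Particle.One relationships, a directed graph lets you walk backwards from an observed effect to its candidate drivers:

```python
import networkx as nx

# Toy cause-tracing sketch: directed edges point from a driver to the
# quantity it affects (all node names here are invented for illustration).
g = nx.DiGraph()
g.add_edge("crude_oil_price", "china_refinery_utilization")
g.add_edge("china_refinery_utilization", "china_oil_consumption")
g.add_edge("china_industrial_output", "china_oil_consumption")

# Everything upstream of the observed effect is a candidate cause.
print(nx.ancestors(g, "china_oil_consumption"))
# {'crude_oil_price', 'china_refinery_utilization', 'china_industrial_output'}
```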

It encodes knowledge that can be used to:

  • explore relationships between economic quantities (e.g., price, supply, inventory, demand for a commodity)

  • build predictive models for economic quantities

  • collect economically motivated data for further use in research, investment strategies, etc.

  • build data-driven market reports

  • map economic quantities to commodity and equity instruments

  • assess drivers of portfolio risk

  • track the different stages of a commodity’s industry chain (e.g., upstream, midstream, downstream)


Investigate the data


A significant percentage of time in model development is spent on basic data onboarding and analysis.

We build tools that save hours of research. Using our notebooks, you can immediately see data distributions, investigate outliers, and check the data for common consistency issues (a pandas sketch of these checks follows the list):

  • Different representations of numbers (e.g., in thousands, millions)

  • Empty cells/rows

  • Redundant characters

  • Misaligned data
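
A minimal sketch of such checks in pandas, on a hypothetical raw frame rather than real provider data:

```python
import pandas as pd

# Hypothetical raw data with the issues listed above.
raw = pd.DataFrame({
    "date": ["2020-01-01", "2020-02-01", None],
    "value": ["1,200", "1.3M", ""],  # mixed units and redundant characters
})

# Empty cells / rows.
print(raw.isna().sum())
print(raw[raw["value"] == ""])

# Redundant characters and different number representations
# (e.g., thousands separators, "M" for millions).
def parse_value(s: str) -> float:
    s = s.replace(",", "").strip()
    if s.endswith("M"):
        return float(s[:-1]) * 1e6
    return float(s) if s else float("nan")

raw["value_clean"] = raw["value"].map(parse_value, na_action="ignore")
print(raw)
```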



Data available


  • Number of current data providers: 84

  • Number of time series currently published: 1.6 million

  • Number of time series to be published: more than 500 million

  • Number of commodities covered: 69


Top commodities by number of time series available:


Top countries by number of time series available:



Collection process


Original source of the data


Many existing data providers source their data from each other, because it is simpler and cheaper to resell data captured by someone else than to capture it from the original source.


Particle.One always collects the data from the original sources, so it can be:

  • published as soon as it is available, without artificial delays

  • delivered without any alteration

  • stored with point-in-time semantics

We source the data by connecting to each original provider, dealing with the complexity of heterogeneous semantics and formats, and presenting the data in a uniform, consistent manner.



Point-in-time data


Point-in-time means that each piece of data is presented "as of" a date, i.e., the view of the data reflects what an observer would have seen at that specific time. This means capturing the evolution of the data over time, not just its most recent view.

Data sources often issue amendments, restatements, corrections, and methodology changes, which effectively rewrite history.


Point-in-time semantics are essential for accurate and representative backtesting.


Only by running the capture system in real time is it possible to maintain a point-in-time view of the data; it is often difficult or impossible to reconstruct that view from historical data alone.
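
A minimal sketch of point-in-time storage, with invented data and column names rather than the actual Particle.One schema: every revision of an observation is kept together with the time at which it became known, so any past view can be reconstructed:

```python
import pandas as pd

revisions = pd.DataFrame(
    {
        "period_end":   ["2020-05-31", "2020-05-31"],
        "knowledge_ts": ["2020-06-05", "2020-07-10"],  # when each value was known
        "value":        [100.0, 97.5],                 # later restatement
    }
)

def as_of(df: pd.DataFrame, ts: str) -> pd.DataFrame:
    """Reconstruct what an observer would have seen at time `ts`:
    for each period, the latest revision known at that moment."""
    known = df[df["knowledge_ts"] <= ts]
    return known.sort_values("knowledge_ts").groupby("period_end").tail(1)

print(as_of(revisions, "2020-06-30"))  # sees the original 100.0
print(as_of(revisions, "2020-08-01"))  # sees the restated 97.5
```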



Publication timestamps


The timestamps used by original data sources often conflate the reporting period (i.e., the end of the period that a data point refers to) with the publication timestamp.


The Particle.One Knowledge Graph stores a publication timestamp that represents when the data became available to the customer. We record both when the data was sampled from the original source and when it became available to the customer.
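
Concretely, each observation can carry the reporting period plus both capture-side timestamps; the field names below are illustrative, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    period_end: str    # end of the period the value refers to (reporting period)
    sampled_ts: str    # when the value was sampled from the original source
    published_ts: str  # when the value became available to the customer
    value: float

obs = Observation(
    period_end="2020-05-31",    # May figure...
    sampled_ts="2020-06-05",    # ...captured from the source on June 5...
    published_ts="2020-06-05",  # ...and available to customers the same day.
    value=100.0,
)
print(obs)
```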



API


Over 1 million time series are available in real time via a REST API or a Python library, so captured data flows directly into your data science workflow.
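
A hedged sketch of what pulling a series over REST could look like; the endpoint, query parameters, and auth header below are hypothetical placeholders, not Particle.One's documented API:

```python
import requests

API_URL = "https://api.example.com/v1/time-series"  # placeholder URL

resp = requests.get(
    API_URL,
    params={"commodity": "sugar", "quantity": "supply"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder token
    timeout=30,
)
resp.raise_for_status()
for series in resp.json():  # assumes a JSON list of series descriptors
    print(series.get("id"), series.get("name"))
```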

