  • Paul Smith and Tatiana Lyubimova

Commodity-company mapping: Using knowledge graph tools to discover supply chain relationships

Updated: Apr 18



Particle.One uses knowledge graph tools to discover otherwise unseen relationships between time series that affect commodity markets and their participants.


We have collected over 150 million datasets containing essential fundamental and macro information.


Our searchable knowledge base contains:

  • Data

  • Metadata

  • Relationships (e.g., leading or lagging, causal, supply-chain) between data

  • Models for explaining, forecasting, and reasoning about economic events and quantities.


Using proprietary NLP (natural language processing) techniques, we retrieve information from both structured and unstructured texts and map it to relevant entities, equity tickers, and commodities.


We present part of our research process below.


Using machine learning tools, our team scans the data disclosed in 10-K and 10-Q forms of S&P 1500 companies and associates each equity ticker with relevant commodities. This is one piece of our larger effort to create a global knowledge graph for commodity markets.



Data

Company universe: S&P 1500 (the “Particle” universe)

Time period: 2017-01-01 to 2017-12-31

Forms: 10-K and 10-Q


Statistics

Based on the statistics above, we notice that the commodities mentioned in the forms do not remain the same over the course of the reporting period.



Mapping results

Below we present an example of a typical mapping flow. We extract Forms 10-K (annual forms) and 10-Q (quarterly forms) using Particle’s EDGAR API.


Form sample:

https://www.sec.gov/Archives/edgar/data/1111335/000111133517000012/0001111335-17-000012-index.html

10-Q, filed by Visteon Corp (CIK 0001111335) on 2017-10-26


Form’s human-readable .html content sample: https://www.sec.gov/Archives/edgar/data/1111335/000111133517000012/visteonq3201710-q.htm?s2B424CDE71E750D4ABE1B32384E07102


The form’s HTML is long: roughly 24,200 words, or about 55 A4 pages.


Form’s machine-readable .txt content sample: https://www.sec.gov/Archives/edgar/data/1111335/000111133517000012/0001111335-17-000012.txt


The form’s .txt contains an even larger amount of data, including large chunks of XML markup, which makes it difficult to process manually. Examples below highlight what the data looks like in the raw format.
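The sample links above follow the public EDGAR archive layout: the CIK without leading zeros, a folder named after the accession number without dashes, and files named after the dashed accession number. A minimal sketch of building those links is shown here; the helper name `edgar_filing_urls` is hypothetical and is not part of Particle’s EDGAR API.

```python
def edgar_filing_urls(cik: str, accession: str) -> dict:
    """Build the index and full-text URLs for one filing (hypothetical helper)."""
    base = "https://www.sec.gov/Archives/edgar/data"
    cik_path = str(int(cik))             # "0001111335" -> "1111335"
    folder = accession.replace("-", "")  # "0001111335-17-000012" -> "000111133517000012"
    root = f"{base}/{cik_path}/{folder}/{accession}"
    return {"index": f"{root}-index.html", "txt": f"{root}.txt"}


# The Visteon sample filing referenced in this post
urls = edgar_filing_urls("0001111335", "0001111335-17-000012")
```

Only the URL pattern is taken from the sample links; fetching and rate limiting are left out of the sketch.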










Particle.One’s proprietary NLP technology parses and cleans the data, leaving only the meaningful text from the form.
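The cleaning step itself is proprietary, but its simplest part, dropping markup and keeping visible text, can be sketched with Python’s standard library. The class and function names below are assumptions made for illustration, and the sample string is invented.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_form_text(raw_html: str) -> str:
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


sample = "<html><style>p{color:red}</style><body><p>Aluminum &amp; <b>copper</b> prices rose.</p></body></html>"
cleaned = clean_form_text(sample)
```

Joining text fragments with a space is a simplification: it can split a word that is interrupted by inline markup, so a production cleaner would need more care.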


Snippet of the result:


Natural Language Processing techniques applied to the text determine which commodities may have a material impact on a company’s business operations.


In the sample form, we match “Aluminum”, “Copper”, “Fuel Oil” and “Natural Gas” and store the context of the matches:
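As a rough illustration of matching commodity names and keeping their context, the sketch below uses whole-word, case-insensitive keyword search with a fixed character window. This is a simple stand-in, not the NLP technique described in this post: plain keyword matching does not judge whether a commodity is material to the business. The function name, the window size, and the sample sentence are assumptions.

```python
import re

# Four commodities from the sample form; the full universe is larger
COMMODITIES = ["Aluminum", "Copper", "Fuel Oil", "Natural Gas"]


def match_commodities(text, commodities=COMMODITIES, window=40):
    """Return {commodity: [context snippets]} for whole-word matches."""
    matches = {}
    for name in commodities:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Keep up to `window` characters on each side of the match
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            matches.setdefault(name, []).append(text[start:end])
    return matches


# Invented sentence, for illustration only
text = "We are exposed to price changes in copper and aluminum used in our products."
found = match_commodities(text)
```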


Mapping formats

The initial goal of the research was to map the Particle.One universe of 74 commodities to S&P 1500 companies.

We present this mapping in two distinct ways:


1. An index from commodity to companies


2. An index from company to commodities


Table structure:

  1. cik: CIKs of companies

  2. ticker: Tickers of companies

  3. name: Names of companies

  4. commodities: Commodities mapped to the company

The mapping table is sorted by CIKs (ascending); commodities are sorted alphabetically.
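The two indices are inverses of each other, so both can be derived from the same list of matches. A minimal sketch follows, assuming rows of the form (cik, ticker, name, commodities); the second row is invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

rows = [
    # Visteon's commodities are the four matched in the sample form
    ("0001111335", "VC", "Visteon Corp", ["Copper", "Aluminum", "Natural Gas", "Fuel Oil"]),
    # Invented company, for illustration only
    ("0000000002", "BBB", "Example B Inc", ["Copper"]),
]


def company_to_commodities(rows):
    """One record per company, sorted by CIK; commodities sorted alphabetically."""
    table = [
        {"cik": cik, "ticker": ticker, "name": name, "commodities": sorted(set(comms))}
        for cik, ticker, name, comms in rows
    ]
    return sorted(table, key=lambda r: int(r["cik"]))


def commodity_to_companies(rows):
    """Inverse index: commodity -> CIKs of the companies mapped to it."""
    index = defaultdict(list)
    for cik, _ticker, _name, comms in rows:
        for commodity in set(comms):
            index[commodity].append(cik)
    return {c: sorted(ciks, key=int) for c, ciks in sorted(index.items())}


by_company = company_to_commodities(rows)
by_commodity = commodity_to_companies(rows)
```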



Number of companies per commodity


We compute the distribution of mentions of the universe of commodities.
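One way to compute such a distribution is to count, for each commodity, the number of companies mapped to it. The sketch below assumes a company-to-commodities dictionary; the tickers and mappings are invented.

```python
from collections import Counter

# Hypothetical company -> commodities mapping
mapping = {
    "AAA": ["Copper", "Aluminum"],
    "BBB": ["Copper"],
    "CCC": ["Natural Gas", "Copper"],
}

# Count each company at most once per commodity
counts = Counter(c for comms in mapping.values() for c in set(comms))
ranking = counts.most_common()  # most widely mentioned commodities first
```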



Company clusters


Our company clustering algorithm groups companies based on commodities significant to business operations. These clusters cut across traditional industry and sector-based classifications, providing the basis for principled modeling with commodity-driven factors.


Additionally, company clusters provide a coarse measure of supply chain relationships.

A snippet of the matrix:


In the co-occurrence matrix, each company corresponds to a vector of size p (a matrix row).

We use these company vectors to cluster the companies with the K-means algorithm.
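A small, self-contained sketch of this step is given below. It assumes that p is the number of commodities and that each row is a binary indicator vector, neither of which is stated above. The clustering is a plain Lloyd's-style K-means written in pure Python with a deterministic farthest-point initialization, chosen here only for reproducibility; in practice a library implementation would be the usual choice. All tickers and mappings are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=50):
    """Return one cluster label per vector."""
    # Deterministic init: start at the first vector, then repeatedly add
    # the vector farthest from its nearest existing centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


commodities = ["Aluminum", "Copper", "Fuel Oil", "Natural Gas"]  # p = 4 in this toy case
mapping = {  # hypothetical company -> commodities
    "AAA": ["Aluminum", "Copper"],
    "BBB": ["Aluminum", "Copper"],
    "CCC": ["Fuel Oil", "Natural Gas"],
    "DDD": ["Copper", "Fuel Oil", "Natural Gas"],
}
vectors = [[1 if c in mapping[t] else 0 for c in commodities] for t in mapping]
labels = kmeans(vectors, k=2)
```

In this toy case the two metal-heavy companies land in one cluster and the two energy-heavy companies in the other.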



Table structure:

  1. cluster_id: The cluster’s numerical identifier

  2. cluster_size: The number of companies in the cluster

  3. company_ciks: CIKs of companies in the cluster

  4. company_tickers: Tickers of companies in the cluster

  5. company_names: Names of companies in the cluster

  6. cluster_sectors: Sectors of companies in the cluster (coming soon)

  7. top_cluster_commodities: Top 5 frequent commodities mapped to the companies in the cluster

The CIK, ticker, and name lists are each sorted independently, so entries at the same position do not necessarily refer to the same company.
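A table with these columns could be assembled from cluster labels and the company-to-commodities mapping roughly as follows. The input records and the function name are hypothetical, and `cluster_sectors` is left out because it is marked as coming soon.

```python
from collections import Counter

# Hypothetical inputs: cik -> (ticker, name, commodities), and cik -> cluster label
companies = {
    "0000000003": ("CCC", "Example C Ltd", ["Copper", "Aluminum"]),
    "0000000001": ("AAA", "Example A Inc", ["Copper"]),
    "0000000002": ("BBB", "Example B Corp", ["Natural Gas"]),
}
labels = {"0000000003": 0, "0000000001": 0, "0000000002": 1}


def cluster_table(companies, labels, top_n=5):
    """Build one summary row per cluster."""
    rows = []
    for cluster_id in sorted(set(labels.values())):
        ciks = [c for c, lab in labels.items() if lab == cluster_id]
        # Count how many companies in the cluster mention each commodity
        counts = Counter(com for c in ciks for com in set(companies[c][2]))
        rows.append({
            "cluster_id": cluster_id,
            "cluster_size": len(ciks),
            # Each list is sorted on its own, so positions are not aligned
            "company_ciks": sorted(ciks),
            "company_tickers": sorted(companies[c][0] for c in ciks),
            "company_names": sorted(companies[c][1] for c in ciks),
            "top_cluster_commodities": [com for com, _ in counts.most_common(top_n)],
        })
    return rows


table = cluster_table(companies, labels)
```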



Visual mapping representation


Each dot is a company-commodity pair: a company mapped to several commodities gets one dot per mapping.

  • X axis shows the company’s sector

  • Y axis shows the commodity the company is mapped to

The dots are colored by the cluster each company is assigned to. Because there are 30 clusters, many of the colors look very similar.
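The data behind such a chart can be prepared by expanding each company into one record per mapped commodity. A brief sketch with invented records follows; the field names are assumptions, and the plotting itself is omitted.

```python
# Hypothetical company records
companies = [
    {"ticker": "AAA", "sector": "Industrials", "cluster": 0, "commodities": ["Aluminum", "Copper"]},
    {"ticker": "BBB", "sector": "Utilities", "cluster": 1, "commodities": ["Natural Gas"]},
]

# One dot per (company, commodity) pair: x = sector, y = commodity, color = cluster
dots = [
    {"x": c["sector"], "y": commodity, "color": c["cluster"], "ticker": c["ticker"]}
    for c in companies
    for commodity in c["commodities"]
]
```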






All commodities, all clusters



CME subgroups of the commodities



First 10 clusters



Our Knowledge Graph and AutoML technology accelerate common exploratory and modeling flows while taking economic relationships into account. This makes it easier for quants and discretionary traders to access, understand, and act on market-moving information, for example to forecast price movements, supply, and demand.
