In 2017 The Economist neatly encapsulated the current zeitgeist for Data Science with a leading feature titled “The world’s most valuable resource is no longer oil, but data”[1]. Whilst the focus of that article was on the Technology sector, for commodity companies data likewise represents a vast untapped resource to extract and monetise.


What is Data Science, and what can it do for me?


What is now referred to as Data Science can be thought of as an evolution of the statistics and Business Intelligence fields. This evolution has been driven by the increasing digitisation and interconnectedness of the global economy, and by the advances in computing technology that have enabled analysis of the resultant data explosion.


In plain terms, Data Science is about producing actionable insights from data.


Historically that mostly involved the use of descriptive statistics to explain past business performance. This is still an important component, but these days larger data sets and more sophisticated computing resources allow insights to be delivered in a more timely fashion and with increased impact via novel visualisation tools.


However, where modern Data Science is really making a difference is in the area of predictive analytics. This forward-looking approach uses computational statistics to find meaningful relationships between variables in large datasets. This process, broadly defined as Artificial Intelligence and made up of various Machine Learning and Deep Learning techniques, can be used in almost every area of business: from optimising operational processes, to modelling future demand more accurately, to accelerating resource discovery, and even to predicting the likely direction of commodity prices.
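To make the idea of finding a relationship between variables concrete, here is a minimal sketch of the simplest possible predictive model: an ordinary least-squares line fitted to past observations and then used to forecast. The data, the variable names (`activity`, `demand`) and the helper `fit_line` are hypothetical and chosen purely for illustration; real demand models would use far more data and far richer techniques.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: an economic activity index and the demand observed with it.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(activity, demand)

# Forward-looking step: predict demand for an activity level not yet observed.
forecast = intercept + slope * 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

The descriptive step (summarising the history) and the predictive step (the forecast for an unseen activity level) use the same data; what changes is the question being asked of it.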

Figure 1. A comparison of Statistics/BI vs. ‘Data Science’

(Adapted from [2] & Revolution Analytics)


However, the embrace of Data Science by a company represents much more than just the rollout of a new technological toy. It goes to the very heart of the company’s analytical process. As described in the “Field Guide to Data Science”[2] it enables a move away from traditional deductive reasoning, where you form a hypothesis and then look to confirm it empirically with data, towards the inductive method where the data themselves are used to suggest new hypotheses. “By actively combining the ability to reason deductively and inductively, Data Science creates an environment where models of reality no longer need to be static and empirically based. Instead, they are constantly tested, updated and improved until better models are found”. In this manner, Data Science helps a company to continually seek out and discover new competitive advantage.


For commodity trading companies, the traditional competitive advantage emanating from a proprietary physical network of production, logistics and storage assets has been eroding in recent years as information leaks out via new technologies such as satellite imagery. There are two ways in which the embrace of Data Science will allow trading companies to resist this decline and re-establish the importance of owning the physical assets. Firstly, the application of advanced analytics to their proprietary data will further enhance the informational value extracted, especially when combined with other internal and external data to create deeper and perhaps unexpected insights. Secondly, these advanced analytics can be performed on publicly available data to uncover trading opportunities that can only be monetised by those with the necessary physical assets ready to deploy.




How do I implement Data Science in my company?


For the Data Science approach to be effective it needs to be implemented from the top down. This starts with the appointment of a ‘Chief Data Officer’ (CDO) who is responsible first for overseeing all aspects of data governance, and then for co-ordinating the data analytics activity. Good data governance means policies that effectively capture data, ensure its quality and manage access to it. Data often sits in disparate business silos, is commonly captured in different systems (if at all) and is often unstructured and inconsistently formatted. The first job for an incoming CDO is to map out these data sources and devise a plan for storing them in a uniform manner in a centralised ‘data lake’.


Before the magic of Data Science can be performed, a huge amount of preparatory work goes into the collection and validation of data. Data needs to be formatted and sanity checked (human error may have introduced some erroneous values), and missing data points must be dealt with by exclusion or imputation methods. Verifying data is a tedious process but an essential one to avoid a ‘garbage-in garbage-out’ result. Therefore, be aware that there is a significant time lag between the hiring of data scientists and their ability to produce meaningful insights from the firm’s data.
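The preparation steps described above (formatting, sanity checking, then exclusion or imputation) can be sketched in a few lines. This is only an illustration under stated assumptions: the readings, the plausibility bounds and the helper names (`parse`, `sanity_check`, `impute_with_median`) are all hypothetical, and median imputation is just one of many possible imputation methods.

```python
from statistics import median

# Hypothetical daily throughput readings as captured by hand; None is a missing entry.
raw_readings = ["1020", "998", None, "1015", "-40", "1003", "98000", "1011"]

# Assumed plausibility bounds; in practice these would come from domain experts.
MIN_PLAUSIBLE = 0.0
MAX_PLAUSIBLE = 5000.0


def parse(value):
    """Formatting step: convert a raw entry to a float, or None if unusable."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def sanity_check(value):
    """Sanity step: treat values outside the plausible range as missing."""
    if value is None or not (MIN_PLAUSIBLE <= value <= MAX_PLAUSIBLE):
        return None
    return value


def impute_with_median(values):
    """Imputation step: replace missing entries with the median of the valid ones."""
    valid = [v for v in values if v is not None]
    fill = median(valid)
    return [fill if v is None else v for v in values]


checked = [sanity_check(parse(r)) for r in raw_readings]

excluded = [v for v in checked if v is not None]  # option 1: drop bad points
imputed = impute_with_median(checked)             # option 2: fill bad points

print(len(excluded), "valid readings kept by exclusion")
print(imputed)
```

Exclusion keeps only trustworthy points at the cost of a shorter series, whereas imputation preserves the length of the series but adds values that were never observed. Which is appropriate depends on the analysis that follows.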


Ultimately, once you have committed to integrating Data Science into your company’s processes, the key to getting results will be in establishing buy-in from your workforce. That doesn’t mean that every employee needs to become a Data Scientist, but they should understand the value of the company’s data resources and how the Data Science process can help them. Defining the clear goals of the data science function (in terms of what it hopes to produce, and what it will need to get there) and communicating them to the entire firm is critical on this path. These Data Science goals and the business model of the organisation will determine whether the function should be centralised, embedded throughout the firm with the domain experts (the commodity traders), or be a hybrid of the two.



Figure 3. An example of a hybrid organisational structure where a central Data Science function co-ordinates overall data governance and strategy, and individual data scientists are embedded at the product team level to act as a trusted bridge and translate central strategy into commercial success.


By virtue of their proprietary deal flow, entities in the physical commodity space have a unique real-time read on resource supply and demand, and on economic activity. Whilst this ‘edge’ is routinely used for economic gain by individual silos, it is often fiercely guarded. It is essential that these valuable internal data streams are codified and made accessible to the algorithmic muscle of the company’s data science function, as their value is likely to grow as the dataset expands with each additional internal and external source.


Let us not, however, underestimate the complexity of putting these results to work within a commodity company. Common regression-based approaches to machine learning assume independent and identically distributed variables, but the fact that commodity producers and traders are themselves part of the supply equation may invalidate this assumption. Furthermore, the logistical impracticality of altering production volumes may mean that some ‘insights’ (model outputs) are not, after all, ‘actionable’, at least not in the physical markets. This illustrates why legacy commodity teams will not be replaced wholesale by data scientists any time soon. Their domain expertise will be essential to the data scientist in translating raw data sources into usable structured data, in ensuring that practical real-world constraints are factored into the models, and in applying the results intelligently. Perhaps more so than in any other domain, the successful application of Data Science in the commodities market will necessarily be a collaborative effort with the legacy traders.


The journey to full data science ‘maturity’ will take the company through a number of intermediate stages, each one building on the preparatory work of the previous stage and yielding increasingly sophisticated analysis. However, the target for senior management should not be a firm in which every member of staff becomes conversant with advanced statistics and computer science, as this is unrealistic. Rather, the goal should be the development of advanced ‘Data Products’ that allow non-technical users to harness the power of the sophisticated analytics that lie beneath.





As a recent article published by Refinitiv explains, demand for Data Scientists is accelerating, and commodity trading houses, whilst a little slow to start investing in this capability compared to hedge funds and banks, are now starting to close the gap. Data Science promises wide-ranging benefits, from operational optimisation to informational edge and new competitive advantages. Don’t get left behind!



[1] “The world’s most valuable resource is no longer oil, but data”, The Economist, May 6th 2017.
[2] “Field Guide to Data Science”, Booz Allen Hamilton, 2015.