Last updated: June 13, 2023

Inside Carbonfact’s LCA Engine: Dealing With Missing Weight Data

Measuring the environmental footprint of a clothing item requires a lot of data inputs. This includes the materials used, how the materials were made, where they came from, how the product was assembled, and the transportation method that was used, to name a few.

The Problem With Data Collection

It is virtually impossible to have primary data on all these factors, especially if your supply chain is scattered across the globe and you rely only on email communication. Without proper prioritization, you can easily waste a lot of time collecting primary data.

The Power Law: Focusing on the Important Inputs

Luckily, not all of the necessary inputs are equally important. A power law is at play: most of a product’s footprint comes from a few key data points. We’ll dive deeper into this power law in an upcoming article.

The Importance of Product Weight

The weight of a product is probably the most important input when calculating its environmental footprint. It has a direct influence on the amount of material needed, the required packaging, as well as the transportation cost.

Dealing With Missing Weight Data: An Inside Look at Carbonfact

At Carbonfact, we handle missing weights in the same way as we handle other missing information. Our approach involves utilizing available data whenever possible and carefully selecting appropriate constants when necessary. For weights in particular, we have several imputation methods that we try one at a time until one works.
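
The "try one at a time until one works" idea can be sketched as a simple fallback chain. This is a minimal illustration, not Carbonfact's actual code: the function names, the product fields, and every number except the 400 gram dress average quoted later in this article are assumptions made up for the example.

```python
from typing import Callable, Optional

Product = dict  # hypothetical product record, e.g. {"category": "dress"}
Imputer = Callable[[Product], Optional[float]]


def impute_weight(product: Product, imputers: list[tuple[str, Imputer]]) -> tuple[float, str]:
    """Return (weight in grams, name of the method that produced it)."""
    # Primary data always wins when it is present.
    if product.get("weight_g") is not None:
        return product["weight_g"], "primary"
    # Otherwise try each method in order and stop at the first one that answers.
    for name, imputer in imputers:
        weight = imputer(product)
        if weight is not None:
            return weight, name
    raise ValueError("no imputation method produced a weight")


# Hypothetical stand-ins for the four methods described in this article.
def ml_model(product: Product) -> Optional[float]:
    return None  # pretend the model has no prediction for this product


def customer_average(product: Product) -> Optional[float]:
    return {"dress": 400.0}.get(product.get("category"))


def other_customers(product: Product) -> Optional[float]:
    return {"underwear": 60.0}.get(product.get("category"))  # made-up value


def constant(product: Product) -> Optional[float]:
    return 500.0  # made-up catalog-wide constant


CASCADE = [
    ("ml_model", ml_model),
    ("customer_average", customer_average),
    ("other_customers", other_customers),
    ("constant", constant),
]

print(impute_weight({"category": "dress"}, CASCADE))
```

Returning the name of the method alongside the weight keeps a record of which imputation decision was made for each product.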

Using Machine Learning to Guess Product Weights

Our most sophisticated solution involves machine learning. We’ve trained a supervised machine learning model to guess a product’s weight based on whatever information is available. For instance, in the case of shoes, the materials used for the outsole and the upper are strong predictors.
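
To make the idea concrete, here is a toy supervised model in plain Python. The article does not say which algorithm Carbonfact uses, so a nearest-neighbour regressor is swapped in purely as an illustration, and the training rows are invented for the example.

```python
from statistics import mean

# Made-up training rows: (features, weight in grams). Not real Carbonfact data.
TRAIN = [
    ({"outsole": "rubber", "upper": "leather"}, 450.0),
    ({"outsole": "rubber", "upper": "leather"}, 470.0),
    ({"outsole": "rubber", "upper": "canvas"}, 380.0),
    ({"outsole": "eva", "upper": "mesh"}, 260.0),
    ({"outsole": "eva", "upper": "mesh"}, 280.0),
    ({"outsole": "eva", "upper": "canvas"}, 300.0),
]


def mismatches(a: dict, b: dict) -> int:
    """Count the features on which two products disagree (Hamming distance)."""
    return sum(a.get(key) != b.get(key) for key in set(a) | set(b))


def predict_weight(features: dict, train=TRAIN, k: int = 2) -> float:
    """Average the weights of the k training products closest to `features`."""
    nearest = sorted(train, key=lambda row: mismatches(features, row[0]))[:k]
    return mean(weight for _, weight in nearest)


print(predict_weight({"outsole": "eva", "upper": "mesh"}))  # 270.0
```

A production model would likely use many more features and a more capable learner, but the principle is the same: products with known weights teach the model how to guess the unknown ones.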

Building Statistical Models With Available Data

Next, we try to leverage whatever weights are available. For instance, if a customer can provide weights in 50% of cases, then we use that information to build a statistical model. This method can be very simple: if dresses weigh on average 400 grams, with a sufficiently low variance, then that is a good heuristic.
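
A minimal sketch of that heuristic, assuming products are plain dictionaries with hypothetical "category" and "weight_g" fields. The threshold for "sufficiently low variance" is not given in this article; the 25% coefficient of variation used here is an arbitrary placeholder.

```python
from collections import defaultdict
from statistics import mean, stdev

MAX_CV = 0.25  # hypothetical threshold on the coefficient of variation (std / mean)


def fit_category_means(products: list[dict]) -> dict[str, float]:
    """Map each category to its mean known weight, keeping only consistent categories."""
    by_category = defaultdict(list)
    for product in products:
        if product.get("weight_g") is not None:
            by_category[product["category"]].append(product["weight_g"])

    means = {}
    for category, weights in by_category.items():
        if len(weights) < 2:
            continue  # stdev needs at least two observations
        average = mean(weights)
        if stdev(weights) / average <= MAX_CV:
            means[category] = average
    return means


# Invented catalog: dresses are consistent, coats vary too much to average.
catalog = [
    {"category": "dress", "weight_g": 380.0},
    {"category": "dress", "weight_g": 400.0},
    {"category": "dress", "weight_g": 420.0},
    {"category": "dress", "weight_g": None},
    {"category": "coat", "weight_g": 600.0},
    {"category": "coat", "weight_g": 1800.0},
]
means = fit_category_means(catalog)
print(means)  # {'dress': 400.0}
```

Categories that fail the variance check are simply left out, so a product in such a category moves on to the next imputation method.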

Leveraging Existing Customer Data

If a customer is unable to share specific information, we try to look at what data is available for other existing customers. For example, we’ve accumulated a substantial repository of primary data on underwear.

The Last Resort: Fallback To Constant Values

Finally, if there is no data, we fall back to constant values from agreed-upon databases. A good source for this is the PEFCR (Product Environmental Footprint Category Rules). Our customers can also give us reasonable values informally, by word of mouth. In any case, we choose one or several values that are applied to the whole catalog.

Conclusion: Embracing Uncertainty

Of course, all of these methods are imperfect in some sense. They all come with some amount of uncertainty. At Carbonfact, we like to keep track of the imputation decisions we make. A scientific way to do that is to quantify the uncertainty that ensues from each decision. For instance, instead of using a single value, we can express a product’s weight as a range.
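
As a rough sketch of what a range could look like in code, the example below carries a low and a high bound through a calculation. The Range class, the weight bounds, and the emission factor are all invented for illustration, and treating the footprint as a simple multiple of weight is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def scale(self, factor: float) -> "Range":
        """Multiply both bounds by a non-negative factor."""
        if factor < 0:
            raise ValueError("factor must be non-negative")
        return Range(self.low * factor, self.high * factor)


# Hypothetical inputs: an imputed dress weight and an illustrative emission factor.
weight_g = Range(350.0, 450.0)
KG_CO2E_PER_KG = 20.0  # made-up value, not a Carbonfact or PEFCR figure

# Convert grams to kilograms and apply the factor to both bounds.
footprint = weight_g.scale(KG_CO2E_PER_KG / 1000)
print(footprint)
```

The output is itself a range, so the uncertainty introduced by the imputed weight stays visible in the final footprint instead of being hidden behind a single number.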

We’ll take a look at uncertainty in the next article. Stay tuned!

Max Halford

I’m Head of Data at Carbonfact, where we measure the carbon footprint of clothing items. Before that I worked for Alan, a health insurance company. My PhD topic was about applying machine learning, Bayesian networks in particular, to query optimisation in relational databases. My current areas of interest revolve around online machine learning, document processing, as well as tooling and good practices for data analytics and engineering.