Inside Carbonfact’s LCA Engine: Dealing With Missing Weight Data
    Last updated: June 13, 2023

    Measuring the environmental footprint of a clothing item requires a lot of data inputs. This includes the materials used, how the materials were made, where they came from, how the product was assembled, and the transportation method that was used, to name a few.

    The Problem With Data Collection

    It is virtually impossible to have primary data on all these factors, especially if your supply chain is scattered across the globe and you rely only on email communication. Without clear prioritization, you can easily waste a lot of time collecting primary data.

    The Power Law: Focusing on the Important Inputs

    Luckily, not all of the necessary inputs are of equal importance. A power law is at play: most of a product’s footprint comes from a few key data points. We’ll dive deeper into this power law in an upcoming article.

    The Importance of Product Weight

    The weight of a product is probably the most important input when calculating its environmental footprint. It has a direct influence on the amount of material needed, the required packaging, as well as the transportation cost.

    Dealing With Missing Weight Data: An Inside Look at Carbonfact

    At Carbonfact, we handle missing weights in the same way as we handle other missing information. Our approach involves utilizing available data whenever possible and carefully selecting appropriate constants when necessary. For weights in particular, we have several imputation methods that we try one at a time until one works.
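
    The "try one method at a time until one works" idea can be sketched as a simple fallback chain. This is a minimal illustration, not Carbonfact's actual implementation: the function names, the record fields (`weight_g`, `category`), the ordering of the methods, and the numeric constants are all assumptions made for the example.

```python
from typing import Callable, Optional

# Each imputer takes a product record and returns a weight in grams,
# or None when it cannot produce an estimate.
Imputer = Callable[[dict], Optional[float]]


def impute_weight(product: dict, imputers: list[tuple[str, Imputer]]) -> tuple[float, str]:
    """Return the first available estimate and the name of the method that produced it."""
    for name, imputer in imputers:
        weight = imputer(product)
        if weight is not None:
            return weight, name
    raise ValueError("no imputation method produced a weight")


# Hypothetical imputers, ordered from most to least specific.
def reported_weight(product: dict) -> Optional[float]:
    return product.get("weight_g")


def category_average(product: dict) -> Optional[float]:
    averages = {"dress": 400.0}  # illustrative value, echoing the dress example in this article
    return averages.get(product.get("category"))


def catalog_constant(product: dict) -> Optional[float]:
    return 500.0  # placeholder constant for the sketch, not a PEFCR value


CHAIN = [
    ("reported", reported_weight),
    ("category_average", category_average),
    ("catalog_constant", catalog_constant),
]

print(impute_weight({"category": "dress"}, CHAIN))
```

    Returning the method name alongside the weight keeps a record of which imputation decision was made for each product.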


    Using Machine Learning to Guess Product Weights

    Our most sophisticated solution involves machine learning. We’ve trained a supervised machine learning model to estimate a product’s weight based on whatever information is available. For instance, in the case of shoes, the materials used for the outsole and the upper are strong predictors.

    Building Statistical Models With Available Data

    Next, we try to leverage whatever weights are available. For instance, if a customer can provide weights in 50% of cases, then we use that information to build a statistical model. This method can be very simple: if dresses weigh on average 400 grams, with a sufficiently low variance, then that is a good heuristic.
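
    The "average with sufficiently low variance" heuristic could look like the sketch below. It uses the coefficient of variation (standard deviation divided by the mean) as one possible way to decide what counts as low variance. That choice, the function name, and the `max_cv` and `min_samples` thresholds are assumptions for illustration, not values from Carbonfact.

```python
from statistics import mean, stdev
from typing import Optional


def category_estimate(
    weights_g: list[float], max_cv: float = 0.2, min_samples: int = 5
) -> Optional[float]:
    """Return the mean weight if the sample is large and tight enough, else None."""
    if len(weights_g) < min_samples:
        return None  # too few known weights to trust an average
    avg = mean(weights_g)
    if avg <= 0:
        return None  # weights must be positive for the ratio below to make sense
    cv = stdev(weights_g) / avg  # coefficient of variation: spread relative to the mean
    return avg if cv <= max_cv else None


# Known dress weights from the part of the catalog where the customer has data.
known_dresses = [380.0, 390.0, 400.0, 410.0, 420.0]
print(category_estimate(known_dresses))
```

    When the function returns `None`, the category average is not trusted and another imputation method has to take over.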

    Leveraging Existing Customer Data

    If a customer is unable to share specific information, we try to look at what data is available for other existing customers. For example, we’ve accumulated a substantial repository of primary data on underwear.

    The Last Resort: Fallback To Constant Values

    Finally, if there is no data, we fall back to constant values from agreed-upon databases. One good source for this is the PEFCR (Product Environmental Footprint Category Rules). Our customers can also suggest reasonable values informally, based on their own experience. In either case, we choose one or several values that are applied to the whole catalog.

    Conclusion: Embracing Uncertainty

    Of course, all of these methods are imperfect in some sense. They all come with some amount of uncertainty. At Carbonfact, we like to keep track of the imputation decisions we make. A scientific way to do that is to quantify the uncertainty that ensues from each decision. For instance, instead of using a single value, we can express a product’s weight as a range.
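
    One way to represent a weight as a range while recording the decision behind it is sketched below. The `WeightRange` type, its field names, and the "mean plus or minus `k` standard deviations" rule are assumptions chosen for illustration; the article does not say how Carbonfact derives its ranges.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class WeightRange:
    low_g: float
    high_g: float
    method: str  # which imputation decision produced the range

    @property
    def width_g(self) -> float:
        return self.high_g - self.low_g


def measured(weight_g: float) -> WeightRange:
    """A weight provided by the customer: the range collapses to a single point."""
    return WeightRange(weight_g, weight_g, "measured")


def from_sample(weights_g: list[float], k: float = 2.0) -> WeightRange:
    """Range of mean +/- k standard deviations; needs at least two known weights."""
    avg, spread = mean(weights_g), stdev(weights_g)
    return WeightRange(max(0.0, avg - k * spread), avg + k * spread, "category_average")


print(measured(320.0))
print(from_sample([380.0, 390.0, 400.0, 410.0, 420.0]))
```

    A wider range signals a less certain imputation, which makes it easier to see where collecting primary data would help the most.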

    We’ll take a look at uncertainty in the next article. Stay tuned!
