
Sustainable AI

The past 50 years have seen a dramatic increase in the amount of computing capability available per person, much of it now driven by AI. Despite the positive societal benefits, AI technologies can come with significant environmental implications. To scale AI sustainably, we need to make AI, and computing more broadly, efficient and flexible. Going beyond efficiency, we look into designing and optimizing AI models and computing infrastructures across their lifecycle. This repository provides resources from our research, from innovations and artifacts to talks and education materials, to advance sustainably toward artificial general intelligence.

Overview: Scaling AI Sustainably

The past 50 years have seen a dramatic increase in the amount of computing capability available per person. Since the introduction of the first commercially available microprocessor, the Intel 4004 released in 1971, the number of transistors on a computer die has increased by over a millionfold. Accelerating knowledge discovery in science and engineering demands even more computation capability than a single microprocessor can offer.

High-performance computing infrastructures, from supercomputers tackling weather forecasting, molecular modeling, and other complex computational problems to AI training clusters, connect tens of thousands of processors using advanced networking gear to further scale up computation capability. For example, the Frontier supercomputer, the world's first exascale supercomputer, uses more than 37,000 Graphics Processing Units (GPUs) to deliver more than 1 quintillion calculations per second. And, to propel our reach toward artificial general intelligence, AI superclusters, such as Google's Tensor Processing Unit (TPU) based AI Hypercomputer or Meta's Research SuperCluster (RSC), also rely on horizontal scaling to achieve high performance, scalability, and efficiency.

Through data center and domain-specific hardware specialization, the computing industry has achieved many orders-of-magnitude efficiency improvements over decades of technological innovation. Such efficiency improvement is key to keeping the energy use of data centers globally roughly constant: between 2010 and 2018, compute instances in global data centers increased by 5.5 times while overall energy use grew by only 6%.

AI is revolutionizing entire industries, from education and medicine to e-commerce, finance, and entertainment. OpenCatalyst uses AI to discover new electrocatalysts for efficient renewable energy storage. AlphaFold uses AI to rapidly predict protein structures, with the potential to revolutionize the entire domain of biological science. FarmBeats uses AI to improve farming efficiency. AI is profoundly changing the way we live, learn, communicate, and interact with each other. Despite the positive societal benefits, the development of AI technologies drives up data center energy use and associated greenhouse gas emissions through rising demand for computing resources. Between 2019 and 2021, the amount of data used for AI model training more than doubled, corresponding to a more than 20 times increase in model size. In fact, since the inception of AlexNet in 2012, the number of parameters in state-of-the-art AI models has grown exponentially to the scale of trillions, requiring terabytes of memory capacity for storage. The AI scaling trend is pushing the frontier of computing infrastructures.

The first step is to understand AI's carbon impact holistically across its lifecycle. AI's lifecycle carbon impact comes from the use of AI systems, called the operational carbon footprint, as well as from the manufacturing of AI systems and data center construction materials, called the embodied carbon footprint.

Operational Carbon Footprint

The operational carbon footprint is the carbon emissions, measured in carbon dioxide equivalent (CO2e), associated with the electricity consumed by AI systems. For a key production recommendation model, the energy consumption breakdown is roughly 30:30:40 across the key phases of data, experimentation/training, and inference. For example, the operational carbon footprint of Llama 3 model training is estimated at 2,290 tonnes of CO2e, using the GPU Thermal Design Power (TDP) of 700W and the average emission factor of US grids. In reality, AI workflows typically involve dozens, if not hundreds, of training runs to produce a final model. Once a model is trained to the desired accuracy level, it is further optimized to serve various product use cases. In the case of the language translation model, the inference footprint can double the training carbon footprint over the model's entire lifetime.
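The estimate above reduces to GPU-hours times TDP times the grid emission factor, which can be sketched in a few lines of Python. The GPU-hours figure and the grid emission factor below are illustrative assumptions (neither appears in this document), not official accounting.

```python
# Back-of-the-envelope operational carbon for a training run:
# energy (kWh) = GPU-hours x TDP (kW); carbon (t) = energy x grid factor.
# Inputs are illustrative assumptions, not values from this repository.

def operational_carbon_tonnes(gpu_hours: float, tdp_watts: float,
                              grid_kg_co2e_per_kwh: float) -> float:
    """Operational CO2e in tonnes for a training run at full TDP."""
    energy_kwh = gpu_hours * (tdp_watts / 1000.0)  # watts -> kilowatts
    return energy_kwh * grid_kg_co2e_per_kwh / 1000.0  # kg -> tonnes

# Assumed ~7.7M GPU-hours at the 700 W TDP cited above, with a rough
# US-average grid intensity of ~0.423 kg CO2e/kWh, lands near the
# ~2,290 t figure quoted in the text.
estimate = operational_carbon_tonnes(7.7e6, 700, 0.423)
```

Note the model charges every GPU-hour at full TDP; measured draw is usually below TDP, so this is an upper-bound style estimate.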

Embodied Carbon Footprint

AI's carbon impact goes beyond emissions associated with the electricity needed to power the models (operational energy use) to include the embodied emissions of the required infrastructure, such as semiconductor manufacturing and the steel and cement used for data center construction. AI systems used for model training and inference at scale carry manufacturing carbon emissions produced during the production of system hardware. Carbon embodied in AI system hardware constitutes a substantial portion of AI's overall carbon footprint. Taking the multilingual translation task as an example, the model's overall lifecycle carbon footprint is approximately four times higher than the operational carbon footprint of model training.

For consumer electronics, the embodied carbon footprint is more significant than the operational carbon footprint over the computing device's lifecycle, with a rough breakdown of 80 to 20. As consumer electronics evolve into wearables that seed the next wave of computing, additional data center capacity will be required to support the new computing paradigm, and we expect a similar embodied-to-operational carbon footprint breakdown for wearables such as smart glasses. Tomorrow, we will have augmented reality with contextual AI capability. This is an important time for us, computer system designers, to innovate with sustainability in mind. What do sustainability-first design principles look like for the next wave of computing?
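The embodied-versus-operational split above is a simple bookkeeping identity that can be made concrete. The device numbers below are hypothetical, chosen only so the resulting split mirrors the rough 80:20 breakdown cited in the text.

```python
def lifecycle_split(embodied_kg: float, operational_kg_per_year: float,
                    lifetime_years: float):
    """Return (total lifecycle CO2e in kg, embodied share of that total).

    Lifecycle carbon = embodied (one-time, from manufacturing) plus
    operational (recurring, from electricity) over the device lifetime.
    """
    operational = operational_kg_per_year * lifetime_years
    total = embodied_kg + operational
    return total, embodied_kg / total

# Hypothetical phone-like device over a 3-year lifetime; numbers are
# placeholders picked to land near the ~80:20 split discussed above.
total, embodied_share = lifecycle_split(embodied_kg=64.0,
                                        operational_kg_per_year=5.3,
                                        lifetime_years=3)
```

One design implication: extending device lifetime amortizes the fixed embodied term over more years of use, shrinking the embodied share.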

Carbon Tools

CATransformers

https://github.com/facebookresearch/CATransformers

To enable carbon-guided AI model-hardware design space exploration, we design CATransformers. CATransformers employs joint neural network and hardware architecture search and optimizes for the total carbon footprint. By incorporating both operational and embodied carbon metrics into early design space exploration of domain-specific hardware accelerators, CATransformers demonstrates that optimizing for carbon yields design choices distinct from those optimized solely for latency or energy efficiency.
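The core observation, that the carbon-optimal design point can differ from the latency- or energy-optimal one, can be illustrated with a toy search loop. This is not CATransformers' actual search algorithm; the config names and numbers are hypothetical.

```python
def carbon_aware_search(configs, accuracy_floor):
    """Toy stand-in for joint model/hardware search on total carbon.

    Each config is a candidate (model, hardware) pairing with a predicted
    accuracy plus operational and embodied carbon estimates. Among
    configs that meet the accuracy floor, pick the lowest TOTAL carbon.
    """
    feasible = [c for c in configs if c["accuracy"] >= accuracy_floor]
    return min(feasible, key=lambda c: c["op_carbon"] + c["emb_carbon"])

# Hypothetical candidates: the config with the lowest operational carbon
# is not the one with the lowest total (operational + embodied) carbon.
configs = [
    {"name": "small-model/big-chip", "accuracy": 0.81,
     "op_carbon": 3.0, "emb_carbon": 9.0},
    {"name": "big-model/small-chip", "accuracy": 0.84,
     "op_carbon": 7.0, "emb_carbon": 4.0},
    {"name": "fast-low-acc", "accuracy": 0.70,
     "op_carbon": 1.0, "emb_carbon": 2.0},
]
best = carbon_aware_search(configs, accuracy_floor=0.80)
```

Here the embodied term flips the ranking: optimizing operational carbon alone would pick the first candidate, while total carbon favors the second, which is the kind of divergence CATransformers surfaces.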


ACT

https://github.com/facebookresearch/ACT

To enable carbon-guided computer system design, we design ACT, an architectural carbon modeling tool for sustainable computer systems. ACT comprises an analytical, architectural carbon footprint model and use-case-dependent optimization metrics to estimate the embodied carbon footprint of hardware. It estimates greenhouse gas (GHG) emissions from hardware manufacturing based on workload characteristics, hardware specifications, semiconductor fab characteristics, and environmental factors.
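A simplified sketch of an ACT-style embodied carbon model is shown below: manufacturing carbon per chip scales with die area, charged for fab energy, direct gas emissions, and material sourcing, and divided by yield so scrapped dies are billed to the good ones. This is a simplified restatement, not the full ACT model, and every parameter value in the example is a placeholder rather than a calibrated fab number.

```python
def embodied_carbon_g(area_cm2: float, yield_frac: float,
                      epa_kwh_per_cm2: float, fab_ci_g_per_kwh: float,
                      gpa_g_per_cm2: float, mpa_g_per_cm2: float,
                      packaging_g: float = 150.0) -> float:
    """ACT-style embodied CO2e (grams) for one packaged chip, simplified.

    Carbon-per-area combines fab energy (EPA x fab grid intensity),
    direct fab gas emissions (GPA), and material procurement (MPA);
    dividing die area by yield charges defective dies to good ones.
    """
    cpa = fab_ci_g_per_kwh * epa_kwh_per_cm2 + gpa_g_per_cm2 + mpa_g_per_cm2
    return (area_cm2 / yield_frac) * cpa + packaging_g

# Placeholder numbers for a hypothetical ~1 cm^2 die; not calibrated.
chip_g = embodied_carbon_g(area_cm2=1.0, yield_frac=0.875,
                           epa_kwh_per_cm2=2.15, fab_ci_g_per_kwh=500.0,
                           gpa_g_per_cm2=200.0, mpa_g_per_cm2=500.0)
```

The yield term is why process maturity matters for sustainability: the same die on a lower-yield node carries strictly more embodied carbon.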

ACT addresses a crucial gap in quantifying embodied emissions, enabling hardware design space exploration that treats sustainability as a first-order design objective alongside performance, power, and area.


Carbon Explorer

https://github.com/facebookresearch/CarbonExplorer

To enable carbon-guided datacenter computing, we design Carbon Explorer, a holistic framework for designing carbon-aware data centers. Carbon Explorer evaluates the operational and embodied carbon trade-offs of renewable energy procurement, battery storage, and carbon-aware workload scheduling.
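A minimal sketch of the kind of trade-off such a framework sweeps: given an hourly demand trace and a renewable generation profile, how much of the load is covered directly, and how much more does battery storage unlock? The traces and battery size below are toy assumptions; real analyses use datacenter power traces and solar/wind profiles.

```python
def renewable_coverage(demand_kw, solar_kw, battery_kwh=0.0):
    """Fraction of hourly demand met by solar plus an optional battery.

    Direct solar serves demand first; surplus charges the battery (up to
    capacity); the battery discharges to cover any remaining gap. A toy
    stand-in for the renewable-mix / storage sweep described above.
    """
    stored = 0.0
    served = 0.0
    for d, s in zip(demand_kw, solar_kw):
        direct = min(d, s)
        stored = min(battery_kwh, stored + (s - direct))  # charge w/ surplus
        draw = min(d - direct, stored)                    # discharge to gap
        stored -= draw
        served += direct + draw
    return served / sum(demand_kw)

# Flat 10 kW demand against a day/night solar profile (8 toy hours):
demand = [10.0] * 8
solar = [0.0, 0.0, 25.0, 25.0, 25.0, 0.0, 0.0, 0.0]
no_batt = renewable_coverage(demand, solar)
with_batt = renewable_coverage(demand, solar, battery_kwh=40.0)
```

In this toy trace, storage doubles renewable coverage (0.375 to 0.75) by moving midday surplus into the night hours; the embodied carbon of the battery itself is the counterweight a full analysis must also price in.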


Bending the Demand Curve with Efficiency

To develop AI, this century's most important technology, sustainably, we must make AI, and computing more broadly, efficient and flexible. Ample efficiency optimization opportunities exist across the entire AI model lifecycle, spanning data, algorithms, and system hardware.

Data efficiency:

AI models train on large amounts of data. When designed well, data scaling, sampling, and selection strategies can lead to faster training time and higher model quality at the same time. Realizing this data efficiency potential is not straightforward in a world where data formats are highly fragmented, data modalities are diverse, and the quality of training samples is uncertain. This is why a common metadata format for AI, such as Croissant, is of paramount importance to sustain the increasing infrastructure demands of data. Complementary to AI data optimization, the data storage and ingestion pipeline for AI demands significant power capacity. An optimized composite data storage infrastructure using novel application-aware cache policies can absorb more than 3 times the IO of a baseline LRU flash cache, reducing power demand in a petabyte-scale production AI training cluster by 29%.
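To make the baseline concrete, here is a minimal LRU cache with hit/miss accounting: every hit is an IO the backend storage tier (and its power budget) never sees. The application-aware policies referenced above replace the generic eviction rule below with knowledge of AI training access patterns; this sketch is only the baseline, not those policies.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache stand-in for reasoning about absorbed IO."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # refresh recency on a hit
            self.hits += 1
        else:
            self.misses += 1             # miss -> one backend IO
            self.store[key] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recent

    def hit_rate(self) -> float:
        return self.hits / (self.hits + self.misses)

# Tiny trace: capacity 2, so "c" evicts "b" and the reuse of "a" hits.
cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "a"]:
    cache.access(key)
```

A smarter, workload-aware policy raises the hit rate on the same trace and capacity, which is exactly where the 3x IO absorption cited above comes from.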

Model efficiency:

Making machine learning models parameter-efficient contributes to more effective use of energy. Taking the Llama family of foundation language models as an example, LLaMA-13B outperforms GPT-3 (175B) on a variety of tasks while consuming approximately 24 times less energy. As a parameter-efficient model for language tasks, Llama is superior across the key design dimensions of accuracy, training time, energy consumption, and operational carbon footprint. Model optimization for efficient inference is also fertile ground: inference-time techniques such as CHAI, which prunes redundant attention heads by clustering them over tokens in Transformer-based large language models (LLMs), or LayerSkip, which exits early from deeper layers, can improve inference efficiency significantly. Model efficiency is more important now than ever to help bend AI's rising energy demand.
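The head-redundancy idea can be sketched greedily: measure each head's output on a probe batch, and drop any head whose output is nearly collinear with a head already kept. This is an illustrative simplification, not the published CHAI algorithm, and the similarity threshold is an assumption.

```python
import numpy as np

def prune_redundant_heads(head_outputs, sim_threshold=0.98):
    """Greedy redundant-head pruning sketch (simplified, CHAI-flavored).

    head_outputs: (num_heads, dim) array of per-head activations on a
    probe batch. A head is dropped if its (absolute) cosine similarity
    to any already-kept head exceeds the threshold; the kept head then
    stands in for its near-duplicates at inference time.
    """
    normed = head_outputs / np.linalg.norm(head_outputs, axis=1,
                                           keepdims=True)
    kept = []
    for i, h in enumerate(normed):
        if all(abs(h @ normed[j]) < sim_threshold for j in kept):
            kept.append(i)
    return kept

# Three near-duplicate heads plus one distinct head: the duplicates of
# head 0 are pruned, so only head 0 and (at most) the distinct head stay.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
heads = np.stack([base, base * 1.01, base + 0.001, rng.normal(size=8)])
kept = prune_redundant_heads(heads)
```

Fewer surviving heads means proportionally less attention compute per token, which is where the inference energy savings come from.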

System efficiency:

By enabling agile design space exploration for large model acceleration on distributed systems, the MAD-Max performance modeling framework lets machine learning researchers navigate the design space of model parallelization and hardware deployment strategies along the Pareto frontier of training time and energy use. For the multilingual language model, a combination of data locality optimization, GPU acceleration, low-precision data formats, and algorithmic optimization can deliver over 800 times energy efficiency improvement. In addition, optimization frameworks such as Zeus help model developers navigate the trade-off between operational energy consumption and performance by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs, reducing wasted energy in model training.

Going Beyond Efficiency

Making AI more flexible and resilient to a changing environment also helps. Scale matters: as model training grows from hundreds to tens of thousands of GPUs, training workflow failures occur more frequently and degrade training time substantially. Fault-tolerant, resilient distributed training frameworks are therefore becoming more important, not only for large-scale machine learning models but also for distributed training environments such as federated learning at the edge. While existing software system stacks do not yet support flexible computation shifting, such flexibility would give application developers and data center operators an important lever to explicitly annotate and expose flexibility at the level of functions, programs, services, or workloads. This enables better control and management decisions in response to dynamic signals, such as compute resource availability, electricity availability, or the carbon intensity of energy, for improved resiliency and sustainability.
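The computation-shifting lever above reduces, in its simplest form, to scheduling a deferrable job into the lowest-carbon window before its deadline. The function and hourly forecast below are hypothetical illustrations of that idea, not an API from any of the tools in this repository.

```python
def schedule_flexible_job(carbon_intensity, duration_h, deadline_h):
    """Pick the start hour minimizing total carbon for a deferrable job.

    carbon_intensity: forecast gCO2e/kWh for each upcoming hour. The
    job needs `duration_h` contiguous hours and must finish by hour
    `deadline_h`. Returns the carbon-minimizing start hour.
    """
    best_start, best_cost = 0, float("inf")
    for start in range(deadline_h - duration_h + 1):
        cost = sum(carbon_intensity[start:start + duration_h])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start

# Hypothetical forecast with a windy (low-intensity) overnight window:
# a 3-hour deferrable job shifts into hours 3-5 instead of running now.
forecast = [420, 400, 380, 150, 140, 160, 300, 350]
start = schedule_flexible_job(forecast, duration_h=3, deadline_h=8)
```

The same window-scan generalizes to interruptible jobs and to other dynamic signals (spot capacity, electricity price) by swapping the per-hour cost vector.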

AI can be part of the solution space by accelerating grid decarbonization and advances in renewable energy storage technology. A significant fraction, 5 to 15%, of the renewable energy generated in grids around the world today goes to waste due to curtailment. More accurate energy demand and emission forecasting methods give power grid operators effective signals to improve grid-level demand-response management. However, emission forecasting is a dynamic and complex problem space: predicting energy demand in a highly interconnected grid transmission network, as well as renewable energy availability under changing weather conditions, requires building and running computationally intensive physical models and simulations. This is where AI can help, by predicting when and where renewable energy is stranded. The capability to accurately forecast when and where a renewable surplus occurs can improve renewable energy utilization, enabling more effective demand-response management at the grid level and thereby accelerating grid decarbonization. What other complex climate problems can we leverage AI to solve?
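The curtailment quantity a forecaster would target can be stated precisely: whenever hourly renewable generation exceeds what the grid can absorb, the surplus is wasted. The toy traces below are hypothetical; real curtailment also depends on transmission and market constraints, which this sketch deliberately ignores.

```python
def curtailed_fraction(generation_mw, demand_mw):
    """Fraction of renewable generation wasted when demand can't absorb it.

    For each hour, any generation above absorbable demand is counted as
    curtailed; the result is curtailed energy over total generation. A
    toy stand-in for the 5-15% grid-level figure discussed above.
    """
    curtailed = sum(max(0.0, g - d)
                    for g, d in zip(generation_mw, demand_mw))
    return curtailed / sum(generation_mw)

# Hypothetical hourly traces: midday generation overshoots demand.
gen = [50, 80, 120, 110, 60]
demand = [70, 90, 100, 100, 80]
frac = curtailed_fraction(gen, demand)
```

A forecaster that predicts the over-generation hours in advance lets operators shift flexible load into exactly those hours, converting the curtailed term into served demand.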

Last but not least, lowering the carbon embodied in hardware helps mitigate AI's carbon impact. Taking the Apple iPhone 3 from 2009 and the iPhone 11 released a decade later as an example, the operational carbon footprint improved by 1.6 times while the manufacturing carbon footprint increased by 3 times. This is primarily driven by more advanced hardware architectures with a much larger collection of application-specific accelerators and a higher semiconductor manufacturing environmental footprint. More than 80% of the iPhone 11's lifecycle carbon footprint is embodied in the hardware, while less than 20% comes from operational use. This shift opens up a new design space for computer system designers. How do we minimize the lifecycle carbon emissions of AI and computing by lowering the carbon embodied in hardware? What do sustainability-driven computer systems look like when we keep end-of-life processing in mind at design time? How do we weigh and co-design operational and embodied carbon footprints to minimize AI and computing's lifecycle carbon and environmental footprint?

Looking Forward

The past decades witnessed impressive technology advances that bootstrapped the AI revolution. A laser focus on performance and energy efficiency optimization has made computing capable, cost-effective, and prevalent. Given the continued growth of AI and its applications, we must keep optimizing efficiency holistically, across the AI model development cycle and across the hardware-software system stack.

To sustainably scale this century's most important technology, we must go beyond efficiency. We need AI to be flexible, to be part of the solution space, and to come with minimal manufacturing environmental footprints in order to achieve an environmentally sustainable future for computing. To do so, we must be able to measure, and that will not be straightforward: we need a standard, scientifically rigorous emissions accounting methodology for AI. A common, easy-to-use carbon accounting standard is a key step toward systematic and transparent assessment of modern AI use cases and toward incentivizing meaningful climate action.

Characterizing and analyzing carbon emissions is a complex process. While there are initial efforts, such as MLPerf Power, which proposes a way to measure the power consumption of AI systems, there are not yet standardized carbon metrics or tools. This is an area where active research is needed. Carbon Explorer and ACT are just the initial steps to arm the computing industry, including Meta, and the research community with design space exploration tools that treat carbon as a first-class design principle. Considering carbon alongside performance, power, and cost efficiency opens new opportunities for AI system stack design. While enabling more sustainable scaling for AI, there are also plenty of impactful opportunities to deploy AI to tackle climate challenges.

Research Publications

Acknowledgement

We would like to thank many colleagues at Meta and collaborators from the academic research community and the computing industry. It is upon us, each and every one of us, to contribute to a sustainable future for computing and for society.

License

This repository is CC-BY-NC 4.0 licensed, as found in the LICENSE file. The tools in the sub-repositories have their own individual licenses.
