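Original author	TensorFlow Team	Images	Provided by the TensorFlow Team; featured image: Designed by Freepik
Translation	宗諭	Review	阿吉老師
Note	Many thanks to the TensorFlow team for authorizing this translation. The original article can be found here.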

We are excited to introduce a new optimization toolkit in TensorFlow: a suite of techniques that developers, both novice and advanced, can use to optimize machine learning models for deployment and execution.

While we expect these techniques to be useful for optimizing any TensorFlow model for deployment, they are particularly important for TensorFlow Lite developers who serve models on devices with tight memory, power, and storage constraints. If you haven't tried out TensorFlow Lite yet, you can find out more about it here.

The first technique we are adding is post-training quantization, now supported in the TensorFlow Lite conversion tool. For relevant machine learning models, it can deliver up to 4x compression and up to 3x faster execution.

By quantizing their models, developers also gain the additional benefit of reduced power consumption, which is useful when deploying to edge devices beyond mobile phones.

Enabling post-training quantization

The post-training quantization technique is integrated into the TensorFlow Lite conversion tool. Getting started is easy: after building their TensorFlow model, developers simply enable the 'post_training_quantize' flag in the TensorFlow Lite conversion tool. Assuming the SavedModel is stored in saved_model_dir, the quantized TensorFlow Lite flatbuffer can be generated as follows:

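```python
import tensorflow as tf  # TensorFlow 1.x, where tf.contrib.lite is available

saved_model_dir = "/tmp/saved_model"  # hypothetical path to your exported SavedModel
converter = tf.contrib.lite.TocoConverter.from_saved_model(saved_model_dir)
converter.post_training_quantize = True  # quantize weights after training
tflite_quantized_model = converter.convert()
with open("quantized_model.tflite", "wb") as f:  # straight quotes; file is closed
    f.write(tflite_quantized_model)
```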
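After writing quantized_model.tflite, it can be worth sanity-checking the result. The sketch below is not part of the TensorFlow tooling; it is a small stdlib-only helper that assumes the standard TensorFlow Lite flatbuffer layout, in which the 4-byte file identifier "TFL3" follows the 4-byte root-table offset at the start of the file. The helper names (is_tflite_header, describe_model_file, compression_ratio) are hypothetical.

```python
import os

# Assumption: a TensorFlow Lite flatbuffer stores the file identifier b"TFL3"
# at byte offset 4, right after the 4-byte root-table offset.
TFLITE_IDENTIFIER = b"TFL3"

def is_tflite_header(header: bytes) -> bool:
    """Return True if the first 8 bytes look like a TensorFlow Lite flatbuffer."""
    return len(header) >= 8 and header[4:8] == TFLITE_IDENTIFIER

def describe_model_file(path: str) -> tuple:
    """Return (looks_like_tflite, size_in_bytes) for a model file on disk."""
    with open(path, "rb") as f:
        header = f.read(8)
    return is_tflite_header(header), os.path.getsize(path)

def compression_ratio(float_path: str, quantized_path: str) -> float:
    """Size of the float model file divided by the size of the quantized one."""
    return os.path.getsize(float_path) / os.path.getsize(quantized_path)
```

For example, compression_ratio("model.tflite", "quantized_model.tflite") would compare a float conversion made without the flag (model.tflite is a hypothetical file name) against the quantized file; based on the measurements reported in this article, a ratio somewhat below 4 is the expected ballpark.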

Our tutorial walks you through how to do this in depth. In the future, we aim to incorporate this technique into general TensorFlow tooling as well, so that it can be used for deployment on platforms not currently supported by TensorFlow Lite.

Benefits of post-training quantization

4x reduction in model sizes
Models that consist primarily of convolutional layers get 10–50% faster execution
RNN-based models get up to 3x speed-up
Due to reduced memory and computation requirements, we expect that most models will also have lower power consumption
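The 4x size figure follows from storage arithmetic: each weight shrinks from a 32-bit float (4 bytes) to an 8-bit integer (1 byte). The sketch below works through that arithmetic for a hypothetical 4.2-million-parameter model. Real files come out slightly larger than a quarter of the original, presumably because graph structure, quantization parameters, and any tensors kept in float do not shrink.

```python
# Back-of-the-envelope storage estimate for the weights alone
# (graph structure and metadata are ignored).
BYTES_FLOAT32 = 4  # one 32-bit float weight
BYTES_INT8 = 1     # one 8-bit integer weight

def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Bytes needed to store num_params weights at the given width."""
    return num_params * bytes_per_param

num_params = 4_200_000  # hypothetical parameter count, not a measured model
float_size = weight_bytes(num_params, BYTES_FLOAT32)
quant_size = weight_bytes(num_params, BYTES_INT8)
print(f"float32: {float_size / 1e6:.1f} MB, int8: {quant_size / 1e6:.1f} MB, "
      f"ratio: {float_size / quant_size:.1f}x")
```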

The graphs below show model size reduction and execution-time speed-ups for a few models (measured on an Android Pixel 2 phone using a single core).

Figure 1. Model Size Comparison: Optimized models are almost 4x smaller.

Figure 2. Latency Comparison: Optimized models are 1.2 to 1.4x faster.

These speed-ups and model size reductions come with little impact on accuracy. In general, models that are already small for the task at hand (for example, MobileNet v1 for image classification) may experience more accuracy loss. For many of these models, we provide pre-trained fully quantized models.

Figure 3. Accuracy Comparison: Optimized models show a negligible accuracy drop, except for MobileNets.

We expect to continue improving our results in the future, so please see the model optimization guide for the latest measurements.

How post-training quantization works

Under the hood, we run optimizations (otherwise referred to as quantization) that lower the precision of the parameters (i.e., neural network weights) from their training-time 32-bit floating-point representations to much smaller and more efficient 8-bit integer ones. See the post-training quantization guide for more details.
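To make the float-to-int8 mapping concrete, here is a minimal sketch of symmetric linear quantization with a single scale per tensor: each weight w is stored as q = round(w / s), with s = max|w| / 127, and recovered as approximately q * s. This is an illustrative assumption, not necessarily the exact scheme TensorFlow Lite uses; details such as symmetric versus asymmetric ranges or per-channel scales may differ.

```python
from typing import List, Tuple

def quantize(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights to int8 values."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # s = max|w| / 127
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights: w is roughly q * s."""
    return [v * scale for v in q]

weights = [0.50, -1.27, 0.003, 0.9]
q, scale = quantize(weights)
approx = dequantize(q, scale)
max_error = max(abs(a - w) for a, w in zip(approx, weights))
print(q, scale, max_error)
```

Because of rounding, each recovered weight is off by at most s / 2; that bounded error is the accuracy cost traded for storing one byte per weight instead of four.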

These optimizations pair the reduced-precision operation definitions in the resulting model with kernel implementations that use a mix of fixed- and floating-point math. The heaviest computations run fast in lower precision while the most sensitive ones run in higher precision, which typically results in little to no accuracy loss for the task yet a significant speed-up over pure floating-point execution. For operations that have no matching "hybrid" kernel, or where the toolkit deems it necessary, the parameters are reconverted to higher floating-point precision for execution. Please see the post-training quantization page for a list of supported hybrid operations.
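The hybrid execution described above can be sketched for a single dot product, the core operation of fully connected and convolution layers. In this sketch, which reflects an assumption about the general approach rather than TensorFlow Lite's kernel code, weights are stored as int8 with one scale, float activations are quantized on the fly, the multiply-accumulate runs in integer arithmetic, and the result is rescaled to float. Ops outside a hypothetical HYBRID_OPS set fall back to reconverting the weights to float, mirroring the fallback path the paragraph describes.

```python
from typing import List, Tuple

# Hypothetical set of op names that have a hybrid kernel in this sketch.
HYBRID_OPS = {"fully_connected"}

def quantize_symmetric(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(v / scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale

def hybrid_dot(x: List[float], w_q: List[int], w_scale: float) -> float:
    # Quantize the float activations on the fly.
    x_q, x_scale = quantize_symmetric(x)
    # The heavy multiply-accumulate runs entirely in integer arithmetic.
    acc = sum(a * b for a, b in zip(x_q, w_q))
    # Rescale the integer accumulator back to a float output.
    return acc * x_scale * w_scale

def float_dot(x: List[float], w_q: List[int], w_scale: float) -> float:
    # Fallback: reconvert the int8 weights to float, then compute in float.
    w = [v * w_scale for v in w_q]
    return sum(a * b for a, b in zip(x, w))

def run_dot(op_name: str, x: List[float], w_q: List[int], w_scale: float) -> float:
    """Dispatch to a hybrid kernel when one exists, otherwise fall back to float."""
    if op_name in HYBRID_OPS:
        return hybrid_dot(x, w_q, w_scale)
    return float_dot(x, w_q, w_scale)

# Usage: weights are quantized once at conversion time, activations at run time.
weights = [0.5, -1.27, 0.9]
w_q, w_scale = quantize_symmetric(weights)
x = [1.0, 2.0, 3.0]
exact = sum(a * b for a, b in zip(x, weights))
print(exact, run_dot("fully_connected", x, w_q, w_scale), run_dot("other_op", x, w_q, w_scale))
```

The hybrid path trades a small approximation error on the activations for integer arithmetic in the inner loop, while the fallback path keeps float math at the cost of that speed-up.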

Future work

We will continue to improve post-training quantization as well as work on other techniques which make it easier to optimize models. These will be integrated into relevant TensorFlow workflows to make them easy to use.

Post-training quantization is the first offering under the umbrella of the optimization toolkit that we are developing. We look forward to getting developer feedback on it.

Please file issues at GitHub and ask questions at Stack Overflow.

The TensorFlow Team
