Use `ane_transformers` as a reference PyTorch implementation if you are considering deploying your Transformer models on Apple devices with an A14 or newer chip, or an M1 or newer chip, to achieve up to 10 times faster inference and 14 times lower peak memory consumption compared to baseline implementations.

`ane_transformers.reference` comprises a standalone reference implementation, and `ane_transformers.huggingface` comprises optimized versions of Hugging Face model classes such as distilbert, to demonstrate the application of the optimization principles laid out in our research article to existing third-party implementations.

Please check out our research article for a detailed explanation of the optimizations, as well as interactive figures to explore latency and peak memory consumption data from our case study: Hugging Face distilbert model deployment on various devices and operating system versions. The figures below are non-interactive snapshots from the research article for iPhone 13 with iOS 16.0 installed.

## Tutorial: Optimized Deployment of Hugging Face distilbert

This tutorial is a step-by-step guide to the model deployment process from the case study in our research article. The same code is used to generate the Hugging Face distilbert performance data in the figures above.

In order to begin the optimizations, we initialize the baseline model, mirror its weights into the optimized model, and trace the optimized model with TorchScript; minimal sketches of these steps are given at the end of this tutorial. The traced model is then converted into a Core ML model package and saved to disk:

```python
import coremltools as ct
import numpy as np

# `traced_optimized_model` and `tokenized` come from the sketches at the
# end of this tutorial. The ct.convert() arguments shown here are a
# sketch based on standard Core ML Tools usage.
ane_mlpackage_obj = ct.convert(
    traced_optimized_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(f"input_{name}", shape=tensor.shape, dtype=np.int32)
            for name, tensor in tokenized.items()],
    compute_units=ct.ComputeUnit.ALL,
)

out_path = "HuggingFace_ane_transformers_distilbert_seqLen128_batchSize1.mlpackage"
ane_mlpackage_obj.save(out_path)
```

To verify performance, developers can now launch Xcode and simply add this model package file as a resource in their projects. After clicking on the Performance tab, the developer can generate a performance report on locally available devices, for example, on the Mac that is running Xcode or another Apple device that is connected to that Mac. The figure below shows a performance report generated for this model on an iPhone 13 Pro Max with iOS 16.0 installed.

Based on the figure above, latency improves by a factor of 2.84 for the sequence length of 128 and batch size of 1 chosen for this tutorial. Higher sequence lengths, such as 512, and batch sizes, such as 8, yield up to 10 times lower latency and 14 times lower peak memory consumption. Please refer to Figure 2 from our research article for detailed and interactive performance data.
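For completeness, here is a minimal sketch of the baseline model initialization referenced above. The specific checkpoint name and keyword arguments are assumptions based on typical Hugging Face usage, not the article's exact code:

```python
import torch
import transformers

# Hypothetical checkpoint: any distilbert sequence-classification
# checkpoint should work here.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

baseline_model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_name,
    return_dict=False,  # return tuples so torch.jit.trace can handle the outputs
    torchscript=True,
).eval()
```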
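Next, the optimized model is built from the same configuration and the baseline weights are copied into it. The class name below is an assumption mirroring Hugging Face naming; check `ane_transformers.huggingface.distilbert` for the exact symbol exported by the package:

```python
from ane_transformers.huggingface import distilbert as ane_distilbert

# Build the mathematically equivalent, Neural-Engine-friendly model from
# the baseline configuration, then restore the baseline weights.
optimized_model = ane_distilbert.DistilBertForSequenceClassification(
    baseline_model.config
).eval()
optimized_model.load_state_dict(baseline_model.state_dict())
```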
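Finally, a sample input is tokenized at the sequence length of 128 and batch size of 1 used in the tutorial, and the optimized model is traced to produce the `traced_optimized_model` and `tokenized` objects consumed by `ct.convert()` above. The tokenizer arguments are again assumptions:

```python
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
tokenized = tokenizer(
    ["Sample input text to trace the model"],  # batch size 1
    return_tensors="pt",
    max_length=128,  # sequence length chosen for the tutorial
    padding="max_length",
)

# Trace with TorchScript so Core ML Tools can convert the model.
traced_optimized_model = torch.jit.trace(
    optimized_model,
    (tokenized["input_ids"], tokenized["attention_mask"]),
)
```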