interpreter1 = tf.lite.Interpreter(model_path="/home/pi/google-kws/models2/crnn_state/quantize_opt_for_size_tflite_stream_state_external/stream_state_external.tflite") # Load the TFLite model and allocate tensors.

Yeah, you have to install full TF, which is what I am using; in my code I import tensorflow as tf. I don't think it really matters, as that is what delegation is for: when I am running TF-Lite with the Google-KWS, 2 nodes are delegated out.

I would have a read of that thread, as it also suffers from a 'crackling' sound, but it seems that if you overlap slightly and feed a queue it can be done. I guess with the Pi4 it doesn't matter so much, but a whole rake of different frameworks can eat quite a bit of memory, as opposed to several uses of one.

I think ONNX training can be either, as it can be used with both TF and PyTorch, and it is down to how training and models have been implemented whether static optimisation is applied. TensorFlow tends to have faster optimised versions as it is a static-graph-based lib, versus dynamic libs like PyTorch, so it is far less flexible; that is why PyTorch garners so much research, as there is no need to write and compile everything out ahead of time, due to its dynamic nature.

If you have the processing time of a Pi4 producing an approx 10-sec sentence, it would be interesting to compare, as model vs model / framework vs framework is so confusing, and I don't think there is really any metric you can use.

I presume the benefits of 64-bit would be the same, and running ONNX Runtime mobile probably gives similar results to the above. ONNX Runtime mobile can execute all standard ONNX models, but what that exactly means I don't know, as I am just scratching the surface with TensorFlow & TensorFlow Lite, and with flex delegates all I have gathered is how confusing it is to delegate out, and how constraining the basic functions of the 'lite' runtimes can often be. It is not all running on TFL, as 2 nodes do delegate out to run TF, but the speed increases are pretty huge.
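The overlap-and-feed-a-queue idea above can be sketched in plain Python. This is just my illustration, not code from any framework: `overlapped_chunks` and the sizes are made-up names/values, and the point is only that consecutive inference windows share a few samples at the boundary so you don't get hard chunk edges (the 'crackling').

```python
import queue

def overlapped_chunks(samples, chunk_len, overlap):
    # Slide a window of chunk_len over the audio, advancing by
    # chunk_len - overlap each step, so adjacent chunks share
    # `overlap` samples at their boundary.
    step = chunk_len - overlap
    for start in range(0, len(samples) - chunk_len + 1, step):
        yield samples[start:start + chunk_len]

# Feed a queue that an inference thread could drain.
q = queue.Queue()
for chunk in overlapped_chunks(list(range(10)), chunk_len=4, overlap=1):
    q.put(chunk)
```

With 10 samples, a chunk of 4 and an overlap of 1, the queue gets `[0,1,2,3]`, `[3,4,5,6]`, `[6,7,8,9]`: each window repeats the last sample of the previous one. A real capture loop would push microphone blocks instead of a fixed list.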
INFO: TfLiteFlexDelegate delegate: 2 nodes delegated out of 34 nodes with 1 partitions.

When I have been playing with that Google-KWS, it gives accuracy results for TF vs TFL, but the speed increase of a TFL quantised model is also really big.

When you are throwing around loads of tensors, the wider the databus, the more you can handle simultaneously. Armv7 -> Aarch64 really is a 2-3x perf improvement, as with TF it has all been optimised for 64-bit; it is faster due to it being predominately maths libs. Not bothered which, but I was really curious about performance figures.

Yeah, I could not work the Vietnam thing out, so not being Vietnamese I wasn't so bothered.
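For the model-vs-model / framework-vs-framework comparisons mentioned above, one simple metric that does transfer is average wall-clock time per inference call. A minimal sketch, assuming you wrap whatever runtime you are testing in a zero-argument callable (`bench` and its parameters are my own names, not part of any framework's API):

```python
import time

def bench(fn, runs=50, warmup=5):
    # Warm up first so one-off costs (lazy allocation, cache fills)
    # don't skew the numbers, then time each call individually and
    # return the mean seconds per call.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```

Usage would be something like `bench(lambda: interpreter1.invoke())` for TFLite versus the equivalent call in full TF or ONNX Runtime, on the same input, on the same Pi.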