As iOS engineers, we respect the science of training models, but we live in the trenches of inference. In this episode, we explore the "Great Divide" between creating a model and running it—breaking down the memory mechanics, compiler optimizations, and the reality of on-device execution.
⚡️The Great Divide: Science vs. Engineering
In traditional software, we have "Build Time" and "Runtime." In AI, this is "Training" and "Inference." Training is the Science Phase (Discovery). It demands enormous resources: massive datasets and weeks of GPU time to produce a "Perfect Recipe."
Inference is the Engineering Phase (Logistics). It is the act of distributing that frozen recipe to millions of devices. We optimize for throughput, battery life, and latency. We don't experiment; we execute.
🧠The Memory Asymmetry
Why can an iPhone run Stable Diffusion but not train it? The answer is State.
- Training is O(N): To calculate gradients (backpropagation), you must hold the activations of all N layers in memory until the backward pass finishes.
- Inference is O(1): To predict, you only need the current layer's activations. Once data has passed forward, each buffer is discarded or reused. Both cases are sketched below.
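To make the asymmetry concrete, here is a minimal Swift sketch. The `Layer` type is a hypothetical stand-in for a real network layer: inference overwrites a single activation buffer per step, while a training-style forward pass must cache every layer's output for the backward pass.

```swift
import Foundation

// Toy "layer": just a function from one activation buffer to the next.
// Hypothetical type for illustration; a real layer also holds weights.
struct Layer {
    let transform: ([Float]) -> [Float]
}

// Inference: O(1) state. Only the current activation survives each step;
// the previous buffer becomes garbage the moment we overwrite `activation`.
func infer(input: [Float], layers: [Layer]) -> [Float] {
    var activation = input
    for layer in layers {
        activation = layer.transform(activation)  // old buffer discarded/reused
    }
    return activation
}

// Training-style forward pass: O(N) state. Backpropagation needs the
// activation of EVERY layer later, so nothing can be thrown away.
func forwardForTraining(input: [Float], layers: [Layer]) -> [[Float]] {
    var cache: [[Float]] = [input]
    for layer in layers {
        cache.append(layer.transform(cache.last!))  // memory grows with depth
    }
    return cache  // stays alive until the backward pass finishes
}
```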
🛠️Inference Optimization
Since the "Recipe" is frozen, the Core ML compiler can take shortcuts that would be unsafe during training, squeezing maximum speed out of the Apple Neural Engine.
Layer Fusion
Merging multiple operations (Multiply + Add + ReLU) into a single compute kernel to save memory bandwidth.
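A toy Swift illustration of the idea (not the actual ANE kernels, which are fused by the compiler): the unfused version makes three passes over memory, the fused version makes one.

```swift
// Unfused: three kernels, three round-trips through memory.
func unfused(x: [Float], scale: Float, bias: Float) -> [Float] {
    let multiplied = x.map { $0 * scale }          // pass 1: read + write
    let added      = multiplied.map { $0 + bias }  // pass 2: read + write
    return added.map { max(0, $0) }                // pass 3 (ReLU): read + write
}

// Fused: one kernel, one pass. Same math, a third of the memory traffic.
func fused(x: [Float], scale: Float, bias: Float) -> [Float] {
    x.map { max(0, $0 * scale + bias) }
}
```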
Pruning
Cutting "weak" connections (near-zero weights). The ANE skips these calculations entirely.
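A minimal sketch of the concept in Swift, assuming a simple magnitude cutoff (the `threshold` value here is illustrative, not a Core ML parameter):

```swift
// Prune: drop connections whose weight is effectively zero.
func prune(weights: [Float], threshold: Float = 1e-3) -> [Float] {
    weights.map { abs($0) < threshold ? 0 : $0 }
}

// Sparse dot product: skipping the zeros is where the speedup comes from.
// Hardware like the ANE can make the same skip at the kernel level.
func sparseDot(weights: [Float], input: [Float]) -> Float {
    zip(weights, input).reduce(0) { acc, pair in
        pair.0 == 0 ? acc : acc + pair.0 * pair.1
    }
}
```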
Quantization (Int8)
Dropping from Float32 to Int8 shrinks the model by 4x and speeds up inference: every memory fetch moves four times as many weights, and integer math is cheap for the ANE.
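To show the mechanics, here is a sketch of symmetric Int8 quantization in Swift. Real toolchains (such as coremltools) do this for you, typically per-channel rather than per-tensor as done here.

```swift
// Symmetric Int8 quantization: map the float range [-maxAbs, +maxAbs]
// onto [-127, 127]. One Float32 becomes one Int8 — a 4x size cut.
func quantize(_ weights: [Float]) -> (values: [Int8], scale: Float) {
    let maxAbs = weights.map(abs).max() ?? 0
    let scale = maxAbs > 0 ? maxAbs / 127 : 1  // avoid divide-by-zero on an all-zero tensor
    let q = weights.map { Int8(clamping: Int(($0 / scale).rounded())) }
    return (q, scale)
}

// Dequantize on the fly at inference time: q * scale ≈ original weight.
func dequantize(_ values: [Int8], scale: Float) -> [Float] {
    values.map { Float($0) * scale }
}
```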
🎯Key Takeaways
- Respect the Divide: Don't try to "learn" large concepts on device. Front-load the heavy lifting to the server (Training) so the device can run efficiently (Inference).
- Watch the Ping-Pong: Use Instruments to ensure your model stays on the ANE. If it bounces between the GPU and the ANE, synchronization costs will kill performance (see the sketch after this list).
- Quantization Is Mandatory: Int8 is the standard for mobile. The accuracy loss is usually negligible compared to the speed and memory gains.
- The Personalization Exception: The only valid on-device training is "Fine-Tuning" (like Face ID) or Federated Learning, where privacy is the architectural goal.
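A minimal sketch of pinning compute units with the real Core ML API (`MyModel` is a placeholder for your Xcode-generated model class):

```swift
import CoreML

// Pin the model to CPU + Neural Engine so it cannot fall back to the GPU
// and "ping-pong". .cpuAndNeuralEngine requires iOS 16 / macOS 13.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// let model = try MyModel(configuration: config)
// Then confirm placement with the Core ML template in Instruments.
```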
About Sandboxed
Sandboxed is a podcast for people who actually ship iOS apps and care about how secure they are in the real world.
Each episode, we take one practical security topic—like secrets, auth, or hardening your build chain—and walk through how it really works on iOS, what can go wrong, and what you can do about it this week.