A Formation of My Own Making — Part 2: The Wall

“The first step to understanding a system is to see that nothing works in isolation.” - Leonardo da Vinci, 16th Century.

In ancient Indian warfare, there is a legendary, multi-layered circular military formation known as the Padmavyuha. To the uninitiated, breaching the outer ring feels like an effortless triumph. The gates seem to open up, welcoming the enemy to advance. But the trap of the Padmavyuha is structural: as you advance deeper, the rings tighten, the rules change, and the very environment turns against you. It is a formation designed to let you enter with supreme confidence, only to wall you in once you are completely inside.

In Part 1 of this project, I experienced the ultimate outer-ring triumph. Using Claude Code and Gemini, I had built the individual pieces of an autonomous AI toy car with astonishing speed, all without writing a single line of code myself. I had an iOS chat app running local LLMs on-device, a toy car that could be driven from an iOS app over Bluetooth or Wi-Fi, and an object detection framework based on YOLO (You Only Look Once). These independent components were humming beautifully.

The theory was clear, and I felt that familiar surge of tech-architect confidence. All that was left was to stitch everything into one unified system. I assumed integration would be the easy part. I was already planning version two before version one had moved an inch.

It was precisely here, at the gates of the inner ring, that the Padmavyuha closed in.

Technical lesson: Having working pieces does not mean you have a working product. Integration is where most software and engineering projects actually die. Each piece works perfectly in isolation, but the moment you connect them, timing issues, resource conflicts, and hidden architectural mismatches show up. Test integration early, not last.

Life lesson: Confidence from small wins is dangerous if you don’t pause. I had three apps working independently and assumed stitching them was a formality. It was not. The hard part of any journey is not just creating the individual milestones; it is making them work together cohesively.

Before stitching, I needed to decide how the car’s brain would actually think. This is where Apple’s hardware architecture gets incredibly interesting—and brutally complicated.

Apple gives you two distinct paths for running AI on-device. MLX is their open-source framework for Apple Silicon. It leverages the GPU and Apple’s unified memory architecture to run—and even fine-tune—models directly on the device. It is spectacular for LLMs and chatbots. The tradeoff is that large language models are computationally expensive. They keep the GPU busy, consume more power, and can drain the battery surprisingly quickly under sustained use.

Core ML is the other path. It is Apple’s production framework for deploying machine learning models on-device. Rather than targeting a single processor, Core ML can utilize the Neural Engine, GPU, and CPU, automatically selecting the most efficient execution path for a given workload. For real-time vision tasks such as face detection, object recognition, and image classification, it often takes advantage of the Neural Engine’s efficiency, making it far gentler on battery life.

For a toy car that needs to be always on, using both vision and spoken conversation, a hybrid approach makes the most sense. You let Core ML handle the continuous vision pipeline (watching for a person entering the frame or detecting specific objects), because those tasks can run efficiently for long periods without excessive heat or power draw. Then, only when something interesting happens, you wake up the heavier MLX-powered language model to perform deeper conversational reasoning.
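A minimal Swift sketch of that gating idea follows. It assumes the Core ML vision layer already produces per-frame detections (a label plus a confidence); `Detection`, `WakeGate`, the trigger labels, the confidence threshold, and the cooldown are hypothetical names and values chosen for illustration, not the project's actual code. The cooldown exists so that a person lingering in frame does not wake the language model on every single frame.

```swift
import Foundation

// Hypothetical output of the always-on Core ML vision layer for one object.
struct Detection {
    let label: String       // e.g. "person", as a YOLO-style model might report
    let confidence: Double  // 0.0 ... 1.0
}

// Decides when to wake the expensive MLX language model.
// Names, thresholds, and the cooldown policy are illustrative assumptions.
struct WakeGate {
    let triggerLabels: Set<String>
    let minConfidence: Double
    let cooldown: TimeInterval        // minimum seconds between wake-ups
    private(set) var lastWake: Date?  // when the language model was last woken

    init(triggerLabels: Set<String>, minConfidence: Double, cooldown: TimeInterval) {
        self.triggerLabels = triggerLabels
        self.minConfidence = minConfidence
        self.cooldown = cooldown
        self.lastWake = nil
    }

    // Called once per vision frame; returns true only when the language model should wake.
    mutating func shouldWake(for detections: [Detection], at now: Date) -> Bool {
        // Cheap check first: is anything in this frame worth talking about?
        let interesting = detections.contains { detection in
            triggerLabels.contains(detection.label) && detection.confidence >= minConfidence
        }
        guard interesting else { return false }

        // Respect the cooldown so the GPU-heavy model is not woken on every frame.
        if let last = lastWake, now.timeIntervalSince(last) < cooldown {
            return false
        }
        lastWake = now
        return true
    }
}
```

In use, a call such as `gate.shouldWake(for: detections, at: Date())` would run on every vision frame, and only a `true` result would hand the moment off to the MLX model.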

And then there is YOLO for object detection, which you can deploy through either path. Running YOLO through Core ML is generally the more mature and battery-friendly option for production use. Once the model is trained and deployed, inference is efficient and predictable. Running YOLO through MLX gives you greater flexibility for experimentation, customization, and rapid iteration, especially if you are actively modifying models or exploring new architectures. The tradeoff is that the ecosystem is still evolving and can be less predictable for long-term mobile deployment.

That was the architectural design. In practice, I went with the hybrid setup: Core ML for the always-on vision layer and MLX for the conversational brain. Claude successfully built the unified iOS app that brought all three elements together into a single interface. And it worked. One single app running locally on an iPhone, handling vision, conversation, and motor control without relying on cloud services.

Technical lesson: Understand the native philosophy of the platform you are building on before you lay the first brick. MLX and Core ML exist for entirely different reasons, with entirely different hardware trade-offs. Picking the wrong one costs you battery, performance, or both. Spend time understanding the fundamentals of the underlying silicon before you start prompting for code.

Life lesson: “Understanding the silicon” is not just for computers. Every situation has an underlying “hardware”—the constraints of the room, the culture of the team, the limits of the budget. If you try to build a solution that fights against that hardware, you will fail, no matter how good your “software” is. Don’t fight the silicon. Build for it.

The app was ready. The hardware was ready. The car moved. The car talked, kind of.

I was at the final gate. All I had to do was put the iPhone on the car, turn it on, and watch it go.

But as I placed the phone on its mount and pressed start, the labyrinth finally spoke. It didn’t speak through an LLM, as I had expected. It spoke through a system alert.

You see, I had set up various modes in the iOS app so I could play with the car both manually and autonomously.

Dance mode: the car would randomly show off its moves, paying no attention to its surroundings.

Follow me mode: the car would use the onboard ‘dfr1154’ camera and the YOLO model to follow me.

Explorer (or wanderer) mode: the car would keep moving randomly while avoiding obstacles using its single ultrasonic sensor, stopping only when a spoken or manual STOP command was received from the app.
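The three modes can be modeled as a small enum, with explorer mode reduced to one decision per sensor reading. This is a hedged Swift sketch, not the app's actual code: `DriveMode`, `MotorCommand`, `explorerStep`, the 30 cm obstacle threshold, the forward speed, and the 90 to 180 degree random turn are all assumptions for illustration.

```swift
// The three modes described above (illustrative names).
enum DriveMode {
    case dance     // random moves, ignores surroundings
    case followMe  // tracks a person using camera + YOLO detections
    case explorer  // wanders, avoiding obstacles with the ultrasonic sensor
}

// Hypothetical low-level commands the phone would send to the car.
enum MotorCommand: Equatable {
    case forward(speed: Double)
    case turn(degrees: Double)
    case stop
}

// One explorer-mode decision per ultrasonic reading.
// `distanceCm` is nil when no reading is available.
func explorerStep(distanceCm: Double?,
                  stopRequested: Bool,
                  obstacleThresholdCm: Double = 30,
                  randomTurn: () -> Double = { Double.random(in: 90...180) }) -> MotorCommand {
    if stopRequested { return .stop }                      // spoken or manual STOP always wins
    guard let distance = distanceCm else { return .stop }  // no sensor data: fail safe
    if distance < obstacleThresholdCm {
        return .turn(degrees: randomTurn())                // obstacle ahead: pick a new heading
    }
    return .forward(speed: 0.5)                            // path clear: keep wandering
}
```

Treating a missing sensor reading as STOP is a deliberate fail-safe choice: when in doubt, the car should stand still rather than keep driving blind.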

Do you see the dependency here? Spoken commands require me to talk to the iPhone. Manual commands work well only when the iPhone is in my hands. But why worry about that if I can communicate through speech?

Well, a few feet into the run, the phone screen timed out, and the communication channel between the car and the phone dropped. The car was now a true wanderer with no destination, like the Voyager probe drifting through interstellar space, except that Voyager is still sending useful data back to us.

The car was stuck in a loop. Vision processing went dark. Motor controls froze. I no longer had any logical control over my car.

I tried every trick in the book to keep the iOS app alive. I added a “Background Mode” for external accessories. I implemented a silent audio loop to keep the process alive. I clicked “Always Allow.” Each time, I thought I had found a breach in the wall. And every time, the app stopped working once the phone screen got locked.

So I tried to improve the car’s reactive logic to maintain its dignity when the iOS app was no longer available to it. After all, it had its own tiny onboard brain.

I fought the wall the only way engineers know how: by adding more engineering.

Remember safe positions.
Rotate in place and map possible escape routes.
Slow down to improve vision quality.
Add local voice commands for basic control.
Create multi-level recovery and escape maneuvers.
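One way to picture the multi-level recovery idea is a car-side watchdog that escalates as the phone's heartbeat goes quiet. The following sketch is in Swift for consistency with the rest of the app, although the car's onboard firmware would likely be written in something else; `LinkWatchdog`, `RecoveryLevel`, the heartbeat mechanism, and every timeout value are hypothetical. Even this tiny version hints at the edge cases: what should happen if the heartbeat returns halfway through an escape maneuver?

```swift
// Escalating fallback levels, loosely mirroring the list above (illustrative only).
enum RecoveryLevel: Int {
    case normal = 0    // phone link alive: obey phone commands
    case slowDown      // brief silence: slow down and wait for the link
    case holdPosition  // longer silence: stop in place
    case localOnly     // link presumed lost: onboard voice commands / escape routine
}

// Car-side watchdog keyed on time since the last heartbeat from the phone.
// Times are in seconds from a monotonic clock; thresholds are assumptions.
struct LinkWatchdog {
    var lastHeartbeat: Double
    let slowAfter = 0.5
    let holdAfter = 2.0
    let localAfter = 5.0

    // Record a heartbeat message received from the phone.
    mutating func heartbeat(at time: Double) {
        lastHeartbeat = time
    }

    // Map the current silence duration to a recovery level.
    func level(at now: Double) -> RecoveryLevel {
        let silence = now - lastHeartbeat
        switch silence {
        case ..<slowAfter: return .normal
        case ..<holdAfter: return .slowDown
        case ..<localAfter: return .holdPosition
        default: return .localOnly
        }
    }
}
```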

Each layer of logic created its own new failure mode. Every solution spawned another dependency. Every dependency introduced another edge case.

Nothing worked coherently.

Every feather I added made the wings look more impressive. None of them changed the fact that I was trying to avoid the obvious answer.

Sometimes the shortest path to autonomy is not fewer components. Sometimes it is the right components.

Daedalus showed his son how the wings worked. He also made one thing very clear: don’t fly too close to the sea, or the feathers will get wet; and certainly don’t fly too high, or the beeswax will melt.

Unlike me, Daedalus knew the limitations of his design. Icarus, on the other hand, did not see the consequences of defying the logic his father had explained to him.

I was inside the inner ring. And the walls were moving and closing in.

This was the wall.

And in Part 3, we will see how to climb it. Or maybe, walk around it.

This is Part Two of a three-part series. If you have not yet read Part 1 - Enter the Outer Ring, start there.
