The Origin: Moving Beyond Classification
After building my initial image classification models, I wanted to tackle a significantly harder problem in Computer Vision: Generative AI and image-to-image translation. Instead of asking a neural network to output a single text label (like "Dog" or "Car"), I wanted to engineer a model capable of outputting a complete, high-resolution matrix of pixels. To test this, I set out to build an AI that could look at a black-and-white image and accurately guess its original colors. To broaden my technical toolkit, I decided to build this entirely from scratch using PyTorch.
Technical Execution: The LAB Color Space Advantage
The most critical engineering decision in this project wasn't the neural network itself, but how I structured the data pipeline. If I fed the AI standard RGB (Red, Green, Blue) images, the model would have to predict three highly correlated channels simultaneously, which often results in washed-out, brownish outputs.
Instead, I used scikit-image to convert the training data into the LAB color space. In LAB, the 'L' channel represents Lightness (effectively the black-and-white image), while the 'a' and 'b' channels hold the color information. This cut the output dimensionality by a third: the network takes the 'L' channel as its input and only has to predict the two 'ab' color channels. I then normalized these channels to a strict range of [-1, 1] to ensure stable gradient descent during training.
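The split-and-normalize step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the helper name `to_lab_training_pair` and the normalization constants (dividing L by 50 and a/b by 128) are my assumptions, chosen because `skimage.color.rgb2lab` returns L in [0, 100] and a/b roughly in [-128, 127].

```python
import numpy as np
from skimage.color import rgb2lab

def to_lab_training_pair(rgb_image: np.ndarray):
    """Split an RGB image into a (grayscale input, color target) pair.

    rgb_image: float array in [0, 1], shape (H, W, 3).
    Hypothetical helper; constants approximate the LAB value ranges.
    """
    lab = rgb2lab(rgb_image)          # L in [0, 100], a/b roughly in [-128, 127]
    L = lab[..., 0:1] / 50.0 - 1.0    # rescale L to [-1, 1]
    ab = lab[..., 1:] / 128.0         # rescale a/b to roughly [-1, 1]
    return L, ab

# Example: split a random image into input and target
rgb = np.random.rand(64, 64, 3)
L, ab = to_lab_training_pair(rgb)
print(L.shape, ab.shape)  # (64, 64, 1) (64, 64, 2)
```

The input keeps one channel and the target keeps two, which is where the one-third reduction in output dimensionality comes from.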
Architecting the U-Net
To process the images, I constructed a custom U-Net architecture. A standard Convolutional Neural Network (CNN) compresses an image down to extract its "meaning," but it destroys spatial resolution in the process—leaving you with a tiny, blurry output.
To solve this, my U-Net uses an encoder-decoder structure. The encoder shrinks the image to understand its context (e.g., recognizing that a shape is a tree), while the decoder scales it back up to apply the color. To keep the final image from losing its sharp edges, I implemented skip connections within the forward() pass. These connections take high-resolution structural data from the early layers of the encoder and concatenate it directly into the decoder, giving the network the detail it needs to color within the lines.
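The encoder-decoder-with-skip pattern can be sketched in PyTorch as below. This is a deliberately tiny model with a single downsampling stage; the class name, layer sizes, and the tanh output head are illustrative assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Minimal sketch of the encoder-decoder idea with one skip connection.

    Input: 1-channel 'L' image; output: 2-channel 'ab' prediction.
    Layer sizes are illustrative, not the project's real architecture.
    """
    def __init__(self):
        super().__init__()
        # Encoder: extract features, then halve the spatial resolution
        self.enc1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
        # Decoder: upsample back to the input resolution
        self.up = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())
        # After concatenating the skip (32 + 32 channels), predict the ab channels
        self.out = nn.Conv2d(64, 2, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)               # high-res features (the skip source)
        bottleneck = self.down(e1)      # compressed "context" representation
        d1 = self.up(bottleneck)        # back to full resolution
        d1 = torch.cat([d1, e1], dim=1) # skip connection: concat along channel dim
        return torch.tanh(self.out(d1)) # tanh keeps outputs in [-1, 1], matching targets

net = MiniUNet()
L = torch.randn(1, 1, 64, 64)   # a batch of one grayscale image
ab = net(L)
print(ab.shape)  # torch.Size([1, 2, 64, 64])
```

The key line is the `torch.cat` in forward(): the decoder's upsampled features and the encoder's early high-resolution features are stacked along the channel dimension, so the final convolution sees both the global context and the original edge structure.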
Technical Growth and Takeaways
This project fundamentally changed how I view artificial intelligence. By explicitly defining the nn.Module layer by layer—calculating strides, padding, and convolutional bottlenecks—I moved past utilizing AI as a "black box." It reinforced a vital lesson for my future in software and engineering: the success of a machine learning model relies just as much on clever data manipulation (like the LAB conversion) as it does on raw computational power.