Alright, let me tell you about this little project I tackled the other day. I was messing around, trying to do some object detection using TensorFlow, and I thought, “Hey, why not try to detect Bryce Harper in some MLB photos?” Seemed like a fun way to kill an afternoon.
First things first, I needed data. So, I started by scraping a bunch of images of Bryce Harper from Google Images. I literally just googled “Bryce Harper MLB” and then used a simple Python script with `requests` and `BeautifulSoup` to download the first few hundred images. I know, I know, not the most ethical thing, but it was just for a personal project, right?
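In case it's useful, here's a rough sketch of what that scraper looked like. Treat it as illustrative rather than battle-tested: Google's results page is JavaScript-heavy, so plain `requests` + `BeautifulSoup` mostly only sees whatever image URLs happen to be embedded in the raw HTML (largely thumbnails), and the query parameters here are just my reconstruction.

```python
# Rough sketch of the image scraper -- query URL and parsing are illustrative.
import os
import requests
from bs4 import BeautifulSoup

QUERY = "Bryce Harper MLB"
OUT_DIR = "raw_images"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # Google blocks the default requests UA

os.makedirs(OUT_DIR, exist_ok=True)
resp = requests.get(
    "https://www.google.com/search",
    params={"q": QUERY, "tbm": "isch"},  # tbm=isch selects image search
    headers=HEADERS,
    timeout=10,
)
soup = BeautifulSoup(resp.text, "html.parser")

for i, img in enumerate(soup.find_all("img")):
    src = img.get("src", "")
    if not src.startswith("http"):
        continue  # skip inline base64 thumbnails
    try:
        data = requests.get(src, headers=HEADERS, timeout=10).content
        with open(os.path.join(OUT_DIR, f"harper_{i:04d}.jpg"), "wb") as f:
            f.write(data)
    except requests.RequestException:
        pass  # skip anything that fails to download
```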

Next up was the really tedious part: labeling the images. I used LabelImg, a free open-source tool, to draw bounding boxes around Bryce Harper in each image. This took FOREVER. Seriously, I spent like three or four hours just drawing boxes. My eyes were crossing by the end of it. I tried to be as consistent as possible, but some of the images were really low-quality or had Harper in weird poses, which made it tough.
Once I had my labeled data, I needed to convert it to a format that TensorFlow could understand. LabelImg saves the annotations in XML format (Pascal VOC format), so I had to write another Python script to convert those XML files into TFRecord files. This involved a bunch of TensorFlow-specific stuff, like creating feature dictionaries and writing them to the TFRecord format. Honestly, I mostly just copy-pasted code from TensorFlow’s documentation and tweaked it to fit my needs. Worked like a charm, though!
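The conversion script boiled down to something like this. It's a condensed sketch following the feature-key conventions from the Object Detection API docs; the paths and the single-entry label map are assumptions specific to my setup.

```python
# Condensed sketch: Pascal VOC XML annotations -> a TFRecord file.
import glob
import os
import xml.etree.ElementTree as ET

import tensorflow as tf

LABEL_MAP = {"bryce_harper": 1}  # one class; id 0 is reserved for background

def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))

def voc_to_example(xml_path, img_dir):
    root = ET.parse(xml_path).getroot()
    filename = root.find("filename").text
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    with open(os.path.join(img_dir, filename), "rb") as f:
        encoded_jpg = f.read()

    xmins, xmaxs, ymins, ymaxs, names, ids = [], [], [], [], [], []
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        # The Object Detection API expects box coordinates normalized to [0, 1]
        xmins.append(float(box.find("xmin").text) / width)
        xmaxs.append(float(box.find("xmax").text) / width)
        ymins.append(float(box.find("ymin").text) / height)
        ymaxs.append(float(box.find("ymax").text) / height)
        names.append(name.encode("utf8"))
        ids.append(LABEL_MAP[name])

    return tf.train.Example(features=tf.train.Features(feature={
        "image/height": _ints([height]),
        "image/width": _ints([width]),
        "image/filename": _bytes([filename.encode("utf8")]),
        "image/source_id": _bytes([filename.encode("utf8")]),
        "image/encoded": _bytes([encoded_jpg]),
        "image/format": _bytes([b"jpeg"]),
        "image/object/bbox/xmin": _floats(xmins),
        "image/object/bbox/xmax": _floats(xmaxs),
        "image/object/bbox/ymin": _floats(ymins),
        "image/object/bbox/ymax": _floats(ymaxs),
        "image/object/class/text": _bytes(names),
        "image/object/class/label": _ints(ids),
    }))

with tf.io.TFRecordWriter("train.record") as writer:
    for xml_path in glob.glob("annotations/*.xml"):
        writer.write(voc_to_example(xml_path, "raw_images").SerializeToString())
```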
Now for the fun part: training the model! I decided to use TensorFlow’s Object Detection API. I downloaded the pre-trained SSD MobileNet V2 model from their model zoo. Figured it’d be a good starting point, since MobileNet is relatively lightweight and fast. I then configured the `pipeline.config` file, pointing it to my TFRecord files and specifying the number of classes (just one: Bryce Harper!). I also messed around with some of the hyperparameters, like the learning rate and batch size, but I didn’t really do anything too fancy.
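For reference, here’s roughly the shape of those edits, trimmed way down. The real `pipeline.config` has many more fields, and the paths are placeholders for my own:

```
# label_map.pbtxt -- the whole thing
item {
  id: 1
  name: 'bryce_harper'
}

# pipeline.config -- trimmed to the fields I actually touched
model {
  ssd {
    num_classes: 1  # just Bryce Harper
  }
}
train_config {
  batch_size: 8
  fine_tune_checkpoint: "ssd_mobilenet_v2/model.ckpt"  # pre-trained weights
}
train_input_reader {
  label_map_path: "label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}
```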
Then I just fired up the training job using the TensorFlow Object Detection API’s training script. I let it run for a few hours on my GPU. The loss started to decrease, which was a good sign. I monitored it using TensorBoard. It wasn’t perfect, but it seemed to be learning something.
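If you’ve never used the API, the kickoff looked roughly like this (assuming the TF1-era `model_main.py` script, which is what I was on; paths are mine):

```bash
python object_detection/model_main.py \
    --pipeline_config_path=pipeline.config \
    --model_dir=training/ \
    --alsologtostderr

# In another terminal, to watch the loss curves:
tensorboard --logdir=training/
```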
After letting it train for a while, I exported the trained model using the API’s export script. This gave me a frozen inference graph that I could use for object detection.
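That step is just another script that ships with the API. Something like the following, where the checkpoint number is a placeholder for whichever checkpoint training left off at:

```bash
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path pipeline.config \
    --trained_checkpoint_prefix training/model.ckpt-XXXX \
    --output_directory exported/
```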
Finally, I wrote a Python script to load the frozen inference graph and run it on some test images. I just grabbed a few more images of Bryce Harper from the internet (you know, for testing purposes). The results were… okay. It detected Harper in some of the images, but it also missed him in others, and it had a few false positives. It was definitely far from perfect, but for a quick weekend project, I was pretty happy with it.
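The test script was basically the standard frozen-graph boilerplate. Here’s a minimal sketch: the tensor names are the standard ones the Object Detection API exports, and the image path is a placeholder.

```python
# Minimal sketch: load the frozen inference graph and run it on one image.
import numpy as np
import tensorflow.compat.v1 as tf
from PIL import Image

tf.disable_eager_execution()

graph_def = tf.GraphDef()
with tf.gfile.GFile("exported/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

image = np.array(Image.open("test_images/harper_test.jpg").convert("RGB"))

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image[np.newaxis, ...]},  # add batch dim
    )

# Keep detections above a confidence threshold; boxes come back as
# [ymin, xmin, ymax, xmax], normalized to [0, 1].
for box, score in zip(boxes[0], scores[0]):
    if score > 0.5:
        print(f"Harper? score={score:.2f}, box={box}")
```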
Here’s the thing I learned: data, data, data! I definitely didn’t have enough labeled data. A few hundred images just isn’t enough to train a robust object detection model. Also, the quality of my data wasn’t great. I had images with different resolutions, different lighting conditions, and Harper in all sorts of weird poses. If I wanted to improve the results, I’d need to collect a much larger and more diverse dataset, and I’d need to spend more time carefully labeling the images.

Would I do it again? Yeah, probably! It was a fun little experiment, and I learned a lot about object detection with TensorFlow. Maybe next time I’ll try detecting something else… like, I don’t know, hot dogs at a baseball game?
Lessons Learned:
- Data is king (and queen).
- Labeling is tedious but essential.
- TensorFlow’s Object Detection API is pretty cool.
- I need a better GPU.