Recent work has demonstrated that large-scale pre-training on public datasets significantly enhances differentially private (DP) learning in downstream tasks. We analyze this phenomenon through the lens of representation learning, examining both the last and intermediate layers of neural networks. First, we consider a layer-peeled model that leads to Neural Collapse (NC) in the last layer, showing that the misclassification error becomes dimension-independent when the distance between actual and ideal features falls below a threshold. Empirical evaluations reveal that stronger pre-trained models, such as Vision Transformers (ViTs), yield better last-layer representations, though DP fine-tuning remains less robust to perturbations than non-private training. To mitigate this, we propose strategies such as feature normalization and PCA on last-layer features, which significantly improve DP fine-tuning accuracy. Extending our analysis to intermediate layers, we investigate how DP noise affects feature separability in ViTs fine-tuned on private data. Using a representation learning law, we measure the impact of DP noise across layers and find that, without careful hyperparameter tuning, stringent privacy budgets can severely degrade feature quality. However, with optimized hyperparameters, DP noise has a limited effect on learned representations, enabling high accuracy even under strong privacy guarantees. Together, these findings highlight how pre-training on public data can alleviate the privacy-utility trade-off in DP deep learning, while proper architectural and optimization strategies further enhance robustness across different layers of the model.
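
A minimal sketch of the last-layer strategy mentioned above, under stated assumptions: features are first extracted from a frozen pre-trained backbone (e.g., a ViT), normalized to unit norm, reduced with PCA, and a linear head is then trained with DP-SGD (per-example gradient clipping plus Gaussian noise). This is not the paper's exact pipeline; the function names, hyperparameters, and the hand-rolled DP-SGD loop below are illustrative assumptions.

```python
# Illustrative sketch only: DP linear probing on normalized, PCA-reduced last-layer features.
# Assumes features were already extracted from a frozen pre-trained model; all names and
# hyperparameters are assumptions, not the authors' exact configuration.
import numpy as np
from sklearn.decomposition import PCA

def preprocess_features(train_feats, test_feats, n_components=256):
    """Normalize each feature vector to unit norm, then project onto top PCA directions."""
    train_feats = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test_feats = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    pca = PCA(n_components=n_components).fit(train_feats)
    return pca.transform(train_feats), pca.transform(test_feats)

def dp_sgd_linear_probe(feats, labels, n_classes, epochs=10, lr=0.5, batch_size=256,
                        clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Train a linear classifier with DP-SGD: clip each per-example gradient,
    add Gaussian noise to the clipped sum, then take an averaged step."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            X, y = feats[idx], labels[idx]
            logits = X @ W
            probs = np.exp(logits - logits.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(idx)), y] -= 1.0          # per-example dL/dlogits
            grad_sum = np.zeros_like(W)
            for xi, gi in zip(X, probs):                  # per-example gradient of the linear head
                g = np.outer(xi, gi)
                g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip to clip_norm
                grad_sum += g
            grad_sum += rng.normal(0.0, noise_multiplier * clip_norm, W.shape)  # Gaussian noise
            W -= lr * grad_sum / len(idx)
    return W
```

In this sketch, the privacy guarantee is governed by `noise_multiplier` and the sampling rate in the usual DP-SGD accounting; a production implementation would typically rely on an established library rather than the hand-written loop shown here.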