In this talk, I will discuss some fundamental questions in modern machine learning:
- What is the suitable model capacity for over-parameterized models?
- What is the suitable function space for feature learning?
- Which functions can be learned by two-layer neural networks, statistically and/or computationally efficiently?
- What is the computational-statistical gap behind this?
My talk will partly answer these questions, both theoretically and empirically.