This paper presents a set of data-driven methods for predicting nitrogen concentration in proton exchange membrane fuel cells (PEMFCs). Nitrogen that accumulates in the anode channel is a major source of inefficiency in fuel cells. Periodically purging the gases in the anode channel is a common strategy for combating nitrogen accumulation, but such open-loop strategies can result in sub-optimal purging decisions. An accurate prediction of nitrogen concentration can instead help devise optimal purging strategies. However, model-based approaches such as CFD simulations are often unavailable for long-stack fuel cells because of the complexity of the chemical environment, or are too slow to be used for real-time nitrogen prediction on deployed fuel cells. As a step toward addressing this challenge, we explore a set of data-driven techniques for learning a regression model that maps input parameters to nitrogen build-up, using a model-based fuel cell simulator as an offline data generator. This allows the trained machine learning system to make fast predictions of nitrogen concentration during deployment from other parameters that can be obtained through sensors. We describe the methods we explore, compare their outcomes, and outline future directions for using machine learning in fuel cell physics modeling more generally.
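
The offline-simulator-to-regressor workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_simulator` is a hypothetical stand-in for a model-based fuel cell simulator (a saturating curve, not real physics), the two inputs (current density and time since the last purge) are placeholder features, and k-nearest-neighbour regression is chosen only for brevity and is not necessarily one of the methods compared in the paper.

```python
import math
import random


def toy_simulator(current_density, time_since_purge):
    """Hypothetical stand-in for a model-based simulator.

    Returns an anode nitrogen mole fraction from a made-up saturating
    curve; the constants are illustrative, not physical.
    """
    rate = 0.02 + 0.01 * current_density  # assumed build-up rate, 1/s
    return 0.8 * (1.0 - math.exp(-rate * time_since_purge))


def generate_dataset(n, seed=0):
    """Sample inputs at random and label them with the simulator (offline step)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(0.1, 1.5), rng.uniform(0.0, 60.0))
        data.append((x, toy_simulator(*x)))
    return data


def knn_predict(train, query, k=5, scales=(1.4, 60.0)):
    """Predict nitrogen fraction as the mean label of the k nearest samples.

    `scales` normalises each feature by its sampled range so that both
    features contribute comparably to the distance.
    """
    def squared_distance(x):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(x, query, scales))

    nearest = sorted(train, key=lambda item: squared_distance(item[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Offline: build the training set from the simulator.
train = generate_dataset(2000)

# Deployment: predict from sensor-like inputs without calling the simulator.
query = (1.0, 30.0)  # (current density, seconds since last purge), illustrative
predicted = knn_predict(train, query)
actual = toy_simulator(*query)
print(f"predicted={predicted:.3f} simulated={actual:.3f}")
```

In this sketch the simulator is queried only while building the training set. At deployment the prediction comes from the stored samples alone, which is the property that makes a learned surrogate attractive when the underlying model is too slow for real-time use.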