Emerging complex engineered systems may exhibit unexpected safety issues due to novel operational environments, increasing autonomy, human-machine interaction, and other factors. To prevent failures in operation or testing that necessitate costly redesign, it is desirable to predict likely failure modes early in the design process. Natural language records of past engineering failures offer one possible solution, since information retrieved from them can inform new designs. However, identifying documents that contain usable information and extracting that information can be prohibitively time-consuming at scale. In this research, an automated natural language processing (NLP) framework is proposed to discover relevant knowledge from documents containing failure-related design information. The framework is applied to NASA’s publicly available Lessons Learned Information System (LLIS). First, documents containing usable information are identified using two different NLP-based models. Next, a failure taxonomy is extracted from the identified usable documents using a partitioned hierarchical topic modeling approach. Partitions of each document describe different sections of the failure taxonomy (i.e., failure, cause of failure, and recommendations), as indicated by the structure of the original document. The extracted failure taxonomy can be leveraged in early design failure assessment methods. Moreover, the framework can be used to identify documents containing usable failure-related design information in other databases and to extract relevant information from these documents.