What is text generation?

Text generation, also referred to as natural language generation (NLG), is a subfield of natural language processing (NLP, a field that includes computational linguistics). Natural language processing is concerned with converting spoken or written human language into a form that computers can process, and vice versa.

Text generation is, in a sense, the opposite of NLP applications such as voice recognition and grammar checking, since it involves converting some form of computerized data into natural language, rather than the other way around.

Text generation is to be distinguished from superficially similar techniques, usually known by names such as "report generation", "document generation", or "mail merging". These techniques simply plug a fixed data structure, such as a table of numbers or a list of names, into a template to produce complete documents. Because they are inflexible, they tend to produce rigid text that often contains grammatical errors (for example, "you have one selections remaining").
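The template problem can be reproduced in a few lines of Python. This is a minimal sketch: the template string and the helper names (`spell_count`, `naive_fill`, `agreeing_fill`) are hypothetical, chosen only to recreate the "one selections" error and show the smallest piece of grammatical knowledge (number agreement between the count and the noun) that avoids it.

```python
# Naive "mail merge" style: the count is dropped into a fixed string,
# so the noun "selections" never changes form.
TEMPLATE = "you have {count} selections remaining"


def spell_count(n: int) -> str:
    """Spell out small counts as words; fall back to digits otherwise."""
    words = {0: "no", 1: "one", 2: "two", 3: "three"}
    return words.get(n, str(n))


def naive_fill(n: int) -> str:
    """Plug the count into the fixed template with no grammatical knowledge."""
    return TEMPLATE.format(count=spell_count(n))


def agreeing_fill(n: int) -> str:
    """Pick the noun form that agrees in number with the count (toy English rule)."""
    noun = "selection" if n == 1 else "selections"
    return f"you have {spell_count(n)} {noun} remaining"


for n in (0, 1, 2):
    print(f"{naive_fill(n)!r} vs {agreeing_fill(n)!r}")
```

For a count of 1, the naive version prints "you have one selections remaining", while the agreeing version prints "you have one selection remaining". Note that `agreeing_fill` is still a hard-coded rule for a single noun; a text generation system would instead derive the correct form from a general representation of grammatical number.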

Text generation, on the other hand, builds on some underlying linguistic representation of the text to ensure that the output is grammatically correct and fluent. Most text generation systems include a syntactic component, which ensures that grammatical rules such as subject-verb agreement are obeyed, and a text planner, which decides how to arrange sentences, paragraphs, and other components of a text coherently.

Perhaps the first use of text generation was in machine translation systems, which analyze a text in the source language into a grammatical or conceptual representation and then use that representation to generate a corresponding text in the target language. Another early application was in expert systems, where the formal representations of rules and facts could be used to generate texts that explained the system's reasoning.
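The division of labor between a text planner and a syntactic component can be illustrated with a toy Python sketch. Everything here is a hypothetical simplification rather than how any particular system works: the `Clause` record stands in for a linguistic representation, `plan_text` uses a made-up ordering rule (group clauses that share a subject), and `conjugate` handles only regular English verbs in the present tense. The sample facts are loosely in the spirit of an expert system's knowledge base but are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """Toy linguistic representation: subject, its number, verb lemma, object."""
    subject: str
    plural: bool  # grammatical number of the subject
    verb: str     # verb lemma, e.g. "require"
    obj: str


def plan_text(facts: list[Clause]) -> list[list[Clause]]:
    """Toy text planner: group clauses that share a subject into one
    paragraph, keeping subjects in the order they first appear."""
    groups: dict[str, list[Clause]] = {}
    for fact in facts:
        groups.setdefault(fact.subject, []).append(fact)
    return list(groups.values())


def conjugate(verb: str, plural: bool) -> str:
    """Present-tense subject-verb agreement for regular English verbs only."""
    if plural:
        return verb
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    return verb + "s"


def realize(clause: Clause) -> str:
    """Toy syntactic component: apply agreement, capitalize, add a full stop."""
    verb = conjugate(clause.verb, clause.plural)
    sentence = f"{clause.subject} {verb} {clause.obj}."
    return sentence[0].upper() + sentence[1:]


def generate(facts: list[Clause]) -> str:
    """Plan the text, realize every clause, and join the paragraphs."""
    paragraphs = plan_text(facts)
    return "\n\n".join(" ".join(realize(c) for c in p) for p in paragraphs)


facts = [
    Clause("the pump", False, "require", "maintenance"),
    Clause("the valves", True, "leak", "water"),
    Clause("the pump", False, "reach", "its pressure limit"),
]
print(generate(facts))
```

The planner moves the two statements about the pump into the same paragraph even though the input lists them apart, and the realizer chooses "requires" and "reaches" for the singular subject but "leak" for the plural one. Keeping these two decisions in separate functions mirrors the separation described above: what goes where is decided before how each sentence is worded.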