TnT-LLM: Automating Text Taxonomy Generation and Classification With Large Language Models | HackerNoon
Briefly

TnT-LLM is a two-phase framework that uses LLMs for taxonomy generation and text classification. Phase 1 runs a zero-shot, iterative process over a representative subset of the corpus to produce a label taxonomy. The process draws on stochastic optimization techniques, which lets it handle varied samples from the corpus. In Phase 2, an LLM applies the taxonomy to a larger sample to produce pseudo-labels, and those pseudo-labels train lightweight classifiers that can then label the entire corpus. This workflow makes classification more efficient and makes real-time use practical.
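The iterative Phase 1 loop can be pictured with a minimal sketch. It assumes the iteration works over small batches of the sampled texts, which this summary does not spell out. The names `generate_taxonomy` and `toy_update` are hypothetical, and `toy_update` is a keyword stand-in for what would be an LLM prompt in practice.

```python
import random
from typing import Callable, List

Taxonomy = List[str]


def generate_taxonomy(
    texts: List[str],
    update_fn: Callable[[Taxonomy, List[str]], Taxonomy],
    batch_size: int = 2,
    seed: int = 0,
) -> Taxonomy:
    """Refine a taxonomy one batch at a time (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(texts)
    rng.shuffle(shuffled)  # random batch order, in the spirit of stochastic optimization

    taxonomy: Taxonomy = []
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        # Assumption: each step shows the current taxonomy plus a new batch
        # to the model and receives a revised taxonomy back.
        taxonomy = update_fn(taxonomy, batch)
    return taxonomy


def toy_update(taxonomy: Taxonomy, batch: List[str]) -> Taxonomy:
    """Stand-in for an LLM call: uses each text's first word as its label."""
    updated = list(taxonomy)
    for text in batch:
        label = text.split()[0].lower()
        if label not in updated:
            updated.append(label)
    return updated


texts = [
    "Coding help with Python loops",
    "Travel planning for Japan",
    "Coding question about Rust",
    "Cooking pasta recipe",
    "Travel visa advice",
]
taxonomy = generate_taxonomy(texts, toy_update, batch_size=2)
print(sorted(taxonomy))
```

Passing the update step in as a function keeps the loop independent of any particular model or prompt, so the toy stand-in could be swapped for a real LLM call without changing the loop itself.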
The TnT-LLM framework improves taxonomy generation and text classification by combining a zero-shot approach for taxonomy creation with pseudo-labeling from LLM outputs.
Phase 1 uses a representative subset of the corpus for zero-shot, multi-stage taxonomy generation; stochastic optimization improves taxonomy quality and helps cope with the diversity of the corpus.
Phase 2 focuses on deploying a lightweight text classifier trained on pseudo-labels derived from the LLM-augmented taxonomy, enabling efficient offline and real-time classification applications.
The iterative, prompt-based approach to taxonomy generation adapts to the properties of the corpus while balancing cost against the need for a representative sample.
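Phase 2 can be sketched in a similar spirit: pseudo-label a sample with the taxonomy, then train a small classifier on those labels. Everything here is an assumption for illustration. `llm_pseudo_label` is a keyword-matching stand-in for an LLM annotator, and the tiny Naive Bayes model is only one possible choice of "lightweight classifier", not necessarily the one the authors used.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def llm_pseudo_label(text: str, taxonomy: List[str]) -> str:
    """Hypothetical stand-in for an LLM annotator: simple keyword matching."""
    lowered = text.lower()
    for label in taxonomy:
        if label in lowered:
            return label
    return "other"


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


taxonomy = ["coding", "travel"]
train_texts = [
    "coding help with python loops",
    "coding bug in rust compiler",
    "travel tips for japan flights",
    "travel visa and hotel booking",
]
# Step 1: the (stubbed) LLM assigns pseudo-labels to the training sample.
pseudo_labels = [llm_pseudo_label(t, taxonomy) for t in train_texts]
# Step 2: a lightweight classifier learns from those pseudo-labels.
clf = NaiveBayes().fit(train_texts, pseudo_labels)
prediction = clf.predict("python compiler bug")
print(prediction)
```

The query text contains none of the taxonomy keywords, so the stub labeler alone would return "other"; the trained classifier can still assign a label from the words it saw during training, and it does so without calling the LLM.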
Read at Hackernoon