Professor Choi Jung-wook's research team at Hanyang University's School of Electronic Engineering has developed a knowledge distillation algorithm for quantization-aware training of generative language models, significantly reducing their inference cost, Hanyang University announced on November 23.

Generative language models, including recent ones like ChatGPT, have garnered much attention for performance that approaches human capabilities across a wide range of fields. However, running inference with generative language models requires substantial storage space and computational cost. To address this issue, model compression techniques for generative language models have been proposed.

Among these, the compression technique of weight quantization reduces the precision of the data used to store the model's weights while preserving the structure of the language model. Ternary quantization, a form of weight quantization, represents the model's weights using only three values: -1, 0, and 1.
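For illustration, the sketch below shows one common way to ternarize a weight tensor: weights with small magnitude are set to 0, the rest to -1 or +1, and a single scale factor approximates the original magnitudes. The threshold rule and function names here are assumptions for the example, not the research team's exact quantizer.

```python
import torch

def ternarize(weights: torch.Tensor, threshold_ratio: float = 0.7):
    """Threshold-based ternarization: map a float weight tensor to {-1, 0, +1}
    plus one scale factor (illustrative; the paper's quantizer may differ)."""
    # Weights whose magnitude falls below this threshold are zeroed out.
    delta = threshold_ratio * weights.abs().mean()
    ternary = torch.zeros_like(weights)
    ternary[weights > delta] = 1.0
    ternary[weights < -delta] = -1.0
    # Scale chosen so that scale * ternary approximates the original weights.
    mask = ternary != 0
    scale = weights[mask].abs().mean() if mask.any() else weights.new_tensor(0.0)
    return ternary, scale

# A small weight matrix standing in for a 16-bit floating-point layer.
w = torch.randn(4, 4)
t, s = ternarize(w)
print(t)       # entries are only -1, 0, or +1
print(s * t)   # low-precision reconstruction of w
```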

However, a limitation of ternary quantization is a significant drop in the model's original performance. To overcome this, techniques such as knowledge distillation have been actively researched, in which quantization-aware training is applied to a "student model" guided by a "teacher model" kept in the original 16-bit floating-point format. Nevertheless, even with the latest techniques, the loss of accuracy remains a significant challenge.
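In this setting, the student is typically trained to match the teacher's next-token probability distributions. The snippet below is a minimal sketch of that baseline logit distillation loss, assuming a standard KL-divergence formulation; it is shown only for context and is not the team's full training objective.

```python
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """Baseline logit distillation: KL divergence between the frozen
    16-bit teacher's and the ternary student's next-token distributions."""
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprob = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over all tokens in the batch.
    return F.kl_div(s_logprob, t_prob, reduction="batchmean") * temperature ** 2
```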
    
The "Token-Scaled Logit Distillation" technology developed by Professor Choi Jung-wook's research team for ternary weight quantization-aware training is based on the characteristics exhibited by generative language models in response to quantization. This technology minimizes the impact of performance degradation caused by quantization compared to existing knowledge distillation methods, leading to higher performance improvements.

During training, generative language models repeatedly predict the next word at every position in a sentence. Observing the predicted probability distribution at a given position, patterns often emerge in which the probability assigned to a particular word is low while the probabilities of many other words are comparatively high.

Taking these patterns into account, the research team proposed the Token-Scaled Logit Distillation (TSLD) technique, which dynamically adjusts the degree of knowledge distillation on a per-token basis, as sketched below. Applying TSLD can prevent overfitting during ternary quantization-aware training, ultimately yielding higher performance for the quantized model.
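The sketch below illustrates the idea of per-token scaling: each token's distillation term is weighted by a factor derived from the teacher's prediction for that token, so different tokens contribute to the loss with different strengths. The specific scaling rule used here (the teacher's probability of the ground-truth token) is an assumption for illustration and may differ from the rule in the paper.

```python
import torch.nn.functional as F

def token_scaled_logit_distillation(student_logits, teacher_logits, labels):
    """Per-token scaled logit distillation (conceptual sketch).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) ground-truth next-token ids
    """
    t_prob = F.softmax(teacher_logits, dim=-1)
    s_logprob = F.log_softmax(student_logits, dim=-1)

    # Per-token KL divergence between teacher and student distributions.
    kl_per_token = (t_prob * (t_prob.clamp_min(1e-9).log() - s_logprob)).sum(dim=-1)

    # Token-dependent scale: the teacher's probability of the correct next token
    # (an assumed stand-in for the paper's token-scaling rule).
    scale = t_prob.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

    return (scale * kl_per_token).mean()
```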

The research team validated the TSLD methodology on language modeling and common-sense reasoning tasks across various generative language models. The results showed that the ternary-quantized models achieved the highest performance, with less than a 1% drop in accuracy compared to their floating-point counterparts. This consistently high performance held across models of different types and sizes.

The research, titled "Token-Scaled Logit Distillation for Ternary Weight Generative Language Models," was a collaborative effort involving Hanyang University graduate students Kim Min-soo (Ph.D. student, first author), Lee Si-hwa, Lee Jang-hwan, and Hong Suk-jin, KT Enterprise Manager Chang Du-seong, and Seoul National University Professor Sung Won-yong. It is scheduled to be presented at the prestigious international conference Neural Information Processing Systems (NeurIPS) 2023 in December.

 

[Image Material 1] Professor Choi Jung-wook
[Image Material 2] Schematic diagram of research findings

 
