Automatic text summarization of Chinese legal information
A method of automatic text summarization of legal information provided in Chinese has been developed. The model of the abstract and the procedure of its formation are considered. Two approaches are proposed, namely, to determine the level of importance of sentences, it was suggested to proceed...
| Date: | 2018 |
|---|---|
| Main Authors: | , , , , |
| Format: | Article |
| Language: | Russian |
| Published: | Інститут проблем реєстрації інформації НАН України, 2018 |
| Subjects: | |
| Online Access: | http://drsp.ipri.kiev.ua/article/view/158214 |
| Journal Title: | Data Recording, Storage & Processing |
| Institution: | Data Recording, Storage & Processing |
| Summary: | A method for automatic summarization of legal texts in Chinese has been developed. The model of the abstract and the procedure for forming it are considered. Two approaches are proposed. First, sentence importance is determined from the weights of individual characters (hieroglyphs) rather than of words in the documents and abstracts. Second, a document is modeled as a network of sentences, and the most important sentences are identified from the parameters of this network. A new hybrid summarization method is introduced that combines statistical and marker-based methods and also takes into account the position of sentences in the document. The proposed model of the abstract reflects the information needs of users working with legal information. Weighting individual characters rather than segmented words avoids the costly word segmentation step required by other content-based methods of Chinese language processing. Because sentence weights are computed from character weights rather than, as is standard, from word weights, summary quality is evaluated both on character weights and on the weights of whole words in the documents and abstracts, to confirm that the approach is also satisfactory by the criteria of traditional summarization systems. Two measures of summary quality that require no human experts are applied: cosine similarity and Jensen-Shannon divergence. Summarization based on the proposed network model of the document performed best on both cosine similarity and Jensen-Shannon divergence for abstracts longer than 2 sentences. With minor modifications, the approach can be applied to texts on any subject, in particular scientific, technical, and news texts. |
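The character-level idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give their weighting formula, so raw character frequency in the document stands in for the character weight, and all function names (`char_counts`, `summarize`, `cosine_similarity`, `js_divergence`) are hypothetical. The sketch scores sentences without word segmentation and then compares a summary with its source using the two expert-free measures named above.

```python
import math
import re
from collections import Counter


def char_counts(text):
    """Count CJK characters only; no word segmentation is performed."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def split_sentences(text):
    """Split on Chinese sentence-final punctuation, keeping the mark."""
    return [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text) if s.strip()]


def summarize(text, n):
    """Return the n highest-scoring sentences in their original order.

    Assumption: a sentence's weight is the mean document frequency of
    its characters. The paper's actual weights are not given here.
    """
    weights = char_counts(text)
    sentences = split_sentences(text)

    def score(sentence):
        chars = [ch for ch in sentence if ch in weights]
        return sum(weights[ch] for ch in chars) / len(chars) if chars else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


def cosine_similarity(p, q):
    """Cosine of the angle between two character-count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, range 0..1) of two count vectors."""
    total_p, total_q = sum(p.values()), sum(q.values())
    if total_p == 0 or total_q == 0:
        raise ValueError("both inputs must contain at least one character")
    divergence = 0.0
    for k in set(p) | set(q):
        pk, qk = p[k] / total_p, q[k] / total_q
        mk = (pk + qk) / 2
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


document = "法院依法审理合同纠纷案件。今天天气很好。合同当事人应当依法履行合同义务。"
summary = "".join(summarize(document, 2))
print(summary)
print(cosine_similarity(char_counts(document), char_counts(summary)))
print(js_divergence(char_counts(document), char_counts(summary)))
```

A higher cosine similarity and a lower Jensen-Shannon divergence both indicate that the summary's character distribution is closer to that of the source. The same two functions could be applied to word counts instead of character counts, which is the cross-check the summary describes.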