Time Travel: 2014
Chapter 152 Eve Carly’s Confusion (continued)
It is precisely for these reasons that, no matter the time and space,
many countries around the world are exploring text.
The progress of recording methods in human society is, to some extent, reflected in the different condensed forms of text.
Text research is also an extremely important task for some large enterprises.
Progress in text summarization determines the launch of one product after another.
The exploration of text not only greatly promotes in-depth literary research, but also greatly advances science and technology.
All in all, effort put into text summarization is never wasted.
After all, this is Lin Hui’s first step in the field of technology.
As for the confusion Eve Carly had encountered:
Lin Hui had not expected that Eve Carly's confusion centered mainly on the construction of the LH text summarization accuracy measurement model.
Lin Hui remembered that he had explained the model construction clearly enough at that time.
When building the model, one must first use a language model to evaluate the fluency of the language the algorithm generates; then use a similarity model to evaluate the semantic relatedness between the source text and the summary; and finally, to effectively evaluate how well entities and proper nouns are reproduced, introduce a quantitative model of the original text's information.
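The novel never spells out Lin Hui's actual formulas, but the three-part idea can be illustrated with a deliberately simple sketch. Everything below, from the bigram fluency proxy to the bag-of-words cosine similarity, the term-recurrence ratio, and the function names, is a hypothetical stand-in rather than the LH model itself.

```python
import math
from collections import Counter

def fluency_score(summary, bigram_counts, unigram_counts, vocab_size):
    # Average log-probability under a bigram language model with
    # add-one smoothing; higher (closer to zero) means more fluent.
    # bigram_counts and unigram_counts are Counters built from a corpus.
    tokens = summary.lower().split()
    if len(tokens) < 2:
        return 0.0
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log((bigram_counts[(prev, cur)] + 1)
                         / (unigram_counts[prev] + vocab_size))
    return logp / (len(tokens) - 1)

def similarity_score(text, summary):
    # Cosine similarity between bag-of-words vectors; a crude
    # stand-in for the similarity model the story leaves unnamed.
    a, b = Counter(text.lower().split()), Counter(summary.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recurrence_score(text, summary, key_terms):
    # Fraction of key entities / proper nouns present in the source
    # text that reappear in the summary.
    present = [t for t in key_terms if t.lower() in text.lower()]
    if not present:
        return 1.0
    kept = [t for t in present if t.lower() in summary.lower()]
    return len(kept) / len(present)
```

In a real system the three components live on different scales (the fluency term is a log-probability), so they would need calibration before being combined into a single accuracy number; the sketch only shows the division of labor among the three models.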
Of course, to avoid “teaching the apprentice only to starve the master,” Lin Hui had deliberately omitted some intermediate steps.
But to researchers, omissions like these are what trenches are to tanks.
They cause some slowdown, but they are hardly a real obstacle.
If he really published every technical detail,
that could no longer be called announcing a technical route; it would be called writing a textbook.
Regarding Lin Hui’s mention of “using a language model to evaluate the fluency of algorithm-generated language,”
Eve Carly's confusion was how Lin Hui had obtained the corpus to train that language model.
A few years later, this would not be a problem at all,
because plenty of ready-made corpora would be available.
For Simplified Chinese alone, there are resources such as the National Language Commission Modern Chinese Corpus, the Beijing University Corpus, and Corpus Linguistics Online.
However, at this point in the timeline, Lin Hui obviously could not tell other researchers that he was using a ready-made corpus.
After all, some of those ready-made corpora would only be released around 2016.
Nonetheless, the question of how to explain the corpus's origin did not trouble Lin Hui.
In fact, even without a ready-made corpus, building a usable one for training an early generative summarization algorithm is not especially complicated.
The simplest way: a text corpus can be constructed automatically with the help of the Internet.
When building a corpus this way, the user only needs to provide the desired text category system.
A large number of websites are then collected from the Internet, and each site's content hierarchy and the web content corresponding to each keyword are extracted and analyzed.
The texts the user needs are filtered out of each website as candidate corpora.
This process is actually not complicated; it is somewhat similar to ordinary web crawling.
What is more difficult is how to denoise the corpus formed by this method.
But this is not a problem for Lin Hui.
One only needs to merge the candidate corpora from multiple websites that match the same text category into a single candidate corpus per category.
Denoising the text under each category then improves the quality of the corpus.
Once denoising is complete, the corpus can be output; a rough sketch of this pipeline follows below.
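As a concrete illustration of the category-driven pipeline just described, here is a minimal sketch. The keyword-counting category rule, the length-and-duplicate denoising heuristics, and all names below are assumptions made for illustration; a real system would use far stronger noise removal.

```python
from collections import defaultdict

def assign_category(page_text, categories):
    # Pick the category whose keywords occur most often in the page;
    # return None when nothing matches, so the page is discarded.
    low = page_text.lower()
    best, best_hits = None, 0
    for cat, keywords in categories.items():
        hits = sum(low.count(k.lower()) for k in keywords)
        if hits > best_hits:
            best, best_hits = cat, hits
    return best

def denoise(texts, min_len):
    # Drop near-empty pages and exact duplicates: a crude stand-in
    # for real denoising of crawled text.
    seen, clean = set(), []
    for t in texts:
        key = t.strip()
        if len(key) >= min_len and key not in seen:
            seen.add(key)
            clean.append(key)
    return clean

def build_corpus(pages, categories, min_len=200):
    # pages: raw page texts (e.g., gathered by a crawler);
    # categories: {name: [keyword, ...]} supplied by the user.
    buckets = defaultdict(list)
    for text in pages:
        cat = assign_category(text, categories)
        if cat is not None:
            buckets[cat].append(text)
    # Merge per category, then denoise each category's candidate corpus.
    return {cat: denoise(texts, min_len) for cat, texts in buckets.items()}

# Tiny usage example with inline stand-in pages instead of a crawler.
pages = [
    "The football league table shifted again after the weekend derby.",
    "Stocks rallied as the market digested the central bank's report.",
]
print(build_corpus(pages, {"sports": ["football"], "finance": ["stocks"]}, min_len=10))
```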
Granted, this process is still not easy to implement.
But in academia, apart from a few experts who enjoy splitting hairs,
in most cases, as long as the logic is self-consistent, no one will press the issue.
Apart from being curious about how Lin Hui had constructed the corpus,
and regarding his mention of “using a similarity model to assess the semantic relatedness between texts and summaries,”
Eve Carly was also curious about what kind of similarity model Lin Hui used to evaluate the semantic relatedness between source texts and their summaries.
Well, this question touches the very core of the text summarization accuracy model Lin Hui had developed.
Its answer could not be explained in just a few words.