Time Travel: 2014

Chapter 196 A friend comes from afar

How can the attributes of natural language be expressed numerically?

The general approach of researchers is to turn natural language into numbers, or in other words, to vectorize it.

Compared to scalars, vectors are quantities with direction.

In fact, this research direction is not new.

Lin Hui remembered that in his previous life, as early as 1975, researchers had first proposed the vector space model (VSM) in an attempt to process natural language numerically.

Lin Hui searched for relevant information and found that, although this timeline was a little slower, the vector space model had been proposed here as well, in 1977.

The so-called VSM model may sound very high-end.

It's not that complicated.

The main idea is to assume that the semantics of a text depend only on the words it contains, ignoring word order and the relationships between words. The text is then mapped to a vector using word frequency statistics, and finally the distance between vectors is calculated to characterize the similarity between texts.

Calculate distance between two vectors?

This stuff is in high school textbooks.

It is estimated that ordinary candidates who have not forgotten the knowledge after the college entrance examination can use this model to calculate text similarity.

However, many high school students may not even know that what they are learning can do this.

(ps:...the things you learn in high school are very useful, don’t give up just because you can’t see the use for the time being)

Of course, it was precisely because the model was simple and efficient that, for a long time after it was proposed, it remained the mainstream method in the field of text similarity calculation.

But the model is not without its shortcomings.

The VSM-based method still has two shortcomings:

On the one hand, when the amount of text is large, the generated text vectors are very sparse, which leads to a waste of space and computing resources;

On the other hand, VSM ignores the relationship between words in order to achieve the effect of simplifying the model, but in many cases there is a connection between words, so it is unreasonable to simply think that words are independent of each other.

These two flaws are particularly fatal.

The first directly affects the efficiency of similarity processing, and the second directly affects the accuracy of semantic similarity judgments.

As a result, after using the VSM for a period of time, researchers abandoned it.

Lin Hui was not yet clear on what specific methods people in this timeline now used to calculate text similarity.

However, Lin Hui noticed that the email Eve Carly sent him previously did not mention vector-related content.

Researchers nowadays seem to have forgotten about vectorization.

Perhaps using vectorization for natural language text processing now looked like a very retro research direction.

But in fact, there is still potential to be tapped in the direction of vectorization.

Text similarity calculation can perfectly well be done with distributed word vectors.

But it’s normal for people in this time and space not to know.

Lin Hui remembered that in his previous life, many important results related to natural language processing were produced in the two years of 2013 and 2014.

Take the architecture of text similarity models in his previous life as an example.

The technology of distributed word vectors for calculating semantic text similarity was born in 2013.

In the previous life, it was after the advent of distributed word vectors that semantic text similarity made breakthrough progress.

This timeline runs two years behind, so it is normal that nobody has yet proposed using distributed word vectors to calculate text similarity.

One step behind, every step behind.

With a pace two years slower, this timeline would undoubtedly lag behind in many areas.

This is undoubtedly good news for Lin Hui.

Applying distributed word vectors to build a method for calculating text similarity sounds simple enough.

However, the issue is actually quite complicated once it is laid out in detail.

Therefore, Lin Hui did not reply to Eve Carly in the email.

If research on text similarity model architecture was a weak spot in this timeline,

then wasn't Lin Hui obliged to help?

It seemed the porter across time and space was about to go back to work.

Of course, this kind of transportation is not free.

At the moment, Lin Hui is more concerned about his thesis.

With the relevant research having drifted off course, if Lin Hui really wanted to write papers, wouldn't it be easy to publish several of them?

It is easy for Lin Hui to write a paper of this level.

Although Lin Hui did not go very far in his academic career in his previous life, he published about seven or eight papers in total.

Several of them were written entirely in English.

In short, publishing a paper was something Lin Hui was already familiar with.

Under such circumstances, Lin Hui felt that he could easily earn the extra points required for a bachelor's degree from MIT.

Despite this, Lin Hui decided to meet and communicate with Eve Carly before working on matters related to the thesis.

After all, Lin Hui was not very clear about the specific progress of text similarity research in the Western world, and it would be embarrassing if he accidentally duplicated someone else's work.

In business, that kind of overlap can be euphemistically called competition.

In academia, it can be a lifelong stain.

Now Lin Hui just hopes to meet Eve Carly as soon as possible.

Fortunately, the meeting Lin Hui was looking forward to happened not long after.

Lin Hui met "Eve Carly" at Beiyuyubei International Airport.

Eve Carly was previously afraid that Lin Hui would not believe her identity, so she attached a bunch of certificates to prove her identity in the email.

Lin Hui had seen the photo of Eve Carly before.

I have to say that Eve Carly’s appearance is very recognizable.

She had long, blond, slightly curly hair, and her height was probably around 176 cm. Her proportions were excellent, with a pronounced S-curve figure.

Even looking with a critical eye, Lin Hui felt that the figure and appearance of the "Eve Carly" in front of him scored above 90 points.

Most importantly, she gave people a very innocent feeling, as if she were untouched by the dust of the world.

Well, how to describe this feeling? It made one want to protect her.

But Lin Hui was not about to lose his composure over that.

She's just a woman, it will only affect his writing/coding speed.

"Eve Carly" seems to have not discovered Lin Hui yet.

Lin Hui walked up to meet her and took the initiative to say hello in English: "Are you Eve Carly? I am Lin Hui, welcome to China."

Well, Lin Hui can still handle these few sentences in English.

However, the woman in front of him reacted with obvious hesitation.

Lin Hui felt very strange. Could it be that he had made a mistake?

Just when Lin Hui was confused, a voice suddenly came from behind.

"Are you LIN HUI? I'm Eve Carly, nice to meet you!"

Lin Hui thought that this was embarrassing.

The first time he had ever gone to pick someone up, and he had got the wrong person.

However, that shouldn't be possible. The person in front of him had a very recognizable Western face, and she looked exactly like the woman in the ID photo that Eve Carly had sent earlier.

Lin Hui turned around in confusion, looked at the source of the sound, and saw "Eve Carly" again.
