Time Travel: 2014

Chapter 349: Technical strength means that you can do whatever you want

When SimpleT was architected, it did not include annotated translation data for every language.

But it certainly had annotated translation data between common languages such as Chinese, English, Russian, French, Spanish, and Japanese.

Even among these languages, not every language pair reached the scale of tens of millions of annotated translations.

But at the very least, the Chinese-English and English-Chinese annotation data had to be quite large.

Given this, Lin Hui estimated that the annotated data SimpleT had used in his previous life would be worth at least US$780 million today.

This is undoubtedly a considerable amount of wealth.

More importantly, even if Lin Hui sold the annotated translation data between these languages for money,

it would not stop him from launching SimpleT into the translation market.

Admittedly, that is a bit of a profiteer's move.

But then again, getting several meals out of one chicken is perfectly normal.

In fact, squeezing multiple uses out of a single asset is a typical business practice of the Internet era.

Although Lin Hui was unlikely to get into selling translation annotations in the short term,

the annotated data in his hands went beyond translation.

Take the natural language processing work he was doing at the time, for example.

Although Lin Hui had mainly relied on unsupervised training over large amounts of data when building his earlier generative text summarization models,

he did have labeled data for natural language processing.

And it was text annotation data on an extremely large scale.

That was a considerable fortune.

Although this kind of text annotation data was certainly worth less than bilingual translation annotation data (whose annotation threshold is higher),

once it reaches a large scale, even ordinary annotated data is a fortune not to be underestimated.

Lin Hui estimated that, in this time and space, some ordinary annotated data related to text summarization could easily fetch tens of millions of dollars.

If the data were packaged the right way, and he was lucky enough to meet some buyers who "knew their goods" (read: suckers),

and the negotiator at the table was very good, it might be possible to get close to US$100 million.

With the right packaging, Lin Hui estimated, pulling in several hundred million dollars would be no problem.

What does it mean to package annotated data "the right way"?

It means embellishing the apparent quality of the annotated data.

Strictly speaking, labeled data can be divided into expert annotation and crowdsourced annotation.

So-called "expert annotation" is not actually done by real experts.

"Data annotation" sounds very high-end, but what is it really?

The annotation process is often very complex, and when large amounts of data are involved, it demands a great deal of manual labor.

It may not be low-end work, but this kind of mechanical, tedious job has nothing high-end about it. Real experts and professors would never do it.

So-called expert annotation is usually done on the side by overworked algorithm engineers,

or by dedicated algorithm data annotators.

Data annotator is an emerging profession.

In Lin Hui's previous life, with the arrival of the era of big data and artificial intelligence, a new profession appeared on the Internet to handle annotation work: the data annotator.

A data annotator's job is to use the appropriate tools to crawl and collect data from the Internet, including text, images, audio, and so on,

and then to organize and label the collected data.

The workflow of a data annotator is generally very clear:

First, annotators are trained, and the sample data to be annotated and the annotation rules are determined;

next, the sample data is labeled according to the pre-arranged rules;

finally, the annotated results are merged.

Algorithm data annotators differ slightly from ordinary data annotators.

Compared with ordinary annotation "apes", algorithm annotation apes, after completing the steps above,

also have to feed the annotated data into a model and then debug that model.

Although the workflow adds only this one step, professional algorithm data annotators are still rare.

As the tasks above show, an algorithm data annotator's job is not just annotating data.

They often have to go further and evaluate the algorithm model against the annotated data.

As a result, these annotators are often required not only to annotate data

but also to understand the corresponding algorithms.

People who meet both requirements at once are genuinely rare.

Precisely because professional annotators are so scarce,

expert-level annotation can often handle only small amounts of data.

For large-scale and ultra-large-scale labeling tasks, it is hard to rely on expert annotation.

In those cases, crowdsourcing is often the only option.

The crowdsourcing model brings scattered individuals (including part-time workers) and small annotation teams onto a single platform to complete a project together.

Its main advantages are low cost and relative flexibility.

Although in Lin Hui's previous life machine learning research had long worked toward replacing expert annotation with crowdsourced data, or even with unlabeled data,

if you asked which was more sought after, expert annotation or crowdsourced data,

the answer was naturally the former.

So exaggerating the proportion of expert annotation in a dataset made it easy to command a higher premium.

It sounds like profiteering.

But only at first glance.

If the annotation data Lin Hui praised so highly really fell far short of today's expert annotation,

it would be useless even if he talked it up to the skies.

After all, it takes a strong hammer to forge iron.

Since Lin Hui dared to claim that expert annotation made up a high proportion of his labeled data, he was naturally confident.

How so?

Over the next few years, even data labeled by non-experts would become standardized and normalized across the industry.

In many cases, even crowdsourced data from those years would be no worse than expert annotation in today's industry, which has not yet been fully standardized.

Even if Lin Hui passed off non-expert annotated data from the coming years as expert work,

there would surely be buyers in this time and space.

There's no way around it: with strong technology, you can do whatever you want.

Usually, with money, you can do whatever you want.

But when you do, others may treat you like a grandson.

With technology, you can do whatever you want.

But when you do whatever you want, others have to treat you like their grandfather.

Just look at how eagerly some manufacturers in later years would pay court to grab a first release (no one in particular is meant; please don't assume it refers to anyone).

That gives you a glimpse of what "grandfather status thanks to technology" means.

Lin Hui had come from the 2021 time and space to the 2014 time and space.

Although the two are only seven years apart,

this meant Lin Hui had at least a seven-year information gap in his favor in most fields.

And because this era lagged behind in certain respects, in some areas his information gap was far more than seven years.

Many technologies that were not especially advanced in his previous life were ahead of their time here.

In this way, Lin Hui really could do whatever he wanted.
