Yandex has moved its search to a text processing platform based on transformer neural networks, which the company has been developing for 10 years. The search engine is now better at determining the semantic relationship between queries and the content of documents on the Internet.
14:37 GMT, Wednesday, November 25, 2020
Current technologies for text processing
Yandex has developed a new text processing technology based on transformer neural networks and moved its search engine to it. "Transformer" is the general term for the neural network architecture that underlies current approaches to text processing. The company has spent 10 years working toward YATI (Yet Another Transformer with Improvements).
According to the developers, the new technology has made Yandex search much better at determining the semantic relationship between user queries and the content of documents on the Internet.
Yandex Managing Director Tigran Khudaverdyan said the company has been using artificial intelligence technology for over 20 years. Machine learning is the foundation of all its services (search, ads, navigator, recommendations). The major breakthrough came in the part of search technology that is invisible to the eye. “Yandex search has switched to new text analysis technologies based on huge neural networks with a transformer architecture,” he explained.
Inside, a transformer is a sequence of matrix multiplications, says Yekaterina Serazhim, a ranking quality specialist at Yandex Search. A single GPU card is not enough for this work; many GPU cards are required. As soon as there are many of them, the problem of transferring data across the network arises. “You need to physically connect these GPU accelerators to each other,” she said.
This is how the company does it now: eight GPU accelerators are packed together into a single server. The servers are then installed close together in a rack and connected by a network. “These are all big engineering tasks: building a cluster, connecting the servers to a network, providing cooling,” explained Serazhim.
Transformer training proceeds in two stages. The classic technique is to show the model unstructured texts. “We take the text, mask a certain percentage of words in it and make our transformer guess those words,” Serazhim said.
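The masking step Serazhim describes can be illustrated with a minimal sketch. This is not Yandex's code: the `[MASK]` placeholder, the `mask_words` helper, and the masking fraction are all assumptions chosen for illustration, since the article only says "a certain percentage" of words is hidden.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the article


def mask_words(words, mask_fraction=0.15, seed=None):
    """Hide a fraction of the words and remember the hidden originals.

    Returns the masked word list and a dict {position: original word},
    which is what a model would be trained to guess.
    """
    if not words:
        return [], {}
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(words) * mask_fraction))
    positions = sorted(rng.sample(range(len(words)), n_to_mask))
    masked = list(words)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


sentence = "the search engine matches user queries to relevant documents".split()
# mask_fraction=0.3 is an illustrative value only
masked, targets = mask_words(sentence, mask_fraction=0.3, seed=0)
print(" ".join(masked))  # the text the model sees
print(targets)           # the words it has to guess
```

During pre-training, the model would receive the masked text as input and be scored on how well it recovers the words stored in `targets`.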
The company made the task harder for YATI by giving it not just the text of a random document, but also actual search queries together with the texts of documents that real people had used.
YATI was asked to guess which document the user would like and which one they would not. “We have a reference for this: the expert markup from our assessors, who rate each document on a complex scale for how relevant it is to the query,” Serazhim continues. Next, Yandex takes this dataset and fine-tunes the transformer to predict the expert rating; this is how it learns to rank.
Khudaverdyan gave an example of what such a transformer can do: the technology lets you find a film by describing just a small fragment of it by voice.
What problems did the developers encounter?
In the company’s blog on habr.com, Alexander Gotmanov, head of the neural network technologies group at Yandex Search, described the technology in more detail. “First, the model learns from the simpler and cheaper relevance ratings from Toloka crowd workers, which we have in abundance,” he said.
“Then it learns from the more complex and expensive ratings given by assessors.
Finally, it is trained on the final metric, which combines many factors at once and by which we measure ranking quality. This kind of sequential fine-tuning gives the best result. The whole training process is built from large samples to small ones and from simple tasks to more complex and semantic ones.”
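The idea of moving "from large samples to small ones" can be sketched with a toy example. Everything here is an assumption made for illustration: a two-weight linear scorer stands in for the transformer, the data is synthetic, and the stage names, sizes, noise levels, learning rate and epoch count are invented. It only shows the shape of the procedure, where each stage starts from the weights left by the previous one.

```python
import random


def sgd_stage(weights, examples, lr, epochs):
    """One fine-tuning stage: plain SGD on squared error for a linear scorer."""
    w = list(weights)
    for _ in range(epochs):
        for features, target in examples:
            pred = sum(wi * xi for wi, xi in zip(w, features))
            err = pred - target
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w


def make_toy_data(n, noise, rng):
    """Synthetic (features, relevance) pairs; NOT real Yandex data."""
    true_w = [0.7, 0.3]  # hypothetical "true" relevance function
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        y = sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0, noise)
        data.append((x, y))
    return data


rng = random.Random(0)
# Stages ordered from large and noisy to small and precise (invented sizes).
stages = [
    ("crowd ratings", make_toy_data(2000, 0.30, rng)),
    ("assessor ratings", make_toy_data(200, 0.05, rng)),
    ("final metric", make_toy_data(50, 0.01, rng)),
]

weights = [0.0, 0.0]
for name, data in stages:
    # Each stage continues from the weights produced by the previous stage.
    weights = sgd_stage(weights, data, lr=0.05, epochs=5)
    print(f"{name}: {len(data)} examples, weights = {[round(w, 2) for w in weights]}")
```

The design point is the loop at the end: the cheap, plentiful labels do most of the work, and the scarce, expensive labels only refine an already reasonable model.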
The main challenge in training a transformer is the computational complexity of the task. Gotmanov notes that the new models scale well in terms of quality, but are millions of times more complex than those previously used in Yandex search.
“Where a neural network could previously be trained in one hour, a transformer would take 10 years to train on the same Tesla V100 graphics accelerator,” he notes.
“That is, without the simultaneous use of at least 100 accelerators (with the ability to transfer data quickly between them), the problem cannot be solved even in principle. A specialized computing cluster and distributed training on it are required.”
Gotmanov said that the model is currently trained on about 100 accelerators simultaneously, which are physically located in separate machines and communicate with each other over the network. Training takes about a month.
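The figures quoted by Gotmanov are roughly consistent with each other, as a back-of-the-envelope calculation shows. The sketch below assumes perfect linear speed-up across accelerators, which is an idealization not claimed in the article; the `scaling_efficiency` parameter and the `training_days` helper are hypothetical names introduced here.

```python
DAYS_PER_YEAR = 365

single_gpu_years = 10     # quoted: 10 years on one Tesla V100
num_accelerators = 100    # quoted: about 100 accelerators
scaling_efficiency = 1.0  # assumption: ideal linear scaling, no network overhead


def training_days(single_gpu_years, num_accelerators, scaling_efficiency=1.0):
    """Estimate wall-clock training time in days on a cluster."""
    total_gpu_days = single_gpu_years * DAYS_PER_YEAR
    return total_gpu_days / (num_accelerators * scaling_efficiency)


days = training_days(single_gpu_years, num_accelerators, scaling_efficiency)
print(f"{days:.1f} days")  # prints "36.5 days"
```

Under these assumptions the estimate is 36.5 days, in line with the "about a month" stated above. Any real communication overhead between machines would lower the effective efficiency and lengthen this estimate.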