According to Yahoo's announcement of the Java project Vespa, the software is meant to make it easier for developers to build applications that select results from large amounts of data in real time and present them to users. The announcement says that while Hadoop and Storm help to process the data, the final step of serving it has remained a problem. Vespa aims to close this gap; measured in lines of code, it is larger than any open source project Yahoo has released before.
Oath uses the software for Yahoo.com, Yahoo News and Flickr, among others. According to the company, it delivers content and ads 9,000 times per second, with latencies in the range of tenths of a millisecond. For Flickr, for example, Vespa handles hundreds of queries per second and searches several billion images. For Yahoo Gemini, Vespa serves around three billion native ads per day.
Vespa distributes data and the computations on it across many machines and does without a master node that could become a bottleneck. Unlike conventional applications, Vespa does not pull the data into a stateless layer to process it; instead, it performs the computations where the data is stored. To do this, the software manages clusters of many nodes: it redistributes data redundantly in the background, provisions new capacity, implements distributed low-latency query and processing algorithms, keeps distributed data consistent, and much more.
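The announcement does not describe how Vespa decides which nodes store which data. As a rough illustration of how placement can work without a master, the following minimal sketch assumes rendezvous (highest-random-weight) hashing, a generic technique and not necessarily Vespa's own algorithm: every node and client independently computes the same replica set from a document ID and the list of nodes, so no central coordinator has to be asked. The class name `RendezvousPlacement`, the methods `score` and `replicasFor`, and the node names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of masterless data placement via rendezvous hashing.
// This is a generic technique, not Vespa's actual distribution algorithm.
public class RendezvousPlacement {

    // Stable pseudo-random score for a (document, node) pair.
    // Every machine computes the same value, so no master is needed.
    static long score(String docId, String node) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] h = md.digest((docId + "|" + node).getBytes(StandardCharsets.UTF_8));
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (h[i] & 0xFF); // first 8 bytes of the digest as a long
            }
            return v;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // The `redundancy` nodes with the highest scores hold copies of the document.
    static List<String> replicasFor(String docId, List<String> nodes, int redundancy) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingLong((String n) -> score(docId, n))
                .reversed()
                .thenComparing(Comparator.naturalOrder())); // deterministic tie-break
        return new ArrayList<>(sorted.subList(0, Math.min(redundancy, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node0", "node1", "node2", "node3", "node4");
        // Two redundant copies of "doc-42"; any node can compute this on its own.
        System.out.println(replicasFor("doc-42", nodes, 2));
    }
}
```

A useful property of this choice: if a node that does not hold a given document is removed from the cluster, that document's replica set stays the same, so only data from the departed node has to be moved.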
Vespa allows application developers to feed data and models of any size into the system and perform the final computations at the time a request is made. According to Oath, because Vespa does not need to precompute responses to queries, this improves the user experience at lower cost and enables more complex responses. Developers can also work more interactively: they navigate and interact with complex computations in real time instead of starting offline jobs and checking the results later. The code for Vespa is available on GitHub under the Apache License 2.0.