How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts
At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they’re looking for. What better source for guest preferences than the guests themselves?
We built a system called the Attribute Prioritization System (APS) to listen to our guests’ needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how does it differ by the home’s location, property type, price, as well as guests’ travel needs?
With this personalized understanding of what home amenities, facilities, and location features (i.e. “home attributes”) matter most to our guests, we advise Hosts on which home attributes to acquire, merchandize, and verify. We can also display to guests the home attributes that are most relevant to their destination and needs.
We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.
What do guests care about?
First, to determine what matters most to our guests in a home, we look at what guests request, comment on, and contact customer support about the most. Are they asking a Host whether they have wifi, free parking, a private hot tub, or access to the beach?
To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that can extract home attributes from unstructured text data like guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:
- A named entity recognition (NER) module extracts key phrases from unstructured text data
- An entity mapping module then maps these key phrases to home attributes
The named entity recognition (NER) module uses textCNN (convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of the following five categories: Amenity, Activity, Event, Specific POI (e.g. “Lake Tahoe”), or Generic POI (e.g. “post office”).
The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To achieve this, we compute the cosine distance between the candidate phrase and the attribute label in the fine-tuned word embedding space. We consider the closest mapping to be the referenced attribute, and can calculate a confidence score for the mapping.
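The nearest-attribute lookup described above can be sketched roughly as follows. This is a minimal illustration, not the LATEX implementation: the three-dimensional vectors are made-up toy values standing in for real word embeddings, the function names are hypothetical, and using the cosine similarity itself (one minus the cosine distance) as the confidence score is an assumption.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (1 - cosine distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_u * norm_v)

def map_phrase_to_attribute(phrase_vec, attribute_vecs):
    """Return (attribute, similarity) for the attribute closest to the phrase."""
    best_attr, best_sim = None, -1.0
    for attr, vec in attribute_vecs.items():
        sim = cosine_similarity(phrase_vec, vec)
        if sim > best_sim:
            best_attr, best_sim = attr, sim
    return best_attr, best_sim

# Toy embeddings (assumed values, for illustration only).
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "wifi": [0.0, 0.2, 0.9],
    "free parking": [0.1, 0.9, 0.1],
}
phrase_vec = [0.8, 0.2, 0.1]  # stand-in embedding for a phrase such as "jacuzzi"

attribute, score = map_phrase_to_attribute(phrase_vec, ATTRIBUTE_VECS)
print(attribute, round(score, 3))
```

In practice a threshold on the score could be used to flag phrases that match no known attribute well, which is one way new, unsupported entities might surface.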
We then calculate how frequently an entity is referenced in each text source (i.e. messages, reviews, customer service tickets), and aggregate the normalized frequency across text sources. Home attributes with many mentions are considered more important.
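As a rough sketch of this aggregation step: each source's mention counts are normalized to sum to one, then combined across sources. The equal source weights, the function names, and the sample mentions below are assumptions for illustration; the post does not say how sources are weighted.

```python
from collections import Counter

def normalized_frequencies(mentions):
    """Share of mentions per attribute within one text source."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()} if total else {}

def aggregate_importance(mentions_by_source, source_weights=None):
    """Weighted sum of per-source normalized frequencies."""
    sources = list(mentions_by_source)
    if source_weights is None:
        # Assumption: every source counts equally.
        source_weights = {s: 1.0 / len(sources) for s in sources}
    scores = {}
    for source, mentions in mentions_by_source.items():
        for attr, freq in normalized_frequencies(mentions).items():
            scores[attr] = scores.get(attr, 0.0) + source_weights[source] * freq
    return scores

# Made-up attribute mentions, already mapped from raw text.
mentions_by_source = {
    "messages": ["wifi", "wifi", "parking", "hot tub"],
    "reviews": ["hot tub", "wifi"],
    "tickets": ["parking"],
}
scores = aggregate_importance(mentions_by_source)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Normalizing within each source before combining keeps a high-volume source (such as messages) from drowning out a low-volume one (such as support tickets).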
With this system, we’re able to gain insight into what guests are interested in, even highlighting new entities that we may not yet support. The scalable engineering system also allows us to improve the model by onboarding additional data sources and languages.
What do guests care about for different types of homes?
What guests look for in a mountain cabin is different from what they look for in an urban apartment. Gaining a more complete understanding of guests’ needs in an Airbnb home enables us to provide more personalized guidance to Hosts.
To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, etc.), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a customized importance score that is used to rank all possible attributes of a home.
For example, let us consider a mountain cabin that can host six people with an average daily price of $50. In determining what is most important for potential guests, we learn from what is most talked about for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, what is important for an urban apartment is parking, restaurants, grocery stores, and subway stations.
We could directly aggregate the frequency of keyword usage among similar homes. But this approach would run into issues at scale; the cardinality of our home segments could grow exponentially large, with sparse data in very unique segments. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach scales as we use finer and more numerous dimensions to characterize our homes. This allows us to support our Hosts in best highlighting their unique and diverse collection of homes.
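To make the sparsity problem concrete, here is a small sketch of one generic way to infer a segment's expected frequency from raw counts: shrinking the segment's observed mention rate toward the rate of a broader parent segment. This is a stand-in technique chosen for illustration, not the inference model described above, and the function name, `prior_strength` value, and counts are all assumptions.

```python
def shrunk_rate(mentions, listings, parent_rate, prior_strength=20.0):
    """Blend a segment's observed mention rate with its parent's rate.

    With few listings the estimate stays near parent_rate; with many
    listings it approaches the raw rate mentions / listings.
    """
    return (mentions + prior_strength * parent_rate) / (listings + prior_strength)

# Hypothetical numbers: "kayak" mentions per listing.
parent_rate = 0.30  # all mountain cabins

# A very specific segment with only two listings: raw rate 0.50 is unreliable.
sparse = shrunk_rate(mentions=1, listings=2, parent_rate=parent_rate)

# A well-populated segment: raw rate 0.40 is mostly trusted.
dense = shrunk_rate(mentions=400, listings=1000, parent_rate=parent_rate)

print(round(sparse, 3), round(dense, 3))
```

The estimates for every attribute in a segment could then be sorted to produce that segment's attribute ranking, even when the segment itself has very little data.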
How can guests’ preferences help Hosts improve?
Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by:
- Recommending that Hosts acquire an amenity guests often request (e.g. coffee maker)
- Merchandizing an existing home attribute that guests tend to comment favorably on in reviews (e.g. patio)
- Clarifying popular facilities that may end up in requests to customer support (e.g. the privacy of a pool and the ability to access it)
But to make these recommendations relevant, it’s not enough to know what guests want. We also need to be sure about what’s already in the home. This turns out to be trickier than asking the Host, because of the 800+ home attributes we collect. Most Hosts aren’t able to immediately and accurately add all of the attributes their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guest feedback on amenities and facilities they have seen or used. In addition, some home attributes are available from trustworthy third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. We’re able to build a truly complete picture of a home by leveraging data from our Hosts, guests, and trustworthy third parties.
We utilize several different models, including a Bayesian inference model that increases in confidence as more guests confirm that the home has an attribute. We also leverage WiDeText, a supervised neural network model that uses features about the home to predict the likelihood that the next guest will confirm the attribute’s existence.
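One simple way to picture a Bayesian model whose confidence grows with guest confirmations is a Beta-Bernoulli update, sketched below. The post does not specify the model's form, so the uniform Beta(1, 1) prior, the class name, and the sample guest responses are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class AttributeBelief:
    """Beta(confirms, denials) belief that a home has a given attribute."""
    confirms: float = 1.0  # prior pseudo-count for "has the attribute"
    denials: float = 1.0   # prior pseudo-count for "does not have it"

    def update(self, guest_confirmed: bool) -> None:
        # Each guest response adds one observation to the matching count.
        if guest_confirmed:
            self.confirms += 1
        else:
            self.denials += 1

    @property
    def mean(self) -> float:
        """Posterior probability that the attribute exists."""
        return self.confirms / (self.confirms + self.denials)

    @property
    def variance(self) -> float:
        """Posterior variance; it shrinks as responses accumulate."""
        a, b = self.confirms, self.denials
        return a * b / ((a + b) ** 2 * (a + b + 1))

belief = AttributeBelief()
prior_variance = belief.variance

# Hypothetical guest responses for "crib": eight confirmations, one denial.
for confirmed in [True] * 8 + [False]:
    belief.update(confirmed)

print(round(belief.mean, 3), round(belief.variance, 4))
```

The shrinking variance is what "increases in confidence" means in this sketch: the estimate moves less with each additional guest response.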
Together with our estimate of how important certain home attributes are for a home, and the likelihood that the home attribute already exists or needs clarification, we’re able to give personalized and relevant recommendations to Hosts on what to acquire, merchandize, and clarify when promoting their home on Airbnb.
This is the first time we’ve known what attributes our guests want down to the home level. What’s important varies greatly based on home location and trip type.
This full-stack prioritization system has allowed us to give more relevant and personalized advice to Hosts, to merchandize what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their perfect vacation home more easily.
We’re currently experimenting with highlighting the amenities that are most important for each type of home (e.g. kayak for a mountain cabin, parking for an urban apartment) on the home’s product description page. We believe we can leverage the knowledge gained to improve search and to determine which home attributes are most important for different categories of homes.
On the Host side, we’re expanding this prioritization methodology to encompass additional tips and insights into how Hosts can make their listings even more desirable. This includes actions like freeing up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their perfect Host and home, we hope to foster a world where anyone can belong anywhere.
If this type of work interests you, check out some of our related positions at Careers at Airbnb!
It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, and Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!