Restaurant Search with Predictive Multispace Queries

Alexei Alexandrovich Yatskov • Keio University


Research Advisors

Yasushi Kiyoki • Yoshiyasu Takefuji • Kuniaki Mukai

This Presentation Online

https://foosoft.net/research/search/slides

Research Objectives

  • Develop an interactive search space visualization system.
  • Convey information which cannot be understood through text.
  • Build a vector space using geographic and informational properties (AKA features).
  • Use live training data to improve the search efficiency.
  • Design a knowledge import method that makes data from existing sources compatible with our database system.

Conferences and Publications

  • Presented in Brussels, Belgium at iiWAS 2015.
  • Published by ACM (ISBN 978-1-4503-3491-4/15/12).

Why Change Search?

Sometimes too few results…

Sometimes too many results…

…and importantly…

  • User is limited to performing one query at a time.
  • Cognitive gap exists between query and results.
  • Difficult to leverage knowledge of peer users.

Search in the real world…

Search on the internet…

Root of the Problem

  • Difficult to qualitatively comprehend the data:
    • What are the contents of the search space?
    • Why do I get unsatisfactory results?
    • How do I improve my search query?
    • What are similar users searching for?
  • Frustrated users are left with few options:
    • Give up trying to find an optimal result.
    • Settle for search by trial and error.

Analogy: Find the Red Nose

Traditional Search

User executes many queries directly:

  • Server executes one search operation per query.
  • Finding a desired result may take a long time.
  • No feedback on how to improve search parameters.

Reactive Search

User executes many queries indirectly:

  • Server executes thousands of searches for the user.
  • Coarse representation of search space is created.
  • Possible to quickly narrow down to the desired result.

Search Technology Overview

Search System

  • Client software implemented as an HTML5 web application.
  • Server software developed in Go, runs on Linux (Ubuntu).
  • Knowledge importer software written as a Go application.
  • Live search data used to group users with matching interests.
  • System scales to work with spaces of arbitrary dimensionality.

Web Frameworks

  • Bootstrap ⇒ user interface widgets
  • Handlebars.js ⇒ templating engine
  • SQLite ⇒ database engine
  • Snap.svg ⇒ scalable vector graphics (SVG) library
  • Tinycolor ⇒ color transformation library
  • Underscore.js ⇒ JavaScript utility library
  • jQuery ⇒ DOM manipulation and AJAX

Network Architecture

  • Responsive interface realized through AJAX queries.
  • System implements a stateless, RESTful API.

Restaurant Data

Search Space

  • Restaurant data is represented in a six-dimensional space.
  • Axes, known as features, determine each restaurant's position:
    • Informational features
      • accommodating
      • affordable
      • atmospheric
      • delicious
    • Contextual features
      • accessible
      • nearby

Real Data Obstacles

  • Existing data must be converted to our semantic database format.
  • Restaurant rating and organisation schemes vary from site to site.
  • Companies have little interest in making their data accessible.

General Solution

  • Develop a custom, data-driven semantic knowledge importer.
  • Site structure and rating details are stored as profile data.
  • Profiles automatically selected based on provided review URLs.
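A minimal Go sketch of profile selection, assuming a profile matches when the host name of a review URL equals one of the entries in its domains list (as in the Yelp profile's domains = ["www.yelp.co.jp"]). The profile type, the selectProfile function, and the example URL path are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// profile holds only the fields needed for selection; a real profile also
// carries the index and item selectors.
type profile struct {
	Name    string
	Domains []string
}

// selectProfile returns the profile whose domains list contains the host of
// reviewURL, or an error if no profile matches.
func selectProfile(profiles []profile, reviewURL string) (*profile, error) {
	u, err := url.Parse(reviewURL)
	if err != nil {
		return nil, err
	}
	host := u.Hostname() // strips any port number
	for i := range profiles {
		for _, d := range profiles[i].Domains {
			if d == host {
				return &profiles[i], nil
			}
		}
	}
	return nil, fmt.Errorf("no profile for domain %q", host)
}

func main() {
	profiles := []profile{{Name: "yelp", Domains: []string{"www.yelp.co.jp"}}}
	p, err := selectProfile(profiles, "https://www.yelp.co.jp/biz/example") // hypothetical URL
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.Name) // yelp
}
```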

Formalizing Navigation

  • Websites have index pages, review pages, and pagers.
    • Index pages contain links to a subset of review pages.
    • Pagers allow users to navigate between index pages.
    • Review pages contain detailed restaurant information.

Import Logic

Most review sites can be processed with a common method!

Selective Knowledge Extraction

  • Useful data is buried deep inside the website layout.
  • Apply knowledge selectors to extract it:
    • CSS selectors are used to locate an HTML element.
    • Attribute selectors are used to specify an element property.
    • Regular expressions are used to post-process element data.
[index.next]
    path = "div.deckTools.btm a.nav.next.rndBtn.rndBtnGreen.taLnk"
    attr = "href"
[item.count]
    path = "h3.reviews_header"
    regEx = "^(\\d+)"
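The regular-expression step of the [item.count] selector can be sketched with Go's standard regexp package. This assumes the CSS selector has already produced the element's text; extractCount is a hypothetical name, and trimming surrounding whitespace before matching is an added assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// countPattern is the regEx value of [item.count]; the TOML string "^(\\d+)"
// unescapes to ^(\d+), written here as a Go raw string.
var countPattern = regexp.MustCompile(`^(\d+)`)

// extractCount post-processes the text of an element that a CSS selector has
// already located (e.g. "h3.reviews_header") and returns the leading integer.
func extractCount(text string) (int, error) {
	m := countPattern.FindStringSubmatch(strings.TrimSpace(text))
	if m == nil {
		return 0, fmt.Errorf("no count in %q", text)
	}
	return strconv.Atoi(m[1]) // m[1] is the first capture group
}

func main() {
	n, err := extractCount(" 57 reviews") // hypothetical element text
	fmt.Println(n, err)
}
```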

Content Vector Projection

  • Ratings are mapped onto an orthogonal vector space:
    • Restaurant review sites have unique rating scales.
    • Profiles express ratings in terms of core features.
[item.props.service]
    accommodating = 1.0
    affordable = 0.0
    atmospheric = 0.0
    delicious = 0.0
    scale = 5.0
    path = "dl#js-rating-detail > dd:nth-child(4)"
                        

Sample Profile for “Yelp”

name = "yelp"
domains = ["www.yelp.co.jp"]
[index.items]
    path = "a.biz-name"
    attr = "href"

[index.next]
    path = "a.next"
    attr = "href"
[item.name]
    path = "h1.biz-page-title"

[item.address]
    path = "div.media-story address"

[item.count]
    path = "span.review-count > span"
[item.props]
    [item.props.overall]
        accommodating = 1.0
        affordable = 1.0
        atmospheric = 1.0
        delicious = 1.0
        scale = 5.0
        path = "#wrap > div.biz-country-jp > div > div.top-shelf > [...] > div > i"
        regEx = "^([0-9]*\\.?[0-9]+)"
        attr = "title"

Our Data Sources

…different rating systems, site designs, and users!

Tabelog

Yelp

TripAdvisor

Data Overview

  • Over 1500 unique restaurant entries converted and stored.
  • Over 5.5 gigabytes of web data scraped and geocoded.
  • Restaurants joined based on name, latitude, and longitude.
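A hedged Go sketch of the join, assuming two entries refer to the same restaurant when their names match after lowercasing and whitespace collapsing and their coordinates agree within a small tolerance in degrees. The normalization, the tolerance value, and the entry type are assumptions; the slides only name the three join keys.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// entry is a hypothetical, minimal restaurant record from one review site.
type entry struct {
	Name     string
	Lat, Lng float64
}

// normalizeName lowercases a name and collapses runs of whitespace.
func normalizeName(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// join pairs entries from two sites that share a normalized name and whose
// coordinates differ by less than tol degrees in both latitude and longitude.
func join(a, b []entry, tol float64) [][2]entry {
	var pairs [][2]entry
	for _, x := range a {
		for _, y := range b {
			if normalizeName(x.Name) == normalizeName(y.Name) &&
				math.Abs(x.Lat-y.Lat) < tol && math.Abs(x.Lng-y.Lng) < tol {
				pairs = append(pairs, [2]entry{x, y})
			}
		}
	}
	return pairs
}

func main() {
	// Hypothetical records from two sites.
	tabelog := []entry{{"Example Sushi", 35.66550, 139.77000}}
	yelp := []entry{{"example  sushi", 35.66551, 139.77002}, {"Other Cafe", 35.0, 139.0}}
	fmt.Println(len(join(tabelog, yelp, 0.0005)))
}
```

The nested loop is O(n × m), which stays manageable at roughly 1500 entries; a spatial index would be the natural next step for larger datasets.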

Problems Encountered

Data Representation

  • All data is stored on the server in an SQLite database.
  • Primary database tables are:
    • reviews ⇒ review data
    • categories ⇒ user categories
    • history ⇒ review access history
    • historyGroups ⇒ history to category grouping

User Interaction

Rendering Dimensionality

  • Multidimensional spaces are difficult to imagine and visualize.
  • Parameterized projections of content and compatibility make dimensional simplification possible (AKA shadowing).

Analogy Using a 2D Vector Space

For each axis (feature):

  1. Select a static subspace.
  2. Represent the axis with a discrete function.
  3. Gradually mutate the axis value over the axis range.
  4. Plot a simplified projection of the desired data (data pressure).

Projecting Data Pressure

  • Data pressure is displayed for each dimension as a gradient.
    • white ⇒ low pressure
    • gray ⇒ medium pressure
    • black ⇒ high pressure
  • User can dynamically change query parameters to modify the number of results.
  • Query resolution can be improved at the cost of performance.
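The white-to-black gradient can be sketched as a mapping from each sample's result count to a gray level. This Go sketch assumes a linear ramp and a hex color string as output (for example, to serve as gradient stop colors in the SVG output); pressureColor and maxCount are hypothetical names.

```go
package main

import "fmt"

// pressureColor maps a sample's result count to a gray level: no results is
// white, maxCount results is black, with a linear ramp in between.
// The linear ramp is an assumption of this sketch.
func pressureColor(count, maxCount int) string {
	if maxCount <= 0 {
		return "#ffffff"
	}
	if count < 0 {
		count = 0
	}
	if count > maxCount {
		count = maxCount
	}
	level := 255 - count*255/maxCount
	return fmt.Sprintf("#%02x%02x%02x", level, level, level)
}

func main() {
	for _, c := range []int{0, 5, 10} {
		fmt.Println(c, pressureColor(c, 10))
	}
}
```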

Rejected Results are Important

Our system's secondary visual output makes users aware of the results their current query rejects.

Searching Content Space

Users browse restaurant reviews in a six-dimensional vector space.

Parameter: “nearby”

  • Results vary in response to changes in position.
  • Location is obtained through JavaScript:
    • GPS-enabled devices can get a precise position.
    • Other devices rely on a Geo-IP database for an approximation.
  • Position is provided as latitude and longitude.
  • The nearby value is linearly mapped between the closest and farthest restaurants.

Compute Distance to Address

  • Google Maps Geocoding API was used to convert street addresses to latitude and longitude pairs.
  • The haversine formula was applied to efficiently approximate the distance between the user and the target.

Parameter: “accessibility”

  • Accessibility is the ease of getting to a restaurant location.
  • We assume that the user is traveling via public transit (train).
  • The user specifies how far they are willing to walk from a JR East station.

Distance From Nearest Station

  • Distance from JR East stations precomputed for accessibility:
    • 623 stations were matched with restaurant locations.
    • Closest station name and distance are shown in results.

Problems with Rating Systems

  • User ratings are highly subjective and polarized; there are fewer negative reviews than positive and even fewer average reviews.
  • Interest is a better metric for expressing quality; many modern social media platforms do not use ratings at all.
  • We avoid ratings, instead grouping users with similar preferences:
    • Similar opinions often translate to shared tastes.
    • Search based entirely on indirect recommendations is possible.

Defining Compatibility Space

  • Use the profile editor to answer questions:
    • Possible answers are agree, disagree, and neither.
    • Only need to answer questions that users find meaningful.
    • Unanswered questions are ignored in search.
    • Answer data is stored persistently in browser local storage.
  • Users are allowed to freely author new question categories:
    • Limited overhead as only data for answered questions is used.
    • Irrelevant questions can be skipped by other users.
  • Unused question categories can be trivially deleted.

Building Compatibility Space

  • Each question category represents a dimension in the system.
  • User answers correspond to axis values (limited to -1, 0, and 1).
  • Deleting a question is only possible when its axis is not referenced.
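A sparse map is a natural fit for this space, since unanswered questions are ignored. The Go sketch below is a minimal illustration, assuming compatibility is the average product of two profiles' answers over the categories both have answered; the answers type, the compatibility function, and the category names are hypothetical, and the metric is an assumption rather than the system's documented formula.

```go
package main

import "fmt"

// answers maps a question category to an answer: 1 agree, -1 disagree,
// 0 neither. Unanswered categories are simply absent from the map.
type answers map[string]int

// compatibility averages the product of two profiles' answers over the
// categories both have answered: 1 means full agreement, -1 full
// disagreement. ok is false when the profiles share no answered category.
func compatibility(a, b answers) (score float64, ok bool) {
	sum, n := 0, 0
	for category, av := range a {
		if bv, found := b[category]; found {
			sum += av * bv
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return float64(sum) / float64(n), true
}

func main() {
	// Hypothetical user-authored categories.
	alice := answers{"likes spicy food": 1, "prefers quiet places": 1}
	bob := answers{"likes spicy food": 1, "prefers quiet places": -1, "wants a view": 1}
	fmt.Println(compatibility(alice, bob))
}
```

A "neither" answer contributes 0 but still counts toward the average, so it pulls the score toward neutral; skipping such answers entirely would be an equally reasonable design choice.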

Using Profile Data

  • User sends profile to server when accessing restaurants via AJAX.
  • Server uses profile data to position reviews in compatibility space.
  • Other users perform search, sending their profile with query.
  • Compatibility is computed between user profile and review.
  • Users can switch between content and compatibility visualization.

Map Display

Try it out!

Prototype application is publicly accessible:

https://search.foosoft.net

Complete source code available on GitHub:

https://github.com/FooSoft/search

Thank you!