Making Predictive Newtab smarter with behavior tracking

In the week since we initially released Predictive Newtab, we’ve learnt a lot from your feedback. The first version worked by gathering keywords from what you were browsing as context, and then searching your history for those keywords (using the same search backend as RecallMonkey) to find more relevant material. We realized there are some downsides to that approach, especially for people who do not have well-tagged bookmarks:

  • Gathering context is hard: Extracting keywords from your current browsing session is difficult. We tried approaches such as part-of-speech tagging to weigh the importance of words in the title, but websites often overload their titles and keywords for search engine optimization. For example, being on the site “CNN.com – Breaking News, U.S., World, Weather, Entertainment & Video News” doesn’t reveal much about what you’re actually interested in. One option is to dig deeper into the content you’re browsing, but topic extraction is itself a hard problem, and not worth the effort if there are better means to the same end.
  • False negatives: I often browse Pandora and YouTube together: when I find a good song on Pandora, I go to YouTube to watch its music video. But the typical keywords associated with Pandora (music, radio) and those associated with YouTube (video, broadcast) have no overlap, so a keyword search will never connect the two. The only way to pick up on patterns like this is to track the user’s behavior.
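To make both problems concrete, here is a minimal Python sketch of a naive keyword approach. It is not the add-on’s actual code (which used the RecallMonkey search backend); the stop-word list, the tokenizer, and the use of Jaccard overlap as the similarity measure are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real extractor would use a larger one.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "for", "on", "with"}


def title_keywords(title: str) -> set[str]:
    """Naively extract keywords from a page title: lowercase word tokens minus stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def keyword_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# An SEO-heavy title yields many generic keywords and no clear topic.
cnn = title_keywords(
    "CNN.com – Breaking News, U.S., World, Weather, Entertainment & Video News"
)

# Keywords typically associated with each site (illustrative, not measured).
pandora = {"music", "radio"}
youtube = {"video", "broadcast"}

# Sites the user habitually uses together share no keywords at all.
pandora_youtube_score = keyword_overlap(pandora, youtube)
```

Here `cnn` contains generic terms such as `news`, `world` and `weather`, while `pandora_youtube_score` is `0.0`: no keyword-based ranking will ever suggest YouTube while you are on Pandora.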

Our goal is to predict what the user will probably do next, and what better way to do that than by observing what the user actually does? Predictive Newtab now keeps track of how you switch between tabs while browsing. As before, we respect your privacy: this data is stored inside the browser, and nothing is sent to any web server. Your predicted results now draw from three different sources:

  1. Similar Bookmarks: This works much as before, matching on the tags in your bookmarks, so if your bookmarks are well tagged, this is likely to return the most relevant results. In my case, these results appear because I have these other sites saved with the tag “news”.
  2. Frequently Browsed With: These are websites you have browsed together with the site you currently have open. Predictive Newtab only starts tracking this as of version 3, so it might take up to a week for these results to stabilize. In my case, it seems I’m easily distracted when I’m trying to read the news!
  3. Other similar content: This runs a traditional keyword-based search, as in the previous version, and sorts the results by a combination of keyword matches and visit frequency.
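The post does not describe how tab switches are stored, so the following is only a minimal Python sketch of the idea behind “Frequently Browsed With”, under stated assumptions: every switch from one tab to another is counted as a pair of sites, pages are grouped by hostname, and counts live in memory rather than in browser storage. The class and method names (`TabSwitchTracker`, `on_tab_selected`, `browsed_with`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from urllib.parse import urlsplit


class TabSwitchTracker:
    """Hypothetical in-memory model of 'Frequently Browsed With':
    counts how often the user switches between pairs of sites."""

    def __init__(self) -> None:
        # site -> Counter of sites the user switched to or from it
        self._pairs: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self._current: str | None = None

    @staticmethod
    def _site(url: str) -> str:
        # Assumption: group pages by hostname, so all YouTube pages count as one site.
        return urlsplit(url).hostname or url

    def on_tab_selected(self, url: str) -> None:
        """Record a switch from the previously selected tab's site to this one."""
        site = self._site(url)
        if self._current is not None and self._current != site:
            # Count the pair in both directions: A browsed with B implies B with A.
            self._pairs[self._current][site] += 1
            self._pairs[site][self._current] += 1
        self._current = site

    def browsed_with(self, url: str, limit: int = 5) -> list[tuple[str, int]]:
        """Sites most often switched to or from the given page's site."""
        counts = self._pairs.get(self._site(url), Counter())
        return counts.most_common(limit)


# Usage: replay a short, made-up sequence of tab selections.
tracker = TabSwitchTracker()
for url in [
    "https://www.pandora.com/station/1",
    "https://www.youtube.com/watch?v=abc",
    "https://www.pandora.com/station/1",
    "https://www.cnn.com/",
    "https://www.youtube.com/watch?v=def",
]:
    tracker.on_tab_selected(url)

suggestions = tracker.browsed_with("https://www.pandora.com/")
```

After this sequence, `suggestions` ranks `www.youtube.com` (two switches) ahead of `www.cnn.com` (one switch), the Pandora/YouTube link that keyword matching missed. Counting raw switches also explains why results need time to stabilize: early on, a single accidental switch weighs as much as a real habit.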

Additionally, we simplified the UI a little and added icons as a visual cue. You can download the add-on and try it out from here. Look out for more improvements in the future. As always, we welcome your feedback, and the source code is open for you to look through. Happy browsing!

Abhinav Sharma on behalf of the Prospector team.