NS, Fix Your Search Feature

Sngl2th

Member
I will keep this to the point.

NS, please fix your search. Let's say I want to watch Tanner Boudreau's season edit from a few days ago, but can't remember how to spell his French last name. I have to type 'Tanner' in the search box and select 'video' in the drop-down menu. Once I get results, there is no way to order them, such as by 'most recent,' like there is when you are just browsing random videos.

Also when you are looking at results from the menu, such as "Top Rated," you can't do a keyword search among those videos.

We should also be able to search by skier, uploader, etc. Let's get with the times.
 
First off: -> site discussion

Second: I'd prefer to see a small icon beside your name before you complain about the services of this site sent from the gods. (Why I'm not complaining)

Third: yeah those options would be sick
 
Did not know about Site Discussion, sorry. Like many people, I really just use the site to watch videos, so I wish the search worked a little better.

Obviously NS is the best website ever created, and better than equivalent websites for any other sport (that I know of).

P.S. I maintain my anonymity on here so I can say whatever stupid shit i want.
 
I don't really care about a ton of that stuff, but I think it needs to display more than 30-40 results. If you search a mountain, you scroll through a bunch of old videos with maybe a few newer ones, then the list starts repeating.
 
To be honest, the search bar sucks. People often scream 'SEARCHBAR!' when there are several threads about some sort of topic, but you really can't find them.

I just tried to search for 'top gear' because I wanted to post in the 'Official Top Gear Thread', I knew the words 'top gear' were in the title but nothing came up.. How is that even possible??
 
Best thing to do is to type "site:newschoolers.com" into Google and then type your search keyword.
 
13451916:eheath said:
Best thing to do is to type "site:newschoolers.com" into Google and then type your search keyword.

Honestly surprised ns doesn't just pay the fee to have this be the standard search method for the site
 
13453078:toastyteenagers said:
or we could just use those customized Google search bars for free

why aren't we doing that again?

My favorite sketchy Russian porn site has that. If those guys can do it, I'd hope NS could.
 
13452510:toastyteenagers said:
it would be very very hard to develop

like js, html, and maybe even some sq.

in short a dev clusterfuck

Good try, but a search index wouldn't use any of those. Currently I'd guess the search is using the limited MySQL full-text search. Really they need to implement an Apache Lucene index.

I've discussed how to do this several times, but yeah, it's a pretty big project.

Basically you would need to back-index every thread (by page) that's ever been made. You would then need to update documents as new posts are made to threads. This would likely need to run on a separate server and be exposed through an API, to help isolate load. Videos, pictures, etc. would also get their own separate index.
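
As a rough illustration of the "one document per thread page" idea, here is a minimal Python sketch. It only shows how posts could be grouped into page documents and which document has to be refreshed when a new post arrives; it is not Lucene code. The page size, the `Post` fields, and the `thread-<id>-page-<n>` id format are all assumptions made up for the example.

```python
from dataclasses import dataclass

POSTS_PER_PAGE = 25  # assumed page size, not taken from NS


@dataclass
class Post:
    thread_id: int
    position: int  # 0-based position of the post within its thread
    body: str


def page_doc_id(thread_id: int, position: int) -> str:
    """Return the id of the search document (one per thread page) holding this post."""
    page = position // POSTS_PER_PAGE + 1
    return f"thread-{thread_id}-page-{page}"


def build_page_documents(posts: list[Post]) -> dict[str, str]:
    """Back-index step: group every existing post into one text document per page."""
    grouped: dict[str, list[str]] = {}
    for post in sorted(posts, key=lambda p: (p.thread_id, p.position)):
        grouped.setdefault(page_doc_id(post.thread_id, post.position), []).append(post.body)
    return {doc_id: "\n".join(bodies) for doc_id, bodies in grouped.items()}


def apply_new_post(docs: dict[str, str], post: Post) -> str:
    """Incremental step: only the page that received the new post changes."""
    doc_id = page_doc_id(post.thread_id, post.position)
    existing = docs.get(doc_id)
    docs[doc_id] = post.body if existing is None else existing + "\n" + post.body
    return doc_id  # the caller would re-send just this document to the index
```

The point of keying by page is that a new post touches exactly one document, so the index never has to reprocess a whole thread.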

Can it be done? Yes. Would it be more expensive than it's worth? Probably.

A rough estimate for what I'd charge for this would likely be > $20,000 (although once upon a time I offered to do it for free ;)
 
13462027:iLLbiLLy said:
Good try, but a search index wouldn't use any of those. Currently I'd guess the search is using the limited MySQL full-text search. Really they need to implement an Apache Lucene index.

I've discussed how to do this several times, but yeah, it's a pretty big project.

Basically you would need to back-index every thread (by page) that's ever been made. You would then need to update documents as new posts are made to threads. This would likely need to run on a separate server and be exposed through an API, to help isolate load. Videos, pictures, etc. would also get their own separate index.

Can it be done? Yes. Would it be more expensive than it's worth? Probably.

A rough estimate for what I'd charge for this would likely be > $20,000 (although once upon a time I offered to do it for free ;)

I know you did. Honestly, we've used free help a few times and it always ended in disaster. On a site of this scale, it's really difficult to trust someone to catch all the issues that massive load causes in the things they build. Nothing against you - we've just been burned in the past and had hugely useless or insecure features built, so it's not something we take lightly.

And yes, probably like $20k to start, then it's gotta scale, and then you gotta keep it maintained across all sections, not only forums.

There are loads of great 3rd-party systems that can assist with it, but even so it's a freaking massive undertaking. It's always on 'the list', but there are other fish to fry currently.
 
13462066:Mr.Bishop said:
I know you did. Honestly, we've used free help a few times and it always ended in disaster. On a site of this scale, it's really difficult to trust someone to catch all the issues that massive load causes in the things they build. Nothing against you - we've just been burned in the past and had hugely useless or insecure features built, so it's not something we take lightly.

And yes, probably like $20k to start, then it's gotta scale, and then you gotta keep it maintained across all sections, not only forums.

There are loads of great 3rd-party systems that can assist with it, but even so it's a freaking massive undertaking. It's always on 'the list', but there are other fish to fry currently.

Yeah, I totally understand. Yet there are ways to do it without risking performance degradation or impact to the site. First off, as I mentioned, you would put it on a completely separate service. A PaaS like Azure is a good choice because it's cheap and scales on demand. Its cloud services and cloud storage are well suited to something like Lucene. NS is fairly large, but still on the small side of what Lucene was meant to handle.

The API would be very minimal:

Id findDocument(String phrase, [options]);

void indexDocument(Document doc);

void deleteDocument(Id documentId);
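
To make the shape of those three calls concrete, here is a small in-memory Python sketch. It is only a stand-in: a real service would sit in front of Lucene on its own server, and the class name, method names, and string ids here are assumptions for illustration. Note two simplifications: the find call returns a list of ids rather than a single `Id`, and it matches documents containing every term of the phrase in any order instead of doing true phrase matching.

```python
import re
from collections import defaultdict


class SearchService:
    """In-memory stand-in for the three-call API (hypothetical, not Lucene)."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}
        self._postings: dict[str, set[str]] = defaultdict(set)  # term -> doc ids

    @staticmethod
    def _terms(text: str) -> set[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index_document(self, doc_id: str, text: str) -> None:
        """Add a document, or replace it if the id is already indexed."""
        self.delete_document(doc_id)
        self._docs[doc_id] = text
        for term in self._terms(text):
            self._postings[term].add(doc_id)

    def delete_document(self, doc_id: str) -> None:
        """Remove a document and its postings; unknown ids are ignored."""
        text = self._docs.pop(doc_id, None)
        if text is None:
            return
        for term in self._terms(text):
            self._postings[term].discard(doc_id)

    def find_documents(self, phrase: str) -> list[str]:
        """Return ids of documents that contain every term in the phrase."""
        terms = self._terms(phrase)
        if not terms:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)
```

Because `index_document` deletes before adding, updating a thread page after a new post is just another `index_document` call with the same id.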

Simple 2-way SSL or OAuth could be used to secure the service. You could leave findDocument public if desired so it could be called from other clients in the future (native iPhone apps).

For the initial index, you would take a backup of your database and index it directly on the server.

Before you ever make a single change to NS, you do QA and load testing using something like SoapUI. Only actually hook it up to NS once you're sure everything is behaving as expected. Once it's up and running, costs would be minor.

It's been a while, but last time I looked at the 3rd-party services, they were limited and expensive.
 
13462101:iLLbiLLy said:
Yeah, I totally understand. Yet there are ways to do it without risking performance degradation or impact to the site. First off, as I mentioned, you would put it on a completely separate service. A PaaS like Azure is a good choice because it's cheap and scales on demand. Its cloud services and cloud storage are well suited to something like Lucene. NS is fairly large, but still on the small side of what Lucene was meant to handle.

The API would be very minimal:

Id findDocument(String phrase, [options]);

void indexDocument(Document doc);

void deleteDocument(Id documentId);

Simple 2-way SSL or OAuth could be used to secure the service. You could leave findDocument public if desired so it could be called from other clients in the future (native iPhone apps).

For the initial index, you would take a backup of your database and index it directly on the server.

Before you ever make a single change to NS, you do QA and load testing using something like SoapUI. Only actually hook it up to NS once you're sure everything is behaving as expected. Once it's up and running, costs would be minor.

It's been a while, but last time I looked at the 3rd-party services, they were limited and expensive.

We are aware of the available technologies and how to use them, and we even have plans for a new search system in mind. There are just other changes we need to do first.

A proper site-wide search is something we intend to add soon, likely something based on Solr or Sphinx. At the moment our problem isn't so much integrating with the search system itself, it's the fragmented nature of our content storage.

NS always had a Picture section, Video section, Forums, News, Reviews, Members+profiles, etc... Each of these sections should be searchable. If we implemented a site-wide search today, each section would need to separately maintain their data in the search system for additions, changes and removals. It's possible but would be a bit of a mess.

The change I referenced yesterday regarding the "content listing" will simplify this. Until 2-3 years ago, the homepage of NS was separated into modules for each sub-section. So there was the News slider, Video Of The Day, then lists of Blogs, Photos and Reviews. There was no crossover between site sections. When we moved to the block-type layout for the homepage we decided that content is content and they should all be listed together based on member activity. So we began to list all content from the whole site in one big mashed-up list. Today the homepage still does this, though the layout is visually more list-like.

The challenge to make this was to efficiently fetch content from the whole site in a reliable manner. Since the new homepage was an experiment we decided to just fetch content from each section as best we can and merge them in PHP. It's slow but would suffice to test the idea.

2 years later we've decided it was a good move, and needs to go further. Currently I'm working on building an internal "content list" that every site section hooks into. The content listing would maintain info like Rating and Heat for every single piece of content on the whole site. The end goal would be to provide lookup capabilities, like get the "Top Rated Pictures+Videos+News by Mr.Bishop" or "Get Latest Content from Whole Site", and it would be reliable and extremely fast.
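
For readers who want to picture that kind of lookup, here is a minimal sketch using Python's built-in `sqlite3`. The table name `content_list`, its columns, the index, and the sample rows are all invented for the example; the real NS schema is not described in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE content_list (
        section    TEXT    NOT NULL,   -- 'video', 'picture', 'news', ...
        content_id INTEGER NOT NULL,   -- id of the row in the section's own table
        author     TEXT    NOT NULL,
        rating     REAL    NOT NULL,
        heat       REAL    NOT NULL,
        created_at TEXT    NOT NULL,
        PRIMARY KEY (section, content_id)
    )
    """
)
conn.execute("CREATE INDEX idx_author_rating ON content_list (author, rating DESC)")
conn.executemany(
    "INSERT INTO content_list VALUES (?, ?, ?, ?, ?, ?)",
    [  # made-up sample rows
        ("video", 1, "Mr.Bishop", 9.1, 40.0, "2015-01-03"),
        ("picture", 7, "Mr.Bishop", 9.6, 12.0, "2015-01-01"),
        ("news", 3, "Mr.Bishop", 8.2, 55.0, "2015-01-05"),
        ("video", 2, "nopoles", 9.9, 70.0, "2015-01-04"),
    ],
)


def top_rated(author, sections, limit=10):
    """'Top Rated Pictures+Videos+News by <author>' as one query over one table."""
    placeholders = ", ".join("?" for _ in sections)
    return conn.execute(
        f"SELECT section, content_id, rating FROM content_list "
        f"WHERE author = ? AND section IN ({placeholders}) "
        f"ORDER BY rating DESC LIMIT ?",
        (author, *sections, limit),
    ).fetchall()


def latest(limit=10):
    """'Get Latest Content from Whole Site', again a single ordered query."""
    return conn.execute(
        "SELECT section, content_id FROM content_list ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

With everything registered in one table, cross-section lists need no merging in application code; the database does the ordering and the limit in one pass.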

The benefit is obvious, however there are additional features we can build off this. Search is one of them. Since each section will maintain their listing in this new system, it gives us a single point where we can then push that information to a Search system.
 
13462393:nopoles said:
A proper site-wide search is something we intend to add soon, likely something based on Solr or Sphinx. At the moment our problem isn't so much integrating with the search system itself, it's the fragmented nature of our content storage.

NS always had a Picture section, Video section, Forums, News, Reviews, Members+profiles, etc... Each of these sections should be searchable. If we implemented a site-wide search today, each section would need to separately maintain their data in the search system for additions, changes and removals. It's possible but would be a bit of a mess.

Also check out Elasticsearch. All of these technologies are basically an interface built on top of Lucene (or similar tech). For my project way back when, none of these met our requirements at the right price-point, but that might not be the case today in a more competitive market.

I have no idea about your database schema (I assume it's fairly normalized with specific areas of denormalization for query optimization), but segmented data shouldn't be an issue. All of these tools support categorized search and likely would even allow you to store each type of content in its own index, with the option to search them separately or as a group.

While we're on the subject of database schema... I've always wondered how NS handles optimization...

When you say retrieving content from "top rated videos" does that mean you have a separate table for top rated videos? Or are cumulative scores calculated and stored in an index column directly on the video table (as well as storing the individual votes in a separate table)? Obviously this must happen in realtime... Is there a trigger on the "scores" table to update the total (wherever it's stored)? Or is this done through the Application Layer (PHP)?

Feel free to ignore these questions... I've just always wondered about the implementation since I've spent more time on NS than any other site (by far) and it has such rich history. I know for a fact that the original foundation was shaky at best (still gotta give harvey credit for vision), and I've always wondered how much of that legacy still exists.
 
13462463:iLLbiLLy said:
I have no idea about your database schema (I assume it's fairly normalized with specific areas of denormalization for query optimization), but segmented data shouldn't be an issue. All of these tools support categorized search and likely would even allow you to store each type of content in its own index, with the option to search them separately or as a group.

It's not that we can't do that, it's just that I'd prefer to do it all from one place instead of scattered through various sections. Each section (Picture, Video, Gear, News, Forum, etc.) will register their content in one unified list, which would then hook into things like search, content subscription notifications, etc...

13462463:iLLbiLLy said:
When you say retrieving content from "top rated videos" does that mean you have a separate table for top rated videos? Or are cumulative scores calculated and stored in an index column directly on the video table (as well as storing the individual votes in a separate table)? Obviously this must happen in realtime... Is there a trigger on the "scores" table to update the total (wherever it's stored)? Or is this done through the Application Layer (PHP)?

This is pretty much spot on. User submitted ratings are stored in a ratings table, then the Up and Down votes are stored in the various content tables. Up and Down votes trigger changes to the Heat and Rating values, which then are also stored in the content tables.
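
One way such an arrangement could look is sketched below with Python's `sqlite3` and a database trigger. This is only a guess at the general pattern, not the NS implementation: the thread does not say whether the update happens in a trigger or in PHP, the table and column names are invented, Heat is left out, and `rating = up votes - down votes` is a placeholder formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos (
        id         INTEGER PRIMARY KEY,
        title      TEXT    NOT NULL,
        up_votes   INTEGER NOT NULL DEFAULT 0,
        down_votes INTEGER NOT NULL DEFAULT 0,
        rating     INTEGER NOT NULL DEFAULT 0   -- placeholder: up_votes - down_votes
    );
    CREATE TABLE video_ratings (
        video_id INTEGER NOT NULL REFERENCES videos(id),
        member   TEXT    NOT NULL,
        vote     INTEGER NOT NULL CHECK (vote IN (-1, 1)),
        PRIMARY KEY (video_id, member)
    );
    -- Keep the denormalized totals on the content row in step with each new vote.
    CREATE TRIGGER video_rating_added AFTER INSERT ON video_ratings
    BEGIN
        UPDATE videos
        SET up_votes   = up_votes   + (NEW.vote = 1),
            down_votes = down_votes + (NEW.vote = -1),
            rating     = rating + NEW.vote
        WHERE id = NEW.video_id;
    END;
    """
)


def cast_vote(video_id: int, member: str, vote: int) -> None:
    """Record one member's vote; the trigger updates the totals on `videos`."""
    conn.execute(
        "INSERT INTO video_ratings (video_id, member, vote) VALUES (?, ?, ?)",
        (video_id, member, vote),
    )


conn.execute("INSERT INTO videos (id, title) VALUES (1, 'Season edit'), (2, 'Park lap')")
cast_vote(1, "member_a", 1)
cast_vote(1, "member_b", 1)
cast_vote(2, "member_a", -1)

# "Top rated" is now a plain ORDER BY on the content table, no aggregation needed.
top = conn.execute(
    "SELECT title, up_votes, down_votes, rating FROM videos ORDER BY rating DESC"
).fetchall()
```

The individual votes stay in their own table for auditing, while reads only ever touch the precomputed columns.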

So right now getting "Top 10 Rated Videos" is easy, get rows from the Videos table and order by Rating. Top Rated Videos+News is harder. The table formats are different and the model classes for each perform different operations, so to build a "Top 10 Rated Video+News" list requires that we get the top 10 Videos and top 10 News, merge + sort by rating then discard the bottom 10. The next 10 requires that we get 20 from each, merge, sort, discard the top 10 and the bottom 20. This is obviously a problem as you go down the list or include more sections, hence the change I described earlier.
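
The cost of that merge approach is easy to see in a small Python sketch. The rows are made-up `(rating, section, id)` tuples and `top_from_section` is a hypothetical stand-in for each section's own "ORDER BY rating LIMIT n" query; the point is only that page N forces every section to return N pages' worth of rows.

```python
import heapq
from itertools import islice


def top_from_section(rows, count):
    """Stand-in for a per-section query: ORDER BY rating DESC LIMIT count."""
    return sorted(rows, key=lambda r: r[0], reverse=True)[:count]


def merged_page(sections, page, page_size=10):
    """Return (rows on this page, rows requested from all sections); page is 1-based."""
    needed = page * page_size  # every section must supply this many rows
    per_section = [top_from_section(rows, needed) for rows in sections.values()]
    # Each per-section list is already sorted descending, so a k-way merge is enough.
    merged = heapq.merge(*per_section, key=lambda r: r[0], reverse=True)
    start = (page - 1) * page_size
    return list(islice(merged, start, start + page_size)), needed * len(sections)


# Made-up data: 50 videos with even ratings, 50 news items with odd ratings.
sections = {
    "video": [(rating, "video", i) for i, rating in enumerate(range(100, 0, -2))],
    "news": [(rating, "news", i) for i, rating in enumerate(range(99, 0, -2))],
}

page1, requested1 = merged_page(sections, 1)  # requests 20 rows to show 10
page2, requested2 = merged_page(sections, 2)  # requests 40 rows to show 10
```

The rows requested grow with both the page number and the number of sections, which is the scaling problem described above.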
 
13462463:iLLbiLLy said:
I know for a fact that the original foundation was shaky at best (still gotta give harvey credit for vision), and I've always wondered how much of that legacy still exists.

Harvey was an absolute visionary, but the code started in 1999 in Notepad. For a million reasons (read: time/money) we've never been able to nuke the whole thing and start over.

So when you see little bits that don't make sense, it's because we had to take something designed to be a 2-story house and, without interrupting the inhabitants too much, turn it into a skyscraper.
 
13462504:nopoles said:
It's not that we can't do that, it's just that I'd prefer to do it all from one place instead of scattered through various sections. Each section (Picture, Video, Gear, News, Forum, etc.) will register their content in one unified list, which would then hook into things like search, content subscription notifications, etc...

This is pretty much spot on. User submitted ratings are stored in a ratings table, then the Up and Down votes are stored in the various content tables. Up and Down votes trigger changes to the Heat and Rating values, which then are also stored in the content tables.

So right now getting "Top 10 Rated Videos" is easy, get rows from the Videos table and order by Rating. Top Rated Videos+News is harder. The table formats are different and the model classes for each perform different operations, so to build a "Top 10 Rated Video+News" list requires that we get the top 10 Videos and top 10 News, merge + sort by rating then discard the bottom 10. The next 10 requires that we get 20 from each, merge, sort, discard the top 10 and the bottom 20. This is obviously a problem as you go down the list or include more sections, hence the change I described earlier.

Yeah, it's a tough problem... Especially since the existing system is already in place. I've run into a similar issue in the past. Found this SO article really helpful, but the amount of refactoring required to get here might not be practical.

5lv3n.png


Exclusive Arc might be a more realistic solution (sounds like the direction you're already going)... single table for scores with a FK column for each content type table. Most DBAs would advise against it, but in the real world sometimes you gotta just do what works.
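
For anyone unfamiliar with the pattern, here is a minimal exclusive-arc sketch using Python's `sqlite3`. The table and column names and the sample rows are invented for illustration. The key piece is the CHECK constraint that forces exactly one of the foreign key columns to be set on each score row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE news   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE scores (
        id       INTEGER PRIMARY KEY,
        video_id INTEGER REFERENCES videos(id),
        news_id  INTEGER REFERENCES news(id),
        rating   REAL NOT NULL,
        -- the "arc": exactly one of the foreign keys must be set
        CHECK ((video_id IS NOT NULL) + (news_id IS NOT NULL) = 1)
    );
    """
)
conn.executemany(
    "INSERT INTO videos (id, title) VALUES (?, ?)",
    [(1, "Season edit"), (2, "Park lap")],
)
conn.execute("INSERT INTO news (id, title) VALUES (1, 'Resort opening dates')")
conn.executemany(
    "INSERT INTO scores (video_id, news_id, rating) VALUES (?, ?, ?)",
    [(1, None, 9.4), (2, None, 7.0), (None, 1, 8.8)],
)


def top_rated(limit, offset=0):
    """One ordered query across content types; paging is a plain LIMIT/OFFSET."""
    return conn.execute(
        """
        SELECT COALESCE(v.title, n.title) AS title, s.rating
        FROM scores AS s
        LEFT JOIN videos AS v ON v.id = s.video_id
        LEFT JOIN news   AS n ON n.id = s.news_id
        ORDER BY s.rating DESC
        LIMIT ? OFFSET ?
        """,
        (limit, offset),
    ).fetchall()
```

The trade-off is that each new content type needs another nullable column and a wider CHECK, which is part of why the pattern gets mixed reviews.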

13462521:Mr.Bishop said:
Harvey was an absolute visionary, but the code started in 1999 in Notepad. For a million reasons (read: time/money) we've never been able to nuke the whole thing and start over.

So when you see little bits that don't make sense, it's because we had to take something designed to be a 2-story house and, without interrupting the inhabitants too much, turn it into a skyscraper.

Every project I've ever worked on (that wasn't from the ground up) has suffered from the same problems... Software is hard, and even if you develop with the best intentions to create scalable and low-maintenance code, you'll always run into unforeseen changes. At the end of the day, all that really matters is whether people use it or not.
 
13462617:iLLbiLLy said:
Yeah, it's a tough problem... Especially since the existing system is already in place. I've run into a similar issue in the past. Found this SO article really helpful, but the amount of refactoring required to get here might not be practical.

5lv3n.png


Exclusive Arc might be a more realistic solution (sounds like the direction you're already going)... single table for scores with a FK column for each content type table. Most DBAs would advise against it, but in the real world sometimes you gotta just do what works.

Every project I've ever worked on (that wasn't from the ground up) has suffered from the same problems... Software is hard, and even if you develop with the best intentions to create scalable and low-maintenance code, you'll always run into unforeseen changes. At the end of the day, all that really matters is whether people use it or not.

I did not realize this would turn into a genius level discussion on coding. Respect to all the work that goes into NS, the best website on the internet.
 