AKA "the type of shit Mousecore would rather do than their contest entry."
One thing I noticed about this quirky site is that the forums don't have a search feature! Maybe it's just a side effect of the place's age. People usually get around it with Google searches set to only crawl CYS for results, which is definitely an option. Out of sheer boredom and lack of motivation for my contest entry, I decided it would be kind of cool to have a dedicated CYS search.
So that's basically what I did this entire weekend.
Here is CYScraper.
Like a Google search, you can query anything directly from CYS' forums, but CYScraper lets you filter results by author, excluded author (all posts NOT made by a given author), forum(s), and date (before/after/on), as well as whether you want to search only post replies or only threads (i.e. thread titles). While I'm no Google algorithm engineer, I did try to create a VERY basic relevancy scoring algorithm that ranks posts by default based on a few factors.
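For the curious, here is roughly what that kind of filtering and ranking can look like in Python. This is a minimal sketch and not CYScraper's actual code: the `Post` record, the `matches` and `search` functions, and the toy score (number of query hits, newest first on ties) are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Post:
    # Hypothetical record shape; the real CYScraper schema is not shown here.
    author: str
    forum: str
    posted: date
    text: str
    is_thread: bool  # True for a thread (title), False for a reply


def matches(post: Post, query: str,
            author: Optional[str] = None,
            exclude_author: Optional[str] = None,
            forums: Optional[set] = None,
            after: Optional[date] = None,
            before: Optional[date] = None,
            threads_only: Optional[bool] = None) -> bool:
    """Return True if the post passes every filter that was supplied."""
    if query.lower() not in post.text.lower():
        return False
    if author is not None and post.author.lower() != author.lower():
        return False
    if exclude_author is not None and post.author.lower() == exclude_author.lower():
        return False
    if forums is not None and post.forum not in forums:
        return False
    if after is not None and post.posted < after:
        return False
    if before is not None and post.posted > before:
        return False
    if threads_only is not None and post.is_thread != threads_only:
        return False
    return True


def search(posts, query, **filters):
    """Filter posts, then rank by an assumed toy score (hit count, then date)."""
    hits = [p for p in posts if matches(p, query, **filters)]
    return sorted(
        hits,
        key=lambda p: (p.text.lower().count(query.lower()), p.posted),
        reverse=True,
    )
```

Calling `search(posts, "heir", exclude_author="bob")` would return every matching post not written by bob, and passing the same date as both `after` and `before` covers the "on" case.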
You can also do a strict query (i.e. if you search for "heir", you will ONLY see "heir" and never "their") by wrapping your search in "quotation marks".
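One common way to get that behavior is a regex with word boundaries. The sketch below is an assumption about how a strict match could work, not necessarily how CYScraper does it; `build_matcher` is a hypothetical helper name.

```python
import re


def build_matcher(raw_query: str):
    """Return a function that tests text against the query.

    A query wrapped in double quotes is matched as a whole word or phrase;
    anything else is a plain case-insensitive substring match.
    """
    q = raw_query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        # \b only matches at word-character edges, so this assumes the
        # quoted phrase starts and ends with a letter, digit, or underscore.
        pattern = re.compile(r"\b" + re.escape(q[1:-1]) + r"\b", re.IGNORECASE)
    else:
        pattern = re.compile(re.escape(q), re.IGNORECASE)
    return lambda text: pattern.search(text) is not None
```

With this, the matcher built from `heir` accepts "their", while the one built from `"heir"` rejects it and still accepts "the Heir apparent".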
Oh, hey, and there's dark mode too.
Some Things to Consider
- THIS IS AN IMPORTANT ONE. The data I have on hand for people to search is very limited. I didn't want to scrape all of the CYS forums in one sitting because I didn't want to get my IP flagged or literally blow up the website. Right now, the data you can query through is any post made within the timeline given on the CYScraper site, as well as any post on the front page of any forum (so as an example even if the Reading Corner's first page has some threads from beyond the timeline, you can still search through those on CYScraper). Over time, if people see this as a legitimate tool, I will collect more data from further back to put into the search service.
- New posts are not automatically entered into the CYScraper database as soon as they are posted. They have to be scraped by my own service first in order to be queried on CYScraper. I am still in the process of figuring out an acceptable automatic scraping schedule that walks the line between having a good amount of fresh data and not harassing the site with requests. (I know it says it has a schedule on the CYScraper site itself, but that was an artifact from when I was still brainstorming how the hell I wanted to do this and I'm too lazy to update my entire repo just to remove that one sentence.)
- With that being said, the majority of performance/optimization decisions are made in CYS's favor. CYS is a small site and I don't need or want to flood its servers just for a performance buff on my end, even if that means the tool gets data slower/less frequently.
- Downtime on the CYScraper page itself is definitely possible while I'm still figuring out its hosting configurations.
- The mobile version doesn't look super pretty. I will be going back to test the UI specifically on mobile, but it should look good on Chromium and Firefox desktop browsers.
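On the "not harassing the site with requests" point, the simplest form of politeness is a fixed pause between page fetches. Here's a minimal sketch of that idea, not the actual scraper: `polite_fetch`, the 5-second default, and the injected `fetch` and `sleep` callables are all assumptions for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_fetch(urls: Iterable[str],
                 fetch: Callable[[str], str],
                 delay_seconds: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 ) -> Iterator[Tuple[str, str]]:
    """Yield (url, body) pairs, pausing between consecutive requests.

    `fetch` is whatever downloads a page (hypothetical here); `sleep` is
    injectable so the delay can be swapped out when testing.
    """
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        yield url, fetch(url)
```

Because it's a generator, nothing gets requested until you actually iterate, so a scheduled job can pull a handful of pages per run and stop early without hitting the rest.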
So yeah, just a funky little project I made to test my Python skills and to solve an incredibly niche problem. Go ham, let me know if you find any problems, and if you have any suggestions, I'm all ears.