Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. You don't need to use all the data, though. Happily toss anything suspicious at all, even if that means false positives galore, accidentally marking new or borderline accounts as shills when deciding what to feed into the recommender algorithms. Who cares if you do?
Lately I've been thinking about recommender algorithms and how they go wrong. I keep hitting examples of people arguing that we should ban the fewest accounts possible when thinking about what accounts are used by recommender systems. Why? Or why not the opposite? What's wrong with using the fewest accounts you can without degrading the perceived quality of the recommendations?
The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they're whacked. The bad actors keep coming back because creating fake crowds that manipulate the algorithms is so profitable, like free advertising. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.
So what's wrong with taking the opposite strategy, only using the most reliable accounts? As a thought experiment, let's say you rank accounts by your confidence that they are human, independent, not shilling, and trustworthy. Then go down the list of accounts, using their behavior data until the recommendations stop improving noticeably (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of recommendations because you've made sure it wouldn't.
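To make the thought experiment concrete, here is a minimal sketch in Python of the "go down the list, then stop" idea. Everything named here is an assumption for illustration: `trust` is a hypothetical per-account confidence score, `quality` is a hypothetical offline metric of recommendation quality (which would need to cover cold start and long-tail items to be meaningful), and `min_gain`, `patience`, and `batch` are made-up tuning knobs. It is not how any of these companies actually does it.

```python
from typing import Callable, Dict, List, Sequence


def select_trusted_accounts(
    trust: Dict[str, float],
    quality: Callable[[Sequence[str]], float],
    min_gain: float = 0.01,
    patience: int = 3,
    batch: int = 1,
) -> List[str]:
    """Return the most-trusted prefix of accounts worth feeding the recommender.

    trust:    hypothetical confidence (0..1) that each account is a real,
              independent, non-shilling human.
    quality:  hypothetical metric scoring recommendations built from only
              the given accounts (higher is better).
    min_gain: smallest improvement that counts as "noticeable".
    patience: how many consecutive non-improving steps to tolerate.
    batch:    how many accounts to add per step.
    """
    # Most trustworthy accounts first.
    ranked = sorted(trust, key=trust.get, reverse=True)

    best_n, best_q = 0, quality([])
    stale = 0
    n = 0
    while n < len(ranked) and stale < patience:
        n = min(n + batch, len(ranked))
        q = quality(ranked[:n])
        if q - best_q >= min_gain:
            # Noticeable improvement: keep everything up to here.
            best_n, best_q, stale = n, q, 0
        else:
            # No noticeable improvement from these less-trusted accounts.
            stale += 1
    # Stop. Don't use the rest.
    return ranked[:best_n]


# Toy data: quality is just how many distinct items the kept accounts cover.
items_by_account = {
    "a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {1}, "e": {2},
}


def toy_quality(used: Sequence[str]) -> float:
    covered = set()
    for account in used:
        covered |= items_by_account[account]
    return float(len(covered))


toy_trust = {"a": 0.99, "b": 0.95, "c": 0.90, "d": 0.40, "e": 0.10}
kept = select_trusted_accounts(toy_trust, toy_quality, min_gain=1.0, patience=2)
print(kept)
```

In this toy run, the two most trusted accounts already cover every item, so the less trusted accounts (including the likely shills at the bottom) never get used. The `patience` knob is there so that one unhelpful account doesn't end the walk too early; a real system would also want `quality` measured on held-out data rather than on the same behavior it was built from.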
Amazon's top UK reviewers appear to profit from fake 5-star posts:
Amazon is investigating the most prolific reviewers on its UK website after a Financial Times investigation found evidence that they were profiting from posting thousands of five-star ratings.
Justin Fryer, the number one-ranked reviewer on Amazon.co.uk, reviewed £15,000 worth of products in August alone, from smartphones to electric scooters to gym equipment, giving his five-star approval on average once every four hours.
[...] Overwhelmingly, those products were from little-known Chinese brands, who often offer to send reviewers products for free in return for positive posts. Mr. Fryer then appears to have sold many of the goods on eBay, making nearly £20,000 since June.
When contacted by the FT, Mr. Fryer denied posting paid-for reviews—before deleting his review history from Amazon's website. Mr. Fryer said the eBay listings, which described products as "unused" and "unopened," were for duplicates.
At least two other top 10-ranked Amazon UK reviewers removed their history after Mr. Fryer. Another prominent reviewer, outside of the top 10, removed his name and reviews and changed his profile picture to display the words "please go away."
Amazon still hasn't fixed its problem with bait-and-switch reviews:
Like thousands of other parents, I decided to get my kids a cheap drone for Christmas. I spent $24 for a plastic flying machine with rudimentary collision-avoidance capabilities. A plastic cage mostly kept small fingers away from the four propellers. The kids were delighted for the first couple of hours.
[...] The kids enjoyed the drone so much in its few brief hours of functionality that I thought I might buy them another one.... If I did more research and spent a bit more money, I hoped I could find a higher-quality model that wouldn't fall apart after a few hours.
So I went to Amazon.com, searched for "children's drone," and sorted by "average customer review," figuring the best-reviewed drones were likely to be high quality. They weren't.
[...] "Absolutely love this honey," wrote one reviewer in the UK in March 2019. "It's quite different from any supermarket-purchased honey I've tried."