
SoylentNews is people

posted by NCommander on Friday June 06 2014, @08:00PM
from the seeing-how-big-our-userbase-is dept.
Right now, I'm sitting with mrcoolbp and martyb in meatspace working out the finer points of incorporation and the future needs of SoylentNews. One thing that has come up is that we really don't have a great idea of what our actual usage numbers are. Slashcode keeps decent internal statistics that give us some rough numbers, but they're only really valid for logged-in users (who bypass the varnish cache), and we're not 100% sure they're accurate anyway. According to slash, we're averaging approximately 50-60k page views per day (I've included the statistics email below), but that doesn't tell us what AC usage looks like. According to varnish, we average roughly 400-500k connections per day, but that number is inflated since we're not using keep-alive or HTTP pipelining as of yet.

Furthermore, since we don't log IP addresses in access.log, and IPs run through Slash are turned into IPIDs, it's hard to get an idea of where our userbase is (the general feeling is that the vast majority of us are based in the United States, but even that is mostly inferred from our peak traffic hours falling between 4 and 10 PM EST). We've wanted to get a better idea of what our traffic and userbase look like, so we're asking permission from the community to install Piwik and embed its JavaScript tag in the footer of each page, which will give us a broad base of solid information to work from.

Our plan is to set up Piwik on a separate server and have it available at stats.soylentnews.org, which can easily be blocked via a hosts file. Furthermore, Piwik honors the Do-Not-Track header for all web browsers except IE10, allowing easy opt-out. I understand that a lot of users have concerns about any tracking, but we're trying to be upfront and honest about this so no one gets hugely surprised. While we might post general information that Piwik generates (e.g., usage by country, user agents, etc.), we will purge IP addresses from the Piwik database as soon as we're able, to limit the amount of personal information we keep about any user. While we're running Piwik, we'll have a persistent notification in the "Site News" slashbox that collection is ongoing, which will link to this post.
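For readers who want to see what the two opt-out routes mentioned above amount to, here is a minimal sketch. It is not Piwik's actual code: the function names `tracking_allowed` and `hosts_entry` are made up for illustration, and using `0.0.0.0` as the blocking address is just one common convention. The only standard piece is the `DNT: 1` request header, which is what browsers send when Do Not Track is enabled.

```python
from typing import Mapping


def tracking_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries a Do Not Track opt-out.

    Hypothetical helper for illustration; a real analytics package
    will have its own logic (the post notes an exception for IE10).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("dnt") != "1"


def hosts_entry(hostname: str = "stats.soylentnews.org") -> str:
    """Build a hosts-file line that sends the stats host to a null address."""
    return f"0.0.0.0 {hostname}"


if __name__ == "__main__":
    print(tracking_allowed({"DNT": "1"}))           # opted out
    print(tracking_allowed({"User-Agent": "foo"}))  # no opt-out header sent
    print(hosts_entry())
```

Adding the line produced by `hosts_entry()` to the local hosts file would stop the browser from reaching the stats server at all, regardless of what the page embeds.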

I'd like to get this set up over the weekend and start collecting information by Sunday at the latest, then run collection for a few weeks. After that, we'll remove the tracking code, publish the results, and purge the Piwik database of all personal information. We'll likely re-enable stat tracking periodically to get an idea of how we're doing, with a similar notification post going up beforehand to give people the chance to opt out before collection starts. Obviously, if the community feels dead-set against this, we'll abandon this plan and simply work with what little information we have available.

SoylentNews Stats for 2014-06-05

                   UIDs      IPIDs      Pages
        total:        -          -      57452 (1341.1 MB)
 static total:        -          -       3822
gstatic total:        -          -       5972
  grand total:      892       4549      59666 (1561.6 MB)
 secure total:        -          -          0
sbscrbr total:        -          -          0

        posts:      153        219
     comments:      437       1546      19402 (330.4 MB)
        index:      726       2319       9107
     articles:      683       2860       9889 (373.1 MB)
       search:       11         92        209 (5.7 MB)
     journals:       43         98        229 (6.2 MB)
        users:      109        161        593 (15.9 MB)
          rss:       46        362       2214 (220.6 MB)
        other:      217        700      18023 (173.3 MB)


     formkeys:      487 rows total
     comments:      573 posted yesterday
  submissions:       16 submissions
 sub/comments:     31.2% of the submissions came from comment posters from this day



    not found:     4769 pages sent with status 404 (not found)

   total hits: 140856136





------------------------
                            Yesterday   | 2 days ago | 3 days ago
    Avg Hits Per Article:          706.4|       690.1|       629.9
Avg Comments Per Article:           30.4|        32.1|        18.4



Pages From RSS By Section
------------------------------------------------
Section                  Pages     UIDS    IPIDS
           Main Page      2508       87      539



For Main Page
                  Pages      IPs   Bandwidth    Users
        total:    57452     4353   1341.1 MB      885
        index:     9107     2319    436.5 MB      726
     comments:    19402     1546    330.4 MB      437
     articles:     9889     2860    373.1 MB      683
       search:      209       92      5.7 MB       11
          rss:     2214      362    220.6 MB       46
        other:    18023      700    173.3 MB      885


-----------------------

Top stories viewed by article.pl:
   883 14/06/05/0025257 n1         First-Person Shooter Engine in
   789 14/06/04/2126226 n1         Apple CEO Says Users Buy an An
   708 14/06/05/0132243 n1         Seattle Approves $15 Minimum W
   617 14/06/05/0121251 n1         Tesla S Road Trip Report
   578 14/06/04/2131208 n1         Intel Wants Your Next PC to Ha
   468 14/06/05/1256249 Woods      Dwarf Fortress Update Coming N
   453 14/06/04/1343246 janrinok   ISPs Urged to Quarantine Infec
   332 14/06/05/1418207 martyb     Computer Programs Are People,
   328 14/06/05/133219  Woods      FBI Offers $10,000 Reward For
   261 14/06/04/1329216 janrinok   Underwater Sound Examined for
   259 14/06/05/1419254 janrinok   How to Spend $750 for One Minu
   252 14/06/05/133201  LaminatorX Apple to Allow Virtual Currenc
   230 14/06/04/1310212 janrinok   Pixar Releasing its 3D Renderi
   225 14/06/04/1337244 janrinok   Vincent van Gogh's Severed Ear
   217 14/06/05/1315234 LaminatorX High Brain Integration and Cre
   194 14/06/04/1212207 martyb     Google Trying Out End-to-End E
   178 14/06/04/111243  LaminatorX Domestic Terror Task Force is
   155 14/06/04/1315250 janrinok   Ambulance Drones Might Appear
   141 14/06/03/211257  n1         What's Lost as Handwriting Fad
   139 14/06/04/1059208 LaminatorX Learning to Eat Vegetables in
   125 14/06/03/2048227 n1         Battlestar Galactica Reboot
   122 14/06/04/0527240 LaminatorX Windows Start Menu Won't Retur
   118 14/06/05/149215  janrinok   British Recording Industry Thi




-----------------------

Top referers:
84  http://www.netvibes.com
67  http://feedly.com
61  http://www.google.co.uk
42  http://google.com
38  http://barrapunto.com
30  https://www.google.com
29  http://7rmath4ro2of2a42.onion
29  http://maps.google.com
27  http://www.newsblur.com
22  http://www.inoreader.com
19  http://www.protopage.com
15  http://t.co
14  http://li694-22.members.linode.com
14  http://sylnt.us
14  http://theoldreader.com
10  http://www.google.com
9  http://www.jaruzel.com
7  http://hager.pipedot.org
7  http://pi.local
6  http://www.igoogleportal.com
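
As a rough illustration of what can be squeezed out of the stats email as it stands, the sketch below parses a few rows of the per-page breakdown and computes pages served per IPID. Treating an IPID as a stand-in for a distinct visitor is an assumption on my part, and the names `ROW` and `pages_per_ipid` are hypothetical, not anything slash provides.

```python
import re
from typing import Dict

# A few rows copied from the per-page breakdown in the stats email above.
SAMPLE = """\
     comments:      437       1546      19402 (330.4 MB)
        index:      726       2319       9107
     articles:      683       2860       9889 (373.1 MB)
"""

# label, UIDs, IPIDs, pages; rows with "-" placeholders or fewer than
# three numeric columns simply fail to match and are skipped.
ROW = re.compile(r"^\s*(\w+):\s+(\d+)\s+(\d+)\s+(\d+)")


def pages_per_ipid(report: str) -> Dict[str, float]:
    """Map each row label to pages divided by IPIDs, rounded to one decimal."""
    ratios = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        label, _uids, ipids, pages = match.groups()
        ratios[label] = round(int(pages) / int(ipids), 1)
    return ratios


if __name__ == "__main__":
    for label, ratio in pages_per_ipid(SAMPLE).items():
        print(f"{label}: {ratio} pages per IPID")
```

Numbers like these only cover requests that reach slash, so they say nothing about traffic served from the varnish cache, which is the gap the post is trying to close.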
 
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1) by bzipitidoo (4388) on Saturday June 07 2014, @06:08PM (#52677)

    This seems a question about 2 related things. Statistics is only the surface. It's really about how to fund a news web site, and how to keep it honest, keep it from degenerating into a propaganda organ. How to do that I don't know.

    It is amazing and scary just how badly compromised much of our news reporting is these days. An example of this is the treatment of Noam Chomsky. Mostly, he is simply ignored. His writings appear only in fringy sorts of Internet sites. At first read, he seems like he could be a lunatic. That seems a more likely proposition than that all of mainstream media is so badly corrupted that they could and would slant coverage of major events like the current mess in the Ukraine. The mainstream media take on the Ukraine is that Russia has reverted to form, the evil empire is back and eager to expand. And Putin has irrationally decided he just plain doesn't like the West anymore, if he ever did to start with. He doesn't have any good reason for this, it's all about greed and glory. This narrative doesn't pass the smell test. It's also very dangerous. What if Putin decides he can't shake this image, can't get a fair hearing in the western media? If he's painted as a bastard no matter what, why even try to be nice? What isn't reported is that the US engineered the coup in the Ukraine, putting in power these ultranationalists who are so extreme even many Nazis didn't want to associate with them. It is these extremists and their attempt to purge the Ukraine of all Russian people that pushed matters to the point that large sections of the country went into open rebellion. That story makes a lot more sense than the mainstream media line of basically "stuff happens", and that the fighting broke out just because, and then Russia seized on it as an opportunity and made things worse. It happened because it's man's nature to fight, or some such implication.

    Our media has meekly gone along with other propaganda campaigns, like the smearing of the leftist government of Venezuela as economic incompetents, crazy business bashing thieves of the rich, and so on. Why? Follow the money.

    As to funding, the trouble I've had with ads, and why I always end up just blocking everything, is that they always push too far. I don't mind a few ads, I really don't. I do mind when a significant % of my bandwidth is being used for extremely annoying, loud video advertisements, or ads that cover what I'm trying to read, or distract me with constant flashing of bright colors and motion. Marketing bosses haven't figured out that getting in people's faces is suicide. Or they just don't care, so long as they can count an ad blocked the same as an ad read, and have their bogus numbers accepted by clients.

    Even ones that pledge not to do disruptive advertising don't take that far enough when they look away from less scrupulous operators. CAN-SPAM is a case in point. It's an acknowledgement that spam emailers go too far. But the first version actually winked at the problem. It seemed more like a power play by bigger, more established advertisers to squeeze out competition.

  • (Score: 2) by NCommander (2) <michael@casadevall.pro> on Monday June 09 2014, @02:16PM (#53229)

    For anything ad-related, we're going to poll the community for recommendations on ad networks, as well as give anyone "first dibs" to buy adspace on us which we can fully manage. Subscribers will be able to turn off ads (since they'll be helping to fund the site), and perhaps high-karma users as well, for their contributions. Unfortunately, Google seems to have really taken over this area, and while there are still a few other ad companies, none of them seem too great when it comes to users' privacy and such. I'd love to be proven wrong.

    --
    Still always moving