SoylentNews is people

posted by NCommander on Friday June 06 2014, @08:00PM
from the seeing-how-big-our-userbase-is dept.
So, right now, I'm sitting with mrcoolbp and martyb in meatspace working out the finer points of incorporation and the future needs of SoylentNews. One thing that has come up is that we really don't have a great idea of what our actual usage numbers are. Slashcode keeps decent internal statistics that give us some rough numbers, but they're only really valid for logged-in users (who bypass the varnish cache), and we're not 100% sure they're accurate anyway. According to slash, we're averaging approximately 50-60k page views per day (I've included the statistics email below), but that doesn't help us know what AC usage looks like. According to varnish, we average roughly 400-500k connections per day, but that number is inflated since we're not using keep-alive or HTTP pipelining as of yet.

Furthermore, since we don't log IP addresses in access.log, and IPs run through Slash are turned into IPIDs, it's hard to get an idea of where our userbase is (the general feeling is that the vast majority of us are based in the United States, but that's mostly inferred from our peak hours of traffic being between 4 and 10 PM EST). We've wanted to get a better idea of what our traffic and userbase are, so we're asking permission from the community to install Piwik and embed its JavaScript tag in the footer of each page, which will give us a wide range of solid information to work from.

Our plan is to set up Piwik on a separate server and have it available at stats.soylentnews.org, which can easily be blocked via a hosts file entry. Furthermore, Piwik honors the Do-Not-Track header for all web browsers except IE10, allowing easy opt-out. I can understand that a lot of users have concerns about any tracking, but we're trying to be upfront and honest about this, so no one gets hugely surprised. While we might post general information (e.g., usage by country, user agents, etc.) that Piwik generates, we will purge IP addresses out of the Piwik database as soon as we're able, to limit the amount of personal information we're keeping about any user. While we're running Piwik, we'll have a persistent notification in the "Site News" slashbox that collection is ongoing, which will link to this post.
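
For anyone curious what honoring Do-Not-Track amounts to, here is a minimal sketch of a server-side check of the `DNT` request header. The helper name `should_track` and the plain-dict representation of the headers are assumptions made purely for illustration; this is not the analytics package's actual code, and it does not model the IE10 exception mentioned above.

```python
def should_track(headers):
    """Return False when the request carries 'DNT: 1' (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    # Only an explicit "1" is treated as an opt-out; a missing header or "0"
    # means the browser has not asked to be left out of collection.
    return normalised.get("dnt") != "1"


if __name__ == "__main__":
    print(should_track({"DNT": "1"}))              # browser opted out
    print(should_track({"User-Agent": "example"}))  # no preference sent
```

The design choice here is to treat only an explicit opt-out as binding, so that a browser sending no preference is counted as usual.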

I'd like to get this set up over the weekend and start collecting information by Sunday at the latest, then run collection for a few weeks. After that, we'll remove the tracking code, publish the results, and purge the Piwik database of all personal information. We'll likely re-enable stat tracking periodically to get an idea of how we're doing, with a similar notification post going up beforehand to give people the chance to opt out before collection starts. Obviously, if the community feels dead-set against this, we'll abandon this plan and simply work with what little information we have available.

SoylentNews Stats for 2014-06-05

                   UIDs      IPIDs      Pages
        total:        -          -      57452 (1341.1 MB)
 static total:        -          -       3822
gstatic total:        -          -       5972
  grand total:      892       4549      59666 (1561.6 MB)
 secure total:        -          -          0
sbscrbr total:        -          -          0

        posts:      153        219
     comments:      437       1546      19402 (330.4 MB)
        index:      726       2319       9107
     articles:      683       2860       9889 (373.1 MB)
       search:       11         92        209 (5.7 MB)
     journals:       43         98        229 (6.2 MB)
        users:      109        161        593 (15.9 MB)
          rss:       46        362       2214 (220.6 MB)
        other:      217        700      18023 (173.3 MB)


     formkeys:      487 rows total
     comments:      573 posted yesterday
  submissions:       16 submissions
 sub/comments:     31.2% of the submissions came from comment posters from this day



    not found:     4769 pages sent with status 404 (not found)

   total hits: 140856136





------------------------
                            Yesterday   | 2 days ago | 3 days ago
    Avg Hits Per Article:          706.4|       690.1|       629.9
Avg Comments Per Article:           30.4|        32.1|        18.4



Pages From RSS By Section
------------------------------------------------
Section                  Pages     UIDS    IPIDS
           Main Page      2508       87      539



For Main Page
                  Pages      IPs   Bandwidth    Users
        total:    57452     4353   1341.1 MB      885
        index:     9107     2319    436.5 MB      726
     comments:    19402     1546    330.4 MB      437
     articles:     9889     2860    373.1 MB      683
       search:      209       92      5.7 MB       11
          rss:     2214      362    220.6 MB       46
        other:    18023      700    173.3 MB      885


-----------------------

Top stories viewed by article.pl:
   883 14/06/05/0025257 n1         First-Person Shooter Engine in
   789 14/06/04/2126226 n1         Apple CEO Says Users Buy an An
   708 14/06/05/0132243 n1         Seattle Approves $15 Minimum W
   617 14/06/05/0121251 n1         Tesla S Road Trip Report
   578 14/06/04/2131208 n1         Intel Wants Your Next PC to Ha
   468 14/06/05/1256249 Woods      Dwarf Fortress Update Coming N
   453 14/06/04/1343246 janrinok   ISPs Urged to Quarantine Infec
   332 14/06/05/1418207 martyb     Computer Programs Are People,
   328 14/06/05/133219  Woods      FBI Offers $10,000 Reward For
   261 14/06/04/1329216 janrinok   Underwater Sound Examined for
   259 14/06/05/1419254 janrinok   How to Spend $750 for One Minu
   252 14/06/05/133201  LaminatorX Apple to Allow Virtual Currenc
   230 14/06/04/1310212 janrinok   Pixar Releasing its 3D Renderi
   225 14/06/04/1337244 janrinok   Vincent van Gogh's Severed Ear
   217 14/06/05/1315234 LaminatorX High Brain Integration and Cre
   194 14/06/04/1212207 martyb     Google Trying Out End-to-End E
   178 14/06/04/111243  LaminatorX Domestic Terror Task Force is
   155 14/06/04/1315250 janrinok   Ambulance Drones Might Appear
   141 14/06/03/211257  n1         What's Lost as Handwriting Fad
   139 14/06/04/1059208 LaminatorX Learning to Eat Vegetables in
   125 14/06/03/2048227 n1         Battlestar Galactica Reboot
   122 14/06/04/0527240 LaminatorX Windows Start Menu Won't Retur
   118 14/06/05/149215  janrinok   British Recording Industry Thi




-----------------------

Top referers:
84  http://www.netvibes.com
67  http://feedly.com
61  http://www.google.co.uk
42  http://google.com
38  http://barrapunto.com
30  https://www.google.com
29  http://7rmath4ro2of2a42.onion
29  http://maps.google.com
27  http://www.newsblur.com
22  http://www.inoreader.com
19  http://www.protopage.com
15  http://t.co
14  http://li694-22.members.linode.com
14  http://sylnt.us
14  http://theoldreader.com
10  http://www.google.com
9  http://www.jaruzel.com
7  http://hager.pipedot.org
7  http://pi.local
6  http://www.igoogleportal.com
 
This discussion has been archived. No new comments can be posted.
  • (Score: 1) by kbahey (1147) on Saturday June 07 2014, @01:40AM (#52483)

    I reiterate what has been said, i.e. that the information stays within SoylentNews and is not shared with marketers or any other third party.

    However, it requires Javascript, and therefore will track only those who have Javascript enabled.

    One reason I came here from Slashdot is that the abomination that is Beta requires Javascript, and I browse with Javascript off for performance, privacy and security reasons.

  • (Score: 2) by NCommander (2) <michael@casadevall.pro> on Saturday June 07 2014, @02:02AM (#52490)

    Generally, those with Javascript disabled are those who would have the most issue with being tracked, even for our own internal purposes. There are other ways to collect information with piwik, such as a hidden image, but my rough guess is that fewer than 1-2% of users disable Javascript on the site, so the numbers would by and large be accurate.

    My thought is if you don't run JS, you don't want to be tracked, and don't want to be executing foreign code. I can respect that, and this provides a nice middle ground.

    --
    Still always moving
    • (Score: 1) by kbahey (1147) on Saturday June 07 2014, @02:07AM (#52492)

      Don't like to be tracked nor have foreign code executed. But more importantly and more practically, I don't want my CPU usage going up and the fan whirring just because there is crappy code on many sites. Mainly it is Flash stuff, but that often gets loaded via Javascript from ad servers.

    • (Score: 2) by Angry Jesus (182) on Saturday June 07 2014, @02:37PM (#52621)

      > My thought is if you don't run JS, you don't want to be tracked, and don't want to be executing foreign code.

      FWIW, the overwhelming reason I disable javascript is to avoid malware. For all intents and purposes, every browser exploit over the last 15 years has had javascript as a necessary component. Completely self-hosting exploits (like a corrupt gif that does a stack-smash on the gif parser) can be counted with the fingers on one hand. Disabling javascript protects against malware served from 3rd parties like ad networks as well as if the site itself has been secretly compromised.

      The plugin I use to avoid data-stalking is primarily requestpolicy [requestpolicy.com] which stops cross-site requests so that whether it is a simple web-bug or a giant set of javascript functions, my browser doesn't even load them.

      I'd like to think I'm not unique in making this analysis and that most people who disable javascript do it for security first. So I hope you will take that perspective into account when thinking about improvements to soylent -- the temptation to start using javascript is very strong and it is easy to believe that soylent's own server will always be trustworthy. But having a site's own server hacked to install drive-by malware is a very common attack, so trusting javascript from any source is a security risk.