
Such a script would ease the burden of implementing a few site-wide changes to The Global Computer Index.

As it stands, 319 HTML files need to be revised in one or more of four separate ways.

Simply contemplating such laborious and tedious work gets me down, so I focus on the smaller countries first, as well as the countries for which I list only a very few cities.

I use find to produce a list of all the files that require revision. What I'd like is a script that sorts that list by country - or by US state - putting those with the fewest cities requiring revision first.

That won't save me any effort, but it will make me far more productive. It's much easier for me to initiate a task if it at least appears to be a small task.

Here's some sample data:

$ find . -name index.html -exec grep -l 'Computer Job' {} \; | grep -v united | tail
./pakistan/rawalpindi/index.html
./philippines/manila/index.html
./poland/gdansk/index.html
./poland/warsaw/index.html
./russia/moscow/index.html
./russia/novosibirsk/novosibirsk/index.html
./russia/tomsk/index.html
./russia/tomsk-oblast/index.html
./serbia/belgrade/index.html
./singapore/index.html

In this list I would start with Singapore, then go on to Serbia and the Philippines.

If I only needed to change "Computer Job" to "Computer Industry Job" I would use sed. But sed alone won't do it because I often have to break long lines into smaller chunks so as to make iFone Fanbois happy.
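
For that simple case, something roughly like this would do it - untested, and note that the BSD sed shipped with OS X wants -i '' where GNU sed takes a bare -i:

$ find . -name index.html -exec sed -i '' 's/Computer Job/Computer Industry Job/g' {} +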

I'm also migrating my entire site to HTML 5. Many of my as-yet-unrevised pages are _already_ HTML 5, but some of those still get warnings when I validate them.

Some have spelling errors. Some have errors that doubtlessly would lead foreign patriots to undertake a vendetta against me, my male children and all their male children.

So really I do need to at least inspect all 319 candidate files.

I thank you, and your future managers thank you.

  • (Score: 2) by The Mighty Buzzard on Wednesday May 16 2018, @11:41AM (7 children)

    A) That's a pointless script that won't help your productivity at all.
    B) You're asking for free help on a commercial venture.
    C) "Thanks a bunch" is not sufficient compensation for subcontracting something I do for a living, even if it would only take a few minutes.

    If it were a script for your desktop, a service I used, or a worthwhile nonprofit thing, my answer might be different, but you're asking me to perform my trade for your financial benefit and not my own. That's not cool, so no.

    --
    My rights don't end where your fear begins.
    • (Score: 2) by MichaelDavidCrawford on Wednesday May 16 2018, @12:58PM (4 children)

      by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Wednesday May 16 2018, @12:58PM (#680373) Homepage Journal

      Quoting from the footer of every last page of Soggy Jobs [soggy.jobs]:

      "There is _no_ charge to list your company here, nor will there _ever_ be."

      I've put three years of work into the site, but I have yet to earn so much as a dime from it.

      While I've got your attention, what leads you to believe that such a script wouldn't make me more productive? I made my reason plain.

      --
      Yes I Have No Bananas. [gofundme.com]
      • (Score: 2) by The Mighty Buzzard on Wednesday May 16 2018, @01:33PM (3 children)

        Regardless of what order you do them in, the total number is the same. Just get started instead of wasting time organizing to no end. It's a common code monkey flaw but not an excusable one in anyone out of their 20s.

        I withdraw the financial objections being as they're inapplicable.

        --
        My rights don't end where your fear begins.
        • (Score: 2) by MichaelDavidCrawford on Wednesday May 16 2018, @01:37PM (2 children)

          by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Wednesday May 16 2018, @01:37PM (#680391) Homepage Journal

          I implied that fact when I pointed out that such a script wouldn't save me any work.

          What my requested script _will_ do is increase my productivity - and substantially so.

          This is because I would be far less inclined to put off small tasks than I am to put off large tasks.

          How often do you _really_ write code for eight hours in a single day?

          Whenever I start a new program I do so by implementing "Hello World". I then gradually - just a little bit at a time - edit Hello.c until I produce a deliverable.

          --
          Yes I Have No Bananas. [gofundme.com]
          • (Score: 0) by Anonymous Coward on Wednesday May 16 2018, @04:42PM (1 child)

            by Anonymous Coward on Wednesday May 16 2018, @04:42PM (#680438)

            The idea of breaking up an overwhelmingly large task (update all pages globally) into smaller tasks (update all pages in $COUNTRY) is a reasonable one, but sorting by which countries need the fewest pages updated seems superfluous.
            Knowing which ones need the fewest changes might let you get to those without procrastination, but the ones with more changes? You've still gotta do them eventually, and by piling them all up at the end, it looks to me like you're just guaranteeing yourself a pile of procrastination at that point.

            But unlike TMB, I don't do this for a living; I'm a machinist, so this was a pleasant diversion to bang out on lunch break -- even though I'm convinced it's completely useless.
            #!/usr/bin/awk -f
            # Paths look like ./country/city/index.html, so with "/" as the
            # field separator the country name is always field 2.
            BEGIN {
                FS="/"
            }
            # When the country changes, print the tally for the previous one
            # and start counting afresh.
            country != $2 {
                print count, country
                country=$2
                count=0
            }
            # Every input line counts toward the current country.
            {
                count++
            }
            END {
                print count, country
            }

            Feed it your list of files and pipe the result through sort -n or whatever.
            It's dirty -- e.g. depends on input being sorted to the extent that all ./foo/* are adjacent, so if you're concatenating multiple sets of input, pipe them through sort first. You get the idea, it'll work if you make it work. Also, it outputs a bogus blank line at the beginning. If that bugs you, I'm sure you can fix it.
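
            Concretely - assuming you save it as, say, count-by-country.awk (name's made up) and chmod +x it - a pipeline along these lines should spit out countries in fewest-files-first order, modulo that blank line:

            $ find . -name index.html -exec grep -l 'Computer Job' {} \; | grep -v united | sort | ./count-by-country.awk | sort -n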

    • (Score: 0) by Anonymous Coward on Wednesday May 16 2018, @05:24PM

      by Anonymous Coward on Wednesday May 16 2018, @05:24PM (#680455)

      But, but TMB, just think of all the exposure you'll get for doing it! In all seriousness though, if there is any criticism of how someone does this, then this is going straight to https://www.reddit.com/r/ChoosingBeggars/ [reddit.com]

    • (Score: 2) by cafebabe on Thursday May 31 2018, @03:49PM

      by cafebabe (894) on Thursday May 31 2018, @03:49PM (#686786) Journal

      I provided a working script with import for the manually created data [soylentnews.org] and didn't even get an acknowledgement. It may be that my response was missed. If it was deliberately ignored and there is now an effort to re-implement this work in another language [soylentnews.org] then I'm very offended.

      --
      1702845791×2
  • (Score: 2, Interesting) by Anonymous Coward on Thursday May 17 2018, @01:32AM (1 child)

    by Anonymous Coward on Thursday May 17 2018, @01:32AM (#680585)

    I second what TMB wrote above about the task.

    However, what I can offer for free is this: if you're going to parse HTML, you need a proper HTML parser. The easy way is to use one from CPAN and Perl5. You already have perl5 on your OS X machine; just choose a parser and install it. The examples in the manual pages for the parsing modules will do most of the work for you.

    Run everything through HTML Tidy first.
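
    For example - flags from memory, so double-check them against tidy's man page - something like this will either just report the problems or clean a file up in place:

        $ tidy -q -e pakistan/rawalpindi/index.html          # report errors and warnings only
        $ tidy -q -m -utf8 pakistan/rawalpindi/index.html    # rewrite the file in place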

  • (Score: 0) by Anonymous Coward on Thursday May 17 2018, @11:01AM (9 children)

    by Anonymous Coward on Thursday May 17 2018, @11:01AM (#680678)

    It will be more work up front, but it will save you immense effort down the road.

    You should expend your energy not on updating the template data of 300+ HTML files, but on separating the HTML templating from the actual data content.

    Your data content (name of business, contact, description, etc.) should be stored in some form of structured, computer-readable format (JSON, YAML, CSV, XML - it does not matter which, just that it is structured and machine readable).

    Your templates that create the html pages should exist once, in combination with a piece of code that acts as a static site generator.

    Then, when you need to change the template design, you have one set of template files to edit, and rerun the generator on the data content files.

    Likewise, when you have to change many of the data files, the structured format means you'll nearly always be able to write a small piece of custom code to change them all en masse.

    And in the end, no manual edits of 300+ files to change some spelling errors in the templates.
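
    As a toy sketch - the file name and column layout here are invented for illustration - the "generator" can start out as little more than an awk one-liner over a single CSV of listings, emitting one block of table rows per country:

        # minimal sketch, untested: listings.csv is assumed to hold
        # country,city,company,jobs_url - adjust to whatever columns you settle on
        awk -F, '{
            out = $1 "-rows.html"    # e.g. singapore-rows.html
            printf "<tr><td>%s</td><td>%s</td><td><a href=\"%s\">jobs</a></td></tr>\n", $2, $3, $4 >> out
        }' listings.csv

    Wrap those rows in whatever page header and footer you like and you have the beginnings of a static site generator.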

    • (Score: 2) by MichaelDavidCrawford on Thursday May 17 2018, @06:07PM (8 children)

      by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Thursday May 17 2018, @06:07PM (#680806) Homepage Journal

      The payload is all in table elements, always with four columns.

      I've planned to automate it this whole time but am unclear about what I want my automation to do.

      --
      Yes I Have No Bananas. [gofundme.com]
      • (Score: 1, Interesting) by Anonymous Coward on Thursday May 17 2018, @09:08PM (6 children)

        by Anonymous Coward on Thursday May 17 2018, @09:08PM (#680888)

        look at static generation https://www.staticgen.com [staticgen.com]

        • (Score: 2) by MichaelDavidCrawford on Friday May 18 2018, @01:44AM (5 children)

          (I will look at staticgen right after I post this.)

          I'm going to use Subversion on a remote server so as to implement reliable storage.

          I'll use Python to actually operate on the data. The first thing I want to do is write a simple Python app that will enable me to enter a given company just once, then add all their cities, states or provinces and maybe counties all at the same time, then distribute updates to all the right HTML files.

          That will need to create new HTML files from time to time - Oracle is all over Creation, so presently I add just one country at a time.

          --
          Yes I Have No Bananas. [gofundme.com]
          • (Score: 0) by Anonymous Coward on Friday May 18 2018, @01:16PM (4 children)

            by Anonymous Coward on Friday May 18 2018, @01:16PM (#681144)

            I'm going to use Subversion on a remote server so as to implement reliable storage.

            You do realize, don't you, that Subversion provides change history, not reliable storage? Reliable storage is something like a RAID array to reduce the risk of loss from a disk failure, combined with a backup strategy to cover the small remaining risk of a multi-drive failure.
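
            Even a nightly rsync of the working copy to a second machine - path and hostname here are made up, obviously - covers most of that remaining risk:

            $ rsync -a --delete ~/soggy-jobs/ backuphost:backups/soggy-jobs/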

      • (Score: 2) by cafebabe on Thursday May 24 2018, @07:49PM

        by cafebabe (894) on Thursday May 24 2018, @07:49PM (#683714) Journal

        It would be much easier to update your website from one CSV file. The example implementation requires make, bash and perl, which are typically present on Linux and MacOS:-

        begin 644 soggy-jobs-dist-20180521-231510.tar.gz
        M'XL("%!5`UL"`W-O9V=Y+6IO8G,M9&ES="TR,#$X,#4R,2TR,S$U,3`N=&%R
        M`.U9;5/;2!+F:_PK.EJ!),=C2<:&`):!(J3"74*NP+FK/<P&(8UM;63)JQEC
        M.X'][=>CD6P93-@LMTE='5,%FI>>[I[NGIZG@<6]WI3\&E\RX@>,DYIEO[0:
        M-9O4UNV&;9GOW$^T&X1TY1'-PK;9:(BOO5G?+'Y%J]=J]15[?=VR-AI6HU%?
        MP=6&O;D"ULIW:"/&W01@A2<!^VU$P_OH'EK_'VT_/:?1%0S0RZ723Y![&U['
        M"?0Y'VZ;)A,14A418B*%?F"($(%VG\)!'+$XX<%H4,65(4V&E(_<$,+`HY%'
        MH9>X$:<^\'@)+X@C\.+(#WB`/=YW^3*B@`FB*YID?(9Q$L7(=]@//!C32V`!
        MI\"&U`O<,/@<1#T((KBD+N-B@D]AW*<11#&$<=2C"23TMU&04%]HS"A-9;)4
        MZ#2D$8_HF%7CI&=Z\6"`8U8=AKLL\)U:8V.CL>9A;^.E7;=MZ**%NJ.$]Y&I
        M3[D;A*PJ3)A?(!!QQ4NE5T>G;6=FUS+RPU]]/A`?CUV52G2"A^+;I6=HP!#D
        M2%`UX2)D0'SP7>Z2E!BN00@"8E^42LQ#,U#<1P97,+=9H4LN<"N%%ZL_KPY6
        M?;+Z9O7=ZNE%Z=FX1SF0`9!HN,S)I6!0U$B.A$8%(C3/<,1I8F9G6;KTP.*#
        MRW^`("=I22LM/;`T,P_\*1Z)Q=TNAN<#MG7#<!ND@;/S@]B?>:=4\D+J1MMH
        M*8SO7H)A]C&PD'DWB'P@?(J[ND!8\!EUL8`,$Z2S4,#$37HHU@+6=]EH@%JK
        M^O'^NT/C]](S#R]`/D)2<;&`?(+:G*CT+!G,-\QEHV"I%X@E9S9_CS7XA,/:
        M&C)ZM__W0P.*1\"#)]['P$:.NN=#M2H(!Y_\(`%-U5\=G1A$U4\_O+8,56\?
        MH1H:*N4-Q0ES@PJR4\N`6@M,GUZ9T2@,+T"K5LWE#(J21"(F)!H-:!)X)!Y'
        M>+6(!Z1[C_0J;M#NT\PL&W!]#9W;`KQ'<NNZ(:.I*TC2_>K!>I\#-,U6JO]]
        M9%*HY'R)]+5OV3#Y_"W4/!G1S,'6/&1PDZ-<#,>^B#G,L1J;5,OF9*+-QQ(8
        MG%EDZSS]53ZS[/-\N"Y[8KDVFVS,>ANRM\COL:PN%.',[`#BG'B"9:&N%.-<
        M1G9J@/GIA:V<`C,1NXY6UNYLQ*LAK'!K9VJ8.]O3*]`VEC()6+RUL2'LWZ,1
        M#H*!VZ-`L$=">D5#6`<$8V$@$K3\$O%P`1D%/C0L3":]O!/%9.ABNCD!T@82
        M8XB;"W[<0S_N[6D7]Z0!%`G5+#0&GW#493]<#WFW2BM/[;LU]@#^G^&1OQ#_
        MK]?M.?[?W!3X?[->>\+_WP?_FR.6F)=!9*9HC_Q+@-C#+,EY23#D3\7`8XL!
        M*RL&BM6!>X693AVZO.]H\I)I.PCJ1Y<@"H6/"<6<^J4$H/;=R`^IP_I!E^_@
        M1-F/O?E(%2--$]UX2"/]Z+BB-/--BG%]G:`;D@@L03'N(VO=QU(DHKZNAOAQ
        MFD?'+<-(14EN52==$/0W^..%,:/(UA`3&3-[IW13T'6<"+LO559=4#;5\/V'
        M=D5IW:=BBI4!:=*M.S,%<&:Y!C1)T/BI<"_VBZ(YG?#Y4#(^;;\Z/#D!1;6V
        MY<YM2=>)%$%$)P&7?%+^:S-7Z!JG@V&(#UA:;VB5CII/X`'64E:Z-:G;%0W#
        M:Y$6]4XQM)YZNR)2(4ZQ*4,R71&87LQGE8RR=*FPB$&470(X./VGN`:CB'&,
        M5Q_>M-^]Q;!;ZN7<QW+X.S//.E$G.7^AFF;,Q-'WN@$-?4=/K3P<L;XN9RKJ
        M"T,&3KYW\$71SW[I*)W.>5G?W>YTJOG(*!M*9?<:5ROG+PSL56YZ@IV.)AU%
        M/)E65+P(G%94CNB^HN+=JJC]>(`3XH97D"SBKL<-1\H6>X/N_"@9%P.!U6PN
        M93@/8!E5EI/3[A2G;4>2[RS2"FO\@MARG_P;SDVS%TB+%+8]1($\>&*>R>4S
        MEWSND'/S-H]E%#(J%2WGHU7RKJUE,0FR`M,5&0LYI9('4D%(S;E%E'=L&3Z2
        MN'`+9]>PIF2<YO=/:?*D-=.A,-_A3>ZWFB[T$]IU.DKJP8[2$NYLFFZK:>+R
        M;&/1?\+),U=]G:<@19Y_P\]=GC=`$2A^A<\B]1T]\C#[8ZIDU*C-@>S]=Q2Z
        M0Y1>BD7*HC?,HCO4,/9<\5A^R1UX@[Z7EPORV%=D#K\1.0-3);Z"?(2)>N_M
        M^X/]]M'[XSW`K;!W\/ZX?7C<WLMNN3L<AE/Q=J99!?,)/F[4]?JS4`']$YTR
        M6,U5R!(+QTN7)SXAN)`\\YTB:]K&3D:-EZHSTZ6S9]XYTPU>-+9`G:DJB+F=
        MKZ[-'Z&Y)-QBI#G\">(_"O_/_OKWU^'_6KVQ/L?_&Q+_K]>?\/^/P_]O:<_U
        MIG`T>"H#OD<9\'7$5L3D.>S&!"_+@S\!K3[^"5CEW4%4S%H`4QDNEY7(C13P
        M?/G#GXZRFF7AX90+F.BK98$P%0242K5LJO8,<-UA?/LESR>6LL_7'I(@7UU-
        MT3#@@DC7E(JBH0$M-+'UH'VK"F)A^4P7;()^2TLA%-U\3DBUO$M("X$DRLSG
        M=<0Z(E#ULPQH=DA'$S#Z##H\!>OEVTO7S?9):[[<;+\JCO9G_1=O3@Y?.^F)
        M6^+(K2++7YKGY5VCL*]C[K<6AHML%T;Z72E2",H0C%-FQNZ/8K?L<(OT.#YI
        M&>9:>J-T=;VB(G9X65$;%;5>43<JZJ9ATA1HE/XOW_^%4O81[__&1OV^]]\2
        M_?3]M^O6IHWS-80!UM/[_SU:\_FK]P?MG_]Q",+!K5(S_7?J9!!&S%&RQW<\
        M'E?'Z^D3:&]M;9F3M)@41-NA&_4<A48*S'J"!X)^_`SP"4P?4R(>V"M'$?43
        M/J"D/1U213SB8N0HX@\PIN"Y`U[?31CESH?V:_)2`1.Y8-42TM9K\:]5%^&%
        M_/\O'$4^>BZ90I8UYT4-%DKICDQ^Y`ZHHUP%="RP3$'J./!YW_'I%2(3D@XJ
        MR"@0&($PSPVI8TL%S.PXE[$_Q<\;^YN407(\@WN)E=5EG/@T$4/\]:S)^RW!
        MP8VFJ'(_GW$32A-6G,F+SME,6]:)?:&;8#6KX,182!(+F;:F=.M3H?/4GMI3
        ,N]/^`V?S__T`*```
        `
        end

        (Usual instructions for uudecode process [soylentnews.org].)
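
        If you have not done this before, the usual steps are roughly as follows, where comment.txt is whatever file you saved the begin/end block into:-

            $ uudecode comment.txt    # writes soggy-jobs-dist-20180521-231510.tar.gz
            $ tar xzf soggy-jobs-dist-20180521-231510.tar.gz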

        Usage:-

        • make scrape obtains legacy web site.
        • make import converts legacy web site into one CSV file. This process may omit some data but script can be adapted.
        • make tidy invokes OpenOffice or similar to edit CSV file.
        • make export converts CSV file into web site. URLs may not match legacy web site but script can be adapted.
        • make all performs scrape, import, tidy and export. This would be ambitious but is included for completeness and demonstration purposes.
        • make defaults to export only.
        • Other commands perform archiving.

        CSV file is flat and de-normalized with following format:-

        1. First column is country.
        2. Second column is state.
        3. Third column is town (or city).
        4. Fourth column is organisation name.
        5. Fifth column is organisation's home page.
        6. Sixth column is organisation's contact page.
        7. Seventh column is organisation's job board.
        8. All subsequent columns may be used internally.
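
        A sample row (values invented purely for illustration) would look like this:-

            Singapore,,Singapore,Example Pte Ltd,https://www.example.com/,https://www.example.com/contact,https://www.example.com/careers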

        Export script contains minimal code for styling web site. You may want to improve it. For example, by adding hyperlinks to improve web site navigation.

        --
        1702845791×2