SoylentNews is people

The Fine print: The following are owned by whoever posted them. We are not responsible for them in any way.

Such a script would ease the burden of implementing a few site-wide changes to The Global Computer Index.

As it stands 319 HTML files need to be revised in one or more of four separate ways.

Simply contemplating such laborious and tedious work gets me down, so I focus on the smaller countries first, as well as the countries for which I list only a very few cities.

I use find to produce a list of all the files that require revision. What I'd like is a script that sorts that list by country, or by US state, putting those with the fewest cities that require revision first.

That won't save me any effort but it will make me far more productive. It's much easier for me to initiate a task if it at least appears to be a small task.

Here's some sample data:

$ find . -name index.html -exec grep -l 'Computer Job' {} \; | grep -v united | tail
./pakistan/rawalpindi/index.html
./philippines/manila/index.html
./poland/gdansk/index.html
./poland/warsaw/index.html
./russia/moscow/index.html
./russia/novosibirsk/novosibirsk/index.html
./russia/tomsk/index.html
./russia/tomsk-oblast/index.html
./serbia/belgrade/index.html
./singapore/index.html

In this list I would start with Singapore then go on to Serbia and the Philippines.
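
One way to get that ordering is to count files per top-level directory and sort by the count. The following is a minimal Python sketch, not a finished tool: it assumes every path looks like ./country/.../index.html as in the sample above, and the function name rank_countries is made up for illustration.

```python
from collections import Counter

def rank_countries(paths):
    """Return (country, count) pairs, fewest files first, ties in alphabetical order."""
    counts = Counter()
    for path in paths:
        parts = path.strip().split("/")
        # "./poland/warsaw/index.html" -> [".", "poland", "warsaw", "index.html"]
        if len(parts) >= 3 and parts[0] == ".":
            counts[parts[1]] += 1
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# The sample paths from the find output above.
SAMPLE = """./pakistan/rawalpindi/index.html
./philippines/manila/index.html
./poland/gdansk/index.html
./poland/warsaw/index.html
./russia/moscow/index.html
./russia/novosibirsk/novosibirsk/index.html
./russia/tomsk/index.html
./russia/tomsk-oblast/index.html
./serbia/belgrade/index.html
./singapore/index.html
"""

for country, count in rank_countries(SAMPLE.splitlines()):
    print(f"{count:4d}  {country}")
```

To run it on the real list, the output of find could be saved to a file and its lines passed to rank_countries in place of SAMPLE. Ranking US states would presumably mean counting parts[2] for paths under the United States directory, but that directory layout is not shown here, so treat it as a guess.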

If I only needed to change "Computer Job" to "Computer Industry Job" I would use sed. But sed alone won't do it because I often have to break long lines into smaller chunks so as to make iFone Fanbois happy.
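
The mechanical part of that edit can still be scripted, with the long lines merely flagged for breaking by hand. A rough Python sketch follows; the 80-character limit and the function name revise are assumptions for illustration, not anything the site actually requires.

```python
def revise(text, old="Computer Job", new="Computer Industry Job", max_len=80):
    """Replace old with new and report which lines are longer than max_len.

    Returns (revised_text, line_numbers), with line numbers starting at 1.
    """
    revised = text.replace(old, new)
    too_long = [
        number
        for number, line in enumerate(revised.splitlines(), start=1)
        if len(line) > max_len
    ]
    return revised, too_long

# Made-up page fragment: the second line becomes too long after the replacement.
page = "<h1>Computer Job Boards</h1>\n" + "x" * 75 + " Computer Job\n"
revised, too_long = revise(page)
print("lines to break by hand:", too_long)
```

Running the replacement a second time changes nothing, because "Computer Job" no longer occurs once "Industry" has been inserted, so the sketch is safe to rerun on files that were already revised.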

I'm also migrating my entire site to HTML 5. Many of my as-yet-unrevised pages are _already_ HTML 5, but some get warnings when I validate them.

Some have spelling errors. Some have errors that doubtlessly would lead foreign patriots to undertake a vendetta against me, my male children and all their male children.

So really I do need to at least inspect all 319 candidate files.

I thank you, and your future managers thank you.

  • (Score: 0) by Anonymous Coward on Thursday May 17 2018, @11:01AM (9 children)

    by Anonymous Coward on Thursday May 17 2018, @11:01AM (#680678)

    It will be more work up front, but save you immensely down the road

    You should expend your energy not on updating the template data of 300+ html files but instead you should be looking to separate the html templating from the actual data content.

    Your data content (name of business, contact, description, etc) should be stored in some form of structured computer readable format (json, yaml, csv, xml, does not matter which, just that it is in a structured computer readable format).

    Your templates that create the html pages should exist once, in combination with a piece of code that acts as a static site generator.

    Then, when you need to change the template design, you have one set of template files to edit, and rerun the generator on the data content files.

    As well, when you have to change several of the data files, the structured format means you'll nearly always be able to write a piece of custom code to change them all en masse.

    And in the end, no manual edits of 300+ files to change some spelling errors in the templates.
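
    The separation described above can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a recommended generator: the field names (name, contact, description), the template markup and the function name render_page are all invented here.

```python
import csv
import html
import io
from string import Template

# Hypothetical page template; the real site's markup will differ.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<table>
$rows
</table>
</body>
</html>
""")

# One table row per record; the column names are assumptions.
ROW = Template("<tr><td>$name</td><td>$contact</td><td>$description</td></tr>")

def render_page(title, records):
    """Render one HTML page from a title and a list of dict records."""
    rows = "\n".join(
        ROW.substitute({key: html.escape(value) for key, value in record.items()})
        for record in records
    )
    return PAGE.substitute(title=html.escape(title), rows=rows)

# Made-up data standing in for one city's content file.
DATA = """name,contact,description
Example Widgets,https://example.com/contact,Makes <widgets> & gadgets
"""
records = list(csv.DictReader(io.StringIO(DATA)))
print(render_page("Gdansk", records))
```

    With this split, a spelling fix in the template is one edit to PAGE followed by a rerun, and the data files are never touched.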

  • (Score: 2) by MichaelDavidCrawford on Thursday May 17 2018, @06:07PM (8 children)

    by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Thursday May 17 2018, @06:07PM (#680806) Homepage Journal

    The payload is all in table elements always with four columns

    I've planned to automate it this whole time but am unclear on what I want my automation to do

    --
    Yes I Have No Bananas. [gofundme.com]
    • (Score: 1, Interesting) by Anonymous Coward on Thursday May 17 2018, @09:08PM (6 children)

      by Anonymous Coward on Thursday May 17 2018, @09:08PM (#680888)

      look at static generation https://www.staticgen.com [staticgen.com]

      • (Score: 2) by MichaelDavidCrawford on Friday May 18 2018, @01:44AM (5 children)

        (I will look at staticgen right after I post this.)

        I'm going to use Subversion on a remote server so as to implement reliable storage.

        I'll use Python to actually operate on the data. The first thing I want to do is write a simple Python app that will enable me to enter a given company just once, then add all their cities, states or provinces and maybe counties all at the same time, then distribute updates to all the right HTML files.

        That will need to create new HTML files from time to time - Oracle is all over Creation, so presently I add just one country at a time.
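
        The core of such an app could be a mapping from each company to its locations, inverted into a list of companies per page. A minimal sketch, assuming the ./country/city/index.html layout seen in the find output; the company names and the function name pages_for are placeholders, and states or provinces would need an extra path level.

```python
from collections import defaultdict

def pages_for(companies):
    """Invert {company: [(country, city), ...]} into {page_path: [company, ...]}."""
    pages = defaultdict(list)
    for company, locations in companies.items():
        for country, city in locations:
            pages[f"./{country}/{city}/index.html"].append(company)
    # Sort both the paths and the names so the output is stable between runs.
    return {path: sorted(names) for path, names in sorted(pages.items())}

# Placeholder data: each company is entered once with all of its locations.
companies = {
    "Sample Systems": [("poland", "warsaw"), ("russia", "moscow")],
    "Example Corp": [("poland", "warsaw")],
}

for path, names in pages_for(companies).items():
    print(path, names)
```

        Any path in the result that does not yet exist on disk is a page that has to be created, which covers the case of a company with offices in a city not yet listed.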

        --
        Yes I Have No Bananas. [gofundme.com]
        • (Score: 0) by Anonymous Coward on Friday May 18 2018, @01:16PM (4 children)

          by Anonymous Coward on Friday May 18 2018, @01:16PM (#681144)

          I'm going to use Subversion on a remote server so as to implement reliable storage.

          You do realize, don't you, that Subversion provides change history but not reliable storage? Reliable storage is something like a RAID array to reduce the risk of loss from a disk failure, plus a backup strategy to cover the small remaining risk of a multi-drive failure.

    • (Score: 2) by cafebabe on Thursday May 24 2018, @07:49PM

      by cafebabe (894) on Thursday May 24 2018, @07:49PM (#683714) Journal

      It would be much easier to update your website from one CSV file. Example implementation requires make, bash and perl which are typically present on Linux and MacOS:-

      begin 644 soggy-jobs-dist-20180521-231510.tar.gz
      M'XL("%!5`UL"`W-O9V=Y+6IO8G,M9&ES="TR,#$X,#4R,2TR,S$U,3`N=&%R
      M`.U9;5/;2!+F:_PK.EJ!),=C2<:&`):!(J3"74*NP+FK/<P&(8UM;63)JQEC
      M.X'][=>CD6P93-@LMTE='5,%FI>>[I[NGIZG@<6]WI3\&E\RX@>,DYIEO[0:
      M-9O4UNV&;9GOW$^T&X1TY1'-PK;9:(BOO5G?+'Y%J]=J]15[?=VR-AI6HU%?
      MP=6&O;D"ULIW:"/&W01@A2<!^VU$P_OH'EK_'VT_/:?1%0S0RZ723Y![&U['
      M"?0Y'VZ;)A,14A418B*%?F"($(%VG\)!'+$XX<%H4,65(4V&E(_<$,+`HY%'
      MH9>X$:<^\'@)+X@C\.+(#WB`/=YW^3*B@`FB*YID?(9Q$L7(=]@//!C32V`!
      MI\"&U`O<,/@<1#T((KBD+N-B@D]AW*<11#&$<=2C"23TMU&04%]HS"A-9;)4
      MZ#2D$8_HF%7CI&=Z\6"`8U8=AKLL\)U:8V.CL>9A;^.E7;=MZ**%NJ.$]Y&I
      M3[D;A*PJ3)A?(!!QQ4NE5T>G;6=FUS+RPU]]/A`?CUV52G2"A^+;I6=HP!#D
      M2%`UX2)D0'SP7>Z2E!BN00@"8E^42LQ#,U#<1P97,+=9H4LN<"N%%ZL_KPY6
      M?;+Z9O7=ZNE%Z=FX1SF0`9!HN,S)I6!0U$B.A$8%(C3/<,1I8F9G6;KTP.*#
      MRW^`("=I22LM/;`T,P_\*1Z)Q=TNAN<#MG7#<!ND@;/S@]B?>:=4\D+J1MMH
      M*8SO7H)A]C&PD'DWB'P@?(J[ND!8\!EUL8`,$Z2S4,#$37HHU@+6=]EH@%JK
      M^O'^NT/C]](S#R]`/D)2<;&`?(+:G*CT+!G,-\QEHV"I%X@E9S9_CS7XA,/:
      M&C)ZM__W0P.*1\"#)]['P$:.NN=#M2H(!Y_\(`%-U5\=G1A$U4\_O+8,56\?
      MH1H:*N4-Q0ES@PJR4\N`6@M,GUZ9T2@,+T"K5LWE#(J21"(F)!H-:!)X)!Y'
      M>+6(!Z1[C_0J;M#NT\PL&W!]#9W;`KQ'<NNZ(:.I*TC2_>K!>I\#-,U6JO]]
      M9%*HY'R)]+5OV3#Y_"W4/!G1S,'6/&1PDZ-<#,>^B#G,L1J;5,OF9*+-QQ(8
      MG%EDZSS]53ZS[/-\N"Y[8KDVFVS,>ANRM\COL:PN%.',[`#BG'B"9:&N%.-<
      M1G9J@/GIA:V<`C,1NXY6UNYLQ*LAK'!K9VJ8.]O3*]`VEC()6+RUL2'LWZ,1
      M#H*!VZ-`L$=">D5#6`<$8V$@$K3\$O%P`1D%/C0L3":]O!/%9.ABNCD!T@82
      M8XB;"W[<0S_N[6D7]Z0!%`G5+#0&GW#493]<#WFW2BM/[;LU]@#^G^&1OQ#_
      MK]?M.?[?W!3X?[->>\+_WP?_FR.6F)=!9*9HC_Q+@-C#+,EY23#D3\7`8XL!
      M*RL&BM6!>X693AVZO.]H\I)I.PCJ1Y<@"H6/"<6<^J4$H/;=R`^IP_I!E^_@
      M1-F/O?E(%2--$]UX2"/]Z+BB-/--BG%]G:`;D@@L03'N(VO=QU(DHKZNAOAQ
      MFD?'+<-(14EN52==$/0W^..%,:/(UA`3&3-[IW13T'6<"+LO559=4#;5\/V'
      M=D5IW:=BBI4!:=*M.S,%<&:Y!C1)T/BI<"_VBZ(YG?#Y4#(^;;\Z/#D!1;6V
      MY<YM2=>)%$%$)P&7?%+^:S-7Z!JG@V&(#UA:;VB5CII/X`'64E:Z-:G;%0W#
      M:Y$6]4XQM)YZNR)2(4ZQ*4,R71&87LQGE8RR=*FPB$&470(X./VGN`:CB'&,
      M5Q_>M-^]Q;!;ZN7<QW+X.S//.E$G.7^AFF;,Q-'WN@$-?4=/K3P<L;XN9RKJ
      M"T,&3KYW\$71SW[I*)W.>5G?W>YTJOG(*!M*9?<:5ROG+PSL56YZ@IV.)AU%
      M/)E65+P(G%94CNB^HN+=JJC]>(`3XH97D"SBKL<-1\H6>X/N_"@9%P.!U6PN
      M93@/8!E5EI/3[A2G;4>2[RS2"FO\@MARG_P;SDVS%TB+%+8]1($\>&*>R>4S
      MEWSND'/S-H]E%#(J%2WGHU7RKJUE,0FR`M,5&0LYI9('4D%(S;E%E'=L&3Z2
      MN'`+9]>PIF2<YO=/:?*D-=.A,-_A3>ZWFB[T$]IU.DKJP8[2$NYLFFZK:>+R
      M;&/1?\+),U=]G:<@19Y_P\]=GC=`$2A^A<\B]1T]\C#[8ZIDU*C-@>S]=Q2Z
      M0Y1>BD7*HC?,HCO4,/9<\5A^R1UX@[Z7EPORV%=D#K\1.0-3);Z"?(2)>N_M
      M^X/]]M'[XSW`K;!W\/ZX?7C<WLMNN3L<AE/Q=J99!?,)/F[4]?JS4`']$YTR
      M6,U5R!(+QTN7)SXAN)`\\YTB:]K&3D:-EZHSTZ6S9]XYTPU>-+9`G:DJB+F=
      MKZ[-'Z&Y)-QBI#G\">(_"O_/_OKWU^'_6KVQ/L?_&Q+_K]>?\/^/P_]O:<_U
      MIG`T>"H#OD<9\'7$5L3D.>S&!"_+@S\!K3[^"5CEW4%4S%H`4QDNEY7(C13P
      M?/G#GXZRFF7AX90+F.BK98$P%0242K5LJO8,<-UA?/LESR>6LL_7'I(@7UU-
      MT3#@@DC7E(JBH0$M-+'UH'VK"F)A^4P7;()^2TLA%-U\3DBUO$M("X$DRLSG
      M=<0Z(E#ULPQH=DA'$S#Z##H\!>OEVTO7S?9):[[<;+\JCO9G_1=O3@Y?.^F)
      M6^+(K2++7YKGY5VCL*]C[K<6AHML%T;Z72E2",H0C%-FQNZ/8K?L<(OT.#YI
      M&>9:>J-T=;VB(G9X65$;%;5>43<JZJ9ATA1HE/XOW_^%4O81[__&1OV^]]\2
      M_?3]M^O6IHWS-80!UM/[_SU:\_FK]P?MG_]Q",+!K5(S_7?J9!!&S%&RQW<\
      M'E?'Z^D3:&]M;9F3M)@41-NA&_4<A48*S'J"!X)^_`SP"4P?4R(>V"M'$?43
      M/J"D/1U213SB8N0HX@\PIN"Y`U[?31CESH?V:_)2`1.Y8-42TM9K\:]5%^&%
      M_/\O'$4^>BZ90I8UYT4-%DKICDQ^Y`ZHHUP%="RP3$'J./!YW_'I%2(3D@XJ
      MR"@0&($PSPVI8TL%S.PXE[$_Q<\;^YN407(\@WN)E=5EG/@T$4/\]:S)^RW!
      MP8VFJ'(_GW$32A-6G,F+SME,6]:)?:&;8#6KX,182!(+F;:F=.M3H?/4GMI3
      ,N]/^`V?S__T`*```
      `
      end

      (Usual instructions for uudecode process [soylentnews.org].)

      Usage:-

      • make scrape obtains legacy web site.
      • make import converts legacy web site into one CSV file. This process may omit some data but script can be adapted.
      • make tidy invokes OpenOffice or similar to edit CSV file.
      • make export converts CSV file into web site. URLs may not match legacy web site but script can be adapted.
      • make all performs scrape, import, tidy and export. This would be ambitious but is included for completeness and demonstration purposes.
      • make defaults to export only.
      • Other commands perform archiving.

      CSV file is flat and de-normalized with following format:-

      1. First column is country.
      2. Second column is state.
      3. Third column is town (or city).
      4. Fourth column is organisation name.
      5. Fifth column is organisation's home page.
      6. Sixth column is organisation's contact page.
      7. Seventh column is organisation's job board.
      8. All subsequent columns may be used internally.
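
      For anyone who prefers Python, a file in this format can be read as follows. This is a separate sketch and not the code in the archive above; it assumes there is no header row, and the field names and the function name load_listings are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Names for the first seven columns, in the order listed above.
FIELDS = ["country", "state", "town", "name", "home_page", "contact_page", "job_board"]

def load_listings(handle):
    """Group rows by (country, state, town); columns after the seventh are ignored."""
    listings = defaultdict(list)
    for row in csv.reader(handle):
        if len(row) < len(FIELDS):
            continue  # skip blank or short rows
        record = dict(zip(FIELDS, row))  # zip stops after the seventh column
        key = (record["country"], record["state"], record["town"])
        listings[key].append(record)
    return dict(listings)

# Made-up row; the eighth column stands in for internal data.
SAMPLE = (
    "Poland,,Gdansk,Example Corp,https://example.com/,"
    "https://example.com/contact,https://example.com/jobs,internal note\n"
)

for key, records in load_listings(io.StringIO(SAMPLE)).items():
    print(key, [record["name"] for record in records])
```

      Because the file is flat and de-normalized, an organisation with several offices appears once per town, and grouping by the first three columns recovers one list per page.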

      Export script contains minimal code for styling web site. You may want to improve it. For example, by adding hyperlinks to improve web site navigation.

      --
      1702845791×2