• Automating an atypical search & replace

    From Richard Owlett@21:1/5 to All on Sat Jul 13 11:08:48 2024
    I'm reformatting some HTML files containing chapters of the KJV Bible.
    My source follows the practice of italicizing some words.
    I find italics distracting.

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".
    Obviously it would not be wise to fully automate the action.
    I wish to find all occurrences of <span
    class='add'>arbitrary_text</span> and manually confirm the edit.

    In general, is it feasible?
    Can KDE's Kate do it?

    TIA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Richard Owlett on Sat Jul 13 19:48:57 2024
    On 13.07.2024 18:08, Richard Owlett wrote:
    I'm reformatting some HTML files containing chapters of the KJV Bible.
    My source follows the practice of italicizing some words.
    I find italics distracting.

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".
    Obviously it would not be wise to fully automate the action.
    I wish to find all occurrences of <span
    class='add'>arbitrary_text</span> and manually confirm the edit.

    In general, is it feasible?

    Yes, sure.

    Some remarks...
    I would use Regular Expressions (RE) for that task.
    If <span> sections can be nested in your HTML source then you
    cannot do that with plain RE processors.
    Since you want to inspect each <span> pattern individually it's
    not clear what you mean by "automate" (which I'd interpret as
    running a batch job to do the process).
    Actually you seem to want a sequential find + replace-or-skip.
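To illustrate the nesting caveat above, here is a minimal Python sketch (not anything the poster wrote). The sample string, the `NAIVE` pattern and the `naive_unwrap` name are made up for illustration. It shows how a non-greedy regular expression pairs the outer opening tag with the first closing tag it meets, which belongs to the inner span.

```python
import re

# Hypothetical naive pattern: non-greedy match up to the first </span>.
NAIVE = re.compile(r"<span class='add'>(.*?)</span>")


def naive_unwrap(html):
    """Drop the opening tag and whichever </span> comes first."""
    return NAIVE.sub(r"\1", html)


# Made-up nested input: an 'add' span containing another span.
nested = "<span class='add'>a <span class='x'>b</span> c</span>"

# The inner </span> is removed instead of the outer one, so the
# class 'x' span now wrongly extends over " c".
print(naive_unwrap(nested))  # a <span class='x'>b c</span>
```

With no nested spans in the input, the same pattern behaves as intended, which is why the caveat only matters if nesting actually occurs in the files.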

    In Vim I'd search for the "<span ..." pattern and then delete
    to the next "</span>" pattern. (Assuming no nested <span>.)
    Rinse repeat.
    That could be (for example) the commands [case 1]

    /<span class='add'>
    d/<\/span>
    df>

    If there's no other <...> inside the span-sections you could
    simplify that to [case 2]

    /<span class='add'>
    d2f>

    with the opportunity to repeat those search+delete commands
    by simply typing n. for every match, like n.n.n.n. or if
    you want to skip some like, e.g., n.nnnn.n.nnn.n

    With n you get to the next span pattern and . repeats the
    last command.

    In [case 1] the repeat isn't possible since we have two delete
    operations d/<\/span> and df> , but here you can define
    macros to trigger the command by a keystroke or just use the
    recording function to repeat the once recorded commands.

    Sounds complicated? - Maybe. - But if we know your exact data
    format we can provide the Vim command sequence that is
    easiest to use.


    Can KDE's Kate do it?

    Don't know.

    Janis


    TIA

  • From Janis Papanagnou@21:1/5 to Richard Owlett on Sat Jul 13 21:18:01 2024
    On 13.07.2024 18:08, Richard Owlett wrote:
    I'm reformatting some HTML files containing chapters of the KJV Bible.
    My source follows the practice of italicizing some words.
    I find italics distracting.

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    It just occurred to me that if you say the italic text entities are
    the text objects in this span clause then the italic text-decoration
    is likely defined as a CSS attribute of the respective CSS class.
    That would in your example mean the class "add". Since you generally
    don't seem to like italics it would be easier - and also the usual
    way to tackle such a text - to change the single CSS attribute of
    the class. You find it in the CSS section of the header file or in
    a file with the CSS definition that is referenced in the HTML file.
    Look out for a line like "font-style: italic;" and remove that.

    Janis


    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".
    Obviously it would not be wise to fully automate the action.
    I wish to find all occurrences of <span
    class='add'>arbitrary_text</span> and manually confirm the edit.

    In general, is it feasible?
    Can KDE's Kate do it?

    TIA

  • From Lawrence D'Oliveiro@21:1/5 to Richard Owlett on Sat Jul 13 23:39:14 2024
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".

    This is beyond the abilities of regular expressions. This is the point
    where you need to use an actual HTML/XML-parsing library.

    See also <https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags>.
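As a rough sketch of what the parser-based route could look like, the following uses Python's standard `html.parser` module. The class name `AddSpanUnwrapper` and the helper `unwrap` are hypothetical, and the code assumes reasonably well-formed input. It keeps a stack of open spans so that each `</span>` is matched to its own opening tag, even when spans are nested.

```python
from html.parser import HTMLParser


class AddSpanUnwrapper(HTMLParser):
    """Re-emit the input, dropping <span class='add'> and its matching </span>."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # one bool per open <span>: True if it was dropped

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            drop = dict(attrs).get("class") == "add"
            self.span_stack.append(drop)
            if drop:
                return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Pop the innermost open span; skip the tag if that span was dropped.
        if tag == "span" and self.span_stack and self.span_stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def unwrap(html):
    """Return html with every class='add' span replaced by its contents."""
    parser = AddSpanUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(unwrap("<p>a <span class='add'>b <span class='x'>c</span> d</span> e</p>"))
```

This version changes every match without asking. A confirmation step would have to be added in `handle_starttag` if each edit is to be approved by hand.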

  • From Stan Brown@21:1/5 to Lawrence D'Oliveiro on Sat Jul 13 23:13:55 2024
    On Sat, 13 Jul 2024 23:39:14 -0000 (UTC), Lawrence D'Oliveiro wrote:
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".

    This is beyond the abilities of regular expressions. This is the point
    where you need to use an actual HTML/XML-parsing library.


    In general I'd agree with you. But the OP made a big deal -- in a
    different thread, for some reason -- about wanting to use minimal
    HTML, so I doubt very much there will be nested <span> ... </span>
    sequences.

    Also, the OP quite rightly wanted to confirm each change before it is
    made, so presumably if there are any nested sequences he will say no
    to that particular edit and fix it manually.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Stan Brown@21:1/5 to Richard Owlett on Sat Jul 13 23:08:54 2024
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    I'm reformatting some HTML files containing chapters of the KJV Bible.
    My source follows the practice of italicizing some words.
    I find italics distracting.

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".
    Obviously it would not be wise to fully automate the action.
    I wish to find all occurrences of <span
    class='add'>arbitrary_text</span> and manually confirm the edit.

    In general, is it feasible?

    Yes, of course. Any editor above the level of Notepad ought to be
    able to do this. (Sadly, a lot of editors are not above the level of
    Notepad.)

    For instance, in Vim you would use this command after opening the
    file:

    :%s;<span class='add'>\([^<]*\)</span>;\1;gc

    % = process every line of the file
    \( ... \) makes that part of the pattern match addressable
    [^<]* matches a string of characters not including a <. If there is
    other HTML between span and /span, it will not match.
    \1 = the text found between span and /span
    gc = do every occurrence on each line, but confirm each one
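For comparison, the same substitute-with-confirmation idea can be sketched in Python with the standard `re` module. The names `ADD_SPAN` and `unwrap_add_spans` and the sample text are made up for illustration. The `confirm` callback stands in for Vim's `c` flag and decides, match by match, whether the tags are removed.

```python
import re

# Same idea as the Vim pattern: [^<]* refuses to match across any other tag.
ADD_SPAN = re.compile(r"<span class='add'>([^<]*)</span>")


def unwrap_add_spans(html, confirm):
    """Replace each match with its inner text when confirm(match) is true."""
    def replace(match):
        return match.group(1) if confirm(match) else match.group(0)
    return ADD_SPAN.sub(replace, html)


if __name__ == "__main__":
    sample = "the light, that <span class='add'>it was</span> good"
    # Accept every match here; an interactive run could instead pass
    # lambda m: input(f"Unwrap {m.group(0)!r}? [y/n] ") == "y"
    print(unwrap_add_spans(sample, lambda m: True))
```

One difference worth knowing: in Vim, `[^<]*` stops at a line break unless written as `\_[^<]*`, while the Python character class also matches newlines, so a span whose text wraps onto the next line is matched by this sketch but not by the Vim command as given.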

    Can KDE's Kate do it?

    I've no idea.

    But there's an easier solution. Change the definition of class add in
    your style sheet:

    span.add { font-style:normal; }

    Then you won't have to edit the HTML at all.
    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Richard Owlett@21:1/5 to Lawrence D'Oliveiro on Sun Jul 14 02:47:02 2024
    On 07/13/2024 06:39 PM, Lawrence D'Oliveiro wrote:
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".

    This is beyond the abilities of regular expressions. This is the point
    where you need to use an actual HTML/XML-parsing library.

    See also <https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags>.


    Thank you for the reference. Also I've begun perusing https://docs.kde.org/stable5/en/kate/katepart/regular-expressions.html .
    One of my motivations for this project is education.

  • From Richard Owlett@21:1/5 to Stan Brown on Sun Jul 14 02:51:45 2024
    On 07/14/2024 01:13 AM, Stan Brown wrote:
    On Sat, 13 Jul 2024 23:39:14 -0000 (UTC), Lawrence D'Oliveiro wrote:
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".

    This is beyond the abilities of regular expressions. This is the point
    where you need to use an actual HTML/XML-parsing library.


    In general I'd agree with you. But the OP made a big deal -- in a
    different thread, for some reason -- about wanting to use minimal
    HTML, so I doubt very much there will be nested <span> ... </span>
    sequences.

    I'd compare using a minimal HTML to learning to crawl before pursuing
    running a marathon ;}


    Also, the OP quite rightly wanted to confirm each change before it is
    made, so presumably if there are any nested sequences he will say no
    to that particular edit and fix it manually.


  • From Richard Owlett@21:1/5 to Stan Brown on Sun Jul 14 03:02:12 2024
    On 07/14/2024 01:08 AM, Stan Brown wrote:
    On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:

    I'm reformatting some HTML files containing chapters of the KJV Bible.
    My source follows the practice of italicizing some words.
    I find italics distracting.

    These occurrences are consistently of the form
    <span class='add'>arbitrary_text</span>

    I wish to delete "<span class='add'>" and *ASSOCIATED* "</span>".
    Obviously it would not be wise to fully automate the action.
    I wish to find all occurrences of <span
    class='add'>arbitrary_text</span> and manually confirm the edit.

    In general, is it feasible?

    Yes, of course. Any editor above the level of Notepad ought to be
    able to do this. (Sadly, a lot of editors are not above the level of Notepad.)

    For instance, in Vim you would use this command after opening the
    file:

    :%s;<span class='add'>\([^<]*\)</span>;\1;gc

    % = process every line of the file
    \( ... \) makes that part of the pattern match addressable
    [^<]* matches a string of characters not including a <. If there is
    other HTML between span and /span, it will not match.
    \1 = the text found between span and /span
    gc = do every occurrence on each line, but confirm each one

    I'll use parsing that expression as a guide to understanding https://docs.kde.org/stable5/en/kate/katepart/regular-expressions.html .


    Can KDE's Kate do it?

    I've no idea.

    I'm gaining an appreciation of just how much HTML Kate can handle.
    Its highlighting feature begins to serve for minimal syntax checking.


    But there's an easier solution. Change the definition of class add in
    your style sheet:

    span.add { font-style:normal; }

    Then you won't have to edit the HTML at all.


    Learning CSS is beyond my current goals.

  • From Richard Owlett@21:1/5 to Lawrence D'Oliveiro on Sun Jul 14 16:48:26 2024
    On 07/14/2024 04:15 PM, Lawrence D'Oliveiro wrote:
    On Sun, 14 Jul 2024 03:02:12 -0500, Richard Owlett wrote:

    Learning CSS is beyond my current goals.

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?


    At 80 I pursue what's interesting ;}
    When I set personal goals for the spec of my project I decided on
    doing it in as small as possible a sub-set of HTML 2.0.

  • From Lawrence D'Oliveiro@21:1/5 to Richard Owlett on Sun Jul 14 21:15:44 2024
    On Sun, 14 Jul 2024 03:02:12 -0500, Richard Owlett wrote:

    Learning CSS is beyond my current goals.

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

  • From Lawrence D'Oliveiro@21:1/5 to Richard Owlett on Mon Jul 15 01:25:30 2024
    On Sun, 14 Jul 2024 16:48:26 -0500, Richard Owlett wrote:

    On 07/14/2024 04:15 PM, Lawrence D'Oliveiro wrote:

    On Sun, 14 Jul 2024 03:02:12 -0500, Richard Owlett wrote:

    Learning CSS is beyond my current goals.

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

    At 80 I pursue what's interesting ;}
    When I set personal goals for the spec of my project I decided on
    doing it in as small as possible a sub-set of HTML 2.0.

    To me, that’s like spending your weekends rebuilding a Morris Minor.

  • From Richard Owlett@21:1/5 to Lawrence D'Oliveiro on Sun Jul 14 23:29:07 2024
    On 07/14/2024 08:25 PM, Lawrence D'Oliveiro wrote:
    On Sun, 14 Jul 2024 16:48:26 -0500, Richard Owlett wrote:

    On 07/14/2024 04:15 PM, Lawrence D'Oliveiro wrote:

    On Sun, 14 Jul 2024 03:02:12 -0500, Richard Owlett wrote:

    Learning CSS is beyond my current goals.

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

    At 80 I pursue what's interesting ;}
    When I set personal goals for the spec of my project I decided on
    doing it in as small as possible a sub-set of HTML 2.0.

    To me, that’s like spending your weekends rebuilding a Morris Minor.


    Though I've never seen one, if I were mechanically inclined and an ocean
    away I could see that.
    q.v. https://www.mmoc.org.uk/ says "The MMOC exists to unite these
    people who have a fondness of these loveable jellymoulds, and those
    people who still use them as everyday transport."

    There is even a doctoral thesis on knowledge for its own sake :}! https://academiccommons.columbia.edu/doi/10.7916/d8-eme0-my23

  • From candycanearter07@21:1/5 to Lawrence D'Oliveiro on Mon Jul 15 15:30:06 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:15 this Sunday (GMT):
    On Sun, 14 Jul 2024 03:02:12 -0500, Richard Owlett wrote:

    Learning CSS is beyond my current goals.

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?


    It is kinda hard for me to get a good looking website up..
    --
    user <candycane> is generated from /dev/urandom

  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Jul 15 21:59:36 2024
    On Mon, 15 Jul 2024 15:30:06 -0000 (UTC), candycanearter07 wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:15 this Sunday (GMT):

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

    It is kinda hard for me to get a good looking website up..

    MDN is a good resource on all things Web, including CSS.

    <https://developer.mozilla.org/en-US/docs/Web>

  • From Richard Owlett@21:1/5 to Lawrence D'Oliveiro on Mon Jul 15 20:35:02 2024
    On 07/15/2024 04:59 PM, Lawrence D'Oliveiro wrote:
    On Mon, 15 Jul 2024 15:30:06 -0000 (UTC), candycanearter07 wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:15 this Sunday (GMT):

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

    It is kinda hard for me to get a good looking website up..

    MDN is a good resource on all things Web, including CSS.

    <https://developer.mozilla.org/en-US/docs/Web>


    Appears to have useful content.
    Needs at least a "Table of Contents".
    An "Index" would likely be useful.

    A problem of much tech documentation.
    [Seen much of it in last half century. Been told I "write like an
    engineer". Once by an English prof whose son was one.]

  • From Lawrence D'Oliveiro@21:1/5 to Richard Owlett on Tue Jul 16 02:46:08 2024
    On Mon, 15 Jul 2024 20:35:02 -0500, Richard Owlett wrote:

    On 07/15/2024 04:59 PM, Lawrence D'Oliveiro wrote:

    MDN is a good resource on all things Web, including CSS.

    <https://developer.mozilla.org/en-US/docs/Web>


    Appears to have useful content.
    Needs at least a "Table of Contents".

    That page has the links to the various contents.

  • From Richard Owlett@21:1/5 to Lawrence D'Oliveiro on Tue Jul 16 06:17:46 2024
    On 07/15/2024 09:46 PM, Lawrence D'Oliveiro wrote:
    On Mon, 15 Jul 2024 20:35:02 -0500, Richard Owlett wrote:

    On 07/15/2024 04:59 PM, Lawrence D'Oliveiro wrote:

    MDN is a good resource on all things Web, including CSS.

    <https://developer.mozilla.org/en-US/docs/Web>


    Appears to have useful content.
    Needs at least a "Table of Contents".

    That page has the links to the various contents.


    Those do not make a "Table of Contents"!

    See
    https://researchmethod.net/table-of-contents/
    especially https://researchmethod.net/table-of-contents/#Importance_of_Table_of_Contents

  • From candycanearter07@21:1/5 to Lawrence D'Oliveiro on Tue Jul 16 13:50:03 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:59 this Monday (GMT):
    On Mon, 15 Jul 2024 15:30:06 -0000 (UTC), candycanearter07 wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:15 this Sunday (GMT):

    CSS is essentially an indispensable part of HTML at this point. If it
    saves you effort, why not use it?

    It is kinda hard for me to get a good looking website up..

    MDN is a good resource on all things Web, including CSS.

    <https://developer.mozilla.org/en-US/docs/Web>


    alright..
    --
    user <candycane> is generated from /dev/urandom

  • From Lawrence D'Oliveiro@21:1/5 to Richard Owlett on Tue Jul 16 23:49:09 2024
    On Tue, 16 Jul 2024 06:17:46 -0500, Richard Owlett wrote:

    Those do not make a "Table of Contents"!

    It’s a table. It has the contents. Ergo, “table of contents”.
