Scripting / one liner help
Hal Burgiss
hal at burgiss.net
Wed Aug 10 16:47:51 UTC 2011
On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at comcast.net> wrote:
> I am looking for thoughts on how I might extract image names from an html
> document.
>
> The document started as a Word document with nothing but images, one per
> page, randomly named. It was saved as HTML using LibreOffice, so I now
> have the images separate. I have a script that will process them through
> ImageMagick to clean them up, reduce them from full color to b/w, and make them
> into a pdf. But the pages are out of order because the images are randomly
> named.
>
> What I'd like to do is have something read the html file in order and
> either feed the names of the JPGs to the script in order or just spit them
> out to a file that I can feed to the script. The html source has all the
> images listed sequentially without line breaks. Each tag is the same except
> for the image name and looks like this:
> <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
> WIDTH=575 HEIGHT=790 BORDER=0>
>
>
See if this gets close to extracting the image names ...
grep -ohi 'SRC="[^"]*"' *html | sed -r 's/SRC="([^"]+)"/\1/i' | whatever_script.sh
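
Since the tags all sit on one line without breaks, each match has to be printed on its own rather than per input line. Here is a minimal sketch of that as a reusable function, assuming a grep that supports -o (GNU and BSD grep do). The function name extract_img_names and the file names in the comments are made up for illustration, and it would also pick up any other src= attribute if the file had one.

```shell
#!/bin/sh
# extract_img_names is a hypothetical helper name, not something from the original thread.
# Prints every SRC="..." value found in the given HTML file, one per line,
# in the order the tags appear in the document.
extract_img_names() {
    # -o prints each match on its own line, so tags that share a line are split up;
    # -i matches SRC or src.
    # The sed step strips everything up to the opening quote, then the closing quote.
    grep -oi 'src="[^"]*"' "$1" | sed -e 's/^[^"]*"//' -e 's/"$//'
}

# Possible use (file names are placeholders):
#   extract_img_names source.html > image_order.txt
#   ./whatever_script.sh < image_order.txt
```

Writing the list to a file first makes it easy to check the page order by eye before running the conversion. Whether the script should read names from stdin or take them as arguments depends on how it was written.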
--
Hal