The officially official Devuan Forum!


#1 2023-05-30 14:59:27

amaro
Member
Registered: 2022-02-08
Posts: 97  

wget download articles linked to from a page

Hello everybody!

I want to download all the articles that this page links to:

https://shepherdexpress.com/lifestyle/out-of-my-mind

I use an alias

type get
get is aliased to `wget -mkEpnp'
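For reference, the short flags in that alias correspond to the long options below. This is only a sketch that spells out the same alias; `man wget` is the authority on what each option does.

```shell
#!/bin/sh
# Same alias as `wget -mkEpnp`, written with wget's long option names:
#   --mirror             (-m)  recursion + timestamping, infinite depth
#   --convert-links      (-k)  rewrite links in saved pages for local viewing
#   --adjust-extension   (-E)  append .html (or .css) to saved files when needed
#   --page-requisites    (-p)  also fetch the images, CSS, etc. a page needs
#   --no-parent          (-np) never ascend above the starting directory
alias get='wget --mirror --convert-links --adjust-extension --page-requisites --no-parent'
```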

but it doesn't work. Only one file gets downloaded, and it is an '.rss' file:

ls ~/shepherdexpress.com/lifestyle/out-of-my-mind/
index.rss

This '.rss' file does contain the links to the articles:

perl -ne 'print if /link/..!/\\\s*$/' index.rss
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/do-something/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/aviophobia/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/just-how-incarnated-are-you/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/ageism-hurts-more-than-seniors/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/does-the-nose-know/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/meet-your-therapist-nature/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/an-overlooked-key-to-happiness/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/health-anxiety-hurts/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/messages-from-the-embodied-mind/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/when-people-mess-with-your-autonomy/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/how-to-lose-friends-and-alienate-people/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/can-dreaming-heal-emotional-wounds/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/alcohol-is-not-your-friend/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/family-traits-that-hurt-or-heal/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/loves-lessons-learned/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/moneys-impact-on-mental-health/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/there-are-monsters-among-us/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/the-power-of-distraction/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/pathological-liars/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/your-nights-make-your-days/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/when-bad-things-happen/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/looking-back-to-envision-forward/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/christmas-has-an-identity-crisis/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/why-employee-performance-reviews-suck/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/your-brain-on-animals/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/verbal-abuse-is-a-neurotoxin/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/grateful-people-are-happy-people/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/male-dating-dysfunction/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/your-brain-on-television/</link>
<link>https://shepherdexpress.com/lifestyle/out-of-my-mind/got-election-angst/</link>

but for some reason 'wget' doesn't download them.

I also tried the '-r', '-l 1' and '-l 2' options, but it made no difference:

get -l 1 -r --convert-links

Any suggestions on how to download those articles would be appreciated.
Thank you!


#2 2023-05-30 18:57:39

Camtaf
Member
Registered: 2019-11-19
Posts: 436  

Re: wget download articles linked to from a page

Maybe use sed to strip the <link> and </link> tags, then feed the resulting list into a 'for ... in' loop?
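A minimal sketch of that idea, assuming each article URL sits on its own line as <link>URL</link>, as in the index.rss output in the first post. The function names `extract_links` and `download_articles` are hypothetical, and the wget flags are just the non-recursive part of the `get` alias.

```shell
#!/bin/sh
# Print the URL inside every <link>URL</link> element of an RSS file.
# -n suppresses normal output; the p flag prints only lines where the
# substitution succeeded, so non-<link> lines are dropped.
extract_links() {
    sed -n 's|.*<link>\(.*\)</link>.*|\1|p' "$1"
}

# Hypothetical helper: download each extracted URL in turn.
# Relies on word splitting, which is fine because the URLs contain no spaces.
download_articles() {
    for url in $(extract_links "$1"); do
        wget --adjust-extension --convert-links --page-requisites "$url"
    done
}
```

For example, run `download_articles index.rss` from the directory that holds index.rss.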


#3 2023-05-30 19:34:55

GlennW
Member
From: Brisbane, Australia
Registered: 2019-07-18
Posts: 644  

Re: wget download articles linked to from a page

Hi, I used to download electronics pages with httrack, a website downloader. The package is in the Devuan repos for Daedalus.

Once the pages are on your computer, you can move the documents you want and send the leftovers to the rubbish bin.
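A possible httrack invocation for this case, offered as an untested sketch. The output directory, the depth and the `+` filter pattern are assumptions, and `mirror_column` is a made-up wrapper name. `man httrack` documents `-O`, `-rN` and the filter syntax.

```shell
#!/bin/sh
# Hypothetical wrapper around httrack for the column's pages.
mirror_column() {
    # -O     : directory where the mirror (plus logs and cache) is written
    # "+..." : filter that allows URLs under the column's path
    # -r2    : limit the mirror depth to 2 levels
    httrack "https://shepherdexpress.com/lifestyle/out-of-my-mind" \
        -O "$HOME/out-of-my-mind-mirror" \
        "+*shepherdexpress.com/lifestyle/out-of-my-mind/*" \
        -r2
}
```

Run `mirror_column`, then pick the article pages out of the mirror directory as described above.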

I hope this helps you.

Last edited by GlennW (2023-06-01 19:30:01)




#4 2023-05-31 08:14:38

amaro
Member
Registered: 2022-02-08
Posts: 97  

Re: wget download articles linked to from a page

Camtaf wrote:

Maybe use sed to remove the <link> </link>, then put the list into a 'for in' loop(?)

Kind of a workaround: put the links in a file and then run

wget -i file
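Put together, that workaround might look like the sketch below. It assumes the article URLs are the <link> entries of the form .../out-of-my-mind/<slug>/ in index.rss. The helper names `make_link_file` and `fetch_link_file` are made up for illustration, `sort -u` drops duplicate URLs, and `--wait=1` only spaces out the requests.

```shell
#!/bin/sh
# Hypothetical helper: write one article URL per line to $2.
# sed pulls the URL out of each <link>URL</link>, grep keeps only
# .../out-of-my-mind/<slug>/ links (dropping e.g. the column page itself),
# and sort -u removes duplicates.
make_link_file() {
    sed -n 's|.*<link>\(.*\)</link>.*|\1|p' "$1" \
        | grep '/out-of-my-mind/[^/][^/]*/$' \
        | sort -u > "$2"
}

# Hypothetical helper: fetch every URL listed in the file, pausing
# 1 second between requests.
fetch_link_file() {
    wget --input-file="$1" --wait=1 \
        --adjust-extension --convert-links --page-requisites
}
```

For example: `make_link_file index.rss links.txt`, then `fetch_link_file links.txt`.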

Thank you!
