@johntimaeus@infosec.exchange
So, here's how the web scrape is going...
I'm estimating completion by Wednesday.
I'm 170 base sites in. Each one points to about 30 others.
This is with a depth of 1.
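
A rough sanity check on those numbers, as a sketch only: the function name `estimate_sites` is made up for illustration, and the arithmetic ignores any overlap between the link lists of different base sites, so it gives an upper bound rather than a prediction.

```shell
#!/bin/sh
# Upper bound on sites touched at depth 1: the base sites themselves
# plus (base sites x approximate outbound sites per base site).
# estimate_sites is a hypothetical helper; overlap is ignored.
estimate_sites() {
    base=$1      # base sites crawled so far
    fanout=$2    # approximate number of other sites each one points to
    echo $(( base + base * fanout ))
}

estimate_sites 170 30   # prints 5270
```

With 170 base sites and about 30 links each, that comes to at most 5270 sites, which is in the same range as the archived count shown in the output below.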
Every 10.0s: echo number of sites archived ; ls /var/lib/topgen/vhosts/ | wc -l ; echo; echo current; ps -ef | grep wget | grep -v grep | awk '{ print $NF }'; ps -ef | grep wget | grep -v grep | awk '{ print $NF }' >> wfile ; sort... 1ecf4ac17ca0: Mon Aug 11 00:32:47 2025
number of sites archived
4815
current
lefigaro.fr
free
403G
Netid  State  Recv-Q  Send-Q  Local Address:Port   Peer Address:Port    Process
tcp    ESTAB  0       0       192.18.0.254:51466   23.193.174.82:https
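
For anyone who wants something similar, here is a minimal sketch of the visible part of that status command as a small script. It is not the exact command from the header, which is cut off. The function names and the `VHOSTS_DIR` variable are invented for illustration, it assumes one directory entry per archived site, and it assumes `pgrep -a` from procps is available to list running `wget` processes.

```shell
#!/bin/sh
# Sketch of a status report: archived site count plus current wget targets.
# VHOSTS_DIR, count_archived and current_targets are hypothetical names.
VHOSTS_DIR=${VHOSTS_DIR:-/var/lib/topgen/vhosts}

# Count entries in the vhosts directory (assumes one entry per site).
count_archived() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Print the last argument of each running wget, which is typically the
# site being fetched. pgrep matches on process name, so no "grep -v grep".
current_targets() {
    pgrep -a wget | awk '{ print $NF }'
}

echo "number of sites archived"
count_archived "$VHOSTS_DIR"
echo
echo "current"
current_targets
```

Saved as a file, it could be run under `watch -n 10` to get a refresh every 10 seconds, like the output above.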