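This article documents how to scrape proxy IPs from proxy-list websites. The site list and part of the code are adapted from KyxRecon/proxy-scraper.sh. Proxies fall into two protocol families, HTTP and SOCKS. HTTP proxies are subdivided into HTTP and HTTPS and, by anonymity level, into transparent, anonymous, and high-anonymous; SOCKS proxies are subdivided into SOCKS4 and SOCKS5. For hiding the real IP, high-anonymous HTTP proxies and SOCKS5 proxies are the ideal choices.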
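
When one of these proxies is later handed to curl, the protocol determines the scheme passed to `-x`. The following is a minimal sketch of that mapping, mirroring the `socks5h://` and `socks4a://` prefixes the script in this article uses; the helper name `proxy_url` and the address `203.0.113.10:1080` are illustrative assumptions, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a protocol name and IP:Port into a curl -x argument.
# socks5h / socks4a ask the proxy to resolve host names, so DNS lookups
# do not leave from the local machine.
proxy_url() {
    local proto="${1,,}" host="$2"   # ${1,,} lowercases the protocol (Bash 4+)
    case "${proto}" in
        socks5) printf 'socks5h://%s\n' "${host}" ;;
        socks4) printf 'socks4a://%s\n' "${host}" ;;
        https)  printf 'https://%s\n' "${host}" ;;
        *)      printf 'http://%s\n' "${host}" ;;
    esac
}

proxy_url socks5 203.0.113.10:1080
# With a live proxy it could be used as:
#   curl -fsL -x "$(proxy_url socks5 203.0.113.10:1080)" ipinfo.io/country
```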

Proxy Site Lists

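The proxy sites are listed below (1 = yes, 0 = no). The CN column marks the sites the script queries when the real IP is located in China.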

No  Site            CN  socks4  socks5  transparent  anonymous  high-anonymous(elite)
1   SamAir          1   1       1       1            1          1
2   Nntime          1   0       0       1            1          1
3   PROXYS™         1   0       0       1            0          1
4   Proxz           0   0       0       0            0          1
5   AliveProxy      0   0       1       0            1          1
6   ProxyNova       0   0       0       1            0          1
7   Daily Proxy     0   0       0       1            0          1
8   HideMyAss       0   0       0       1            1          1
9   freeproxylists  0   0       0       1            1          1

IP Extraction

The required data is extracted from each site's HTML using commands such as seq, parallel, awk, and sed.

SamAir

SamAir provides both HTTP and SOCKS proxies.

HTTP Proxy

The URLs are as follows:

https://premproxy.com/list/
https://premproxy.com/list/01.htm
...
https://premproxy.com/list/20.htm
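
The numbered page URLs above can be generated with `seq`. This is a minimal sketch, assuming the last page number is already known; the helper name `samair_page_urls` is hypothetical and no request is made.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SamAir list-page URLs for pages 1..N.
# %02g zero-pads to two digits, so 1 becomes 01 while 12 stays 12.
samair_page_urls() {
    local last_page="$1"
    seq -f '%02g' 1 "${last_page}" | sed 's@.*@https://premproxy.com/list/&.htm@'
}

samair_page_urls 3
```

Using a single `%02g` format avoids treating pages below and above 10 as separate cases.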

The output format is:

IP:Port|AnonymityLevel|Country|City|ISP

The code is as follows:

# Read the total page count from the text after "of" on the ptabletitle line
page_no=$(curl -fsL https://premproxy.com/list/01.htm | sed -r -n '/ptabletitle/{s@(<[^>]*>|\(|\))@@g;s@.*of (.*)@\1@g;p}')
# %02g zero-pads page numbers below 10 and leaves larger ones unchanged
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X curl -fsL https://premproxy.com/list/{}.htm 2> /dev/null | sed -r -n '/ptabletitle/,/pageinfo/{/tr class/{s@<\/?(tr)[[:space:]]*[^>]*>@@g;s@<td>@@g;s@[[:space:]]*<\/td>@|@g;s@>.*@@g;s@(<dfn title="|")@@g;p}}' | sed '/^[[:space:]]*$/d' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,$2,$4,$5,$6)}'

SOCKS

The URL is https://premproxy.com/socks-list/

The output format is:

IP:Port|AnonymityLevel|Country|City|ISP

The code is as follows:

page_no=$(curl -fsL https://premproxy.com/socks-list/01.htm | sed -r -n '/next/{s@<[^>]*>@@gp}' | awk '{print $(NF-1)}') # 5
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X curl -fsL https://premproxy.com/socks-list/{}.htm 2> /dev/null | sed -r -n '/^<tr><td>/{{s@<\/?(tr)[[:space:]]*[^>]*>@@g;s@<td>@@g;s@[[:space:]]*<\/td>@|@g;s@>.*@@g;s@(<dfn title="|")@@g;p}}' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,tolower($2),$4,$5,$6)}'

Nntime

Nntime provides HTTP proxies.

The URLs are as follows:

http://nntime.com/proxy-list-01.htm
...
http://nntime.com/proxy-list-18.htm

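Port numbers are written as document.write(":"+z+v), and the mapping between letters and digits differs on every page.

The output format is:

IP:Port|AnonymityLevel|Country|City|ISP

The code is as follows: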

page_no=$(curl -fsL http://nntime.com/proxy-list-01.htm | sed -r -n '/navigation/{{s@(<[^>]*>|\(|\)|next)@@g;p}}' | awk '{print $NF}')
# %02g zero-pads page numbers below 10 and leaves larger ones unchanged
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X curl -fsL http://nntime.com/proxy-list-{}.htm 2> /dev/null | sed -r -n '/<\/thead>/,/<\/table>/{s@<\/?(thead|table|dfn|script)[[:space:]]*[^>]*>@@g;s@<(td|tr)[[:space:]]*[^>]*>@@g;s@<input.*value=\"(.*)\" onclick.*\/>@\1@g;s@(\"|\:|\+)@@g;s@document.write\((.*)\)@|\1@g;p}' | sed -r -n '/^[[:blank:]]*$/d;s@(<\/td>)@|@g;s@\)@@g;p' | awk '{if($0!~/^<\/tr>/){ORS=" ";print $0}else{printf "\n"}}' | sed -r 's@[[:space:]]*(\|)[[:space:]]*@\1@g' | awk -F\| '{str_start_pos=(length($1)-length($3)+1);port=substr($1,str_start_pos); sub(/[[:space:]]*proxy/,"",$4); printf("%s:%s|%s|%s|%s\n",$2,port,$4,$7,$6)}' | sed -r 's@[[:space:]]*\(@|@g' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,$2,$4,$5,$3)}'

PROXYS™

PROXYS™ provides HTTP proxies of two kinds: transparente (transparent) and elite.

The URL is http://www.proxys.com.ar

The output format is:

IP:Port|AnonymityLevel|Country

The code is as follows:

curl -fsL http://www.proxys.com.ar/ | sed -r -n '/st-tables-page/{s@<\/?(ins|script|a|thead|tbody)[[:space:]]*[^>]*>?@@g;s@<\/tr>@\n@g;s@<(tr|td)>@@g;p}' | sed -r -n '/^[[:digit:]]+/{s@<\/td>@|@g;p}' | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($4),$3)}'

Proxz

Proxz provides HTTP proxies.

The URLs are as follows:

http://www.proxz.com/proxy_list_high_anonymous_0.html
http://www.proxz.com/proxy_list_high_anonymous_0_ext.html
...
http://www.proxz.com/proxy_list_high_anonymous_7_ext.html

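Note: when using curl or wget, a user-agent must be specified, otherwise the HTML page cannot be fetched. IP addresses in the page are stored as long percent-escaped strings that JavaScript's unescape decodes at render time; they must be decoded the same way to recover the IP.

The output format is:

IP:Port|AnonymityLevel|Country

The code is as follows: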

user_agent=${user_agent:-"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6.4) AppleWebKit/537.29.20 (KHTML, like Gecko) Chrome/60.0.3030.92 Safari/537.29.20"}
page_no=$(curl -fsL --user-agent "\"${user_agent}\"" http://www.proxz.com/proxy_list_high_anonymous_0_ext.html | sed -r -n '/^<\/td><\/tr><\/table>/{s@(<[^>]*>|::..)@@g;p}' | awk -F: '{print $NF}')
urldecode() { : "${*}" ; echo -e "${_}" | sed 's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e | sed -r -n 's@.*\("(.*)"\).*@\1@g;s@%2e@.@g;p'; }
seq 0 "${page_no}" | parallel -k -j 0 -X curl -fsL --user-agent "\"${user_agent}\"" http://www.proxz.com/proxy_list_high_anonymous_{}_ext.html 2> /dev/null | sed -r -n "/eval\(unescape/{s@<\/td><\/tr>@@;s@<\/tr>@\n@g;s@<noscript>Please enable javascript<\/noscript>@@g;s@<\/?(tr|a|script)[[:space:]]*[^>]*>@@g;s@(<td>|\(|\)|;)@@g;s@evalunescape@@g;s@'@@g;s@<\/td>@|@g;s@<td[[:space:]]*[^>]*>@@g;p}" | sed '/^$/d' | while IFS="|" read -r a b c d e f;do ip=$(urldecode "$a"); echo "$ip:$b|${c,,}|$d"; done

AliveProxy

AliveProxy provides HTTP (anonymous, high-anonymous) and SOCKS5 proxies.

The URLs are as follows:

<!-- Free Proxy List: High anonymity Proxies. -->
http://www.aliveproxy.com/high-anonymity-proxy-list/
<!-- Free Proxy List: Anonymous Proxies. -->
http://www.aliveproxy.com/anonymous-proxy-list/
<!-- Free Socks 5 Proxy Lists. -->
http://aliveproxy.com/socks5-list/

The output format is (the HTTP commands additionally append |AnonymityLevel):

IP:Port

HTTP Proxy

The code is as follows:

# High Anonymous Proxies
curl -fsL http://www.aliveproxy.com/high-anonymity-proxy-list/ | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}' | awk '{printf("%s|%s\n",$1,"high-anonymous")}'
# Anonymous Proxies
curl -fsL http://www.aliveproxy.com/anonymous-proxy-list/ | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}' | awk '{printf("%s|%s\n",$1,"anonymous")}'

SOCKS5

The code is as follows:

# SOCKS5 proxies (these IPs are mostly unusable)
curl -fsL http://aliveproxy.com/socks5-list/ | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}'

ProxyNova

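ProxyNova provides HTTP proxies of two kinds: transparent and elite.

The URL is https://www.proxynova.com/proxy-server-list/

Note: the IP address is obfuscated as document.write('2331.16'.substr(2) + '0.4.90'). Concatenate the strings inside the single quotes and drop the first two characters to obtain the target IP (here 31.160.4.90).

The output format is:

IP:Port|AnonymityLevel|Country|City

The code is as follows: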

curl -fsL https://www.proxynova.com/proxy-server-list/ | sed -r -n '/<center>/,/<\/center>/d;/<tbody>/,/<\/tbody>/{s@<\/?(tbody|images|script|a|time|img|div|ins)[[:space:]]*[^>]*>@@g;s@<(td|span)[[:space:]]*[^>]*>@@g;s@^[[:blank:]]*@@g;s@<tr>@@g;p}' | sed -r '/^$/d' | awk '{if($0!~/<\/tr>/){ORS=" ";print $0}else{printf "\n"}}' | sed -r -n "s@<\/span>@@g;s@(document.write|substr\(2\)|\(|\)|'|;|[[:space:]]*\+[[:space:]]*)@@g;s@(<\/td>)@|@g;s@\.{1,}@\.@g;s@^23@@g;p" | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($7),$6)}' | sed -r -n 's@-@|@g;s@[[:space:]]+(|)[[:space:]]+@\1@g;s@: @:@g;p' | sed -r "/^[^[:digit:]]/d;s@(|)[[:space:]]*@\1@g"

Daily Proxy

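Daily Proxy provides HTTP proxies of two kinds: transparent and high-anonymous.

The URL is http://www.dailyproxylists.com/index.php/proxy-lists

Note: the site hides the key part of its markup inside document.write(unescape(...)). The escaped string must be decoded back into HTML tags before the data can be extracted.

The output format is:

IP:Port|AnonymityLevel|Country

The code is as follows: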

curl -fsL http://www.dailyproxylists.com/index.php/proxy-lists | sed -r -n '/document.write/{s@<[^>]*>@@g;s@(document.write|unescape|\(|\)|\")@@g;s@^[[:space:]]*@@g;p}' | sed -r -n 's@^[[:blank:]]*@@g;s@[[:blank:]]$@@g;p' | sed 's@\\@\\\\@g;s@\(%\)\([0-9a-fA-F][0-9a-fA-F]\)@\\x\2@g' | printf $(cat -) | sed -r -n 's@<\/?tr>@\n@g;s@<(td)[[:space:]]*[^>]*>@@g;p' | sed -r -n '/^[^[:digit:]]+/d;/^$/d;s@<[^>]*>@|@g;p' | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($4),$3)}'

HideMyAss

For how IPs are extracted from HideMyAss, see my blog post "Extract Free IP:PORT Proxy Lists From HIDEMYASS Via SED & AWK".

The URL is http://proxylist.hidemyass.com/

freeproxylists.net

For freeproxylists, fetching the HTML with curl has not been implemented yet.

Shell Script

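Everything above has been implemented as a shell script. It uses the parallel command to run operations concurrently, which cuts the running time by more than 90%.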

#!/usr/bin/env bash
set -u #Treat undefined variables as errors
set -o pipefail #A pipeline returns the status of the last command that failed
# IFS=$'\n\t' #used in loop, Internal Field Separator
#Target: Extract Proxy IP From Proxy Site On GNU/Linux
######### 0-1. Signal Setting #########
# trap '' HUP #overlook SIGHUP when internet interrupted or terminal shell closed
# trap '' INT #overlook SIGINT when enter Ctrl+C, QUIT is triggered by Ctrl+\
trap funcTrapINTQUIT INT QUIT
funcTrapINTQUIT(){
rm -rf /tmp/temp*.txt
printf "Detect $(tput setaf 1)%s$(tput sgr0) or $(tput setaf 1)%s$(tput sgr0), begin to exit shell\n" "CTRL+C" "CTRL+\\"
exit
}
######### 0-2. Variables Setting #########
# term_cols=$(tput cols) # term_lines=$(tput lines)
readonly c_bold="$(tput bold)"
readonly c_normal="$(tput sgr0)" # c_normal='\e[0m'
# black 0, red 1, green 2, yellow 3, blue 4, magenta 5, cyan 6, gray 7
readonly c_red="${c_bold}$(tput setaf 1)" # c_red='\e[31;1m'
readonly c_blue="$(tput setaf 4)" # c_blue='\e[34m'
list_proxy_sites=${list_proxy_sites:-0}
proxy_site_specify=${proxy_site_specify:-}
include_country=${include_country:-}
exclude_country=${exclude_country:-}
protocol_type=${protocol_type:-}
anonymity_type=${anonymity_type:-}
proxy_server=${proxy_server:-}
use_proxy=${use_proxy:-0}
user_agent=${user_agent:-'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6.4) AppleWebKit/537.29.20 (KHTML, like Gecko) Chrome/60.0.3030.92 Safari/537.29.20'}
real_country=${real_country:-}
######### 1-1 Initialization Preparation #########
funcHelpInfo(){
cat <<EOF
${c_blue}Usage:
script [options] ...
script | sudo bash -s -- [options] ...
Extracting Proxy IP (HTTP/SOCKS) From Proxy Sites On GNU/Linux!
[available option]
-h --help, show help info
-l --list all supported proxy sites
-s site --specify proxy site No. listed in '-l'
-t protocol --protocol type (http|socks4|socks5), default is 'socks5'
-a anonymity --anonymity level for http (low|medium|high), default is 'high'
-p [protocol:]ip:port --proxy host (http|https|socks4|socks5), default protocol is http
${c_normal}
EOF
# -i country --just include specified country
# -e country --exclude specified country
}
funcExitStatement(){
local str="$*"
[[ -n "$str" ]] && printf "%s\n" "$str" && exit
}
funcCommandExistCheck(){
# $? -- 0 is find, 1 is not find
local name="$1"
if [[ -n "$name" ]]; then
executing_path=$(which "$name" 2> /dev/null || command -v "$name" 2> /dev/null)
[[ -n "${executing_path}" ]] && return 0 || return 1
else
return 1
fi
}
funcInitializationCheck(){
# 1 - Check root or sudo privilege
# [[ "$UID" -ne 0 ]] && funcExitStatement "${c_red}Sorry${c_normal}, this script requires superuser privileges (eg. root, su)."
# 2 - OS support check
[[ -f /etc/os-release || -f /etc/SuSE-release || -f /etc/redhat-release || (-f /etc/debian_version && -f /etc/issue.net) ]] || funcExitStatement "${c_red}Sorry${c_normal}, this script doesn't support your system!"
# 3 - bash version check ${BASH_VERSINFO[@]} ${BASH_VERSION}
# bash --version | sed -r -n '1s@[^[:digit:]]*([[:digit:].]*).*@\1@p'
[[ "${BASH_VERSINFO[0]}" -lt 4 ]] && funcExitStatement "${c_red}Sorry${c_normal}, this script needs Bash version 4+, your current version is ${c_blue}${BASH_VERSION%%-*}${c_normal}."
if ! funcCommandExistCheck 'seq'; then
funcExitStatement "${c_red}Error${c_normal}, No ${c_blue}seq${c_normal} command found!"
fi
if ! funcCommandExistCheck 'parallel'; then
funcExitStatement "${c_red}Error${c_normal}, No ${c_blue}parallel${c_normal} command found!"
fi
}
funcInternetConnectionCheck(){
# CentOS: iproute Debian/OpenSUSE: iproute2
if funcCommandExistCheck 'ip'; then
gateway_ip=$(ip route | awk 'match($1,/^default/){print $3}')
elif funcCommandExistCheck 'netstat'; then
gateway_ip=$(netstat -rn | awk 'match($1,/^Destination/){getline;print $2;exit}')
else
funcExitStatement "${c_red}Error${c_normal}: No ${c_blue}ip${c_normal} or ${c_blue}netstat${c_normal} command found, please install it!"
fi
! ping -q -w 1 -c 1 "$gateway_ip" &> /dev/null && funcExitStatement "${c_red}Error${c_normal}: No Internet connection, please check it!" # Check Internet Connection
}
funcDownloadToolCheck(){
# Dots are escaped so that only a literal "." separates the octets
local proxy_pattern="^((http|https|socks4|socks5):)?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}$"
proxy_server=${proxy_server:-}
if [[ -n "${proxy_server}" ]]; then
if [[ "${proxy_server}" =~ $proxy_pattern ]]; then
use_proxy=1
local proxy_proto_pattern="^((http|https|socks4|socks5):)"
if [[ "${proxy_server}" =~ $proxy_proto_pattern ]]; then
local p_proto="${proxy_server%%:*}"
local p_host="${proxy_server#*:}"
else
local p_proto='http'
local p_host="${proxy_server}"
fi
else
funcExitStatement "${c_red}Error${c_normal}: please specify right proxy host addr like ${c_blue}[protocol:]ip:port${c_normal}!"
fi
fi
local retry_times=${retry_times:-5}
local retry_delay_time=${retry_delay_time:-1}
local connect_timeout_time=${connect_timeout_time:-2}
local referrer_page=${referrer_page:-'https://duckduckgo.com/?q=github'}
if funcCommandExistCheck 'curl'; then
download_tool_origin="curl -fsL --retry ${retry_times} --retry-delay ${retry_delay_time} --connect-timeout ${connect_timeout_time} --no-keepalive"
if [[ -n "${proxy_server}" ]]; then
# curl version > 7.21.7
case "${p_proto}" in
# https ) export HTTPS_PROXY="${p_host}" ;;
socks4 ) download_tool_proxy="${download_tool_origin} -x ${p_proto}a://${p_host}";;
socks5 ) download_tool_proxy="${download_tool_origin} -x ${p_proto}h://${p_host}";;
http|* ) download_tool_proxy="${download_tool_origin} -x ${p_host}";;
esac
fi
else
funcExitStatement "${c_red}Error${c_normal}: can't find command ${c_blue}curl${c_normal}!"
fi
if [[ "${use_proxy}" -eq 1 ]]; then
download_tool="${download_tool_proxy}"
else
download_tool="${download_tool_origin}"
fi
}
######### 1-2 Initialization Operation #########
# start_time=$(date +'%s') # processing start time
while getopts "hls:i:e:t:a:p:" option "$@"; do
case "$option" in
l ) list_proxy_sites=1 ;;
s ) proxy_site_specify="$OPTARG" ;;
i ) include_country="$OPTARG" ;;
e ) exclude_country="$OPTARG" ;;
t ) protocol_type="$OPTARG" ;;
a ) anonymity_type="$OPTARG" ;;
p ) proxy_server="$OPTARG" ;;
h|\? ) funcHelpInfo && exit ;;
esac
done
proxy_site_info=$(mktemp -t tempXXXXXX.txt)
cat > "${proxy_site_info}" <<EOF
No|Site|CN|socks4|socks5|transparent|anonymous|high-anonymous(elite)|Site
1|SamAir|1|1|1|1|1|1|https://premproxy.com
2|Nntime|1|0|0|1|1|1|http://nntime.com
3|PROXYS™|1|0|0|1|0|1|http://www.proxys.com.ar
4|Proxz|0|0|0|0|0|1|http://www.proxz.com
5|AliveProxy|0|0|1|0|1|1|http://www.aliveproxy.com
6|ProxyNova|0|0|0|1|0|1|https://www.proxynova.com
7|Daily Proxy|0|0|0|1|0|1|http://www.dailyproxylists.com
8|HideMyAss|0|0|0|1|1|1|http://proxylist.hidemyass.com
# 9|freeproxylists.net|0|0|0|1|1|1|http://freeproxylists.net/ fetching via curl not implemented yet
EOF
######### 2-1. List Proxy Sites #########
funcListProxySites(){
awk -F\| 'BEGIN{printf("%-3s %-12s %10s\n","No","Site","URL")}match($1,/^[[:digit:]]/){printf("%-3s %-12s %-20s\n",$1,$2,$NF)}' "${proxy_site_info}"
# awk -F\| 'BEGIN{printf("%-3s %-12s %4s %8s %8s %8s %8s %6s\n","No","Site","CN","socks4","socks5","transparent","anonymous","elite")}match($1,/^[[:digit:]]/){printf("%-3s %-12s %4s %6s %6s %8s %12s %8s\n",$1,$2,$3,$4,$5,$6,$7,$8)}' "${proxy_site_info}"
exit
}
######### 3 Extract Proxy IP From HTML Page #########
proxy_ip_extracted=$(mktemp -t tempXXXXXX.txt)
######### 3-1. SamAir (https://premproxy.com) #########
# HTTP: transparent, anonymous, high-anonymous
funcProxySite_1(){
# IP:Port|AnonymityLevel|Country|City|ISP
local page_url='https://premproxy.com/list/'
page_no=$($download_tool "${page_url}" | sed -r -n '/ptabletitle/{s@(<[^>]*>|\(|\))@@g;s@.*of (.*)@\1@g;p}')
# %02g zero-pads page numbers below 10 and leaves larger ones unchanged
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X $download_tool "${page_url}"{}.htm 2> /dev/null | sed -r -n '/ptabletitle/,/pageinfo/{/tr class/{s@<\/?(tr)[[:space:]]*[^>]*>@@g;s@<td>@@g;s@[[:space:]]*<\/td>@|@g;s@>.*@@g;s@(<dfn title="|")@@g;p}}' | sed '/^[[:space:]]*$/d' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,$2,$4,$5,$6)}' >> "${proxy_ip_extracted}"
}
# SOCKS: socks4, socks5
funcProxySite_1_socks(){
# IP:Port|AnonymityLevel|Country|City|ISP
local page_url='https://premproxy.com/socks-list/'
page_no=$($download_tool "${page_url}" | sed -r -n '/next/{s@<[^>]*>@@gp}' | awk '{print $(NF-1)}')
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X $download_tool "${page_url}"{}.htm 2> /dev/null | sed -r -n '/^<tr><td>/{{s@<\/?(tr)[[:space:]]*[^>]*>@@g;s@<td>@@g;s@[[:space:]]*<\/td>@|@g;s@>.*@@g;s@(<dfn title="|")@@g;p}}' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,tolower($2),$4,$5,$6)}' >> "${proxy_ip_extracted}"
}
######### 3-2. Nntime (http://nntime.com) #########
# HTTP: transparent, anonymous, high-anonymous
funcProxySite_2(){
# IP:Port|AnonymityLevel|Country|City|ISP
local page_url='http://nntime.com/'
page_no=$($download_tool "${page_url}" | sed -r -n '/navigation/{{s@(<[^>]*>|\(|\)|next)@@g;p}}' | awk '{print $NF}')
# %02g zero-pads page numbers below 10 and leaves larger ones unchanged
seq -f %02g 1 "${page_no}" | parallel -k -j 0 -X $download_tool "${page_url}"proxy-list-{}.htm 2> /dev/null | sed -r -n '/<\/thead>/,/<\/table>/{s@<\/?(thead|table|dfn|script)[[:space:]]*[^>]*>@@g;s@<(td|tr)[[:space:]]*[^>]*>@@g;s@<input.*value=\"(.*)\" onclick.*\/>@\1@g;s@(\"|\:|\+)@@g;s@document.write\((.*)\)@|\1@g;p}' | sed -r -n '/^[[:blank:]]*$/d;s@(<\/td>)@|@g;s@\)@@g;p' | awk '{if($0!~/^<\/tr>/){ORS=" ";print $0}else{printf "\n"}}' | sed -r 's@[[:space:]]*(\|)[[:space:]]*@\1@g' | awk -F\| '{str_start_pos=(length($1)-length($3)+1);port=substr($1,str_start_pos); sub(/[[:space:]]*proxy/,"",$4); printf("%s:%s|%s|%s|%s\n",$2,port,$4,$7,$6)}' | sed -r 's@[[:space:]]*\(@|@g' | awk -F\| '{printf("%s|%s|%s|%s|%s\n",$1,$2,$4,$5,$3)}' >> "${proxy_ip_extracted}"
}
######### 3-3. PROXYS™ (http://www.proxys.com.ar) #########
# HTTP: transparente, elite
funcProxySite_3(){
# IP:Port|AnonymityLevel|Country
local page_url='http://www.proxys.com.ar/'
$download_tool "${page_url}" | sed -r -n '/st-tables-page/{s@<\/?(ins|script|a|thead|tbody)[[:space:]]*[^>]*>?@@g;s@<\/tr>@\n@g;s@<(tr|td)>@@g;p}' | sed -r -n '/^[[:digit:]]+/{s@<\/td>@|@g;p}' | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($4),$3)}' >> "${proxy_ip_extracted}"
}
######### 3-4. Proxz (http://www.proxz.com) #########
funcProxySite_4(){
# IP:Port|AnonymityLevel|Country
local page_url='http://www.proxz.com/'
page_no=$($download_tool --user-agent "\"${user_agent}\"" "${page_url}" "${page_url}"proxy_list_high_anonymous_0_ext.html | sed -r -n '/^<\/td><\/tr><\/table>/{s@(<[^>]*>|::..)@@g;p}' | awk -F: '{print $NF}')
urldecode() { : "${*}" ; echo -e "${_}" | sed 's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e | sed -r -n 's@.*\("(.*)"\).*@\1@g;s@%2e@.@g;p'; }
seq 0 "${page_no}" | parallel -k -j 0 -X $download_tool --user-agent "\"${user_agent}\"" "${page_url}"proxy_list_high_anonymous_{}_ext.html 2> /dev/null | sed -r -n "/eval\(unescape/{s@<\/td><\/tr>@@;s@<\/tr>@\n@g;s@<noscript>Please enable javascript<\/noscript>@@g;s@<\/?(tr|a|script)[[:space:]]*[^>]*>@@g;s@(<td>|\(|\)|;)@@g;s@evalunescape@@g;s@'@@g;s@<\/td>@|@g;s@<td[[:space:]]*[^>]*>@@g;p}" | sed '/^$/d' | while IFS="|" read -r a b c d e f;do ip=$(urldecode "$a"); echo "$ip:$b|${c,,}|$d" >> "${proxy_ip_extracted}"; done
}
######### 3-5. AliveProxy (http://www.aliveproxy.com) #########
# HTTP: anonymous, high-anonymous
funcProxySite_5(){
local page_url='http://aliveproxy.com/'
# High Anonymous Proxies
$download_tool "${page_url}high-anonymity-proxy-list/" | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}' | awk '{printf("%s|%s\n",$1,"high-anonymous")}' >> "${proxy_ip_extracted}"
# Anonymous Proxies
$download_tool "${page_url}anonymous-proxy-list/" | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}' | awk '{printf("%s|%s\n",$1,"anonymous")}' >> "${proxy_ip_extracted}"
}
funcProxySite_5_socks(){
# SOCKS5 proxies (mostly unusable)
local page_url='http://aliveproxy.com/socks5-list/'
$download_tool "${page_url}" | sed -r -n '/^<TABLE class/{s@(.*)@\L\1@g;s@<\/tr>@\n@g;s@<\/?(tr|td|table|center|a|br)[[:space:]]*[^>]*>@@g;p}' | sed -r -n '/^[[:digit:].]+/{s@(.*)--.*@\1@g;p}' >> "${proxy_ip_extracted}"
}
######### 3-6. ProxyNova (https://www.proxynova.com) #########
# HTTP: transparent, elite
funcProxySite_6(){
# IP:Port|AnonymityLevel|Country|City
local page_url='https://www.proxynova.com/proxy-server-list/'
$download_tool "${page_url}"| sed -r -n '/<center>/,/<\/center>/d;/<tbody>/,/<\/tbody>/{s@<\/?(tbody|images|script|a|time|img|div|ins)[[:space:]]*[^>]*>@@g;s@<(td|span)[[:space:]]*[^>]*>@@g;s@^[[:blank:]]*@@g;s@<tr>@@g;p}' | sed -r '/^$/d' | awk '{if($0!~/<\/tr>/){ORS=" ";print $0}else{printf "\n"}}' | sed -r -n "s@<\/span>@@g;s@(document.write|substr\(2\)|\(|\)|'|;|[[:space:]]*\+[[:space:]]*)@@g;s@(<\/td>)@|@g;s@\.{1,}@\.@g;s@^23@@g;p" | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($7),$6)}' | sed -r -n 's@-@|@g;s@[[:space:]]+(|)[[:space:]]+@\1@g;s@: @:@g;p' | sed -r "/^[^[:digit:]]/d;s@(|)[[:space:]]*@\1@g" >> "${proxy_ip_extracted}"
}
######### 3-7. Daily Proxy (http://www.dailyproxylists.com) #########
# HTTP: transparent, high-anonymous
funcProxySite_7(){
# IP:Port|AnonymityLevel|Country
local page_url='http://www.dailyproxylists.com/index.php/proxy-lists'
$download_tool "${page_url}" | sed -r -n '/document.write/{s@<[^>]*>@@g;s@(document.write|unescape|\(|\)|\")@@g;s@^[[:space:]]*@@g;p}' | sed -r -n 's@^[[:blank:]]*@@g;s@[[:blank:]]$@@g;p' | sed 's@\\@\\\\@g;s@\(%\)\([0-9a-fA-F][0-9a-fA-F]\)@\\x\2@g' | printf $(cat -) | sed -r -n 's@<\/?tr>@\n@g;s@<(td)[[:space:]]*[^>]*>@@g;p' | sed -r -n '/^[^[:digit:]]+/d;/^$/d;s@<[^>]*>@|@g;p' | awk -F\| '{printf("%s:%s|%s|%s\n",$1,$2,tolower($4),$3)}' >> "${proxy_ip_extracted}"
}
######### 3-8. HideMyAss (http://proxylist.hidemyass.com) #########
# HTTP: high-anonymous
funcProxySite_8(){
local page_url='http://proxylist.hidemyass.com/search-1303043#listable'
local start=1
proxy_list_html=$(mktemp -t tempXXXXX.txt)
tempfile_perip=$(mktemp -t tempXXXXX.txt)
$download_tool "${page_url}" | sed -r -n '/table section/,/table section end/{/^$/d;/indicator/d;s@^[[:space:]]*@@;/^<[\/]?(td|div|span)>$/d;p}' | sed -r -n '/leftborder/,/<\/tr>/{p}' > "${proxy_list_html}"
sed -n '/<\/tr>/=' "${proxy_list_html}" | while read -r line;do
# echo "start $start, end $line";
sed -r -n ''"${start},${line}"'p' "${proxy_list_html}" > "${tempfile_perip}"
country=$(sed -r -n '/img src=/{n;s@<[^>]*>@@p}' "${tempfile_perip}" | sed -r -n 's@^[[:space:]]*@@g;s@[[:space:]]*$@@g;p')
port=$(sed -r -n '/class=\"country\"/{x;s@<[^>]*>@@p};h' "${tempfile_perip}" | sed -r -n 's@^[[:space:]]*@@g;s@[[:space:]]*$@@g;p')
class_none_list=$(sed -r -n '/^\..*none/s@.(.*)\{.*@\1@p' "${tempfile_perip}" | awk 'BEGIN{RS=EOF}{gsub(/\n/,"|");print}')
ip=$(sed -r -n '/^<\/style/{s@<\/[^>]*>@\[email protected];p}' "${tempfile_perip}" | sed -r 's@\.@@g' | sed -r -n 's@^([[:digit:]]+)(<.*)$@\1\n\2@;p' | sed -r -n '/^$/d;/(none|\.)/!p' | sed -r -n '/('"${class_none_list}"')/d;s@<[^>]*>@@;/^$/d;p' | awk 'BEGIN{RS=EOF}{gsub(/\n/," ");print}' | awk '{printf("%s.%s.%s.%s",$1,$2,$3,$4)}')
echo "$ip:$port|high-anonymous|$country" >> "${proxy_ip_extracted}"
start=$((line+1));
done
[[ -f "${proxy_list_html:-}" ]] && rm -f "${proxy_list_html}"
[[ -f "${tempfile_perip:-}" ]] && rm -f "${tempfile_perip}"
}
######### 3. Executing Process #########
funcSpecificProxyIPTesting(){
line="$1"
ip_addr=$(echo "${line}" | awk -F\| '{print $1}')
anonymity=$(echo "${line}" | awk -F\| '{print $2}')
country=$(echo "${line}" | awk -F\| '{print $3}')
city=$(echo "${line}" | awk -F\| '{print $4}')
isp=$(echo "${line}" | awk -F\| '{print $5}')
local curl_speed_time=${curl_speed_time:-1} #time second -y, --speed-time <time>
local curl_speed_limit=${curl_speed_limit:-3} # speed byte -Y, --speed-limit <speed>
local curl_max_time=${curl_max_time:-1.5} #time second -m, --max-time <seconds>
case "${anonymity,,}" in
socks5 ) protocol_str="socks5h://" ;;
socks4 ) protocol_str="socks4a://" ;;
* ) protocol_str="" ;;
esac
if [[ -n $(curl -fsL --speed-time "${curl_speed_time}" --speed-limit "${curl_speed_limit}" --max-time "${curl_max_time}" -x "${protocol_str}${ip_addr}" ipinfo.io/country 2> /dev/null) ]]; then
echo "$ip_addr|$country|$city|$isp"
fi
}
funcProxyIPExtraction(){
echo "IP testing process will cost some time, just be patient!"
case "${protocol_type,,}" in
h|http|https ) protocol_type='http' ;;
socks4 ) protocol_type='socks4' ;;
socks5 ) protocol_type='socks5' ;;
* ) protocol_type='socks5' ;;
esac
real_country=$($download_tool_origin ipinfo.io/country)
if [[ "${real_country}" == 'CN' ]]; then
if [[ "${protocol_type}" =~ ^socks ]]; then
funcProxySite_1_socks
else
if [[ "${proxy_site_specify}" -gt 0 && "${proxy_site_specify}" -le 3 ]]; then
funcProxySite_"${proxy_site_specify}"
else
funcProxySite_1
funcProxySite_2
funcProxySite_3
fi
fi
else
if [[ "${protocol_type}" =~ ^socks ]]; then
funcProxySite_1_socks
funcProxySite_5_socks
else
if [[ "${proxy_site_specify}" -gt 0 && "${proxy_site_specify}" -le 9 ]]; then
funcProxySite_"${proxy_site_specify}"
else
funcProxySite_1
fi
fi
fi
if [[ -f "${proxy_ip_extracted}" ]]; then
if [[ "${protocol_type}" =~ ^s ]]; then
filter_str="${protocol_type}"
else
case "${anonymity_type,,}" in
l|low ) filter_str='transparent|transparente' ;;
m|medium ) filter_str='anonymous' ;;
h|high|* ) filter_str='high|high-anonymous|elite' ;;
esac
fi
printf "Protocol type is ${c_red}%s${c_normal}.\n\n" "${protocol_type^^}"
export -f funcSpecificProxyIPTesting
awk -F\| 'match($2,/^('"${filter_str}"')/){print $0}' "${proxy_ip_extracted}" | parallel -k -j 0 funcSpecificProxyIPTesting 2> /dev/null
fi
}
######### 4. Executing Process #########
funcInitializationCheck
funcInternetConnectionCheck
funcDownloadToolCheck
[[ "${list_proxy_sites}" -eq 1 ]] && funcListProxySites
funcProxyIPExtraction
######### 5. EXIT Signal Processing #########
# trap "commands" EXIT # execute command when exit from shell
funcTrapEXIT(){
[[ -f "${proxy_site_info:-}" ]] && rm -f "${proxy_site_info}"
[[ -f "${proxy_ip_extracted:-}" ]] && rm -f "${proxy_ip_extracted}"
}
trap funcTrapEXIT EXIT
# Script End

Script Execution

A demonstration run:

# time bash /tmp/proxy.sh -t h
IP testing process will cost some time, just be patient!
Protocol type is HTTP.
5.249.148.50:3128|Italy|Arezzo|Aruba S.p.A.
181.65.239.105:3128|Peru||Telefonica del Peru
36.67.85.242:53281|Indonesia||PT Telkom Indonesia
207.154.225.175:8080|Germany|Frankfurt|Digital Ocean
103.248.233.156:80|India|Ahmedabad|Ishan Infotech Limited
186.42.253.246:8080|Ecuador|Quito|Corporacion Nacional De Telecomunicaciones Cnt S.A
185.82.212.95:8080|Czech Republic||Whois protection s.r.o.
real 0m10.559s
user 0m2.352s
sys 0m0.872s
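
The script's country options (`-i`, `-e`) are commented out in its help text, so any filtering by country has to happen on the output. The following is a minimal sketch of how the `IP:Port|Country|City|ISP` lines shown above might be filtered with awk; the helper name `filter_by_country` is hypothetical and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only records whose Country field (2nd field)
# equals the given name, compared case-insensitively.
filter_by_country() {
    local country="$1"
    awk -F'|' -v c="${country}" 'tolower($2) == tolower(c)'
}

# Sample records copied from the demonstration output above
printf '%s\n' \
    '5.249.148.50:3128|Italy|Arezzo|Aruba S.p.A.' \
    '181.65.239.105:3128|Peru||Telefonica del Peru' \
    '207.154.225.175:8080|Germany|Frankfurt|Digital Ocean' |
    filter_by_country germany
```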

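Problems Encountered

In practice, some proxy sites turned out to use tricks that make it harder for crawlers to scrape their data. The details are as follows.

document.write(":"+z+v)

Nntime writes port numbers as document.write(":"+z+v), and the mapping between letters and digits differs on every page. The HTML looks like this: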

<!-- 51.254.214.236:24631 -->
<tr class="odd"><td><input type="checkbox" name="c15" id="row15" value="52576131.254.214.236754230924631" onclick="choice()" /></td><td>51.254.214.236<script type="text/javascript">document.write(":"+i+x+l+y+j)</script></td>

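The number of digits in the port equals the number of letters passed to document.write; in this example ixlyj has 5 letters. Taking the 5 rightmost characters of the input's value attribute gives 24631, which is the port number.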
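
The rule can be sketched on its own as follows. The function name `nntime_port` is hypothetical; the full pipeline in this article performs the same step with awk's substr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recover the port from an Nntime table row.
#   $1: value attribute of the <input> tag
#   $2: letters passed to document.write, e.g. "ixlyj"
nntime_port() {
    local value="$1" letters="$2"
    local len="${#value}" n="${#letters}"
    # The port has as many digits as there are letters
    (( n > 0 && n <= len )) || return 1
    # Take the n rightmost characters of the value
    printf '%s\n' "${value:len-n}"
}

nntime_port '52576131.254.214.236754230924631' 'ixlyj'
```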

javascript unescape

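Proxz requires a user-agent to be specified before it returns the HTML page. IP addresses in the page are stored as long percent-escaped strings that JavaScript's unescape decodes at render time; they must be decoded the same way to recover the IP.

The conversion back to an IP works as follows: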

# 111.8.22.204
urldecode() { : "${*}" ; echo -e "${_}" | sed 's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e | sed -r -n 's@.*\("(.*)"\).*@\1@g;s@%2e@.@g;p'; }
text="%73%65%6c%66%2e%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%6c%6e%28%22%31%31%31%2e%38%2e%32%32%2e%32%30%34%22%29%3b"
echo $(urldecode "$text")

document.write substr

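ProxyNova obfuscates IP addresses as document.write('2331.16'.substr(2) + '0.4.90'). To obtain the target IP, first concatenate the strings inside the single quotes, then drop the first two characters (here the result is 31.160.4.90).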
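
A minimal sketch of these two steps, applied to a single snippet rather than a whole page. The function name `proxynova_ip` is hypothetical, and the sketch assumes the snippet has exactly the shape shown above, with substr(2) applied to the first fragment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the IP from a ProxyNova-style snippet.
proxynova_ip() {
    local snippet="$1" joined
    # Collect every single-quoted fragment, then join them: 2331.16 + 0.4.90
    joined=$(printf '%s\n' "${snippet}" | grep -o "'[^']*'" | tr -d "'\n")
    # Emulate substr(2): drop the first two characters
    printf '%s\n' "${joined:2}"
}

proxynova_ip "document.write('2331.16'.substr(2) + '0.4.90')"
```

Dropping the first two characters by position, rather than matching a fixed prefix, keeps the helper independent of which padding digits the page happens to use.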

document.write unescape

Daily Proxy hides the key part of its markup inside document.write(unescape(...)). The escaped string must be decoded back into HTML tags before the data can be extracted.
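
The decoding step can be sketched in pure Bash as follows. The function name `js_unescape` and the sample string are illustrative assumptions; the sketch handles only %XX sequences, not the %uXXXX form that JavaScript's unescape also accepts, and it assumes every % in the input starts such a sequence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo %XX escaping, as JavaScript's unescape would.
js_unescape() {
    local encoded="$1"
    # Protect existing backslashes so printf %b leaves them alone
    encoded="${encoded//\\/\\\\}"
    # Turn every %XX into \xXX and let printf %b expand it
    printf '%b\n' "${encoded//%/\\x}"
}

js_unescape '%3Ctd%3E1.2.3.4%3C%2Ftd%3E'
```

Once decoded, the result is ordinary HTML and can be passed to the same kind of sed and awk pipeline used for the other sites.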

Change Logs

  • 2017.06.14 21:43 Wed Asia/Shanghai
    • First draft completed
  • 2017.06.23 11:28 Fri Asia/Shanghai
    • Added the shell script