1
wangbenjun5 2021-07-18 19:33:45 +08:00
Write your scraper well enough and you'll be well fed in prison.
|
2
learningman 2021-07-18 20:50:50 +08:00 1
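"Where does the Authorization value in the pre-login request headers come from?"
It is passed as a parameter to XMLHttpRequest or fetch, and it is generated by JS. But if you can't work this much out, I wouldn't count on you being able to reverse-engineer it yourself... |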
3
playniuniu 2021-07-19 10:04:09 +08:00
|
4
vone 2021-07-19 10:10:12 +08:00
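The login authentication is done with JWT. The header and payload are really just Base64URL-encoded JSON; only the signature part is cryptographic.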
I picked an arbitrary request (while not logged in) made by account.t-mobile.com/signin/v2/ and took its header:
authorization: Bearer eyJraWQiOiI0NDY3MzUxNy04MTc4LTJjYTMtOWU3MC1mZTZiYjg4YjU2OTIiLCJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJTSURXZWIiLCJydCI6IntcInNlZ21lbnRhdGlvbklkXCI6XCJUSVRBTlwifSIsImRlYWxlckNvZGUiOiIiLCJpc3MiOiJodHRwczpcL1wvYXBpLnQtbW9iaWxlLmNvbVwvb2F1dGgyXC92NiIsIm1hc3RlckRlYWxlckNvZGUiOiIiLCJhdXRoVGltZSI6IjE2MjY2NjAyMDcwMTgiLCJzdG9yZUlkIjoiIiwidXNuIjoiOTFhZGJlZDEtYWRiYy1jYTdlLTkzZjQtMmQzMmZjMmIxM2VhIiwiYXVkIjoiU0lEV2ViIiwic2VuZGVySWQiOiIiLCJuYmYiOjE2MjY2NjAyMDcsInNjb3BlIjoiIiwiY25mIjoiLS0tLS1CRUdJTiBQVUJMSUMgS0VZLS0tLS1NSUlCSWpBTkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQXJhVENxSU55c2tldmRCMmlcL1wvV2ttSWhQTHNJcFRvdFN6Z2FJRm94ZFdocGFQQ0NnSkNcL1hsTk9tT0lPQU5ubVZxalpMY3pjSU8xOHlFM3N4UHBXWktOdEgyY0grS1FtaFgrV05NeVNTMWhlem81WWpRcnJka1JhK1hXeFN1ZXl2WXZmNlBTRmtUXC9sZlpESlhUY1hET3g4WlYrMWF0QVp6U1JFbTFVbGpCRVZuODg0T2tUUDh6SENlRFJ3UXFpQ09ZWnZFdkxoTnBRdXk5K0hmMG9Zc0FQcVNTTGdHdmtuXC9RYjVMMytocmlzOWxSQTh1SXlIU0Uxc2F0WU1FcjFWbUUyWExyMkpOOTVaalc2eU50Q0lVSE1aN2MxUHF6emwrcUMzbGVrbHpXRWh5WjBhbWc4SkE2VTlRZEhtdm5La1RWaVZkNlphYWgwOHJKM3VLTkw3Z2xRSURBUUFCLS0tLS1FTkQgUFVCTElDIEtFWS0tLS0tIiwiYXBwbGljYXRpb25JZCI6IiIsImV4cCI6MTYyNjY2MzgwNywiaWF0IjoxNjI2NjYwMjA3LCJjaGFubmVsSWQiOiIiLCJqdGkiOiI4MzlmOWIyYy1lYzRhLWJkODctODU1Mi1lNjk1NDhiYTBlNTkifQ.VL3ycdnrwGyNdN_p201muTg7SUBVNUs6xZdR3B7oEAjask-pWtA2h_9M91I_u1hHkHRoriV1wd1UUPTdJ7DGcWGQtJ2dhb3s_IwpJu_ppY8nnEHhAz8O7fhGOeBpXxlI_W6FEulCznh-c5El3DcHBDccIYiU2xgPcGBQDOv7zU5e3YslOvOFCzLLLgNnRSQDRirf_nKZPOdn79TtL5OzgPiY85OP5YJcJYqAD2QUtOekML59s8Y--wbrTQudS_9uqMOSDFttaF6FzH8hOw0q7-rq-MlrumIQQgPAQxHHFdjy6o3fpo6lDKLSyGYhI90G_Zi4JyeFwpx0p4OHTuG7DQ
A JWT has three parts (Header.Payload.Signature), separated by "."; decode each part with Base64URL.
Header: {"kid":"44673517-8178-2ca3-9e70-fe6bb88b5692","typ":"JWT","alg":"RS256"}
Payload: {"sub":"SIDWeb","rt":"{\"segmentationId\":\"TITAN\"}","dealerCode":"","iss":"https:\/\/api.t-mobile.com\/oauth2\/v6","masterDealerCode":"","authTime":"1626660207018","storeId":"","usn":"91adbed1-adbc-ca7e-93f4-2d32fc2b13ea","aud":"SIDWeb","senderId":"","nbf":1626660207,"scope":"","cnf":"-----BEGIN PUBLIC KEY-----MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAraTCqINyskevdB2i\/\/WkmIhPLsIpTotSzgaIFoxdWhpaPCCgJC\/XlNOmOIOANnmVqjZLczcIO18yE3sxPpWZKNtH2cH+KQmhX+WNMySS1hezo5YjQrrdkRa+XWxSueyvYvf6PSFkT\/lfZDJXTcXDOx8ZV+1atAZzSREm1UljBEVn884OkTP8zHCeDRwQqiCOYZvEvLhNpQuy9+Hf0oYsAPqSSLgGvkn\/Qb5L3+hris9lRA8uIyHSE1satYMEr1VmE2XLr2JN95ZjW6yNtCIUHMZ7c1Pqzzl+qC3leklzWEhyZ0amg8JA6U9QdHmvnKkTViVd6Zaah08rJ3uKNL7glQIDAQAB-----END PUBLIC KEY-----","applicationId":"","exp":1626663807,"iat":1626660207,"channelId":"","jti":"839f9b2c-ec4a-bd87-8552-e69548ba0e59"}
Signature: unreadable binary (per the header's alg, these are RS256 signature bytes, so they do not decode to text)
Decoder: https://base64.guru/standards/base64url/decode |
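The decoding steps described in #4 can also be done locally instead of through a website. The following is only a minimal sketch in Python using the standard library; the function names and the tiny sample token are made up for illustration (it is not the real token from the request), and nothing here verifies the signature.

```python
import base64
import json


def b64url_decode(segment):
    """Decode one Base64URL segment, restoring the '=' padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    """Encode bytes as unpadded Base64URL (used only to build the sample token)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt(token):
    """Split a JWT into its three parts and decode them WITHOUT verifying it."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    signature = b64url_decode(signature_b64)  # raw bytes, not readable text
    return header, payload, signature


# Hypothetical sample token assembled here so the sketch is self-contained.
sample_token = ".".join([
    b64url_encode(json.dumps({"typ": "JWT", "alg": "RS256"}).encode()),
    b64url_encode(json.dumps({"sub": "SIDWeb", "aud": "SIDWeb"}).encode()),
    b64url_encode(b"\x00\x01not-a-real-signature"),
])

header, payload, signature = decode_jwt(sample_token)
print(header)
print(payload)
print(len(signature), "signature bytes")
```

Passing the value after "Bearer " from the authorization header to decode_jwt should give the same Header and Payload shown above. Being able to read a token this way does not mean you can forge one: the signature can only be produced with the server's private key.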
5
ch2 2021-07-19 11:26:44 +08:00
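The simplest approach is to open the login page with selenium, then grab the cookies and crawl with those cookies.
When they expire, just open the browser again; don't try to simulate the login yourself. That way you get the advantages of both approaches. |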
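A rough sketch of the cookie hand-off suggested in #5, in Python. It assumes the selenium package and a Chrome driver are installed; the helper names (login_and_get_cookies, cookie_header, build_request) are hypothetical, and the manual "press Enter" step is just one way to wait for the login to finish. Whether the site accepts the cookies outside the browser is not guaranteed.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies):
    """Create a plain HTTP request that carries the browser's cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


def login_and_get_cookies(login_url):
    """Open a real browser, wait for a manual login, and return its cookies."""
    # Third-party dependency, imported lazily so the helpers above work without it.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        input("Log in in the browser window, then press Enter here...")
        return driver.get_cookies()  # list of dicts with 'name', 'value', ...
    finally:
        driver.quit()
```

When a crawl request starts failing because the session expired, call login_and_get_cookies again and rebuild the requests with the fresh cookies.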
6
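rv54ntjwfm3ug8 OP @learningman #2: All of this site's JS is minified and obfuscated; I can't find fetch anywhere.
@ch2 #5: I tried selenium this morning, but this site has a lot of checks for automated programs. I tried several approaches and none of them got around the detection.
@vone #4: The initSession request headers on this site also contain a lot of things like x-mag09e7sc-. How are those generated? Are they related to JWT too? |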
7
learningman 2021-07-19 12:18:34 +08:00 via Android
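@theklf4 If you can't find it, the problem is your skill level. After all, when JS wants to send a request it can only use these two APIs; the site can't swap in anything else.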
|
8
triplelift 2021-07-19 12:45:06 +08:00
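The least-effort way to scrape is still puppeteer. Static tools can't cope with content loaded by JS, and the CDP implementations in other languages aren't powerful enough.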
|
9
ch2 2021-07-19 13:37:13 +08:00
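@theklf4 #6 Don't let selenium launch the webdriver directly. Start Chrome from the command line with remote debugging enabled (remote-debug), then attach selenium to it and control it that way. This gets around the automation detection, because it is essentially no different from opening the browser by hand.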
|
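The attach-to-a-running-Chrome approach from #9 might look roughly like this in Python. This is a sketch under assumptions: port 9222, the Chrome binary path, and the profile directory are placeholders, the helper names are hypothetical, and the selenium package must be installed. It shows the mechanism only and makes no claim that a particular site's detection is bypassed.

```python
import subprocess

DEBUG_PORT = 9222  # assumed to be a free local port


def chrome_command(chrome_path, profile_dir, port=DEBUG_PORT):
    """Command line that starts a normal Chrome with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",  # separate profile for this session
    ]


def launch_chrome(chrome_path, profile_dir, port=DEBUG_PORT):
    """Start Chrome as an ordinary process, not through webdriver."""
    return subprocess.Popen(chrome_command(chrome_path, profile_dir, port))


def attach_selenium(port=DEBUG_PORT):
    """Attach selenium to the already running Chrome instead of launching one."""
    # Third-party dependency, imported lazily.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    return webdriver.Chrome(options=options)
```

Typical use would be launch_chrome(...) first, then attach_selenium() once the browser window is up.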
10
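rv54ntjwfm3ug8 OP @ch2 #9: Same result. I also found that even in a Chrome launched by selenium, logging in by hand works fine with no special handling. But even in a Chrome I opened manually, once selenium has performed any action on the page (even just clicking the input box), logging in by typing the password by hand fails as well.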
|