您现在的位置是：首页 > 后端

当前栏目

python 提取主域名和子域名代码——先根据规则提取，如果有问题，则使用tldextract

Python 域名规则代码根据提取如果问题

2023-09-14 09:11:54 时间

import tldextract



def extract_domain(domain):
    suffix = {'.com','.la','.io', '.co', '.cn','.info', '.net', '.org','.me', '.mobi', '.us', '.biz', '.xxx', '.ca', '.co.jp', '.com.cn', '.net.cn', '.org.cn', '.mx','.tv', '.ws', '.ag', '.com.ag', '.net.ag', '.org.ag','.am','.asia', '.at', '.be', '.com.br', '.net.br', '.name', '.live', '.news', '.bz', '.tech', '.pub', '.wang', '.space', '.top', '.xin', '.social', '.date', '.site', '.red', '.studio', '.link', '.online', '.help', '.kr', '.club', '.com.bz', '.net.bz', '.cc', '.band', '.market', '.com.co', '.net.co', '.nom.co', '.lawyer', '.de', '.es', '.com.es', '.nom.es', '.org.es', '.eu', '.wiki', '.design', '.software', '.fm', '.fr', '.gs', '.in', '.co.in', '.firm.in', '.gen.in', '.ind.in', '.net.in', '.org.in', '.it', '.jobs', '.jp', '.ms', '.com.mx', '.nl','.nu','.co.nz','.net.nz', '.org.nz', '.se', '.tc', '.tk', '.tw', '.com.tw', '.idv.tw', '.org.tw', '.hk', '.co.uk', '.me.uk', '.org.uk', '.vg'}

    domain = domain.lower()
    names = domain.split(".")
    if len(names) >= 3:
        if ("."+".".join(names[-2:])) in suffix:
            return ".".join(names[-3:]), ".".join(names[:-3])
        elif ("."+names[-1]) in suffix:
            return ".".join(names[-2:]), ".".join(names[:-2])
    print "New domain suffix found. Use tld extract domain..."

    pos = domain.rfind("/")
    if pos >= 0: # maybe subdomain contains /, for dns tunnel tool
        ext = tldextract.extract(domain[pos+1:])
        subdomain = domain[:pos+1] + ext.subdomain
    else:
        ext = tldextract.extract(domain)
        subdomain = ext.subdomain
    if ext.suffix:
        mdomain = ext.domain + "." + ext.suffix
    else:
        mdomain = ext.domain
    return mdomain, subdomain

print extract_domain("baidu.com")  == ("baidu.com", "")
print extract_domain("www.baidu.com") == ("baidu.com", "www")
print extract_domain("www.xx.com.cn") == ("xx.com.cn", "www")
print extract_domain("www.xxx.gov.cn") == ("gov.cn", "www.xxx")
print extract_domain("abc.www.xxx.net.co") == ("xxx.net.co", "abc.www")
print extract_domain("abcwwwxxx.local") == ("local", "abcwwwxxx")
print extract_domain("abcwwwxxxlocal") == ("abcwwwxxxlocal", "")
print extract_domain("attack/www.baidu.com") == ("baidu.com", "attack/www")
print extract_domain("xx.attack/xxx.baidu.com") == ("baidu.com", "xx.attack/xxx")
print extract_domain("attack/xxx.baidu.com") == ("baidu.com", "attack/xxx")
print extract_domain("xxx.baidu.new_suffix") == ("new_suffix", "xxx.baidu")
print extract_domain("attack/xxx.baidu.new_suffix") == ("new_suffix", "attack/xxx.baidu")

猜你喜欢

框架画Button的入口
formail 发送HTML 邮件通过 SENDMAIL
【关于ChatGPT的30个问题】28、如何评价ChatGPT的安全性?/ By 禅与计算机程序设计艺术
【华为云技术分享】开发团队中的任务没人领取，你头疼吗?
How to monitor your mobile application network traffic in your own LAPTOP
[AWS] IAM -- create user and attach policy
Nginx的Gzip功能
Atitit.研发团队与公司绩效管理的原理概论的attilax总结
【收藏】hdfs参数配置详解
sbom-tool下载及使用详解
从2008高考人数想到的软件人才
使用 Excel cdata addin 连接 SAP ABAP 系统时遇到错误消息 Unable to connect to SAP system
基于ASP.NET WebAPI OWIN实现Self-Host项目实战
退出应用工具类

相关主题

python中命令行参数
python 文档

zl程序教程

当前栏目

python 提取主域名和子域名代码——先根据规则提取，如果有问题，则使用tldextract

相关文章