|
|
马上注册,结交更多好友,享用更多功能,让你轻松玩转社区。
您需要 登录 才可以下载或查看,没有账号?立即注册
x
1. 引言
Perl作为一种强大的文本处理语言,其正则表达式和输出处理能力一直是其核心优势。本文将深入探讨Perl中的匹配输出技术,从基础的正则表达式概念到高级的输出技巧,帮助读者全面掌握Perl在文本处理方面的强大功能,提升编程效率。
2. 基础正则表达式
2.1 正则表达式简介
正则表达式(Regular Expression,简称regex)是一种用于描述字符串模式的工具。在Perl中,正则表达式被广泛应用于文本搜索、替换和验证等操作。
2.2 基本语法
Perl中的正则表达式通常包含在斜杠//之间,例如:
- if ($string =~ /pattern/) {
- print "匹配成功\n";
- }
复制代码
2.3 元字符
正则表达式中的元字符具有特殊含义,以下是一些常用的元字符:
• .:匹配任意单个字符(除换行符外)
• *:匹配前面的元素零次或多次
• +:匹配前面的元素一次或多次
• ?:匹配前面的元素零次或一次
• []:字符类,匹配方括号内的任意字符
• ^:匹配字符串的开始位置
• $:匹配字符串的结束位置
• |:选择,匹配两个或多个模式之一
• ():分组,将多个元素组合为一个单元
示例:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 匹配包含"fox"的行
- if ($text =~ /fox/) {
- print "文本中包含'fox'\n";
- }
- # 匹配以"The"开头的字符串
- if ($text =~ /^The/) {
- print "文本以'The'开头\n";
- }
- # 匹配以"dog"结尾的字符串
- if ($text =~ /dog$/) {
- print "文本以'dog'结尾\n";
- }
- # 匹配"fox"或"dog"
- if ($text =~ /(fox|dog)/) {
- print "文本中包含'fox'或'dog'\n";
- }
复制代码
2.4 字符类
字符类允许你指定一组字符中的任意一个:
- my $text = "The temperature is 25 degrees Celsius";
- # 匹配任意数字
- if ($text =~ /[0-9]/) {
- print "文本中包含数字\n";
- }
- # 匹配任意大写字母
- if ($text =~ /[A-Z]/) {
- print "文本中包含大写字母\n";
- }
- # 匹配任意小写字母
- if ($text =~ /[a-z]/) {
- print "文本中包含小写字母\n";
- }
- # 使用预定义字符类
- if ($text =~ /\d/) { # \d 等同于 [0-9]
- print "文本中包含数字\n";
- }
- if ($text =~ /\w/) { # \w 等同于 [a-zA-Z0-9_]
- print "文本中包含单词字符\n";
- }
- if ($text =~ /\s/) { # \s 匹配空白字符
- print "文本中包含空白字符\n";
- }
复制代码
2.5 量词
量词用于指定匹配的次数:
- my $text = "The ISBN number is 978-3-16-148410-0";
- # 匹配一个或多个数字
- if ($text =~ /\d+/) {
- print "文本中包含一个或多个连续数字\n";
- }
- # 匹配3个连续的数字
- if ($text =~ /\d{3}/) {
- print "文本中包含3个连续数字\n";
- }
- # 匹配2到4个连续的数字
- if ($text =~ /\d{2,4}/) {
- print "文本中包含2到4个连续数字\n";
- }
- # 匹配零个或多个数字
- if ($text =~ /\d*/) {
- print "匹配零个或多个数字\n";
- }
- # 匹配零个或一个数字
- if ($text =~ /\d?/) {
- print "匹配零个或一个数字\n";
- }
复制代码
3. 匹配操作符使用
3.1 匹配操作符m//
Perl中的匹配操作符m//用于检查字符串是否匹配某个模式。m可以省略,直接使用//。
- my $text = "Perl is a powerful programming language";
- # 使用m//进行匹配
- if ($text =~ m/Perl/) {
- print "匹配成功\n";
- }
- # 省略m的写法
- if ($text =~ /language/) {
- print "匹配成功\n";
- }
复制代码
3.2 修饰符
正则表达式可以带有各种修饰符,改变其行为:
• i:不区分大小写的匹配
• g:全局匹配,找到所有匹配项
• m:多行模式,使^和$匹配每行的开始和结束
• s:单行模式,使.匹配包括换行符在内的所有字符
• x:忽略模式中的空白和注释
• o:只编译一次模式
• e:将替换部分作为表达式执行
示例:
- my $text = "Perl is a powerful programming language\nPerl is also great for text processing";
- # 不区分大小写匹配
- if ($text =~ /PERL/i) {
- print "不区分大小写匹配成功\n";
- }
- # 全局匹配
- my @matches = $text =~ /Perl/g;
- print "找到 " . scalar(@matches) . " 个'Perl'\n";
- # 多行模式
- my $multiline_text = "Line 1\nLine 2\nLine 3";
- if ($multiline_text =~ /^Line/m) {
- print "多行模式匹配成功\n";
- }
- # 单行模式
- my $text_with_newlines = "Start\nMiddle\nEnd";
- if ($text_with_newlines =~ /Start.*End/s) {
- print "单行模式匹配成功\n";
- }
- # 使用x修饰符添加注释
- my $pattern = qr/
- \b # 单词边界
- \w+ # 一个或多个单词字符
- \s # 空白字符
- \w+ # 一个或多个单词字符
- \b # 单词边界
- /x;
- if ($text =~ $pattern) {
- print "带注释的模式匹配成功\n";
- }
复制代码
3.3 替换操作符s///
替换操作符s///用于在字符串中查找并替换匹配的文本:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 简单替换
- $text =~ s/dog/cat/;
- print "$text\n"; # 输出: The quick brown fox jumps over the lazy cat
- # 不区分大小写的替换
- $text =~ s/the/THE/gi;
- print "$text\n"; # 输出: THE quick brown fox jumps over THE lazy cat
- # 使用捕获组
- $text =~ s/(quick) (brown)/$2 $1/;
- print "$text\n"; # 输出: THE brown quick fox jumps over THE lazy cat
- # 使用表达式进行替换
- $text =~ s/(\d+)/$1 * 2/eg;
- print "$text\n"; # 如果文本中有数字,将被乘以2
复制代码
3.4 转换操作符tr///
转换操作符tr///(或y///)用于逐个字符的转换:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 转换为大写
- $text =~ tr/a-z/A-Z/;
- print "$text\n"; # 输出: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
- # 删除所有元音
- $text =~ tr/aeiou//d;
- print "$text\n"; # 输出: TH QCK BRWN FX MPS VR TH LZY DG
- # 计算特定字符的出现次数
- my $count = $text =~ tr/t/t/;
- print "字母't'出现了 $count 次\n";
复制代码
4. 输出格式化
4.1 print 函数
print是Perl中最基本的输出函数:
- my $name = "Alice";
- my $age = 30;
- # 简单输出
- print "Hello, World!\n";
- # 输出变量
- print "Name: $name, Age: $age\n";
- # 输出多个值
- print "Name: ", $name, ", Age: ", $age, "\n";
复制代码
4.2 printf 函数
printf函数提供格式化输出功能:
- my $name = "Bob";
- my $age = 25;
- my $height = 1.75;
- my $money = 1234.5678;
- # 格式化输出
- printf "Name: %s, Age: %d\n", $name, $age;
- # 指定宽度和对齐
- printf "Name: %-10s Age: %3d\n", $name, $age;
- # 浮点数格式化
- printf "Height: %.2f meters\n", $height;
- # 货币格式化
- printf "Money: \$%.2f\n", $money;
- # 科学计数法
- printf "Scientific: %e\n", $money;
复制代码
4.3 sprintf 函数
sprintf函数与printf类似,但它返回格式化后的字符串而不是直接输出:
- my $name = "Charlie";
- my $age = 35;
- # 格式化字符串
- my $formatted = sprintf "Name: %s, Age: %d", $name, $age;
- print "$formatted\n";
- # 在复杂表达式中使用
- my $message = "The user " . $name . " is " . sprintf("%d", $age) . " years old.";
- print "$message\n";
复制代码
4.4 here文档
here文档(heredoc)是一种方便的多行字符串输出方式:
- my $name = "David";
- my $email = "david@example.com";
- # 使用here文档
- print <<EOF;
- User Information:
- Name: $name
- Email: $email
- Registration Date: 2023-05-15
- EOF
- # 带引号的here文档(不进行变量插值)
- print <<'END';
- This is a literal text.
- Variables like $name or $email won't be interpolated.
- END
- # 使用缩进的here文档
- print <<~INDENT;
- This is an indented here document.
- The leading tabs will be removed.
- Name: $name
- Email: $email
- INDENT
复制代码
4.5 格式化输出(format)
Perl的format功能提供了一种类似报表的输出格式:
- # 定义格式
- format STDOUT =
- User Details:
- ================
- Name: @<<<<<<<<<<<<<<<<<<<<<<
- $name
- Age: @##
- $age
- Email: @<<<<<<<<<<<<<<<<<<<<<<
- $email
- ================
- .
- my $name = "Eve";
- my $age = 28;
- my $email = "eve@example.com";
- # 写入格式
- write;
复制代码
5. 错误处理
5.1 正则表达式匹配失败处理
当正则表达式匹配失败时,Perl不会产生错误,但我们可以通过检查返回值来处理:
- my $text = "This is a sample text";
- my $pattern = "missing";
- # 检查匹配是否成功
- if ($text =~ /$pattern/) {
- print "匹配成功: $&\n";
- } else {
- print "匹配失败: 未找到'$pattern'\n";
- }
复制代码
5.2 使用eval处理错误
eval块可以捕获运行时错误:
- my $text = "Sample text";
- my $pattern = "[invalid regex";
- # 使用eval捕获正则表达式错误
- eval {
- if ($text =~ /$pattern/) {
- print "匹配成功\n";
- } else {
- print "匹配失败\n";
- }
- };
- if ($@) {
- print "正则表达式错误: $@\n";
- }
复制代码
5.3 警告处理
使用warnings模块可以获取详细的警告信息:
- use warnings;
- my $text = "Sample text";
- my $pattern = "[invalid regex";
- # 这将产生警告
- if ($text =~ /$pattern/) {
- print "匹配成功\n";
- } else {
- print "匹配失败\n";
- }
复制代码
5.4 自定义错误消息
可以自定义错误处理函数:
- sub regex_match {
- my ($text, $pattern) = @_;
-
- eval {
- if ($text =~ /$pattern/) {
- print "匹配成功: $&\n";
- return 1;
- } else {
- print "匹配失败: 未找到模式\n";
- return 0;
- }
- };
-
- if ($@) {
- print "错误: 无效的正则表达式 '$pattern' - $@\n";
- return -1;
- }
- }
- my $text = "This is a sample text";
- regex_match($text, "sample"); # 成功匹配
- regex_match($text, "missing"); # 匹配失败
- regex_match($text, "[invalid"); # 正则表达式错误
复制代码
6. 性能优化
6.1 预编译正则表达式
对于重复使用的正则表达式,预编译可以提高性能:
- use Benchmark qw(timethese);
- my $text = "The quick brown fox jumps over the lazy dog" x 1000;
- my $pattern = "fox";
- # 普通匹配
- sub normal_match {
- my $count = 0;
- for my $i (1..1000) {
- $count++ if $text =~ /$pattern/;
- }
- return $count;
- }
- # 预编译正则表达式
- sub precompiled_match {
- my $count = 0;
- my $regex = qr/$pattern/;
- for my $i (1..1000) {
- $count++ if $text =~ /$regex/;
- }
- return $count;
- }
- # 性能比较
- timethese(100, {
- 'Normal' => \&normal_match,
- 'Precompiled' => \&precompiled_match,
- });
复制代码
6.2 避免贪婪匹配
贪婪匹配(.*)可能导致性能问题,特别是在长文本中:
- my $html = '<div>Content 1</div><div>Content 2</div><div>Content 3</div>';
- # 贪婪匹配(可能效率较低)
- if ($html =~ /<div>.*<\/div>/s) {
- print "贪婪匹配: $&\n";
- }
- # 非贪婪匹配(通常更高效)
- if ($html =~ /<div>.*?<\/div>/s) {
- print "非贪婪匹配: $&\n";
- }
- # 更精确的匹配(最高效)
- if ($html =~ /<div>[^<]*<\/div>/) {
- print "精确匹配: $&\n";
- }
复制代码
6.3 使用字符类而非选择
字符类比选择操作符|更高效:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 使用选择(效率较低)
- if ($text =~ /(cat|dog|fox)/) {
- print "找到动物: $1\n";
- }
- # 使用字符类(效率较高)
- if ($text =~ /([cdf]ox|[acd]og|[cat]at)/) {
- print "找到动物: $1\n";
- }
复制代码
6.4 避免不必要的捕获
如果不需要捕获组,使用非捕获组可以提高性能:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 使用捕获组
- if ($text =~ /(quick) (brown) (fox)/) {
- print "捕获: $1, $2, $3\n";
- }
- # 使用非捕获组
- if ($text =~ /(?:quick) (?:brown) (fox)/) {
- print "只捕获了: $1\n";
- }
复制代码
6.5 使用适当的锚点
使用适当的锚点(^,$,\b)可以提高匹配效率:
- my $text = "The quick brown fox jumps over the lazy dog";
- # 低效的匹配
- if ($text =~ /.*fox.*/) {
- print "找到'fox'\n";
- }
- # 高效的匹配
- if ($text =~ /fox/) {
- print "找到'fox'\n";
- }
- # 如果知道位置,使用锚点
- if ($text =~ /^The/) {
- print "以'The'开头\n";
- }
- if ($text =~ /dog$/) {
- print "以'dog'结尾\n";
- }
- if ($text =~ /\bfox\b/) {
- print "找到完整的单词'fox'\n";
- }
复制代码
7. 实战应用
7.1 日志文件分析
分析Web服务器日志文件,提取特定信息:
- #!/usr/bin/perl
- use strict;
- use warnings;
- # 假设日志格式为:IP - - [date] "request" status size
- my $log_file = 'access.log';
- my %status_codes;
- my %top_ips;
- my $total_requests = 0;
- open my $fh, '<', $log_file or die "无法打开日志文件: $!";
- while (my $line = <$fh>) {
- # 解析日志行
- if ($line =~ /^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]+)" (\d+) (\d+)$/) {
- my ($ip, $date, $request, $status, $size) = ($1, $2, $3, $4, $5);
-
- # 统计状态码
- $status_codes{$status}++;
-
- # 统计IP访问次数
- $top_ips{$ip}++;
-
- $total_requests++;
- }
- }
- close $fh;
- # 输出统计结果
- print "总请求数: $total_requests\n\n";
- print "状态码统计:\n";
- foreach my $status (sort keys %status_codes) {
- printf " %s: %d (%.2f%%)\n", $status, $status_codes{$status},
- ($status_codes{$status} / $total_requests) * 100;
- }
- print "\nTop 10 IP地址:\n";
- my $count = 0;
- foreach my $ip (sort { $top_ips{$b} <=> $top_ips{$a} } keys %top_ips) {
- last if $count++ >= 10;
- printf " %s: %d 次 (%.2f%%)\n", $ip, $top_ips{$ip},
- ($top_ips{$ip} / $total_requests) * 100;
- }
复制代码
7.2 数据提取和转换
从HTML文档中提取数据并转换为CSV格式:
- #!/usr/bin/perl
- use strict;
- use warnings;
- # 假设HTML格式为:
- # <div class="product">
- # <h2>Product Name</h2>
- # <span class="price">$19.99</span>
- # <p class="description">Product description</p>
- # </div>
- my $html = do { local $/; <DATA> }; # 从__DATA__读取HTML内容
- my @products;
- # 提取产品信息
- while ($html =~ /<div class="product">\s*<h2>([^<]+)<\/h2>\s*<span class="price">([^<]+)<\/span>\s*<p class="description">([^<]+)<\/p>\s*<\/div>/sg) {
- my ($name, $price, $description) = ($1, $2, $3);
-
- # 清理数据
- $name =~ s/^\s+|\s+$//g;
- $price =~ s/^\s+|\s+$//g;
- $description =~ s/^\s+|\s+$//g;
-
- push @products, {
- name => $name,
- price => $price,
- description => $description
- };
- }
- # 输出CSV
- print "Name,Price,Description\n";
- foreach my $product (@products) {
- # 转义CSV中的特殊字符
- my $name_csv = $product->{name};
- $name_csv =~ s/"/""/g;
- $name_csv = ""$name_csv"" if $name_csv =~ /,/;
- my $price_csv = $product->{price};
- $price_csv =~ s/"/""/g;
- $price_csv = ""$price_csv"" if $price_csv =~ /,/;
- my $desc_csv = $product->{description};
- $desc_csv =~ s/"/""/g;
- $desc_csv = ""$desc_csv"" if $desc_csv =~ /,/;
- print "$name_csv,$price_csv,$desc_csv\n";
- }
- __DATA__
- <div class="product">
- <h2>Product 1</h2>
- <span class="price">$19.99</span>
- <p class="description">This is the first product</p>
- </div>
- <div class="product">
- <h2>Product 2</h2>
- <span class="price">$29.99</span>
- <p class="description">This is the second product</p>
- </div>
- <div class="product">
- <h2>Product 3</h2>
- <span class="price">$39.99</span>
- <p class="description">This is the third product</p>
- </div>
复制代码
7.3 文本批量处理
批量处理文本文件,进行搜索替换:
- #!/usr/bin/perl
- use strict;
- use warnings;
- use File::Find;
- use File::Copy;
- # 配置参数
- my $directory = './documents'; # 处理的目录
- my $file_pattern = '\.txt$'; # 处理的文件模式
- my $backup = 1; # 是否创建备份
- my $backup_ext = '.bak'; # 备份文件扩展名
- # 替换规则
- my @replacements = (
- {
- pattern => qr/\b(company name)\b/i,
- replacement => 'ACME Corporation'
- },
- {
- pattern => qr/\b(contact email)\b/i,
- replacement => 'info@acme.com'
- },
- {
- pattern => qr/\b(phone number)\b/i,
- replacement => '+1 (555) 123-4567'
- }
- );
- # 处理文件
- find(\&process_file, $directory);
- sub process_file {
- my $file = $File::Find::name;
-
- # 只处理匹配模式的文件
- return unless $file =~ /$file_pattern/;
-
- # 跳过备份文件
- return if $backup && $file =~ /$backup_ext$/;
-
- print "处理文件: $file\n";
-
- # 读取文件内容
- open my $fh, '<', $file or die "无法打开文件 $file: $!";
- my $content = do { local $/; <$fh> };
- close $fh;
-
- # 记录原始内容用于比较
- my $original_content = $content;
-
- # 应用所有替换规则
- foreach my $rule (@replacements) {
- $content =~ s/$rule->{pattern}/$rule->{replacement}/gi;
- }
-
- # 如果内容有变化,则写入文件
- if ($content ne $original_content) {
- # 创建备份
- if ($backup) {
- my $backup_file = $file . $backup_ext;
- copy($file, $backup_file) or die "无法创建备份文件: $!";
- print " 创建备份: $backup_file\n";
- }
-
- # 写入新内容
- open $fh, '>', $file or die "无法写入文件 $file: $!";
- print $fh $content;
- close $fh;
-
- print " 文件已更新\n";
- } else {
- print " 无需更改\n";
- }
- }
复制代码
7.4 数据验证
使用正则表达式验证各种格式的数据:
- #!/usr/bin/perl
- use strict;
- use warnings;
- sub validate_email {
- my ($email) = @_;
-
- # 基本电子邮件验证
- return $email =~ /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
- }
- sub validate_url {
- my ($url) = @_;
-
- # 基本URL验证
- return $url =~ /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
- }
- sub validate_phone {
- my ($phone) = @_;
-
- # 基本电话号码验证(支持多种格式)
- return $phone =~ /^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/;
- }
- sub validate_date {
- my ($date) = @_;
-
- # 日期验证(YYYY-MM-DD格式)
- if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
- my ($year, $month, $day) = ($1, $2, $3);
-
- # 基本范围检查
- return 0 if $year < 1900 || $year > 2100;
- return 0 if $month < 1 || $month > 12;
- return 0 if $day < 1 || $day > 31;
-
- # 更详细的日期验证(考虑闰年等)
- my @days_in_month = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
-
- # 闰年检查
- if ($month == 2) {
- if (($year % 400 == 0) || ($year % 100 != 0 && $year % 4 == 0)) {
- return $day <= 29;
- } else {
- return $day <= 28;
- }
- }
-
- return $day <= $days_in_month[$month-1];
- }
-
- return 0;
- }
- # 测试验证函数
- my @emails = (
- 'user@example.com',
- 'invalid.email',
- 'another@example.org'
- );
- my @urls = (
- 'https://www.example.com',
- 'http://example.org/path/to/page',
- 'ftp://example.com',
- 'not_a_url'
- );
- my @phones = (
- '+1 (555) 123-4567',
- '555-123-4567',
- '555.123.4567',
- '123-456-7890',
- 'invalid-phone'
- );
- my @dates = (
- '2023-05-15',
- '2023-02-29', # 无效(2023年不是闰年)
- '2020-02-29', # 有效(2020年是闰年)
- '2023-13-01', # 无效(月份无效)
- '2023-05-32', # 无效(日期无效)
- 'invalid-date'
- );
- print "电子邮件验证:\n";
- foreach my $email (@emails) {
- printf " %-25s: %s\n", $email, validate_email($email) ? "有效" : "无效";
- }
- print "\nURL验证:\n";
- foreach my $url (@urls) {
- printf " %-35s: %s\n", $url, validate_url($url) ? "有效" : "无效";
- }
- print "\n电话号码验证:\n";
- foreach my $phone (@phones) {
- printf " %-20s: %s\n", $phone, validate_phone($phone) ? "有效" : "无效";
- }
- print "\n日期验证:\n";
- foreach my $date (@dates) {
- printf " %-15s: %s\n", $date, validate_date($date) ? "有效" : "无效";
- }
复制代码
7.5 高级文本处理
使用正则表达式进行复杂的文本处理和分析:
- #!/usr/bin/perl
- use strict;
- use warnings;
- # 示例:从非结构化文本中提取结构化信息
- my $text = <<'TEXT';
- Customer Order Information
- Order ID: ORD-2023-001
- Date: May 15, 2023
- Customer: John Smith
- Email: john.smith@example.com
- Items:
- 1. Product A - $19.99 (Qty: 2)
- 2. Product B - $29.99 (Qty: 1)
- 3. Product C - $9.99 (Qty: 3)
- Shipping Address:
- 123 Main Street
- Anytown, ST 12345
- United States
- Payment Method: Credit Card (****-****-****-1234)
- Total Amount: $79.95
- Order ID: ORD-2023-002
- Date: May 16, 2023
- Customer: Jane Doe
- Email: jane.doe@example.com
- Items:
- 1. Product X - $49.99 (Qty: 1)
- 2. Product Y - $15.99 (Qty: 2)
- Shipping Address:
- 456 Oak Avenue
- Sometown, ST 67890
- United States
- Payment Method: PayPal
- Total Amount: $81.97
- TEXT
- # 提取订单信息
- my @orders;
- while ($text =~ /
- Order\ ID:\ ([^\n]+)\n
- Date:\ ([^\n]+)\n
- Customer:\ ([^\n]+)\n
- Email:\ ([^\n]+)\n
- \n
- Items:\n
- ((?:\d+\..+\n)+)
- \n
- Shipping\ Address:\n
- ([^\n]+)\n
- ([^\n]+)\n
- ([^\n]+)\n
- \n
- Payment\ Method:\ ([^\n]+)\n
- Total\ Amount:\ ([^\n]+)
- /gx) {
-
- my ($order_id, $date, $customer, $email, $items_text, $addr1, $addr2, $addr3, $payment, $total) =
- ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
-
- # 提取订单项
- my @items;
- while ($items_text =~ /(\d+)\.\ ([^-]+)\ -\ \$(\d+\.\d+)\ \(Qty:\ (\d+)\)/g) {
- push @items, {
- number => $1,
- name => $2,
- price => $3,
- quantity => $4
- };
- }
-
- push @orders, {
- order_id => $order_id,
- date => $date,
- customer => $customer,
- email => $email,
- items => \@items,
- address => {
- line1 => $addr1,
- line2 => $addr2,
- line3 => $addr3
- },
- payment_method => $payment,
- total => $total
- };
- }
- # 输出提取的信息
- foreach my $order (@orders) {
- print "订单ID: " . $order->{order_id} . "\n";
- print "日期: " . $order->{date} . "\n";
- print "客户: " . $order->{customer} . "\n";
- print "邮箱: " . $order->{email} . "\n";
- print "订单项:\n";
-
- foreach my $item (@{$order->{items}}) {
- print " " . $item->{number} . ". " . $item->{name} .
- " - $" . $item->{price} . " (数量: " . $item->{quantity} . ")\n";
- }
-
- print "配送地址:\n";
- print " " . $order->{address}->{line1} . "\n";
- print " " . $order->{address}->{line2} . "\n";
- print " " . $order->{address}->{line3} . "\n";
-
- print "付款方式: " . $order->{payment_method} . "\n";
- print "总金额: " . $order->{total} . "\n";
- print "\n";
- }
- # 生成汇总报告
- my $total_revenue = 0;
- my %product_sales;
- my %customer_orders;
- foreach my $order (@orders) {
- # 计算总收入
- $order->{total} =~ /\$(\d+\.\d+)/;
- $total_revenue += $1;
-
- # 统计产品销售
- foreach my $item (@{$order->{items}}) {
- $product_sales{$item->{name}} ||= {quantity => 0, revenue => 0};
- $product_sales{$item->{name}}->{quantity} += $item->{quantity};
- $product_sales{$item->{name}}->{revenue} += $item->{price} * $item->{quantity};
- }
-
- # 统计客户订单
- $customer_orders{$order->{customer}}++;
- }
- # 输出汇总报告
- print "=== 销售汇总报告 ===\n\n";
- printf "总订单数: %d\n", scalar(@orders);
- printf "总收入: \$%.2f\n\n", $total_revenue;
- print "产品销售排行:\n";
- my $product_count = 0;
- foreach my $product (sort { $product_sales{$b}->{revenue} <=> $product_sales{$a}->{revenue} } keys %product_sales) {
- last if $product_count++ >= 5; # 只显示前5名
- printf " %-15s 数量: %d, 收入: \$%.2f\n",
- $product,
- $product_sales{$product}->{quantity},
- $product_sales{$product}->{revenue};
- }
- print "\n客户订单排行:\n";
- my $customer_count = 0;
- foreach my $customer (sort { $customer_orders{$b} <=> $customer_orders{$a} } keys %customer_orders) {
- last if $customer_count++ >= 5; # 只显示前5名
- printf " %-15s 订单数: %d\n", $customer, $customer_orders{$customer};
- }
复制代码
8. 结论
Perl的正则表达式和输出处理功能为文本处理提供了强大而灵活的工具。通过掌握基础正则表达式、匹配操作符、输出格式化、错误处理和性能优化等技巧,你可以解决各种复杂的文本处理问题,提高编程效率。
在实际应用中,Perl的正则表达式可以用于日志分析、数据提取、文本处理、数据验证等多种场景。合理使用正则表达式,结合适当的错误处理和性能优化技巧,可以构建高效、可靠的文本处理解决方案。
希望本文能够帮助你深入理解Perl的匹配输出功能,并在实际工作中灵活应用这些技术,解决各种文本处理难题。 |
|