Split a text file into sections using special delimiter lines - Python (Python file delimiters)

25-01-25 25


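In this article we walk you through splitting a text file into sections using special delimiter lines in Python, including how to handle different delimiters. We also cover java – splitting a file into four parts, JavaSE_split with special delimiters, a PHP script that splits a large text file into multiple files, and Python 2.7 – using a dictionary to find and replace text from a text file into a new text file, to help you better understand the topic.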

Table of contents:

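Split a text file into sections using special delimiter lines - Python (Python file delimiters)
java – Split a file into four parts
JavaSE_split with special delimiters
PHP script to split a large text file into multiple files
Python 2.7 - Using a dictionary to find and replace text from a text file into a new text file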

Split a text file into sections using special delimiter lines - Python (Python file delimiters)

I have an input file like this:

This is a text block start
This is the end

And this is another
with more than one line
and another line.

The task is to read the file section by section, where sections are separated by a special line (in this case an empty line), e.g. [out]:

[['This is a text block start', 'This is the end'], ['And this is another', 'with more than one line', 'and another line.']]

I have been getting the desired output with this:

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

However, if the special line is one that starts with #, for example:

# Some comments, maybe the title of the following section
This is a text block start
This is the end
# Some other comments and also the title
And this is another
with more than one line
and another line.

I have to do this instead:

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

If I let per_section() take a delimiter parameter, I could try this:

def per_section(it, delimiter='\n'):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n') and delimiter == '\n':
            section.append(line)
        elif delimiter == '#' and line[0] != '#':
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

But is there a way to avoid hardcoding every possible delimiter?

Answer 1


How about passing a predicate?

def per_section(it, is_delimiter=lambda x: x.isspace()):
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret  # OR  ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())  # OR  ret.append(line)
    if ret:
        yield ret

Usage:

with open('/path/to/file.txt') as f:
    sections = list(per_section(f))  # default delimiter: blank lines
with open('/path/to/file.txt') as f:
    sections = list(per_section(f, lambda line: line.startswith('#')))  # comment lines as delimiter
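The same predicate idea can also be built on `itertools.groupby`. This is a swapped-in alternative, not part of the answer above. `groupby` collects consecutive lines that give the same `is_delimiter()` result, so runs of delimiter lines are dropped in one step and never produce empty sections. The function name `split_sections` is hypothetical, and the sketch assumes lines keep their trailing `'\n'`, as they do when iterating over a file.

```python
from itertools import groupby
from io import StringIO


def split_sections(lines, is_delimiter=lambda line: not line.strip()):
    """Group consecutive non-delimiter lines into lists, dropping delimiter lines."""
    sections = []
    # groupby yields runs of lines that share the same is_delimiter() result
    for is_delim, group in groupby(lines, key=is_delimiter):
        if not is_delim:
            sections.append([line.rstrip('\n') for line in group])
    return sections


blank_separated = StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
print(split_sections(blank_separated))

hash_separated = StringIO(
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
print(split_sections(hash_separated, lambda line: line.startswith('#')))
```

Both calls should return the list-of-lists shape requested in the question. Any iterable of lines works, including an open file object.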

java – Split a file into four parts

I want to split a file (say, an mp3) into four parts. I tried the code below, but only File1.mp3 works properly; I can't play the others. What am I doing wrong?

try     {

        FileInputStream in=new FileInputStream(f);
        long i=f.length();
        long j=i/4;

        FileOutputStream f0=new FileOutputStream("File1.mp3");
        FileOutputStream f1=new FileOutputStream("File2.mp3");
        FileOutputStream f2=new FileOutputStream("File3.mp3");
        FileOutputStream f3=new FileOutputStream("File4.mp3");

        for(long k=0;k<j;k++){
            f0.write(in.read());
        }
        f0.close();
        for(long l=0;l<j;l++){
            f1.write(in.read());
        }
        f1.close();
        for(long m=0;m<j;m++){
            f2.write(in.read());  
        }
        f2.close();
        // the last part also takes the i % 4 leftover bytes that i/4 would otherwise drop
        for(long n=0;n<i-3*j;n++){
            f3.write(in.read());
        }
        f3.close();

        in.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }

Solution

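You can't split a structured file like this: an MP3 file has a header at the beginning of the file that describes what is inside the rest of the file. When you split the file, only the first part gets the header.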
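A byte-level split is still useful for transport or storage, as long as the parts are joined back together before use. Below is a minimal Python sketch of that idea. Each part on its own is not a playable MP3, but the concatenated result is byte-for-byte identical to the original. It also gives the `size % parts` leftover bytes to the last part; the original Java loops drop those bytes. The function names `split_file` and `join_files` and the `.partN` naming scheme are hypothetical.

```python
import os


def split_file(path, parts=4):
    """Split a file into `parts` byte ranges; the last part also gets the leftover bytes."""
    size = os.path.getsize(path)
    chunk = size // parts
    names = []
    with open(path, 'rb') as src:
        for k in range(parts):
            # the last part reads everything that remains (chunk + size % parts bytes)
            data = src.read(chunk) if k < parts - 1 else src.read()
            name = f"{path}.part{k + 1}"
            with open(name, 'wb') as dst:
                dst.write(data)
            names.append(name)
    return names


def join_files(names, out_path):
    """Concatenate the parts back together in order."""
    with open(out_path, 'wb') as dst:
        for name in names:
            with open(name, 'rb') as src:
                dst.write(src.read())
```

Reading in large blocks rather than one byte at a time, as the Java loops do, is also much faster for big files.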

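As for cutting unstructured files such as plain text, your code should work much better, as long as you don't mind a sentence being split in the middle of a word.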
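If words must not be cut, a text file can be split at line boundaries instead of byte offsets. The sketch below is one possible approach, not taken from the answer. It divides a list of lines into `parts` consecutive groups whose sizes differ by at most one line, so each part is balanced by line count rather than by bytes. The name `split_lines_evenly` is hypothetical.

```python
def split_lines_evenly(lines, parts=4):
    """Divide a list of lines into `parts` consecutive groups whose sizes differ by at most one."""
    base, extra = divmod(len(lines), parts)
    groups, start = [], 0
    for k in range(parts):
        # the first `extra` groups take one additional line
        end = start + base + (1 if k < extra else 0)
        groups.append(lines[start:end])
        start = end
    return groups
```

For example, ten lines split four ways gives groups of 3, 3, 2 and 2 lines. The lines could come from `open(path).readlines()` and each group could be written out with `writelines`.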

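JavaSE_split with special delimiters

We often use split to split strings.

However, String.split takes a regular expression, so delimiters that have a special meaning in regex need special handling. Below are those delimiters and how to handle each one.

For a dot, use string.split("[.]") (or string.split("\\.")).

For a vertical bar (pipe), use string.split("\\|").

For an asterisk, use string.split("\\*").

For a backslash, use string.split("\\\\").

For square brackets, string.split("\\[\\]") splits on the literal two-character sequence "[]"; to split on either bracket, use string.split("[\\[\\]]").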
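The same pitfall exists in Python, but only with the regex-based `re.split`. The plain `str.split` treats its separator literally. In the sketch below, `re.escape` plays roughly the role that Java's `Pattern.quote` does: it turns a literal delimiter into a safe pattern. The sample strings are illustrative only.

```python
import re

samples = {
    '.': 'a.b.c',
    '|': 'a|b|c',
    '*': 'a*b*c',
    '\\': 'a\\b\\c',
}

for sep, text in samples.items():
    # str.split treats the separator literally, so no escaping is needed
    literal = text.split(sep)
    # re.split treats it as a regex, so re.escape is needed (similar to Pattern.quote in Java)
    regex = re.split(re.escape(sep), text)
    print(repr(sep), literal, regex)

# An unescaped '.' matches every character, so every piece comes back empty
print(re.split('.', 'a.b.c'))

# Splitting on either square bracket needs a character class, as with Java's "[\\[\\]]"
print(re.split(r'[\[\]]', 'x[1]y'))
```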

PHP script to split a large text file into multiple files

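I'm trying to create a PHP script that splits a large text file into several smaller files based on line count. I need the option to increase the split size, so that the first file gets 10 lines, the second file gets 20 lines, and so on.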

Solution:

Here is a function from my script:

<?PHP
/**
 *
 * Split large files into smaller ones
 * @param string $source Source file
 * @param string $targetpath Target directory for saving files
 * @param int $lines Number of lines to split
 * @return void
 */
function split_file($source, $targetpath='./logs/', $lines=10){
    $i=0;
    $j=1;
    $date = date("m-d-y");
    $buffer='';

    $handle = @fopen ($source, "r");
    if (!$handle) {
        echo "Cannot open file ($source)";
        exit;
    }
    while (!feof ($handle)) {
        $buffer .= @fgets($handle, 4096);
        $i++;
        // write a part when it is full, or when the input has run out
        if ($i >= $lines || (feof($handle) && $buffer !== '')) {
            $fname = $targetpath.".part_".$date.$j.".log";
            if (!$fhandle = @fopen($fname, 'w')) {
                echo "Cannot open file ($fname)";
                exit;
            }

            if (!@fwrite($fhandle, $buffer)) {
                echo "Cannot write to file ($fname)";
                exit;
            }
            fclose($fhandle);
            $j++;
            $buffer='';
            $i=0;
            $lines+=10; // add 10 to $lines after each iteration. Modify this line as required
        }
    }
    fclose ($handle);
}
?>
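For comparison, here is the same "10 lines, then 20, then 30, ..." scheme as a minimal Python sketch. It is a port of the idea, not of the PHP function. `itertools.islice` takes the next `size` lines from the open file and stops early at end of file, so the last, shorter part is written too. The function name `split_growing`, its parameters and the `part_N.log` file names are hypothetical.

```python
import os
from itertools import islice


def split_growing(source, target_dir, first=10, step=10):
    """Write consecutive parts of `source` with first, first+step, ... lines each."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    size = first
    with open(source) as src:
        part = 1
        while True:
            # take the next `size` lines; islice stops early at end of file
            chunk = list(islice(src, size))
            if not chunk:
                break
            path = os.path.join(target_dir, f"part_{part}.log")
            with open(path, 'w') as dst:
                dst.writelines(chunk)
            paths.append(path)
            part += 1
            size += step
    return paths
```

With a 45-line input and the defaults, this writes parts of 10, 20 and 15 lines.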

Python 2.7 - Using a dictionary to find and replace text from a text file into a new text file

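I'm new to programming and have been learning Python in my spare time over the past few months. I decided to try writing a small script that converts American spellings in a text file to English (British) spellings.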

I've spent the last 5 hours trying various things and finally came up with something that gets me closer to my goal, but it's still far from finished!

#imported dictionary contains 1800 english:american spelling key:value pairs.
from english_american_dictionary import dict


def replace_all(text, dict):
    for english, american in dict.iteritems():
        text = text.replace(american, english)
    return text


my_text = open('test_file.txt', 'r')

for line in my_text:
    new_line = replace_all(line, dict)
    output = open('output_test_file.txt', 'a')
    print >> output, new_line

output.close()

I'm sure there are better ways to do this, but with this script these are the problems I'm running into:

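In the output file, the lines are written on every other line with a blank line in between, but the original test_file.txt doesn't have these. The contents of test_file.txt are shown at the bottom of this page.
Only the first instance of an American spelling in a line is converted to English.
I didn't really want to open the output file in append mode, but couldn't figure out how to use "r" within this code structure.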

Any help for this eager newcomer is appreciated!

The contents of test_file.txt are:

I am sample file.
I contain an english spelling: colour.
3 american spellings on 1 line: color, analyze, utilize.
1 american spelling on 1 line: familiarize.

Answer 1


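The extra blank lines appear because you use print to write out lines that already end with a newline. Since print also writes its own newline, your output becomes double-spaced. A simple fix is to use outfile.write(new_line) instead.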
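The doubled newline can be seen directly in the sketch below. It uses Python 3 syntax (`print(..., file=...)` instead of Python 2's `print >> f, ...`), and `io.StringIO` stands in for the output file. Passing `end=''` to `print` is an alternative to switching to `write()`.

```python
import io

line = "colour\n"  # lines read from a file keep their trailing newline

doubled = io.StringIO()
print(line, file=doubled)               # print appends another '\n'

single = io.StringIO()
single.write(line)                      # write() outputs the text unchanged

also_single = io.StringIO()
print(line, end='', file=also_single)   # or suppress print's own newline

print(repr(doubled.getvalue()), repr(single.getvalue()), repr(also_single.getvalue()))
```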

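As for the file mode, the problem is that you open the output file over and over again inside the loop. You only need to open it once, at the start. It's usually a good idea to handle files with a with statement, because it closes them for you when you're done.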
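Here is a minimal Python 3 sketch of the "open once with `with`" structure, still processing line by line. Both files are opened a single time, the output uses plain `'w'` mode instead of `'a'`, and both are closed automatically when the block ends. The function name `convert_file` and its `replace` parameter are hypothetical. `replace` stands for any callable that takes a line and returns the converted line, for example `lambda s: replace_all(s, dict)` from the question.

```python
def convert_file(in_path, out_path, replace):
    """Open both files once, then transform the input line by line."""
    with open(in_path) as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            # write() keeps the line's own newline, so no extra blank lines appear
            out_file.write(replace(line))
```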

I don't understand your other problem, that only some of the replacements happen. Are the spellings 'analyze' and 'utilize' missing from your dictionary?

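One suggestion: don't do the replacement line by line. You can read the whole file at once with file.read() and then process it as a single unit. This is likely faster, because the loop over the items in the spelling dictionary runs only once instead of once per line: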

with open('test_file.txt', 'r') as in_file:
    text = in_file.read()

with open('output_test_file.txt', 'w') as out_file:
    out_file.write(replace_all(text, spelling_dict))

Edit:

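To make your code handle words that contain other words correctly (e.g. "entire", which contains "tire"), you will probably need to replace the simple str.replace approach with regular expressions.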

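Here is a quickly cobbled-together solution that uses re.sub with a dictionary of spelling changes from American to British English (i.e. the reverse of the order in your current dictionary):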

import re

#from english_american_dictionary import ame_to_bre_spellings
ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def ame_to_bre(text):
    pattern = r'\b\w+\b'  # this pattern matches whole words only
    replacer = replacer_factory(ame_to_bre_spellings)
    return re.sub(pattern, replacer, text)

def main():
    #with open('test_file.txt') as in_file:
    #    text = in_file.read()
    text = 'foo color, entire, utilize'
    #with open('output_test_file.txt', 'w') as out_file:
    #    out_file.write(ame_to_bre(text))
    print(ame_to_bre(text))

if __name__ == '__main__':
    main()

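One nice thing about this code structure is that you can easily convert British spellings back to American ones by passing the dictionary, with its keys and values swapped, to the replacer_factory function.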
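Here is a sketch of that reverse direction. It inverts the dictionary with a comprehension and passes either mapping to the same factory. The helper name `convert` is hypothetical. The inversion assumes the mapping is one-to-one: if two American spellings mapped to the same British one, only one of them would survive in the inverted dictionary.

```python
import re


def replacer_factory(spelling_dict):
    # same factory as in the answer's code
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer


ame_to_bre_spellings = {'tire': 'tyre', 'color': 'colour', 'utilize': 'utilise'}
# invert the mapping: British spelling -> American spelling
bre_to_ame_spellings = {bre: ame for ame, bre in ame_to_bre_spellings.items()}


def convert(text, spelling_dict):
    """Replace whole words using whichever direction of the dictionary is passed in."""
    return re.sub(r'\b\w+\b', replacer_factory(spelling_dict), text)


print(convert('foo color, entire, utilize', ame_to_bre_spellings))
print(convert('foo colour, entire, utilise', bre_to_ame_spellings))
```

Because `\b\w+\b` matches whole words, "entire" is left alone in both directions.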

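That concludes our look at splitting a text file into sections using special delimiter lines in Python. Thanks for reading. If you want to learn more about java – splitting a file into four parts, JavaSE_split with special delimiters, a PHP script that splits a large text file into multiple files, or Python 2.7 – using a dictionary to find and replace text from a text file into a new text file, you can find related content on this site.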
