
Extract the content from a file between two match patterns (Extract only HTML from a file)

I have a file that contains several kinds of text. My goal is to extract only the HTML part and write it to a new file. I think this is possible with grep or awk. The file also contains lines like this:

Sender name `<test@email.com>`

I tried `cat file1.html | grep -E "<[^>]*>"`, but the problem is that it also outputs lines such as the Sender name line above. I want to extract only the content that starts at the `<html>` tag, so output like this is not useful to me:

Return-Path: <test@test.com>
    for <test@localhost> (single-drop); Thu, 21 Sep 2017 18:34:07 +0400 (+04)
Return-path: <test@test.com>
    (envelope-from <test@test.com>)
References: <test@test.com>
From: test user <test@test.com>
X-Forwarded-Message-Id: <test@test.com>
Message-ID: <test@test.com>
In-Reply-To: <test@test.com>
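
For reference, one possible way to keep only the part between the two tags is a line-based range match rather than a per-line `grep`. The following is a minimal sketch with awk, assuming the HTML part starts on a line containing `<html` and ends on a line containing `</html>` (in any letter case). The function name `extract_html` and the output file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file "$1" from the first line
# containing "<html" through the line containing "</html>", inclusive.
# Assumption: the opening and closing tags mark the HTML part line by line.
extract_html() {
    awk '
        tolower($0) ~ /<html/    { inside = 1 }  # opening tag seen: start printing
        inside                   { print }       # print while inside the HTML part
        tolower($0) ~ /<\/html>/ { inside = 0 }  # closing tag seen: stop after this line
    ' "$1"
}
```

Usage could look like `extract_html file1.html > only_html.html`. Because the match works per line, any other text that shares a line with the opening or closing tag is kept as well, so header lines such as `From: test user <test@test.com>` are dropped only when they sit on their own lines.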
