
Convert HTML to text format

I want to extract the page content from this HTML file:

<BR /><TABLE style=border-color:#32506d border=1 cellspacing=0><caption class=header style=background-color:#32506d><b>Additional M2Ms & Standalone DataMasking List for 09 10 2020            PST</b></caption><tr style=background-color:#32506d class=header><td class=CR>Start Time</td><td class=CR>FM CR</td><td class=CR>CR Type</td><td class=CR>Customer Name</td><td class=CR>Source Pod</td><td class=CR>Target Pod</td><td class=CR>DM Flag</td><td class=CR>Release</td><td class=CR>Data Center</td><td class=CR>CDB Sync</td><td class=CR>FreeSpace Check</td><td class=CR>TDE/DV Check</td><td class=CR>M2M Optin</td><td class=CR>M2M Type</td><td class=CR colspan=2>Database Reorg Details</td><td>Operations Team</td></tr><tr><td>09/10/2020-19:00</td><td class=CR><a href=http://fleetmanager.oraclecloud.com/change/faces/registerChangeRequest?CRID=11124482                target=_blank>11124482</td><td>M2M</td><td>TCS</td><td>KCLB-CDB</td><td>EGLG-TEST</td><td class=CR>N</td><td>Revision 13.20.07</td><td>ks8-US-OCC</td><td class=CR><font color=#34A853>Yes</font></td><td class=CR><font color=#34A853>Passed</font></td><td class=CR><font color=#34A853>Passed</font></td><td class=CR>Y</td><td class=CR><font color=#34A853>sDC</font></td><td><font color=#db3236>Reclaimable Space: 3532 GB</font></td><td><font color=#db3236>Reorg Required</font></td><td><center><font color=#0000FF>RAMU</font></center></td></tr><tr><td>09/10/2020-19:00</td><td class=CR><a href=http://fleetmanager.oraclecloud.com/change/faces/registerChangeRequest?CRID=11170981                target=_blank>11170981</td><td><font color=green>Standalone Data Masking</font></td><td>Wipro, Inc.</td><td></td><td>LMNO-TEST</td><td class=CR></td><td>Revision 13.20.07</td><td>ns2-US</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td><center>NA</center></td><td><center>NA</center></td><td>DataMasking</td></tr></TABLE><br /><span>Thanks,</span><br /><span>M2M Ops</span><br 
/><br /><span>Note: This is a system generated email,    still you can reply with queries/suggestions.</span></HTML>

So far, I have tried doing so using sed:

sed -n '/^$/!{s/<[^>]*>//g;p;}' file.html

This is the output I get:

Start TimeFM CRCR TypeCustomer NameSource PodTarget PodDM FlagReleaseData CenterCDB SyncFreeSpace CheckTDE/DV CheckM2M OptinM2M TypeDatabase Reorg DetailsOperations Team09/10/2020-19:0011124482M2MTCSKCLB-CDBEGLG-TESTNRevision 13.20.07ks8-US-OCCYesPassedPassedYsDCReclaimable Space: 3532 GBReorg RequiredRAMU09/10/2020-19:0011170981Standalone Data MaskingWipro Inc.LMNO-TESTRevision 13.20.07ns2-USNANANANANANANADataMaskingThanks,M2M OpsNote: This is a system generated email, still you can reply with queries/suggestions.

However, all the cells run together on a single line, which differs from the desired output:

StartTime           FMCR      CRType                   CustomerName         SourcePod  TargetPod DMFlag Release               DataCenter       CDBSync  FreeSpaceCheck TDE/DVCheck M2MOptin  M2MType DatabaseReorgDetails                             OperationsTeam
09/10/2020-19:00    11124482    M2M                        TCS               KCLB-CDB  KCLB-TEST  N     Revision 13.20.07      ks8-US-OCC       YES     Passed          Passed      Y         sDC     Reclaimable Space: 3532 GB   Reorg Required     RAMU
09/10/2020-19:00    11170981 Standalone Data Masking     Wipro, Inc              LMNO-TEST              Revision 13.20.07      ns2-US           NA      NA               NA          NA         NA      NA                           NA              DataMasking
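
The sed substitution deletes every tag, including the `</td>` and `</tr>` tags that mark where one cell or row ends, which is why the text runs together. One possible way to keep those boundaries is to parse the markup instead of stripping it. The following is only a rough sketch under several assumptions, not a tested solution for this exact mail: it uses Python's standard `html.parser` module, the names `TableTextExtractor`, `extract_rows`, `format_rows` and `html_table_to_text` are made up for illustration, a `colspan` cell is treated as one filled column followed by empty ones, and any text outside table cells (caption, signature) is ignored.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect table rows as lists of cell strings (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read
        self._span = 1     # colspan of the cell currently being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TABLE>/<TR> also match.
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is None:
                self._row = []
            self._cell = []
            span = dict(attrs).get("colspan") or "1"
            self._span = int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only text inside a cell is kept; caption and signature are skipped.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            # Assumption: a colspan=N cell fills N columns, the extras empty.
            self._row.extend([""] * (self._span - 1))
            self._cell = None
            self._span = 1

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def extract_rows(html):
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def format_rows(rows, gap=2):
    """Left-align every column to the width of its longest cell."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    col_widths = [max(len(row[i]) for row in padded) for i in range(width)]
    return "\n".join(
        (" " * gap).join(c.ljust(w) for c, w in zip(row, col_widths)).rstrip()
        for row in padded
    )


def html_table_to_text(html):
    return format_rows(extract_rows(html))
```

Assuming the mail is saved as `file.html`, `print(html_table_to_text(open("file.html").read()))` would print one line per table row with the columns aligned. Empty cells stay as blank columns rather than vanishing, which is what keeps later columns from shifting left.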
