I want to extract the page content from this HTML file:
<BR /><TABLE style=border-color:#32506d border=1 cellspacing=0><caption class=header style=background-color:#32506d><b>Additional M2Ms & Standalone DataMasking List for 09 10 2020 PST</b></caption><tr style=background-color:#32506d class=header><td class=CR>Start Time</td><td class=CR>FM CR</td><td class=CR>CR Type</td><td class=CR>Customer Name</td><td class=CR>Source Pod</td><td class=CR>Target Pod</td><td class=CR>DM Flag</td><td class=CR>Release</td><td class=CR>Data Center</td><td class=CR>CDB Sync</td><td class=CR>FreeSpace Check</td><td class=CR>TDE/DV Check</td><td class=CR>M2M Optin</td><td class=CR>M2M Type</td><td class=CR colspan=2>Database Reorg Details</td><td>Operations Team</td></tr><tr><td>09/10/2020-19:00</td><td class=CR><a href=http://fleetmanager.oraclecloud.com/change/faces/registerChangeRequest?CRID=11124482 target=_blank>11124482</td><td>M2M</td><td>TCS</td><td>KCLB-CDB</td><td>EGLG-TEST</td><td class=CR>N</td><td>Revision 13.20.07</td><td>ks8-US-OCC</td><td class=CR><font color=#34A853>Yes</font></td><td class=CR><font color=#34A853>Passed</font></td><td class=CR><font color=#34A853>Passed</font></td><td class=CR>Y</td><td class=CR><font color=#34A853>sDC</font></td><td><font color=#db3236>Reclaimable Space: 3532 GB</font></td><td><font color=#db3236>Reorg Required</font></td><td><center><font color=#0000FF>RAMU</font></center></td></tr><tr><td>09/10/2020-19:00</td><td class=CR><a href=http://fleetmanager.oraclecloud.com/change/faces/registerChangeRequest?CRID=11170981 target=_blank>11170981</td><td><font color=green>Standalone Data Masking</font></td><td>Wipro, Inc.</td><td></td><td>LMNO-TEST</td><td class=CR></td><td>Revision 13.20.07</td><td>ns2-US</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td class=CR>NA</td><td><center>NA</center></td><td><center>NA</center></td><td>DataMasking</td></tr></TABLE><br /><span>Thanks,</span><br /><span>M2M Ops</span><br /><br /><span>Note: This is a system 
generated email, still you can reply with queries/suggestions.</span></HTML>
So far, I have tried doing so using sed:
sed -n '/^$/!{s/<[^>]*>//g;p;}' file.html
I am getting the output below, with all the cell values run together:
Start TimeFM CRCR TypeCustomer NameSource PodTarget PodDM FlagReleaseData CenterCDB SyncFreeSpace CheckTDE/DV CheckM2M OptinM2M TypeDatabase Reorg DetailsOperations Team09/10/2020-19:0011124482M2MTCSKCLB-CDBEGLG-TESTNRevision 13.20.07ks8-US-OCCYesPassedPassedYsDCReclaimable Space: 3532 GBReorg RequiredRAMU09/10/2020-19:0011170981Standalone Data MaskingWipro Inc.LMNO-TESTRevision 13.20.07ns2-USNANANANANANANADataMaskingThanks,M2M OpsNote: This is a system generated email, still you can reply with queries/suggestions.
However, it differs from the desired output, where the cell values are separated by spaces:
StartTime FMCR CRType CustomerName SourcePod TargetPod DMFlag Release DataCenter CDBSync FreeSpaceCheck TDE/DVCheck M2MOptin M2MType DatabaseReorgDetails OperationsTeam09/10/2020-19:00 11124482 M2M TCS KCLB-CDB KCLB-TEST N Revision 13.20.07 ks8-US-OCC YES Passed Passed Y sDC Reclaimable Space: 3532 GB Reorg Required RAMU09/10/2020-19:00 11170981 Standalone Data Masking Wipro, Inc LMNO-TEST Revision 13.20.07 ns2-US NA NA NA NA NA NA NA DataMasking
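
A possible direction, as a minimal sketch rather than a tested solution for the whole file: turn each closing cell tag into a space and each closing row tag into a newline before stripping the remaining tags, so that cell boundaries survive the tag removal. The function name `strip_cells` is hypothetical, and putting each table row on its own line is an assumption about the intended layout. The sketch does not remove the spaces inside header cells (for example `Start Time` stays as is), and empty cells leave extra spaces behind.

```shell
# strip_cells: hypothetical helper; reads HTML from stdin or from file arguments.
strip_cells() {
  # 1. closing </td> or </th> becomes a space (keeps cells apart)
  # 2. closing </tr> becomes a newline (assumption: one row per line)
  # 3. every remaining tag is removed
  sed -e 's/<\/[tT][dDhH]>/ /g' \
      -e 's/<\/[tT][rR]>/\
/g' \
      -e 's/<[^>]*>//g' "$@" |
  # drop trailing spaces and the empty lines left over from the first pass
  sed -e 's/ *$//' -e '/^$/d'
}
```

It would be called as `strip_cells file.html`. Like the original command, it only handles tags that open and close on the same line.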