Channel: Active questions tagged bash+awk - Ask Ubuntu

Using sed or awk to remove near-duplicates


I currently use the following to get as close as I can to a de-duplicated version of the file:

cut -d ' ' -f 3- /var/log/issues.log | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' | sort -u

So far it gets rid of the timestamp at the start of each line and removes the IP address.
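To see why `sort -u` alone is not enough, here is the same pipeline (using a space as the `cut` delimiter) run on a few made-up log lines. The `DATE TIME message` layout and the sample lines are assumptions, since the real format of /var/log/issues.log isn't shown, and `sample_log` and `strip_noise` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Invented sample lines (hypothetical); assumes "DATE TIME message" with single spaces.
sample_log() {
  cat <<'EOF'
2024-01-01 10:00:00 Failed login from 192.168.0.10 for alice
2024-01-01 10:00:05 Failed login from 10.0.0.7 for bob
2024-01-01 10:01:00 Invalid heartbeat 'node1' from 172.16.1.2
2024-01-01 10:02:00 Invalid heartbeat 'node2' from 172.16.1.3
EOF
}

# The pipeline from the question, reading stdin instead of the log file.
strip_noise() {
  cut -d ' ' -f 3- |                                                 # drop DATE and TIME
    sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' |   # drop IPv4 addresses
    sort -u                                                          # removes exact duplicates only
}

sample_log | strip_noise
```

All four lines survive, because the user names and heartbeat IDs still differ: the lines are near-duplicates, not exact duplicates.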

However, I'm still left with dozens of lines in formats like these:

Failed login from for A
Failed login from for B
Failed login from for C
Failed login from for D
Failed login from for E
Invalid heartbeat 'A' from
Invalid heartbeat 'B' from
Invalid heartbeat 'C' from
Invalid heartbeat 'D' from
Invalid heartbeat 'E' from

How would I further amend my command to remove these near-duplicates, leaving only the following? (A, B, C, D and E could be any string.)

Failed login from for
Invalid heartbeat from
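A minimal sketch of one way to do this with sed: add another stage before `sort -u` that blanks out the variable parts. It assumes the variable part is either everything after ` for ` or a single-quoted token such as `'A'` with no embedded quotes. `normalize` is a hypothetical function name, and the input lines are just the shapes shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical extra stage, meant to sit between the IP-stripping sed and sort -u.
normalize() {
  sed -E -e 's/ for .*$/ for/' \
         -e "s/'[^']*' ?//g"
  # 1st expression: "... for <anything>" becomes "... for"
  # 2nd expression: deletes single-quoted tokens plus one following space
}

# Lines shaped like the ones in the question (A and B stand for arbitrary strings).
printf '%s\n' \
  'Failed login from  for A' \
  'Failed login from  for B' \
  "Invalid heartbeat 'A' from " \
  "Invalid heartbeat 'B' from " \
  | normalize | sort -u
```

On the real log, the two `-e` expressions would go into one more `sed -E` stage just before `sort -u`. Be aware that the first expression also truncates any other message that happens to contain ` for `.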

Thanks
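Since awk is also an option, here is a sketch that does everything in a single pass and prints the first occurrence of each pattern in its original order instead of sorting. It assumes the timestamp is exactly the first two whitespace-separated fields; `dedupe_patterns` is a hypothetical name and the sample lines are invented. The IPv4 regex is deliberately loose (`+` instead of `{1,3}`) because some awk implementations, such as older mawk, don't support interval expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the files given as arguments (or stdin) and prints
# one line per normalized message, in first-seen order.
dedupe_patterns() {
  # q holds a single quote so the awk program itself can stay single-quoted.
  awk -v q="'" '
    {
      $1 = ""; $2 = ""                            # drop the assumed DATE and TIME fields
      sub(/^ +/, "")                              # remove blanks left by the empty fields
      gsub(/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/, "")  # loose IPv4 match
      sub(/ for .*$/, " for")                     # "... for <anything>" becomes "... for"
      gsub(q "[^" q "]*" q " ?", "")              # drop quoted tokens and one trailing space
      if (!seen[$0]++) print                      # print each normalized line once
    }
  ' "$@"
}

printf '%s\n' \
  '2024-01-01 10:00:00 Failed login from 192.168.0.10 for alice' \
  '2024-01-01 10:00:05 Failed login from 10.0.0.7 for bob' \
  "2024-01-01 10:01:00 Invalid heartbeat 'node1' from 172.16.1.2" \
  "2024-01-01 10:02:00 Invalid heartbeat 'node2' from 172.16.1.3" \
  | dedupe_patterns
```

Against the real file this would be `dedupe_patterns /var/log/issues.log`. Assigning to `$1` and `$2` makes awk rebuild the line with single spaces, so runs of whitespace inside a message are collapsed as a side effect.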

