I have searched online for how to do this but have not found a way that actually works.
Basically, I have a series of large data frames, and I want to convert one column in each of them from a string of characters, for example ASDFSDFSAFDSA, to its equivalent decimal ASCII codes. I want to do this in bash because the files are too large to process in R.
I know Java and R have functions that do this, but I have not found an equivalent in bash. I have looked at the xxd command as well as some posts on the forum, but they end up giving me just a couple of integers rather than the proper ASCII format.
Here is the dput output of a small snippet of the data:
structure(list(Clone.ID = 0:5, Clone.count = c(2454L, 1915L, 1369L, 1255L, 1152L, 1099L), AA..Seq..CDR3 = c("CASSNSDRTYGDNEQFF", "CATSSVLTQQETQYF", "CASSSRGLANTQYF", "CASSLGTALNTEAFF", "CASSRRHLGNTGELFF", "CASSEGRSNQPQHF")), row.names = c(NA, 6L), class = "data.frame")
The data uploaded looks like this:
      Clone.ID Clone.count     AA..Seq..CDR3
    1        0        2454 CASSNSDRTYGDNEQFF
    2        1        1915   CATSSVLTQQETQYF
    3        2        1369    CASSSRGLANTQYF
    4        3        1255   CASSLGTALNTEAFF
    5        4        1152  CASSRRHLGNTGELFF
    6        5        1099    CASSEGRSNQPQHF
The desired output would be for the column AA..Seq..CDR3 to have the following entries instead:
    067 065 083 083 078 083 068 082 084 089 071 068 078 069 081 070 070
    067 065 084 083 083 086 076 084 081 081 069 084 081 089 070
    067 065 083 083 083 082 071 076 065 078 084 081 089 070
    # and so on...
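To make the target concrete, here is a minimal sketch of the kind of conversion I am after, written with awk called from the shell. It assumes the data has been written out as a whitespace-separated text file with one header row and no row names, so the sequence is in column 3. The function name `convert_column` and the tab-separated output are my own choices for illustration, not an established tool.

```shell
# Hypothetical helper: read a whitespace-separated table on stdin and replace
# one column (default: 3) with zero-padded decimal ASCII codes.
# Output fields are tab-separated so the space-separated codes stay in one field.
convert_column() {
    awk -v col="${1:-3}" '
    BEGIN {
        OFS = "\t"
        # Lookup table from each printable ASCII character to its code.
        for (i = 32; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    # Header row: keep the names, only re-join them with tabs.
    NR == 1 { $1 = $1; print; next }
    {
        n = length($col); out = ""
        for (j = 1; j <= n; j++) {
            # Characters outside printable ASCII are not in the table
            # and come out as 000.
            out = out (j > 1 ? " " : "") sprintf("%03d", ord[substr($col, j, 1)])
        }
        $col = out
        print
    }'
}
```

It would be called as something like `convert_column 3 < clones.txt > clones_ascii.tsv`, where both file names are placeholders. Since awk streams the file line by line, memory use should not grow with file size, which is the reason for trying this outside R.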
Also, it would be ideal if the ASCII representation of each sequence were a single integer rather than an array of ints, as is the output of R's conversion, and Python's too, I believe.
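One possible reading of "a single integer" is the three-digit codes joined with no separator. The sketch below shows that for one string, using the shell's `printf` with a leading quote on the argument to get a character's numeric code. The helper name `to_ascii` and its optional separator argument are hypothetical. Note that the joined result is really a string of digits: a 17-character sequence gives 51 digits, which is too long for a 64-bit integer.

```shell
# Hypothetical helper: print the decimal ASCII codes of $1, zero-padded to
# three digits and joined by the separator in $2 (default: one space).
# The body runs in a subshell, so its variables do not leak to the caller.
to_ascii() (
    s=$1
    sep=${2-" "}
    out=""
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        # A leading quote makes printf output the numeric code of the character.
        out="$out${out:+$sep}$(printf '%03d' "'$c")"
        s=$rest
    done
    printf '%s\n' "$out"
)

to_ascii CASSSRGLANTQYF   # codes separated by spaces
to_ascii CAS ""           # codes joined into one digit string
```

This spawns one `printf` substitution per character, so it is only meant to pin down the output format and would likely be too slow to run over a whole large file.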
Any help would be much appreciated.
Thank you all for your time.