Yesterday midnight, when I enjoyed watching X-files in the living room, a call disturbed me. That is my friend who is in the trouble of doing some data management in her dissertation. She has some character variables which looks like:
A/A
C/C
G/G
T/T
She needs to create two new variables. The first new variable contains the first word in old variable, and the second variable contains the last word in old variable. Both of them should also be converted into numeric variables. The criteria is A for 1, C for 2, G for 3, and T for 4.
Her first request is very easy, we can use a special function named "SUBSTR" to extract some word in specific location. The names of two new variables should be names as, for example:
old variable = rs1234567
1st new variable = rs1234567a
2nd new variable = rs1234567b
That's not difficult as well because we can just define the new names in data procedure. However, she said, "I have 3000 variables."
That's is a very very big dataset. If she define new variables by herself in SAS, she needs to write 6000 new variables. She thought a SAS macro could probably save her time, but it is still not efficient. Finally, I found out a good way to finish her job.
Step 1: Export her dataset into Excel, and the first line of the Excel file lists all variable names.
Step 2: Cut down those 3000 variables, and create a new dataset with 3000 temporary variables containing those 3000 variables.
Step 3: Put all 3000 temporary variables in an array named "old", and create 6000 new variables in another two array named "newa" and "newb".
Step 4: Use "Do...End" to concatenate 3000 old variables with "a" and "b", then put them into array newa and newb, respectively.
Step 5: Export this dataset into Excel, so all 6000 new variables are created.
The following program is a simplified example with only 5 old variables:
data test;
input (old1-old5) ($) ;
array old[5] $ old1-old5;
array newa[5] $12 newa1-newa5; /*adjust the lenght of new variables longer*/
array newb[5] $12 newb1-newb5;
do i=1 to 5;
newa[i]=right(old[i])||"a"; /*keep old variable right to prevent from a gap between old variable and a*/
newb[i]=right(old[i])||"b";
end;
keep newa1-newa5 newb1-newb5;
cards;
rs12 rs34 rs56 rs78 rs90
run;
proc print;run;
proc export data=test outfile="C:\Temp\newtest.xls"; run;
quit;
- Feb 07 Wed 2007 08:46
How to concatenate two variables with array
全站熱搜
留言列表
禁止留言