KDDCup99 Full Data - 1 Line of Error


Post by ati_ozgur » Thu Aug 25, 2011 4:20 pm

Thank you for converting the KDDCup99 data to ARFF. The full data set (http://tunedit.org/repo/KDD_Cup/KDDCup99_full.arff.zip) contains one malformed line.
You can correct it with the following Cygwin/Unix commands.

Code:
grep -n "0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,00,tcp,http,SF" KDDCup99_full.arff

which gives the following output:

4817251:0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,00,tcp,http,SF,334,1684,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,9,0.00,0.00,0.00,0.00,1.00,0.00,0.33,0,0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal
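
If the text of the malformed line is not known in advance, one way to look for such rows is to count comma-separated fields. The sketch below assumes that every data row should have 42 fields (41 attributes plus the class label) and that ARFF header and comment lines start with @ or %. The function name find_bad_rows is made up for illustration.

```shell
# Print "line_number:field_count" for every data row whose number of
# comma-separated fields differs from the expected count.
# Assumption: 42 fields per row unless a second argument says otherwise.
find_bad_rows() {
    file=$1
    expected=${2:-42}
    awk -F, -v n="$expected" '
        /^[@%]/ || /^ *$/ { next }      # skip ARFF header, comments, blank lines
        NF != n { print NR ":" NF }     # report rows with an unexpected width
    ' "$file"
}
```

Running find_bad_rows KDDCup99_full.arff should then list the number of any row that does not have 42 fields, so the fused row would be expected to show up there.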

Code:
# Keep the lines before the malformed one (line 4817251).
sed -n 1,4817250p KDDCup99_full.arff > full1.arff
# Keep the lines after it, through line 4898582.
sed -n 4817252,4898582p KDDCup99_full.arff > full2.arff
cat full1.arff full2.arff > KDDCup99_fullCorrected.arff
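
The same removal can probably be done in a single pass with sed's d (delete) command, which also avoids hard-coding the number of the last line. This is only a sketch; the helper name drop_line is hypothetical.

```shell
# Write a copy of file $1 to $3 with line number $2 removed.
drop_line() {
    sed "${2}d" "$1" > "$3"
}
```

For the file in this thread the call would be drop_line KDDCup99_full.arff 4817251 KDDCup99_fullCorrected.arff.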

Re: KDDCup99 Full Data - 1 Line of Error

Post by Marcin » Tue Aug 30, 2011 8:44 pm

Hi,
Many thanks for pointing this out. I corrected the error following your guidelines, with one modification: rather than removing the erroneous line completely, I fixed it by deleting the superfluous characters at its beginning, so that the dataset stays exactly the same as the KDD Cup original. Now the command:
Code:
sed -n 4817249,4817253p KDDCup99_full.arff

displays:
Code:
0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,245,22,1.00,1.00,0.00,0.00,0.09,0.05,0.00,255,22,0.09,0.06,0.00,0.00,1.00,1.00,0.00,0.00,neptune
0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,23,1.00,1.00,0.00,0.00,0.09,0.05,0.00,255,23,0.09,0.06,0.00,0.00,1.00,1.00,0.00,0.00,neptune
0,tcp,http,SF,334,1684,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,9,0.00,0.00,0.00,0.00,1.00,0.00,0.33,0,0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,normal
0,tcp,http,SF,270,2721,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,10,0.00,0.00,0.00,0.00,1.00,0.00,0.30,1,1,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal
0,tcp,http,SF,344,3854,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,11,0.00,0.00,0.00,0.00,1.00,0.00,0.27,2,2,1.00,0.00,0.50,0.00,0.00,0.00,0.00,0.00,normal

(The third line is the fixed one.)
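
A repair of this kind, keeping the row but dropping the leftover characters in front of it, could be scripted roughly as follows. This is not necessarily how the fix above was made; the helper name strip_prefix_on_line is hypothetical, and the prefix to strip (the fragment of the truncated record that precedes the intact one) has to be supplied by the caller.

```shell
# Print file $1 with the literal prefix $3 removed from line number $2.
# All other lines, and line $2 itself if it does not start with the
# prefix, are printed unchanged.
strip_prefix_on_line() {
    file=$1
    lineno=$2
    prefix=$3
    awk -v n="$lineno" -v p="$prefix" '
        NR == n && index($0, p) == 1 { $0 = substr($0, length(p) + 1) }
        { print }
    ' "$file"
}
```

Because the function writes to standard output, the result would be redirected to a new file and checked with the sed -n command above before replacing the original.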

The corrected file can be downloaded from http://tunedit.org/repo/KDD_Cup/KDDCup99_full.arff.zip

Thanks again
Marcin

