[From nobody Wed Dec  8 07:44:54 2004
Message-ID: &lt;41B6D466.7050405@ebi.ac.uk&gt;
Date: Wed, 08 Dec 2004 10:16:06 +0000
From: Emmanuel Quevillon &lt;tuco@ebi.ac.uk&gt;
Reply-To: tuco@ebi.ac.uk
Organization: EBI/EMBL
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030425
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: allenday@ucla.edu
CC: heikki@ebi.ac.uk
Subject: Re: bad entries in interpro again
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Dear Allen,


Heikki forwarded me your email about your problem.
There are two different things: InterPro, which is a database, and InterProScan, which is a tool 
that uses InterPro data to find protein domains in sequences.
The version 3.3 you referred to in your last email is an InterProScan (tool) version. There is now a 
new release, InterProScan 4.0. The current release of InterPro is 8.1; the XML files are available at:
  ftp://ftp.ebi.ac.uk/pub/databases/interpro

By the way, the error you mentioned in your previous email comes from a file called match.xml, which 
is distributed with InterProScan (data archive). Each InterProScan data archive is in sync with an 
InterPro release (interpro.xml and match.xml): every time there is a new InterPro release, I create 
a new data archive for InterProScan containing the latest InterPro release data.

Your examples (SSF46785 and SSF55486) come from match.xml. I tried to find these entries in match.xml 
7.1, 7.2, 8.0 and 8.1 without success. Which release of InterPro data are you currently using with 
InterProScan (if you use InterProScan)?
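
One way to check a given match.xml for the problem discussed in this thread is sketched below in 
Python. It is only an illustration under stated assumptions: the function names (is_well_formed, 
find_match_ids) and the tiny <matches> wrapper document are hypothetical and do not reflect the real 
match.xml layout or any InterProScan code. Only the match element and its id, name and dbname 
attributes are taken from the examples quoted further down in this message.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def find_match_ids(xml_text, wanted_ids):
    """Return the subset of wanted_ids that occur as <match id="..."> elements."""
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter("match"):
        if elem.get("id") in wanted_ids:
            found.add(elem.get("id"))
    return found


# Hypothetical wrapper element; the real file layout is an assumption here.
# BAD reproduces the unescaped double quotes reported in the thread.
BAD = (
    '<matches><match id="SSF46785" '
    'name=""Winged helix" DNA-binding domain" '
    'dbname="SUPERFAMILY"/></matches>'
)

# GOOD is the same entry with the quotes inside the attribute value escaped.
name = '"Winged helix" DNA-binding domain'
GOOD = (
    '<matches><match id="SSF46785" name="%s" dbname="SUPERFAMILY"/></matches>'
    % escape(name, {'"': "&quot;"})
)

if __name__ == "__main__":
    print(is_well_formed(BAD))
    print(is_well_formed(GOOD))
    print(find_match_ids(GOOD, {"SSF46785", "SSF55486"}))
```

If the file parses and the two ids are reported, that release contains the entries in escaped form; 
if parsing fails, the release still has the unescaped quotes.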

I don't know if what I tried to explain is clear enough; if not, don't hesitate to email me.

Regards

Emmanuel


Subject: Re: [Bioperl-l] Fw: bad entries in interpro again (fwd)
Date: Wednesday 08 December 2004 08:21
From: Allen Day &lt;allenday@ucla.edu&gt;
To: Heikki Lehvaslaiho &lt;heikki@ebi.ac.uk&gt;
Cc: bioperl-l@portal.open-bio.org, Jared Fox &lt;jaredfox@ucla.edu&gt;

what release are you referring to, with URL please?  i am looking here,
and it is the same old iprscan with the bugs:

ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan/RELEASE/3.3/

-allen

On Tue, 7 Dec 2004, Heikki Lehvaslaiho wrote:

 &gt;&gt; This came up last week and it turned out that the new release made
 &gt;&gt; available on Monday already had all double quotes escaped.
 &gt;&gt;
 &gt;&gt; Download the new interpro.
 &gt;&gt;
 &gt;&gt; 	-Heikki
 &gt;&gt;
 &gt;&gt; On Tuesday 07 December 2004 03:59, Jared Fox wrote:
 &gt;
 &gt;&gt;&gt; &gt; The problem with Interpro XML is that there are entries like:
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt;  &lt;match id=&quot;SSF46785&quot; name=&quot;&quot;Winged helix&quot; DNA-binding domain&quot;
 &gt;&gt;&gt; &gt; dbname=&quot;SUPERFAMILY&quot;&gt;
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt; or
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt; &lt;match id=&quot;SSF55486&quot; name=&quot;Metalloproteases (&quot;zincins&quot;), catalytic
 &gt;&gt;&gt; &gt; domain&quot; dbname=&quot;SUPERFAMILY&quot;&gt;
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt; The double quotes are supposed to mark the beginning and end of the name
 &gt;&gt;&gt; &gt; attribute, but the xml is not valid so it has double quotes inside the
 &gt;&gt;&gt; &gt; attribute itself. I believe this also happens with other illegal xml
 &gt;&gt;&gt; &gt; characters.
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt; If Interpro were to start producing valid XML, everything should work
 &gt;&gt;&gt; &gt; happily.
 &gt;&gt;&gt; &gt;
 &gt;&gt;
 &gt;&gt;&gt;&gt; &gt; &gt; ---------- Forwarded message ----------
 &gt;&gt;&gt;&gt; &gt; &gt; Date: Wed, 01 Dec 2004 16:16:46 +0000
 &gt;&gt;&gt;&gt; &gt; &gt; From: Mikko Arvas &lt;Mikko.Arvas@vtt.fi&gt;
 &gt;&gt;&gt;&gt; &gt; &gt; To: bioperl-l@portal.open-bio.org, Hilmar Lapp &lt;hlapp@gmx.net&gt;,
 &gt;&gt;&gt;&gt; &gt; &gt;     Allen Day &lt;allenday@ucla.edu&gt;
 &gt;&gt;&gt;&gt; &gt; &gt; Subject: bad entries in interpro again
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; Hi,
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; we've been discussing the problems of interpro parsing. I have a friend
 &gt;&gt;&gt;&gt; &gt; &gt; who
 &gt;&gt;&gt;&gt; &gt; &gt; is going to interpro consortium meeting next week and I could send some
 &gt;&gt;&gt;&gt; &gt; &gt; regards through him. After reading your e-mails, I am (being quite a
 &gt;&gt;&gt;&gt; &gt; &gt; newbie) a little bit confused of what kind of regards would you like to
 &gt;&gt;&gt;&gt; &gt; &gt; send if any?
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; Is the &amp;apos the source of the problem? Is it really a problem in
 &gt;&gt;&gt;&gt; &gt; &gt; BioPerl or in expat? Is somebody trying to solve the problem for
 &gt;&gt;&gt;&gt; &gt; &gt; Bioperl now and is there any sensible thing that the interpro team
 &gt;&gt;&gt;&gt; &gt; &gt; could do to help?
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; Cheers,
 &gt;&gt;&gt;&gt; &gt; &gt; mikko
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; Mikko Arvas
 &gt;&gt;&gt;&gt; &gt; &gt; VTT Biotechnology
 &gt;&gt;&gt;&gt; &gt; &gt;
 &gt;&gt;&gt;&gt; &gt; &gt; e-mail:            mikko.arvas@vtt.fi
 &gt;&gt;&gt;&gt; &gt; &gt; tel:                 +358-(0)9-456 5827
 &gt;&gt;&gt;&gt; &gt; &gt; mobile:           +358-(0)44-381 0502
 &gt;&gt;&gt;&gt; &gt; &gt; fax:                +358-(0)9-455 2103
 &gt;&gt;&gt;&gt; &gt; &gt; mail:               Tietotie 2, Espoo
 &gt;&gt;&gt;&gt; &gt; &gt;                       P.O. Box 1500
 &gt;&gt;&gt;&gt; &gt; &gt;                       FIN-02044 VTT, Finland
 &gt;&gt;
 &gt;&gt;&gt; &gt;
 &gt;&gt;&gt; &gt; _______________________________________________
 &gt;&gt;&gt; &gt; Bioperl-l mailing list
 &gt;&gt;&gt; &gt; Bioperl-l@portal.open-bio.org
 &gt;&gt;&gt; &gt; http://portal.open-bio.org/mailman/listinfo/bioperl-l


-------------------------------------------------------
-- 

------------------------------------------------------------------
Emmanuel Quevillon - Software Engineer
EBI - European Bioinformatics Institute    Tel: +44(0) 1223 494443
Wellcome Trust Center - Hinxton            Fax: +44(0) 1223 494472
Cambridge CB10 1SD, UK                     email: tuco@ebi.ac.uk
------------------------------------------------------------------

]