Complex regex on HTML source code

Posted: 22 March 2010, 16:38
by energie13
Hello,
I need to extract information from a page containing the following source code:

Code:

<tbody><tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536848&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536848&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=57774&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=54327&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=57774&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=54327&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536902&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536902&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" 
nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERROR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898590&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898590&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" 
nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=59308&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=56135&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=59308&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=56135&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898640&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898640&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERROR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td 
class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;1262459&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;1262459&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:32:11&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:32:11&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-15.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-15.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=60761&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=57943&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=60761&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=57943&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr>
First I need to get the last <tr> ... </tr> block containing ERROR, then extract the information held between the tags.
That is where I don't know how to proceed: should I strip all the tags and replace them with a separator such as ; to make later processing easier, or do something else? I've been struggling for two days and I'm lost, so if anyone could point me in the right direction I'd appreciate it.
Thank you in advance.

Re: Complex regex on HTML source code

Posted: 22 March 2010, 16:53
by stealth35
To parse HTML you should use DomDocument (combined with XPath if you like):
http://fr2.php.net/manual/fr/class.domdocument.php
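
A minimal sketch of that approach, to show what "DomDocument plus XPath" could look like for the question asked (last row containing ERROR, then its cells). The $html string is a hypothetical, shortened sample shaped like the table above, not the real page, and the XPath expression is only one of several ways to select the rows.

```php
<?php
// Hypothetical two-row sample shaped like the table in the first post.
$html = '<table><tbody>'
      . '<tr><td class="datagrid_red_cell">&nbsp;536848&nbsp;</td>'
      . '<td class="datagrid_red_cell">&nbsp;OK&nbsp;</td></tr> '
      . '<tr><td class="datagrid_red_cell">&nbsp;536902&nbsp;</td>'
      . '<td class="datagrid_red_cell">&nbsp;ERROR&nbsp;</td></tr>'
      . '</tbody></table>';

// Real pages are rarely valid markup, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Every row that has at least one cell whose text contains "ERROR" (case-sensitive).
$rows = $xpath->query('//tr[td[contains(., "ERROR")]]');

$cells = array();
if ($rows->length > 0) {
    // The node list is in document order, so the last item is the last matching row.
    $last = $rows->item($rows->length - 1);
    foreach ($xpath->query('td', $last) as $td) {
        // &nbsp; comes back as U+00A0 (bytes C2 A0 in UTF-8): turn it into a space, then trim.
        $cells[] = trim(str_replace("\xC2\xA0", ' ', $td->textContent));
    }
}

print_r($cells);
```

On the real page, $html would be the body returned by the HTTP request, and $cells would hold one entry per column of the last ERROR row.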

Re: Complex regex on HTML source code

Posted: 22 March 2010, 18:50
by energie13
Great, I didn't know about that. I was thinking of using getElementsByTagNameNS to filter. The problem is that I can't install the class: I don't have access to the server :s.
Does it really seem impossible to you with a regex?
Thanks again for taking a look at my problem.

Re: Complex regex on HTML source code

Posted: 22 March 2010, 19:00
by stealth35
Yes, it's possible, but regex isn't suited to this. Don't you have DomDocument installed on your server? It's normally enabled by default. Which PHP version do you have?
Otherwise, does this thing come from a log?
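
For comparison, here is a rough sketch of what a regex-only version could look like. It assumes the rows and cells are never nested and that the closing tags are always present, which is exactly the kind of assumption that makes regex fragile on HTML. The $html string is again a hypothetical, shortened sample.

```php
<?php
// Hypothetical two-row sample shaped like the table in the first post.
$html = '<tbody><tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536848&nbsp;</td>'
      . '<td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td></tr> '
      . '<tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536902&nbsp;</td>'
      . '<td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERROR&nbsp;</td></tr></tbody>';

// Capture the inside of each <tr>...</tr>; /s lets "." span line breaks, /i ignores tag case.
preg_match_all('#<tr\b[^>]*>(.*?)</tr>#si', $html, $m);

// Walk the rows in order and remember the last one containing "ERROR".
// Note: this looks at the raw row markup, so it would also match "ERROR" inside an attribute.
$lastError = null;
foreach ($m[1] as $row) {
    if (strpos($row, 'ERROR') !== false) {
        $lastError = $row;
    }
}

$cells = array();
if ($lastError !== null) {
    preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $lastError, $c);
    foreach ($c[1] as $cell) {
        // Decode entities (&nbsp; becomes bytes C2 A0), replace that with a space, then trim.
        $text = html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8');
        $cells[] = trim(str_replace("\xC2\xA0", ' ', $text));
    }
}

print_r($cells);
```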

Re: Complex regex on HTML source code

Posted: 22 March 2010, 19:17
by energie13
No, it comes from an application that generates its results in a web interface (an HTML page). I connect with cURL to handle the certificates and the requests, then I have to process those results (and that is where I'm struggling, given the source) :(
That's odd: the PHP version is 5.1.6, so it should work.

Re: Complex regex on HTML source code

Posted: 22 March 2010, 19:28
by stealth35
Whoa, 5.1.6 is already 5 years old, so I'm not sure DomDocument was enabled by default back then.
I'll try to put something together for you.
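
Rather than guessing from the version number, a short script dropped on the server can report whether the DOM extension is actually there. A minimal sketch (the $hasDom variable name is just for illustration):

```php
<?php
// The DOMDocument class is provided by the "dom" extension.
$hasDom = extension_loaded('dom') && class_exists('DOMDocument');

echo 'PHP ', PHP_VERSION, ': DOMDocument is ',
     ($hasDom ? 'available' : 'not available'), "\n";
```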

Re: Complex regex on HTML source code

Posted: 23 March 2010, 12:59
by stealth35
I wrote this for you, but under PHP 5.1 you'll have to see how it behaves:
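// Read test.php line by line with the HTML tags stripped by a stream filter.
// This assumes each table row sits on its own line of the file.
// (The string.strip_tags filter is deprecated as of PHP 7.3.)
$data = file('php://filter/read=string.strip_tags/resource=test.php', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES | FILE_TEXT);

foreach($data as $key => $value)
{
    // Decode the entities, then turn each non-breaking space (U+00A0) into a plain space.
    $temp     = preg_replace('/\xA0/u', ' ', html_entity_decode($value, ENT_QUOTES, 'UTF-8'));
    // Split on two whitespace characters sitting between word characters, or on a literal "?".
    $value    = preg_split('/\b\s{2}\b|\?/u', trim($temp));
    
    // A complete row yields 16 fields; anything else is dropped below.
    if(count($value) === 16)
    {
        // Field 12 holds the status; field 15 holds the last column.
        switch($value[12])
        {
            case 'OK':
                // Query string: turn it into an associative array.
                parse_str($value[15], $tmp);
                $value[15] = $tmp;
                break;
            case 'ERROR':
                // Error message: split it on " / ".
                $value[15] = preg_split('/\s\/\s/', $value[15]);
                break;
        }
        
        $data[$key] = $value;
    }
    else
    {
        unset($data[$key]);
    }
}

print_r($data);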
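
To make the switch above more concrete, here is a small standalone sketch of what ends up in $value[15] for each status. The two input strings are copied (the first one shortened) from the sample rows in the first post; the variable names $okCell, $errCell, $params and $parts are only for illustration.

```php
<?php
// Last column of an OK row, shortened; the leading "?" is already gone after the split.
$okCell = 'pkg=stats&fct=qostv&mac=00043025E0AF&rxErrors=0';
parse_str($okCell, $params);
// $params is now an associative array: pkg, fct, mac, rxErrors.

// Last column of an ERROR row.
$errCell = 'Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF';
$parts = preg_split('/\s\/\s/', $errCell);
// $parts has one entry per " / "-separated segment.

print_r($params);
print_r($parts);
```

The OK case gives direct access to values such as $params['mac'], while the ERROR case still leaves each segment as a "label : value" string that would need one more split on " : " if the labels are wanted as keys.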