Complex REGEX on HTML source code

Eléphant du PHP | 59 Messages

22 March 2010, 16:38

Hello,
I need to extract information from a page whose source code looks like this:


<tbody><tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536848&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536848&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=57774&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=54327&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=57774&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=54327&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536902&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;536902&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" 
nowrap="nowrap">&nbsp;03:34:21&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-21.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERROR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898590&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898590&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" 
nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=59308&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=56135&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=59308&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=56135&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898640&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;898640&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:01:28&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-23.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERREUR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;ERROR&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;Appli : SETUP / User error : SN9 / Internal error : / Details : _stepEnd : Setup._setCardlessCak;mac=00043025E0AF&nbsp;</td><td 
class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr> <tr> <td class="datagrid_red_cell" nowrap="nowrap">&nbsp;1262459&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;1262459&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;2010-03-20&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:32:11&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;04:32:11&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;10.23.8.213&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-15.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;frontal-15.tv&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;STATUS&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;OK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;NOK&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=60761&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=57943&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;?pkg=stats&fct=qostv&op=neuf&roomId=1&mac=00043025E0AF&fw=4.7.44&rxPackets=60761&rxErrors=0&intRxErrors=0&rtpPackets=0&joinCount=0&joinHist=0_0_0_0_0_0_0_0_0_0_0_0&hw=n10-3-15&uptime=57943&intSpeed=0&&nbsp;</td><td class="datagrid_red_cell" nowrap="nowrap">&nbsp;&nbsp;</td></tr>
First I need to grab the last <tr> row that contains ERROR, then extract the information between its tags.
That's where I don't know how to proceed: should I remove all the tags and replace them with, say, ";" to make the processing easier afterwards, or is there another way? I've been struggling with this for 2 days, so if anyone could point me in the right direction I'd appreciate it, because right now I'm lost.
Thanks in advance
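A minimal sketch of the "strip the tags and insert separators" idea from the question, written for PHP 7.1 or later (not the 5.1.6 server discussed in this thread). The function names splitRows() and lastRowWith() are hypothetical. Cell and row ends are marked before strip_tags() removes them, and a tab is used instead of ";" because the error detail cells already contain a semicolon (Setup._setCardlessCak;mac=00043025E0AF).

```php
<?php
// Hypothetical helper: one array of trimmed fields per table row.
// Cell ends become tabs and row ends become newlines BEFORE strip_tags() runs.
// A tab is used rather than ';' because some cells already contain ';'.
function splitRows(string $html): array
{
    $text = str_ireplace(['</td>', '</tr>'], ["\t", "\n"], $html);
    // Drop the remaining tags, then decode &nbsp; into U+00A0 (UTF-8).
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');

    $rows = [];
    foreach (explode("\n", $text) as $line) {
        // Every </td> produced one tab, so the last piece is whatever followed
        // the final cell (whitespace only) and is dropped.
        $cells = array_slice(explode("\t", $line), 0, -1);
        if ($cells === []) {
            continue; // no cell on this line, e.g. the text after the last </tr>
        }
        // Trim ordinary and non-breaking spaces around each value.
        $rows[] = array_map(function ($cell) {
            return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $cell);
        }, $cells);
    }
    return $rows;
}

// Hypothetical helper: the last row having a cell equal to $status, or null.
function lastRowWith(array $rows, string $status): ?array
{
    $found = null;
    foreach ($rows as $row) {
        if (in_array($status, $row, true)) {
            $found = $row; // later matches overwrite earlier ones
        }
    }
    return $found;
}
```

With the page source in $html, lastRowWith(splitRows($html), 'ERROR') gives the fields of the last error row, or null if there is none.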

ViPHP | 5462 Messages

22 March 2010, 16:53

To parse HTML you should use DomDocument (combined with XPath if you like):
http://fr2.php.net/manual/fr/class.domdocument.php
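A minimal sketch of that suggestion, assuming the dom extension is available and PHP 7.1 or later; lastErrorRowDom() is a hypothetical name. It silences libxml's warnings about the unescaped & characters in the query strings, walks every <tr>, and keeps the last one with a cell equal to ERROR. The match is done in PHP rather than in the XPath expression because XPath 1.0's normalize-space() does not strip non-breaking spaces.

```php
<?php
// Sketch: last <tr> containing an "ERROR" cell, found with DOMDocument + DOMXPath.
// Returns the row's cell texts with spaces and non-breaking spaces trimmed.
function lastErrorRowDom(string $html): ?array
{
    $doc = new DOMDocument();
    // The markup has bare '&' in query strings; collect libxml's parse
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<table>' . $html . '</table>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $last = null;
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // 'td' is evaluated relative to the current row.
        foreach ($xpath->query('td', $tr) as $td) {
            // textContent is UTF-8, so &nbsp; arrives as U+00A0.
            $cells[] = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $td->textContent);
        }
        if (in_array('ERROR', $cells, true)) {
            $last = $cells; // later matches overwrite earlier ones
        }
    }
    return $last;
}
```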

Eléphant du PHP | 59 Messages

22 March 2010, 18:50

Great, I didn't know about it; I was thinking of using getElementsByTagNameNS to filter. The problem is that I can't install the class, since I don't have access to the server :s.
Do you really think it can't be done with a regex?
Thanks again for taking a look at my problem.
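It can be done with regular expressions as long as the markup stays as regular as this sample. A minimal sketch (PHP 7.1+, hypothetical function name lastErrorRowRegex()) that collects every row, then every cell of each row, and keeps the last row with an ERROR cell. It assumes there are no nested tables and no <tr>/<td> text inside attribute values, and it will break if the page layout changes, which is why a DOM parser is usually preferred.

```php
<?php
// Regex-only sketch, for markup as regular as the sample page.
function lastErrorRowRegex(string $html): ?array
{
    // 1) Every <tr ...>...</tr> block; "s" lets "." cross newlines, "i" ignores case.
    preg_match_all('#<tr\b[^>]*>(.*?)</tr>#si', $html, $rows);

    $last = null;
    foreach ($rows[1] as $rowHtml) {
        // 2) The content of every <td ...>...</td> inside that row.
        preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $rowHtml, $cells);
        // 3) Decode entities (&nbsp; becomes U+00A0), then trim spaces and NBSPs.
        $values = array_map(function ($cell) {
            $text = html_entity_decode($cell, ENT_QUOTES, 'UTF-8');
            return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
        }, $cells[1]);
        // 4) Keep the row if one of its cells is exactly "ERROR"; the last match wins.
        if (in_array('ERROR', $values, true)) {
            $last = $values;
        }
    }
    return $last;
}
```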

ViPHP | 5462 Messages

22 March 2010, 19:00

It is possible, but it's not the right tool. Don't you have DomDocument on your server? It's normally enabled by default. Which PHP version are you running?
Also, does your data come from a log?
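To answer the version question, a small diagnostic sketch (PHP 7+; domSupport() is a hypothetical helper) that reports the PHP version and whether the dom extension and its classes are loaded. It is best run through the web server, since command-line PHP may use a different configuration. On some Linux distributions the DOM extension ships in a separate package (php-xml on Red Hat/CentOS), so it can be missing even though PHP's own build enables it by default.

```php
<?php
// Report what the current PHP runtime provides for DOM parsing.
function domSupport(): array
{
    return [
        'php_version' => PHP_VERSION,
        'dom'         => extension_loaded('dom'),
        'DOMDocument' => class_exists('DOMDocument'),
        'DOMXPath'    => class_exists('DOMXPath'),
    ];
}

print_r(domSupport());
```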

Eléphant du PHP | 59 Messages

22 March 2010, 19:17

No, it comes from an application that outputs its results in a web interface (an HTML page). I connect with cURL to handle the certificates and the requests, then I have to process those results (and that's where I'm struggling, given what the source looks like) :(
Strange, the PHP version is 5.1.6, so it should work.

ViPHP | 5462 Messages

22 March 2010, 19:28

Whoa, 5.1.6 dates back to 2006, so I'm not sure DomDocument was enabled by default back then.
I'll try to put something together for you.

ViPHP | 5462 Messages

23 March 2010, 12:59

Here's what I came up with (on PHP 5.1 you'll have to see how it behaves):
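// test.php holds the saved HTML; the string.strip_tags stream filter removes the tags while reading
// (that filter was deprecated in PHP 7.3). file() returns one entry per line, so this assumes
// each <tr> row sits on its own line in test.php.
$data = file('php://filter/read=string.strip_tags/resource=test.php', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES | FILE_TEXT);

foreach($data as $key => $value)
{
    // decode &nbsp; and the other entities, then turn non-breaking spaces (U+00A0) into plain spaces
    $temp     = preg_replace('/\xA0/u', ' ', html_entity_decode($value, ENT_QUOTES, 'UTF-8'));
    // adjacent cells are now separated by two spaces; '?' also splits off the query strings
    $value    = preg_split('/\b\s{2}\b|\?/u', trim($temp));
    
    // a complete data row yields 16 fields; any other line is dropped
    if(count($value) === 16)
    {
        switch($value[12])
        {
            case 'OK':
                // status row: decode the query string into an associative array
                parse_str($value[15], $tmp);
                $value[15] = $tmp;
                break;
            case 'ERROR':
                // error row: split "Appli : ... / User error : ... / ..." on " / "
                $value[15] = preg_split('/\s\/\s/', $value[15]);
                break;
        }
        
        $data[$key] = $value;
    }
    else
    {
        unset($data[$key]);
    }
}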

print_r($data);
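The script above returns every row. For the original goal (the last ERROR row and its details), a possible follow-up sketch, assuming the array layout built above (status at index 12, the " / "-split details at index 15) and PHP 7+; lastErrorDetails() is a hypothetical name. Each detail segment is split on its first colon, so with the sample data "User error : SN9" becomes 'User error' => 'SN9'.

```php
<?php
// Hypothetical follow-up: details of the last row whose status (index 12) is 'ERROR',
// as label => value pairs built from the segments stored at index 15.
function lastErrorDetails(array $data): ?array
{
    $last = null;
    foreach ($data as $row) {
        if (is_array($row) && ($row[12] ?? null) === 'ERROR') {
            $last = $row; // later rows overwrite earlier ones
        }
    }
    if ($last === null) {
        return null;
    }

    $details = [];
    foreach ($last[15] as $segment) {
        // Split on the first colon only; "Internal error :" yields an empty value.
        $pair = preg_split('/\s*:\s*/', trim($segment), 2);
        $details[$pair[0]] = $pair[1] ?? '';
    }
    return $details;
}
```

print_r(lastErrorDetails($data)); would then show only the details of the last error row.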